Get the job you really want.
Maximum of 25 job preferences reached.
Top Reliability Engineer Jobs in NYC, NY
Fintech • Analytics
The Site Reliability Engineer will manage production monitoring, incident response, and enhance automation using various tools. They will ensure observability and participate in SRE process improvements.
Top Skills:
AWSCucumberDatadog ApmDatadog DbmDynamoDBEc2EcsElkJavaJenkinsPagerdutyPlaywrightRdsS3Secrets ManagerSeleniumServicenowSplunkSpring Boot
Edtech • Machine Learning • Mobile • Other • Software
As a Senior Site Reliability Engineer at Duolingo, you'll improve system reliability and scalability, collaborate with teams, and consult on infrastructure design.
Top Skills:
DockerGoJavaKotlinKubernetesMesosNomadPython
Artificial Intelligence • Software
As a Senior/Staff Network Reliability Engineer, you'll optimize and maintain Fluidstack's network platform, ensuring performance and reliability for AI and HPC workloads. Responsibilities include tuning networking protocols, deploying and validating switches, automating telemetry, conducting root-cause analyses, and collaborating with vendors.
Top Skills:
BgpDpdkEbpfEvpnGeneveGoPythonRdmaRustTcp/IpVxlanXdp
Software
As a Site Reliability Engineer, you'll build and maintain infrastructure for ML models, automate processes, and collaborate cross-functionally.
Top Skills:
Circle CiCloudFormationElk StackGithub ActionsGitlab CiGrafanaJenkinsKubernetesOpentelemetryPrometheusPulumiTerraform
Information Technology • Legal Tech
The Senior Technology Site Reliability Engineer is responsible for maintaining and optimizing infrastructure and applications, ensuring reliability and performance while automating processes and collaborating with teams.
Top Skills:
AWSChefDatadogGoGrafanaJavaPrometheusPuppetPythonSaltTerraform
Marketing Tech
The Cloud Reliability Engineer develops, configures, and deploys cloud tools, enhances applications, ensures observability, and participates in on-call rotations.
Top Skills:
AWSCi/CdDockerGithub ActionsGoGoogle BigqueryGCPKubernetesLinuxPythonSQLTerraform
Fintech • Mobile • Software
The Staff Site Reliability Engineer will design and manage AWS infrastructure, optimize Kubernetes operations, automate workflows, and troubleshoot systems for improved reliability and performance.
Top Skills:
AWSCi/CdDatadogDockerEksGithub ActionsGoKafkaKubernetesNginxPrivatelinkPythonTerraformTransit GatewayVpc
Financial Services
As a Site Reliability Engineer, you'll optimize and manage cloud infrastructure, implement automation, and maintain system reliability for a global financial platform.
Top Skills:
AWSGCPGoHelmKubernetesLinuxPythonTerraform
Artificial Intelligence
As an Applied AI Engineer, you will onboard customers, deploy AI solutions, work on complex projects, and provide technical guidance. You'll contribute to open-source projects and communicate effectively with stakeholders.
Top Skills:
AnsibleAWSAzureDockerGCPKubernetesPythonTerraform
Artificial Intelligence • Productivity • Software • Automation
As a Staff Site Reliability Engineer at Zapier, you will lead reliability strategies to enhance observability, mentor engineers, and drive adoption of reliability practices. You will design for scale, influence organizational culture, and integrate AI tools into workflows for improved performance.
Top Skills:
ArgocdAWSDatadogGitlabGoGrafanaKafkaKubernetesOpensearchPrometheusPythonRedisSentryTerraformTypescript
Reposted 25 Days AgoSaved
Easy Apply
Easy Apply
Cloud • Security • Software • Cybersecurity • Automation
As a Senior Site Reliability Engineer at GitLab, you will automate and manage the lifecycle of GitLab environments, ensuring reliability and scalability while leading incident responses and architectural decisions.
Top Skills:
AnsibleAWSElkGCPGoGrafanaKubernetesPrometheusRubyTerraform
Artificial Intelligence • Cybersecurity
The Database Reliability Engineer will ensure database availability, performance, scalability, and security across AWS, collaborating with application and security teams.
Top Skills:
AWSCrossplaneDatadogGitlab Ci/CdKubernetesNoSQLOpensearchPostgresTerraform
New
Cut your apply time in half.
Use ourAI Assistantto automatically fill your job applications.
Use For Free
Software
As an AI Site Reliability Researcher, you will ensure the scalability and reliability of our AI platform, design systems for observability, manage deployments, and develop CI/CD pipelines for hybrid environments.
Top Skills:
AIAWSDatadogGCPGrafanaKubernetesOpentelemetryPrometheusSreTerraform
AdTech • Marketing Tech • Analytics
The Staff SRE DevOps Engineer will manage customer applications, improve system reliability, collaborate on architecture discussions, and support infrastructure needs across teams.
Top Skills:
AWSBashDatadogDockerKafkaKibanaKubernetesLinuxPostgresPythonRedshiftSparkTerraform
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Senior Site Reliability Engineer (SRE) at NVIDIA is responsible for designing, building, and maintaining large-scale production systems, focusing on reliability and efficiency, automation, and continuous improvement.
Top Skills:
ContainersGoKubernetesLinuxNetworkingOpenstackPerlPythonRuby
Reposted 39 Minutes AgoSaved
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Senior Site Reliability Engineer will design, implement, and maintain an observability platform, ensuring reliability and performance while supporting production systems and optimizing operational practices.
Top Skills:
DockerGoGrafanaKubernetesLinuxNetworkingOpenstackOpentelemetryPerlPrometheusPythonRuby
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Design and maintain large scale Kubernetes clusters, ensuring reliability through monitoring, automation and incident response.
Top Skills:
DockerGoKubernetesLinuxNetworkingOpenstackPerlPythonRuby
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Principal Staff SRE will lead initiatives in building and optimizing core infrastructure services on-prem and cloud, deploying and managing services at scale, and improving performance with automation and monitoring tools.
Top Skills:
DhcpDnsEbpfGoLdapLinuxNtpPythonTerraformXdp
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The Senior Site Reliability Engineer will manage deployments, operations, and incident handling for large-scale AI GPU platforms while ensuring high performance and resilience in configurations.
Top Skills:
C++KubernetesLinuxPython
Hardware • Machine Learning • Security • Software
The Site Reliability Engineer will manage software deployment for IoT devices, improve observability, maintain dashboards, automate processes, and collaborate on incident responses.
Top Skills:
AnsibleAWSBashC/C++DatadogGrafanaGroovyJavaJavaScriptNoSQLPostgresPrometheusPythonRSigmaSQLTerraform
Artificial Intelligence • Cloud • Fintech • Machine Learning • Mobile • Software
The Staff Site Reliability Engineer will design, implement, and optimize infrastructure for AI services, ensure reliability and performance, and drive automation and observability excellence across engineering teams.
Top Skills:
AzureAzure DevopsDockerElk StackGithub ActionsGrafanaKubernetesMimirPostgresPrometheusSQL ServerTeamcityTerraform
Security • Cybersecurity
Lead DevOps initiatives to enhance microservice infrastructure, mentor engineering teams, and manage production issues, with a focus on security and automation.
Top Skills:
AWSCircleCIGithub ActionsGoJenkinsKubernetesPythonTerraform
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
The role involves architecting and operating large-scale observability systems, designing resilient telemetry pipelines, automating operations, and leading incident responses while collaborating with various teams.
Top Skills:
ElasticsearchFlinkGoJaegerKafkaLokiMimirOpensearchOpentelemetryPrometheusPythonSparkTempoThanos
Consulting
The SRE Engineer will manage application support, automate solutions, improve system stability, and partner with various teams for application deployment, focusing on infrastructure management and SRE principles.
Top Skills:
AirflowApacheClouderaDatabasesGrafanaHadoopKafkaKubernetesLinuxNginxOpenshiftPrometheusPythonRedisShellUnix
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills:
BlockchainGitopsInfrastructure-As-CodeKubernetesProgramming Languages
Top NYC Companies Hiring Reliability Engineers
See AllPopular Job Searches
All Software Engineer Jobs in NYC
.NET Developer Jobs in NYC
Android Developer Jobs in NYC
C# Jobs in NYC
C++ Jobs in NYC
DevOps Jobs in NYC
Engineering Manager Jobs in NYC
Front End Developer Jobs in NYC
Golang Jobs in NYC
Hardware Engineer Jobs in NYC
iOS Developer Jobs in NYC
Java Developer Jobs in NYC
Javascript Jobs in NYC
Linux Jobs in NYC
Perl Jobs in NYC
PHP Developer Jobs in NYC
Python Jobs in NYC
QA Jobs in NYC
Ruby Jobs in NYC
Sales Engineer Jobs in NYC
Salesforce Developer Jobs in NYC
Scala Jobs in NYC
Artificial Intelligence Jobs in NYC
Artificial Intelligence Engineer Jobs in NYC
AWS Engineer Jobs in NYC
Backend Engineer Jobs in NYC
DevOps Engineer Jobs in NYC
Director of Engineering Jobs in NYC
Engineering Jobs in NYC
Full Stack Engineer Jobs in NYC
Infrastructure Engineer Jobs in NYC
Lead Software Engineer Jobs in NYC
Network Engineer Jobs in NYC
Platform Engineer Jobs in NYC
Principal Architect Jobs in NYC
Principal Engineer Jobs in NYC
Principal Software Engineer Jobs in NYC
Quality Assurance Automation Engineer Jobs in NYC
Reliability Engineer Jobs in NYC
Senior Backend Engineer Jobs in NYC
Senior Cloud Engineer Jobs in NYC
Senior Full-Stack Engineer Jobs in NYC
Senior Platform Engineer Jobs in NYC
Senior Python Engineer Jobs in NYC
Senior Site Reliability Engineer Jobs in NYC
Solutions Architect Jobs in NYC
Solutions Engineer Jobs in NYC
Staff Engineer Jobs in NYC
Staff Software Engineer Jobs in NYC
Systems Engineer Jobs in NYC
Vice President of Engineering Jobs in NYC
All Filters
Total selected ()
No Results
No Results




























