Top Reliability Engineer Jobs in NYC, NY

DraftKings

Senior Lead Database Reliability Engineer

Reposted Yesterday

Remote or Hybrid

New York, NY

168K-210K Annually

Senior level

Digital Media • Gaming • Information Technology • Software • Sports • Esports • Big Data Analytics

Lead reliability, scalability, and operational excellence of large-scale database platforms across cloud and on-prem. Build automation-first database infrastructure (Kubernetes operators, IaC, GitOps), drive monitoring/SLOs, incident leadership, performance and cost optimization, and partner with application teams on safe schema/migration practices. Mentor engineers and evaluate AI-assisted workflows to improve productivity and reliability.

Top Skills: Aerospike, Argo CD, Aurora, Claude, Cloud SQL, Cursor, Database Operators, EKS, Flux CD, GitHub Copilot, GitOps, GKE, Go, Kubernetes, MCP, MongoDB, MySQL, Persistent Volumes, Postgres, Pulumi, Python, Redis, ScyllaDB, StatefulSets, Terraform

Palantir Technologies

Forward Deployed Reliability Engineer

Reposted 11 Days Ago

Hybrid

New York, NY

110K-147K Annually

Entry level

Artificial Intelligence • Software

As a Forward Deployed Reliability Engineer, you ensure stability of workflows, resolve issues swiftly, automate tasks, and drive product improvements through collaboration and documentation.

Top Skills: Java, Python, Spark, SQL

Palantir Technologies

Product Reliability Engineer - Defense

Reposted 11 Days Ago

Hybrid

New York, NY

96K-140K Annually

Mid level

Artificial Intelligence • Software

As a Product Reliability Engineer, you'll ensure service health and performance, tackle outages, and enhance code stability while improving observability and resilience in complex systems.

Top Skills: CSS, Django, Flask, Go, HTML, Java, JavaScript, Prometheus, Python, Ruby, Ruby on Rails

Grow Therapy

Senior Platform Reliability Engineer

Reposted 22 Days Ago

Hybrid

New York, NY

182K-250K Annually

Senior level

Healthtech • Social Impact • Software

Define and scale reliability practices across the company by creating SLO/SLA frameworks, improving observability, evolving incident response, building self-service tooling and scorecards, and driving cross-team adoption to enable teams to build and operate reliable production systems at scale.

Top Skills: AWS, Datadog, EKS, Kubernetes, Postgres, Terraform

Hebbia

Software Engineer, Site Reliability

Reposted Yesterday

In-Office

New York, NY

160K-300K Annually

Senior level

Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Financial Services • Generative AI

As a Site Reliability Engineer, you'll design and improve critical production systems, lead incident response, and enhance observability while embedding with product teams to ensure reliability and performance at scale.

Top Skills: AWS, C++, CI/CD, Go, Python, Rust

MongoDB

Site Reliability Engineer 3

Reposted Yesterday

Easy Apply

Hybrid

New York, NY

111K-218K Annually

Mid level

Big Data • Cloud • Software • Database

The Site Reliability Engineer designs and builds infrastructure for a global cloud service, implements automation, and optimizes system performance while managing on-call operations.

Top Skills: AWS, DNS, GCP, HTTP, Kubernetes, Linux, Azure, Programming Languages, TLS

Cox Enterprises

Sr Software Engineer - Reliability Engineering

2 Days Ago

Hybrid

New York, NY

122K-203K Annually

Senior level

Artificial Intelligence • Automotive • Greentech • Information Technology • Machine Learning • Software • Cybersecurity

Design, build, and operate reliable infrastructure and observability tooling across the stack. Own IaC, incident response, monitoring, cost optimization, and production systems. Mentor engineers, participate in on-call rotations, run postmortems, and drive reliability and operational excellence for a high-volume platform.

Top Skills: Athena, Aurora, AWS EC2, CI/CD, Distributed Tracing, Docker, DynamoDB, Go, Java, Kubernetes, Lambda, Linux, Logging, Metrics, New Relic, Prometheus, Python, RDS, S3, Splunk, Terraform, VPCs, Windows

Cisco ThousandEyes

Senior Site Reliability Engineer (FedRAMP) - ThousandEyes

Reposted 3 Days Ago

Hybrid

New York, NY

147K-278K Annually

Senior level

Cloud • Software

Responsible for maintaining FedRAMP-compliant infrastructure, collaborating with software engineers, and ensuring system availability and security. Duties include infrastructure design, automation, monitoring, and incident response.

Top Skills: AWS, Go, Kubernetes, Puppet, Python, Terraform

JPMorganChase

Lead Site Reliability Engineer

4 Days Ago

Hybrid

New York, NY

Senior level

Financial Services

Lead SRE for Sales Execution platforms responsible for stability, availability, resiliency, incident leadership, RCA, and operational maturity. Partner with Front Office, Product, Development, and Infrastructure to drive SRE adoption, observability, automation, and AI-assisted reliability workflows while mentoring engineers and owning outcomes for business-critical services.

Top Skills: Ansible, AWS, Azure, CI/CD, Containers, Dynatrace, Enterprise-Authorized AI, GCP, Geneos, Grafana, ITIL, Kubernetes, Microservices, OpenShift, PowerShell, Python, Shell, Splunk, Terraform

MongoDB

Site Reliability Engineer (Senior or Staff), Atlas

Reposted 6 Days Ago

Easy Apply

Remote or Hybrid

New York, NY

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.

Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

Zscaler

Staff Production Engineer (SRE) (Federal)

Reposted 7 Days Ago

Easy Apply

Remote or Hybrid

New York, NY

119K-170K Annually

Senior level

Cloud • Information Technology • Security • Software • Cybersecurity

As a Staff Site Reliability Engineer, you'll oversee Zscaler production data center services, optimize code, and ensure cloud service availability and performance. Collaborate with cross-functional teams to improve processes and resolve escalated issues.

Top Skills: Bash, DNS, Firewalls, Grafana, HTTP, ICMP, Load Balancing, Nagios, OSI Model, Prometheus, Python, TCP/IP

Legora

Senior Site Reliability Engineer

Reposted 7 Days Ago

In-Office

New York, NY

237K-321K Annually

3 Days Ago

In-Office

New York, NY

150K-170K Annually

Mid level

Artificial Intelligence • Logistics • Robotics • Software

Own reliability across cloud, edge, and on-site deployments. Build observability, monitoring, and alerting. Define incident response and on-call processes, improve deployment workflows, diagnose infra/network/distributed-system issues, and make deployments repeatable and scalable.

Top Skills: AWS, Azure, GCP, Grafana, Kafka, Kubernetes, Linux, OpenTelemetry, Prometheus, RTSP, Secure Tunnels, VPN, WebRTC

Claryo, Inc.

Integration Reliability Engineer

3 Days Ago

In-Office

New York, NY

150K-170K Annually

Mid level

Artificial Intelligence • Logistics • Robotics • Software

Own reliability of distributed systems across cloud (Kubernetes), edge, and on-site deployments. Build observability, monitoring, alerting, and incident response processes. Improve deployment workflows, diagnose infra/networking/distributed issues, and partner with engineering to prevent recurrence and scale repeatable deployments in imperfect real-world environments.

Top Skills: AWS, Azure, Containerized Systems, GCP, Grafana, Kafka, Kubernetes, Linux, Networking, OpenTelemetry, Prometheus, RTSP, Secure Tunnels, VPNs, WebRTC

Okta

Senior Database Reliability Engineer (DBRE)

Reposted 4 Days Ago

In-Office

New York, NY

160K-220K Annually

Senior level

Cloud

The role involves designing, optimizing, and maintaining PostgreSQL and MySQL databases, ensuring high availability, reliability, and performance for mission-critical systems, while automating operational tasks and responding to incidents.

Top Skills: Ansible, AWS, Datadog, GCP, Go, Grafana, Kubernetes, MySQL, Postgres, Prometheus, Python, Terraform

MongoDB

Senior Site Reliability Engineer, Fleet Management

Reposted 9 Days Ago

Easy Apply

Remote or Hybrid

New York, NY

127K-249K Annually

Senior level

Big Data • Cloud • Software • Database

Develop and maintain Kubernetes runtime environments, support developers, resolve critical issues, and participate in on-call rotations for production systems.

Top Skills: AWS, Azure, cert-manager, CoreDNS, CRDs, CRI, CSI, Gatekeeper, GCP, Go, Helm, Kubernetes, Kustomize, Operators, Python, Terraform

Federal Reserve System

Cloud AWS Support Reliability Engineer (SRE)

5 Days Ago

In-Office

New York, NY

160K-230K Annually

Mid level

Fintech • Payments • Financial Services

Operate and maintain AWS infrastructure and CI/CD pipelines, deploy containerized applications, write Terraform IaC, implement monitoring/observability, troubleshoot incidents, resolve security vulnerabilities, participate in on-call rotation, and produce operational/runbook documentation to ensure resilient cloud platform delivery.

Top Skills: Agile, ALB, Amazon Web Services (AWS), AWS IAM, AWS Lambda, CloudBeaver, CloudWatch, DevSecOps, Direct Connect, DNS, EC2, ECS, EFS, EKS, ELB, GitLab CI, Glue, Grafana, Helm, Java, JBoss, JIRA, Nginx, Okta, Oracle RDS, Python, RDS Postgres, Route53, S3, Scala, SNS, Terraform, Tomcat, Transit Gateway, VPC

NBCUniversal

Staff Software Engineer (SAP BTP SRE Lead)

Reposted 11 Days Ago

Remote or Hybrid

New York, NY

130K-170K Annually

Senior level

AdTech • Cloud • Digital Media • Information Technology • News + Entertainment • App development

Oversee operational support of SAP BTP CPI applications, manage incidents, lead support specialists, and collaborate on architecture and governance for finance processes.

Top Skills: ABAP Proxies, AEM, CAPM, Cloud Connector, Cloud Foundry, Edge Integration Cell, IDoc, JSON, Message Queues, OAuth, OData, REST, SAML, SAP BTP, SFAPI, SFTP, SOAP, XML

Cohere Health

Site Reliability Engineer II

Reposted 2 Days Ago

Easy Apply

Remote

New York, NY

100K-110K Annually

Mid level

Healthtech • Software

Operate and maintain AWS-hosted MERN applications and large-scale data workflows. Manage serverless and Spark-based pipelines, perform incident response and on-call duties, engineer automation to eliminate operational toil, ensure HIPAA/SOC2/HITRUST compliance, build observability and lead blameless post-mortems.

Top Skills: Amazon ECS, Amazon EKS, Amazon EMR, Athena, AWS Glue, AWS Lambda, AWS SNS, AWS SQS, CloudWatch, EC2, IAM, JavaScript, MERN, MySQL, Node.js, OpenTofu, PySpark, Python, RabbitMQ, Terraform, TypeScript, VPC

JPMorganChase

Lead Site Reliability Engineer

12 Days Ago

Hybrid

New York, NY

Senior level

Financial Services

Lead SRE responsible for non-functional requirements, reliability, resiliency, security, monitoring, automation, and SRE adoption. Partner with engineering and stakeholders, run blameless post-incident reviews, coach engineers, scale SRE practices, and integrate enterprise AI into incident triage and operational workflows while ensuring guardrails and data sensitivity controls.

Top Skills: CI/CD, Docker, ECS, Enterprise AI, GitLab, Go, GraphQL, JavaScript, Jenkins, Kafka, Kubernetes, OpenTelemetry, Python, Terraform

JPMorganChase

Site Reliability Engineer III - Production Management

12 Days Ago

Hybrid

New York, NY

Mid level

Financial Services

Operate and improve production reliability for critical services by building automation, monitoring, and runbooks. Triage incidents, reduce MTTR, improve observability, partner with engineering for root-cause fixes, and apply validated enterprise AI-assisted tools to support SRE workflows and reduce toil.

Top Skills: .NET, AWS, CI/CD, Datadog, Enterprise-Authorized AI, Java, Kafka, MQ, Python, Spring Boot