Top Reliability Engineer Jobs in NYC, NY

2 Days Ago
Hybrid
New York, NY
182K-250K Annually
Senior level
Healthtech • Social Impact • Software
Define and scale reliability practices across the company by creating SLO/SLA frameworks, improving observability, evolving incident response, building self-service tooling and scorecards, and driving cross-team adoption to enable teams to build and operate reliable production systems at scale.
Top Skills: AWS, Datadog, EKS, Kubernetes, Postgres, Terraform
17 Days Ago
Hybrid
New York, NY
110K-147K Annually
Entry level
Artificial Intelligence • Information Technology • Software
As a Forward Deployed Reliability Engineer, you ensure stability of workflows, resolve issues swiftly, automate tasks, and drive product improvements through collaboration and documentation.
Top Skills: Java, Python, Spark, SQL
17 Days Ago
Hybrid
New York, NY
96K-140K Annually
Mid level
Artificial Intelligence • Information Technology • Software
As a Product Reliability Engineer, you'll ensure service health and performance, tackle outages, and enhance code stability while improving observability and resilience in complex systems.
Top Skills: CSS, Django, Flask, Go, HTML, Java, JavaScript, Prometheus, Python, Ruby, Ruby on Rails
Yesterday
Easy Apply
Remote or Hybrid
New York, NY
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
Maintain and improve multi-cloud Kubernetes infrastructure, CI/CD (Argo Workflows/ArgoCD), observability, and networking. Build reliable continuous deployment tooling and onboarding flows, provide internal support, collaborate across Platform Engineering, contribute upstream (open-source/operators), and participate in a 24/7 on-call rotation to resolve deployment infrastructure issues.
Top Skills: Alerting, Argo Workflows, ArgoCD, AWS, Azure, CI/CD, Containers, DNS, GCP, Go, Kubernetes, Linux, Load Balancer, Observability, Python, Service Mesh, TCP/IP, TLS
Yesterday
Hybrid
New York, NY
197K-246K Annually
Mid level
Fintech • Machine Learning • Payments • Software • Financial Services
Lead technical, second-line oversight of SRE and cloud engineering practices. Perform deep-dive risk analyses of cloud architectures, resiliency, CI/CD, observability, and Gen AI integrations. Produce data-driven risk findings, mitigation recommendations, and executive-facing reports while partnering with first-line engineers and leadership to ensure robust controls and operational reliability.
Top Skills: AWS, Azure, CI/CD, Cloud-Native, Containerization, Datadog, ELK, GCP, Generative AI, Kubernetes, PagerDuty, Prometheus, Splunk
Reposted 3 Days Ago
Easy Apply
In-Office
New York, NY
100K-250K Annually
Mid level
Fintech • Payments • Financial Services
The role involves improving system reliability, building automation, debugging issues, collaborating across teams, and mentoring engineers, focusing on creating a reliable financial ecosystem.
Top Skills: AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kubernetes, Rust, Terraform
Reposted 3 Days Ago
Easy Apply
Hybrid
New York, NY
187K-240K Annually
Senior level
Artificial Intelligence • Cloud • Security • Software • Cybersecurity
The role involves developing AI-assisted product experiences for Datadog by building systems for chat, remediations, and codefixes, alongside collaboration with cross-functional teams to enhance user outcomes.
Top Skills: AI Coding Tools, Go, Kubernetes, LLM-Based Systems
Reposted 4 Days Ago
In-Office
New York, NY
125K-350K Annually
Mid level
Information Technology • Software • Financial Services • Quantitative Trading
The Site Reliability Engineer will provide support and diagnose issues within a real-time, distributed environment, focusing on large-scale application and infrastructure management, with basic required skills in UNIX/Linux, networking, SQL, and scripting languages.
Top Skills: Bash, Python, SQL, TCP/IP, UDP, Unix/Linux
Reposted 4 Days Ago
Hybrid
New York, NY
205K-225K Annually
Senior level
Artificial Intelligence • Fintech • Payments • Social Impact • Analytics • Financial Services • Automation
As a Senior SRE, you'll ensure reliable and scalable systems, develop observability solutions and infrastructure as code, and lead incident response efforts.
Top Skills: AWS, CloudFormation, Datadog, ELK, Prometheus, Terraform
5 Days Ago
Hybrid
New York, NY
209K-286K Annually
Senior level
Fintech • Machine Learning • Payments • Software • Financial Services
Lead technical risk advisory for SRE and cloud-native engineering, assess resiliency, SLIs/SLOs, CI/CD, and observability, perform independent risk reviews, drive AI/automation adoption, and deliver executive-facing risk reporting and remediation guidance.
Top Skills: Automation, AWS, Azure, CI/CD, Cloud-Native Architectures, Containerization, Datadog, ELK, GCP, Gen AI, Observability, PagerDuty, Prometheus, Splunk
Reposted 5 Days Ago
Easy Apply
Hybrid
New York, NY
179K-212K Annually
Senior level
Healthtech • Pharmaceutical • Telehealth
As a Senior Site Reliability Engineer, you'll ensure production system reliability, design resilient infrastructures, and improve operational excellence while collaborating with cross-functional teams.
Top Skills: AWS, Datadog, EKS, ElastiCache, Go, Pulumi, Python, RDS, Route53, S3, Terraform
Reposted 5 Days Ago
In-Office
New York, NY
160K-300K Annually
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Financial Services • Generative AI
As a Site Reliability Engineer, you'll design and improve critical production systems, lead incident response, and enhance observability while embedding with product teams to ensure reliability and performance at scale.
Top Skills: AWS, C++, CI/CD, Go, Python, Rust
6 Days Ago
In-Office
New York, NY
182K-242K Annually
Senior level
Cloud • Information Technology • Machine Learning
Own, build, and operate production reliability tooling and systems across the cloud stack. Lead projects to improve availability, scalability, automation, observability, and incident response. Ship production services in Python/Go, participate on-call, reduce toil through automation, and maintain long-lived platform frameworks.
Top Skills: Cloud-Native, Go, GPU-Accelerated Infrastructure, Kubernetes, Metrics, Python, SLOs/SLIs, Structured Logs, Tracing
Reposted 6 Days Ago
Easy Apply
Hybrid
New York, NY
111K-218K Annually
Mid level
Big Data • Cloud • Software • Database
The Site Reliability Engineer designs and builds infrastructure for a global cloud service, implements automation, and optimizes system performance while managing on-call operations.
Top Skills: AWS, DNS, GCP, HTTP, Kubernetes, Linux, Azure, Programming Languages, TLS
Reposted 7 Days Ago
Easy Apply
Remote or Hybrid
New York, NY
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.
Top Skills: Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform
2 Days Ago
In-Office
New York, NY
Senior level
Enterprise Web • Fintech • Software • Financial Services
Own and build quality systems across the platform: design automated unit, integration, and E2E tests; own CI/CD quality gates; build regression, performance, and load testing; ensure data integrity and SOC 2 compliance around PII and encryption; define release certification and quality metrics; surface risks in architecture reviews; and lead engineering teams to improve testing practices and release confidence.
Top Skills: Artillery, Bull, CI/CD, Claude, Copilot, Cursor, Cypress, E2E, GitHub Actions, JavaScript, Jest, K6, Kafka, Locust, MongoDB, Playwright, Redis, Snowflake, TypeScript
Reposted 8 Days Ago
Hybrid
New York, NY
147K-278K Annually
Senior level
Cloud • Software
Responsible for maintaining FedRAMP-compliant infrastructure, collaborating with software engineers, and ensuring system availability and security. Duties include infrastructure design, automation, monitoring, and incident response.
Top Skills: AWS, Go, Kubernetes, Puppet, Python, Terraform
Reposted 8 Days Ago
In-Office
New York, NY
175K-275K Annually
Expert/Leader
Artificial Intelligence • Cloud • Enterprise Web • Natural Language Processing • Software • App development • Automation
Design and implement large-scale distributed systems that integrate AI safely and reliably, focusing on infrastructure, observability, and security.
Top Skills: Cloud Networking, Containers, Distributed Systems, Event Driven Runtimes, KEDA, Knative, Kubernetes, Multi Cloud Architecture, Operating Systems, Scalability
Reposted 10 Days Ago
In-Office
New York, NY
161K-284K Annually
Senior level
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
As a Senior Site Reliability Engineer, you will enhance platform reliability, lead incident management, and drive AI-driven improvements in operational workflows.
Top Skills: Amazon Web Services, Datadog, DynamoDB, Envoy, Event Driven Architectures, gRPC, HTTP, Istio, JSON, Kotlin, Kubernetes, LaunchDarkly, Modern Java, MySQL, Protocol Buffers, Terraform, Vitess
Reposted Yesterday
Easy Apply
Remote
New York, NY
150K-200K Annually
Senior level
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
As a Site Reliability Engineer, you will ensure system stability and resilience, define reliability standards, and automate operational processes while collaborating cross-functionally to improve performance and reduce incidents.
Top Skills: Bash, CI/CD, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python
Reposted Yesterday
Remote
New York, NY
223K-302K Annually
Expert/Leader
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.
Top Skills: AI Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, SLAs, SLOs
2 Days Ago
Easy Apply
Remote or Hybrid
New York, NY
200K-230K Annually
Senior level
Artificial Intelligence • Machine Learning
Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.
Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks
Reposted 11 Days Ago
Easy Apply
Remote or Hybrid
New York, NY
127K-249K Annually
Senior level
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills: AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Reposted 12 Days Ago
Easy Apply
Hybrid
New York, NY
182K-220K Annually
Senior level
Healthtech • Pharmaceutical • Telehealth
As a Senior Site Reliability Engineer, you will ensure the reliability and scalability of production systems, drive incident response, and collaborate with cross-functional teams on best practices for resilience and observability.
Top Skills: AWS, Datadog, EKS, ElastiCache, Go, Pulumi, Python, RDS, Route53, S3, Terraform
Reposted 4 Days Ago
Easy Apply
Remote or Hybrid
New York, NY
Internship
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills: Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform