Top Reliability Engineer Jobs in NYC, NY
Healthtech • Social Impact • Software
Define and scale reliability practices across the company by creating SLO/SLA frameworks, improving observability, evolving incident response, building self-service tooling and scorecards, and driving cross-team adoption to enable teams to build and operate reliable production systems at scale.
Top Skills:
AWS, Datadog, EKS, Kubernetes, Postgres, Terraform
Artificial Intelligence • Information Technology • Software
As a Forward Deployed Reliability Engineer, you ensure stability of workflows, resolve issues swiftly, automate tasks, and drive product improvements through collaboration and documentation.
Top Skills:
Java, Python, Spark, SQL
Artificial Intelligence • Information Technology • Software
As a Product Reliability Engineer, you'll ensure service health and performance, tackle outages, and enhance code stability while improving observability and resilience in complex systems.
Top Skills:
CSS, Django, Flask, Go, HTML, Java, JavaScript, Prometheus, Python, Ruby, Ruby on Rails
Big Data • Cloud • Software • Database
Maintain and improve multi-cloud Kubernetes infrastructure, CI/CD (Argo Workflows/ArgoCD), observability, and networking. Build reliable continuous deployment tooling and onboarding flows, provide internal support, collaborate across Platform Engineering, contribute upstream (open-source/operators), and participate in a 24/7 on-call rotation to resolve deployment infrastructure issues.
Top Skills:
Alerting, Argo Workflows, ArgoCD, AWS, Azure, CI/CD, Containers, DNS, GCP, Go, Kubernetes, Linux, Load Balancer, Observability, Python, Service Mesh, TCP/IP, TLS
Fintech • Machine Learning • Payments • Software • Financial Services
Lead technical, second-line oversight of SRE and cloud engineering practices. Perform deep-dive risk analyses of cloud architectures, resiliency, CI/CD, observability, and Gen AI integrations. Produce data-driven risk findings, mitigation recommendations, and executive-facing reports while partnering with first-line engineers and leadership to ensure robust controls and operational reliability.
Top Skills:
AWS, Azure, CI/CD, Cloud-Native, Containerization, Datadog, ELK, GCP, Generative AI, Kubernetes, PagerDuty, Prometheus, Splunk
Fintech • Payments • Financial Services
The role involves improving system reliability, building automation, debugging issues, collaborating across teams, and mentoring engineers, focusing on creating a reliable financial ecosystem.
Top Skills:
AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kubernetes, Rust, Terraform
Artificial Intelligence • Cloud • Security • Software • Cybersecurity
The role involves developing AI-assisted product experiences for Datadog by building systems for chat, remediations, and codefixes, alongside collaboration with cross-functional teams to enhance user outcomes.
Top Skills:
AI Coding Tools, Go, Kubernetes, LLM-Based Systems
Information Technology • Software • Financial Services • Quantitative Trading
The Site Reliability Engineer will provide support and diagnose issues within a real-time, distributed environment, focusing on large-scale application and infrastructure management, with basic required skills in UNIX/Linux, networking, SQL, and scripting languages.
Top Skills:
Bash, Python, SQL, TCP/IP, UDP, Unix/Linux
Artificial Intelligence • Fintech • Payments • Social Impact • Analytics • Financial Services • Automation
As a Senior SRE, you'll ensure reliable and scalable systems, develop observability solutions and infrastructure as code, and lead incident response efforts.
Top Skills:
AWS, CloudFormation, Datadog, ELK, Prometheus, Terraform
Fintech • Machine Learning • Payments • Software • Financial Services
Lead technical risk advisory for SRE and cloud-native engineering, assess resiliency, SLIs/SLOs, CI/CD, and observability, perform independent risk reviews, drive AI/automation adoption, and deliver executive-facing risk reporting and remediation guidance.
Top Skills:
Automation, AWS, Azure, CI/CD, Cloud-Native Architectures, Containerization, Datadog, ELK, GCP, Gen AI, Observability, PagerDuty, Prometheus, Splunk
Healthtech • Pharmaceutical • Telehealth
As a Senior Site Reliability Engineer, you'll ensure production system reliability, design resilient infrastructures, and improve operational excellence while collaborating with cross-functional teams.
Top Skills:
AWS, Datadog, EKS, ElastiCache, Go, Pulumi, Python, RDS, Route53, S3, Terraform
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Financial Services • Generative AI
As a Site Reliability Engineer, you'll design and improve critical production systems, lead incident response, and enhance observability while embedding with product teams to ensure reliability and performance at scale.
Top Skills:
AWS, C++, CI/CD, Go, Python, Rust
Cloud • Information Technology • Machine Learning
Own, build, and operate production reliability tooling and systems across the cloud stack. Lead projects to improve availability, scalability, automation, observability, and incident response. Ship production services in Python/Go, participate on-call, reduce toil through automation, and maintain long-lived platform frameworks.
Top Skills:
Cloud-Native, Go, GPU-Accelerated Infrastructure, Kubernetes, Metrics, Python, SLOs/SLIs, Structured Logs, Tracing
Big Data • Cloud • Software • Database
The Site Reliability Engineer designs and builds infrastructure for a global cloud service, implements automation, and optimizes system performance while managing on-call operations.
Top Skills:
AWS, DNS, GCP, HTTP, Kubernetes, Linux, Azure, Programming Languages, TLS
Reposted 7 Days Ago
Big Data • Cloud • Software • Database
The Senior Site Reliability Engineer will lead security design and implementation for cloud infrastructures, mentor teams, and automate security solutions.
Top Skills:
Ansible, AWS, Azure, Cloud Security Tools, CloudFormation, GCP, Go, Terraform
Enterprise Web • Fintech • Software • Financial Services
Own and build quality systems across the platform: design automated unit, integration, and E2E tests; own CI/CD quality gates; build regression, performance, and load testing; ensure data integrity and SOC 2 compliance around PII and encryption; define release certification and quality metrics; surface risks in architecture reviews; and lead engineering teams to improve testing practices and release confidence.
Top Skills:
Artillery, Bull, CI/CD, Claude, Copilot, Cursor, Cypress, E2E, GitHub Actions, JavaScript, Jest, k6, Kafka, Locust, MongoDB, Playwright, Redis, Snowflake, TypeScript
Reposted 8 Days Ago
Cloud • Software
Responsible for maintaining FedRAMP-compliant infrastructure, collaborating with software engineers, and ensuring system availability and security. Duties include infrastructure design, automation, monitoring, and incident response.
Top Skills:
AWS, Go, Kubernetes, Puppet, Python, Terraform
Artificial Intelligence • Cloud • Enterprise Web • Natural Language Processing • Software • App development • Automation
Design and implement large-scale distributed systems that integrate AI safely and reliably, focusing on infrastructure, observability, and security.
Top Skills:
Cloud Networking, Containers, Distributed Systems, Event Driven Runtimes, KEDA, Knative, Kubernetes, Multi Cloud Architecture, Operating Systems, Scalability
Blockchain • eCommerce • Fintech • Payments • Software • Financial Services • Cryptocurrency
As a Senior Site Reliability Engineer, you will enhance platform reliability, lead incident management, and drive AI-driven improvements in operational workflows.
Top Skills:
Amazon Web Services, Datadog, DynamoDB, Envoy, Event Driven Architectures, gRPC, HTTP, Istio, JSON, Kotlin, Kubernetes, LaunchDarkly, Modern Java, MySQL, Protocol Buffers, Terraform, Vitess
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
As a Site Reliability Engineer, you will ensure system stability and resilience, define reliability standards, and automate operational processes while collaborating cross-functionally to improve performance and reduce incidents.
Top Skills:
Bash, CI/CD, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python
Artificial Intelligence • Cloud • Consumer Web • Productivity • Software • App development • Data Privacy
The role involves defining reliability strategies, leading initiatives across teams, enhancing monitoring and incident response, and mentoring engineers at Dropbox.
Top Skills:
AI Technologies, Debugging, Distributed Systems, Incident Response, Observability, Reliability Risk Management, SLAs, SLOs
Artificial Intelligence • Machine Learning
Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.
Top Skills:
Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks
Reposted 11 Days Ago
Big Data • Cloud • Software • Database
As a Senior Site Reliability Engineer, you'll design and build complex systems, support Atlas platform operations, automate processes, and ensure high availability of services.
Top Skills:
AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS
Healthtech • Pharmaceutical • Telehealth
As a Senior Site Reliability Engineer, you will ensure the reliability and scalability of production systems, drive incident response, and collaborate with cross-functional teams on best practices for resilience and observability.
Top Skills:
AWS, Datadog, EKS, ElastiCache, Go, Pulumi, Python, RDS, Route53, S3, Terraform
Cloud • Information Technology • Security • Software • Cybersecurity
This internship role focuses on SRE skills, requiring collaboration and problem-solving in dynamic environments for Zscaler's Zero Trust Exchange team.
Top Skills:
Ansible, AWS ECS, Kubernetes, Linux, Python, Terraform