Top Reliability Engineer Jobs in NYC, NY
Artificial Intelligence • Natural Language Processing • Generative AI
The role involves enhancing AI system reliability, developing service objectives, monitoring infrastructure, leading incident responses, and collaborating across teams.
Top Skills:
AI-Specific Observability Tools • Distributed Systems • High-Availability Infrastructure • ML Hardware Accelerators • Monitoring and Observability Systems
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills:
AWS • Azure • C++ • GCP • Go • Kubernetes • OCI
eCommerce • Fintech • Payments • Software
The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.
Top Skills:
AWS • CloudFormation • Datadog • Kubernetes • OpenTelemetry • Ruby • Ruby on Rails • Terraform
Information Technology • Automation
The SRE/Infrastructure Engineer will architect and manage secure, scalable systems for automated penetration testing, optimizing reliability, and enhancing infrastructure based on customer demand. Responsibilities include maintaining production environments, leading technical discussions, and promoting high coding standards.
Top Skills:
AWS • Azure • CloudFormation • ELK • GCP • New Relic • OpenTelemetry • Postgres • Prometheus • Terraform
Artificial Intelligence
Seeking an experienced Site Reliability Engineer to enhance platform reliability, scalability, and performance by balancing operations with long-term software engineering improvements.
Top Skills:
AI • Bash • Datadog • Docker • ELK Stack • Flux • Go • Grafana • Kubernetes • Prometheus • Python • Terraform
AdTech • Marketing Tech
The role involves enhancing the reliability and performance of media measurement platforms, managing incidents, implementing observability practices, automating processes, and ensuring high availability of cloud and on-premises infrastructures.
Top Skills:
Ansible • AWS • Bash • GCP • GitLab • Go • Grafana • Helm • Kubernetes • Linux • MongoDB • Nagios • NoSQL • OCI • Prometheus • Python • Snowflake • Splunk • SQL • Terraform • Unix • Vertica
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
As a Principal Software Engineer on the SRE team, lead best practices adoption, mentor engineers, and improve system reliability and user experience through automation and collaboration.
Top Skills:
CDK • CloudFormation • Datadog • Go • JavaScript • Prometheus • Python • Terraform • TypeScript
6 Days Ago
Agency • Information Technology • Professional Services • Software
Lead development and implementation of preventive and predictive maintenance for onshore mechanical equipment, use CMMS to plan and monitor maintenance, analyze reliability data, perform RCA, support operations and maintenance teams, ensure safety and compliance, and recommend improvements to reduce downtime and costs.
Top Skills:
CMMS • Predictive Maintenance • Preventive Maintenance • Root Cause Analysis
6 Days Ago
Agency • Information Technology • Professional Services • Software
Lead development and implementation of preventive and predictive maintenance programs for offshore mechanical equipment, use CMMS to plan and track work, perform RCA for failures, support offshore teams in troubleshooting, monitor equipment reliability, and ensure compliance with safety and maintenance standards.
Top Skills:
CMMS • Predictive Maintenance • Preventive Maintenance • Root Cause Analysis
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
As a Site Reliability Engineer, you will ensure system stability and resilience, define reliability standards, and automate operational processes while collaborating cross-functionally to improve performance and reduce incidents.
Top Skills:
Bash • CI/CD • Docker • Go • Grafana • Kubernetes • Linux • Prometheus • Python
Cloud
The Site Reliability Engineer will manage Kubernetes platforms, optimize AWS cloud infrastructure, ensure high availability, and automate deployment while handling troubleshooting and security compliance.
Top Skills:
AWS • Bash • CI/CD • CloudWatch • ELK Stack • Go • Grafana • Helm • Istio • Kubernetes • Prometheus • Python • Terraform
Cloud
The Senior Site Reliability Engineer will enhance the Splunk ecosystem and develop an Observability Platform by automating infrastructure and managing complex distributed systems, while optimizing log collection and incident response.
Top Skills:
AWS • GCP • Go • Kubernetes • Linux • OpenTelemetry • Python • Ruby • Splunk • Terraform
Artificial Intelligence • Machine Learning • Natural Language Processing • Software
As a Site Reliability Engineer, you'll enhance the performance and reliability of infrastructure and products by collaborating with engineering teams, automating configurations, and implementing monitoring systems.
Top Skills:
AWS • Go • Kubernetes • Python • Terraform
Financial Services
The Senior Site Reliability Engineer will enhance production insights, manage scalable infrastructure, optimize Kubernetes, and develop automation tools while ensuring high availability and performance in cloud-based systems.
Top Skills:
Ansible • AWS • GCP • GitOps • Go • Grafana • Helm • IaC • Kubernetes • Python • Splunk • Terraform • Terragrunt
Artificial Intelligence • Healthtech • Software
Design, build, and maintain secure and scalable infrastructure for critical healthcare applications, lead incident responses, and support engineering teams.
Top Skills:
Bash • GCP • Go • Grafana • Helm • Kubernetes • Prometheus • Python • Terraform
Fintech • Payments • Financial Services
The Site Reliability Engineer will automate processes, manage server deployments, and collaborate with teams to enhance operational workflows in a trading environment.
Top Skills:
Ansible • C++ • Chef • Cloud Infrastructure • Distributed Systems • Docker • Go • Grafana • HashiCorp Nomad • HPC Clusters • Kubernetes • Linux • Perl • Podman • Prometheus • Puppet • Python • Rancher • Rust • Salt
Software
Drive reliability testing and qualification of cellular base stations, collaborating with R&D for long-term reliability and product lifecycle support.
Top Skills:
Excel • MS Office • MS Word • PTC Windchill • Python • Telcordia
Payments • Software • Automation
Lead platform and infrastructure direction on AWS, evolve CI/CD and ephemeral environments, set observability and SLO standards, drive incident response and postmortems, mentor engineers, and build automation to reduce operational risk.
Top Skills:
AWS • CI/CD • Distributed Systems • ECS • Ephemeral Environments/Preview Deploys • Fargate • GitHub Actions • Observability (Metrics, Logs, Tracing) • SLOs/SLIs/Error Budgets
Enterprise Web • Information Technology • Mobile
The Senior Software Engineer will focus on infrastructure, reliability, and platform engineering, designing scalable systems, managing CI/CD processes, and evolving observability and incident response protocols.
Top Skills:
AWS • Distributed Tracing • Fly.io • GitHub Actions • Go • Logging • Metrics • Postgres • Terraform
Information Technology • Security
The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.
Top Skills:
Argo CD • GitHub Actions • Go • Grafana Tanka • Jsonnet • Kubernetes • Python
Artificial Intelligence • Cloud • Information Technology • Software
As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.
Top Skills:
Ansible • CUDA • Go • Helm • Kubernetes • Linux • NCCL • NVIDIA • Python • Rust • Slurm • Terraform
Cloud • Software • Analytics
Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.
Top Skills:
Ansible • Bash • GCP • GKE • Go • Kubernetes • Pulumi • Python
Big Data
You will manage AWS infrastructure, automate deployments, debug application issues, and improve the operational health of Metabase Cloud.
Top Skills:
AWS • Datadog • Go • Grafana • Kubernetes • Prometheus • Python • Terraform
Healthtech
Design, provision, and operate AWS infrastructure using Terraform; run and scale Kubernetes workloads with Helm; build observability, monitoring, and CI/CD automation; define SLIs/SLOs and lead incident response and postmortems; implement security and compliance (HIPAA/SOC2); participate in on-call rotation and partner with product and engineering on capacity, performance, and resilient system design.
Top Skills:
Argo CD • AWS • AWS Secrets Manager • CI/CD • ClickHouse • CloudWatch • Datadog • Event Sourcing • Flux • Go • Grafana • HashiCorp Vault • Helm • Kubernetes • Linux • MySQL • OpenTelemetry • Postgres • Prometheus • Python • Redshift • SigNoz • Snowflake • Terraform
Popular Job Searches
All Software Engineer Jobs in NYC
.NET Developer Jobs in NYC
Android Developer Jobs in NYC
C# Jobs in NYC
C++ Jobs in NYC
DevOps Jobs in NYC
Engineering Manager Jobs in NYC
Front End Developer Jobs in NYC
Golang Jobs in NYC
Hardware Engineer Jobs in NYC
iOS Developer Jobs in NYC
Java Developer Jobs in NYC
Javascript Jobs in NYC
Linux Jobs in NYC
Perl Jobs in NYC
PHP Developer Jobs in NYC
Python Jobs in NYC
QA Jobs in NYC
Ruby Jobs in NYC
Sales Engineer Jobs in NYC
Salesforce Developer Jobs in NYC
Scala Jobs in NYC
Artificial Intelligence Jobs in NYC
Artificial Intelligence Engineer Jobs in NYC
AWS Engineer Jobs in NYC
Backend Engineer Jobs in NYC
DevOps Engineer Jobs in NYC
Director of Engineering Jobs in NYC
Engineering Jobs in NYC
Full Stack Engineer Jobs in NYC
Infrastructure Engineer Jobs in NYC
Lead Software Engineer Jobs in NYC
Network Engineer Jobs in NYC
Platform Engineer Jobs in NYC
Principal Architect Jobs in NYC
Principal Engineer Jobs in NYC
Principal Software Engineer Jobs in NYC
Quality Assurance Automation Engineer Jobs in NYC
Reliability Engineer Jobs in NYC
Senior Backend Engineer Jobs in NYC
Senior Cloud Engineer Jobs in NYC
Senior Full-Stack Engineer Jobs in NYC
Senior Platform Engineer Jobs in NYC
Senior Python Engineer Jobs in NYC
Senior Site Reliability Engineer Jobs in NYC
Solutions Architect Jobs in NYC
Solutions Engineer Jobs in NYC
Staff Engineer Jobs in NYC
Staff Software Engineer Jobs in NYC
Systems Engineer Jobs in NYC
Vice President of Engineering Jobs in NYC