Top Reliability Engineer Jobs in NYC, NY

Reposted Yesterday
In-Office
New York, NY
325K-485K Annually
Senior level
Artificial Intelligence • Natural Language Processing • Generative AI
The role involves enhancing AI system reliability, developing service objectives, monitoring infrastructure, leading incident responses, and collaborating across teams.
Top Skills: AI-Specific Observability Tools, Distributed Systems, High-Availability Infrastructure, ML Hardware Accelerators, Monitoring and Observability Systems
Reposted Yesterday
In-Office or Remote
New York, NY
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills: AWS, Azure, C++, GCP, Go, Kubernetes, OCI
21 Days Ago
Remote or Hybrid
New York, NY
175K-200K Annually
Senior level
eCommerce • Fintech • Payments • Software
The role involves ensuring software reliability and performance, managing incidents, developing infrastructure automation, and mentoring junior engineers within a platform team.
Top Skills: AWS, CloudFormation, Datadog, Kubernetes, OpenTelemetry, Ruby, Ruby on Rails, Terraform
Reposted 4 Days Ago
Hybrid
New York, NY
30K-120K Annually
Senior level
Information Technology • Automation
The SRE/Infrastructure Engineer will architect and manage secure, scalable systems for automated penetration testing, optimizing reliability, and enhancing infrastructure based on customer demand. Responsibilities include maintaining production environments, leading technical discussions, and promoting high coding standards.
Top Skills: AWS, Azure, CloudFormation, ELK, GCP, New Relic, OpenTelemetry, Postgres, Prometheus, Terraform
Reposted 4 Days Ago
Hybrid
New York, NY
Senior level
Artificial Intelligence
Seeking an experienced Site Reliability Engineer to enhance platform reliability, scalability, and performance by balancing operations with long-term software engineering improvements.
Top Skills: AI, Bash, Datadog, Docker, ELK Stack, Flux, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Reposted 4 Days Ago
In-Office
New York, NY
89K-178K Annually
Senior level
AdTech • Marketing Tech
The role involves enhancing the reliability and performance of media measurement platforms, managing incidents, implementing observability practices, automating processes, and ensuring high availability of cloud and on-premises infrastructures.
Top Skills: Ansible, AWS, Bash, GCP, GitLab, Go, Grafana, Helm, Kubernetes, Linux, MongoDB, Nagios, NoSQL, OCI, Prometheus, Python, Snowflake, Splunk, SQL, Terraform, Unix, Vertica
Reposted 23 Days Ago
Easy Apply
Remote
New York, NY
195K-270K Annually
Expert/Leader
Artificial Intelligence • Fintech • Machine Learning • Social Impact • Software
As a Principal Software Engineer on the SRE team, lead best practices adoption, mentor engineers, and improve system reliability and user experience through automation and collaboration.
Top Skills: CDK, CloudFormation, Datadog, Go, JavaScript, Prometheus, Python, Terraform, TypeScript
6 Days Ago
In-Office or Remote
New York, NY
Expert/Leader
Agency • Information Technology • Professional Services • Software
Lead development and implementation of preventive and predictive maintenance for onshore mechanical equipment, use CMMS to plan and monitor maintenance, analyze reliability data, perform RCA, support operations and maintenance teams, ensure safety and compliance, and recommend improvements to reduce downtime and costs.
Top Skills: CMMS, Predictive Maintenance, Preventive Maintenance, Root Cause Analysis
6 Days Ago
In-Office or Remote
New York, NY
Expert/Leader
Agency • Information Technology • Professional Services • Software
Lead development and implementation of preventive and predictive maintenance programs for offshore mechanical equipment, use CMMS to plan and track work, perform RCA for failures, support offshore teams in troubleshooting, monitor equipment reliability, and ensure compliance with safety and maintenance standards.
Top Skills: CMMS, Predictive Maintenance, Preventive Maintenance, Root Cause Analysis
25 Days Ago
Easy Apply
Remote
New York, NY
150K-200K Annually
Senior level
Artificial Intelligence • Cloud • Software • Infrastructure as a Service (IaaS)
As a Site Reliability Engineer, you will ensure system stability and resilience, define reliability standards, and automate operational processes while collaborating cross-functionally to improve performance and reduce incidents.
Top Skills: Bash, CI/CD, Docker, Go, Grafana, Kubernetes, Linux, Prometheus, Python
Reposted 7 Days Ago
In-Office
New York, NY
194K-267K Annually
Senior level
Cloud
The Site Reliability Engineer will manage Kubernetes platforms, optimize AWS cloud infrastructure, ensure high availability, and automate deployment while handling troubleshooting and security compliance.
Top Skills: AWS, Bash, CI/CD, CloudWatch, ELK Stack, Go, Grafana, Helm, Istio, Kubernetes, Prometheus, Python, Terraform
Reposted 7 Days Ago
In-Office
New York, NY
194K-267K Annually
Senior level
Cloud
The Senior Site Reliability Engineer will enhance the Splunk ecosystem and develop an Observability Platform by automating infrastructure and managing complex distributed systems, while optimizing log collection and incident response.
Top Skills: AWS, GCP, Go, Kubernetes, Linux, OpenTelemetry, Python, Ruby, Splunk, Terraform
Reposted 7 Days Ago
Hybrid
New York, NY
150K-175K Annually
Expert/Leader
Artificial Intelligence • Machine Learning • Natural Language Processing • Software
As a Site Reliability Engineer, you'll enhance the performance and reliability of infrastructure and products by collaborating with engineering teams, automating configurations, and implementing monitoring systems.
Top Skills: AWS, Go, Kubernetes, Python, Terraform
Reposted 7 Days Ago
In-Office
New York, NY
140K-170K Annually
Senior level
Financial Services
The Senior Site Reliability Engineer will enhance production insights, manage scalable infrastructure, optimize Kubernetes, and develop automation tools while ensuring high availability and performance in cloud-based systems.
Top Skills: Ansible, AWS, GCP, GitOps, Go, Grafana, Helm, IaC, Kubernetes, Python, Splunk, Terraform, Terragrunt
Reposted 8 Days Ago
In-Office
New York, NY
200K-240K Annually
Senior level
Artificial Intelligence • Healthtech • Software
Design, build, and maintain secure and scalable infrastructure for critical healthcare applications, lead incident responses, and support engineering teams.
Top Skills: Bash, GCP, Go, Grafana, Helm, Kubernetes, Prometheus, Python, Terraform
Reposted 8 Days Ago
In-Office
New York, NY
175K-225K Annually
Mid level
Fintech • Payments • Financial Services
The Site Reliability Engineer will automate processes, manage server deployments, and collaborate with teams to enhance operational workflows in a trading environment.
Top Skills: Ansible, C++, Chef, Cloud Infrastructure, Distributed Systems, Docker, Go, Grafana, HashiCorp Nomad, HPC Clusters, Kubernetes, Linux, Perl, Podman, Prometheus, Puppet, Python, Rancher, Rust, Salt
Reposted 22 Days Ago
In-Office or Remote
New York, NY
Senior level
Software
Drive reliability testing and qualification of cellular base stations, collaborating with R&D for long-term reliability and product lifecycle support.
Top Skills: Excel, MS Office, MS Word, PTC Windchill, Python, Telcordia
Reposted 9 Days Ago
In-Office
New York, NY
200K-250K Annually
Expert/Leader
Payments • Software • Automation
Lead platform and infrastructure direction on AWS, evolve CI/CD and ephemeral environments, set observability and SLO standards, drive incident response and postmortems, mentor engineers, and build automation to reduce operational risk.
Top Skills: AWS, CI/CD, Distributed Systems, ECS, Ephemeral Environments/Preview Deploys, Fargate, GitHub Actions, Observability (Metrics, Logs, Tracing), SLOs/SLIs/Error Budgets
Reposted An Hour Ago
Remote
New York, NY
120K-190K Annually
Senior level
Enterprise Web • Information Technology • Mobile
The Senior Software Engineer will focus on infrastructure, reliability, and platform engineering, designing scalable systems, managing CI/CD processes, and evolving observability and incident response protocols.
Top Skills: AWS, Distributed Tracing, Fly.io, GitHub Actions, Go, Logging, Metrics, Postgres, Terraform
Reposted 2 Hours Ago
Remote
New York, NY
165K-230K Annually
Senior level
Information Technology • Security
The Staff Site Reliability Engineer will lead the architecture and security of the SimSpace cyber range platform, focusing on reliability, automation, and observability across diverse deployment environments while mentoring engineers and driving infrastructure initiatives.
Top Skills: Argo CD, GitHub Actions, Go, Grafana Tanka, Jsonnet, Kubernetes, Python
Reposted 2 Hours Ago
In-Office or Remote
New York, NY
Senior level
Artificial Intelligence • Cloud • Information Technology • Software
As a Staff SRE, you will ensure the reliability and performance of Andromeda's GPU infrastructure, lead incident responses, build observability systems, and mentor engineers, while collaborating closely with engineering and customers.
Top Skills: Ansible, CUDA, Go, Helm, Kubernetes, Linux, NCCL, NVIDIA, Python, Rust, Slurm, Terraform
Reposted 2 Hours Ago
Remote
New York, NY
101K-161K Annually
Senior level
Cloud • Software • Analytics
Join Arista Networks as a Site Reliability Engineer to manage CloudVision service reliability, scalability, and stability in a FedRAMP environment, focusing on areas like architecture, security, and performance optimization.
Top Skills: Ansible, Bash, GCP, GKE, Go, Kubernetes, Pulumi, Python
Reposted 2 Hours Ago
Remote
New York, NY
Senior level
Big Data
You will manage AWS infrastructure, automate deployments, debug application issues, and improve the operational health of Metabase Cloud.
Top Skills: AWS, Datadog, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
Yesterday
Remote
New York, NY
140K-150K Annually
Mid level
Healthtech
Design, provision, and operate AWS infrastructure using Terraform; run and scale Kubernetes workloads with Helm; build observability, monitoring, and CI/CD automation; define SLIs/SLOs and lead incident response and postmortems; implement security and compliance (HIPAA/SOC2); participate in on-call rotation and partner with product and engineering on capacity, performance, and resilient system design.
Top Skills: Argo CD, AWS, AWS Secrets Manager, CI/CD, ClickHouse, CloudWatch, Datadog, Event Sourcing, Flux, Go, Grafana, HashiCorp Vault, Helm, Kubernetes, Linux, MySQL, OpenTelemetry, Postgres, Prometheus, Python, Redshift, SigNoz, Snowflake, Terraform
Reposted 10 Days Ago
Remote or Hybrid
New York, NY
165K-330K Annually
Mid level
Software
As an AI Support Engineer, you'll manage support requests, resolve user issues, optimize ML models, and contribute to product development.
Top Skills: TensorRT