Top Senior Site Reliability Engineer Jobs in NYC, NY

Reposted 20 Hours AgoSaved
Remote
New York, NY
115K-135K Annually
Mid level
115K-135K Annually
Mid level
Aerospace • Manufacturing
As a Site Reliability Engineer, you'll build and manage observability platforms for satellite communications, define SLOs/SLIs, and collaborate on incident response and deployment automation.
Top Skills: ArgocdAWSElkGCPGoGrafanaIstioJaegerKubernetesLinkerdLokiOpentelemetryPrometheusPythonTempoTerraform
Reposted YesterdaySaved
Remote
New York, NY
208K-330K Annually
Senior level
208K-330K Annually
Senior level
Fintech
The Staff Site Reliability Engineer role involves leading architecture, automating GCP environment, defining SLIs and SLOs, mentoring teammates, and enhancing system reliability and performance.
Top Skills: ArgocdDatadogGCPGoHelmJavaScriptKubernetesPythonTerraformTypescript
Reposted 11 Days AgoSaved
In-Office
New York, NY
140K-225K Annually
Senior level
140K-225K Annually
Senior level
Fintech
Lead adoption of SRE practices to improve reliability, observability, automation, and incident response. Implement and maintain observability tooling, instrumentation, CI/CD, and infrastructure-as-code. Partner with developers, participate in on-call rotations, drive postmortems, and reduce operational overhead through automation.
Top Skills: AnthropicAWSAws EcsAws EksAzureC#DockerGitlab CiGrafanaLinuxOpenaiPrometheusPuppetPythonSplunkTerraformTypescriptWindows
Reposted 3 Days AgoSaved
Remote
New York, NY
220K-250K Annually
Expert/Leader
220K-250K Annually
Expert/Leader
Cloud • Software • Database
Lead design, build, and operate the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance, manage incidents and on-call rotations, implement security/encryption processes, and optimize reliability using SRE principles and observability.
Top Skills: AksAnsibleAWSAzureBashDockerEksGCPGitGithub ActionsGkeJavaKubernetesLinuxPostgresPrometheusPythonShellTerraform
Reposted 3 Days AgoSaved
Remote or Hybrid
New York, NY
140K-211K Annually
Mid level
140K-211K Annually
Mid level
Cloud • Software
As a Site Reliability Engineer, you will ensure system reliability, handle technical escalations, create automation tools, and collaborate with engineering teams during incidents.
Top Skills: AnsibleBashChefDockerElkGitGitlabGrafanaJenkinsLinuxPrometheusPulumiPythonSplunkSvnTcp/IpTerraformUnix
Reposted 4 Days AgoSaved
Remote
New York, NY
Mid level
Mid level
Healthtech • Software
Maintain reliability, performance, and scalability of cloud-hosted services and databases. Implement SRE best practices, define SLIs/SLOs, respond to incidents, build monitoring and automation, perform DBA tasks (backups, restores, tuning), support CI/CD and DB migrations, and document runbooks and procedures.
Top Skills: Amazon RdsAzure Sql DatabaseBashEcs FargateFlywayGitlabJenkinsKubernetesLiquibaseOctopus DeployOraclePostgresPowershellPythonRedisSolarwinds DpaSQL Server
Reposted 4 Days AgoSaved
Remote
New York, NY
Senior level
Senior level
Logistics • Software • Transportation
Lead and mentor teams in DevOps and SRE, architect scalable Azure Cloud infrastructure, implement CI/CD and IaC, ensure database reliability, and drive cross-functional collaboration.
Top Skills: Azure CloudAzure DevopsCi/CdCosmosdbDockerElkGrafanaKubernetesMySQLPostgresPrometheusRedisSQL ServerTerraform
Reposted 4 Days AgoSaved
In-Office or Remote
New York, NY
Senior level
Senior level
Software
The role involves managing compute infrastructure for decentralized applications, requiring critical thinking, documentation skills, and experience in Kubernetes and blockchain management.
Top Skills: BlockchainGitopsInfrastructure-As-CodeKubernetesProgramming Languages
Reposted 5 Days AgoSaved
Remote
New York, NY
150K-185K Annually
Mid level
150K-185K Annually
Mid level
Software
Join the SRE team to improve monitoring, alerting, observability, and reliability of Fireblocks' production systems. Triage incidents, run RCA, create runbooks and automation (Python, Lambda, shell, Ansible, ArgoCD), collaborate with R&D/support, and participate in on-call rotation.
Top Skills: AnsibleArgocdAWSAws LambdaAzureBashBitbucketC++ChefCoralogixDatadogDockerGerritGitGitlabGCPHelmJavaScriptKubernetesLinuxMySQLNew RelicNginxNode.jsPhabricatorPrometheusPuppetPythonShellSplunk
Reposted 6 Days AgoSaved
Remote
New York, NY
Senior level
Senior level
Big Data • Cloud • Information Technology
The Site Reliability Engineer at Iron Mountain will troubleshoot escalated tickets, manage Windows Server builds, perform security patching, and collaborate with customers and vendors to resolve issues and maintain systems.
Top Skills: CloudComputeHyper-Converged InfrastructureLinuxMicrosoft Endpoint Configuration ManagerNetworkNutanixPowershellRubrikStorageVirtualizationWindows Server
6 Days AgoSaved
Remote
New York, NY
Senior level
Senior level
Information Technology • Software • Cybersecurity • Automation
Design, build, and operate an agentic platform to automate vulnerability remediation and incident response while ensuring reliability in security operations.
Top Skills: DatadogGitGrafanaLinearLlmsOpentelemetryPrometheusSlack
15 Days AgoSaved
In-Office
New York, NY
147K-310K Annually
Expert/Leader
147K-310K Annually
Expert/Leader
Fintech • Financial Services
The Director of Splunk Platform Engineering & SRE owns the enterprise Splunk platform, drives incident resolution, optimizes systems, and mentors engineers, focusing on automation and performance.
Top Skills: AnsibleGitGoJavaKubernetesLinux/UnixMoogPrometheusPythonSplunk
New

Track Smarter, Apply Better.

Ditch the spreadsheets. Organize your job search with our freeApplication Tracker.

Use For Free
Application Tracker Preview
Reposted 15 Days AgoSaved
In-Office or Remote
New York, NY
133K-148K Annually
Mid level
133K-148K Annually
Mid level
Cannabis • Consumer Web • eCommerce • Software
As a Site Reliability Engineer, you'll enhance the performance and reliability of web services by collaborating cross-departmentally, automating workflows, and improving deployment pipelines.
Top Skills: AWSCi/CdDockerElixirGoKubernetesNode.jsOpen CensusOpen MetricsOpen TracingPythonRuby
Reposted 16 Days AgoSaved
In-Office
New York, NY
120K-165K Annually
Senior level
120K-165K Annually
Senior level
Fintech • Financial Services
The SRE Application Support Engineer is responsible for ensuring operational reliability, stability, and optimizing performance of production systems, managing outages, troubleshooting issues, and developing documentation and standards for production applications.
Top Skills: AuroraAWSEc2EcsFargateGrafanaJavaKibanaLambdaPostgresPrometheusPythonS3Splunk
7 Days AgoSaved
Remote
New York, NY
180K-210K Annually
Senior level
180K-210K Annually
Senior level
Artificial Intelligence • Insurance • Software • Automation
The Staff Site Reliability Engineer will build and scale infrastructure for Assured's platform, automate delivery, enhance observability, and lead mentoring initiatives.
Top Skills: AWSKubernetesPostgresTerraform
Reposted 7 Days AgoSaved
Remote
New York, NY
170K-200K Annually
Senior level
170K-200K Annually
Senior level
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills: AWSAws MarketplaceAzureAzure MarketplaceGCPGoogle Cloud MarketplaceGrafanaKubernetesPrometheusTerraform
8 Days AgoSaved
Remote
New York, NY
Mid level
Mid level
Information Technology • Software
As a DevOps/Site Reliability Engineer, you will manage cloud infrastructure, CI/CD pipelines, and improve system reliability and performance while supporting AI data pipelines.
Top Skills: AWSDatadogEc2EksGithub ActionsGoGrafanaIamKubernetesPrometheusPythonRdsS3Terraform
8 Days AgoSaved
Remote
New York, NY
120K-160K Annually
Senior level
120K-160K Annually
Senior level
Healthtech • Other • Software
The role involves managing PostgreSQL services, ensuring high availability and performance, driving incident response, automating tasks, and improving observability for a 24x7 SaaS platform.
Top Skills: AnsibleBashDatadogGrafanaHaproxyNew RelicPgbackrestPgbouncerPostgresPowershellPrometheusPythonRepmgrTerraform
Reposted 8 Days AgoSaved
Remote
New York, NY
Mid level
Mid level
Software • Analytics
The role involves automating and managing AWS infrastructure, ensuring reliability and scalability of stateful systems, and optimizing deployment processes. You'll also handle incident responses and improve operational tooling.
Top Skills: AWSKubernetesTerraformTerragrunt
Reposted 8 Days AgoSaved
Remote
New York, NY
136K-177K Annually
Senior level
136K-177K Annually
Senior level
Big Data • Machine Learning • Software • Analytics
As a Lead Site Reliability Engineer, you will drive the reliability strategy, improve system health, lead incident management, and mentor engineers for a multi-region SaaS platform.
Top Skills: ArgocdC++Ci/CdCloud PlatformsDatadogGitopsGrafanaInfrastructure As CodeJavaJavaScriptKubernetesPython
Reposted 8 Days AgoSaved
Remote
New York, NY
Junior
Junior
Computer Vision • Information Technology • Machine Learning • Natural Language Processing • Real Estate • Software
The SRE will maintain infrastructure for SaaS products on AWS, support developers, manage platform components, and handle IT tasks.
Top Skills: AWSComputer VisionIacLarge Language ModelsNlpTerraform
Reposted 8 Days AgoSaved
Remote
New York, NY
205K-270K Annually
Senior level
205K-270K Annually
Senior level
Artificial Intelligence • Other • Sales • Software
The role involves designing and advancing infrastructure for the engineering team, ensuring the reliability of Kubernetes clusters, automating operations, and building machine learning infrastructure.
Top Skills: ArgoAWSAzureCloudFormationFluxGithub ActionsGoGCPKubernetesPostgresPythonTerraform
9 Days AgoSaved
Remote
New York, NY
66K-88K Annually
Mid level
66K-88K Annually
Mid level
Cloud • Information Technology
The Site Reliability Engineer I is responsible for supporting Backblaze’s infrastructure stability by addressing customer issues, monitoring system health, and improving operational processes through documentation and automation.
Top Skills: AnsibleLinuxZabbix
9 Days AgoSaved
Remote or Hybrid
New York, NY
165K-190K Annually
Mid level
165K-190K Annually
Mid level
Artificial Intelligence • Healthtech • Information Technology • Software
As the first Site Reliability Engineer in the US, you'll ensure platform stability and oversee incident responses during PST hours, bridging infrastructure and code, while improving operability and compliance in a medical-device environment.
Top Skills: AWSElixirKubernetesTerraform
9 Days AgoSaved
Remote
New York, NY
320K-489K Annually
Expert/Leader
320K-489K Annually
Expert/Leader
Artificial Intelligence • Computer Vision • Hardware • Robotics • Metaverse
Lead the design and operation of large scale Kubernetes clusters, ensuring high availability and performance while supporting system lifecycle and reliability improvements.
Top Skills: ContainersGoKubernetesLinuxNetworkingOpenstackPerlPythonRuby
All Filters
JobType
New Jobs
Job Category
Experience
Industry
Company Name
Company Size

Sign up now Access later

Create Free Account