Top Senior Site Reliability Engineer Jobs in NYC, NY

Reposted 2 Days Ago
Remote
New York, NY
170K-200K Annually
Senior level
Software
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Top Skills: AWS, AWS Marketplace, Azure, Azure Marketplace, GCP, Google Cloud Marketplace, Grafana, Kubernetes, Prometheus, Terraform
Reposted 2 Days Ago
Remote
New York, NY
180K-210K Annually
Senior level
Artificial Intelligence • Insurance • Software • Automation
The Staff Site Reliability Engineer will build and scale infrastructure for Assured's platform, automate delivery, enhance observability, and lead mentoring initiatives.
Top Skills: AWS, Kubernetes, Postgres, Terraform
Reposted 2 Days Ago
Remote
New York, NY
205K-270K Annually
Senior level
Artificial Intelligence • Other • Sales • Software
The role involves designing and advancing infrastructure for the engineering team, ensuring the reliability of Kubernetes clusters, automating operations, and building machine learning infrastructure.
Top Skills: Argo, AWS, Azure, CloudFormation, Flux, GitHub Actions, Go, GCP, Kubernetes, Postgres, Python, Terraform
Reposted 2 Days Ago
Remote
New York, NY
136K-177K Annually
Senior level
Big Data • Machine Learning • Software • Analytics
As a Lead Site Reliability Engineer, you will drive the reliability strategy, improve system health, lead incident management, and mentor engineers for a multi-region SaaS platform.
Top Skills: ArgoCD, C++, CI/CD, Cloud Platforms, Datadog, GitOps, Grafana, Infrastructure as Code, Java, JavaScript, Kubernetes, Python
3 Days Ago
Remote
New York, NY
Senior level
Agency • Information Technology
Lead SRE role designing and maintaining CI/CD pipelines (GitHub Actions), containerized deployments (Docker, Kubernetes, AKS, Helm), web/mobile app releases, observability, automated testing, and DevOps best practices across cloud environments with cross-functional collaboration and regulatory compliance.
Top Skills: AKS, Android, Azure Application Insights, Azure Log Analytics, Azure Monitor, Bash, Branching, Docker, Docker Compose, Git, Git Hooks, GitHub Actions, Google Play, Helm, Heroku, iOS, iOS App Store, Java, Kubernetes, npm, PowerShell, Pull Requests, Python, SonarQube, Veracode, Vercel
12 Days Ago
Hybrid
New York, NY
95K-125K Annually
Mid level
Artificial Intelligence • eCommerce • Retail • Software
Build and maintain CI/CD pipelines, manage and automate cloud infrastructure and configurations, implement monitoring/logging and alerting for reliability, enforce security and compliance practices, and collaborate with development teams to support scaling and operations.
Top Skills: SOC I, SOC II
13 Days Ago
Hybrid
New York, NY
Mid level
Cryptocurrency
Own production reliability, availability, and performance for cloud-native systems. Operate and scale Kubernetes (EKS) clusters, manage AWS infrastructure, implement IaC with Terraform and Helm, improve CI/CD, build observability with Prometheus/Grafana/EFK, lead incident response and RCA, participate in on-call rotations, and support security and compliance.
Top Skills: Airflow, AWS Batch, AWS EC2, AWS Lambda, AWS Organizations, Bash, ClickHouse, CloudWatch, Databricks, Docker, DynamoDB, EFK (Elasticsearch, Fluentd, Kibana), EKS, ElastiCache, EMR, GitLab CI/CD, GitOps, Grafana, Helm, HPA, Kafka, Karpenter, KEDA, Kubernetes, Load Balancing, NAT, Postgres, Prometheus, Python, RDS, Redis, S3, Snowflake, Spark, SQS, Terraform, TLS, VPC, VPN
Reposted 13 Days Ago
In-Office or Remote
New York, NY
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
The Site Reliability Engineer will develop, deploy, and operate AI infrastructure, focusing on high-performance and scalable machine learning systems using Kubernetes and cloud platforms.
Top Skills: AWS, Azure, C++, GCP, Go, Kubernetes, OCI
Reposted 4 Days Ago
Remote
New York, NY
Senior level
Digital Media • Social Media • Software • Sports
Lead the technical architecture and execution of migration to AWS, drive developer enablement, and automate infrastructure using code-first principles.
Top Skills: AWS EKS, Datadog, GitHub Actions, Go, Istio, k6, Kubernetes, Node.js, Terraform
Reposted 4 Days Ago
Remote
New York, NY
156K-288K Annually
Mid level
Computer Vision • Machine Learning • Software
As a Site Reliability Engineer, ensure the reliability, performance, and scalability of Ditto's cloud infrastructure by developing observability solutions, leading incident management, and collaborating with product engineering teams.
Top Skills: AWS, Azure, C, Datadog, GCP, Go, Grafana, Helm, Java, Kubernetes, Prometheus, Rust, Terraform
5 Days Ago
Remote or Hybrid
New York, NY
150K-225K Annually
Senior level
Artificial Intelligence • Fintech • Machine Learning • Natural Language Processing • Business Intelligence
Lead architecture and implementation of reliability platforms and SRE practices for a production SaaS. Build self-service reliability tooling, drive AIOps automation, advance observability (monitoring, tracing, profiling), lead incident response and postmortems, mentor engineers, and embed production readiness across teams to achieve 99.99% uptime.
Top Skills: AWS, Azure, Continuous Profiling, Datadog, DNS, ELK, GCP, Go, Grafana, HTTP/S, Kubernetes, Load Balancing, OpenTelemetry, Prometheus, Python, TCP/IP
Reposted 6 Days Ago
Remote
New York, NY
Mid level
Other
As a Site Reliability Engineer, you will design cloud platforms, automate operations, maintain infrastructure, and support engineering teams in delivering reliable services.
Top Skills: Ansible, AWS, Azure, Bash, CircleCI, CloudFormation, Datadog, DNS, Docker, GitLab CI, Go, GCP, Grafana, HTTP, HTTPS, Jenkins, Kubernetes, KVM, Linux, Perl, Prometheus, Python, Ruby, TCP/IP, Terraform, Unix, VMware
Reposted 6 Days Ago
Remote
New York, NY
120K-160K Annually
Senior level
Healthtech • Other • Software
As a Senior Database Site Reliability Engineer, you'll design, implement, and maintain PostgreSQL systems, ensure reliability, automate maintenance tasks, and participate in incident response.
Top Skills: Ansible, Bash, Datadog, Grafana, New Relic, Postgres, PowerShell, Prometheus, Python, Terraform
Reposted 6 Days Ago
Remote
New York, NY
114K-148K Annually
Senior level
Software • Financial Services
Ensure platform reliability, performance, and availability by implementing observability, automating infrastructure, participating in on-call rotations and post-mortems, partnering with Product and Engineering, designing scalable architectures, mentoring teammates, and integrating Dynatrace with Azure DevOps and Jira while supporting compliance (SOC/FedRAMP).
Top Skills: .NET, AKS, Alpine, Ansible, AppInsights, ARM Templates, AWS, Azure DevOps, Bash, Bicep, C#, Chef, CloudFormation, Datadog, Debian, Dynatrace, EKS, GCP, Git, GKS, Grafana, Helm, Jira, Kubernetes, Log Analytics, Azure, New Relic, OneStream Software, OpenShift, PowerShell, PowerShell DSC, Prometheus, Puppet, Python, REST APIs, SQL, Terraform, Ubuntu
Reposted 6 Days Ago
Remote
New York, NY
Senior level
Fintech • Information Technology
As a Site Reliability Engineer at Alpaca, you will ensure system reliability and performance, troubleshoot issues, and collaborate with teams to design scalable features.
Top Skills: Go, GORM, Linux, pgx, Postgres, Prometheus, sqlc
Reposted 6 Days Ago
Remote
New York, NY
Senior level
Gaming • Software
The Site Reliability Engineer will manage infrastructure stability and scalability, lead cloud migrations, and optimize performance across systems while mentoring team members.
Top Skills: Ansible, AWS, Azure, Bash, Chef, CloudFormation, Datadog, Docker, ELK Stack, GCP, Go, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform, Unix/Linux
6 Days Ago
Remote
New York, NY
150K-210K Annually
Senior level
Artificial Intelligence • Cloud • Information Technology • Software • Big Data Analytics
Founding Staff SRE for Volcano: define SLOs/error budgets, architect multi-region Kubernetes infrastructure, build GitOps/CI-CD with ArgoCD/Helm/Terraform, scale managed Postgres/Redis/object storage, implement observability with Datadog/Prometheus/Grafana, lead incident response and SRE culture, and mentor cross-functional teams.
Top Skills: ArgoCD, Canary Deployments, CI/CD, CNI, Datadog, GitOps, Grafana, Helm, Ingress, Kubernetes, Object Storage, Postgres, Prometheus, Redis, Service Mesh, Terraform, Terragrunt
Reposted 6 Days Ago
Remote
New York, NY
175K-275K Annually
Mid level
Software
As a Site Reliability Engineer, you'll enhance system reliability, collaborate on production readiness, define SLIs/SLOs, and improve incident response.
Top Skills: AWS, Datadog, Grafana, Kubernetes, OpenTelemetry, Prometheus, TypeScript
Reposted 16 Days Ago
Hybrid
New York, NY
30K-120K Annually
Senior level
Information Technology • Automation
The SRE/Infrastructure Engineer will architect and manage secure, scalable systems for automated penetration testing, optimizing reliability, and enhancing infrastructure based on customer demand. Responsibilities include maintaining production environments, leading technical discussions, and promoting high coding standards.
Top Skills: AWS, Azure, CloudFormation, ELK, GCP, New Relic, OpenTelemetry, Postgres, Prometheus, Terraform
Reposted 8 Days Ago
Remote
New York, NY
200K-250K Annually
Senior level
Software • Cryptocurrency
Manage and scale Kubernetes clusters, automate infrastructure, optimize performance, maintain blockchain nodes, and improve system reliability while collaborating with product teams.
Top Skills: AWS (EC2, IAM), AWS EKS, Datadog, Docker, Kubernetes, OpenTelemetry, Pulumi, RDS, S3, Terraform
9 Days Ago
Remote
New York, NY
Senior level
Database
Embed with service teams to define SLIs/SLOs and error budgets, run Operational Readiness Reviews, improve incident-to-improvement pipelines, advise on resilience and architecture, reduce operational toil through automation, and shape org-wide on-call practices and operational maturity.
Top Skills: AWS, CDK, Grafana, Kubernetes, OpenTelemetry, Postgres, Pulumi, Terraform, VictoriaMetrics
9 Days Ago
Remote
New York, NY
Senior level
Energy • Manufacturing • Solar • Renewable Energy
Operate and harden production EKS Kubernetes clusters across multiple AWS regions. Build IaC (Terraform, Ansible), implement policy-as-code, ensure security and compliance, manage observability (Prometheus/Grafana), perform L3 support and incident RCA, run platform-level testing and DR, automate toil, and partner with application teams for sizing and cost optimization to achieve high availability for critical cloud infrastructure.
Top Skills: ALB, Ansible, ArgoCD, AWS EC2, Certificate Management, Datadog, Dynatrace, EKS, Flux, Go, Grafana, Kubernetes, MSK, Pod Priority, Prometheus, Python, RDS, S3, Service Mesh, Splunk, Terraform, VPC
Reposted 9 Days Ago
Remote
New York, NY
100K-110K Annually
Mid level
Healthtech • Software
The SRE Technical Project Manager will lead project delivery, incident management, automation processes, and uptime communication, partnering with SRE and development teams to ensure system stability and scalability.
Top Skills: AI Bots, Datadog, Jira, Jira Service Management, MS Teams, Opsgenie, PagerDuty
Reposted 10 Days Ago
Remote
New York, NY
110K-140K Annually
Senior level
Real Estate • Financial Services • PropTech
Support and optimize products migrated to AWS, implement cloud best practices, maintain operational coverage, enhance automation, observability, CI/CD/GitOps, and security. Collaborate with development and platform teams to scale, troubleshoot, and ensure reliable SaaS operations.
Top Skills: AMIs, ArgoCD, AWS, AWS Elastic Beanstalk, AWS Transfer Family, Azure DevOps, Bash, CloudWatch, curl, Docker, EC2, EKS, FluxCD, Git, GitOps, HTTP, Istio, Kubernetes, Linkerd, Load Balancer, PowerShell, Python, RDS, SQL, Terraform, wget
Reposted 19 Days Ago
In-Office
New York, NY
194K-267K Annually
Senior level
Cloud
The Site Reliability Engineer will manage Kubernetes platforms, optimize AWS cloud infrastructure, ensure high availability, and automate deployment while handling troubleshooting and security compliance.
Top Skills: AWS, Bash, CI/CD, CloudWatch, ELK Stack, Go, Grafana, Helm, Istio, Kubernetes, Prometheus, Python, Terraform