Yugabyte

Staff Site Reliability Engineer

Remote

Hiring Remotely in United States

220K-250K Annually

Expert/Leader

Lead design, build, and operate the YugabyteDB DBaaS infrastructure. Drive architecture, automate lifecycle and maintenance, manage incidents and on-call rotations, implement security/encryption processes, and optimize reliability using SRE principles and observability.

The summary above was generated by AI

At Yugabyte, we are on a mission to become the default transactional database for enterprises building cloud-native applications. YugabyteDB is our PostgreSQL-compatible distributed database for cloud-native apps. Resilient, scalable, and flexible, it runs on any cloud and enables developers to become instantly productive using well-known APIs.
We are looking for talented and driven people to join us on our ambitious mission and help us build a lasting and impactful company.
The transactional database market is estimated to grow to $64B by 2025. YugabyteDB is cloud-native by design, has on-demand horizontal scalability, and supports geographical distribution of data using built-in replication. This means that we are well-positioned to meet market demand for geo-distributed, high-scale, high-performance workloads.
Join the Database Revolution at Yugabyte.
Modern applications need a cloud-native database that eliminates tradeoffs and silos. YugabyteDB retains the power and familiarity of PostgreSQL by pairing its trusted API with a precision-engineered, distributed, cloud-native architecture. Even better, it’s 100% open source. Many of the world's leading enterprises are migrating from legacy RDBMSs (like Oracle, SQL Server, and DB2) to YugabyteDB, to meet their mission-critical app demands.

YugabyteDB Aeon Staff Site Reliability Engineer

At YugabyteDB, we are on a mission to build an open source, high-performance, distributed, and fault-tolerant PostgreSQL-compatible database for powering global, internet-scale applications. The YugabyteDB Managed team is building a Database as a Service (DBaaS) that runs on the major cloud providers and is available globally.

As a Site Reliability Engineer focused on database availability and reliability, you will use your skills to operate and automate the life cycle of the YugabyteDB DBaaS. You will design and build processes that spin up systems and the infrastructure that manages the databases, using secure, reliable, scalable, and highly observable methodologies. You will use, operate, and configure Kubernetes environments (GKE, EKS, AKS), Java frameworks, shell scripts, Python scripts, Terraform templates, and many other cloud technologies. You will participate in the on-call rotation (12 hours a day for 7 days, every 4-5 weeks) and manage incidents on the DBaaS infrastructure, coordinating support for our customers. You will learn how to diagnose problems with our database and infrastructure technology and help deliver reliable service to our customers.

We are looking for a strong Staff SRE who exemplifies collaboration, teamwork, and empathy, and who likes to lead by example. We enjoy working with people who are driven and thrive in a fast-paced startup environment, and who have a strong desire to build an internet-scale, extensible control plane with a strong emphasis on simplicity and user experience.

Responsibilities

Define and drive the technical vision, architecture, and strategy for YugabyteDB’s Database-as-a-Service (DBaaS)
Lead the design, development, testing, debugging, troubleshooting, and maintenance of components of the DBaaS cloud infrastructure
Manage operational priorities of the DBaaS infrastructure
Establish processes for handling, and lead the response to, incidents on databases or infrastructure
Automate and manage regular maintenance operations such as upgrades
Design and build DBaaS processes for encryption, security key/password management, storage management, and similar concerns
Use the SRE golden signals to analyze and optimize the performance and reliability of the DBaaS system

Requirements

Strong software design and implementation skills in building infrastructure frameworks
15+ years of experience as an SRE and 5+ years of technical leadership experience
Experience in building and managing large-scale distributed systems
Experience building and operating data systems for production applications, including fault-tolerant designs, software lifecycles, and automation of critical operations
Strong track record of incident response and management for a managed service that is mission-critical for its customers
Experience with:

Relational database systems (PostgreSQL preferred)
Public cloud infrastructure (AWS, GCP, and/or Azure)
Containerization tooling, theory and design (Docker, Kubernetes)
Infrastructure as Code (Terraform preferred)
Configuration Management Tooling (Ansible preferred)
Automation Scripting (Python and Bash preferred)
Monitoring systems (Prometheus preferred)
Version control systems (Git preferred)
CI/CD systems (GitHub Actions preferred)

Solid understanding of Linux systems operations and troubleshooting
Willingness and ability to learn new languages and concepts

We feel strongly about equal pay for equal work, and transparency in compensation is one way to help achieve that. The cash compensation for this role is market competitive, with a range of USD 220,000-USD 250,000, inclusive of variable/incentive pay for some roles. We also offer equity (when applicable) and benefits including health plans, retirement plans, and unlimited paid time off (PTO). The pay range for this position is a general guideline only and not a guarantee of compensation or salary. The actual pay will vary based on factors including experience, qualifications, and skill level.

Due to the Proclamation, “Restriction on Entry of Certain Nonimmigrant Workers”, which went into effect on September 21, 2025, at this time we are no longer able to sponsor new H-1B visa petitions filed after September 21, 2025 for new hires. We are still able to consider candidates who require H-1B extensions, changes of employer, or other types of work authorization.

Equal Employment Opportunity Statement:

As an equal opportunity employer, Yugabyte is committed to a diverse workforce. Employment decisions regarding recruitment and selection will be made without discrimination based on race, color, religion, national origin, gender, age, sexual orientation, physical or mental disability, genetic information or characteristic, gender identity and expression, veteran status, or other non-job related characteristics or other prohibited grounds specified in applicable federal, state and local laws.

To review Yugabyte's Privacy Policy please visit Yugabyte Privacy Notice.

Top Skills

AKS, Ansible, AWS, Azure, Bash, Docker, EKS, GCP, Git, GitHub Actions, GKE, Java, Kubernetes, Linux, Postgres, Prometheus, Python, Shell, Terraform
