The Senior Site Reliability Engineer will enhance cloud infrastructure, enforce SRE best practices, manage scalable systems, and mentor junior team members.
Remote - United States
Job Description:
Stability AI’s Engineering Operations team is looking for a Senior Site Reliability Engineer (SRE) to join our growing team and play a pivotal role in improving and shaping our cloud infrastructure. The person will work closely with engineering, IT, security, and product teams to drive innovation and reliability in an evolving environment. Candidates should have the initiative to build and improve a maturing cloud landscape.
Responsibilities:
- Developing and enforcing SRE best practices and standards across the organization.
- Architecting and managing scalable systems in AWS and other cloud environments, focusing on high availability and resilience.
- Implementing and maintaining infrastructure as code using Terraform.
- Setting up and refining monitoring, logging, and alerting systems.
- Driving incident management and root cause analysis to improve system reliability.
- Championing SRE principles and mentoring junior team members.
- Collaborating with development teams to enhance CI/CD pipelines.
Qualifications:
- Experience scaling resource-intensive systems, whether storage, networking, or compute.
- Knowledge of and experience with Kubernetes or other container scaling solutions.
- Background in software development or automation scripting.
- Knowledge of and experience with Grafana, the ELK stack, or similar tools.
- Cloud security experience.
Equal Employment Opportunity:
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or other legally protected statuses.
Top Skills
AWS
ELK Stack
Grafana
Kubernetes
Terraform