Fabric Health Jobs

Site Reliability Engineer

Sorry, this job was removed at 08:31 a.m. (EST) on Monday, Jul 20, 2026

In-Office or Remote

2 Locations

About Fabric Health

At Fabric Health, we are powering boundless care by solving healthcare’s biggest challenge: clinical capacity. We aren’t here to disrupt healthcare; we’re here to fix it. We unify the care journey from intake to treatment, using intelligent automation to remove administrative burdens and make care delivery 2-10x more efficient. Our technology empowers clinicians to move faster and focus on what matters most: the patient.

We are a mission-driven team of brilliant minds trusted by leading organizations including Intermountain Health, OSF HealthCare, SSM Health, and MUSC Health. Our vision is backed by premier investors such as Thrive Capital, GV (Google Ventures), General Catalyst, and Salesforce Ventures. We move quickly for good reason, listen deeply to solve big challenges, and build products with the same care and quality we’d want for our own loved ones.

About the Role

As a Site Reliability Engineer, you will own and evolve the infrastructure powering healthcare experiences for millions of patients. This role bridges the gap between traditional infrastructure excellence and the future of AI-driven operations. You will act as a primary architect for our AWS and Kubernetes (EKS) environment, ensuring the platform is resilient, scalable, and compliant while exploring how agentic workflows can modernize SRE practices.

What You'll Do

As a Site Reliability Engineer, you will be a steward of Fabric’s production integrity, leading the strategy for infrastructure automation, observability, and system resilience. Your primary responsibilities include:

Infrastructure & Kubernetes Orchestration
- Designing, deploying, and maintaining production Kubernetes (EKS) clusters to ensure enterprise-grade availability for our users.
- Eliminating manual configuration by building and managing a scalable infrastructure state entirely through Terraform.
- Optimizing the AWS footprint (specifically EC2, RDS, and S3) to balance high performance with cost-efficiency and reliability.

AI-Assisted Operations & Automation
- Exploring and deploying agentic workflows for AI-assisted runbooks that automate complex operational decisions and repetitive tasks.
- Building and evolving deployment pipelines using GitHub Actions or Semaphore to ensure delivery is both rapid and safe.
- Focusing on toil reduction by developing internal tools that replace manual operational work with intelligent, autonomous systems.

Observability & Incident Management
- Driving the evolution of the observability stack in Datadog by implementing the sophisticated metrics, traces, and logs needed to meet SLOs.
- Leading incident response efforts and facilitating the blameless postmortems that help systematically reduce mean time to recovery (MTTR).
- Defining and monitoring the SLIs and SLOs that ensure the platform consistently meets rigorous healthcare performance standards.

Compliance & Collaboration
- Ensuring every piece of infrastructure remains fully compliant with HIPAA and other critical healthcare regulatory requirements.
- Mentoring engineers across the company on reliability best practices and contributing a clinical-safety perspective to cross-functional design reviews.

Why You Might Be a Good Fit

- You are a deeply proficient engineer who excels at the intersection of cloud infrastructure, automation, and system design.
- You possess a meticulous approach to observability and a passion for finding the "root cause" rather than just applying a patch.
- You enjoy exploring the "next frontier" of SRE, including how AI and agentic tools can make operations more efficient.
- You thrive in fast-paced environments where technical rigor is balanced with pragmatism and clinical-grade safety.

This Might Not Be The Right Fit If...

- You prefer working on static infrastructure rather than evolving systems through code and automation.
- You are uncomfortable with the "agile" pace of tech-driven platform development or with integrating AI tools into your daily workflow.
- You prefer a siloed role that does not involve active participation in incident response or collaborative postmortems.

Your Qualifications

- 5+ years of experience in SRE, DevOps, or Platform roles managing production environments at scale.
- Expert-level depth in AWS (EKS, EC2, RDS, S3) and production-grade Kubernetes management.
- Proficiency with modern tooling including Terraform (IaC), Datadog (Observability), and CI/CD systems.
- Strong coding and scripting skills in Python, Bash, Ruby, or Go.
- Preferred: experience building agentic workflows or AI-assisted tooling to drive operational efficiency.
- A "rigor-first" mindset with a dedication to HIPAA-compliant, high-availability architecture.

The national pay range for this role is $135,000.00 – $160,000.00 per year. Actual compensation will be determined by factors such as the candidate's geographic market, experience, skills, and qualifications. Certain roles may also be eligible for additional compensation, such as stock options and bonuses, and for a comprehensive benefits package including medical, dental, vision, unlimited PTO, and a 401(k) plan. If your compensation requirement is greater than our posted range, please still consider applying; a determination can be made based on unique qualifications. Expected compensation ranges for this role may change over time.

At Fabric, we believe that a diverse workforce is essential to our success. We are an equal opportunity employer and are committed to creating an inclusive environment for all employees. We do not discriminate on the basis of race, color, religion, sex, national origin, age, disability, veteran status, or any other legally protected characteristic. We actively encourage individuals from all backgrounds to apply.

Recruitment Fraud Alert: Protect Yourself

Fabric Health is aware of scammers attempting to impersonate employers. To ensure that any recruiting contact you receive is legitimate, please adhere to the following:

- Verify the Domain: Official recruitment emails will only come from addresses ending in @fabrichealth.com or @gem.com. No other domain names are legitimate.
- Official Interview Tools: We use Gem for our recruitment process and Google Meet for all video interviews. Google Meet is always the platform used for your first interview; you will never be sent a Zoom link to set up or conduct an initial interview. All interviews are conducted via video unless specifically stated by our team as an audio call. We never conduct interviews via chat, social media, Skype, or WhatsApp.
- Zoom Usage: Zoom is utilized only for specific meetings set directly by our team for purposes outside of the standard interview process (e.g., coordination or onboarding discussions). It is never the first link you will receive from us.
- Authorized Contact & Texting: Fabric will only contact you if you have submitted an application or if you are connected to a current employee who shared your information with us. We will only send text messages if you have provided explicit authorization and consent, either through your application or while communicating directly with our team. If you have not explicitly authorized us to reach out, treat any SMS or unsolicited outreach as fraudulent and do not respond.
- Sensitive Data: We will never ask you for sensitive personal or financial documents (ID, banking info, SSN) during the application, interview, or candidacy stages. All sensitive data is handled through secure internal systems post-offer.
- Verify the Team: You can reference LinkedIn to verify members of our recruiting team; however, please remain vigilant as scammers may create fraudulent profiles. Always cross-reference the sender's email domain with our official @fabrichealth.com address.

If you question the validity of a contact or receive a suspicious message, do not click any links. Report the issue immediately to [email protected].

Please note: The security inbox is for reporting fraudulent activity only. Do not email this address for application status updates or to share application materials, as these will not be reviewed. Applications are only accepted and reviewed if submitted through our official application portal, and no application status information will be provided via the security email.
