Synthesia

Senior Site Reliability Engineer

Posted Yesterday
Remote
Hiring Remotely in US
Senior level
Own operational excellence for cloud infrastructure: run incident management, improve reliability through automation, own a platform domain (e.g., Kubernetes, Temporal, observability), manage vendor and cost relationships, and deliver measurable reductions in incidents and costs within 12 months.

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US.

As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.

Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

Remote (US East Coast preferred, for timezone coverage)

About the team

Cloud Infrastructure owns the platform every Synthesia product runs on — AWS, Kubernetes, MongoDB, Temporal, our observability stack, and the vendor and cost relationships underneath them. We're a small, high-leverage team scaling toward a domain-ownership model: small groups that both build and operate the systems they're accountable for.

The role

We're hiring a dedicated SRE to take real ownership of operational excellence across Cloud Infrastructure. Today, too much critical operational knowledge — vendor relationships, cost management, and incident response — lives with one or two people. Your mission is to take genuine ownership of those domains, make them resilient to any single person, and raise the bar on how reliably we run. This is not simply a ticket-queue or keep-the-lights-on role. You'll own domains end to end: understand them deeply, operate them well, and build the automation and tooling that make them boring. We deliberately pair operational and engineering work so the role grows rather than narrows.

What you'll own

  • Incident management & operational excellence — take custody of the incident process: on-call quality, response, post-mortems, and driving down incident count, time-to-detect, and time-to-resolve.

  • Automation & reliability engineering — automate low-frequency, high-consequence operations (the certificate-renewal class of problem — rare, easy to forget, outage-causing when missed), not just the high-frequency toil. You decide what to automate based on risk and blast radius, not just time saved.

  • A platform domain — over time, deep ownership of a domain such as Temporal, observability, or Kubernetes operations, partnering with the engineers building in it.

  • Vendor & third-party management — own key external relationships and integrations (e.g. LLM API providers, third-party services), today managed manually and informally. Bring structure, automation, and bus-factor resilience.

  • FinOps — own cloud and platform cost visibility and efficiency, and the mechanics of how usage maps to billing.

What success looks like (first 12 months)

  • Critical operational knowledge is documented and shared — no single point of failure for vendor, cost, or incident response.

  • Measurable reliability gains: fewer SEV1–SEV3 incidents per quarter, faster customer-impact resolution, and a much higher share of incidents caught by monitoring before customers feel them.

  • High-risk manual processes are automated and self-documenting.

What we're looking for

  • Strong production operations experience on AWS and Kubernetes; comfortable with MongoDB and scripting/automation in Python.

  • An operations-and-reliability mindset — you take pride in systems that run quietly — paired with the instinct to engineer the problem away rather than absorb it manually.

  • Sound judgement on incidents and risk; calm and clear under pressure.

  • Influences through relationships and evidence, not escalation; comfortable owning a domain and partnering across teams.

  • Bonus: vendor/cost management exposure, Temporal, observability tooling.

How we think about this role

We don't letterbox engineers. You'll have a clear primary mission (operational excellence) but real domain ownership and the mandate to build — not a fixed lane. We expect the shape of the role to evolve as the team grows.

Synthesia New York, New York, USA Office

111 E 14th St, New York, New York, United States, 10003
