Zapier

Sr. Program Manager, Incident Management

Posted 2 Hours Ago

Remote

Hiring Remotely in United States

174K-261K Annually

Senior level

The Sr. Program Manager will lead incident management for Zapier's Product and Engineering, focusing on response and post-incident processes, integrating AI to enhance workflows and collaboration across teams.

AI at Zapier

At Zapier, we build and use automation every day to make work more efficient, creative, and human. So if you’re using AI tools while applying here - that’s great! We just ask that you use them responsibly and transparently.

Check out our guidance on How to Collaborate with AI During Zapier’s Hiring Process, including how to use AI tools like ChatGPT, Claude, Gemini, or others during our hiring process - and when not to.

Job Posted: April 9th, 2026

Location: Americas - North, Central and South America

As Zapier expands into the enterprise market, operational rigor matters more than ever. The Sr. Program Manager will own the end-to-end incident management program for Zapier's Product and Engineering organization: response, post-incident learning and actions, and everything in between. You'll report to the Director of Engineering for Internal Platforms & Infrastructure and be the DRI for the program's design, execution, and outcomes. You'll build the program and use AI to scale its impact.

We need someone with deep incident management expertise who's comfortable navigating ambiguity and stretching across engineering, support, security, and GTM. You have a thesis on where AI-enabled incident management is going and you'll lead us there. Zapier's product surface is expanding rapidly and with it, the complexity and stakes of incident management. This role grows with that complexity.

About You

You have deep incident management experience and you've moved beyond just executing it. You've built and led incident response programs, post-incident processes, SRE practices, or reliability-focused work. You know incident management deeply enough to rethink it, not just replicate it. You've ideally done 0-to-1 work in this space: stood up programs, defined standards, trained responders.
You re-engineer how work happens based on where AI is headed. You've created repeatable systems (workflows, agents, copilots, or automation) that fundamentally changed how work gets done. You use AI-native tools (Cursor, Claude Code, or similar) as your default, and orchestrate them into durable capabilities that compound over time. You have a forward-looking thesis on how AI will reshape your domain and you've already acted on it: stopping legacy work, redesigning processes around what AI makes possible, and redefining what the role itself looks like. You can quantify the impact on velocity, quality, or organizational capacity. You iterate, refine, and critically evaluate AI outputs, embedding quality standards and accountability into the systems you build, not just the outputs.
You're a builder, not a specialist. You have deep expertise in incident management, but you're not rigidly attached to how you've done it before. You can stretch into adjacent areas (reliability strategy, enterprise readiness, operational tooling) as the role evolves. A year from now, parts of this role may look very different, and you'll be the one driving that change. You build durable systems that work without you: processes that continue when you're on PTO or move to other work. You're energized by creating, not just maintaining.
You bring an upstream, systems mindset. You instinctively look for root causes and design solutions that scale beyond your immediate program. You understand how the full incident lifecycle (prevention, detection, response, learning) supports customer trust and enterprise readiness.
You influence without authority. You shape outcomes by building trust. You know how to build coalitions across engineering, support, security, GTM, and leadership. You don't just implement change; you lead it and make it stick. You anticipate resistance, adapt your approach, and help others adopt new ways of working.
You have technical empathy. You can go toe-to-toe with engineers, support leads, and product leaders to clarify the "why" behind technical tradeoffs and incident decisions. You understand the role of observability (logs, metrics, traces), SLOs, and thresholds in incident response and prevention, even if you're not the one implementing them.
You bias for velocity and clarity. You act decisively even in high ambiguity. When priorities collide, you clarify, decide, and help the org move forward. You communicate with relentless clarity, sharing context and intent early, often, and candidly, especially when it's uncomfortable.
You're analytical and hands-on with data. You can work directly with data tools (e.g., Databricks, SQL) to build rich reporting and meaningful insights. You understand incident tooling (incident.io or similar) and how it integrates with Slack, PagerDuty, and on-call workflows.
You work well remotely. Zapier is 100% remote. You communicate proactively, write clearly, and know when async works and when to jump on a call.

Things You'll Do

Own the incident program. Lead the design, evolution, and governance of incident processes across the Build organization, covering both response and post-incident processes. Ensure workflows are consistent, auditable, and aligned with enterprise expectations. You are the DRI for incident management as a program.
Build AI-powered incident systems. Design and ship repeatable AI tools: automated incident summarization, intelligent severity classification, AI-assisted root cause analysis, postmortem draft generation, and more. Turn one-off AI experiments into durable workflows that compound over time.
Accelerate decisions. Create clarity in ambiguity, align stakeholders, and drive decisions across teams and zones. Serve as the point of contact for questions related to incident process, expectations, and best practices.
Surface and resolve systemic issues. Identify recurring org friction, drive root-cause solutions, and implement fixes that persist beyond individual incidents.
Build and maintain reporting. Build, maintain, and refine dashboards and reports using Databricks, Looker, and related tools. Translate data into actionable insight: identify trends, risks, weak signals, and hotspots. Communicate findings to the right audiences.
Raise the bar. Instill rigor and accountability. Coach responders and incident roles (Incident Commander, Support Leads, and new roles as they emerge). Produce and maintain clear documentation (playbooks, templates, guides) and deliver training for all incident roles and stakeholder groups.
Partner cross-functionally. Collaborate with engineering leads, EMs, product, support, security, GTM, and leadership to strengthen practices. Share clear insights, align expectations, and help teams act on opportunities for improvement. Your day-to-day counterparts are senior engineering leaders and engineering line managers.
Step in when needed. Step into incident response roles during business hours as appropriate to experience the work firsthand and inform program improvements. Facilitate retrospectives and go through the process for select incidents to help inspect and up-level the process.

Our Stack & Tools

Incident tooling: incident.io, PagerDuty, Slack, Zendesk
Data & Reporting: Databricks, Grafana, Looker
Observability context: Datadog, Grafana, Prometheus, OpenSearch
Infra context: AWS, Kubernetes, Terraform (with SRE/Platform partners)
Collaboration: GitLab, Coda, Google Workspace

What Success Looks Like

The incident program is dependable and normalized. It's part of Zapier's operating rhythm. You own program direction and ensure day-to-day execution aligns with enterprise expectations across the full incident lifecycle.
Internal teams feel supported. Processes, communication, and tools reduce friction and meet the needs of engineering, support, and GTM partners. Stakeholder feedback is incorporated pragmatically.
Workflows run consistently with low friction. They're easy to follow, easy to learn, and allow people to focus their energy where it counts.
Systemic improvements persist. You elevate technical and program management rigor beyond individual incidents. The systems you build continue to work when you're not there.
Data quality is rich and trusted. Reports and insights help leadership understand trends, systemic risks, and improvement opportunities.
Outcomes improve measurably. Incident frequency drops, time-to-resolution shortens, stakeholder confidence rises, and operational maturity increases across engineering.
You're a force multiplier. The org has fewer blockers and more velocity than when you found it.

Application Deadline:

The anticipated application window is 30 days from the date the job is posted, unless the number of applicants requires it to close sooner or later, or if the position is filled.

Even though we’re an all-remote company, we still need to be thoughtful about where we have Zapiens working. Check out this resource for a list of countries where we currently cannot have Zapiens permanently working.

Top Skills

AI Tools

AWS

Coda

Databricks

Datadog

GitLab

Google Workspace

Grafana

incident.io

Kubernetes

Looker

OpenSearch

PagerDuty

Prometheus

Slack

Terraform
