Datadog

Manager II, Engineering - APM Root Cause Analysis (GenAI / LLM)

Posted Yesterday
Hybrid
New York, NY
234K-300K Annually
Senior level
Manage the APM Root Cause Analysis team, oversee the development of automated systems for incident response, and mentor engineers to improve team performance.
The summary above was generated by AI

The APM Root Cause Analysis team’s mission is to help engineers and SREs respond rapidly and effectively to incidents affecting their production systems. During an incident, one of the first questions a responder asks is, "What is the change that caused the incident?"—and that's exactly what this team aims to answer.

To answer that question, the team is building several impactful systems:

  • A platform to ingest interesting changes from across our customer environments (Deployments, DB changes, Feature Flag changes, K8s changes, etc.)
  • A system to process past incidents in our environment and label the faulty changes that led to the incidents, enabling us to build a high-quality evaluation dataset for faulty change detection
  • A system that uses LLMs, ML, and statistical models to assess whether a specific change is the cause of an incident
  • A product experience to expose those faulty changes in strategic locations in the product in a way that aids incident response and reduces MTTR

As a manager, you will play an active role in shaping the roadmap for automated root cause analysis through collaboration with multiple stakeholder teams. You will have a deep and immediate impact in guiding the product through your design and engineering decisions.

At Datadog, we place value in our office culture - the relationships that it builds, the creativity it brings to the table, and the collaboration of being together. We operate as a hybrid workplace to ensure our employees can create a work-life harmony that best fits them.


What You’ll Do:

  • Solve challenging and ambiguous problems in automating root cause analysis through faulty change detection, using the latest agentic AI approaches as well as ML anomaly detection and statistical methods
  • Evaluate and benchmark the quality and real-world performance of the automated faulty change detection model
  • Lead and mentor a team of experienced software engineers, fostering their career growth while ensuring high team performance
  • Drive the technical roadmap in collaboration with your team, product managers, and design teams

Who You Are: 

  • An experienced software engineering leader with a track record of successfully delivering GenAI/ML products at scale
  • Experienced working with high-scale distributed systems as well as participating in and structuring on-call processes for them
  • You are passionate about building products that solve real user problems, and you are adept at formulating an opinion on product direction and on how we should structure our execution strategy
  • You have a BS/MS/PhD in Computer Science, Engineering, or a related scientific field, or equivalent experience

Datadog values people from all walks of life. We understand not everyone will meet all the above qualifications on day one. That's okay. If you’re passionate about technology and want to grow your skills, we encourage you to apply. 

Benefits and Growth: 

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture, with the ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits


Benefits and Growth listed above may vary based on the country of your employment and the nature of your employment with Datadog.

Datadog offers a competitive salary and equity package, and may include variable compensation. Actual compensation is based on factors such as the candidate's skills, qualifications, and experience. In addition, Datadog offers a wide range of best-in-class, comprehensive, and inclusive employee benefits for this role, including healthcare, dental, parental planning, and mental health benefits, a 401(k) plan and match, paid time off, fitness reimbursements, and a discounted employee stock purchase plan.

The reasonably estimated yearly salary for this role at Datadog is:
$234,000 - $300,000 USD

About Datadog: 

Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve complexity in the cloud age by enabling digital transformation, cloud migration, and infrastructure monitoring of our customers’ entire technology stacks. Built by engineers, for engineers, Datadog is used by organizations of all sizes across a wide range of industries. Together, we champion professional development, diversity of thought, innovation, and work excellence to empower continuous growth. Join the pack and become part of a collaborative, pragmatic, and thoughtful people-first community where we solve tough problems, take smart risks, and celebrate one another. Learn more about #DatadogLife on Instagram, LinkedIn, and Datadog Learning Center.

Equal Opportunity at Datadog:

Datadog is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and other characteristics protected by law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. Here are our Candidate Legal Notices for your reference. 

Datadog endeavors to make our Careers Page accessible to all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please complete this form. This form is for accommodation requests only and cannot be used to inquire about the status of applications. 

Privacy and AI Guidelines:

Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice. For information on our AI policy, please visit Interviewing at Datadog AI Guidelines.

Top Skills

Distributed Systems
GenAI
Kubernetes
ML
Statistical Modeling

Datadog New York, New York, USA Office (HQ)

We are located in the New York Times building, a five-minute walk from Times Square. The 42 St Port Authority Bus Terminal is right across the street, providing a highly accessible transportation network.
