AHEAD

Principal Observability & Reliability Architect

Posted 2 Hours Ago
Remote
Hiring Remotely in United States
180K-240K Annually
Expert/Leader
Lead client-facing discovery, architecture, and delivery of enterprise observability and reliability solutions. Design telemetry pipelines, monitoring, logging, tracing, alerting, and governance. Guide SRE practices (SLIs/SLOs, error budgets), tool rationalization, integrations (ITSM, CMDB), build reference architectures and playbooks, and mentor delivery teams while serving as escalation and practice leader.
The summary above was generated by AI
AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation.
 
At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD. 
 
We are an equal opportunity employer and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived.
 
We embrace all candidates who will contribute to the diversification and enrichment of ideas and perspectives at AHEAD.

AHEAD is seeking a Principal Observability & Reliability Architect to join AHEAD’s Observability Practice within Intelligent Operations. This is a senior, client-facing architecture and delivery leadership role focused on helping enterprise clients improve operational visibility, service reliability, incident response, telemetry governance, and business-service visibility. The role operates as a player-coach, pairing senior advisory leadership with hands-on architecture guidance across observability platforms, telemetry pipelines, AIOps, and SRE-aligned operating models. It spans the full engagement lifecycle: pursuit solutioning, discovery, architecture design, estimation, delivery governance, escalation, and practice enablement.

Responsibilities

  • Lead client discovery, architecture workshops, and solution design across observability, telemetry, reliability, and operational intelligence initiatives.
  • Design enterprise observability architectures spanning monitoring, logging, metrics, tracing, telemetry pipelines, alerting, event correlation, service visibility, and platform integrations.
  • Define scalable standards for telemetry onboarding, naming, tagging, RBAC, service ownership, dashboards, alert governance, runbooks, and operational handoff.
  • Advise on telemetry governance, including data quality, retention, access control, sampling, cardinality, and cost optimization.
  • Lead modernization initiatives including tool rationalization, dashboard and alert rationalization, telemetry strategy, and migration from legacy monitoring platforms.
  • Guide SRE practices including SLIs, SLOs, error budgets, production readiness, and incident response maturity.
  • Design integration patterns across ITSM, CMDB, event management, and automation platforms.
  • Support pursuits by shaping solution strategy, validating scope, informing estimates, and building client-facing technical narratives.
  • Serve as a senior escalation point and provide architecture governance during delivery.
  • Build reusable reference architectures, playbooks, and accelerators while mentoring architects, consultants, and offshore teams.

Qualifications

  • 10+ years in observability, monitoring, APM, platform operations, SRE, or related enterprise technology domains, including 5+ years leading architecture and delivery strategy for enterprise observability or reliability initiatives.
  • Deep, hands-on experience designing and implementing solutions across monitoring, logging, metrics, tracing, telemetry collection, and pipeline patterns in hybrid and multi-cloud environments.
  • Strong knowledge of telemetry governance, including routing, transformation, normalization, enrichment, retention, access control, and cost management.
  • Experience defining enterprise standards for dashboards, alerts, tagging, naming, service ownership, RBAC, and operating model adoption.
  • Strong command of incident response, event correlation, alert strategy, service health, and business-service visibility, plus applied SRE concepts including SLIs, SLOs, error budgets, and production readiness.
  • Ability to lead executive and technical workshops and translate business needs into actionable architecture and delivery plans.
  • Consulting or professional services experience with strong client-facing communication, estimation, risk management, and cross-functional leadership.

Preferred Qualifications

  • Platform experience such as Dynatrace, Splunk, Grafana, LogicMonitor, Datadog, New Relic, AppDynamics, Elastic, Prometheus, or OpenTelemetry.
  • Experience with telemetry pipeline tools such as OpenTelemetry Collector, Grafana Alloy, Fluent Bit, Kafka, Cribl, or Vector, along with familiarity with cloud, Kubernetes, CI/CD, and infrastructure as code.
  • Experience integrating with platforms such as ServiceNow, Jira Service Management, PagerDuty, Opsgenie, BigPanda, or xMatters.
  • Experience developing reusable consulting assets such as reference architectures, governance models, playbooks, POVs, and accelerators; relevant cloud, SRE, ITIL, or FinOps certifications are a plus.

The compensation range indicated in this posting reflects the On-Target Earnings (“OTE”) for this role, which includes a base salary and any applicable target bonus amount. This OTE range may vary based on the candidate’s relevant experience, qualifications, and geographic location.  
 
Why AHEAD:
 
Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between.
 
We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning.
 
USA Employment Benefits include: 
- Medical, Dental, and Vision Insurance 
- 401(k) 
- Paid company holidays 
- Paid time off 
- Paid parental and caregiver leave 
- Plus more! See https://www.aheadbenefits.com/ for additional details.
 
Use of AI:
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, or capturing recordings and creating transcriptions or summaries during interviews. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans.
 
If you would like more information about how your data is processed, please refer to the Candidate Privacy Notice or contact us at [email protected]
 
You may opt out of the review or analysis of your application and resume by AI tools by using the General Application. Please include the role you wish to apply for in the Additional Information field. You may also choose to opt out of recording and transcription at any time, including after joining an interview. Candidates will not be penalized for choosing to opt out.


