Orion Innovation

Senior DevOps Engineer - Observability

Sorry, this job was removed at 04:09 a.m. (EST) on Tuesday, Jun 03, 2025
In-Office or Remote
2 Locations

Orion Innovation is a premier, award-winning, global business and technology services firm.  Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity.  We work with a wide range of clients across many industries including financial services, professional services, telecommunications and media, consumer products, automotive, industrial automation, professional sports and entertainment, life sciences, ecommerce, and education.

Senior DevOps Engineer - Observability

We are seeking a Senior DevOps Engineer focused on Observability to own and drive the observability strategy in our Java-centric AWS cloud environment. You will be responsible for designing and managing a code-driven Datadog observability platform that ensures full visibility into Java applications, Kubernetes workloads and containerized AWS infrastructure, all while optimizing Datadog costs and eliminating unnecessary overhead.

This role requires a deep understanding of Datadog, Java logging and tracing, AWS observability best practices and cost efficiency techniques. The ideal candidate will work closely with SRE, DevOps and Software Engineers to standardize monitoring, alerting, metrics and tracing while ensuring cost-effective observability solutions.

As a Senior DevOps Engineer, you will set observability standards, lead automation efforts and mentor engineers, ensuring all monitoring and Datadog configuration changes are implemented as Infrastructure-as-Code (IaC). You will also drive cross-functional collaboration, working across engineering, infrastructure and product teams to deliver scalable and cost-effective observability outcomes.

Key Responsibilities

Observability Architecture & Strategy

  • Own and define observability standards for Java applications, infrastructure and Kubernetes workloads
  • Design and maintain Datadog observability stacks using Terraform - all configurations must be code-driven
  • Define and enforce JSON structured logging, distributed tracing and metric collection best practices for Java applications
  • Ensure all microservices have proper observability configurations for logs, traces and metrics
  • Implement service-level objectives (SLOs), indicators (SLIs) and agreements (SLAs) to measure application and system reliability
  • Automate observability testing as part of CI/CD pipelines, ensuring new deployments include proper monitoring and logging

Java Application Logging, APM & Distributed Tracing

  • Enforce log filtering and retention policies to prevent excessive log ingestion and reduce Datadog costs
  • Work directly with Java developers to ensure proper logging, tracing and metrics instrumentation
  • Integrate OpenTelemetry for Java distributed tracing, capturing request flow across microservices
  • Standardize logging practices across services using Logback, Log4j or SLF4J ensuring:
    • Structured JSON logs for easy parsing
    • Correlation IDs for linking logs to traces
    • Error reporting integrated with Datadog alerts
  • Implement Datadog APM (Application Performance Monitoring) for Java-based services ensuring deep JVM observability

Datadog Cost Management & Optimization

  • Own Datadog cost governance, ensuring observability costs remain within budget
  • Monitor Datadog ingestion volumes (logs, traces, and metrics) to prevent overages
  • Implement cost-efficient log filtering, retention and sampling policies to reduce unnecessary logging costs
  • Automate Datadog usage reporting and integrate cost tracking into SRE Observability dashboards
  • Establish automated alerts for unexpected increases in Datadog usage costs and cost contributors
  • Advocate for efficient observability by reducing noisy alerts, redundant logs and low-value traces

Incident Response & Reliability Engineering

  • Lead incident response efforts, ensuring Datadog alerts are actionable and tuned to minimize noise
  • Establish a root cause analysis (RCA) process and use Datadog data to identify root causes of application failures
  • Work with application teams to optimize their response protocol based on Datadog insights
  • Implement Datadog Watchdog (AI anomaly detection) for proactive failure prediction

Kubernetes & AWS Observability

  • Deploy Datadog monitoring and logging for AWS-native services: EKS, EC2, Lambda, API Gateway, RDS, DynamoDB, SQS, SNS, Step Functions, VPC Flow Logs
  • Optimize Datadog Kubernetes monitoring, ensuring pod-level tracing and resource utilization tracking
  • Ensure Kubernetes events and autoscaling alerts are configured in Datadog

Leadership & Mentorship

  • Proactively drive observability initiatives across teams to ensure alignment, adoption and execution of observability goals
  • Act as technical authority on observability, mentoring engineers on Datadog best practices
  • Work closely with Java developers, helping them implement efficient logging, tracing and monitoring
  • Drive observability as a key SRE function, ensuring every new service is fully instrumented from day one
  • Develop internal training programs on Datadog, Terraform-based observability and AWS monitoring

Required Qualifications

  • 5+ years of experience in DevOps, SRE, observability, monitoring, development or software engineering roles
  • Proficient in Terraform for Infrastructure-as-Code (IaC) – Datadog must be managed as code
  • Solid experience with Datadog, including APM, Logs, Metrics, Tracing and Security Monitoring
  • Coding, scripting and automation skills in Java, Python, Node, Bash or Go
  • Experience integrating observability into CI/CD pipelines (GitLab CI, AWS CodePipeline, GitHub)
  • Extensive experience with AWS services and their observability patterns
  • Monitoring experience (ELK, Prometheus, Grafana, OpenTelemetry, New Relic, Dynatrace, Sysdig)
  • Java development background with experience in:
    • Spring Boot, Java microservices architecture
    • Java logging frameworks (Logback, Log4j, SLF4J)
    • Java APM instrumentation with OpenTelemetry or Datadog APM
  • Strong understanding of JVM performance tuning, GC monitoring and thread profiling
  • Experience implementing distributed tracing for Java applications
  • Incident response and on-call experience with a proven track record of reducing MTTR

Orion is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, citizenship status, disability status, genetic information, protected veteran status, or any other characteristic protected by law.

Candidate Privacy Policy

Orion Systems Integrators, LLC and its subsidiaries and its affiliates (collectively, “Orion,” “we” or “us”) are committed to protecting your privacy. This Candidate Privacy Policy (orioninc.com) (“Notice”) explains:

  • What information we collect during our application and recruitment process and why we collect it;
  • How we handle that information; and
  • How to access and update that information.

Your use of Orion services is governed by any applicable terms in this notice and our general Privacy Policy.


HQ

Orion Innovation Edison, New Jersey, USA Office

333 Thornall Street, 7th Floor, Edison, NJ, United States, 08837
