Optimal Dynamics

Staff Software Engineer, Site Reliability (SRE)

Reposted 3 Days Ago
Remote
Hiring Remotely in USA
170K-190K Annually
Senior level
Lead reliability initiatives for the production platform, manage incident response, define SLIs/SLOs, and enhance security by embedding it into delivery pipelines. Drive platform improvements in AWS and CI/CD processes.
The summary above was generated by AI

About Our Company

Built on over four decades of pioneering research at Princeton University, our platform represents the leading edge of innovation in freight and transportation planning. We help customers unlock double-digit revenue gains and drive smarter, data-driven operations at scale.
With the recent close of our Series C funding round led by Koch Disruptive Technologies, we’re entering an exciting new phase of growth. Today, Optimal Dynamics is a high-growth company of ~70 employees, backed by top-tier investors including Bessemer Venture Partners, The Westly Group, Activate Capital, and Koch. 

We're on a mission to redefine the way logistics decisions are made—and we’re just getting started.

About Our Team

We are a team of bright, kind, and solution-oriented people focused on creating value for our customers. We can solve problems individually, but we understand that the best solutions emerge when the team brainstorms together. We are excited about balancing the need to deploy new solutions quickly with the need to design solutions that are secure, reliable, maintainable, and scalable for the long run.

About the Role 

We’re hiring a Staff Software Engineer, Site Reliability to lead reliability across our production platform. As a Staff‑level individual contributor, you will drive strategy and hands‑on execution across incident response, SLO/SLI programs, and production readiness, directly owning highly available services in AWS while partnering with Platform/Infra to build paved‑road tooling in our monorepo.

This is a full‑time, remote‑friendly role open to candidates across the United States. For those who prefer an in‑office experience, our HQ in New York City offers a collaborative environment.

What You’ll Do

Reliability (≈50%)

  • Own the company‑wide incident lifecycle: standards for detection, escalation, incident command, customer comms, and high‑quality postmortems with action tracking.
  • Define and drive SLIs/SLOs for core services; build guardrails and dashboards that make reliability visible and actionable.
  • Lead production readiness reviews, capacity/performance planning, load testing, disaster recovery exercises, and resilience engineering (failure testing/chaos where appropriate).
  • Level up on‑call: right‑sizing rotations, paging hygiene, runbooks, auto‑remediation, and continuous improvement of MTTA/MTTR.

Security (≈30%)

  • Embed security into the delivery pipeline: dependency and image scanning, least‑privilege/IAM baselines, secrets management, and service‑to‑service auth.
  • Partner with Engineering leadership to maintain SOC 2‑aligned controls as code; make audit‑friendly evidence generation part of everyday engineering.
  • Drive secure‑by‑default patterns in the platform (e.g., network posture, data protection, runtime policies) without slowing down developers.

Platform & DevEx (≈20%)

  • Build and evolve paved roads for deploys, config, and runtime operations in our monorepo (Bazel) and CI/CD (AWS CodePipeline/CodeBuild).
  • Partner with product teams to make the “secure, reliable default” the easiest path—templates, tooling, libraries, and automation.
  • Improve observability end‑to‑end (traces, logs, metrics, alerts).

Who You Are

  • Experienced: Individual contributor who has led reliability programs at a meaningful scale and owned incident response standards.
  • Technically Grounded: Deep, hands-on experience with infrastructure at scale, cloud, containerization, and more:
    • AWS (multi‑service)
    • Containerized workloads on ECS and/or Kubernetes
    • CI/CD & IaC (Terraform)
    • Production networking fundamentals
  • Python Proficient: You can read/review service code and land operational improvements.
  • Data Driven: You take a data-driven approach to SLOs, capacity, performance, and cost efficiency, backed by strong observability skills.
  • Influential: Able to shape direction and create simple, durable standards.
  • Communicative: Excels in both technical and interpersonal communication, with strong written and verbal skills.

Nice To Have (Bonus Points)

  • Aware of FinOps (cost attribution, efficient scaling) and experienced with DR/BCP programs.
  • Familiar with secure SDLC, threat modeling, and compliance automation in a SOC 2 context.
  • Experience collaborating with Data Science/ML teams and batch/streaming workloads.
  • Exposure to monorepo frameworks (e.g., Bazel, Buck).

About our tech stack and development practices

At Optimal Dynamics, our entire infrastructure runs on AWS, leveraging a wide range of services including DynamoDB, Aurora, SSM, and SQS to power our intelligent logistics platform.

Our tech stack includes:

  • Backend & AI: Python 3 and Java
  • Frontend: JavaScript/TypeScript for our web-based SPA
  • Data Stack: Trino, Dagster, dbt, DuckDB, and Preset
  • IaC: Terraform and Spacelift
  • Cloud: AWS (ECS/RDS/S3/etc)
  • CI/CD: Bazel, GitHub, AWS CodePipeline/CodeBuild

We follow modern development best practices, with all code stored on GitHub. Every pull request undergoes thorough code review, is fully unit tested, and is deployed through our CI/CD pipeline for continuous quality assurance.

Pay Range
$170,000 - $190,000 USD

Benefits

  • Competitive compensation, including Series C level equity
  • Health / Dental / Vision 100% covered for employee and 50% for dependents
  • Life Insurance, with optional supplemental insurance
  • Flexible Spending Account (FSA)
  • Health Savings Account (HSA)
  • 401(k) with match
  • Unlimited PTO (vacation, personal days, sick days, jury duty, military leave, bereavement)
  • 11 Holidays
  • Paid Parental Leave for all employees
  • Short-term and Long-term Disability Insurance, and AD&D Insurance
  • Fitness membership reimbursement
  • Commuter benefits

Optimal Dynamics is proud to be an equal opportunity employer that celebrates diversity and is committed to creating an inclusive workplace with equal opportunity for all applicants and employees. Our goal is to recruit the most talented people from a diverse candidate pool regardless of race, color, ancestry, national origin, religion, disability, sex (including pregnancy), age, gender, gender identity, sexual orientation, marital status, veteran status, or any other characteristic protected by law.
Optimal Dynamics is committed to working with and providing access and reasonable accommodation to applicants. If you require an accommodation, please reach out to [email protected] once you've begun the interview process. All requests for accommodations are treated discreetly and confidentially, as practical and permitted by law.

Top Skills

Aurora
AWS
Bazel
CI/CD
Dagster
dbt
DuckDB
DynamoDB
ECS
Java
JavaScript
Kubernetes
Python
Spacelift
SQS
SSM
Terraform
Trino
TypeScript

HQ

Optimal Dynamics New York, New York, USA Office

New York, NY, United States
