Traversal

AI Engineer - Infrastructure

Posted 19 Days Ago
In-Office
New York, NY
150K-300K Annually
Mid level
As an AI Infrastructure Engineer, you will design and operate systems for AI products, focusing on high-concurrency inference and data pipelines using technologies like Python, Rust, and Kubernetes, ensuring reliability under load.
The summary above was generated by AI
About Traversal

Traversal is the AI Site Reliability Engineer (SRE) for the enterprise—already trusted by some of the largest companies in the world to troubleshoot, remediate, and even prevent the most complex production incidents. Our mission is to free engineers from endless firefighting and enable them to focus on creative, high-impact work. 

Our roots remain deeply embedded in AI research, and we’re channeling that scientific rigor and creativity into building the premier AI agent lab for the enterprise. What we’re proudest of is assembling the most talented yet nicest group of individuals, from researchers at MIT, Harvard, and Berkeley to world-class engineers from industry (Citadel Securities, Cockroach Labs, Datadog, DE Shaw, ServiceNow, Glean, Perplexity, Pinecone, and more), to take on one of the hardest problems for AI to solve. Without the entire team, none of this would be possible.

The Role

As an AI Infrastructure Engineer on the Platform / Reliability team, you’ll design, secure, and operate the core systems that power Traversal’s AI products. We already serve Fortune 50 enterprises with multi-tenancy and SOC 2 Type II controls, and we’re rapidly scaling.

You’ll focus on high-concurrency inference, Kafka data pipelines, and agentic tooling (via MCP) — building infrastructure that’s reliable under extreme load. This includes safe concurrency, graceful retries, queue management, autoscaling, observability, and Kubernetes-native scheduling.

This is a senior, high-impact role: you’ll own foundational systems, work across Python, Rust, Kubernetes, and Kafka, and shape how enterprise AI reliability is built and scaled.

Responsibilities
  • System Design & Architecture: Design scalable, reliable infrastructure for AI inference, data pipelines, and agentic workflows.
  • Queue & Job Scheduling (K8s-native): Migrate from Python multiprocessing + Postgres-as-queue to Kubernetes-native queuing and orchestration (KEDA/HPA, Jobs/CronJobs, Kueue/Argo).
  • Managed Kafka Operations: Tune partitioning and throughput, design DLQ + replay runbooks, implement idempotent sinks to avoid duplicates.
  • Autoscaling: Scale on real signals (queue lag, in-flight requests, latency); add burst capacity and safe drains.
  • Per-Tool Reliability: Productionize MCP toolchains with circuit breaking, timeouts, sandboxing, and audit.
  • Progressive Delivery: Implement canary and blue/green rollouts for stateful services, pre-warm caches/weights, and enable graceful termination.
  • Observability: Build RED/USE dashboards and OpenTelemetry traces across gateway → agent → tool → Kafka → sinks.
  • Infrastructure as Code: Evolve Terraform/Helm/Kustomize for multi-environment deployments, secrets, policy-as-code (OPA/Rego), and workload identity.
Requirements
  • 3+ years of experience at technically rigorous companies or teams
  • Proven experience operating high-concurrency backends with managed Kafka fan-in/out and at-least-once processing
  • Experience designing idempotent systems (outbox, dedupe keys, safe replay)
  • Production experience building and maintaining systems in Python and Rust (2024 edition)
  • Experience with incident response, chaos testing, and capacity planning
  • Familiarity with AWS, EKS, Terraform, Helm/Kustomize
  • Strong debugging skills across runtime, Kafka, network, and auth layers
  • Security-minded, with experience implementing least privilege, default-deny egress, auditability, and policy-as-code

Nice to Have

  • GPU workload operations (MIG, topology-aware placement), inference servers, token streaming gateways
  • Data governance (PII discovery/redaction), lineage, tokenization
  • Cross-region active/active for Kafka and stateless services
  • Service mesh (Envoy/Istio), Cilium/eBPF, ClickHouse for analytics
Compensation

We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.

Why You Should Join Us

We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages, and we give thoughtful consideration to every hire on our small, high-impact team.

Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.

Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.

Top Skills

AWS
EKS
Helm
Kafka
Kubernetes
Kustomize
OpenTelemetry
Python
Rust
Terraform
HQ

Traversal New York, New York, USA Office

New York, New York, United States
