Ombud

Principal Platform Engineer — Kubernetes & Cloud Infrastructure

In-Office
Denver, CO
Senior level
Responsible for maintaining and scaling the cloud infrastructure to support enterprise AI workloads, optimizing costs, and ensuring security and reliability.
  • Location: Denver, CO (hybrid — Tue/Wed/Thu in office)
  • Reports to: CEO
The role

Ombud's platform runs production AI workloads for enterprise customers, and we're scaling toward a self-service motion where customers onboard, ingest content, and operate the product without manual implementation. That requires an infrastructure foundation that can handle multi-tenant scale, high reliability, and the unique demands of generative AI workloads — without ballooning the AWS bill.

We're hiring a Principal Platform Engineer to own that foundation. This is a senior individual contributor role with broad architectural authority. You will not have direct reports. You will set the technical direction for our cloud infrastructure, partner with engineering on production scaling decisions, and operate the platform with the discipline a SOC 2 / ISO 27001 customer base requires.

What you'll own
  • Production Kubernetes (EKS) clusters: capacity planning, node group strategy, gen-AI workload isolation, blast-radius containment.
  • AWS infrastructure end-to-end: RDS, DMS, Kafka (MSK), ECR, networking, IAM, multi-region deployments (including Ireland for EU data residency).
  • Infrastructure-as-code in Terraform — modules, environments, drift management, peer review.
  • CI/CD pipelines (Jenkins, GitHub Actions, or your recommended replacement) — fast, reliable, secure builds for backend and frontend services.
  • Observability: Grafana dashboards, Prometheus metrics, log pipelines, on-call alerting, SLO definition.
  • Cost optimization. AWS spend is one of our top three variable costs. Reducing it by 20% is a tangible objective for this seat.
  • Security posture: secrets management (Consul/Vault), IAM hygiene, vulnerability patching, support for SOC 2 and ISO 27001 audit cycles.
  • Architecture leadership on the self-service infrastructure roadmap: how we onboard a customer without human intervention and scale to 10x our current tenant count.
  • Documentation and runbooks that let the rest of the engineering team operate the platform when you're unavailable.
Must-haves
  • 8+ years of platform, infrastructure, SRE, or DevOps experience, including at least 3 years operating production Kubernetes at scale.
  • Deep AWS expertise across compute, storage, networking, data services, and IAM.
  • Production fluency with Terraform, Docker, Linux, and CI/CD systems.
  • Track record of architectural decisions that materially improved reliability, cost, or developer velocity — with specific, measurable outcomes you can point to.
  • Comfort operating as a senior IC who sets technical direction across teams without formal authority.
  • Strong written communication — runbooks, architecture decision records, post-incident reviews.
  • Willingness to be in-office Tuesday through Thursday in Denver.
Nice-to-haves
  • Production experience supporting generative AI or ML workloads (GPU node groups, vector databases, model serving).
  • Experience with Qdrant, Pinecone, Weaviate, or other vector stores in production.
  • PostgreSQL operational depth — replication, performance tuning, backup/restore.
  • Experience scaling a multi-tenant SaaS platform from ~100 customers to ~1,000.
  • SOC 2 Type II and ISO 27001 audit experience.
  • Familiarity with event-driven architectures (Kafka, Kinesis, or equivalent).
What success looks like
First 30 days
  • Complete a written audit of our current infrastructure: what we have, where the risks are, what's costing us money.
  • Join the on-call rotation and respond to your first production incident.
  • Identify the top three architectural debt items.
First 60 days
  • Deliver your first architectural recommendation with an implementation plan, typically addressing cost optimization or a scaling bottleneck.
  • Refresh and own the observability stack.
  • Document the production runbook for the rest of the engineering team.
First 90 days
  • Ship a measurable improvement: cost reduction, reliability uplift, deployment velocity, or scale headroom.
  • Deliver the multi-tenant scale roadmap for the self-service motion.
  • Establish a quarterly architecture review cadence with the engineering team.
Why Ombud

You'll own the platform that runs production AI for some of the largest enterprise software companies in the world. The infrastructure decisions you make will directly enable our 2026 strategy of moving from response management to autonomous revenue execution. You'll work with a small, senior engineering team that ships fast and trusts each other.

ABOUT OMBUD

Ombud is a Denver-based B2B SaaS company building the agentic AI platform that powers Revenue Operations teams at enterprises like Workday, UKG, and Prudential. Our product, Ombuddy, automates the response work — RFPs, security questionnaires, proposals — that has historically eaten enterprise sales cycles. Our 2026 strategy is to extend this from response management into Orchestrated Revenue Operations: autonomous execution of the discrete sales processes that move revenue. Our 2035 BHAG is $1B ARR powering 80% of discrete B2B sales motions.

We run on EOS. We hire for output, not pedigree. We expect honesty over politeness, decisions over discussions, and execution over enthusiasm.

HOW WE WORK — PIRCC VALUES
  • Progressive — We grow. We learn. We push the model forward, not protect the status quo.
  • Integrity — We do the right thing and keep our commitments. Said and done are the same thing.
  • Resourceful — We turn constraints into creativity. We do more with less and bring solutions, not problems.
  • Customer-Centric — Our customers' success is the metric that matters. We anticipate their needs and earn their trust.
  • Community — We build a team people want to be part of, and we invest in the communities we serve.
