Cake (cake.ai)

Staff Software Engineer, ML Platform

Reposted 21 Days Ago
Remote
Hiring Remotely in USA
Expert/Leader
As a Staff Software Engineer, you will lead the development of Cake's AI platform infrastructure, focusing on building, operating, and improving ML systems. The role requires collaboration with customers to address ML challenges and enhance platform capabilities, emphasizing ownership and operational excellence.
The summary above was generated by AI

Location: Remote US, collaboration primarily during EST hours

About the company

Cake is on a mission to make cutting-edge AI accessible to enterprise teams.

Enterprises want to move faster with AI, but are constrained by infrastructure complexity, high operating costs, and the governance required to run AI systems safely at scale. Cake removes those barriers, enabling teams to deploy and operate AI/ML platforms 10x faster and 10x cheaper than traditional approaches—without sacrificing reliability or control. Cake runs inside the customer’s own VPC, giving enterprises full ownership of their data, security, and operations.

Cake solves the full infrastructure problem across 4 layers: compute infrastructure management, open-source ML components, common integrations, and pre-built project components. Built-in security, monitoring, and governance ensure clean ownership, enforce guardrails, and provide a dependable path from experimentation to production at scale. 

Backed by top investors, Cake is seeing strong adoption and is positioned for rapid growth in the next 12 months. Our culture emphasizes ownership, clear communication, and collaboration, with a high bar for operational excellence and production-ready systems. 

What you’ll do

As a Staff Software Engineer, you will play a critical leadership role in building and operating the infrastructure that powers Cake’s AI platform. This is a high-ownership role for an engineer who thrives at the intersection of distributed systems, cloud infrastructure, and developer experience.

You’ll design and operate the ML platform foundations that both internal teams and customers rely on, owning systems end-to-end from architecture to production. You’ll work closely with customers to translate real-world ML use cases into reliable, scalable platform capabilities.

This role is ideal for someone who wants to be a technical owner, not just an implementer: someone who cares deeply about system quality, operational excellence, and clear communication.

You will:

  • Build Enterprise-Scale Infrastructure
    • Leverage infrastructure-as-code to manage complex cloud environments supporting critical ML and AI initiatives.
    • Design Kubernetes-native systems, including controllers/operators where appropriate.
    • Improve platform networking, security, and observability.
  • Sustain Platform Health and Performance
    • Own critical systems in production, including reliability, scalability, security, and cost efficiency.
    • Identify and proactively address technical debt, operational risk, and platform bottlenecks.
    • “Learn by doing”: quickly ramp up on a complex tech stack (Terraform, Kubernetes, Istio, Crossplane, Go, TypeScript).
  • Enable Teams and Customers to Move Faster 
    • Create abstractions and tooling that make it easier for teams and customers to deploy, run, and scale AI/ML workloads. 
    • Collaborate directly with customers to understand their ML infrastructure challenges and translate them into platform improvements.
    • Balance speed and rigor—shipping quickly while maintaining a high bar for quality and safety.
  • Lead Through Influence 
    • Act as a technical leader and mentor across the engineering organization.
    • Write clear documentation and design proposals that align stakeholders and drive decisions.
    • Partner closely with product and leadership to shape platform direction and priorities.
Requirements
  • Core Experience 
    • 10+ years of engineering experience, with significant time spent on infrastructure, platform, or distributed systems.
    • Deep hands-on experience with Kubernetes in production environments.
    • Strong cloud experience across AWS, GCP, and/or Azure.
    • Proven track record of building and operating secure, scalable MLOps platforms.
  • Technical Strength
    • Deep understanding of infrastructure-as-code (e.g., Terraform, Pulumi, CDK).
    • Strong programming skills in at least one backend language (Go preferred; TypeScript also welcome).
    • Experience diagnosing and debugging complex production issues.
    • Familiarity with modern CI/CD, test-driven development, and DevSecOps practices.
    • Bonus: experience building Kubernetes operators and/or working with service meshes (e.g., Istio).
  • Ownership & Communication 
    • Comfortable owning large, ambiguous problems from inception to production.
    • Excellent communicator, able to clearly explain complex systems to both technical and non-technical audiences.
    • Experience working directly with customers and incorporating feedback into technical decisions.
    • Ability to operate autonomously while keeping stakeholders informed and aligned.
  • Mindset 
    • Customer-first and product-oriented.
    • Curious, adaptable, and eager to learn new systems and domains.
    • Collaborative, respectful, and willing to lean into hard conversations.
    • Energized by fast-paced environments and meaningful responsibility.
Why Join Cake
  • High impact, high ownership: You’ll own foundational systems that directly power customer success.
  • Small, senior team: Your work won’t get lost—you’ll shape the platform and engineering culture.
  • Real customers, real problems: You’ll build systems used in production by growing companies.
  • Autonomy and trust: We hire experienced engineers and give them room to operate.
Benefits
  • Competitive cash compensation alongside above-market equity upside
  • Top-tier fully covered medical, dental, and vision insurance
  • Life insurance
  • 401k program
Additional Perks
  • Unlimited PTO
  • Monthly half day
  • Citi Bike membership
  • Monthly wellness stipend
  • Office equipment stipend, including reimbursement for approved disability-related accommodations
  • Investment in employee learning and growth opportunities

Cake is committed to providing equal employment opportunities to all employees and job seekers regardless of race, color, religion, national origin, sexual orientation, gender, gender identity, marital status, disability, veteran status, or any other legally protected category. As an equal opportunity employer, we value diversity and its positive impact on our culture.

Cake also complies with the Americans with Disabilities Act (ADA). We are dedicated to working with and providing reasonable accommodation to job applicants with physical or mental disabilities. If you require accommodation, please email us at [email protected] and we will promptly address your request.

Top Skills

AWS
Azure
GCP
Go
Istio
Kubernetes
Terraform
TypeScript
HQ

Cake (cake.ai) New York, New York, USA Office

New York, New York, United States, 10010
