CloudZero

Lead ML Engineer

Reposted 20 Days Ago
Hybrid
Boston, MA
220K-250K Annually
Senior level
Lead the development and implementation of AI and ML solutions, transforming prototypes into scalable products while managing a team of specialists and driving technical strategy.
The summary above was generated by AI
Lead ML / Data Scientist

About the Role

CloudZero is growing fast. Our customer base is expanding, the data challenges we're solving are getting more complex, and the platform is scaling to match. As our founding ML/Data Scientist, you'll own the hardest data science problems at CloudZero: building the models, pipelines, and intelligence layer that power real-time cost visibility, anomaly detection, forecasting, and agentic governance across billions of dollars in cloud spend.

This is real ML engineering work at scale, not a research role or a prompt engineering job. You'll work at the intersection of financial telemetry, cloud infrastructure, AI inference, and stream processing, shaping how CloudZero evolves from a billing-first platform toward a telemetry-first, cost-per-anything model for cloud and AI. You'll set the technical patterns, solve problems no one has solved before, and help build the team around you.

This role is ideal for an engineer who thrives on hard data science problems, cares deeply about correctness and production quality, and wants to see their work matter to customers in direct and measurable ways.

What You'll Do

Build the ML Foundation

  • Spend 70% or more of your time hands-on: building models, writing production code, designing pipelines, and shipping ML capabilities that customers use

  • Define the standards, infrastructure, and patterns the future ML team will build on

  • Partner closely with platform engineering and product to embed ML into CloudZero's core, serving as the technical bridge rather than a separate track

Solve Genuinely Hard ML Problems

  • Build real-time anomaly detection systems that identify cost spikes, efficiency breaches, and AI usage anomalies across millions of cloud and inference events via stream processing (Kafka, Flink/KStreams)

  • Develop production-grade time-series forecasting models for cost and usage, with proper seasonality handling, confidence intervals, and feedback loops

  • Model relationships between cloud resources, services, products, and business units as semantic cost graphs at cloud scale

  • Tackle cardinality estimation for compound effects of high-dimensional column combinations at the core of our data model

  • Build the multi-tier architecture that processes every AI inference event in real time, per model, per token, per team, per customer, reconciled against billing to produce total cost-to-produce intelligence

  • Design the intelligence layer for autonomous AI agents, including real-time budget enforcement, policy compliance detection, and spend guardrails for the agents customers deploy in production

Take Models to Production

  • Own the full stack: feature engineering, model serving, monitoring, retraining pipelines, and feedback loops

  • Turn research and prototypes into production-grade features with full observability baked in

  • Apply LLM-based approaches for semantic parsing, NL-to-query translation, and conversational analytics where they genuinely fit, and know when they don't

What You Bring
  • 6+ years of ML engineering and data science experience, with meaningful time in production systems at scale

  • Deep time-series fluency: you've built forecasting and anomaly detection systems that made it to production and earned customer trust

  • Classical ML foundations across graphs, clustering, probabilistic modeling, and data structures; you reach for the right tool, not the trendiest one

  • Full-stack production ML ownership: feature engineering, model serving, monitoring, retraining pipelines, and feedback loops

  • Python fluency and data warehouse experience (Snowflake, BigQuery, or equivalent)

  • Formal background in Computer Science, Statistics, Mathematics, or a related quantitative field

Bonus If You Have...
  • GenAI/LLM experience: you've integrated LLMs, seen their failure modes, and know when to use them versus traditional ML

  • Cloud ML infrastructure experience with AWS SageMaker, Bedrock, or equivalent at enterprise scale

  • FinOps or cost intelligence domain knowledge, including cloud billing, infrastructure cost models, or related financial data

  • Founding IC experience: you've been the first or second data scientist and know what it takes to build from scratch

  • Graph modeling and semantic layer experience in production contexts

  • A bias toward correctness: you care whether models are actually right, not just accurate on a validation set

About CloudZero

Cloud cost management is one of the biggest challenges organizations face today. As cloud adoption continues to accelerate, so do the complexities and costs associated with it, and macroeconomic conditions only increase pressure to prove cloud efficiency.

CloudZero is a SaaS platform at the intersection of next-generation cloud cost management and FinOps. We ingest billing and usage data from all cloud, SaaS, and PaaS providers, organize it in real time according to our customers' business structures, and empower organizations to make more informed business decisions.

Since our founding in 2016, our mission has been to make efficient innovation a reality for every cloud-driven organization. We believe every engineering decision is a buying decision, and we're applying proven reliability engineering principles to financial efficiency.

We believe the best AI empowers users with clear insights and confident decisions, transforming complex cloud cost data into actionable intelligence that drives meaningful business outcomes.

To date, we've raised over $56 million from leading venture capital firms. We're solving problems of massive scale, business importance, and complexity in a space that needs it more than ever.

Equal Opportunity Employer

CloudZero is an equal opportunity employer and values diversity. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status or disability status. All job offers are contingent upon the candidate passing background and reference checks.

Please note: CloudZero is unable to sponsor employment visas. Candidates must have permanent authorization to work in the United States without the need for current or future sponsorship.

Top Skills

Bedrock
Python
SageMaker
