Senior Machine Learning Engineer (Inference Platform)

Wizard AI

Reposted 10 Days Ago

Remote

Hiring Remotely in USA

200K-250K Annually

Senior level

As a Senior Machine Learning Engineer on the Inference Platform, you will manage production ML serving systems, define model lifecycle strategies, optimize ML pipelines, and collaborate with cross-functional teams to improve ML operations.

The summary above was generated by AI

About Wizard AI

At Wizard AI, we’re building the top-performing AI Shopping Agent that delivers the best products from across the web with unmatched accuracy, quality, and trust. Our ML models power the core of our platform, and we’re looking for a Senior Machine Learning Engineer to own how they run in production: reliably, efficiently, and at scale.

The Role

As a Senior ML Engineer on our Inference Platform, you’ll own the end-to-end lifecycle of production ML serving systems, from model packaging and deployment to monitoring, optimization, and scaling. This is not a traditional MLOps role focused solely on pipelines and tooling. You’ll be responsible for the inference infrastructure powering a live conversational shopping agent, operating multiple specialized serving engines under real-world production load.

You’ll own critical decisions around serving architecture, performance, reliability, and scalability, working closely with ML Engineers, Data teams, Product, and DevOps to ensure models move seamlessly from experimentation into high-performance production systems.

What You'll Do

Own and evolve our multi-engine inference platform, supporting a variety of model types and serving requirements.
Build and improve production ML pipelines — taking models from experimentation to reliable, high-throughput serving.
Define and implement model versioning, rollout, rollback, and lifecycle management strategies that ensure reproducibility and operational reliability.
Define and enforce serving-layer SLAs, including latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).
Build observability, monitoring, alerting, and operational tooling for production inference systems.
Apply software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.
Optimize inference performance through efficient resource utilization, hardware-aware serving strategies, and cost-conscious infrastructure design.
Ensure ML serving systems are secure, scalable, and operationally resilient.
Partner with ML, Data, Product, and DevOps teams to turn ideas into production systems, driving the technical decisions on serving and scale.

What We're Looking For

Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct ownership of production ML serving systems.
Hands-on experience running an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load — not just managed or hosted endpoints.
Strong Python skills and software engineering fundamentals, combined with deep systems and infrastructure knowledge.
Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, experimentation platforms, and model registries.
Strong grasp of inference performance — continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks — with the instinct to profile before tuning.
Experience serving heterogeneous workloads, including LLMs, embedding models, and extraction models, each with distinct latency, throughput, and scaling requirements.
Demonstrated ability to balance latency, throughput, reliability, and infrastructure cost while operating production-scale ML systems.
Experience in high-growth startup environments and comfort operating in fast-moving, evolving technical landscapes.

What Success Looks Like

Reliable, Scalable Inference Systems

Production serving infrastructure operates with clear SLAs, strong observability, and minimal downtime. Latency, availability, throughput, and GPU utilization are actively measured and optimized as platform demands grow.

End-to-End Ownership

You own the complete serving lifecycle — from deployment and release management through monitoring, optimization, and scaling — enabling ML engineers to ship quickly while maintaining reliability and reproducibility.

Technical Leadership and Impact

You shape the future of Wizard's inference platform, driving key architectural decisions that improve performance, reduce infrastructure costs, and support the next generation of AI-powered shopping experiences.

New York, New York, United States, 10013
