
Featherless AI

Machine Learning Engineer — Inference Optimization

Reposted 22 Hours Ago
In-Office or Remote
Hiring Remotely in World Golf Village, FL
Mid level
Optimize inference latency and throughput for large-scale ML models, collaborate on performance tuning, and build inference-serving systems.
About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production, turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do
  • Optimize inference latency, throughput, and cost for large-scale ML models in production

  • Profile GPU/CPU inference pipelines and eliminate bottlenecks (memory, kernels, batching, I/O)

  • Implement and tune techniques such as:

    • Quantization (fp16, bf16, int8, fp8)

    • KV-cache optimization & reuse

    • Speculative decoding, batching, and streaming

    • Model pruning or architectural simplifications for inference

  • Collaborate with research engineers to productionize new model architectures

  • Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)

  • Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups

  • Improve system reliability, observability, and cost efficiency under real workloads
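Several of the techniques above reduce to small, testable transformations. As an illustrative sketch (a toy model, not any Featherless-specific architecture), here is low-precision inference in PyTorch: weights and activations are cast to bf16 and the result is checked against the fp32 baseline.

```python
import torch

# Toy stand-in for a production model (hypothetical architecture).
model = torch.nn.Sequential(
    torch.nn.Linear(256, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 256),
).eval()

x = torch.randn(8, 256)  # one batch of 8 requests

# fp32 baseline.
with torch.inference_mode():
    ref = model(x)

# Cast weights to bf16 (fp16 is the usual choice on GPUs with fast
# half-precision tensor cores; bf16 keeps fp32's exponent range and
# also runs on CPU). This halves weight memory at low accuracy cost.
model_bf16 = model.to(torch.bfloat16)
with torch.inference_mode():
    out = model_bf16(x.to(torch.bfloat16))

max_err = (ref - out.float()).abs().max().item()
print(f"max abs error, fp32 vs bf16: {max_err:.4f}")
```

Unlike this simple cast, int8/fp8 quantization additionally requires per-tensor or per-channel scales, usually derived from calibration data.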

What We’re Looking For
  • Strong experience in ML inference optimization or high-performance ML systems

  • Solid understanding of deep learning internals (attention, memory layout, compute graphs)

  • Hands-on experience with PyTorch (or similar) and model deployment

  • Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)

  • Experience scaling inference for real users (not just research benchmarks)

  • Comfortable working in fast-moving startup environments with ownership and ambiguity
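For context on the KV-cache item above: during autoregressive decoding, keys and values for past tokens are cached and reused, so each step computes attention inputs only for the new token. A minimal single-head sketch in PyTorch, with toy dimensions chosen for illustration:

```python
import torch
import torch.nn.functional as F

def attend(q, k_cache, v_cache):
    """Single-head scaled dot-product attention over the cached keys/values."""
    scores = q @ k_cache.transpose(-1, -2) / k_cache.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v_cache

d = 64
torch.manual_seed(0)

# Incremental decoding: at each step we compute k/v for the *new* token
# only and append to the cache, instead of recomputing keys and values
# for the whole prefix every step.
k_cache = torch.empty(0, d)
v_cache = torch.empty(0, d)
outputs = []
for step in range(5):
    q = torch.randn(1, d)        # query for the current token
    k_new = torch.randn(1, d)    # key/value for the current token only
    v_new = torch.randn(1, d)
    k_cache = torch.cat([k_cache, k_new], dim=0)
    v_cache = torch.cat([v_cache, v_new], dim=0)
    outputs.append(attend(q, k_cache, v_cache))

out = torch.cat(outputs, dim=0)
print(out.shape)  # torch.Size([5, 64])
```

Production caches add batch and head dimensions, preallocate to a maximum sequence length, and (as in paged-attention designs) share prefix blocks across requests.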

Nice to Have
  • Experience with LLM or long-context model inference

  • Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)

  • Experience optimizing across different hardware vendors

  • Open-source contributions in ML systems or inference tooling

  • Background in distributed systems or low-latency services
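The speculative decoding mentioned under responsibilities can be sketched in a few lines. This is the simplified greedy variant (the full method of Leviathan et al. accepts or rejects proposals probabilistically); the "models" here are hypothetical stand-ins that return deterministic pseudo-random logits.

```python
import torch

V = 16  # toy vocabulary size

def _logits(prefix, salt):
    # Deterministic pseudo-random logits, so a "model" always gives the
    # same answer for the same prefix (stand-in for a real network).
    g = torch.Generator().manual_seed((hash(tuple(prefix)) ^ salt) % (2**31))
    return torch.randn(V, generator=g)

def draft_logits(prefix):
    return _logits(prefix, salt=1)   # cheap draft model (hypothetical)

def target_logits(prefix):
    return _logits(prefix, salt=2)   # expensive target model (hypothetical)

def greedy_speculative_step(prefix, k=4):
    """One round of greedy speculative decoding.

    The draft proposes k tokens autoregressively; the target verifies them
    (in a real system, all k positions in one batched forward pass) and we
    keep the longest matching prefix plus one corrected token.
    """
    proposed, p = [], list(prefix)
    for _ in range(k):
        t = int(draft_logits(p).argmax())
        proposed.append(t)
        p.append(t)

    accepted, p = [], list(prefix)
    for t in proposed:
        best = int(target_logits(p).argmax())
        if best != t:
            accepted.append(best)  # target disagrees: take its token, stop
            break
        accepted.append(t)         # match: this token cost no extra target step
        p.append(t)
    return accepted

tokens = greedy_speculative_step(prefix=[3, 1, 4], k=4)
print(len(tokens))  # between 1 and 4 tokens emitted this round
```

The win comes from the verify step: one target forward pass can confirm up to k draft tokens, so latency per emitted token drops whenever the draft's acceptance rate is high.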

Why Join Us
  • Real ownership over performance-critical systems

  • Direct impact on product reliability and unit economics

  • Close collaboration with research, infra, and product

  • Competitive compensation + meaningful equity at Series A

  • A team that cares about engineering quality, not hype

Top Skills

CUDA
ML Inference Optimization
ONNX Runtime
PyTorch
TensorRT
Triton

