Mecka AI is building the data infrastructure layer for robotics and embodied AI.
We partner with leading AI labs and robotics companies to deliver high-quality, real-world datasets used to train, evaluate, and deploy robotic systems, a domain where model performance is dictated by data quality.
We are hiring a Forward Deployed Data Engineer to operate on the frontier with customers: take messy, real-world capture data - much of it raw video - and turn it into beautiful, reliable, model-ready datasets, while owning the technical relationship end-to-end.
This is a senior, high-trust role with significant autonomy. You'll combine data engineering, hands-on analysis, and product judgment to deliver datasets customers can train and ship on - and to make our delivery systems more reliable every time you do.
Own the end-to-end delivery of customer datasets, from requirements through validation, iteration, and final handoff.
Be the technical point of contact: communicate clearly, set expectations, and close loops.
Turn one-off customer needs into durable internal improvements - tooling, pipelines, and standards that make every future delivery faster and safer.
Build, debug, and harden data pipelines across ingestion, transformation, QA, and export.
Work fluently across storage and database paradigms (SQL + NoSQL + object storage) and pick the right tool for the job.
Establish reliable dataset "contracts": schemas, versioning, provenance, and reproducible builds - so every dataset has a clear source of truth.
Define and measure what makes a dataset good for a given task: coverage, diversity, balance, label fidelity, and fitness for the customer's model.
Build quality scorecards and coverage/diversity reports that make dataset health legible to customers and internal teams.
Query and slice large corpora to maximize customer fit - surface exactly the data that matches a target distribution, not just bulk volume.
When the signal a customer needs is missing or weak in the raw video, diagnose it and partner with the perception/ML pipeline teams to extract or improve it upstream.
5+ years in data engineering and/or backend engineering (or equivalent impact).
Strong experience with large data systems, pipelines, and analytical workflows.
Strong SQL proficiency and comfort across multiple database/storage paradigms.
Excellent engineering judgment and debugging ability in production systems.
Genuine data taste - you can look at a dataset and reason about whether it's complete, balanced, and trustworthy, not just whether the job ran.
You've owned high-stakes customer deliveries with autonomy and trust.
You can translate ambiguous requirements into crisp dataset specs and execution plans.
You have strong product instincts and care about polish: "would I trust this dataset?"
You're comfortable working with unstructured, real-world data - especially video.
Working literacy in video understanding, embeddings, and encoders - enough to reason about what a dataset teaches a model and where signal is missing.
Experience building data-quality, coverage, or diversity tooling.
Background adjacent to ML, computer vision, or robotics data.
Own the customer-facing delivery loop for world-class robotics datasets.
High autonomy, high trust, and direct impact on customer success and revenue.
Work across the full stack of the problem: data, pipelines, analysis, and delivery quality.
Sit at the exact point where raw, messy, real-world data becomes the thing that makes embodied-AI models work.