Udio (udio.com)

Senior Backend Engineer, Data Modeling and Ingestion Platform

Posted 2 Days Ago
In-Office or Remote
2 Locations
160K-220K Annually
Senior level
About the Role

We are looking for a Senior Backend Engineer to lead the unification of large, rich, and heterogeneous datasets sourced from a wide range of external providers. These datasets power our generative audio models.

Your work will create the foundational dataset that powers our research by building robust, scalable systems for linking, deduplicating, reconciling, and enriching data at massive scale. This role centers on high-impact bulk ingestion and advanced data linkage. You will design the logic, algorithms, and strategies that transform many independent datasets into a unified, high-quality canonical asset used throughout the company.

You will collaborate closely with ML researchers and product teams, working with tools such as BigQuery, Dataflow/Beam, TFRecords, and—where beneficial—distributed systems frameworks like Ray. Familiarity with ML workflows using JAX or multihost training is a plus, as the datasets you produce will directly support that ecosystem.

What You'll Do
  • Build high-throughput bulk ingestion workflows to integrate datasets from multiple external providers. 
  • Design and implement scalable entity-resolution solutions, including record linking, deduplication, clustering, and conflict arbitration. 
  • Create and refine matching logic, decision rules, and similarity functions to align datasets with high accuracy and strong coverage. 
  • Define and track data quality indicators, such as overlap metrics, match precision/recall, duplicate rates, and completeness. 
  • Prepare training-ready datasets in formats such as TFRecords, and structure data to meet ML research requirements. 
  • Develop processing components using Dataflow (Beam) and manage large analytical workloads in BigQuery.
  • Leverage frameworks like Ray to accelerate large-scale experiments, feature extraction, and research-oriented data preparation. 
  • Collaborate with ML researchers to anticipate downstream requirements and evolve linkage strategies as new sources and use cases emerge. 
What We're Looking For 
  • Experience working with large, heterogeneous datasets from multiple providers or domains. 
  • Strong background in entity resolution, deduplication, data unification, or related large-scale data integration techniques. 
  • Proficiency in Python, with an emphasis on efficient, scalable data processing. 
  • Experience with BigQuery, Google Dataflow/Apache Beam, or similar batch-processing frameworks. 
  • Familiarity with data validation, normalization, reconciliation, and building consistent views across diverse data sources. 
  • Ability to craft well-structured matching and decision strategies that balance accuracy, completeness, and computational efficiency. 
  • Comfortable iterating quickly on pragmatic solutions, balancing correctness with time-to-delivery. 
  • Clear communication skills and the ability to collaborate closely with ML and research teams. 
Nice to Have
  • Knowledge of architecting Google Cloud Platform systems at scale.
  • Experience with distributed compute frameworks such as Ray, Spark, or Flink.
  • Understanding of JAX-based ML pipelines, multihost training setups, or large-scale data preparation for accelerator-backed workflows.
  • Familiarity with TFRecords or other high-volume training data formats. 
  • Exposure to ranking, clustering, or statistical similarity modeling. 
  • Experience with Go, NextJS, and/or React Native to contribute to full-stack development.
Why Join Us
  • You will design the core dataset that underpins our research, product development, and generative audio models. 
  • You'll work on large-scale data challenges that require creativity, algorithmic thinking, and engineering excellence.
  • You'll join a small, fast-moving team where your decisions shape the direction of our data and research capabilities.
Benefits
  • Highly competitive salary and equity 
  • Quarterly productivity budget
  • Flexible time off
  • Fantastic office location in Manhattan
  • Productivity package, including ChatGPT Plus, Claude Code, and Copilot
  • Top-notch private health, dental, and vision insurance for you and your dependents
  • 401(k) plan options with employer matching 
  • Concierge medical/primary care through One Medical and Rightway
  • Mental health support from Spring Health
  • Personalized life insurance, travel assistance, and many other perks

Udio’s success hinges on hiring great people and creating an environment where we can be happy, feel challenged, and do our best work. 

Udio provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, genetics, sexual orientation, gender identity, or gender expression. We are committed to a diverse and inclusive workforce and welcome people from all backgrounds, experiences, perspectives, and abilities.

This role is eligible for a compensation package of base salary, equity, and benefits. The starting base salary range for this role is $160,000 - $220,000. Actual salary may vary based on level, work experience, performance, and other factors evaluated during the hiring process.


Top Skills

Apache Beam
BigQuery
Flink
Google Dataflow
Python
Ray
Spark
TFRecords

Udio (udio.com) New York, New York, USA Office

New York, New York, United States


