Avaak

Data Engineer

Posted Yesterday
In-Office
New York City, NY, USA
180K-220K Annually
Mid level
Build and maintain ingestion pipelines for heterogeneous healthcare data (TPA feeds, claims, eligibility, enrollment). Design dbt models and orchestration (Dagster/Airflow), implement data quality monitoring and observability, and partner with data science to produce and maintain feature pipelines and feedback loops.
The summary above was generated by AI

Most of what makes American healthcare expensive isn’t medical care. It’s the machinery wrapped around it: middlemen taking a cut, fraud nobody stops, and billing systems designed to fight over payment instead of delivering care. The result is higher premiums, denied claims, surprise bills, and a system patients increasingly experience as adversarial.

Arlo is rebuilding health insurance for small businesses from first principles: making sure as much of every premium dollar as possible goes to care instead of getting absorbed by the system around it. We do that by identifying fraud earlier, steering members toward higher-quality and lower-cost care, automating operational overhead, and eliminating vendors whose business exists mostly to take a cut.

AI is the foundation that makes this work. We use it across underwriting, operations, clinical programs, and member experience to build an insurer that becomes more efficient as the technology improves.

We’re already operating at meaningful scale: profitable, hundreds of millions in premiums, tens of thousands of members covered, and growing quickly through brokers, employers, and partners. We're backed by Upfront Ventures, 8VC, and General Catalyst, and our team comes from Palantir and YC companies and includes longtime healthcare operators.

The Opportunity

Arlo quotes small businesses using AI-powered underwriting, and the quality of that underwriting is only as good as the data beneath it. We're hiring a Data Engineer to build and maintain the pipelines, models, and monitoring systems that keep our data infrastructure clean, timely, and trustworthy.

This is a hands-on individual contributor role. You'll sit at the boundary between data engineering and data science, working directly with underwriting, pricing, and analytics teams to ensure the right data reaches the right systems at the right time.

What You'll Work On

Pipeline development and maintenance

  • Build and maintain ingestion pipelines for complex, heterogeneous data sources — TPA feeds, carrier data, census files, claims, eligibility, and enrollment records

  • Design and implement dbt models and transformation logic that produce clean, reliable "source of truth" tables used across underwriting, pricing, and reporting

  • Own pipeline orchestration using tools like Dagster or Airflow, ensuring reliable scheduling, retries, and alerting

Data quality and observability

  • Build monitoring and alerting for data inconsistencies: duplicate records, mismatched member IDs, enrollment timing gaps, and carrier reporting lags

  • Profile ingest delay characteristics across live policy data and flag where structural latency introduces systematic bias

  • Maintain clear documentation of known data quality limitations so downstream teams know what the data can and cannot reliably support
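
For a concrete sense of the checks described above, here is a minimal sketch in plain Python of two of them: duplicate records and enrollment timing gaps. The record layout (member ID, coverage start, coverage end), the function names, and the one-day gap threshold are all illustrative assumptions, not Arlo's actual schema or tooling.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical enrollment records: (member_id, coverage_start, coverage_end).
# The layout is an assumption for illustration, not a real feed schema.


def find_duplicates(records):
    """Return each record that appears more than once in the input."""
    counts = Counter(records)
    return sorted(rec for rec, n in counts.items() if n > 1)


def find_enrollment_gaps(records, max_gap_days=1):
    """Return (member_id, gap_start, gap_end) for uncovered days between spans."""
    # Group coverage spans by member, ignoring exact duplicate rows.
    by_member = {}
    for member_id, start, end in set(records):
        by_member.setdefault(member_id, []).append((start, end))

    gaps = []
    for member_id, spans in sorted(by_member.items()):
        spans.sort()
        covered_through = spans[0][1]
        for start, end in spans[1:]:
            # A gap exists when the next span starts more than
            # max_gap_days after the last covered day.
            if (start - covered_through).days > max_gap_days:
                gaps.append(
                    (
                        member_id,
                        covered_through + timedelta(days=1),
                        start - timedelta(days=1),
                    )
                )
            covered_through = max(covered_through, end)
    return gaps
```

In practice, checks like these would more likely live in dbt tests or in the orchestrator's alerting layer; the sketch only shows the logic.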

Collaboration with data science

  • Partner closely with the data science team to build and maintain feature pipelines that feed underwriting and pricing models

  • Support feedback loop infrastructure that carries post-quoting learnings back into upstream models

  • Work with engineering to prioritize data quality fixes and accelerate resolution of upstream issues

What We're Looking For

Required

  • 3–5 years in a data engineering or backend engineering role with significant data pipeline ownership

  • Proficiency in Python and SQL; comfortable writing production-quality code in both

  • Hands-on experience with pipeline orchestration tools (Dagster, Airflow, Prefect, or similar)

  • Experience with dbt or equivalent transformation frameworks

  • Familiarity with cloud data environments (AWS, GCP, or Azure) and columnar/analytical databases

  • Track record working with messy, real-world datasets and building systems that handle inconsistency gracefully

  • Strong instincts around data quality — you catch problems before they reach downstream consumers

Nice to have

  • Background in health insurance, claims data, or actuarial/TPA data environments

  • Experience supporting ML feature pipelines or working alongside data science teams

  • Familiarity with MLflow or similar MLOps tooling

  • Exposure to healthcare data standards or sensitive regulated data environments

How You'll Work

You'll own your projects end-to-end — from initial scoping through to production deployment and ongoing monitoring. There's no separate ML engineering handoff; you'll work directly with the people who depend on your pipelines daily. The role requires equal comfort in Python-based engineering and SQL-driven analysis, and a genuine interest in understanding the business context behind the data.

Interview Process

  1. Intro call with our recruiter

  2. Resume interview with an Arlo co-founder

  3. Technical take-home challenge (data engineering problem)

  4. Onsite (or virtual): technical review + behavioral/cultural interviews

Compensation

$180,000 - $220,000 + equity

 
Why Join Arlo

  • High ownership: You’ll get real responsibility from day one—our high-trust team empowers you to run with big problems and shape core parts of the company.

  • Join an important mission: Your work directly influences how people access care and improves lives at scale.

  • Growth & expansion: We’re moving fast, and as we grow, your scope will grow with us—new challenges, bigger opportunities, and rapid career velocity.

  • Apply AI to a problem that matters: Instead of optimizing ads or cutting labor costs, you’ll use AI to fundamentally reimagine how people get healthcare.

  • High pace, high collaboration: We operate with velocity, first-principles thinking, and a team that works closely, openly, and with ambition.


Exact compensation, inclusive of salary and any bonuses, is determined by a number of factors, including experience and skill level, location, and qualifications, which are assessed during the interview process.
Arlo is an equal opportunity employer. We do not discriminate based on age, race, color, creed or religion, national origin, sexual orientation, gender identity or expression, military status, sex, disability, predisposing genetic characteristics, marital status, familial status, status as a victim of domestic violence, or arrest or conviction record, as defined under New York State law.
🔒 Your safety matters to us. If you're selected to move forward in our hiring process, you'll hear directly from a member of our Recruiting team via an @joinarlo.com email address. We will never ask for personal or financial information outside of our formal onboarding process. When in doubt, please reach out to us to verify at: [email protected].
