Lead end-to-end data engineering on GCP: design scalable data platforms and architectures (lake/lakehouse/warehouse), build and optimize BigQuery/Spark pipelines, ensure data quality/security, enable ML workflows with Data Science, mentor engineers, and own delivery and performance.
We are seeking a Senior Data Engineer (Lead) to drive and own end-to-end data engineering initiatives. This role will lead all data engineering efforts, working closely with Data Science and Analytics teams to design scalable data platforms, enable advanced analytics, and support machine learning use cases.
The ideal candidate will bring deep expertise in cloud data engineering (GCP), strong data modeling capabilities, and proven experience in leading enterprise-grade data solutions.
Responsibilities:
- Lead and manage end-to-end Data Engineering delivery across projects and initiatives
- Act as the primary technical owner for data pipelines, architecture, and platform design
- Mentor and guide a team of data engineers, ensuring best practices and coding standards
- Design, build, and optimize scalable data pipelines on GCP
- Define and implement modern data architectures (data lake, lakehouse, warehouse)
- Ensure high performance, reliability, and data quality across pipelines
- Partner closely with Data Science teams to enable ML/AI workflows
- Translate business and modeling requirements into optimized data structures
- Support feature engineering, model training, and deployment pipelines
- Design logical and physical data models for analytics and ML use cases
- Implement dimensional modeling (Star/Snowflake schemas) and data vault where applicable
- Optimize datasets for performance, scalability, and usability
- Build and manage solutions using GCP services such as BigQuery, Cloud Composer (Airflow), Cloud Storage, and Dataproc
- Ensure security, governance, and cost optimization on GCP
Required Qualifications and Skills:
- 8+ years of experience in Data Engineering, with leadership experience
- Strong expertise in the GCP ecosystem and its services
- Proficiency in SQL, Python, and/or Scala
- Hands-on experience with ETL frameworks and distributed processing
- Solid experience in dimensional modeling, data warehousing concepts, and data structures for ML/analytics
- Experience with Apache Spark and with both real-time and batch processing frameworks
- Experience working with cross-functional teams (Data Science, Analytics, Business)
- Proven ability to lead, mentor, and drive delivery
- Strong ownership mindset with leadership capabilities
- Excellent problem-solving and architectural thinking
- Ability to operate in a fast-paced, collaborative environment
Preferred Qualifications:
- Experience in ML data pipelines / feature stores
- Knowledge of data governance, lineage, and quality frameworks
- Exposure to the healthcare/payor domain
- GCP certification (e.g., Professional Data Engineer)
