
apiphani

Senior Data Pipeline Engineer

Reposted 16 Days Ago
Easy Apply
Remote
Hiring Remotely in USA
45K-90K Annually
Senior level
Design, develop, and maintain scalable data pipelines and data products using AWS and Apache Spark. Collaborate with analytics teams to ensure data quality and optimize performance. Mentor junior engineers and lead technical efforts.

Apiphani is a technology-enabled managed services company dedicated to redefining what it means to support mission-critical enterprise workloads. We’re a small but rapidly growing company, which means there’s plenty of room for growth, and learning opportunities abound!

Apiphani is dedicated to creating a diverse and inclusive work environment as a fundamental component of our business. Diversity and inclusion are the bedrock of creativity and innovation; without diversity of experience and thought, we would fail to progress as a company and as a team. Apiphani strives to foster an environment of belonging, where every employee feels respected, valued, and empowered. We embrace the unique experiences, perspectives, and cultural backgrounds that only you can bring to the table.

Job Description


We are looking for an experienced data pipeline engineer who uses modern data engineering practices to transform raw data into reliable, consumable data products on AWS and other cloud platforms. The role is responsible for designing, developing, testing, and deploying scalable data pipelines, data warehouses, data lakes, and data products that support business and analytics needs. As a senior member of the analytics team, you will own critical production data pipelines and shape the evolution of our customer-facing data products and metrics.

You will work closely with data analysts, data scientists, and other stakeholders to ensure data quality, reliability, and availability across batch and streaming workloads. Typical activities include developing and configuring jobs for data ingestion, transformation, and enrichment; designing efficient databases and tables; and exposing curated data to downstream consumers. The role focuses on efficiency and resilience by aligning data platforms and pipelines with business goals and cloud architecture best practices. You will also influence data and platform roadmaps by providing technical leadership, setting best practices, and mentoring other engineers.


Job Duties

  • Design, develop, and maintain scalable batch and streaming data pipelines using Apache Spark and cloud-native services (for example AWS Glue, EMR, Kinesis, and Lambda).
  • Utilize and optimize Apache Spark (RDDs, DataFrames, Spark SQL) for distributed processing of large datasets, including both batch and near real‑time use cases.
  • Implement robust ETL/ELT processes to ingest and transform data from databases, APIs, files, and event streams into curated datasets stored in S3 data lakes, data warehouses (such as Amazon Redshift), and data marts.
  • Implement data quality checks, validation rules, and governance controls (including schema enforcement, profiling, and reconciliation) to ensure accuracy, completeness, and consistency.
  • Develop and maintain logical and physical data models, schemas, and metadata in catalogs to support analytics, BI, and ML consumption.
  • Create and manage data warehouses, data lakes, and data marts on AWS and other cloud platforms (such as Azure or GCP) following modern architectural patterns.
  • Collaborate with data analysts, data scientists, and business stakeholders to understand data requirements and translate them into scalable pipeline and modeling solutions.
  • Collaborate with DevOps, platform, security, and compliance teams to ensure secure, reliable cloud implementations and adherence to organizational standards.
  • Develop cloud and data architecture documentation, including diagrams, guidelines, and best practices, to enable knowledge sharing and reuse.
  • Troubleshoot and resolve data pipeline and job issues across development and production environments, ensuring minimal downtime and preserving data integrity.
  • Continuously optimize data pipelines for performance, cost, reliability, and data quality using best practices in distributed data engineering and cloud resource tuning.
  • Build algorithms and prototypes that combine and reconcile raw information from multiple sources, including resolving data conflicts and inconsistencies.
  • Provide technical leadership for the analytics data stack, including reviewing designs, establishing standards for observability and reliability, and guiding junior engineers in delivering high-quality solutions.
  • Define and manage data and cloud infrastructure using infrastructure‑as‑code tools such as Terraform (and/or AWS CDK/CloudFormation) to ensure consistent, repeatable environments across development, test, and production.
  • Participate actively in agile ceremonies (backlog refinement, sprint planning, daily stand‑ups, reviews), including estimating and updating user stories, tracking progress, and collaborating closely with data product and analytics stakeholders.
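To give a flavor of the data quality checks described above (schema enforcement, validation rules, quarantining bad records), here is a minimal, library-free Python sketch. The field names and rules are hypothetical and purely illustrative; a production pipeline would express the same idea in Spark or a dedicated data-quality framework rather than plain Python:

```python
from datetime import datetime

# Hypothetical schema for an incoming event record: field name -> expected type.
SCHEMA = {"event_id": str, "user_id": str, "amount": float, "ts": str}

def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors for one record (empty list = valid)."""
    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in record or record[field] is None:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    # Example domain rule: timestamps must parse as ISO 8601.
    if isinstance(record.get("ts"), str):
        try:
            datetime.fromisoformat(record["ts"])
        except ValueError:
            errors.append("unparseable timestamp")
    return errors

def split_valid_invalid(records):
    """Partition a batch into curated rows and a quarantine list with reasons."""
    valid, quarantined = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            quarantined.append((rec, errs))
        else:
            valid.append(rec)
    return valid, quarantined
```

The same partitioning pattern (pass clean rows downstream, route failures with their error reasons to a quarantine location) is what schema enforcement and reconciliation jobs typically do at scale.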

Required Skills

  • Bachelor’s degree in Computer Science, Engineering, Mathematics, or related field, or equivalent work experience.
  • 6+ years of experience in data engineering or closely related roles, working with large, complex datasets.
  • Demonstrated experience owning production-grade data pipelines end to end, from design and implementation through monitoring, incident response, and continuous improvement.
  • Extensive hands-on experience with Apache Spark for large-scale data processing, including RDDs, DataFrames, and Spark SQL.
  • Familiarity with big data ecosystem components such as HDFS, Hive, and HBase, and their cloud-native equivalents on AWS and other clouds.
  • Experience with SQL and NoSQL databases such as MySQL, PostgreSQL, DynamoDB, or similar technologies.
  • Strong proficiency in SQL and at least one programming language such as Python (preferred) for data processing, automation, and orchestration glue code.
  • Experience with data pipeline orchestration and scheduling tools such as AWS Step Functions, Amazon Managed Workflows for Apache Airflow (MWAA), or Apache Airflow.
  • Experience with cloud-based data platforms and services, ideally AWS (S3, Glue, EMR, Redshift, Kinesis, Lambda), with exposure to Azure or GCP as a plus.
  • Experience designing and implementing data warehouses and data lakes, including partitioning, file formats, and performance optimization.
  • Experience with data quality, automated data testing, and data governance methodologies and tools; familiarity with lineage, cataloging, and access controls.
  • Strong analytical and problem-solving skills, high attention to detail, and clear written and verbal communication.
  • Ability to work independently and collaboratively in a fast-paced, agile, and cross-functional environment.
  • Experience working with a modern data catalog such as Alation, Collibra, or similar tools is a plus.
  • Ability to prepare and curate data for prescriptive and predictive modeling (for example, features for ML models) is a plus.
  • Hands‑on experience with infrastructure as code, preferably Terraform (and/or AWS CDK/CloudFormation), to provision and manage data and cloud resources.
  • Practical experience working in an agile delivery model, including breaking down work into user stories, sizing and updating them during the sprint, and delivering incrementally.
Base Salary
$45,000 - $90,000 USD
Company Benefits*
  • Medical/dental/vision - 100% paid for employees, 50% paid for dependents
  • Life and disability - 100% paid for employees
  • 401K - 3% contribution, no employee contribution necessary 
  • Education and tuition reimbursement - up to $50K annually
  • Employee Stock Options Plan 
  • Accident, critical illness, hospital indemnity benefits offered through our providers
  • Employee Assistance Program 
  • Legal assistance
  • Paid Time Off - up to 6 weeks per year 
  • Sick Leave - up to 2 weeks per year
  • Parental Leave - up to 12 weeks 

*Benefits listed in the job description apply to employees working in the United States. For international employees, Apiphani partners with an Employer of Record, Deel, and provides all statutory benefits required under local law; certain U.S.-specific programs (such as EAP, legal assistance, etc.) may not be available outside the United States. The specific benefits package will be outlined in the local employment agreement issued through Deel.

Top Skills

Spark
AWS EMR
AWS Glue
AWS Kinesis
AWS Lambda
Python
SQL
Terraform


