The Big Data Lead will implement ETL pipelines, ensure data integrity, troubleshoot PySpark applications, and integrate with existing frameworks while leading a team.
Responsibilities:
• Experience with big data processing and distributed computing systems like Spark.
• Implement ETL pipelines and data transformation processes.
• Ensure data quality and integrity in all data processing workflows.
• Troubleshoot and resolve issues related to PySpark applications and workflows.
• Understand source, dependencies and data flow from converted PySpark code.
• Strong programming skills in Python and SQL.
• Experience with big data technologies like Hadoop, Hive, and Kafka.
• Understanding of data warehousing concepts and relational databases like SQL.
• Demonstrate and document code lineage.
• Integrate PySpark code with frameworks such as Ingestion Framework, DataLens, etc.
• Ensure compliance with data security, privacy regulations, and organizational standards.
• Knowledge of CI/CD pipelines and DevOps practices.
• Strong problem-solving and analytical skills.
• Excellent communication and leadership abilities.
Qualifications:
• 4+ years of experience in big data development, Hadoop, Hive & Spark framework.
• Good to have experience in SAS.
• Strong Python, PySpark Development and SQL knowledge.
• Certification in big data or cloud technologies is preferred.
Top Skills
Hadoop
Hive
Kafka
PySpark
Python
Spark
SQL
Similar Jobs
Information Technology • Consulting
The Big Data Lead will manage data projects utilizing technologies like Amazon Redshift, Azure Data Factory, and Apache Spark to optimize data processes.
Top Skills:
Amazon Redshift, Spark, Azure Data Factory, Databricks, Google Cloud Platform, Hadoop, Hive, Scala, Snowflake
Information Technology • Consulting
Lead the development of data pipelines and transformations in Azure Databricks, converting Scala programs to PySpark while leveraging various Azure technologies.
Top Skills:
ADF, Azure Data Lake Gen 2, Azure Databricks, Delta Lake, PySpark, Python, Spark, Synapse Analytics
Information Technology • Consulting
The Big Data Lead will manage database development, ETL/ELT processes, and data warehousing, optimizing performance and ensuring data pipelines work reliably.
Top Skills:
AWS, AWS Glue, AWS S3, Azure, Azure Blob, Azure Data Factory, Azure DevOps, Git, Jenkins, Oracle, Snowflake, SQL, Talend, TeamCity
What you need to know about the NYC Tech Scene
As the undisputed financial capital of the world, New York City is an epicenter of startup funding activity. The city has a thriving fintech scene and is a major player in verticals ranging from AI to biotech, cybersecurity and digital media. It also has universities like NYU, Columbia and Cornell Tech attracting students and researchers from across the globe, providing the ecosystem with a constant influx of world-class talent. And its East Coast location and three international airports make it a perfect spot for European companies establishing a foothold in the United States.
Key Facts About NYC Tech
- Number of Tech Workers: 549,200; 6% of overall workforce (2024 CompTIA survey)
- Major Tech Employers: Capgemini, Bloomberg, IBM, Spotify
- Key Industries: Artificial intelligence, Fintech
- Funding Landscape: $25.5 billion in venture capital funding in 2024 (PitchBook)
- Notable Investors: Greycroft, Thrive Capital, Union Square Ventures, FirstMark Capital, Tiger Global Management, Tribeca Venture Partners, Insight Partners, Two Sigma Ventures
- Research Centers and Universities: Columbia University, New York University, Fordham University, CUNY, AI Now Institute, Flatiron Institute, C.N. Yang Institute for Theoretical Physics, NASA Space Radiation Laboratory
