Principal Data Architect
Greater NYC Area
2 weeks ago
Develop ETL (Extract, Transform, Load) data pipelines using Spark, Kinesis, Kafka, and custom Python applications to move large volumes of data (over 20 TB/month) efficiently between systems.
Engineer complex, efficient, distributed data transformation solutions using Python, Java, Scala, and SQL.
Productionize machine learning models, making efficient use of resources in a clustered environment.
Research, plan, design, develop, document, test, implement, and support Yieldmo's proprietary software applications.
Validate data analytically to ensure the accuracy and completeness of reported business metrics.
Be open to taking on, learning, and implementing engineering projects outside your core competency.
Understand the business problem, then architect and build an efficient, cost-effective, and scalable technology infrastructure solution.
Monitor system performance after implementation and iteratively devise solutions to improve performance and user experience.