Data Platform Analyst
Stash is a digital-first financial services company committed to making saving and investing accessible to everyone. By breaking down barriers and building transparent, technology-driven products, we help the 99% build smarter financial habits so they can confidently save more, grow wealth, and enjoy life.
At Stash, data is at the core of how we make decisions and build great products for millions of users. As a Data Platform Analyst, you will be part of our Data Engineering Team, which leads the architectural design and implementation of a modern data infrastructure at scale and is building a data lake that brings all data into one place for easy consumption. You will be a key resource in defining and driving data modeling standards in the data lake based on query patterns and ETL/reporting requirements.
Our tech stack includes, but is not limited to: Python, Scala, Hadoop, YARN, Pandas, Spark, MongoDB, AWS EMR/EC2/Lambda/Kinesis/S3/Glue/DynamoDB/API Gateway, Elasticsearch, Hive, Redshift, Airflow, and Terraform.
What you'll do:
- Understand various datasets (structured, semi-structured, unstructured) and the relationships among them
- Coordinate with various groups and understand data requirements
- Define schemas and tables in the data lake based on standard practices and query patterns
- Help engineers define integrated datasets based on the business requirements
- Identify reporting requirements as various groups need them
- Enforce data contracts between the analytics and application spaces
- Keep analytics datasets in sync with the operational/transactional datasets used by backend applications
- Perform data validation
Who we’re looking for:
- BS in Computer Science, Statistics, Applied Mathematics, Physics, Engineering, or a related field, with 3+ years of experience
- Proficiency in SQL, database concepts, and data modeling
- Experience with data warehouses and ETL pipelines
- Familiarity with Python or a similar ETL-friendly language
- Comfortable working with Linux
- Familiarity with data formats (row-based, columnar, etc.)
- Prior experience with Redshift/Postgres
- Clear communication and presentation skills
- Familiarity with data validation best practices
- Experience with Amazon Web Services
- Experience with Hadoop and Hive
No recruiters, please.