What you'll do
- Design, build, and scale our data warehouse.
- Collaborate with Data Scientists, BI Analysts, Engineers, and Product Managers to gather requirements for specific datasets, reports, and model training.
- Research new data warehouse and data pipeline solutions. Identify requirements, build prototype implementations, propose adoption, and plan migration paths.
- Implement data pipelines that aggregate data from various sources and write to a variety of internal and external platforms. (Examples include S3, NetSuite, Salesforce, customer APIs.)
- Identify and resolve performance bottlenecks in SQL queries, workflows, and model training.
- Ensure the availability of our production data stores and pipelines by building resilient systems, monitoring and alerting, participating in team-wide on-call rotations, and triaging and resolving issues.
- Participate in blameless post-mortems and retrospectives that help us learn from both failures and successes to improve team process and software design.
Our stack
- Our current data storage infrastructure comprises Amazon Redshift, PostgreSQL (Amazon RDS), DynamoDB, and Redis. We are hosted on AWS and leverage a variety of its services. We use Ruby and Python heavily throughout our production application and data infrastructure, along with bits of Java. We use Airflow to manage our ETL workflows and AWS Database Migration Service for moving data around.
- Code review and continuous integration are important parts of all our production code, including data architecture. We use GitHub and CircleCI and write automated tests with RSpec and unittest.
What you'll need
- Bachelor’s degree in a STEM or quantitative field, or equivalent training or work experience.
- At least two years of professional software engineering experience, with a strong preference for data engineering roles or data-intensive environments.
- Fluency in at least one programming language (Python, Ruby, C++, Java, or Go preferred).
- Proficiency with SQL, schema design, and query optimization.
- Experience with the Linux command line and shell scripting.
- Working knowledge of data warehouses, data pipelines, and schedulers.
- Experience with cloud infrastructure on AWS, Google Cloud, or Azure.
- You take pride in your work and the quality of products you build.
- You value collaboration and enjoy teaching and learning from peers.
- You speak up when you see something wrong and support your opinions with data.