Senior Data Engineer
The biggest bottleneck in bringing new treatments to patients is the clinical trial. On average, getting a drug through the trial process takes nearly a decade and frequently costs $1B+. And the problem is only getting worse.
As Senior Data Engineer, you will be responsible for TrialSpark’s clinical data platform. You will lead the engineering effort to ingest millions of Electronic Health Records, clean and structure this data for analytical and product use cases, and identify patients who could be served by a clinical trial. You will partner with the Data, Product, and Medical teams to set and achieve targets for data quality, and build a learning feedback loop to move the needle over time. You will evolve our data infrastructure to keep pace with growing scale and with operational and data complexity. You will become a domain expert in clinical data and its application to products and operations across the company. As a founding member of the Clinical Data team, you will play a significant role in developing the team’s culture and strategy. Ultimately, you will leverage data to bring treatments to patients who may not have had access otherwise.
Responsibilities
- Build and maintain pipelines to clean and structure complicated health data
- Evolve infrastructure and data architecture to accommodate product needs
- Partner with Data Analysts to assess the quality of our data and automate targeted improvements
- Implement data privacy and security measures as necessary, such as de-identification of Personally Identifiable Information
- Create tools to continuously monitor, test, and optimize our clinical data pipeline to ensure timely delivery and high quality
- Collaborate with operational and product partners to achieve business and mission outcomes
- Partner with our Data team to maintain and scale data warehousing and analytics as necessary (Redshift, dbt)
- Help enforce best practices and promote testability and maintainability throughout our systems and codebase
Requirements
- Minimum 4 years of professional software development experience
- Professional experience building and maintaining data pipelines (e.g. Airflow, Prefect, or Luigi)
- Fluency in SQL and at least one other programming language
- Strong knowledge of data modeling
- Experience architecting data systems
- Comfortable with Linux, Docker, and cloud technologies
- Excellent problem-solving and debugging skills
- Strong communication skills with the ability to convey complicated systems to both technical and non-technical audiences
- B.S. in Computer Science or related field, or equivalent experience
- Experience building cross-functional feedback loops
- Experience with infrastructure-as-code tools (Ansible, Terraform, etc.)
- Experience performance-tuning row-based (e.g. PostgreSQL) and columnar (e.g. Redshift) data stores
- Experience working with healthcare data (Electronic Health Records, Insurance Claims, etc.)