Data Engineer
About TrialSpark
The biggest bottleneck in bringing new treatments to patients is the clinical trial. On average, getting a drug through the trial process takes nearly a decade and frequently costs more than $1 billion. And the problem is only getting worse.
TrialSpark is a new healthcare company that owns the end-to-end drug development process. Our proprietary technology allows us to integrate and improve clinical research for patients, providers, and sponsors, while executing clinical trials faster and cheaper.
Job Description
As a Data Engineer, you will evolve TrialSpark’s data infrastructure and build data-intensive applications that unlock efficiencies in the clinical trial process. You will build pipelines to ingest hundreds of millions of complex health records, clean and structure that data for analytical and product use cases, and identify patients who can be served by previously inaccessible treatments. You will collaborate with engineers and stakeholders to meet targets for data quality and latency, continuously improve our systems, and scale our infrastructure as operational and data complexity grow. You will become a domain expert in clinical data and its application to products and operations across the company. As a founding member of the Clinical Data team, you will play a significant role in shaping the team’s culture and strategy. Ultimately, you will use data to bring treatments to patients who might not otherwise have access to them.
Responsibilities
- Design and build data pipelines to clean and structure complex health records
- Deploy tools to continuously monitor, test, and optimize data pipelines to ensure timely delivery and high data quality
- Partner with Data Analysts to assess the quality of our data and automate targeted improvements
- Implement data privacy and security measures as necessary, for example de-identification of Personally Identifiable Information (PII)
- Partner with our Analytics team to maintain and evolve our modern data stack as necessary (Looker, Redshift, dbt, Stitch)
- Help enforce best practices and promote testability and maintainability throughout our systems and codebase
Qualifications
- One or more years of professional software development experience, preferably in a data-oriented role (Data Engineer, Analytics Engineer, etc.)
- Professional experience building and maintaining data pipelines (e.g. Airflow, Prefect, Luigi, AWS Glue, AWS Batch)
- Fluency in SQL and at least one other programming language (Python preferred)
- Experience with Unix, Docker, and cloud technologies
- Strong problem solving and debugging skills
- Strong written and verbal communication skills
Nice to have
- Experience with infrastructure as code tools (e.g. Terraform, Ansible, Pulumi)
- Experience performance tuning row-based (e.g. PostgreSQL) and columnar (e.g. Redshift) data stores
- Experience working with healthcare data (Electronic Health Records, Insurance Claims, etc.)
- B.S. in Computer Science or related field
You will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.