Data Engineer at TrialSpark (Greater NYC Area, NY)
TrialSpark is a technology-driven drug development company that runs end-to-end clinical trials, focused on bringing new treatments to patients faster and more efficiently.
The biggest bottleneck in bringing new treatments to patients is the clinical trial. On average, getting a drug through the trial process takes nearly a decade and frequently costs $1B+. To combat this industry problem, TrialSpark has built a technology platform that optimizes all aspects of a clinical trial, enabling more efficient trial design, faster trial completion, and higher trial data quality.
TrialSpark recently raised its Series C and is putting the capital to work by in-licensing and co-developing drug programs through in-house development, joint ventures, and NewCos. Together with doctors, patients, and communities, TrialSpark is working to develop the treatments of tomorrow.
About the Position
As a Senior Data Engineer on our Data Platform Team, you will own TrialSpark’s data infrastructure. You will make key data architecture decisions and lead significant greenfield initiatives to implement the next generation of our data platform and pipelines. You’ll provide product and analytical teams with timely, high-quality data from diverse sources, including application databases, partner Electronic Health Record systems, and medical devices. You’ll become a domain expert in clinical data and its application to products and operations across the company.
In this role, you’ll collaborate with Product Engineers and our Infrastructure Team to build data transformations that unlock efficiencies in the clinical trial process: for example, ingesting hundreds of millions of complicated health records, cleaning and structuring this data for analytical and product use cases, and identifying patients who will be served by previously inaccessible treatments. You’ll partner with the Analytics, Product, and Medical teams to set and achieve targets for data quality and latency, and build a feedback loop to improve these metrics over time. As a founding member of the Data Platform team, you will play a significant role in developing the team’s culture and strategy. Ultimately, you will leverage data to bring treatments to patients who may not have had access otherwise.
Responsibilities
- Design and build data pipelines to clean and structure clinical data
- Deploy tools to continuously monitor, test, and optimize data pipelines to ensure timely delivery and high data quality
- Partner with Analytical stakeholders to assess the quality of our data and automate targeted improvements
- Safeguard patient privacy and trial data integrity by implementing data privacy and security measures, such as de-identification of Personally Identifiable Information
- Partner with our Analytics team to maintain and evolve our modern data stack as necessary (Looker, Redshift, DBT, Stitch)
- Help enforce best practices and promote testability and maintainability throughout our systems and codebase
Requirements
- At least one year of professional software development experience, preferably in a data-oriented role (e.g. Data Engineer)
- Professional experience building and maintaining data pipelines (e.g. Airflow, Prefect, Luigi, AWS Glue or Batch)
- Fluency in SQL and at least one other programming language (Python preferred)
- Experience with Unix, Docker, and cloud technologies
- Strong problem solving and debugging skills
- Strong written and verbal communication skills
Nice to have
- Experience with infrastructure as code tools (e.g. Terraform, Ansible, Pulumi)
- Experience performance tuning row-based (e.g. PostgreSQL) and columnar (e.g. Redshift) data stores
- Experience working with healthcare data (Electronic Health Records, Insurance Claims, etc.)
- B.S. in Computer Science or related field
You will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.