Senior/Staff Software Engineer - Data
Background
Before new medical treatments can be administered to the public, they must demonstrate safety and efficacy in a clinical trial. These trials protect consumers from ineffective and dangerous products, but the clinical trial process is also a tremendous bottleneck in delivering life-saving treatments to patients. A typical trial involves coordinating numerous parties and data formats to gather, store, analyze, and audit clinical data. Mistakes and delays are common, and fewer than 10% of trials finish on time.
One of the most difficult challenges in running a clinical trial is patient outreach. At TrialSpark, we believe we can leverage our patient network data to reach patients who may benefit from participating in clinical trials. We are looking for talented software engineers to help us reimagine the clinical trial process from first principles and build the technology and data platform to achieve our mission.
Description
This is a hands-on role in an agile, fast-paced environment with technical leadership and growth opportunities. You will help inaugurate this role and be responsible for designing, building, and scaling our growing patient network database and underlying data pipeline infrastructure. In addition, you will work with and integrate a variety of data sources to power our patient network analytics. This is not big data being moved from datastore A to datastore B for batched analysis; it is complex, nuanced data from which business-critical insights must be extracted. Consequently, you will actively partner with our Product, Data, Platform Engineering, Patient Operations, and Medical teams to grow your domain expertise and build data infrastructure that enables data-driven decision making for TrialSpark. Your ability to understand and navigate the complexities of healthcare data and clinical trials will be essential to your success and the success of TrialSpark.
Responsibilities
- Build, maintain, and evolve our data pipeline and overall data architecture to accommodate a growing amount of data and scale our patient database
- Implement reliable data integrations from a variety of sources and develop data quality metrics to assess the trustworthiness of these sources
- Create tools to continuously monitor, test, and optimize our pipeline to ensure timely delivery and high quality
- Work with operational partners and product management to translate business and product needs into a clean and robust data model and architecture
- Partner with our Data team to build out a data warehouse for high-performance ad hoc querying
Qualifications
- 5+ years of software development, preferably with tech lead experience
- Fluency in SQL and at least one programming language (Python preferred)
- Strong knowledge of data modeling, pipeline scheduling and flows (e.g. Airflow), database design and architecture, ETL (e.g. dbt), OLAP
- Experience with performance tuning row-based (PostgreSQL) and columnar (e.g. Redshift) data stores
- Comfortable with Linux, Docker, and cloud technologies (AWS)
- Experience with agile development and continuous integration/deployment (CircleCI)
- Excellent problem solving and debugging skills
- Exceptional communication skills with the ability to convey complicated systems to both technical and non-technical audiences
- B.S. in Computer Science or related field, or equivalent experience
Bonus points
- If you have worked with OMOP CDM
- If you have experience working in healthcare data technology and really want to change how it works
- If working with dirty data does not deter you