Prolaio

Data Science Intern

Reposted 7 Days Ago
Hybrid
Chicago, IL
$50 Hourly
Internship
As a Data Science Intern, you will develop and validate LLM pipelines for extracting clinical endpoints from EHR data, ensuring data quality for clinical analysis.
The summary above was generated by AI

Who Are We?

Prolaio believes that continuous learning and collaboration can make a significant difference in how heart care is administered. We are creating smarter ways to address heart disease and heart risks by integrating a connected platform enabled by smart data science to help patients access the care and attention that will inform better treatments and outcomes.

We envision a future where care teams and hospitals can be more effective, the healthcare system can be more efficient, and patients have a better care experience and more fulfilling lives.

This is precision cardiology, and we know it’s within reach.

What Will You Do?

The Overview

As the Data Science Intern, you will develop, operationalize, and validate Large Language Model (LLM) pipelines capable of extracting high-priority clinical endpoints from longitudinal Electronic Health Record (EHR) data. This role is critical for scaling the EHR study dataset: you will automate the extraction of complex clinical phenotypes and validate the results against manual clinical review to ensure high-quality data for clinical analysis.

The Specifics

  • Endpoint Extraction Pipeline Development: Develop Python/LLM workflows, including workflows built on purpose-built clinical extraction tools, to ingest unstructured data (clinical notes, discharge summaries) and extract key study endpoints, specifically Clinical Events or “Unified Problem Lists.”
  • Validation Framework Execution: Design and conduct a human review validation study comparing LLM-generated abstractions against a “gold standard” dataset derived from manual chart review.
  • Codebase Delivery: Build and maintain a documented code repository that takes raw EHR data as input and outputs structured clinical datasets for the study.
  • Performance Analysis & Reporting: Analyze pipeline performance to establish concordance, sensitivity, and specificity metrics, delivering a final validation report with performance metrics for multiple approaches.
  • Cross-Functional Collaboration: Collaborate with clinical and technical mentors to translate clinical requirements into technical solutions.
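
For candidates who want a concrete picture of the validation metrics named above, here is a minimal sketch in Python. It assumes each chart has a binary label for one endpoint (present or absent) from both the manual review and the LLM pipeline. It also assumes that "concordance" means raw percent agreement plus Cohen's kappa, which the posting does not specify. The names `ValidationMetrics` and `validate` are hypothetical and do not describe Prolaio's actual codebase.

```python
from dataclasses import dataclass
from math import nan


@dataclass
class ValidationMetrics:
    sensitivity: float  # share of gold-standard positives the pipeline found
    specificity: float  # share of gold-standard negatives the pipeline left negative
    agreement: float    # raw percent agreement (one reading of "concordance")
    kappa: float        # Cohen's kappa: agreement corrected for chance


def validate(gold: list[bool], predicted: list[bool]) -> ValidationMetrics:
    """Compare pipeline labels against gold-standard chart review labels."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")

    # Confusion-matrix counts, treating the manual review as ground truth.
    tp = sum(g and p for g, p in zip(gold, predicted))
    tn = sum((not g) and (not p) for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    n = len(gold)

    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    agreement = (tp + tn) / n

    # Agreement expected by chance, from each rater's marginal label rates.
    expected = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (agreement - expected) / (1 - expected) if expected != 1 else nan

    return ValidationMetrics(sensitivity, specificity, agreement, kappa)
```

Running `validate` once per extraction approach on the same gold-standard set would give the side-by-side comparison a final validation report needs.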

Why Prolaio?

  • Impactful Work: You will join in the fight against heart failure (HF) and hypertrophic cardiomyopathy (HCM) with the goal of extending and saving the lives of our patients while also being at the forefront of changing the healthcare industry through technology.
  • Innovative Environment: You will be part of an organization doing something that’s never been done before.
  • Professional Growth: You will join a growing team and have a substantial impact on our daily and future operations with the opportunity to continuously learn and grow.
  • Collaborative Team: You will be part of a team of collaborative, curious, and committed individuals focused on the collective good, inclusiveness, scientific excellence, and advancing digital health for cardiology.

Who Are You?

  • Currently enrolled in a Master’s or graduate-level program in Computer Science, Data Science, Biomedical Informatics, Bioengineering, Computational Biology, or a related field.
  • Technical Proficiency: Strong proficiency in Python programming with experience using Large Language Model (LLM) APIs.
  • NLP Knowledge: Familiarity with Natural Language Processing (NLP) concepts, specifically Prompt Engineering.
  • Data Handling: Experience handling unstructured text data, cleaning messy real-world data, and/or working with human evaluation datasets.
  • Domain Knowledge: A basic understanding of clinical terminology, Electronic Health Records (EHR), or biomedical data is highly preferred.
  • Analytical Skills: Ability to handle edge cases in text (e.g., negation) and validate one’s own output using standard validation metrics.
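
To illustrate the kind of edge case meant by "negation," the sketch below flags a clinical term as negated when a trigger word appears shortly before it. It is a simplified, rule-based check in the spirit of NegEx-style approaches, not the method used at Prolaio. The trigger list, the five-word window, and the function name `is_negated` are all assumptions chosen for illustration.

```python
import re

# Illustrative trigger phrases only; a real trigger list would be far longer.
NEGATION_TRIGGERS = ("no", "not", "denies", "without", "negative for")


def is_negated(sentence: str, term: str, window: int = 5) -> bool:
    """Return True if a negation trigger occurs within `window` words before `term`."""
    text = sentence.lower()
    position = text.find(term.lower())
    if position == -1:
        raise ValueError(f"term {term!r} not found in sentence")

    # Keep only the last `window` words that precede the first mention of the term.
    preceding_words = re.findall(r"[a-z']+", text[:position])[-window:]
    context = " ".join(preceding_words)

    # Whole-word matching so that "no" does not fire inside words like "known".
    return any(
        re.search(rf"\b{re.escape(trigger)}\b", context)
        for trigger in NEGATION_TRIGGERS
    )
```

For example, `is_negated("Patient denies chest pain or syncope.", "syncope")` returns `True`, while `is_negated("Patient reports chest pain.", "chest pain")` returns `False`. A check this simple misses triggers that follow the term and ignores where the scope of a negation ends, which is why validation against manual review matters.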

Compensation: The expected hourly rate for this internship is $50/hour.

**At this time, relocation and housing stipends are not offered for this internship.**
