
Egra

Research Engineer

Posted 3 Days Ago
In-Office
New York City, NY, USA
170K-220K Annually
Mid level
As a Research Engineer, you'll design and build the EEG preprocessing and experiment-tracking systems behind our machine learning research, optimizing workflows for reproducibility and data integrity.

Hi, I'm Brian, Co-Founder of Egra. We just raised $5.5M to build foundation models for brain signals, and we're looking for research engineers to join our founding team.

You'll have complete ownership over your work from day one. No lengthy onboarding, no waiting for permission, no navigating layers of approval. A small founding team, deep technical problems, and the resources to solve them. You'll define the infrastructure architecture, make critical engineering decisions, and build the systems that make our research possible. If you thrive with high agency and want your work to directly shape the company's trajectory, this is that opportunity.

What you'd be doing

EEG — electrical brain activity recorded from the scalp — is one of the hardest real-world signal modalities in ML: low signal-to-noise ratio, massive subject variability, and device inconsistencies. Most people avoid it for these reasons.
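The noise problem is concrete: movement, blinks, and loose electrodes produce transients orders of magnitude larger than the neural signal. As a purely illustrative sketch (not Egra's actual pipeline, and with placeholder thresholds), even a crude peak-to-peak amplitude check catches the worst of these:

```python
def flag_artifact_windows(signal, window_len, ptp_threshold):
    """Return start indices of fixed-length windows whose peak-to-peak
    amplitude exceeds the threshold -- a crude artifact heuristic."""
    flagged = []
    for start in range(0, len(signal) - window_len + 1, window_len):
        window = signal[start:start + window_len]
        if max(window) - min(window) > ptp_threshold:
            flagged.append(start)
    return flagged

# A quiet stretch followed by a large transient (e.g. an electrode pop):
sig = [0.0] * 100 + [100.0] + [0.0] * 99
flag_artifact_windows(sig, window_len=100, ptp_threshold=10.0)  # → [100]
```

Real pipelines layer far more sophisticated checks on top (frequency-domain criteria, ICA-based blink removal), but the principle is the same: quantify signal quality before any window reaches a model.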

As our research engineer, you'd own the systems that make this research possible. Concretely, these are the kinds of projects you'd own:

  • Building versioned, reproducible preprocessing pipelines for EEG data from multiple sources — handling device-specific normalization, channel mapping across montages, artifact detection, and signal quality checks. If we ask "which preprocessing version produced this result," your systems answer that instantly.

  • Designing the experiment tracking and training infrastructure so we can run dozens of pretraining experiments in parallel without losing track of what changed. Hyperparameters, data splits, preprocessing versions, and model checkpoints are linked and reproducible.

  • Building a data ingestion system that can absorb different EEG formats (EDF, BDF, BIDS, proprietary device exports) and normalize them into a clean internal representation.

  • Optimizing training pipelines for throughput on noisy, variable-length signal data. Mixed precision, smart batching across different recording lengths, efficient data loading for datasets that don't fit neatly into standard loaders.
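One common way to answer the "which preprocessing version produced this result" question from the first bullet is to stamp every derived dataset and checkpoint with a content hash of the preprocessing config. A minimal sketch, with illustrative field names rather than any real schema:

```python
import hashlib
import json

def preprocessing_version(config: dict) -> str:
    """Deterministic short hash of a preprocessing config, so every
    derived dataset, checkpoint, and result can be stamped with the
    exact settings that produced it."""
    # sort_keys canonicalizes the serialization, so semantically
    # identical configs always map to the same version id.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

config = {"bandpass_hz": [0.5, 40.0], "notch_hz": 60, "resample_hz": 250}
version = preprocessing_version(config)  # e.g. stored alongside each artifact
```

Any change to a filter cutoff or resampling rate yields a new version id, and two runs with identical settings collide on the same id, which is exactly the property that makes results traceable.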

Where this is going

We're building toward a world where thought is an interface.

You silently compose a message and it types itself. You navigate an AR display without lifting a finger. Software adapts to your cognitive state in real time. A universal interface between human thought and digital action.

The product we're building to get there has three layers:

  1. A Neural Encoder: a foundation model that maps raw EEG into robust, reusable embeddings that work across devices, subjects, and contexts

  2. A Neural API: a stable interface any app can call to ask: "What is the user's state?" "What intent is most likely?" "What changed?"

  3. Reference applications: proving utility and driving our data collection flywheel

Near-term, the use cases are already real. A limited vocabulary of thought-to-action commands (volume, select, activate, navigate) would feel like magic to consumers. Sleep staging, stress detection, cognitive load monitoring, and engagement measurement are all feasible with today's signal quality. On the clinical side, we're pursuing avenues like epilepsy monitoring and migraine pre-emption as a wedge for high-quality data, credibility, and early revenue.

Hardware matters too. No comfortable, discreet consumer device today covers the brain regions needed for language decoding. We'll eventually design our own. Think a normal-looking baseball cap with dry electrodes hidden in the brim, or something that looks more like AirPods than a medical device. The model needs to be hardware-agnostic, because the form factors will keep evolving.

None of this works without infrastructure.

When ML research fails, it usually fails because of infrastructure, not ideas. Bad data splits leak information. Preprocessing bugs silently invalidate months of experiments. Training runs can't be reproduced because no one tracked the right things. Results look great until someone realizes the evaluation was wrong.
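To make the data-split failure mode concrete: EEG windows from the same subject are highly correlated, so splitting at the window or recording level leaks subject identity into the test set. A minimal sketch of a deterministic subject-level split (in practice you might reach for scikit-learn's GroupShuffleSplit instead):

```python
import hashlib

def subject_split(recording_ids, subject_of, test_fraction=0.2):
    """Split recordings so no subject appears in both train and test.
    Assignment hashes the subject id, so the split is deterministic
    across runs and machines -- no RNG state to track."""
    train, test = [], []
    for rec in recording_ids:
        # Map the subject id to a stable pseudo-uniform value in [0, 1].
        h = hashlib.sha256(subject_of[rec].encode()).digest()[0] / 255.0
        (test if h < test_fraction else train).append(rec)
    return train, test

# Two recordings per subject: both always land on the same side.
subject_of = {f"rec{i}": f"subj{i // 2}" for i in range(20)}
train, test = subject_split(list(subject_of), subject_of, test_fraction=0.3)
```

Because every recording of a subject hashes identically, subject disjointness holds by construction, and re-running the split months later reproduces it exactly.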

EEG makes all of this worse. We're dealing with data from different devices, electrode layouts, and sampling rates. As we scale from public datasets to clinical partnerships to consumer data collection, the infrastructure has to handle all of it cleanly.

Research culture

You'll be embedded in the research, not adjacent to it.

You ship infrastructure, not features. Your users are researchers (including the founders), and your success is measured by how fast and confidently they can run experiments.

Reproducibility is a first-class product. We treat experiment reproducibility the way a good engineering team treats test coverage.

You have a voice in research decisions. You'll see patterns the researchers miss: data quality issues, training instabilities, evaluation blind spots. We expect you to flag them.

Failed experiments are documentation, not waste. We write up what doesn't work with the same care as what does.

Who we're looking for

You've built the systems that make ML research actually work. You care deeply about data integrity, reproducibility, and clean abstractions.

You don't need EEG experience, but you should have worked with data that's messy, heterogeneous, and doesn't fit neatly into standard ML pipelines. Audio, sensor data, medical signals, time-series — anything where the preprocessing is half the battle.

You should have:

  • Experience building ML training and data pipelines for real-world data

  • Strong Python skills and comfort with the PyTorch ecosystem

  • Experience with experiment tracking, data versioning, and reproducible workflows

  • The ability to debug data and training issues that span the full stack, from raw signal to loss curve

Bonus points for:

  • Experience with signal processing or time-series data pipelines

  • Comfort with distributed training or mixed-precision optimization

  • Having built internal tools that researchers actually loved using

  • Familiarity with data formats like EDF, BIDS, or HDF5

  • Experience with EEG/BCI data pipelines or neuroscience data tooling (MNE-Python, MOABB, Braindecode)

You should NOT apply if:

  • You've only worked with clean, well-structured datasets

  • You need detailed specs before you can start building

  • You're not comfortable working in a 3–5 person team with no dedicated manager

Interview process

Our process is three conversations:

  1. 30-minute intro call. We'll tell you what we're working on, you'll tell us what you've worked on. Casual, honest, no prep needed.

  2. 30-minute technical conversation. We'll work through a real infrastructure design problem together. No right answer. We want to see how you think about tradeoffs, correctness, and iteration speed.

  3. 30-minute deep dive. You'll meet both founders. We'll dig into past projects, talk about how you debug hard data problems, and figure out if we'd enjoy working together every day.

Benefits
  • Competitive salary and meaningful equity

  • Platinum-tier health insurance

  • Uncapped compute access

  • Full engineering autonomy: own the problem, not just a task list

  • No bureaucracy, no review committees

  • Conference budget + co-author publication support

  • Relocation and visa support (flexible on remote)

Top Skills

BDF
BIDS
EDF
Hdf5
Python
PyTorch


