Protege

Solutions Engineer (Media)

Posted 2 Days Ago
Remote
Hiring Remotely in USA
Mid level
The Solutions Engineer for media will curate and validate datasets from Protege's catalog, collaborating with sales and partners to meet customer AI data needs effectively.
The summary above was generated by AI

Company Overview

We are building Protege to solve the biggest unmet need in AI — getting access to the right training data. The process today is time intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Role Overview


We’re hiring a Solutions Engineer for our media vertical to connect Protege’s media catalog with customer AI data needs. This is not a traditional modeling role. It is an applied data curation and delivery role for fast-moving, ambiguous environments where both speed and quality matter.

You will work with imperfect, evolving partner datasets and build strategies to normalize, validate, and operationalize them for downstream AI use cases. You’ll become an expert in Protege’s growing catalog of audio, video, and motion capture content, from longform assets with title-level metadata to clip-level content surfaced with TwelveLabs embeddings.

At a high level, you will understand what customers are building, identify the content that best fits their needs, and deliver datasets that meet both technical and conceptual requirements, often on tight timelines tied to active deals.

What You’ll Do

Own data quality and curate media datasets

  • Partner with Sales and Solutions to translate customer requirements into curation strategies

  • Work with imperfect partner data, including mismatched metadata, schema differences, and incomplete labeling

  • Normalize and standardize datasets for reliable downstream use

  • Query and analyze Protege’s media catalog using SQL, internal APIs, and metadata tools to identify relevant content

  • Build validation checks and workflows to ensure dataset integrity before delivery

  • Identify, debug, and resolve data quality issues across file structures, metadata, and content alignment

  • Use AI tools and transcoded embeddings to surface and refine clip-level content

  • Turn messy, real-world data into structured datasets that meet customer and model requirements

  • Run iterative sample reviews with customers, incorporate feedback, refine selections, and ensure final packages meet spec

Be the catalog expert

  • Build deep expertise in Protege’s media catalog structure, metadata, and growth patterns

  • Track content coverage, diversity, and modality mix, and identify gaps relative to customer demand

  • Partner with Product and Partnerships to share catalog insights that inform sourcing priorities

Operate across product, data, and customer

  • Work cross-functionally to ensure content packaging meets technical, ethical, and licensing requirements

  • Develop methods, scripts, and internal tools that improve curation efficiency and scale

  • Help shape Protege’s delivery platform, including how internal users and customers search, sample, and export data

Drive human-in-the-loop media search and curation

  • Work closely with embedding-based systems to iterate between algorithmic selection and human review

  • Define best practices for embedding queries, relevance evaluation, and content diversity

  • Maintain a high bar for operational excellence and quality assurance throughout the process

What Success Looks Like

30 days: Learn and get operational

  • Build a working understanding of the media catalog, delivery lifecycle, and core tools.

  • Establish strong cross-functional relationships and shadow live curation workflows.

60 days: Deliver and improve

  • Lead dataset sampling and curation for active use cases, and document reusable workflows.

  • Surface early insights on catalog coverage, metadata quality, and process improvements.

90 days: Scale and influence

  • Create repeatable QA and delivery workflows that increase consistency and speed.

  • Provide actionable feedback that shapes platform, sourcing, and catalog roadmap decisions.

What You Bring

  • 4-7 years of experience in data science, media analytics, technical curation, or similarly hands-on data roles.

  • Strong SQL proficiency and comfort querying large, messy datasets to generate insight and action.

  • Experience working with media metadata, embeddings, or unstructured content.

  • Ability to translate nuanced customer or model requirements into concrete dataset specifications.

  • High standard for data quality, operational rigor, and usability of delivered outputs.

  • Clear communicator who can move between technical depth and customer-friendly clarity.

  • Thrives in ambiguous, fast-moving environments and treats teammates with kindness.

Bonus if you also have:

  • Familiarity with video/audio processing, embeddings, or multimodal AI workflows.

  • Prior experience curating or packaging datasets for machine learning.

  • Background in content analysis, recommendation systems, or information retrieval.

Working with Protege

We move fast, but thoughtfully. Speed matters in what we're building, but so does intention. We're biased toward action and always learning.

We're a lean, high-trust team. Everyone has real ownership. Clarity and autonomy drive our best work.

We take our work seriously, not ourselves. We solve hard problems with humility and celebrate wins, big and small.

We're kind, direct, and inclusive. We give feedback early and often, with the goal of helping one another grow.

We're builders at heart. Every person at Protege is hands-on, resourceful, and focused on creating momentum.

We grow fast, together. You'll be surrounded by people who care about impact, who challenge you to think bigger, and who are genuinely excited about what comes next.

Top Skills

AI Tools
Embeddings
Internal APIs
Metadata Tools
SQL

Protege New York, New York, USA Office

New York, New York, United States
