Design and implement supervised fine-tuning pipelines, create benchmarks, analyze and improve production LLMs, run experiments and hyperparameter searches, build Python tooling for training/evaluation, and document research results.
Why Join This Team
Appen’s GenAI research team advances how frontier models are evaluated, improved, and deployed in production environments.
The purpose of this role is to design and implement research and engineering workflows that strengthen model performance, create new benchmarks, and improve production models without regressing on core characteristics.
This role provides hands-on ownership of training and evaluation pipelines, benchmark development, and model improvement initiatives that directly influence deployed systems.
Your Impact
- Design and implement a lightweight supervised fine-tuning pipeline using open-source LLMs.
- Create new benchmarks to evaluate frontier models across defined scientific and performance criteria.
- Analyze production models to identify measurable areas for improvement.
- Improve model performance through targeted retraining and hyperparameter search.
- Deploy improved models while maintaining core model characteristics and avoiding regression.
- Build Python tooling to automate training, evaluation, benchmarking, and experimentation workflows.
- Implement structured evaluation methods, including rubric-based scoring and LLM-as-a-judge workflows.
- Document experimental design, benchmark methodology, and performance results with clarity and precision.
- Iterate rapidly in a research-driven environment to increase model quality and reliability.
What You Bring
- Current enrollment in or recent completion of a Master’s or PhD in Computer Science, AI, Machine Learning, Computer Engineering, or a closely related technical field.
- Strong experience working with large language models, including supervised fine-tuning, prompt engineering, or model evaluation.
- Hands-on experience building machine learning pipelines or research infrastructure.
- Experience improving model performance through retraining or hyperparameter tuning.
- Proficiency in Python and comfort working with machine learning frameworks and open-source model ecosystems.
- Familiarity with cloud environments such as AWS or Azure.
- Strong technical problem-solving ability, including use of LLMs as development aids for building and iteration.
- Ability to work independently with minimal hand-holding.
- Strong written communication skills for summarising research and drafting technical documentation.
- Ability to collaborate effectively in a remote research environment.
Additional Details
- Duration: June-August
- Schedule: Full-time
- Work Type: Remote
Why You'll Love Working Here
At Appen, we foster a culture of innovation, collaboration, and excellence. We value curiosity, accountability, and a commitment to delivering the highest quality AI solutions for frontier models.
You’ll work on complex challenges that shape the future of AI across industries and geographies, alongside talented people in a culture that values humility over ego. You’ll have the flexibility to deliver in a way that works for you and your team, supported by tools, resources and development opportunities to continue to build your capability over time.
About Appen
Appen has been a leader in AI training data for over 30 years. We specialise in human-generated data to train, fine-tune, and evaluate models across generative AI, large language models, computer vision, and speech recognition. Our AI-assisted data annotation platform and global crowd of more than 1 million contributors in over 200 countries support model pre-training, supervised fine-tuning, evaluation and benchmarking, safety and red-teaming, and multilingual global expansion.
Top Skills
Python, Large Language Models, Open-Source LLMs, Supervised Fine-Tuning, Prompt Engineering, Hyperparameter Tuning, Machine Learning Frameworks, AWS, Azure