Roku

Senior Machine Learning Engineer, DevOps/SRE

Posted 2 Days Ago
In-Office
Austin, TX
Senior level
Design, operate, and scale cloud-native ML infrastructure across GCP and AWS (GPU/TPU), build CI/CD for models, maintain low-latency real-time inference systems, define observability and monitoring for ML models, participate in on-call incident response, and partner with data scientists to improve MLOps and platform usability.
Teamwork makes the stream work.
Roku is changing how the world watches TV

Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers.

From your first day at Roku, you'll make a valuable - and valued - contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.


About the team 

The Advertising Performance group focuses on performance for all participants in the Advertising ecosystem: Advertisers, Publishers, and Roku. The systems and solutions span multiple disciplines and technologies to perform real-time multi-objective optimization across distributed systems at large scale and with low latency. We use Machine Learning, Reinforcement Learning, AI, Control and Optimization Systems, and Auction Dynamics to solve a large set of complex problems. At the core of this is our Machine Learning, Experimentation, and Inference Platform, which powers the entire landscape and which we continuously evolve.

  

About the role 

We are seeking a talented and experienced Senior Software Engineer, MLOps/DevOps, to join the Advertising Performance team and play a critical role in supporting and scaling our Machine Learning infrastructure. The ideal candidate has a strong background in DevOps/SRE practices, cloud infrastructure management, and MLOps tooling — with a passion for building platforms that accelerate ML experimentation and deployment at internet scale. 

You will partner closely with ML Scientists and Engineers to streamline the end-to-end ML lifecycle (training, evaluation, deployment, and monitoring) on top of a modern, cloud-native stack running on GCP and AWS, using tools such as Kubernetes, Apache Airflow, Spark, Ray, MLflow, and Chronon.

 What you’ll be doing 
  • Lead the design and operation of scalable, production-grade cloud infrastructure for ML workloads across AWS and GCP, including GPU/TPU-based training and inference environments
  • Architect and improve CI/CD systems for ML models and platform services to enable fast, reliable, and safe production releases
  • Own and evolve low-latency infrastructure for real-time model inference, including key-value stores and vector databases
  • Define and enforce observability standards for ML systems, including model performance monitoring, drift detection, capacity planning, and pipeline health metrics
  • Participate in the on-call rotation, leading incident response and root-cause analysis for critical ML training and serving infrastructure
  • Partner with data scientists and ML engineers to improve platform usability, accelerate model iteration, and implement strong MLOps and SRE best practices
  • Champion operational excellence across ML infrastructure through automation, resilience engineering, disaster recovery planning, and continuous improvement

We’re excited if you have 
  • BS or MS in Computer Science, Engineering, or a related quantitative field
  • 8+ years of experience in DevOps, SRE, or ML infrastructure, including 4+ years supporting large-scale ML or AI systems
  • Strong programming skills in Python and/or Scala or Java for platform automation and tooling
  • Deep experience with Kubernetes and container orchestration on GCP (GKE) and/or AWS (EKS)
  • Expertise with NoSQL or low-latency data stores such as Aerospike or similar technologies
  • Hands-on experience with data and orchestration technologies such as Apache Spark, Apache Flink, Apache Airflow, and Kafka
  • Experience building and maintaining CI/CD systems using tools such as Jenkins or GitLab Runner
  • Familiarity with feature engineering platforms such as Chronon and model lifecycle tools such as MLflow
  • Strong infrastructure-as-code experience with Terraform or similar tooling
  • Experience with observability platforms such as Prometheus, Grafana, and Datadog
  • Excellent communication and cross-functional collaboration skills
  • Experience in the Advertising domain is a plus 

Our Hybrid Work Approach

Roku fosters an inclusive and collaborative environment where teams work in the office Monday through Thursday. Fridays are flexible for remote work, except for employees whose roles are required to be in the office five days a week or employees who are in offices with a five-day in-office policy.


Benefits

Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and voluntary benefits which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension). Employees are supported in taking time off, in accordance with local leave policies and other personal needs to support their evolving work and life needs. It's important to note that not every benefit is available in all locations or for every role. For details specific to your location, please consult with your recruiter.


Accommodations

Roku welcomes applicants of all backgrounds and provides reasonable accommodations and adjustments in accordance with applicable law. If you require reasonable accommodation at any point in the hiring process, please direct your inquiries to [email protected].


The Roku Culture

Roku is a great place for people who want to work in a fast-paced environment where everyone is focused on the company's success rather than their own. We try to surround ourselves with people who are great at their jobs, who are easy to work with, and who keep their egos in check. We appreciate a sense of humor. We believe a smaller number of very talented folks can do more for less cost than a larger number of less talented teams. We're independent thinkers with big ideas who act boldly, move fast, and accomplish extraordinary things through collaboration and trust. In short, at Roku you'll be part of a company that's changing how the world watches TV.

We have a unique culture that we are proud of. We think of ourselves primarily as problem-solvers, which itself is a two-part idea. We come up with the solution, but the solution isn't real until it is built and delivered to the customer. That penchant for action gives us a pragmatic approach to innovation, one that has served us well since 2002. 

To learn more about Roku, our global footprint, and how we've grown, visit https://www.weareroku.com/factsheet.

By providing your information, you acknowledge that you want Roku to contact you about job roles, that you have read Roku's Applicant Privacy Notice, and understand that Roku will use your information as described in that notice. If you do not wish to receive any communications from Roku regarding this role or similar roles in the future, you may unsubscribe at any time by emailing [email protected].
