Peloton is looking for a Site Reliability Engineer focused on Kubernetes operations who will work with teams across the organization to build and maintain a monitorable, performant, reliable, and highly scalable deployment platform. We are a growing team of engineers tackling the challenges of scaling Kubernetes to thousands of nodes and pods spread across many deployments.
The Kubernetes working group at Peloton works closely with development teams to ensure that the platform is robust, stable, and delivers features that include the following:
- Automatic, fast autoscaling for live rides and special large events
- Hosting for critical infrastructure that gives our members the best possible experience, on tens of thousands of pods across multiple clusters
- A platform for machine learning (and other awesome workloads) so that we can be at the forefront of the industry
- The freedom for developers to move quickly and experiment, without the platform getting in the way
What You'll Be Doing:
- Evangelize best practices for building and operating highly reliable systems
- Serve as subject matter expert in observability and monitoring
- Consult in system design to meet reliability and capacity requirements
- Automate everything, from infrastructure down to day-to-day tasks
- Conduct timely post-mortems of infrastructure incidents
- Assist with all aspects of operational security and compliance
- Seek out potential threats to security and reliability and advocate solutions
- Work with Amazon Web Services, Chef, Python, Ubuntu, Nginx, Jenkins, and Terraform
What We’re Looking For:
- Experience maintaining scalable and stable Kubernetes clusters.
- Knowledge of best practices for the observability and monitoring required to run Kubernetes at scale.
- Knowledge of best practices for securing a Kubernetes cluster and its deployments at scale.
- A passion for helping development teams make the transition to a container-native world.
- Experience with CI/CD systems such as Jenkins, ArgoCD, Harness, or Tekton.
- Experience deploying infrastructure using Infrastructure as Code tools such as Terraform or Pulumi.
- The judgment to know when to triage and when to dive into a root-cause analysis.
- Passion for reliable, scalable, observable software, with a strong sense of ownership.
- Experience with a programming language such as Python, Go, Java, or C.
Peloton is the largest interactive fitness platform in the world with a loyal community of more than 2.6 million Members. The company pioneered connected, technology-enabled fitness, and the streaming of immersive, instructor-led boutique classes for its Members anytime, anywhere. Peloton makes fitness entertaining, approachable, effective, and convenient, while fostering social connections that encourage its Members to be the best versions of themselves. An innovator at the nexus of fitness, technology, and media, Peloton has reinvented the fitness industry by developing a first-of-its-kind subscription platform that seamlessly combines the best equipment, proprietary networked software, and world-class streaming digital fitness and wellness content, creating a product that its Members love. The brand's immersive content is accessible through the Peloton Bike, Peloton Tread, and Peloton App, which allows access to a full slate of fitness classes across disciplines, on any iOS or Android device, Fire TV, Roku, Chromecast and Android TV. Founded in 2012 and headquartered in New York City, Peloton has a growing number of retail showrooms across the US, UK, Canada and Germany. For more information, visit www.onepeloton.com.