Site Reliability Engineer (SRE-Kubernetes platform)
Peloton is looking for a Site Reliability Engineer with a focus on Kubernetes operations to work with teams across the organization to build and maintain a monitorable, performant, reliable, and highly scalable deployment platform. We are a growing team of engineers tackling the challenging problems of scaling Kubernetes to handle thousands of nodes and pods spread across many deployments.
The Kubernetes working group at Peloton works closely with development teams to ensure that the platform is robust, stable, and delivers features that include the following:
- Automatic, fast autoscaling for live rides and special large events
- Hosting for critical infrastructure, on tens of thousands of pods across multiple clusters, that ensures our members have the best experience possible
- A platform for machine learning (and other awesome workloads) so that we can be at the forefront of the industry
- The freedom for developers to move quickly and experiment, without the platform getting in the way
What You'll Be Doing:
- Evangelize best practices for building and operating highly reliable systems
- Serve as a subject matter expert in observability and monitoring
- Consult in system design to meet reliability and capacity requirements
- Automate everything, from infrastructure down to day-to-day tasks
- Conduct timely post-mortems of infrastructure incidents
- Assist with all aspects of operational security and compliance
- Seek out potential threats to security and reliability and advocate solutions
- Work with Amazon Web Services, Chef, Python, Ubuntu, Nginx, Jenkins, and Terraform
What We’re Looking For:
- Experience maintaining scalable and stable Kubernetes clusters.
- Knowledge of best practices for the observability and monitoring required to run Kubernetes at scale.
- Knowledge of best practices for securing a Kubernetes cluster and its deployments at scale.
- A passion for helping development teams make the transition to a container-native world.
- Experience with CI/CD systems such as Jenkins, ArgoCD, Harness, or Tekton.
- Experience deploying infrastructure using Infrastructure as Code tools such as Terraform or Pulumi.
- The judgment to know when to triage and when to dive into a root-cause analysis.
- Passion for reliable, scalable, observable software, with a strong sense of ownership.
- Experience with a programming language such as Python, Go, Java, or C.
ABOUT PELOTON:
Peloton uses technology + design to connect the world through fitness, empowering people to be the best version of themselves anywhere, anytime. We have reinvented the fitness industry by developing a first-of-its-kind subscription platform. Seamlessly combining hardware, software, and streaming technology, we create digital fitness and wellness content and products that Members love. In 2020 Peloton committed to becoming an antiracist organization with the launch of the Peloton Pledge.
“Together We Go Far” means that we are greater than the sum of our parts, stronger collectively when each one of us is at our best. In order to be the best version of Peloton, we are deeply committed to building a diverse workforce and inclusive culture where all of our team members can be the best version of themselves. This work has no endpoint; it is the constant work of running an organization that strives to reach its full potential. As a first step in our commitment, we announced the Peloton Pledge to invest $100 million over the next four years to fight racial injustice and inequity in our world, and to promote health and wellbeing for all, from the inside out.
Peloton does not accept unsolicited agency resumes. Agencies should not forward resumes to our jobs alias, Peloton employees or any other organization location. Peloton is not responsible for any agency fees related to unsolicited resumes.