Site Reliability Engineer: Developer Automation
Description:
Discovery hires the very best and brightest talent: people who are enthusiastic and passionate about fulfilling the company’s mission of empowering people to explore their world and satisfy their curiosity.
In exchange for their talent and drive, employees are provided with an engaging, diverse workplace and the resources they need to learn, thrive and grow in their careers.
Job Summary
The Direct to Consumer Group (DTC) is a technology company within Discovery that is responsible for building a global streaming video platform to support a broad collection of Discovery’s diverse brands around the world including Discovery, TLC, Food Network, Investigation Discovery, Animal Planet, Science Channel, HGTV, Eurosport, MotorTrend, and many more.
DTC’s software engineering teams build applications for the web, mobile, tablets, connected TVs, consoles, and other streaming devices. Those applications are backed by a fleet of modern, cloud-native microservices deployed to Kubernetes within AWS. It is a fast-growing, global engineering group crucial to Discovery’s future.
Responsibilities
As an engineer in the Developer Automation group within DTC, you’ll join a group responsible for building a truly global, self-service platform that enables DTC’s growing number of engineering teams to build, test, deploy, and manage the complete operational life cycle of their services autonomously.
Your role will focus on developing the observability services at the heart of our common platform services, including metrics, logging, and distributed tracing. You will solve problems related to managing (moving, storing, and querying) large-scale data sets, aggregating data from many Kubernetes clusters, integrating data with alerting and incident management systems, and manipulating data (segmenting, tokenizing, normalizing, etc.), all within the architecture of the global platform being built by the teams you collaborate with.
Who You Are
You thrive in an environment where your engineering team members live for delivering great software. You have experience running observability services in production that support multiple systems in a scalable and performant manner. You live for data, are consumer-obsessed, and take immense pride in your work.
Requirements
The ideal candidate for this role will have deep expertise in at least one of the following:
• Metrics collection/querying at scale (e.g. Prometheus w/ Cortex or Thanos, InfluxDB)
• Log collection/querying at scale (e.g. distributed ELK/EFK stack)
• Distributed tracing at scale (e.g. OpenTracing, Jaeger)
In addition, your skills should match well to the following:
• You have previous experience as a software developer and have a good understanding of how to implement observability frameworks & tools (e.g. statsd, log4j)
• Understanding of statistical methods commonly used in measuring and decision-making for software applications (e.g. percentiles, anomaly detection, outlier detection)
• Hands-on experience with storage engines used to store time series and/or line-based log data
• Operational experience (e.g. on-call rotation, incident response)
• Ability to collaborate effectively with remote peers across disparate geographies and timezones
• Excellent written and verbal communication skills with particular emphasis on technical documentation (including diagramming)
• Strong CS fundamentals
Discovery Communications, Inc. is an equal opportunity employer. Discovery is committed to being an employer of choice, not just a good place to work, but a great and inclusive place to work. To that end, we strive to recruit and maintain a workforce that meaningfully represents the diverse and culturally rich communities that we serve. Qualified applicants will receive consideration for employment without regard to their race, color, religion, national origin, sex, sexual orientation, gender identity, protected veteran status, disabled status, or genetic information.
We will consider for employment all qualified applicants, including those with criminal histories, in a manner consistent with the requirements of applicable state and local laws, including but not limited to all local Fair Chance Ordinances.