Software Engineer - Site Reliability
What You'll Do
- Own the metrics and instrumentation infrastructure (Time Series Database, Metrics Streaming, Visualization and Alerting) and provide guidance to platform / product teams on how metrics are generated and consumed. We believe that our metrics data forms the foundation of effective engineering teams.
- Work with other infrastructure and platform engineers to scale our operations globally across multiple AWS regions and cloud providers.
- Relentlessly improve our site performance, latencies and reliability.
- Design and build developer tools that help analyze code performance, fix problems, and debug production issues.
- Work with development teams to ensure that code and features meet production performance criteria.
What We Look For
- Extensive experience building and owning large-scale, geographically distributed backend systems is a plus.
- Highly skilled at developing and debugging in one or more programming languages.
- Experience with operating system internals, filesystems, databases, and networks.
- You prefer building upon open-source solutions to starting from scratch.
- Unquenchable thirst for knowing everything within your platform and learning new technologies.
- You obsess about performance and metrics.
- Python and Linux experience is a plus.
- Experience with AWS and/or other cloud providers is a plus.
- A Computer Science degree with industry experience is required.