Senior Site Reliability Engineer at DoubleVerify
Site Reliability Engineer
Location: New York City
Reporting Relationship: Director, Production Systems
Work with a team of senior engineers to ensure the performance and stability of cutting-edge infrastructure supporting billions of requests a day.
Adapt creative open-source solutions where proprietary products can't keep up with demand.
Think big, plan like an engineer, and prepare for catastrophe so that we are covered when it hits.
Responsibilities:
- Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes
- Troubleshoot issues across the entire stack: hardware, software, application and network
- Drive standardization efforts across multiple disciplines and services
- Mentor engineers across the organization on best practices for everything from monitoring to troubleshooting infrastructure issues
- Scope and create automation for deployment, management and visibility of our services
- Participate in the design of systems running on both physical and virtualized platforms
- Take an active part in design reviews and operational readiness exercises for new and existing services
Requirements:
- Solid understanding of systems and application design, including the operational trade-offs of various designs
- Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software design practices
- Practical, solid knowledge of shell scripting and at least one higher-level language
- Demonstrable knowledge of TCP/IP, HTTP, security, and experience supporting multi-tier web application architectures
- Expert-level understanding of Linux and Windows servers
- Deep knowledge of at least one configuration management or CI tool such as Ansible, TeamCity, Jenkins, Puppet, or Chef
- Minimum of 3 years managing services in an internet-scale *nix environment
- Must work well with, and be able to influence, a wide range of personalities at all levels
- Ability to prioritize tasks and work independently
- Must be adaptable and able to focus on the simplest, most efficient & reliable solutions