Senior Engineer: Cloud Storage SRE
Have you ever wondered what happens inside the cloud?
Based in New York, DigitalOcean is a dynamic, high-growth technology company that serves a robust and passionate community of developers, teams, and businesses around the world. We believe that today’s entrepreneurs are changing the world through software. Our mission is to empower these entrepreneurs by bringing modern app development within reach for any developer, anywhere in the world.
We want people who are passionate about building the systems, culture, and processes that will improve the resiliency, reliability, scaling, and performance of cloud services.
We are looking for an experienced Site Reliability Engineer to work closely with our product engineering and infrastructure teams. Reporting to the Director of Storage Engineering, the Site Reliability Engineer will perform a mix of hands-on development, coaching, and collaboration with other teams and stakeholders to help bring DigitalOcean’s engineering systems and culture to the next level.
This is a key opportunity to make a significant impact on DigitalOcean’s storage engineering systems by contributing to storage monitoring and performance and building high-resiliency features. This role is essential to meeting the high expectations our customers have of DigitalOcean as we continue to grow and expand.
What You’ll Be Doing:
- Performing hands-on technical work to directly improve the reliability, resiliency, and scaling of our Storage product offerings and architecture
- Contributing to research and tooling for storage monitoring and performance improvement to provide solid SLAs for our customers
- Working with stakeholders to develop and implement reliability and performance metrics
- Facilitating DigitalOcean’s culture of learning by providing insight and recommendations for improvement
- Coaching teams and individuals on reliability best practices and solutions
- Working with other SREs and engineering leaders to define the architectures and practices that should be adopted in order to deliver on our engineering and operational goals
- Establishing best practices for development, architecture, deployment, and operations
- Working with peer SREs to improve services and processes (including architecture reviews, incident response, monitoring) in a cross-functional manner throughout the engineering organization
What We’ll Expect From You:
- Distinguished track record as an SRE (or in a similar role) with hands-on experience implementing reliability, process, and scaling solutions
- Expertise in operating large cloud-based storage clusters for cloud data centers and domain knowledge of the networking and storage stacks
- History of fostering positive relationships with stakeholders and a track record of successful collaboration and coaching
- Clear communication skills (both written and verbal) to document processes and architectures
- Experience implementing disaster recovery best practices
- Demonstrated ability to lead system recovery efforts for a major outage
- Experience developing robust solutions that streamline the resolution of customer inquiries through technologies for automation, deflection, and issue management
- Adept in Python, Ruby, and Go with a broad understanding of the full technology stack for a modern infrastructure
- Advocate of effective development environments with the use of CI/CD tooling and configuration management technologies such as Chef or Ansible
- You’ve worked inside a modern data center and have war stories to share and learn from
Why You’ll Like Working for DigitalOcean:
- We have amazing people. We can promise you will work with some of the smartest and most interesting people in the industry. We work hard but we always have fun doing it. We care deeply about each other and take our “no jerks” rule very seriously.
- We value development. We are a high-performance organization that is always challenging ourselves to continuously grow. That means we maintain a growth mindset in everything we do and invest deeply in employee development. You’ll need to be great to get hired here and we promise you’ll get even better.
- We care about you. We offer competitive health, dental, and vision benefits for employees and their dependents, a monthly gym reimbursement to support your physical health, and a monthly commute allowance to make your trips to and from work easier.
- We invest in your future. We offer competitive compensation and a 401k plan with up to a 4% employer match. We also provide all employees with Kindles and reimbursement for relevant conferences, training, and education.
- We want you to love where you work. We have great office spaces located in the heart of SoHo NYC and Cambridge and offer daily catered lunches to keep your hunger at bay. We’re also very remote-friendly—we use Slack to communicate across the company—and all remote employees have the opportunity to onboard in-office and take an all-expenses paid trip to our annual company offsite, Shark Week, to get quality in-person time with the team at least once a year. We also allow employees to customize their workstations to meet their needs—whether remote or in office.
- We value diversity and inclusivity. We are an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Want an inside look into life at DO? Click here to hear from our employees!