Headquartered in New York but with employees around the world, DigitalOcean is a dynamic, high-growth technology company that serves a robust and passionate community of developers. Our mission is to simplify cloud computing for every developer. We are working on some of the most challenging and interesting technology problems around, at a scale unmatched by most. We want people who are passionate about optimizing and troubleshooting data center hardware at mega-scale, so our customers don’t have to worry!

We are looking for a well-rounded hardware technologist who lives and breathes data center metal. Reporting to the Manager, Hardware Infrastructure Engineering, the Hardware Infrastructure Engineer will build hardware monitoring systems across DigitalOcean products and components and join a team of engineers who have already paved the way for the many thousands of servers in our infrastructure. The ideal candidate will be eager to take on new challenges as DigitalOcean continues to scale its data center footprint.

What You’ll Be Doing:
- Improve existing hardware monitoring tooling and establish new tooling, for both hardware platforms and subsystems (NIC, storage, and BMC)
- Work with operations teams to create actionable processes based on hardware alerts
- Establish annualized failure rate (AFR) metrics reporting for server hardware and analyze the results to identify unusual or chronic hardware issues
- Lead projects on the internal roadmap to improve team-owned services for both internal stakeholders and customers; for example, collecting and delivering storage wear-level insights to the business to inform capacity planning and product strategy
- Triage, investigate, and resolve system hardware issues on DigitalOcean servers (both customer-facing and internal)
- Mentor other DigitalOcean employees on configuration management best practices, troubleshooting intuition, and clean documentation

What We’re Looking For:
- Technical degree (BS in Computer Science or Engineering) or equivalent practical experience
- Experience building, maintaining and scaling hardware data pipelines
- Strong understanding of x86 server hardware architecture and subsystems. Ideally, you’ve worked with non-x86 hardware too!
- Experience operating a data monitoring or ingestion stack (e.g., Prometheus, Grafana, ELK, or Graphite)
- Demonstrated professional proficiency automating server components at large scale using industry-standard tooling (e.g., Redfish, IPMI)
- Adept at Linux (or Unix) operating systems. You’ll be spending a lot of time working in one!
- Comfortable with version control systems (we use Git)
- Ability to participate in a 24/7 on-call rotation with other members of the team
- Excellent communication skills, both within the team and with the broader company
- An insatiable passion for hardware, both new and old

Why You’ll Like Working for DigitalOcean:
- We value development. You will work with some of the smartest and most interesting people in the industry. We are a high-performance organization that is always challenging ourselves to continuously grow. We maintain a growth mindset in everything we do and invest deeply in employee development through formalized mentorship, LinkedIn Learning tracks, and other internal programs. We also provide all employees with reimbursement for relevant conferences, training, and education.
- We care about your physical, financial and mental well-being. We offer competitive health, dental, and vision benefits for employees and their dependents, a monthly gym stipend to support your physical health, and a commute or internet allowance to make your trips to your office or your desk easier. We offer generous parental leave with transition time built-in upon return to work. We offer competitive compensation and a 401k plan with up to a 4% employer match.
- We support our remote employee experience. While we have great office spaces in NYC and Cambridge, we’re very distributed—we use a number of communication tools to connect across the company—and all remote employees have the opportunity to visit our offices and meet their teams face-to-face at team offsites. We also have an annual company offsite, Shark Week, to get quality in-person time with the entire company at least once a year. We also allow employees to outfit their workstations to meet their needs—whether remote or in office.
- We value diversity and inclusivity. We are an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
Department: Engineering