Site Reliability Engineer II - DBE Automation and Observability
About The Opportunity
Grubhub is dedicated to connecting hungry diners with our wide network of restaurants across the country. Our innovative technology, easy-to-use platforms and streamlined delivery capabilities make us an industry leader in online food ordering, today and in the future.
We strive to create a workplace that reflects the diversity of our customers and the communities we serve. When you join our team, you become part of a community that works together to innovate, solve problems, take risks, grow, work hard and have a ton of fun in the process!
Why Work For Us
We have a fast-paced environment and that is what our teams thrive on. Grubhub believes in empowering people and offering opportunities for development, as well as professional growth. We value strong, positive relationships in all areas: with each other, our customers and our greater community. Want to be a part of a team of diverse collaborators in an authentically fun culture? If so, we want to talk to you, and hear what your favorite restaurant for food delivery is!
More About the Role
Grubhub is looking for an experienced SRE specializing in managing large, critical data persistence platforms, including Cassandra and Elasticsearch on AWS. The Grubhub platform supports high-volume applications in a container-based microservice architecture running in multiple AWS regions in fully Active/Active mode. The entire platform is powered by a very large multi-datacenter Cassandra infrastructure for persistence, and by Elasticsearch for indexing and for scaling the search and content experience. You will work with a team of passionate and skilled engineers responsible for the automation, scaling, tuning, and troubleshooting of Elasticsearch and Cassandra databases. You will also collaborate with a diverse group of engineers across the organization to design and engineer solutions.
The Impact You Will Make
- Manage large, critical Cassandra and Elasticsearch clusters supporting millions of transactions per day
- Build systems to automate all build and maintenance tasks using Ansible and Python
- Develop self-service tools that allow engineers to manage and provision resources following Grubhub best practices and standards
- Monitor cluster availability, read/write latencies, and other key performance metrics to proactively identify SLO misses and help mitigate issues
- Evaluate new technologies, tools, and software versions; test them, plan their adoption, and develop roadmaps
- Tune Cassandra and Elasticsearch databases to optimize throughput and read/write latencies
- Participate in a 24x7 on-call rotation with the rest of the team for rapid incident response
- Implement disaster recovery (DR) strategies, including backup and recovery techniques with minimal downtime
- Work with other engineers to manage the integration and performance of our data persistence layer with the Grubhub platform
- Proactively monitor and scale Elasticsearch/Cassandra clusters to handle growth in traffic
What You Bring to the Table
- Experience developing backend applications in Python or Java
- Experience managing, working with, or developing large Elasticsearch clusters in highly available, 24x7 production environments
- Experience automating the maintenance of infrastructure using Python and Ansible or similar tools.
- Strong experience managing automated cloud infrastructure on AWS or another major cloud provider
- Experience managing large Cassandra clusters in production is a strong plus.
- Experience working with Docker is a plus
- Ability to quickly learn new concepts and technologies and adapt to changing needs
Additional Content:
- How Grubhub uses Elasticsearch
- How Grubhub guarantees critical microservice actions
And Of Course, Perks!
- Flexible PTO. Grubhub employees are provided a generous amount of time to recharge their batteries.
- Health and Wellness. We provide programs that support your overall well-being such as generous medical benefits, employee network groups, company-wide fitness challenges, and a comfortable and casual workplace! We also support our parents by offering 8 weeks of paid parent bonding time, a 4-week returnship program, and 6-8 weeks paid medical leave.
- Learning and Career Growth. Your personal and professional development is a priority at Grubhub. From day one, we empower you to lead and be an active participant in your career growth. We provide continuous learning opportunities, training, and coaching and mentorship programs.
- MealPerks. Who’s ready for some lunch? We provide our employees with a weekly Grubhub credit to enjoy and support local restaurants. We also offer company-wide meals several times a year to bring our Grubhub family together.
- Fun. Every Grubhub office has an employee-led Culture Crew that connects people through fun, meaningful events and initiatives. Popular past events include wing-eating contests, Grubtoberfest, 5k runs, Bring Your Child to Work Day, regular happy hours, and more!
- Social Impact. We believe in the importance of serving the communities that support our business. In addition, employees are given paid time off each year to support the causes that are important to them.
Grubhub is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, and other legally protected characteristics. The EEO is the Law poster is available here: DOL Poster. If you are applying for a job in the U.S. and need a reasonable accommodation for any part of the employment process, please send an e-mail to [email protected] and let us know the nature of your request and contact information. Please note that only those inquiries concerning a request for reasonable accommodation will be responded to from this email address.