Staff Site Reliability Engineer at Bread
Bread, a division of Alliance Data, is a technology-driven payments company that works with merchants and partners to personalize payment options for their customers. Bread integrates directly with merchants' ecommerce sites and gives them a single platform for offering more ways to pay over time. Bread’s full-funnel recommendation engine serves up the right options at the right time, empowering merchants to sell more, improve conversion, and lift average order value.
Your role at Bread:
- Work alongside our Product, Quality, Support, Security, and DevOps Engineering teams to set, meet, and exceed meaningful measures for each system, so that we deliver the highest quality of service to our external and internal clients.
- Work alongside those same teams to help continuously improve our CI/CD pipelines.
- Help enhance existing monitoring, reporting, and alerting systems, as well as build new tools to accomplish your goals.
- Identify bottlenecks, both within a single service and across the environment, to increase performance and reduce latency wherever it occurs.
- Identify, evaluate, and automate manual or otherwise burdensome processes that impede our ability to deploy, scale, and support our platform.
- Act as an advisor and ambassador to ensure that consistent levels of service are delivered across all systems and teams by performing code, design, documentation, and process reviews, and by participating in postmortems of production incidents.
- Be part of an on-call rotation to ensure incident response automations and runbooks are operating as efficiently as possible, and to serve as an escalation point when needed.
What we are looking for:
- Experience with supporting large and complex microservice platforms in a production cloud environment
- Experience with AWS, Kubernetes, EKS, Docker, Terraform, Git, and Ansible
- Experience implementing and using CI/CD platforms such as Wercker, Codefresh, or CircleCI
- Software development experience in multiple languages (Go is a plus), and an understanding of the SDLC
- Experience with observability platforms such as LogicMonitor, Datadog, or New Relic
- Experience with Splunk, Elastic, or similar
- Excellent collaboration, documentation, and communication skills
- Comfort and productivity working as part of a geographically distributed team