Picture this: It’s Friday evening and you’ve made reservations at your favorite restaurant. But when you arrive, the host tells you there’s no record of a reservation under your name.
It’s now a three-hour wait to get a table. What could have happened?
The app you used to make the reservation probably had a bug the engineers weren’t aware of. At restaurant technology platform Resy, though, that scenario isn’t a worry.
Thanks to testing in its staging environment, Resy Engineering Director Eric M. said his team is able to catch bugs before code is deployed. The company even implemented a “Friday Night Simulator” that ensures the app can handle anything thrown its way.
To get a quick pulse on the staging environment best practices his team uses to ensure a bug-free user experience, Built In NYC connected with Eric.
What’s a critical best practice your team follows when developing staging environments?
We use our staging environment as a development sandbox, allowing engineers to test their code in conjunction with other services and projects. Once code is tested locally and merged into source control, it’s automatically deployed to staging.
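For a sense of how that hand-off can work, here’s a minimal sketch of a post-merge step hitting Jenkins’ remote build-trigger endpoint. The job name, host and token handling are illustrative assumptions, not Resy’s actual pipeline:

```python
# Minimal sketch: trigger a staging deploy after a merge by calling
# Jenkins' remote build-trigger endpoint. The job name, URL and token
# below are hypothetical placeholders, not Resy's actual setup.
import os
import requests

JENKINS_URL = os.environ["JENKINS_URL"]            # e.g. https://ci.example.com
JOB_NAME = "deploy-to-staging"                     # hypothetical job name
TRIGGER_TOKEN = os.environ["JENKINS_TRIGGER_TOKEN"]

def trigger_staging_deploy(branch: str = "main") -> None:
    """Queue the staging deploy job for the merged branch."""
    resp = requests.post(
        f"{JENKINS_URL}/job/{JOB_NAME}/buildWithParameters",
        params={"token": TRIGGER_TOKEN, "BRANCH": branch},
        timeout=10,
    )
    resp.raise_for_status()  # Jenkins returns 201 when the build is queued

if __name__ == "__main__":
    trigger_staging_deploy()
```

Depending on the Jenkins configuration, a real version would also need authentication and CSRF-crumb handling; this only shows the shape of the trigger.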
As in any test environment, it’s important to replicate the conditions present in production, whether that’s data, environment configuration or infrastructure setup. This can often prove difficult for many teams, as production is naturally a larger cluster with rapidly changing data sets.
We have created scripts that let us seed our staging environments with realistic data, mimicking real-world scenarios that exist on our production cluster without having to clone production or introduce personally identifiable information. We opt to use our own internal APIs for a good part of these actions so the solution is less fragile.
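To make that concrete, here’s a hypothetical sketch of such a seeding script, using the Faker library for synthetic diners and a placeholder internal API; the host, endpoint paths and payload fields are assumptions, not Resy’s real APIs:

```python
# Hypothetical seeding sketch: generate realistic but fake diners and
# reservations through internal APIs rather than cloning production data.
# Host, endpoints and payload fields are illustrative assumptions.
import requests
from faker import Faker

fake = Faker()
STAGING_API = "https://staging.internal.example.com"  # placeholder host

def seed_reservations(venue_id: int, count: int = 50) -> None:
    for _ in range(count):
        # Synthetic diner: no real PII ever enters staging.
        diner = {
            "name": fake.name(),
            "email": fake.safe_email(),
            "phone": fake.phone_number(),
        }
        # Go through the same internal API the app uses, so the seed
        # data exercises real validation logic.
        resp = requests.post(
            f"{STAGING_API}/v1/reservations",
            json={"venue_id": venue_id,
                  "party_size": fake.random_int(2, 8),
                  "diner": diner},
            timeout=10,
        )
        resp.raise_for_status()

if __name__ == "__main__":
    seed_reservations(venue_id=101)
```

Routing seed data through the same APIs the app itself uses means the data stays valid as schemas evolve, which is the “less fragile” property Eric describes.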
We also use seed data as a foundation for other helpful tools, including our “Friday Night Simulator,” which combines seed data with rapid API calls to ensure our app can handle more load than any restaurant or restaurant partner can throw at it. This performance test has helped us catch potential slowness in staging before it reaches production.
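A load test in that spirit might look something like the following sketch, which fires hundreds of concurrent availability requests at a placeholder staging endpoint and checks a purely illustrative latency budget:

```python
# Rough sketch of a "Friday Night Simulator"-style load test: hammer a
# staging endpoint with concurrent requests and measure tail latency.
# The endpoint, parameters and threshold are illustrative assumptions.
import asyncio
import time
import aiohttp

STAGING_API = "https://staging.internal.example.com"  # placeholder host
CONCURRENT_DINERS = 500  # simulate a Friday-night rush

async def check_availability(session: aiohttp.ClientSession) -> float:
    start = time.monotonic()
    async with session.get(f"{STAGING_API}/v1/availability",
                           params={"venue_id": "101", "party_size": "4"}) as resp:
        resp.raise_for_status()
        await resp.read()
    return time.monotonic() - start

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        latencies = await asyncio.gather(
            *(check_availability(session) for _ in range(CONCURRENT_DINERS)))
    latencies = sorted(latencies)
    p95 = latencies[int(len(latencies) * 0.95)]
    print(f"p95 latency: {p95:.3f}s")
    assert p95 < 1.0, "staging is slower than our (illustrative) budget"

if __name__ == "__main__":
    asyncio.run(main())
```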
What processes does your team have in place for monitoring and maintaining your staging environment?
We utilize continuous integration (specifically Jenkins) not only to automatically deploy to staging when code is merged, but also to build, version and monitor each deployment. If staging itself or the code housed within it has an issue, we’re the first to know via email or Slack alerts. Since staging is our development sandbox, uptime and reliability are extremely important.
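As one illustration of that alerting loop, a periodic healthcheck could post to a Slack incoming webhook whenever staging misbehaves; the URLs below are placeholders, not Resy’s actual monitoring:

```python
# Sketch of staging monitoring: a periodic healthcheck that posts to a
# Slack incoming webhook when staging looks unhealthy. URLs are
# placeholders; a real setup would run this from CI or a scheduler.
import os
import requests

STAGING_HEALTH_URL = "https://staging.internal.example.com/health"  # placeholder
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # Slack incoming webhook

def check_staging() -> None:
    try:
        resp = requests.get(STAGING_HEALTH_URL, timeout=5)
        resp.raise_for_status()
    except requests.RequestException as exc:
        # Alert the team the moment staging misbehaves.
        requests.post(
            SLACK_WEBHOOK_URL,
            json={"text": f":rotating_light: staging healthcheck failed: {exc}"},
            timeout=5,
        )

if __name__ == "__main__":
    check_staging()
```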
Another form of alerting comes from our test suites, which run whenever code is merged and deployed to staging, triggered by a signal from a variety of GitHub hooks. Tests themselves can often be a great complement to traditional monitoring. They also give developers greater confidence by regression testing their code together with the other projects and services currently in flight. This targeted regression testing, which runs on staging ahead of our entire suite, provides fast feedback on changes, saving developers time and the need for manual intervention.
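A webhook-driven trigger along those lines might resemble this sketch, which verifies GitHub’s X-Hub-Signature-256 header before kicking off a hypothetical targeted test runner:

```python
# Illustrative GitHub webhook receiver that kicks off a targeted
# regression suite when code lands on staging. Signature verification
# follows GitHub's X-Hub-Signature-256 scheme; the test-runner script
# is a hypothetical stand-in.
import hashlib
import hmac
import os
import subprocess

from flask import Flask, abort, request

app = Flask(__name__)
WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"].encode()

@app.post("/hooks/github")
def on_push():
    # Verify the payload really came from GitHub.
    sig = request.headers.get("X-Hub-Signature-256", "")
    expected = "sha256=" + hmac.new(
        WEBHOOK_SECRET, request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        abort(401)
    # Run only the tests affected by this change for fast feedback;
    # the full suite runs later. The script name is hypothetical.
    head_sha = request.get_json()["after"]  # commit SHA from the push event
    subprocess.Popen(["./run_targeted_tests.sh", head_sha])
    return "", 202

if __name__ == "__main__":
    app.run(port=8080)
```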
What’s a common mistake engineering teams make when it comes to staging environments?
No team is perfect; each has its own set of strengths and weaknesses when it comes to overall testing and deployment infrastructure. Keeping data in a reliable state to simulate production scenarios is a frequent problem test engineering teams face. Data is a difficult area to maneuver since it’s constantly changing, and simply cloning from production isn’t feasible in every case due to privacy and security concerns. Having as much automation on hand as possible goes a long way toward keeping a staging environment up to date and stable.
To address these concerns, we have taken the data-generation scripts mentioned above and integrated them into our continuous integration server as a variety of jobs. These jobs accept a wide range of arguments to create test data on demand. They’re also exposed to developers and test engineers, who can either run them manually to test a specific scenario or integrate them into a test case to create data during test execution with a quick API call, making our testing infrastructure more robust and automated.
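As a final illustration, a test could provision its own data through such a job with a quick API call; the endpoint, payload fields and fixture below are hypothetical:

```python
# Sketch of on-demand test data inside a test run: a pytest fixture that
# calls a (hypothetical) data-creation job endpoint exposed by CI, so
# each test owns exactly the data it needs.
import pytest
import requests

DATA_JOB_URL = "https://ci.internal.example.com/jobs/create-test-data"  # placeholder
STAGING_API = "https://staging.internal.example.com"                    # placeholder

@pytest.fixture
def seeded_venue():
    # One API call creates a venue with open tables for tonight.
    resp = requests.post(
        DATA_JOB_URL,
        json={"scenario": "friday_night", "tables": 20},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["venue_id"]

def test_booking_a_table(seeded_venue):
    # The fixture guarantees availability, so a booking should succeed.
    resp = requests.post(
        f"{STAGING_API}/v1/reservations",
        json={"venue_id": seeded_venue, "party_size": 2},
        timeout=10,
    )
    assert resp.status_code == 201
```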