By Clay Smith
In the latest Docker release, version 1.12, the new health check instruction for Dockerfiles reflects the changing nature of application development. The combination of dynamic infrastructures spanning multiple data centers, complex service dependencies, and stringent uptime requirements is dramatically affecting how individual applications are designed and deployed. At the same time, with the increased popularity of microservices, dev and ops teams are working with smaller, more frequently deployed services.
Docker’s health check instruction supplies important additional information for running containers in these increasingly complex environments. While the “docker ps” command makes it easy to determine if a container is running, a health check lets you specify a command in a Dockerfile that determines readiness in a container-specific way. We’ll go through a simple example of using the instruction with an existing Node.js application that we created for an earlier post, The Modern Developer Workstation on MacOS With Docker.
Your container is running. But is it healthy?
Apart from processing exit codes, Docker, by design, doesn’t know much about the internal workings of containerized applications. When “docker run” is invoked from the command-line interface (CLI), it often starts a single process specified using the CMD instruction in a Dockerfile. The state of that process determines what appears in the “STATUS” column of “docker ps”. The -a flag lists all containers, running or not, as shown here:
Output of “docker ps -a”
This output indicates that some containers are actively running while others have exited, either successfully or with an error (“Exited (1)”). Unfortunately, even applications that show a status of “Up 16 minutes” may be serving 500 errors or stuck in an infinite loop. Adding HEALTHCHECK to the container’s Dockerfile helps address this issue.
Writing your first health check
Health checks can be any single command. They run inside the container: if the command exits with code 0, the container is reported as healthy, and if it exits with code 1, the container is marked as unhealthy.
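Because only the exit code matters, the check can also live in a small script. The following is a minimal sketch rather than a recommended implementation: the file name healthcheck.sh and the function names are hypothetical, and it assumes curl is installed in the image. It treats 2xx and 3xx responses as healthy, which mirrors what curl’s --fail option considers a success.

```shell
#!/bin/sh
# healthcheck.sh (hypothetical name): exit 0 means healthy, exit 1 means unhealthy.

# Succeed for 2xx and 3xx HTTP status codes, fail for anything else.
is_healthy_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch only the status code; curl prints 000 when it cannot connect.
check_app() {
  code=$(curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$1")
  is_healthy_status "$code"
}

# Run the check only when a URL is passed, e.g. ./healthcheck.sh <url>
[ -z "${1:-}" ] || check_app "$1"
```

The script’s exit status is the status of its last command, so a failed check_app call makes the whole script exit with 1.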
By default, as of the latest Docker 1.12.1 RC release, the health check command runs every 30 seconds, times out after 30 seconds, and must fail three times in a row before the container is marked unhealthy. These values are all configurable and should ideally be related to a service’s SLA requirements and tuned during game-day simulations.
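As a sketch of what that tuning can look like, the same settings can be supplied when a container is started. The “docker run” flags below correspond to the --interval, --timeout, and --retries options of the HEALTHCHECK instruction. The function name, the image name my-node-app, port 3000, and the specific values are all assumptions chosen for illustration, not recommendations.

```shell
# Start a container with a tighter health check than the defaults.
# Assumes the app listens on port 3000 and the image includes curl.
run_with_health_check() {
  docker run --detach \
    --health-cmd='curl --fail http://localhost:3000/ || exit 1' \
    --health-interval=10s \
    --health-timeout=3s \
    --health-retries=5 \
    "$1"
}

# Usage (my-node-app is a placeholder image name):
#   run_with_health_check my-node-app
```

With these illustrative values, a container that stops responding would be marked unhealthy after roughly 50 seconds (five failed checks, ten seconds apart) instead of the default 90.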
Unless it is specifically disabled (with HEALTHCHECK NONE) or overridden in the Dockerfile, a health check is also inherited from base images. This lets developers define a standard health check for similar applications.
For example, a common verification for web-facing application containers checks the HTTP status code from an app running in a container. With the --fail option, curl exits with a nonzero code if the service returns an HTTP error, and “|| exit 1” turns that into the exit code 1 that Docker expects (the service runs on port 3000 in this example):
HEALTHCHECK CMD curl --fail http://localhost:3000/ || exit 1
After the container image is rebuilt and run, the health state appears in parentheses in the STATUS column of docker ps, for example “Up 2 minutes (healthy)”.
If a code change causes the application to send an error status code (such as 500), the container health status changes to unhealthy after three consecutive failures (the default) because the check command exits with code 1.
Health checks take time to complete. Until the first health check has finished, the status is shown as “starting”.
Any health change triggers a Docker event (health_status), so you can react to changes without resorting to polling the Docker engine. Although this instruction is still new, it promises to help developers build more resilient software in a variety of scenarios, such as deployment and software load balancing.
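One way to subscribe to those events from the CLI is to filter the “docker events” stream. This is a minimal sketch: the function name is hypothetical, and it assumes a Docker version in which filtering on the bare health_status event name matches both the healthy and unhealthy variants.

```shell
# Stream health changes for one container instead of polling the engine.
# The command blocks and prints one line per health change until interrupted.
watch_health_events() {
  docker events \
    --filter event=health_status \
    --filter container="$1"
}

# Usage (replace your-container-name with a real container):
#   watch_health_events your-container-name
```

Piping this output into a script is one simple way to trigger an alert or a restart when a container turns unhealthy.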
Finally, for debugging health checks, the docker inspect command lets you view the output of commands that succeed or fail (here in JSON format):
docker inspect --format='{{json .State.Health}}' your-container-name
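The same command can drive a simple wait loop, for example in a deploy script that should not continue until a container is ready. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the container must define a health check (otherwise the template has no Health field to read), and the one-second polling interval is arbitrary.

```shell
# Print the current health status: starting, healthy, or unhealthy.
health_status() {
  docker inspect --format='{{.State.Health.Status}}' "$1"
}

# Poll until the container reports healthy; give up after N attempts (default 30).
wait_until_healthy() {
  container=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(health_status "$container")" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_until_healthy your-container-name 60 || echo "container never became healthy"
```

Polling like this is the fallback when subscribing to health events is not practical.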
Health check meets complex deploys and orchestration
As of early July 2016, the new orchestration features in Docker Swarm mode services use health checks to manage zero-downtime deployments. Swarm starts routing traffic to a container instance only after its status changes to “healthy.” This pattern is also used by other popular orchestration frameworks. For development and operations teams, health checks can help avoid an unfortunate failure scenario: code deployed to production that runs but always returns a mysterious internal server error due to a configuration error.
Regardless of whether your team is using bleeding-edge orchestration or not, Docker’s configurable health check instruction allows teams to define their own idea of what makes their application running in a container “ready.” As discussed in my article Writing Better HTTP Health Checks for Complex Infrastructure, the time and effort put into this pattern is ultimately about team health. Service health checks can make increasingly large and complex systems easier to manage, deploy, and troubleshoot.
Thanks to Mike Goelzer of Docker for his invaluable suggestions for this post.
About the Author
Clay Smith is a Developer Advocate at New Relic in San Francisco. He previously worked at early-stage software companies as a senior software engineer, including founding the mobile engineering team at PagerDuty and shipping one of the first iOS apps written in Swift.