Apgar is a quick-and-dirty health checker driver written in Go.
MIT License
We wanted a quick, simple and standardized way of doing health checks for the various services in our environment.
Apgar walks a directory tree (by default /etc/apgar/healthchecks), runs the healthCheck scripts it finds there in parallel to keep the run time as short as possible, and aggregates the results into a directory (/var/lib/apgar by default).
The status directory is then served by a simple standalone web server so the results can be used as health checks by Amazon Load Balancers and Auto Scaling Groups.
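The walk, run, and aggregate flow described above can be sketched roughly as follows. This is an illustration, not apgar-probe's actual source: the function names findChecks, runAll, and overall are made up for the example, the assumption that check scripts are files literally named healthCheck is inferred from the text, and the sketch prints the aggregate result instead of writing it to the status directory.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"os/exec"
	"path/filepath"
	"sync"
)

// findChecks walks root and returns every regular file named checkName.
// (Hypothetical helper; the file-name convention is an assumption.)
func findChecks(root, checkName string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // e.g. root does not exist or is unreadable
		}
		if d.Type().IsRegular() && d.Name() == checkName {
			found = append(found, path)
		}
		return nil
	})
	return found, err
}

// runAll executes every script concurrently and records whether it exited 0.
func runAll(paths []string) map[string]bool {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		passed = make(map[string]bool, len(paths))
	)
	for _, p := range paths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			err := exec.Command(path).Run() // nil only if the script ran and exited 0
			mu.Lock()
			passed[path] = err == nil
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return passed
}

// overall reduces the per-check results to one status: any failure fails the host.
func overall(passed map[string]bool) string {
	for _, ok := range passed {
		if !ok {
			return "FAIL"
		}
	}
	return "OK"
}

func main() {
	checks, err := findChecks("/etc/apgar/healthchecks", "healthCheck")
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		return
	}
	results := runAll(checks)
	for path, ok := range results {
		fmt.Printf("%s passed=%v\n", path, ok)
	}
	fmt.Println("overall:", overall(results))
}
```

Note that this version waits for every script before aggregating; it only illustrates discovery and the "any failure means FAIL" reduction.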
Apgar consists of two parts: apgar-server, which serves the health information, and apgar-probe, which collects and aggregates the individual server health checks.
apgar-probe runs the health checks in parallel and reports a failed status as soon as any of them fails; it does not wait until all checks are complete to report failure. Status is scrapable at hostname:9000/status, but you can override the port in config.toml.
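One way to get the "report failure as soon as any check fails" behaviour is to have every check send its result on a channel and return at the first failure. The sketch below is a guess at the pattern rather than apgar-probe's real code: allPass is a hypothetical name, and each func() bool stands in for running one check script and inspecting its exit code.

```go
package main

import (
	"fmt"
	"time"
)

// allPass runs every check concurrently and returns false as soon as any
// check reports failure, without waiting for the checks still running.
func allPass(checks []func() bool) bool {
	// Buffered so goroutines that finish after we return never block.
	results := make(chan bool, len(checks))
	for _, c := range checks {
		go func(check func() bool) { results <- check() }(c)
	}
	for range checks {
		if !<-results {
			return false // fail fast
		}
	}
	return true
}

func main() {
	// Simulated checks: one fails quickly, one succeeds slowly.
	failsFast := func() bool { time.Sleep(10 * time.Millisecond); return false }
	passesSlow := func() bool { time.Sleep(2 * time.Second); return true }

	start := time.Now()
	healthy := allPass([]func() bool{passesSlow, failsFast})
	fmt.Printf("healthy=%v after %v\n", healthy, time.Since(start).Round(time.Millisecond))
}
```

With these simulated checks the failure is reported long before the two-second check finishes. A successful run still has to wait for the slowest check.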
An apgar health check must be:
apgar-probe relies on the exit code of the check to determine OK/FAIL, not any text output of the health check script.
--verbose is passed on the command line. If called with --quiet, it may print nothing at all, but must still exit zero or non-zero.
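A check that follows the exit-code contract could look something like the sketch below. Several things here are assumptions rather than documented behaviour: that detail should be printed only when --verbose is given, that exit code 2 is used for bad arguments, and the check itself (checkPath on the temp directory), which is a trivial stand-in for a real probe of your service. Go's flag package accepts both -verbose and --verbose.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs reads the --verbose and --quiet switches from args.
func parseArgs(args []string) (verbose, quiet bool, err error) {
	fs := flag.NewFlagSet("healthCheck", flag.ContinueOnError)
	fs.BoolVar(&verbose, "verbose", false, "print details about the check")
	fs.BoolVar(&quiet, "quiet", false, "print nothing")
	err = fs.Parse(args)
	return verbose, quiet, err
}

// checkPath is a placeholder check: it passes if path exists.
// Replace it with a real test of the service being monitored.
func checkPath(path string) error {
	_, err := os.Stat(path)
	return err
}

func main() {
	verbose, quiet, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2) // bad arguments still produce a non-zero exit
	}
	target := os.TempDir() // hypothetical thing to check
	if err := checkPath(target); err != nil {
		if verbose && !quiet {
			fmt.Println("FAIL:", err)
		}
		os.Exit(1) // the exit code, not the text, signals failure
	}
	if verbose && !quiet {
		fmt.Println("OK:", target, "exists")
	}
	// Falling off the end of main exits with status 0.
}
```

Whatever it prints, the only thing apgar-probe is described as reading is the exit status.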
apgar-probe walks the healthchecks directory and runs each check script in parallel as soon as it finds it, so check scripts must not assume that they will be run in a particular order, or that other check scripts will not be running at the same time.
Because apgar-probe runs all the checks in parallel, it is better to have 3 separate tests that each run in N milliseconds than one test that runs in 3N milliseconds.
To make it easy to install on both Debian and CentOS based systems, the included Rakefile can build both deb and rpm files: rake deb will build a deb file, and rake rpm will build an rpm file. This requires rake and bundler, but only on your build machine, not on the machines you're going to install Apgar on.
Virginia Apgar invented the Apgar score as a method to quickly summarize the health of newborn children. This seemed like an appropriate name for a quick health check system.
No. apgar-probe is designed to run all the health check scripts it finds in parallel. This allows it to fail the health check as fast as possible: the longest time that apgar-probe will take to determine a machine has failed its health check is the time it takes to run the slowest health check script. You should break up your check scripts into small scripts that each check one aspect of your service instead of large scripts that sequentially test multiple parts of your services.
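The timing argument can be made concrete with a little arithmetic. In the sketch below the durations are invented for illustration and the function names are hypothetical: run in parallel, the worst case is the slowest check, while one script running the same tests back to back takes their sum.

```go
package main

import "fmt"

// worstCaseParallel returns the run time when all checks start together:
// the duration of the slowest check (in milliseconds).
func worstCaseParallel(ms []int) int {
	longest := 0
	for _, d := range ms {
		if d > longest {
			longest = d
		}
	}
	return longest
}

// worstCaseSequential returns the run time when the same checks are
// executed one after another inside a single script.
func worstCaseSequential(ms []int) int {
	total := 0
	for _, d := range ms {
		total += d
	}
	return total
}

func main() {
	checks := []int{40, 120, 75} // made-up check durations in ms
	fmt.Printf("parallel: %d ms, sequential: %d ms\n",
		worstCaseParallel(checks), worstCaseSequential(checks))
	// prints "parallel: 120 ms, sequential: 235 ms"
}
```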
You don't. The workaround is to have a single large check script that runs multiple tests in the order you want, but that will slow down the apgar run.
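If you do need ordering, the single large check might be structured like the sketch below: a list of named steps run strictly in sequence, stopping at the first failure. The step type, the firstFailure helper, and the step names are all hypothetical, and the step bodies are placeholders for real tests.

```go
package main

import (
	"fmt"
	"os"
)

// step is one ordered test inside a single, larger health check.
type step struct {
	name string
	run  func() bool // true means the step passed
}

// firstFailure runs the steps in order and returns the name of the first
// one that fails, or "" if every step passes. Later steps are skipped
// once a failure is seen.
func firstFailure(steps []step) string {
	for _, s := range steps {
		if !s.run() {
			return s.name
		}
	}
	return ""
}

func main() {
	// Placeholder steps; replace the bodies with real tests.
	steps := []step{
		{"database reachable", func() bool { return true }},
		{"schema migrated", func() bool { return true }},
		{"service responds", func() bool { return true }},
	}
	if failed := firstFailure(steps); failed != "" {
		fmt.Fprintln(os.Stderr, "FAIL:", failed)
		os.Exit(1)
	}
}
```

The cost is the one already noted: the run time of this check is the sum of its steps, not the slowest one.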
You are of course free to use your own web server instead of apgar-server, but we opted to provide a stand-alone server for the following reasons:
apgar-server and apgar-probe are written in Go so that using Apgar doesn't pull in any dependencies that might conflict with those needed by the services you actually care about on a given system.
I wanted Apgar to have as little impact on the host system as possible. Go gives us static binaries, so we don't have to worry about dependency conflicts with other services on machines. And it was a good excuse to start learning Go.