{% extends "front/base_docs.html" %} {% load compress staticfiles hc_extras %} {% block title %}Documentation - {% site_name %}{% endblock %} {% block docs_content %}
Each check on the My Checks page has a unique "ping" URL. Whenever you access this URL, the "Last Ping" value of the corresponding check is updated.
When a certain amount of time passes since the last received ping, the check is considered "late", and {% site_name %} sends an email alert. It is a simple idea.
At the end of your batch job, add a bit of code to request your ping URL.
The response will have the status code "200 OK", and the response body will be the short string "OK".
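As a minimal sketch of that last step, a Python batch job could request its ping URL with the standard library alone. The URL below is a placeholder, and the `ping` helper and its `opener` parameter are names made up for this illustration; they are not part of the service.

```python
import urllib.request

# Placeholder: substitute the ping URL shown for your own check.
PING_URL = "https://example.com/ping/your-check-uuid"


def ping(url, timeout=10, opener=urllib.request.urlopen):
    """Send an HTTP GET to the ping URL and return (status code, body text).

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    with opener(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# At the end of the batch job one would call, for example:
#     status, body = ping(PING_URL)
```

A timeout is passed explicitly so that a slow network cannot hang the job indefinitely.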
Here are examples of executing pings from different environments.
When using cron, the easiest approach is probably to append a curl or wget call after your command. When the scheduled time comes, your command runs. If it completes successfully (exit code 0), curl or wget then makes an HTTP GET request to the ping URL.
{% include "front/snippets/crontab.html" %}
With this simple modification, you monitor several failure scenarios:
Either way, when your task doesn't finish successfully, you will soon know about it.
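The same "ping only on success" idea can be sketched in Python for jobs that are launched from a script rather than directly from crontab. The names `run_and_ping` and `http_get` are hypothetical helpers for this illustration, and the command and URL passed in are whatever your own setup uses.

```python
import subprocess
import urllib.request


def http_get(url):
    """Default ping: a plain HTTP GET with a timeout."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()


def run_and_ping(command, ping_url, ping=http_get):
    """Run `command` (a list of arguments); ping only if it exits with 0.

    This mirrors `command && curl ...` in a crontab line: a failing job
    sends no ping, so the check eventually becomes late.
    """
    result = subprocess.run(command)
    if result.returncode == 0:
        ping(ping_url)
    return result.returncode
```

Returning the exit code lets the calling script still report the job's own failure in whatever way it already does.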
The extra options to curl are meant to suppress any output, unless it hits an error. This is to prevent cron from sending an email every time the task runs. Feel free to adjust the curl options to your liking.
Option | Description |
---|---|
&& | Run curl only if /home/user/backup.sh succeeds |
-f, --fail | Makes curl treat HTTP error responses (status 400 and above) as errors |
-s, --silent | Silent or quiet mode. Don't show progress meter or error messages. |
-S, --show-error | When used with -s, makes curl show an error message if it fails. |
--retry <num> | If a transient error is returned when curl tries to perform a transfer, it will retry this number of times before giving up. Setting the number to 0 makes curl do no retries (which is the default). Transient error means either: a timeout, an FTP 4xx response code or an HTTP 5xx response code. |
> /dev/null | Redirect curl's stdout to /dev/null (error messages go to stderr) |
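For readers who ping from Python instead of curl, the retry behaviour described for `--retry` can be approximated as follows. This is a rough sketch and not a reimplementation of curl: it treats only timeouts and HTTP 5xx responses as transient, ignores the FTP case, and waits a fixed delay between attempts. The names `is_transient` and `ping_with_retry`, and the `opener` and `sleep` parameters, are assumptions made for this example.

```python
import socket
import time
import urllib.error
import urllib.request

TIMEOUT_ERRORS = (TimeoutError, socket.timeout)


def is_transient(error):
    """Treat timeouts and HTTP 5xx responses as worth retrying."""
    if isinstance(error, urllib.error.HTTPError):
        return 500 <= error.code < 600
    if isinstance(error, urllib.error.URLError):
        return isinstance(error.reason, TIMEOUT_ERRORS)
    return isinstance(error, TIMEOUT_ERRORS)


def ping_with_retry(url, retries=3, delay=1.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `url`, retrying up to `retries` times on transient errors."""
    attempt = 0
    while True:
        try:
            with opener(url, timeout=10) as response:
                return response.status
        except OSError as error:
            # Give up on permanent errors or once retries are used up.
            if attempt >= retries or not is_transient(error):
                raise
            attempt += 1
            sleep(delay)
```

Errors that are not transient, such as a 404 for a mistyped ping URL, are re-raised immediately so that a misconfiguration is not hidden behind retries.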
Both the curl and wget examples accomplish the same thing: they send an HTTP GET request.
If using curl, make sure it is installed on your target system. Ubuntu, for example, does not have curl installed out of the box.
{% site_name %} includes the Access-Control-Allow-Origin: * CORS header in its ping responses, so cross-domain AJAX requests should work.
You can use PowerShell and Windows Task Scheduler to automate various tasks on a Windows system. From within a PowerShell script it is also easy to ping {% site_name %}.
Here is a simple PowerShell script that pings {% site_name %}. When scheduled to run with Task Scheduler, it will essentially just send regular "I'm alive" messages. You can of course extend it to do more things.
{% include "front/snippets/powershell.html" %}
Save the above to e.g. C:\Scripts\healthchecks.ps1. Then use the following command in a Scheduled Task to run the script:
powershell.exe -ExecutionPolicy bypass -File C:\Scripts\healthchecks.ps1
As an alternative to HTTP/HTTPS requests, you can "ping" this check by sending an email message to {{ check.email }}.
This is useful for end-to-end testing of weekly email delivery.
An example scenario: you have a cron job which runs weekly and sends email reports to a list of email addresses. You have already set up a check to get alerted when your cron job fails to run. But what you ultimately want to check is that your emails get sent and delivered.
The solution: set up another check, and add its @hchk.io address to your list of recipient email addresses. Set its Period to 1 week. As long as your weekly email script runs correctly, the check will be regularly pinged and will stay up.
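As an illustration of that setup, a Python report script might add the check's address when it builds the message. The address below is a made-up placeholder in the @hchk.io form, and `build_weekly_report` is a hypothetical helper; use the address shown for your own check.

```python
from email.message import EmailMessage

# Hypothetical placeholder: use the email address of your own check.
CHECK_ADDRESS = "your-check-uuid@hchk.io"


def build_weekly_report(sender, recipients, body, check_address=CHECK_ADDRESS):
    """Build the report email with the check's address among the recipients."""
    message = EmailMessage()
    message["From"] = sender
    # The check is pinged only if this email is actually delivered to it.
    message["To"] = ", ".join(list(recipients) + [check_address])
    message["Subject"] = "Weekly report"
    message.set_content(body)
    return message
```

The resulting message can then be handed to whatever delivery path the script already uses, for example `smtplib.SMTP(...).send_message(message)`.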
Each check has a configurable Period parameter, with a default value of one day. For periodic tasks, this is the expected time gap between two runs.
Additionally, each check has a Grace parameter, with a default value of one hour. You can use this parameter to account for run time variance of tasks. For example, if a backup task completes in 50 seconds one day, and completes in 60 seconds the following day, you might not want to get alerted because the backups are 10 seconds late.
Each check can be in one of the following states:
New. A check that has been created, but has not received any pings yet.
Monitoring Paused. You can resume monitoring of a paused check by pinging it.
Up. Time since last ping has not exceeded Period.
Late. Time since last ping has exceeded Period, but has not yet exceeded Period + Grace.
Down. Time since last ping has exceeded Period + Grace. When a check goes from "Late" to "Down", {% site_name %} sends you an alert.
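The state rules above can be written out as a small function. This is only a sketch of the rules as stated here, not the service's actual implementation; the function name, the lowercase state strings, and the choice to test the paused flag first are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def check_status(last_ping, now,
                 period=timedelta(days=1),
                 grace=timedelta(hours=1),
                 paused=False):
    """Return the state of a check given its last ping time.

    Defaults follow the documented defaults: Period of one day,
    Grace of one hour. `last_ping` is None if no ping was received.
    """
    if paused:
        return "paused"
    if last_ping is None:
        return "new"
    elapsed = now - last_ping
    if elapsed <= period:
        return "up"
    if elapsed <= period + grace:
        return "late"
    return "down"


# Example: with the defaults, a ping 24.5 hours ago falls inside
# the grace window.
now = datetime(2024, 1, 2, 12, 30)
print(check_status(datetime(2024, 1, 1, 12, 0), now))  # late
```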