|
|
@ -1,368 +0,0 @@ |
|
|
|
{% extends "front/base_docs.html" %} |
|
|
|
{% load compress static hc_extras %} |
|
|
|
|
|
|
|
{% block title %}Documentation - {% site_name %}{% endblock %} |
|
|
|
{% block description %} |
|
|
|
<meta name="description" content="Monitor any service that can make a HTTP request or send an email: cron jobs, Bash scripts, Python, Ruby, Node, PHP, JS, ..."> |
|
|
|
{% endblock %} |
|
|
|
{% block keywords %} |
|
|
|
<meta name="keywords" content="healthchecks, crontab monitoring, python health check, bash health check, cron monitoring, cron tutorial, cron howto, api health check, open source"> |
|
|
|
{% endblock %} |
|
|
|
|
|
|
|
|
|
|
|
{% block docs_content %} |
|
|
|
|
|
|
|
<h2>How {% site_name %} Works</h2> |
|
|
|
<p> |
|
|
|
Each check in <a href="{% url 'hc-index' %}">My Checks</a> |
|
|
|
page has a unique "ping" URL. Whenever you make a HTTP request to this URL, |
|
|
|
{% site_name %} records the request and updates the "Last Ping" value of |
|
|
|
the corresponding check. |
|
|
|
</p> |
|
|
|
|
|
|
|
<p>When a certain, configurable amount of time passes since last received ping, |
|
|
|
the check is considered "late". {% site_name %} then |
|
|
|
waits for additional time (configured with the "Grace Time" parameter) and, |
|
|
|
if still no ping, sends you an alert.</p> |
|
|
|
|
|
|
|
<p>As long as the monitored service sends pings on time, you receive no |
|
|
|
alerts. As soon as it fails to check in on time, you get notified. |
|
|
|
It is a simple idea.</p> |
|
|
|
|
|
|
|
<h2 class="rule">Signalling a Success</h2> |
|
|
|
|
|
|
|
<p> |
|
|
|
At the end of your batch job, add a bit of code to request |
|
|
|
your ping URL. |
|
|
|
</p> |
|
|
|
<ul> |
|
|
|
<li>HTTP and HTTPS protocols both work. |
|
|
|
Prefer HTTPS, but on old systems you may need to fall back to HTTP.</li> |
|
|
|
<li>Request method can be GET, POST or HEAD</li> |
|
|
|
<li>Both IPv4 and IPv6 work</li> |
|
|
|
<li> |
|
|
|
For HTTP POST requests, you can include additional diagnostic information |
|
|
|
for your own reference in the request body. If the request body looks |
|
|
|
like a UTF-8 string, {% site_name %} will log the first 10 kilobytes |
|
|
|
of the request body, so you can inspect it later. |
|
|
|
</li> |
|
|
|
</ul> |
|
|
|
|
|
|
|
<p>The response will have status code "200 OK" and response body will be a |
|
|
|
short and simple string "OK".</p> |
|
|
|
|
|
|
|
<a name="fail-event"></a> |
|
|
|
<h2 class="rule">Signalling a Failure</h2> |
|
|
|
<p> |
|
|
|
Append <code>/fail</code> to a ping URL and use it to actively signal a |
|
|
|
failure. Requesting the <code>/fail</code> URL will immediately mark the |
|
|
|
check as "down". You can use this feature to minimize the delay from |
|
|
|
your monitored service failing to you getting a notification. |
|
|
|
</p> |
|
|
|
<p>Below is a skeleton code example in Python which signals a failure when the |
|
|
|
work function returns an unexpected value or throws an exception:</p> |
|
|
|
|
|
|
|
{% include "front/snippets/python_requests_fail.html" %} |
|
|
|
|
|
|
|
<a name="start-event"></a> |
|
|
|
<h2 class="rule">Measuring Job Execution Time</h2> |
|
|
|
<p> |
|
|
|
Append <code>/start</code> to a ping URL and use it to signal |
|
|
|
when a job starts. After receiving a start signal, {% site_name %} |
|
|
|
will show the check as "Started". It will store the "start" events and |
|
|
|
display the job execution times. The job execution times are calculated as the time |
|
|
|
gaps between adjacent "start" and "complete" events. |
|
|
|
</p> |
|
|
|
<p> |
|
|
|
Signalling a start kicks off a separate timer: the job |
|
|
|
now <strong>must</strong> signal a success within its configured |
|
|
|
"Grace Time", or it will get marked as "down". |
|
|
|
</p> |
|
|
|
|
|
|
|
<p>Below is a code example in Python:</p> |
|
|
|
|
|
|
|
{% include "front/snippets/python_requests_start.html" %} |
|
|
|
|
|
|
|
<h2 class="rule">Examples</h2> |
|
|
|
|
|
|
|
<p> |
|
|
|
Jump to example: |
|
|
|
<a href="#crontab">Crontab</a>, |
|
|
|
<a href="#bash">Bash</a>, |
|
|
|
<a href="#python">Python</a>, |
|
|
|
<a href="#ruby">Ruby</a>, |
|
|
|
<a href="#node">Node</a>, |
|
|
|
<a href="#php">PHP</a>, |
|
|
|
<a href="#cs">C#</a>, |
|
|
|
<a href="#browser">Browser</a>, |
|
|
|
<a href="#powershell">PowerShell</a>, |
|
|
|
<a href="#email">Email</a>. |
|
|
|
</p> |
|
|
|
|
|
|
|
<a name="crontab"></a> |
|
|
|
<h3 class="docs-example">Crontab</h3> |
|
|
|
|
|
|
|
<p> |
|
|
|
When using cron, probably the easiest is to append a curl |
|
|
|
or wget call after your command. The scheduled time comes, |
|
|
|
and your command runs. If it completes successfully (exit code 0), |
|
|
|
curl or wget runs a HTTP GET call to the ping URL. |
|
|
|
</p> |
|
|
|
|
|
|
|
{% include "front/snippets/crontab.html" %} |
|
|
|
|
|
|
|
<p>With this simple modification, you monitor several failure |
|
|
|
scenarios:</p> |
|
|
|
|
|
|
|
<ul> |
|
|
|
<li>The whole machine has stopped working (power outage, janitor stumbles on wires, VPS provider problems, etc.) </li> |
|
|
|
<li>cron daemon is not running, or has invalid configuration</li> |
|
|
|
<li>cron does start your task, but the task exits with non-zero exit code</li> |
|
|
|
</ul> |
|
|
|
|
|
|
|
<p>Either way, when your task doesn't finish successfully, you will soon |
|
|
|
know about it.</p> |
|
|
|
|
|
|
|
<p>The extra options to curl are meant to suppress any output, unless it hits |
|
|
|
an error. This is to prevent cron from sending an email every time the |
|
|
|
task runs. Feel free to adjust the curl options to your liking. |
|
|
|
</p> |
|
|
|
<table class="table curl-opts"> |
|
|
|
<tr> |
|
|
|
<th>&&</th> |
|
|
|
<td>Run curl only if <code>/home/user/backup.sh</code> succeeds</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<th> |
|
|
|
-f, --fail |
|
|
|
</th> |
|
|
|
<td>Makes curl treat non-200 responses as errors</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<th>-s, --silent</th> |
|
|
|
<td>Silent or quiet mode. Don't show progress meter or error messages.</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<th>-S, --show-error</th> |
|
|
|
<td>When used with -s it makes curl show error message if it fails.</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<th>--retry <num></th> |
|
|
|
<td> |
|
|
|
If a transient error is returned when curl tries to perform a |
|
|
|
transfer, it will retry this number of times before giving up. |
|
|
|
Setting the number to 0 makes curl do no retries |
|
|
|
(which is the default). Transient error means either: a timeout, |
|
|
|
an FTP 4xx response code or an HTTP 5xx response code. |
|
|
|
</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<th>> /dev/null</th> |
|
|
|
<td> |
|
|
|
Redirect curl's stdout to /dev/null (error messages go to stderr,) |
|
|
|
</td> |
|
|
|
</tr> |
|
|
|
</table> |
|
|
|
|
|
|
|
<a name="bash"></a> |
|
|
|
<h3 class="docs-example">Bash or a shell script</h3> |
|
|
|
|
|
|
|
<p>Both <code>curl</code> and <code>wget</code> examples accomplish the same |
|
|
|
thing: they fire off a HTTP GET method.</p> |
|
|
|
|
|
|
|
<p> |
|
|
|
If using <code>curl</code>, make sure it is installed on your target system. |
|
|
|
Ubuntu, for example, does not have curl installed out of the box. |
|
|
|
</p> |
|
|
|
|
|
|
|
{% include "front/snippets/bash_curl.html" %} |
|
|
|
{% include "front/snippets/bash_wget.html" %} |
|
|
|
|
|
|
|
<a name="python"></a> |
|
|
|
<h3 class="docs-example">Python</h3> |
|
|
|
|
|
|
|
<p> |
|
|
|
If you are already using the |
|
|
|
<a href="http://docs.python-requests.org/en/master/">requests</a> library, |
|
|
|
it's convenient to also use it here: |
|
|
|
</p> |
|
|
|
{% include "front/snippets/python_requests.html" %} |
|
|
|
|
|
|
|
<p> |
|
|
|
Otherwise, you can use the <code>urllib</code> standard module. |
|
|
|
</p> |
|
|
|
|
|
|
|
{% include "front/snippets/python_urllib2.html" %} |
|
|
|
|
|
|
|
<p> |
|
|
|
You can include additional diagnostic information in the |
|
|
|
in the request body (for POST requests), or in the "User-Agent" |
|
|
|
request header: |
|
|
|
</p> |
|
|
|
|
|
|
|
{% include "front/snippets/python_requests_payload.html" %} |
|
|
|
|
|
|
|
<a name="ruby"></a> |
|
|
|
<h3 class="docs-example">Ruby</h3> |
|
|
|
{% include "front/snippets/ruby.html" %} |
|
|
|
|
|
|
|
<a name="node"></a> |
|
|
|
<h3 class="docs-example">Node</h3> |
|
|
|
{% include "front/snippets/node.html" %} |
|
|
|
|
|
|
|
<a name="php"></a> |
|
|
|
<h3 class="docs-example">PHP</h3> |
|
|
|
{% include "front/snippets/php.html" %} |
|
|
|
|
|
|
|
<a name="cs"></a> |
|
|
|
<h3 class="docs-example">C#</h3> |
|
|
|
{% include "front/snippets/cs.html" %} |
|
|
|
|
|
|
|
<a name="browser"></a> |
|
|
|
<h3>Browser</h3> |
|
|
|
<p> |
|
|
|
{% site_name %} includes <code>Access-Control-Allow-Origin:*</code> |
|
|
|
CORS header in its ping responses, so cross-domain AJAX requests |
|
|
|
should work. |
|
|
|
</p> |
|
|
|
{% include "front/snippets/browser.html" %} |
|
|
|
|
|
|
|
<a name="powershell"></a> |
|
|
|
<h3 class="docs-example">PowerShell</h3> |
|
|
|
<p> |
|
|
|
You can use <a href="https://msdn.microsoft.com/en-us/powershell/mt173057.aspx">PowerShell</a> |
|
|
|
and Windows Task Scheduler to automate various tasks on a Windows system. |
|
|
|
From within a PowerShell script it is also easy to ping {% site_name %}. |
|
|
|
</p> |
|
|
|
<p>Here is a simple PowerShell script that pings {% site_name %}. |
|
|
|
When scheduled to run with Task Scheduler, it will essentially |
|
|
|
just send regular "I'm alive" messages. You can of course extend it to |
|
|
|
do more things.</p> |
|
|
|
|
|
|
|
{% include "front/snippets/powershell.html" %} |
|
|
|
|
|
|
|
<p>Save the above to e.g. <code>C:\Scripts\healthchecks.ps1</code>. Then use |
|
|
|
the following command in a Scheduled Task to run the script: |
|
|
|
</p> |
|
|
|
|
|
|
|
<div class="highlight"> |
|
|
|
<pre>powershell.exe -ExecutionPolicy bypass -File C:\Scripts\healthchecks.ps1</pre> |
|
|
|
</div> |
|
|
|
|
|
|
|
<p>In simple cases, you can also pass the script to PowerShell directly, |
|
|
|
using the "-command" argument:</p> |
|
|
|
|
|
|
|
{% include "front/snippets/powershell_inline.html" %} |
|
|
|
|
|
|
|
<a name="email"></a> |
|
|
|
<h3 class="docs-example">Email</h3> |
|
|
|
<p> |
|
|
|
As an alternative to HTTP/HTTPS requests, |
|
|
|
you can "ping" this check by sending an |
|
|
|
email message to <strong>{{ ping_email }}</strong> |
|
|
|
</p> |
|
|
|
<p> |
|
|
|
This is useful for end-to-end testing weekly email delivery. |
|
|
|
</p> |
|
|
|
<p> |
|
|
|
An example scenario: you have a cron job which runs weekly and |
|
|
|
sends weekly email reports to a list of e-mail addresses. You have already |
|
|
|
set up a check to get alerted when your cron job fails to run. |
|
|
|
But what you ultimately want to check is your emails <em>get sent and |
|
|
|
get delivered</em>. |
|
|
|
</p> |
|
|
|
<p> |
|
|
|
The solution: set up another check, and add its |
|
|
|
@{{ ping_email_domain }} address to your list of recipient email addresses. |
|
|
|
Set its Period to 1 week. As long as your weekly email script runs |
|
|
|
correctly, the check will be regularly pinged and will stay up. |
|
|
|
</p> |
|
|
|
|
|
|
|
|
|
|
|
<h2 class="rule">When Alerts Are Sent</h2> |
|
|
|
<p> |
|
|
|
Each check has a configurable <strong>Period</strong> parameter, with the default value of one day. |
|
|
|
For periodic tasks, this is the expected time gap between two runs. |
|
|
|
</p> |
|
|
|
<p> |
|
|
|
Additionally, each check has a <strong>Grace</strong> parameter, with default value of one hour. |
|
|
|
You can use this parameter to account for run time variance of tasks. |
|
|
|
For example, if a backup task completes in 50 seconds one day, and |
|
|
|
completes in 60 seconds the following day, you might not want to get |
|
|
|
alerted because the backups are 10 seconds late. |
|
|
|
</p> |
|
|
|
<p>Each check can be in one of the following states:</p> |
|
|
|
|
|
|
|
<table class="table"> |
|
|
|
<tr> |
|
|
|
<td> |
|
|
|
<span class="status icon-new"></span> |
|
|
|
</td> |
|
|
|
<td> |
|
|
|
<strong>New.</strong> |
|
|
|
A check that has been created, but has not received any pings yet. |
|
|
|
</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<td> |
|
|
|
<span class="status icon-paused"></span> |
|
|
|
</td> |
|
|
|
<td> |
|
|
|
<strong>Monitoring Paused.</strong> |
|
|
|
You can resume monitoring of a paused check by pinging it. |
|
|
|
</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<td> |
|
|
|
<span class="status icon-started"></span> |
|
|
|
</td> |
|
|
|
<td> |
|
|
|
<strong>Started.</strong> |
|
|
|
The check has received a "start" signal, and is currently running. |
|
|
|
</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<td> |
|
|
|
<span class="status icon-up"></span> |
|
|
|
</td> |
|
|
|
<td> |
|
|
|
<strong>Up.</strong> |
|
|
|
Time since last ping has not exceeded <strong>Period</strong>. |
|
|
|
</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<td> |
|
|
|
<span class="status icon-grace"></span> |
|
|
|
</td> |
|
|
|
<td> |
|
|
|
<strong>Late.</strong> |
|
|
|
Time since last ping has exceeded <strong>Period</strong>, |
|
|
|
but has not yet exceeded <strong>Period</strong> + <strong>Grace</strong>. |
|
|
|
</td> |
|
|
|
</tr> |
|
|
|
<tr> |
|
|
|
<td> |
|
|
|
<span class="status icon-down"></span> |
|
|
|
</td> |
|
|
|
<td> |
|
|
|
<p><strong>Down.</strong> The check has not received a "success" |
|
|
|
ping in time, or it has received an explicit "fail" signal. |
|
|
|
</p> |
|
|
|
<p> |
|
|
|
When a check goes into the "Down" state, {% site_name %} |
|
|
|
sends you an alert. |
|
|
|
</p> |
|
|
|
</td> |
|
|
|
</tr> |
|
|
|
</table> |
|
|
|
|
|
|
|
{% endblock %} |
|
|
|
|
|
|
|
{% block scripts %} |
|
|
|
{% compress js %} |
|
|
|
<script src="{% static 'js/jquery-2.1.4.min.js' %}"></script> |
|
|
|
<script src="{% static 'js/bootstrap.min.js' %}"></script> |
|
|
|
<script src="{% static 'js/clipboard.min.js' %}"></script> |
|
|
|
<script src="{% static 'js/snippet-copy.js' %}"></script> |
|
|
|
{% endcompress %} |
|
|
|
{% endblock %} |