  1. {% extends "front/base_docs.html" %}
  2. {% load compress static hc_extras %}
  3. {% block title %}Documentation - {% site_name %}{% endblock %}
  4. {% block description %}
<meta name="description" content="Monitor any service that can make an HTTP request or send an email: cron jobs, Bash scripts, Python, Ruby, Node, PHP, JS, ...">
  6. {% endblock %}
  7. {% block keywords %}
  8. <meta name="keywords" content="healthchecks, crontab monitoring, python health check, bash health check, cron monitoring, cron tutorial, cron howto, api health check, open source">
  9. {% endblock %}
  10. {% block docs_content %}
  11. <h2>How {% site_name %} Works</h2>
  12. <p>
Each check on the <a href="{% url 'hc-index' %}">My Checks</a>
page has a unique "ping" URL. Whenever you make an HTTP request to this URL,
{% site_name %} records the request and updates the "Last Ping" value of
the corresponding check.
  17. </p>
<p>When a configurable amount of time passes since the last received ping,
the check is considered "late". {% site_name %} then
waits for an additional period (configured with the "Grace Time" parameter) and,
if there is still no ping, sends you an alert.</p>
  22. <p>As long as the monitored service sends pings on time, you receive no
  23. alerts. As soon as it fails to check in on time, you get notified.
  24. It is a simple idea.</p>
  25. <h2 class="rule">Signalling a Success</h2>
  26. <p>
  27. At the end of your batch job, add a bit of code to request
  28. your ping URL.
  29. </p>
  30. <ul>
  31. <li>HTTP and HTTPS protocols both work.
  32. Prefer HTTPS, but on old systems you may need to fall back to HTTP.</li>
  33. <li>Request method can be GET, POST or HEAD</li>
  34. <li>Both IPv4 and IPv6 work</li>
  35. <li>
  36. For HTTP POST requests, you can include additional diagnostic information
  37. for your own reference in the request body. If the request body looks
  38. like a UTF-8 string, {% site_name %} will log the first 10 kilobytes
  39. of the request body, so you can inspect it later.
  40. </li>
  41. </ul>
<p>The response will have the status code "200 OK", and the response body will be
the short string "OK".</p>
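<p>The rules above can be sketched in Python using only the standard library.
This is a minimal illustration, not an official client: the ping URL and the
<code>build_ping_request</code> and <code>send_ping</code> helper names are
hypothetical. A plain GET is used when there is nothing to report, and a POST
with a UTF-8 body when diagnostic text should be logged.</p>

```python
import urllib.request

# Hypothetical ping URL, used only for illustration.
PING_URL = "https://hc.example.org/your-uuid-here"


def build_ping_request(ping_url, diagnostics=None):
    """Build (but do not send) a request that signals success.

    Without diagnostics this is a plain GET. With diagnostics, the text
    is UTF-8 encoded and carried in the body of a POST request.
    """
    if diagnostics is None:
        return urllib.request.Request(ping_url, method="GET")
    body = diagnostics.encode("utf-8")
    return urllib.request.Request(ping_url, data=body, method="POST")


def send_ping(ping_url, diagnostics=None, timeout=10):
    """Send the ping and return the HTTP status code (200 is expected)."""
    request = build_ping_request(ping_url, diagnostics)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

<p>Calling <code>send_ping(PING_URL, "backup finished")</code> at the end of a
job would send the text as the request body. Building the request separately
from sending it keeps the logic easy to test without network access.</p>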
  44. <a name="fail-event"></a>
  45. <h2 class="rule">Signalling a Failure</h2>
  46. <p>
Append <code>/fail</code> to a ping URL and use it to actively signal a
failure. Requesting the <code>/fail</code> URL will immediately mark the
check as "down". You can use this feature to minimize the delay between
your monitored service failing and you receiving a notification.
  51. </p>
  52. <p>Below is a skeleton code example in Python which signals a failure when the
  53. work function returns an unexpected value or throws an exception:</p>
  54. {% include "front/snippets/python_requests_fail.html" %}
  55. <a name="start-event"></a>
  56. <h2 class="rule">Measuring Job Execution Time</h2>
  57. <p>
  58. Append <code>/start</code> to a ping URL and use it to signal
  59. when a job starts. After receiving a start signal, {% site_name %}
  60. will show the check as "Started". It will store the "start" events and
  61. display the job execution times. The job execution times are calculated as the time
  62. gaps between adjacent "start" and "complete" events.
  63. </p>
  64. <p>
  65. Signalling a start kicks off a separate timer: the job
  66. now <strong>must</strong> signal a success within its configured
  67. "Grace Time", or it will get marked as "down".
  68. </p>
  69. <p>Below is a code example in Python:</p>
  70. {% include "front/snippets/python_requests_start.html" %}
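<p>To illustrate how execution times follow from the recorded events, here is
a rough sketch of pairing each "start" with the next "complete" event. The
<code>execution_times</code> function and the event tuples are assumptions
made for this example and do not reflect how {% site_name %} stores events
internally.</p>

```python
from datetime import datetime, timedelta


def execution_times(events):
    """Return the durations between each "start" and the next "complete".

    `events` is a chronologically ordered list of (kind, timestamp)
    tuples, where kind is "start" or "complete". A "complete" event
    with no preceding "start" is skipped, since no duration is known.
    """
    durations = []
    started_at = None
    for kind, timestamp in events:
        if kind == "start":
            started_at = timestamp
        elif kind == "complete" and started_at is not None:
            durations.append(timestamp - started_at)
            started_at = None
    return durations


# Illustrative data: two runs, one day apart.
first_run = datetime(2020, 1, 1, 12, 0, 0)
second_run = first_run + timedelta(days=1)
sample_events = [
    ("start", first_run),
    ("complete", first_run + timedelta(seconds=50)),
    ("start", second_run),
    ("complete", second_run + timedelta(seconds=60)),
]
sample_durations = execution_times(sample_events)
```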
  71. <h2 class="rule">Examples</h2>
  72. <p>
  73. Jump to example:
  74. <a href="#crontab">Crontab</a>,
  75. <a href="#bash">Bash</a>,
  76. <a href="#python">Python</a>,
  77. <a href="#ruby">Ruby</a>,
  78. <a href="#node">Node</a>,
  79. <a href="#php">PHP</a>,
  80. <a href="#cs">C#</a>,
  81. <a href="#browser">Browser</a>,
  82. <a href="#powershell">PowerShell</a>,
  83. <a href="#email">Email</a>.
  84. </p>
  85. <a name="crontab"></a>
  86. <h3 class="docs-example">Crontab</h3>
  87. <p>
When using cron, probably the easiest approach is to append a curl
or wget call after your command. At the scheduled time,
your command runs. If it completes successfully (exit code 0),
curl or wget makes an HTTP GET request to the ping URL.
  92. </p>
  93. {% include "front/snippets/crontab.html" %}
  94. <p>With this simple modification, you monitor several failure
  95. scenarios:</p>
  96. <ul>
  97. <li>The whole machine has stopped working (power outage, janitor stumbles on wires, VPS provider problems, etc.) </li>
  98. <li>cron daemon is not running, or has invalid configuration</li>
  99. <li>cron does start your task, but the task exits with non-zero exit code</li>
  100. </ul>
  101. <p>Either way, when your task doesn't finish successfully, you will soon
  102. know about it.</p>
  103. <p>The extra options to curl are meant to suppress any output, unless it hits
  104. an error. This is to prevent cron from sending an email every time the
  105. task runs. Feel free to adjust the curl options to your liking.
  106. </p>
  107. <table class="table curl-opts">
  108. <tr>
  109. <th>&amp;&amp;</th>
  110. <td>Run curl only if <code>/home/user/backup.sh</code> succeeds</td>
  111. </tr>
  112. <tr>
  113. <th>
  114. -f, --fail
  115. </th>
<td>Makes curl treat HTTP error responses (status code 400 and above) as errors</td>
  117. </tr>
  118. <tr>
  119. <th>-s, --silent</th>
  120. <td>Silent or quiet mode. Don't show progress meter or error messages.</td>
  121. </tr>
  122. <tr>
  123. <th>-S, --show-error</th>
<td>When used with -s, makes curl show an error message if it fails.</td>
  125. </tr>
  126. <tr>
  127. <th>--retry &lt;num&gt;</th>
  128. <td>
  129. If a transient error is returned when curl tries to perform a
  130. transfer, it will retry this number of times before giving up.
  131. Setting the number to 0 makes curl do no retries
  132. (which is the default). Transient error means either: a timeout,
  133. an FTP 4xx response code or an HTTP 5xx response code.
  134. </td>
  135. </tr>
  136. <tr>
  137. <th>&gt; /dev/null</th>
  138. <td>
Redirect curl's stdout to /dev/null (error messages go to stderr)
  140. </td>
  141. </tr>
  142. </table>
  143. <a name="bash"></a>
  144. <h3 class="docs-example">Bash or a shell script</h3>
<p>Both <code>curl</code> and <code>wget</code> examples accomplish the same
thing: they send an HTTP GET request.</p>
  147. <p>
  148. If using <code>curl</code>, make sure it is installed on your target system.
  149. Ubuntu, for example, does not have curl installed out of the box.
  150. </p>
  151. {% include "front/snippets/bash_curl.html" %}
  152. {% include "front/snippets/bash_wget.html" %}
  153. <a name="python"></a>
  154. <h3 class="docs-example">Python</h3>
  155. <p>
  156. If you are already using the
  157. <a href="http://docs.python-requests.org/en/master/">requests</a> library,
  158. it's convenient to also use it here:
  159. </p>
  160. {% include "front/snippets/python_requests.html" %}
  161. <p>
  162. Otherwise, you can use the <code>urllib</code> standard module.
  163. </p>
  164. {% include "front/snippets/python_urllib2.html" %}
  165. <p>
You can include additional diagnostic information
in the request body (for POST requests), or in the "User-Agent"
request header:
  169. </p>
  170. {% include "front/snippets/python_requests_payload.html" %}
  171. <a name="ruby"></a>
  172. <h3 class="docs-example">Ruby</h3>
  173. {% include "front/snippets/ruby.html" %}
  174. <a name="node"></a>
  175. <h3 class="docs-example">Node</h3>
  176. {% include "front/snippets/node.html" %}
  177. <a name="php"></a>
  178. <h3 class="docs-example">PHP</h3>
  179. {% include "front/snippets/php.html" %}
  180. <a name="cs"></a>
  181. <h3 class="docs-example">C#</h3>
  182. {% include "front/snippets/cs.html" %}
  183. <a name="browser"></a>
  184. <h3>Browser</h3>
  185. <p>
{% site_name %} includes the <code>Access-Control-Allow-Origin:*</code>
CORS header in its ping responses, so cross-domain AJAX requests
should work.
  189. </p>
  190. {% include "front/snippets/browser.html" %}
  191. <a name="powershell"></a>
  192. <h3 class="docs-example">PowerShell</h3>
  193. <p>
  194. You can use <a href="https://msdn.microsoft.com/en-us/powershell/mt173057.aspx">PowerShell</a>
  195. and Windows Task Scheduler to automate various tasks on a Windows system.
  196. From within a PowerShell script it is also easy to ping {% site_name %}.
  197. </p>
  198. <p>Here is a simple PowerShell script that pings {% site_name %}.
  199. When scheduled to run with Task Scheduler, it will essentially
  200. just send regular "I'm alive" messages. You can of course extend it to
  201. do more things.</p>
  202. {% include "front/snippets/powershell.html" %}
  203. <p>Save the above to e.g. <code>C:\Scripts\healthchecks.ps1</code>. Then use
  204. the following command in a Scheduled Task to run the script:
  205. </p>
  206. <div class="highlight">
  207. <pre>powershell.exe -ExecutionPolicy bypass -File C:\Scripts\healthchecks.ps1</pre>
  208. </div>
  209. <p>In simple cases, you can also pass the script to PowerShell directly,
  210. using the "-command" argument:</p>
  211. {% include "front/snippets/powershell_inline.html" %}
  212. <a name="email"></a>
  213. <h3 class="docs-example">Email</h3>
  214. <p>
  215. As an alternative to HTTP/HTTPS requests,
  216. you can "ping" this check by sending an
  217. email message to <strong>{{ ping_email }}</strong>
  218. </p>
  219. <p>
This is useful for end-to-end testing of email delivery, such as weekly email reports.
  221. </p>
  222. <p>
An example scenario: you have a cron job which runs weekly and
sends weekly email reports to a list of email addresses. You have already
set up a check to get alerted when your cron job fails to run.
But what you ultimately want to check is that your emails <em>get sent and
get delivered</em>.
  228. </p>
  229. <p>
  230. The solution: set up another check, and add its
  231. @hchk.io address to your list of recipient email addresses. Set its
  232. Period to 1 week. As long as your weekly email script runs correctly,
  233. the check will be regularly pinged and will stay up.
  234. </p>
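<p>As a rough sketch of this setup, the weekly report script could add the
check's address to its recipient list when it builds the message. The
addresses, the <code>PING_EMAIL</code> value and the
<code>build_weekly_report</code> helper are hypothetical, and the message is
only constructed here, not sent (sending could be done with
<code>smtplib</code> from the standard library).</p>

```python
from email.message import EmailMessage

# Hypothetical address of the email check, used only for illustration.
PING_EMAIL = "your-uuid-here@hchk.io"


def build_weekly_report(sender, recipients, ping_email, body):
    """Build the weekly report with the check's address as an extra recipient.

    When the message is delivered to `ping_email`, the check is pinged,
    which confirms the report was both sent and delivered.
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(list(recipients) + [ping_email])
    message["Subject"] = "Weekly report"
    message.set_content(body)
    return message


report = build_weekly_report(
    "reports@example.org",
    ["alice@example.org", "bob@example.org"],
    PING_EMAIL,
    "All systems normal this week.",
)
```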
  235. <h2 class="rule">When Alerts Are Sent</h2>
  236. <p>
Each check has a configurable <strong>Period</strong> parameter, with a default value of one day.
  238. For periodic tasks, this is the expected time gap between two runs.
  239. </p>
  240. <p>
Additionally, each check has a <strong>Grace</strong> parameter, with a default value of one hour.
  242. You can use this parameter to account for run time variance of tasks.
  243. For example, if a backup task completes in 50 seconds one day, and
  244. completes in 60 seconds the following day, you might not want to get
  245. alerted because the backups are 10 seconds late.
  246. </p>
  247. <p>Each check can be in one of the following states:</p>
  248. <table class="table">
  249. <tr>
  250. <td>
  251. <span class="status icon-new"></span>
  252. </td>
  253. <td>
  254. <strong>New.</strong>
  255. A check that has been created, but has not received any pings yet.
  256. </td>
  257. </tr>
  258. <tr>
  259. <td>
  260. <span class="status icon-paused"></span>
  261. </td>
  262. <td>
  263. <strong>Monitoring Paused.</strong>
  264. You can resume monitoring of a paused check by pinging it.
  265. </td>
  266. </tr>
  267. <tr>
  268. <td>
  269. <span class="status icon-started"></span>
  270. </td>
  271. <td>
  272. <strong>Started.</strong>
  273. The check has received a "start" signal, and is currently running.
  274. </td>
  275. </tr>
  276. <tr>
  277. <td>
  278. <span class="status icon-up"></span>
  279. </td>
  280. <td>
  281. <strong>Up.</strong>
  282. Time since last ping has not exceeded <strong>Period</strong>.
  283. </td>
  284. </tr>
  285. <tr>
  286. <td>
  287. <span class="status icon-grace"></span>
  288. </td>
  289. <td>
  290. <strong>Late.</strong>
  291. Time since last ping has exceeded <strong>Period</strong>,
  292. but has not yet exceeded <strong>Period</strong> + <strong>Grace</strong>.
  293. </td>
  294. </tr>
  295. <tr>
  296. <td>
  297. <span class="status icon-down"></span>
  298. </td>
  299. <td>
  300. <p><strong>Down.</strong> The check has not received a "success"
  301. ping in time, or it has received an explicit "fail" signal.
  302. </p>
  303. <p>
  304. When a check goes into the "Down" state, {% site_name %}
  305. sends you an alert.
  306. </p>
  307. </td>
  308. </tr>
  309. </table>
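<p>The states in the table above can be summarised with a short Python sketch.
This is a simplified model for illustration only, not the actual
{% site_name %} implementation: the <code>check_status</code> function, its
parameters and the order in which conditions are evaluated are assumptions.
Here <code>last_start</code> stands for the time of a "start" signal that has
not yet been followed by a success ping.</p>

```python
from datetime import datetime, timedelta


def check_status(now, last_ping, period, grace, *,
                 paused=False, failed=False, last_start=None):
    """Return the state of a check as a lowercase string (simplified model)."""
    if paused:
        return "paused"
    if failed:
        # An explicit "fail" signal marks the check as down immediately.
        return "down"
    if last_start is not None:
        # A running job must report success within the grace time.
        if now - last_start > grace:
            return "down"
        return "started"
    if last_ping is None:
        return "new"
    elapsed = now - last_ping
    if elapsed <= period:
        return "up"
    if elapsed <= period + grace:
        return "late"
    return "down"


# Illustrative values: the default period of one day and grace of one hour.
now = datetime(2020, 1, 2, 12, 0)
period = timedelta(days=1)
grace = timedelta(hours=1)
status = check_status(now, now - timedelta(hours=24, minutes=30), period, grace)
```

<p>With these values the last ping arrived 24.5 hours ago, which is past the
period but still within period plus grace, so the sketch reports the check
as "late".</p>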
  310. {% endblock %}
  311. {% block scripts %}
  312. {% compress js %}
  313. <script src="{% static 'js/jquery-2.1.4.min.js' %}"></script>
  314. <script src="{% static 'js/bootstrap.min.js' %}"></script>
  315. <script src="{% static 'js/clipboard.min.js' %}"></script>
  316. <script src="{% static 'js/snippet-copy.js' %}"></script>
  317. {% endcompress %}
  318. {% endblock %}