run continuously or on a regular, known schedule. For example:</p>
<li>collecting application performance metrics</li>
<li>error tracking</li>
<li>log aggregation</li>
</ul>
<h2>Concepts</h2>
<p>A <strong>Check</strong> represents a single service you want to monitor. For example, when
<a href="monitoring_cron_jobs/">monitoring cron jobs</a>, you would create a separate check for
each cron job to be monitored. Each check has a unique ping URL, a set schedule,
and associated integrations. For the available configuration options, see
<a href="configuring_checks/">Configuring checks</a>.</p>
<p>Each check is always in one of the following states, depicted by a status icon:</p>
<dl>
<dt><span class="status ic-new"></span></dt>
<dd><strong>New</strong>. A newly created check that has not received any pings yet. Each new
check you create will start in this state.</dd>
<dt><span class="status ic-up"></span></dt>
<dd><strong>Up</strong>. All is well: the last "success" signal arrived on time.</dd>
<dt><span class="status ic-grace"></span></dt>
<dd><strong>Late</strong>. The "success" signal is due but has not arrived yet.
The delay has not yet exceeded the check's configured <strong>Grace Time</strong>.</dd>
<dt><span class="status ic-down"></span></dt>
<dd><strong>Down</strong>. The "success" signal has not arrived yet, and the Grace Time has elapsed.
When a check transitions into the "Down" state, SITE_NAME sends out alert
messages via the configured integrations.</dd>
<dt><span class="status ic-paused"></span></dt>
<dd><strong>Paused</strong>. You can manually pause the monitoring of specific checks. For example,
if a frequently running cron job has a known problem, and a fix is scheduled but
not yet ready, you can pause monitoring of the corresponding check temporarily to
avoid unwanted alerts about a known issue.</dd>
<dt><span class="status ic-up"></span><div class="spinner started"><div class="d1"></div><div class="d2"></div><div class="d3"></div></div></dt>
<dd>Additionally, if the most recently received signal is a "start" signal,
this is indicated by three animated dots under the check's status icon.</dd>
</dl>
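<p>The states above can be sketched as a small function. This is only an illustration, not
how SITE_NAME is implemented: it assumes a check with a simple fixed period, and the names
<code>check_status</code>, <code>last_success</code>, <code>period</code>, and <code>grace</code>
are hypothetical.</p>

```python
from datetime import datetime, timedelta
from typing import Optional


def check_status(
    last_success: Optional[datetime],
    period: timedelta,
    grace: timedelta,
    now: datetime,
    paused: bool = False,
) -> str:
    """Return "paused", "new", "up", "late", or "down" for a check.

    Assumption: the next "success" signal is due exactly one period
    after the previous one.
    """
    if paused:
        return "paused"
    if last_success is None:
        # No pings received yet.
        return "new"
    due = last_success + period
    if now < due:
        return "up"
    if now <= due + grace:
        # Due, but not yet late by more than the grace time.
        return "late"
    return "down"
```

<p>With a one-hour period and a ten-minute grace time, a check whose last "success" signal
arrived 65 minutes ago would be reported as late, and as down once 70 minutes have passed.</p>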
<hr />
<p><strong>Ping URL</strong>. Each check has a unique <strong>Ping URL</strong>. Clients (cron jobs, background
workers, batch scripts, scheduled tasks, web services) make HTTP requests to the
ping URL to signal the start of an execution, a success, or a failure.</p>
<p>While "success" signals are essential, "start" and "failure" signals are optional.
You don't have to use them, but they give you additional monitoring insights.
See <a href="measuring_script_run_time/">Measuring script run time</a> and
<a href="signaling_failures/">Signaling failures</a> for details.</p>
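<p>As a rough sketch of how a client might send the three signals, the Python below wraps a
job with "start", "success", and "failure" pings. It assumes that "start" and "failure"
are sent by appending <code>/start</code> and <code>/fail</code> to the ping URL. This section
does not specify the URL format, so check the pages linked above before relying on it. The
helper names are hypothetical.</p>

```python
import urllib.request
from typing import Callable

# Assumed URL suffixes for each signal type; verify against the linked pages.
SIGNAL_SUFFIXES = {"success": "", "start": "/start", "failure": "/fail"}


def signal_url(ping_url: str, signal: str = "success") -> str:
    """Build the URL to request for the given signal type."""
    return ping_url.rstrip("/") + SIGNAL_SUFFIXES[signal]


def send_signal(ping_url: str, signal: str = "success", timeout: float = 10.0) -> int:
    """Send one signal with an HTTP GET request and return the status code."""
    with urllib.request.urlopen(signal_url(ping_url, signal), timeout=timeout) as response:
        return response.status


def run_monitored(
    job: Callable[[], None],
    ping_url: str,
    send: Callable[[str, str], int] = send_signal,
) -> None:
    """Run job, reporting "start" first and then "success" or "failure"."""
    send(ping_url, "start")
    try:
        job()
    except Exception:
        send(ping_url, "failure")
        raise
    send(ping_url, "success")
```

<p>If you only need the essential signal, calling <code>send_signal(ping_url)</code> at the
end of a successful run is enough.</p>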
<p>You should treat ping URLs as secrets. If you make them public, anybody can send
telemetry signals to your checks and interfere with your monitoring.</p>
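<p>One way to keep a ping URL out of source control and logs is to read it from the
environment and redact it before printing. The sketch below is a suggestion, not a
SITE_NAME requirement. The variable name <code>CHECK_PING_URL</code> and both helper
functions are hypothetical.</p>

```python
import os
from urllib.parse import urlsplit, urlunsplit


def load_ping_url(env_var: str = "CHECK_PING_URL") -> str:
    """Read the ping URL from an environment variable instead of hard-coding it."""
    url = os.environ.get(env_var, "").strip()
    if not url:
        raise RuntimeError(f"{env_var} is not set")
    return url


def redact_ping_url(url: str) -> str:
    """Replace the last path segment so the URL can appear in logs safely."""
    parts = urlsplit(url)
    head, _, _ = parts.path.rstrip("/").rpartition("/")
    # Drop the query string and fragment as well; they may carry secrets too.
    return urlunsplit((parts.scheme, parts.netloc, head + "/***", "", ""))
```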
<hr />
<p><strong>Grace Time</strong> is one of the configuration parameters you can set for each check.
It is the additional time to wait before sending an alert when a check
is late. Use this parameter to account for small, expected deviations in job
execution times. If you use "start" signals to
<a href="measuring_script_run_time/">measure job execution time</a>, Grace Time also sets the
maximum allowed time gap between "start" and "success" signals. If a job
sends a "start" signal but then does not send a "success" signal within the Grace Time,
SITE_NAME will assume the job has failed and send out alerts.</p>
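<p>The "start" to "success" rule can be illustrated with a short function. This is a sketch
of the rule as described above, not SITE_NAME's actual code. The name
<code>start_deadline_missed</code> and its parameters are hypothetical.</p>

```python
from datetime import datetime, timedelta
from typing import Optional


def start_deadline_missed(
    started_at: datetime,
    grace: timedelta,
    now: datetime,
    succeeded_at: Optional[datetime] = None,
) -> bool:
    """Return True if no "success" signal arrived within grace time of "start"."""
    deadline = started_at + grace
    # Without a "success" signal, compare the current time to the deadline.
    finished = succeeded_at if succeeded_at is not None else now
    return finished > deadline
```

<p>For example, with a Grace Time of 10 minutes, a job that sent "start" 15 minutes ago and
nothing since would be treated as failed.</p>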
<hr />
<p>An <strong>Integration</strong> is a specific method for delivering monitoring alerts when checks
change states. SITE_NAME supports many different types of integrations: email,
webhooks, SMS, Slack, PagerDuty, etc. You can set up multiple integrations.
For each check, you can specify which integrations it should use.</p>
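<p>To make the relationship between checks and integrations concrete, here is a minimal
sketch that alerts only the integrations assigned to a check when it goes down. The data
layout and the <code>alert_on_down</code> function are hypothetical and only model the
behavior described above.</p>

```python
from typing import Callable, Dict, List

# A notifier receives the check name and its new status.
Notifier = Callable[[str, str], None]


def alert_on_down(
    check_name: str,
    old_status: str,
    new_status: str,
    assigned: List[str],
    integrations: Dict[str, Notifier],
) -> List[str]:
    """Notify the check's assigned integrations when it transitions into "down".

    Returns the names of the integrations that were notified.
    """
    if new_status != "down" or old_status == "down":
        return []
    notified = []
    for name in assigned:
        send = integrations.get(name)
        if send is None:
            # The check refers to an integration that is not set up.
            continue
        send(check_name, new_status)
        notified.append(name)
    return notified
```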
<p>For more information on integrations, see
<a href="configuring_notifications/">Configuring notifications</a>.</p>
<hr />
<p><strong>Project</strong>. To keep things organized, you can group checks and integrations in <strong>Projects</strong>.
Your account starts with a single default project, but you can create any number
of additional projects as needed. You can transfer existing checks between projects
while preserving their configuration and ping URL.</p>
<p>Each project has a configurable name, a separate set of API keys, and a separate
project team. The project's team is the set of people you have granted read-only or
read-write access to the project.</p>
<p>For more information on projects, see <a href="projects_teams/">Projects and teams</a>.</p>