diff --git a/static/css/docs.css b/static/css/docs.css index 4d3d8fe8..90950734 100644 --- a/static/css/docs.css +++ b/static/css/docs.css @@ -109,6 +109,7 @@ h2.rule { .page-docs dt, .page-docs dd { border-top: 1px solid var(--border-color); padding: 8px 0; + line-height: 1.8; } .rule + p code { @@ -116,3 +117,11 @@ h2.rule { font-weight: bold; padding: 2px 4px; } + +.docs-introduction dl { + grid-template-columns: 48px auto; +} + +.docs-introduction dt { + text-align: center; +} diff --git a/templates/docs/configuring_checks.html b/templates/docs/configuring_checks.html index fbb6ae9b..d299e833 100644 --- a/templates/docs/configuring_checks.html +++ b/templates/docs/configuring_checks.html @@ -3,7 +3,7 @@ monitor. For example, when monitoring cron jobs, you would create a separate check for each cron job to be monitored. SITE_NAME pricing plans are structured primarily around how many checks you can have in your account. You can create checks -either in SITE_NAME web interface or by calling Management API.
+either in the SITE_NAME web interface or via the Management API. Describe each check using optional name, tags, and description fields.
@@ -33,10 +33,11 @@ is late. Use this parameter to account for small, expected deviations in job execution times. Note: if you use the "start" signal to measure job run times, -then Grace Time also specifies how long the job is expected to run. Whenever SITE_NAME -receives a "start" signal, it expects to receive a subsequent "success" signal -within Grace Time. If the success signal does not arrive within the configured -Grace Time, SITE_NAME will mark the check as failed and send out alerts.
+then Grace Time also specifies the maximum allowed time gap between "start" and +"success" signals. Whenever SITE_NAME receives a "start" signal, it expects to +receive a subsequent "success" signal within Grace Time. If the success signal does +not arrive within the configured Grace Time, SITE_NAME will mark the check as failed +and send out alerts. Use "cron" for monitoring processes with more complex schedules. This monitoring mode ensures that jobs run at the correct time, and not just at the correct time intervals.
diff --git a/templates/docs/configuring_checks.md b/templates/docs/configuring_checks.md index d0a9082f..5c88ad34 100644 --- a/templates/docs/configuring_checks.md +++ b/templates/docs/configuring_checks.md @@ -4,7 +4,7 @@ In SITE_NAME, a **Check** represents a single service you want to monitor. For example, when monitoring cron jobs, you would create a separate check for each cron job to be monitored. SITE_NAME pricing plans are structured primarily around how many checks you can have in your account. You can create checks -either in SITE_NAME web interface or by calling [Management API](../api/). +either in the SITE_NAME web interface or via the [Management API](../api/). ## Name, Tags, Description @@ -40,10 +40,11 @@ is late. Use this parameter to account for small, expected deviations in job execution times. Note: if you use the "start" signal to [measure job run times](../measuring_script_run_time/), -then Grace Time also specifies how long the job is expected to run. Whenever SITE_NAME -receives a "start" signal, it expects to receive a subsequent "success" signal -within Grace Time. If the success signal does not arrive within the configured -Grace Time, SITE_NAME will mark the check as failed and send out alerts. +then Grace Time also specifies the maximum allowed time gap between "start" and +"success" signals. Whenever SITE_NAME receives a "start" signal, it expects to +receive a subsequent "success" signal within Grace Time. If the success signal does +not arrive within the configured Grace Time, SITE_NAME will mark the check as failed +and send out alerts. ## Cron Schedules diff --git a/templates/docs/introduction.html b/templates/docs/introduction.html index 60ec6a4b..24d46043 100644 --- a/templates/docs/introduction.html +++ b/templates/docs/introduction.html @@ -24,4 +24,68 @@ run continuously or on a regular, known schedule. For example:
A Check represents a single service you want to monitor. 
For example, when +monitoring cron jobs, you would create a separate check for +each cron job to be monitored. Each check has a unique ping URL, a set schedule, +and associated integrations. For the available configuration options, see +Configuring checks.
+Each check is always in one of the following states, depicted by a status icon:
+Ping URL. Each check has a unique Ping URL. Clients (cron jobs, background +workers, batch scripts, scheduled tasks, web services) make HTTP requests to the +ping URL to signal the start of an execution, a success, or a failure.
+While the "success" signals are essential, the "start" and "failure" signals are optional. +You don't have to use them, but they provide additional monitoring insights. +See Measuring script run time and +Signaling failures for details.
+You should treat ping URLs as secrets. If you make them public, anybody can send +telemetry signals to your checks and mess with your monitoring.
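A minimal Python sketch of a job wrapper that sends the three signals described above. It assumes that "start" and "failure" signals are sent by appending `/start` and `/fail` to the check's ping URL, and that a plain HTTP GET request counts as a ping; the helper names (`http_get`, `run_monitored`) and the placeholder URL are hypothetical. The `send` parameter is injectable so the wrapper can be exercised without network access.

```python
import urllib.request
from typing import Callable

# Hypothetical placeholder; the real ping URL is shown on the check's page.
PING_URL = "https://example.org/ping/your-check-uuid"


def http_get(url: str) -> None:
    """Send one ping as an HTTP GET request.

    Network errors (URLError and timeouts are OSError subclasses) are
    ignored so that a monitoring outage never breaks the job itself.
    """
    try:
        with urllib.request.urlopen(url, timeout=10):
            pass
    except OSError:
        pass


def run_monitored(job: Callable[[], None], ping_url: str = PING_URL,
                  send: Callable[[str], None] = http_get) -> bool:
    """Run `job`, signaling "start" before it and "success" or "failure" after."""
    send(ping_url + "/start")  # optional: lets SITE_NAME measure run time
    try:
        job()
    except Exception:
        send(ping_url + "/fail")  # optional: explicit "failure" signal
        return False
    send(ping_url)  # the essential "success" signal
    return True
```

Because ping URLs should be treated as secrets, a real script would load the URL from an environment variable or a configuration file rather than hard-coding it, for example `run_monitored(backup, os.environ["PING_URL"])`, where `backup` and `PING_URL` are hypothetical names.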
+Grace Time is one of the configuration parameters you can set for each check. +It is the additional time to wait before sending an alert when a check +is late. Use this parameter to account for small, expected deviations in job +execution times. If you use "start" signals to +measure job execution time, Grace Time also sets the +maximum allowed time gap between "start" and "success" signals. If a job +sends a "start" signal but then does not send a "success" signal within Grace Time, +SITE_NAME will assume the job has failed and send out alerts.
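The Grace Time arithmetic can be illustrated with a short Python sketch. It assumes a simple check that expects one "success" signal per fixed period (cron schedules are not modeled), and the function names `alert_deadlines` and `start_deadline` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def alert_deadlines(last_success: datetime, period: timedelta,
                    grace: timedelta) -> Tuple[datetime, datetime]:
    """Return (late_at, down_at) for a check with a fixed period.

    The next "success" signal is due one period after the previous one.
    Past that point the check is late; once Grace Time has also elapsed,
    it is down and alerts go out.
    """
    late_at = last_success + period
    down_at = late_at + grace
    return late_at, down_at


def start_deadline(started: datetime, grace: timedelta) -> datetime:
    """Latest time a "success" signal may arrive after a "start" signal."""
    return started + grace


# Hourly job, 10-minute Grace Time, last success at 12:00.
late_at, down_at = alert_deadlines(datetime(2024, 1, 1, 12, 0),
                                   timedelta(hours=1), timedelta(minutes=10))
print(late_at, down_at)  # 2024-01-01 13:00:00 2024-01-01 13:10:00

# A run that sent "start" at 13:00 must report "success" by 13:10.
print(start_deadline(datetime(2024, 1, 1, 13, 0), timedelta(minutes=10)))
```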
+An Integration is a specific method for delivering monitoring alerts when checks +change states. SITE_NAME supports many different types of integrations: email, +webhooks, SMS, Slack, PagerDuty, etc. You can set up multiple integrations. +For each check, you can specify which integrations it should use.
+For more information on integrations, see +Configuring notifications.
+Project. To keep things organized, you can group checks and integrations in Projects. +Your account starts with a single default project, but you can create any number +of additional projects as needed. You can transfer existing checks between projects +while preserving their configuration and ping URL.
+Each project has a configurable name, a separate set of API keys, and a separate +project team. The project's team is the set of people you have granted read-only or +read-write access to the project.
+For more information on projects, see Projects and teams.
\ No newline at end of file diff --git a/templates/docs/introduction.md b/templates/docs/introduction.md index d616a916..78230575 100644 --- a/templates/docs/introduction.md +++ b/templates/docs/introduction.md @@ -25,3 +25,88 @@ SITE_NAME is *not* the right tool for: * collecting application performance metrics * error tracking * log aggregation + +## Concepts + +A **Check** represents a single service you want to monitor. For example, when +[monitoring cron jobs](monitoring_cron_jobs/), you would create a separate check for +each cron job to be monitored. Each check has a unique ping URL, a set schedule, +and associated integrations. For the available configuration options, see +[Configuring checks](configuring_checks/). + +Each check is always in one of the following states, depicted by a status icon: + + +: **New**. A newly created check that has not received any pings yet. Each new + check you create will start in this state. + + +: **Up**. All is well, the last "success" signal has arrived on time. + + +: **Late**. The "success" signal is due but has not arrived yet. + It is not yet late by more than the check's configured **Grace Time**. + + +: **Down**. The "success" signal has not arrived yet, and the Grace Time has elapsed. + When a check transitions into the "Down" state, SITE_NAME sends out alert + messages via the configured integrations. + + +: **Paused**. You can manually pause the monitoring of specific checks. For example, + if a frequently running cron job has a known problem, and a fix is scheduled but + not yet ready, you can pause monitoring of the corresponding check temporarily to + avoid unwanted alerts about a known issue. + +SITE_NAME applies an additional alerting rule for jobs that use the /start
signal.
If a job sends a "start" signal but then does not send a "success" signal within its configured Grace Time, SITE_NAME will assume the job has failed. It will mark the job as "down" and send out alerts.
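The state rules above, including the extra rule for the "start" signal, can be condensed into a small classification function. This is a hedged Python sketch, not SITE_NAME's actual implementation: it assumes a fixed-period schedule, naive datetimes, and a hypothetical `check_status` function; explicit "failure" signals and cron schedules are not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional


def check_status(now: datetime, period: timedelta, grace: timedelta,
                 last_success: Optional[datetime] = None,
                 last_start: Optional[datetime] = None,
                 paused: bool = False) -> str:
    """Return "new", "up", "late", "down", or "paused" for a fixed-period check."""
    if paused:
        # Manually paused checks never alert.
        return "paused"
    # Extra rule: a "start" signal not followed by a "success" signal
    # within Grace Time means the run is assumed to have failed.
    run_pending = last_start is not None and (
        last_success is None or last_start > last_success)
    if run_pending and now > last_start + grace:
        return "down"
    if last_success is None:
        # Simplification: a check that has only sent "start" so far is
        # still reported as "new" here.
        return "new"
    due = last_success + period  # when the next "success" signal is expected
    if now <= due:
        return "up"
    if now <= due + grace:
        return "late"  # overdue, but still within Grace Time
    return "down"
```

For an hourly check with a 10-minute Grace Time and a last success at 12:00, this sketch returns "up" at 12:30, "late" at 13:05, and "down" at 13:15.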