# SITE_NAME Documentation

SITE_NAME is a service for monitoring cron jobs and similar periodic processes:

* SITE_NAME **listens for HTTP requests ("pings")** from your cron jobs and scheduled
  tasks.
* It **keeps silent** as long as pings arrive on time.
* It **raises an alert** as soon as a ping does not arrive on time.

SITE_NAME works as a [dead man's switch](https://en.wikipedia.org/wiki/Dead_man%27s_switch) for processes that need to
run continuously or on a regular, known schedule. For example:

* filesystem backups, database backups
* task queues
* database replication status
* report generation scripts
* periodic data import and sync jobs
* periodic antivirus scans
* DDNS updater scripts
* SSL renewal scripts

SITE_NAME is *not* the right tool for:

* monitoring website uptime by probing it with HTTP requests
* collecting application performance metrics
* error tracking
* log aggregation

## Concepts

A **Check** represents a single service you want to monitor. For example, when
[monitoring cron jobs](monitoring_cron_jobs/), you would create a separate check for
each cron job to be monitored. Each check has a unique ping URL, a set schedule,
and associated integrations. For the available configuration options, see
[Configuring checks](configuring_checks/).

Each check is always in one of the following states, depicted by a status icon:

<span class="status ic-new"></span>
:   **New**. A newly created check that has not received any pings yet. Each new
    check you create will start in this state.

<span class="status ic-up"></span>
:   **Up**. All is well: the last "success" signal has arrived on time.

<span class="status ic-grace"></span>
:   **Late**. The "success" signal is due but has not arrived yet.
    It is not yet late by more than the check's configured **Grace Time**.

<span class="status ic-down"></span>
:   **Down**. The "success" signal has not arrived yet, and the Grace Time has elapsed.
    When a check transitions into the "Down" state, SITE_NAME sends out alert
    messages via the configured integrations.

<span class="status ic-paused"></span>
:   **Paused**. You can manually pause the monitoring of specific checks. For example,
    if a frequently running cron job has a known problem, and a fix is scheduled but
    not yet ready, you can pause monitoring of the corresponding check temporarily to
    avoid unwanted alerts about a known issue.

<span class="status ic-up"></span><div class="spinner started"><div class="d1"></div><div class="d2"></div><div class="d3"></div></div>
:   Additionally, if the most recently received signal is a "start" signal,
    this is indicated by three animated dots under the check's status icon.

---

**Ping URL**. Each check has a unique **Ping URL**. Clients (cron jobs, background
workers, batch scripts, scheduled tasks, web services) make HTTP requests to the
ping URL to signal the start of an execution, a success, or a failure.

While the "success" signals are essential, the "start" and "failure" signals are
optional. You don't have to use them, but they give you additional monitoring
insights. See [Measuring script run time](measuring_script_run_time/) and
[Signaling failures](signaling_failures/) for details.

You should treat ping URLs as secrets. If you make them public, anybody can send
telemetry signals to your checks and interfere with your monitoring.
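
As a minimal sketch of how a client might send these signals from Python: the
URL below is a placeholder, and the `/start` and `/fail` path suffixes and all
function names are assumptions made for illustration. The pages linked above
describe the actual endpoints.

```python
import urllib.request

# Hypothetical placeholder; substitute the ping URL shown for your check.
PING_URL = "https://example.com/ping/your-check-id"

# Assumed path suffixes for the optional signals (not confirmed on this page).
SUFFIXES = {"success": "", "start": "/start", "failure": "/fail"}


def signal_url(ping_url, signal="success"):
    """Return the URL to request for the given signal type."""
    return ping_url.rstrip("/") + SUFFIXES[signal]


def send_signal(ping_url, signal="success", opener=urllib.request.urlopen):
    """Send one signal and return True if the request succeeded.

    Network errors are swallowed so that a monitoring outage
    never breaks the job being monitored.
    """
    try:
        with opener(signal_url(ping_url, signal), timeout=10):
            return True
    except OSError:  # URLError, HTTPError, and socket timeouts are OSErrors
        return False


def run_monitored(job, ping_url=PING_URL, send=send_signal):
    """Run job() bracketed by "start" and "success" or "failure" signals."""
    send(ping_url, "start")
    try:
        result = job()
    except Exception:
        send(ping_url, "failure")
        raise
    send(ping_url, "success")
    return result
```

Wrapping the job this way sends the essential "success" signal only when the
job finishes without raising, and keeps the optional signals in one place.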

---

**Grace Time** is one of the configuration parameters you can set for each check.
It is the additional time to wait before sending an alert when a check
is late. Use this parameter to account for small, expected deviations in job
execution times. If you use "start" signals to
[measure job execution time](measuring_script_run_time/), Grace Time also sets the
maximum allowed time gap between "start" and "success" signals. If a job
sends a "start" signal but then does not send a "success" signal within the
Grace Time, SITE_NAME will assume the job has failed and send out alerts.
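
The interaction between the schedule, Grace Time, and "start" signals can be
sketched as a small function. This is an illustration only, not SITE_NAME's
implementation: it assumes a simple fixed-period schedule, and the
`check_status` name and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def check_status(now, last_success, period, grace, last_start=None, paused=False):
    """Classify a check as "paused", "new", "up", "late", or "down"."""
    if paused:
        return "paused"
    # A "start" with no later "success" must be followed by one within grace.
    started = last_start is not None and (
        last_success is None or last_start > last_success
    )
    if started and now > last_start + grace:
        return "down"
    # Assumption: a check with no "success" signal yet is still "new".
    if last_success is None:
        return "new"
    due = last_success + period  # when the next "success" is expected
    if now <= due:
        return "up"
    if now <= due + grace:
        return "late"
    return "down"


# Example: a daily job with a one-hour Grace Time, last success at midnight.
last_ok = datetime(2024, 1, 1, 0, 0)
status = check_status(
    now=datetime(2024, 1, 2, 0, 30),
    last_success=last_ok,
    period=timedelta(days=1),
    grace=timedelta(hours=1),
)
print(status)  # "late": 30 minutes overdue, still within the Grace Time
```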

---

An **Integration** is a specific method for delivering monitoring alerts when checks
change states. SITE_NAME supports many different types of integrations: email,
webhooks, SMS, Slack, PagerDuty, etc. You can set up multiple integrations.
For each check, you can specify which integrations it should use.

For more information on integrations, see
[Configuring notifications](configuring_notifications/).
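
Conceptually, each check carries its own list of integrations, and an alert
goes to every integration on that list when the check changes state. The sketch
below models an integration as a plain Python callable. The `notify_on_change`
function and the message format are hypothetical and are not part of SITE_NAME.

```python
def notify_on_change(check_name, old_status, new_status, integrations):
    """Deliver one alert per integration when a check's status changes.

    Returns the delivered message, or None if the status did not change.
    """
    if old_status == new_status:
        return None
    message = f"{check_name}: {old_status} -> {new_status}"
    for deliver in integrations:
        deliver(message)
    return message


# Example: two stand-in integrations that record what they were asked to send.
email_outbox, webhook_calls = [], []
notify_on_change("nightly-backup", "late", "down",
                 [email_outbox.append, webhook_calls.append])
```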

---

**Project**. To keep things organized, you can group checks and integrations in **Projects**.
Your account starts with a single default project, but you can create any number
of additional projects as needed. You can transfer existing checks between projects
while preserving their configuration and ping URL.

Each project has a configurable name, a separate set of API keys, and a separate
project team. The project's team is the set of people you have granted read-only or
read-write access to the project.

For more information on projects, see [Projects and teams](projects_teams/).