Pinging Reliability Tips

Sending monitoring signals over the public internet is inherently unreliable. HTTP requests can take excessively long or fail completely for a variety of reasons. Here are some general tips to make your monitoring code more robust.

Specify HTTP Request Timeout

Put a time limit on how long each ping is allowed to take. This is especially important when sending a "start" signal at the start of a job: you don't want a stuck ping to prevent the actual job from running. Another case is a continuously running worker process that pings SITE_NAME after each completed item. A stuck request would block the whole process, so it is important to guard against it.

How you specify the timeout depends on the tool you use. curl, for example, has the --max-time (shorthand: -m) parameter:

# Send an HTTP request, 10 second timeout:
curl -m 10 PING_URL
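For code that pings from inside a program rather than from a shell, the same idea can be sketched with Python's standard library. This is a minimal illustration, not a recommended client: the function name send_ping and the 10 second default are assumptions made for this example. Note also that urlopen applies the timeout to each blocking socket operation (connecting, each read) rather than to the request as a whole, so it is not an exact equivalent of curl's --max-time.

```python
import urllib.request


def send_ping(url: str, timeout: float = 10.0) -> int:
    """Send a single GET request and return the HTTP status code.

    `send_ping` is a hypothetical helper for this example. The timeout
    (in seconds) bounds each blocking socket operation, so a stuck
    connection raises an exception instead of hanging forever.
    Failures (timeouts, refused connections, HTTP error statuses) are
    raised as OSError subclasses and are not handled here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status
```

Call send_ping with your check's ping URL. Because failures are not caught here, the caller decides what a failed or timed-out ping should mean for the job.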

Use Retries

To minimize the number of false alerts you get from SITE_NAME, instruct your HTTP client to retry failed requests several times.

How you specify the retry policy depends on the tool you use. curl, for example, has the --retry parameter:

# Retry up to 5 times, with an increasing delay between retries (1s, 2s, 4s, 8s, ...)
curl --retry 5 PING_URL
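If your HTTP client has no built-in retry option, a retry loop with a doubling delay can be written by hand. The following Python sketch is one possible shape: the names ping_with_retries and _http_get, and the send and sleep parameters (there mainly to make the loop easy to test), are assumptions made for this example. It retries on any OSError, which is broader than curl's --retry, since curl only retries errors it considers transient.

```python
import time
import urllib.request


def _http_get(url: str, timeout: float) -> None:
    """Default sender: one GET request; most failures raise an OSError subclass."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def ping_with_retries(url, retries=5, timeout=10.0, base_delay=1.0,
                      send=_http_get, sleep=time.sleep) -> bool:
    """Try once, then retry up to `retries` times with a doubling delay.

    Returns True as soon as one attempt succeeds and False if every
    attempt fails. All names here are hypothetical, for illustration.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            send(url, timeout)
            return True
        except OSError:
            if attempt == retries:
                return False  # out of retries, give up
            sleep(delay)      # 1s, 2s, 4s, 8s, ... with the defaults
            delay *= 2
    return False
```

With the defaults, a ping that never succeeds is attempted six times in total (one initial attempt plus five retries), and each attempt is still bounded by the timeout.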

Handle Exceptions

Make sure you know how your HTTP client handles failed requests. For example, if you use an HTTP library that raises exceptions, decide whether you want to catch the exceptions or let them bubble up.
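The two choices can be illustrated with Python's standard library, where urlopen raises an exception on timeouts, connection failures, and HTTP error statuses. This is a sketch under assumptions: the function names ping_strict and ping_quiet, the logger name, and the send parameter are all hypothetical and exist only for this example.

```python
import http.client
import logging
import urllib.request

logger = logging.getLogger("monitoring")  # hypothetical logger name


def ping_strict(url: str, timeout: float = 10.0) -> None:
    """Let failures bubble up: the caller sees the exception and must handle it."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def ping_quiet(url: str, timeout: float = 10.0, send=ping_strict) -> bool:
    """Catch network and HTTP failures so a failed ping never crashes the job.

    Returns True on success and False on failure. Only network-related
    exceptions are caught; programming errors still propagate.
    """
    try:
        send(url, timeout)
        return True
    except (OSError, http.client.HTTPException) as exc:
        logger.warning("Ping to %s failed: %s", url, exc)
        return False
```

Which variant fits depends on the job: catching keeps the monitored work running when the ping fails, while letting the exception propagate makes the failure impossible to overlook.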