Heartbeat Monitoring with Webscript

At Webscript, we love getting email that begins, "I am writing to let you know that I find Webscript really cool and useful!" It's even better when the correspondent has put Webscript to good use and blogged about it. That happened last week when Charalampos Doukas wrote to us about his blog post describing how he checks the status of his Arduino device. (Arduino is an open-source electronics prototyping platform based on flexible, easy-to-use hardware and software.) His application is a great match for Webscript and nicely shows how useful cron jobs can be. Charalampos kindly permitted us to share (and tweak) his code, and I'll explain how it works.

Suppose that you want to monitor the health of a remote but network-attached device. If that device can accept incoming traffic, you can simply use a Webscript cron job to ping it every few minutes to check for a healthy response.
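In this pull-style setup, the cron job only has to request the device's URL and judge the response. The sketch below keeps that logic testable outside Webscript by taking the HTTP call as a parameter: `check_device` and `fetch` are hypothetical names, and the device URL is a placeholder. Inside Webscript you would pass a small wrapper around its HTTP request function.

```lua
-- Minimal sketch of the "pull" check: the cron job contacts the device.
-- `fetch` is a hypothetical function that performs an HTTP GET and returns
-- the numeric status code (or nil / an error if the request fails).
local function check_device(fetch, url)
  local ok, status = pcall(fetch, url)
  -- A thrown error, a missing status, or a non-2xx status counts as unhealthy.
  local healthy = ok and type(status) == "number" and status >= 200 and status < 300
  return { url = url, status = ok and status or nil, healthy = healthy }
end

-- Usage with stand-in fetch functions (no network access needed):
local up = check_device(function(u) return 200 end, "http://device.example/health")
local down = check_device(function(u) error("timeout") end, "http://device.example/health")
print(up.healthy, down.healthy)  -- prints true and false
```

Returning a table rather than a bare boolean mirrors the check script later in this post, whose return value ends up in Webscript's logs.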

But what if the device cannot accept incoming requests? Faced with this limitation, you can have the device make an outgoing "heartbeat" HTTP request every N minutes to a monitoring service that will report missing heartbeats.
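The device side of this arrangement is just a loop. Charalampos's device is an Arduino and its code isn't shown here, so the following is only a Lua sketch of the idea: `send_ping` and `sleep` are hypothetical stand-ins for the device's HTTP client and delay function, and `max_beats` exists only so the loop can be tested (a real device would loop forever).

```lua
-- Sketch of the device side: send a heartbeat, then wait `interval` seconds.
-- `send_ping` and `sleep` are hypothetical; `max_beats` bounds the loop for
-- testing, and nil means "run forever".
local function heartbeat_loop(send_ping, sleep, interval, max_beats)
  local sent = 0
  while max_beats == nil or sent < max_beats do
    -- A failed ping is deliberately not retried: the monitor will see a gap
    -- in the heartbeats, which is exactly the signal it is looking for.
    pcall(send_ping, "heartbeat.webscript.io/ping")
    sent = sent + 1
    sleep(interval)
  end
  return sent
end

-- Usage with stand-ins that count pings and total sleep instead of doing I/O:
local pings, slept = 0, 0
local beats = heartbeat_loop(function() pings = pings + 1 end,
                             function(s) slept = slept + s end, 300, 3)
print(beats, pings, slept)  -- prints 3, 3 and 900
```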

To determine if there's a missing heartbeat, you can use a cron job to check how long it's been since the last recorded heartbeat. If that time is over N minutes, then you know that there's a problem.
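The test itself is a single comparison. As a worked example, here it is as a pure function; `is_dead` and its parameters are illustrative names, not part of Webscript. Times are seconds since the epoch, as `os.time()` returns them.

```lua
-- Is the last heartbeat more than N minutes old?
-- A nil lastping (no heartbeat recorded yet) is treated as 0, i.e. "very old".
local function is_dead(now, lastping, n_minutes)
  local threshold = n_minutes * 60  -- N minutes in seconds
  return now - (lastping or 0) > threshold
end

-- With N = 5 (a 300-second threshold):
print(is_dead(1000, 800, 5))  -- 200 s since the last ping: false
print(is_dead(1000, 600, 5))  -- 400 s since the last ping: true
print(is_dead(1000, nil, 5))  -- no ping recorded yet: true
```

Because the comparison is strict, a heartbeat exactly N minutes old still counts as alive.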

Webscript handles these two jobs cleanly with two scripts. The first script accepts a heartbeat ping and records its arrival time in persistent storage:

heartbeat.webscript.io/ping

-- Record when this heartbeat arrived (seconds since the epoch).
storage.lastping = os.time()
return "ok"
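To try the ping script outside Webscript, you can model `storage` with a small proxy table. This sketch rests on an assumption: that Webscript storage hands values back as strings, which is what the `tonumber` call in the check script implies. `make_storage` and `handle_ping` are illustrative names, and the clock is passed in so the result is predictable.

```lua
-- Local stand-in for Webscript's `storage` (assumption: values come back
-- as strings, so every write is converted with tostring).
local function make_storage()
  local data = {}
  return setmetatable({}, {
    __index = function(_, k) return data[k] end,
    __newindex = function(_, k, v) data[k] = v ~= nil and tostring(v) or nil end,
  })
end

-- The ping script's body, with the current time passed in.
local function handle_ping(storage, now)
  storage.lastping = now
  return "ok"
end

local storage = make_storage()
print(handle_ping(storage, 1700000000))  -- ok
print(type(storage.lastping))            -- string
print(tonumber(storage.lastping))        -- 1700000000
```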

The second script, the monitor, runs as a cron job and checks whether the last recorded ping falls within the last N minutes. The check is straightforward; the only complication is the case where no heartbeat has arrived yet. In that case the script pessimistically defaults to the beginning of time (that is, 0), so the device is reported as dead until its first heartbeat arrives:

heartbeat.webscript.io/check (incomplete)

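local N = 5 -- <INTERVAL IN MINUTES>; 5 is only an example value
local THRESHOLD = N * 60 -- N minutes, in seconds
-- Default to 0 (the beginning of time) if no heartbeat has been recorded yet.
local lastping = tonumber(storage.lastping or 0)
local dead = os.time() - lastping > THRESHOLD
if dead then
        -- do something here
end
return { lastping=lastping, dead=dead }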

Strictly speaking, the script doesn't need to return anything, but I chose to return lastping and the dead status as a JSON object because Webscript automatically records the result in its logs, where I can inspect it if I need to debug.

Of course, the script isn't complete without some useful reaction to a dead device. Getting a text message from Twilio seems like an appropriate consequence:

heartbeat.webscript.io/check

local twilio = require 'twilio'
-- Twilio credentials and phone numbers: replace the placeholders with your own.
local ACCOUNTSID = '<YOUR TWILIO ACCOUNT SID>'
local AUTHTOKEN = '<YOUR TWILIO AUTH TOKEN>'
local TWILIONUMBER = '<YOUR TWILIO ACCOUNT PHONE NUMBER>'
local RECIPIENT = '<YOUR PHONE NUMBER>'
local N = 5 -- <INTERVAL IN MINUTES>; 5 is only an example value
local THRESHOLD = N * 60 -- N minutes, in seconds
-- Default to 0 (the beginning of time) if no heartbeat has been recorded yet.
local lastping = tonumber(storage.lastping or 0)
local dead = os.time() - lastping > THRESHOLD
if dead then
        -- Text the recipient from your Twilio number.
        twilio.sms(ACCOUNTSID, AUTHTOKEN, TWILIONUMBER, RECIPIENT, 'Arduino is Down!')
end
return { lastping=lastping, dead=dead }

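The final step is to schedule the checking script as a cron job that runs every few minutes. The cron interval determines how quickly a missed heartbeat is recognized: once the last heartbeat is more than N minutes old, the next scheduled check reports it, so the alert can lag the threshold by up to one cron interval.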
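To see how the cron interval and N interact, the sketch below steps through scheduled runs and returns the first one that would report the device as dead. It assumes the last heartbeat arrived at `lastping` and none came after; `first_alert` and its parameters are hypothetical names, and `cron_every` must be positive for the loop to terminate.

```lua
-- When does the cron check first report a dead device?
-- lastping: time of the final heartbeat (s); threshold: N * 60 (s);
-- cron_start / cron_every: first scheduled run and spacing between runs (s).
local function first_alert(lastping, threshold, cron_start, cron_every)
  local t = cron_start
  -- Advance run by run until one sees the heartbeat as stale, using the
  -- same strict ">" comparison as the check script.
  while t - lastping <= threshold do
    t = t + cron_every
  end
  return t
end

-- N = 5 minutes (300 s), cron every 60 s, last heartbeat at t = 0:
print(first_alert(0, 300, 0, 60))   -- 360: the run at 300 s is not yet "> 300"
print(first_alert(0, 300, 30, 60))  -- 330
```

The alert therefore lands after the N-minute threshold and at most one cron interval later; a shorter cron interval buys faster detection at the cost of more script runs.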

With these two simple scripts and a cron job, it's easy to monitor the heartbeat of a device. We thank Charalampos for sharing this use case with us.

(Note that once the heartbeat stops, this example will send a text every time the check is executed until the heartbeat resumes. To avoid the repetition, use persistent storage to track the previous state of the device and only report transitions from healthy to dead.)
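One way to implement that note is sketched below. It stores the last known state under a hypothetical `storage.state` key and calls a hypothetical `notify` function, which in Webscript could wrap the `twilio.sms` call from the check script. The storage table, clock, and notifier are passed in so the logic can be exercised with a plain table.

```lua
-- Check script variant that alerts only on a healthy -> dead transition.
local function check(storage, now, threshold, notify)
  local lastping = tonumber(storage.lastping or 0)
  local dead = now - lastping > threshold
  local current = dead and "dead" or "alive"
  -- A missing state counts as "alive", so the first failure is still reported.
  local previous = storage.state or "alive"
  if current ~= previous then
    if dead then
      notify("Arduino is Down!")  -- report only the healthy -> dead transition
    end
    storage.state = current       -- also record recoveries, so the next outage alerts again
  end
  return { lastping = lastping, dead = dead }
end

-- Usage with a plain table standing in for storage:
local store, sent = { lastping = "0" }, {}
local function notify(msg) sent[#sent + 1] = msg end
check(store, 400, 300, notify)  -- dead: alerts
check(store, 460, 300, notify)  -- still dead: silent
store.lastping = "500"
check(store, 520, 300, notify)  -- recovered: silent, state reset to "alive"
check(store, 900, 300, notify)  -- dead again: alerts
print(#sent)                    -- 2
```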

The code from this post is also available on our examples page.

We are always interested in new uses for Webscript. Let us know what you are up to at support@webscript.io.