Hardware watchdog

Are there any plans for making the hardware watchdog of the Cortex M3 available through the API?

I had an Imp at a remote location go offline today and it never came back online - and still hasn’t. This appears to be a well-known issue (see here) - hopefully with a forthcoming fix. However, it would be very useful to have the watchdog as a last resort for when something unexpected happens, especially in situations where power-cycling isn’t feasible, e.g. when Imps have been deployed off-site.

The watchdog is likely to be enabled in a future release; however, it wouldn’t help with the issue that you’re seeing, because the imp is quite happily running - it’s just offline and doesn’t appear to feel the need to reconnect.

We can’t tie the watchdog to “being online” because, apart from affecting people who don’t want to be online continuously, it could abort connections in progress too.

The main thing here is that you expect connected imps to stay online in all circumstances, and aggressively reconnect as necessary. This is what they are supposed to do and what we’re further augmenting our automated test rigs to ensure.

It would generally make good sense if the hardware watchdog was configurable directly through Squirrel code (and not just used “internally” by the OS - as you are perhaps hinting at). It would then also work for the specific issue at hand; the watchdog would just have to be kicked by some part of the code receiving heartbeats from the server. Even a falsely happy Imp would then be forced to restart. :slight_smile:
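To make the idea concrete, here is a minimal sketch of the "only kick while heartbeats keep arriving" logic. It is written in plain C rather than Squirrel purely so it can be compiled and tried off-device; the type and function names (heartbeat_gate_t, gate_on_heartbeat, gate_should_kick) are hypothetical and not part of any imp API, and the actual call that would service a hardware watchdog is left out because no such call is exposed today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: tracks when the last server heartbeat was seen. */
typedef struct {
    uint32_t last_heartbeat_ms; /* timestamp of the most recent heartbeat */
    uint32_t max_silence_ms;    /* longest gap we tolerate before giving up */
} heartbeat_gate_t;

/* Call this from whatever code receives a heartbeat from the server. */
void gate_on_heartbeat(heartbeat_gate_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    g->last_heartbeat_ms = now_ms;
}

/* Returns true while it is still OK to kick the watchdog.
 * Once the server has been silent for longer than max_silence_ms this
 * returns false, the watchdog is no longer kicked, and it eventually
 * resets the device. Unsigned subtraction keeps the comparison correct
 * across a wrap of the millisecond counter. */
bool gate_should_kick(const heartbeat_gate_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    return (uint32_t)(now_ms - g->last_heartbeat_ms) <= g->max_silence_ms;
}
```

The periodic task would call gate_should_kick() and only service the watchdog when it returns true, so a device that is running but no longer hears from the server would stop kicking and get reset.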

I also had to visit my 3 running imp cards and do a power cycle by hand on July 8-9.
One was fast blinking red and another had no lights. I don’t remember what the 3rd one was doing. Both appeared to operate normally after I ejected and re-seated the cards.

imp.reboot() ??
Or even better
imp.reboot("Optional Emergency Model");

Maybe it is possible now to generate an exception error on demand, so that the imp will be forced to reboot (and name that function reboot()).
Dirty coding, but what the hack…
Better than visiting the Mojave Desert twice a month to restart my Imp-temperature-logger :wink:

If you call a non-existing function, you will have your exception error and the imp will reboot…

So, just call reboot();

… and it will… :))

Letting the Imp reboot itself has its limitations. It will only work if the Imp is able to detect that something is wrong, and that might not always be the case, e.g. OS crashes, endless loops or situations such as the one we saw yesterday. The advantage of hardware watchdog timers is that they operate independently of what the Imp is doing or not doing, and they will only be triggered if neglected (e.g. by a hanging Imp).

Right, but I’m talking about a falsely happy Imp…

You can force a reboot with imp.deepsleepfor() (it’ll sleep, then it’s a full reboot).

The issue with exposing the watchdog to Squirrel is that the maximum HW watchdog period is 70ms; it’s very possible that you may not be able to service it fast enough from inside the VM, especially if you end up blocking on an I/O operation.

Yes, the 70ms would hardly be of any use on the application-side. How do you get that number? Using a suitable prescaler value for the watchdog timer, I would have thought you should be able to achieve considerably higher values.

There are two hardware watchdogs on the STM32 that we use: WWDG has a maximum period of 70ms, IWDG 32s. Most applications will be able to service a 32s watchdog from Squirrel.
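For anyone wondering where the 70ms and 32s figures come from, here is a rough sketch of the arithmetic in C (chosen only so it can be compiled off-device). The formulas follow the way ST's reference manuals describe the two peripherals, as far as I understand them. The 32 kHz LSI and 30 MHz APB1 clock used in the note below are assumptions, since the imp's exact clock setup isn't stated in this thread, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Independent watchdog (IWDG) timeout in microseconds.
 * lsi_hz:    frequency of the low-speed internal oscillator (assumed input)
 * prescaler: divider applied to LSI, 4..256
 * reload:    12-bit down-counter reload value, 0..0x0FFF */
uint64_t iwdg_timeout_us(uint32_t lsi_hz, uint32_t prescaler, uint32_t reload)
{
    assert(lsi_hz > 0 && reload <= 0x0FFFu);
    return (uint64_t)prescaler * (reload + 1u) * 1000000u / lsi_hz;
}

/* Window watchdog (WWDG) timeout in microseconds.
 * pclk1_hz: APB1 peripheral clock (assumed input)
 * wdgtb:    timer base selector, 0..3 (extra divide by 2^wdgtb)
 * t_low6:   lower six bits of the down-counter, 0..0x3F */
uint64_t wwdg_timeout_us(uint32_t pclk1_hz, uint32_t wdgtb, uint32_t t_low6)
{
    assert(pclk1_hz > 0 && wdgtb <= 3u && t_low6 <= 0x3Fu);
    /* The WWDG counter is clocked from PCLK1 / 4096 / 2^wdgtb. */
    return (uint64_t)4096u * (1u << wdgtb) * (t_low6 + 1u) * 1000000u / pclk1_hz;
}
```

With those assumed clocks, iwdg_timeout_us(32000, 256, 0x0FFF) comes out at 32768000 µs and wwdg_timeout_us(30000000, 3, 0x3F) at 69905 µs, which lines up with the 32s and 70ms quoted above. The LSI is an uncalibrated RC oscillator, so the real IWDG period varies noticeably from chip to chip.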

Although, having said that, if Squirrel is still running event handlers then you don’t need a hardware watchdog; a software one would suffice:
function watchdog() {
    // reboot() is intentionally undefined; the resulting error restarts the imp
    if (!server.isconnected()) { reboot(); }
    imp.wakeup(30.0, watchdog);
}

(Using imp.deepsleepfor instead would cause a successful warm boot, which unlike a Squirrel error does not force a reconnection.)

Peter

Just checked the datasheet… the prescalable IWDG is exactly what I had in mind.

The suggested software watchdog will go a long way, and I will also be using something like that. However, for recovering from the odd OS/VM hangs, we would still need the hardware watchdog, and I’m hoping to see some kind of support for this added to the API - at some point. :slight_smile:

Thanks Peter.
I love it when the pros come in…

reboot isn’t defined. Peter, what do you suggest we do for this? Or is that the point? The error (of not having reboot) would cause a reboot?

...The error (of not having reboot) would cause a reboot...