A "real" watchdog function is strongly needed for IMP

Hi,
I have been using IMP for two years as the server and controller for my heating system. It has been a real pleasure, with all the advantages in software handling, but…
Unfortunately I have now decided to give it up (at least for the moment) after spending too many hours trying to tune my code to avoid and handle any unforeseen hangups.
E.g. today the IMP control stopped at around 2 at night and the heating was still on night level when I got up…
I have built in all possible counters and server.log calls, but it all comes down to this: on rare occasions (about once a month) a reset of the IMP is the only thing that gets it running my code again.
I therefore feel I cannot trust IMP for critical control functions without a real watchdog function (like the ones found in every microcontroller-based system I have used over the years). For the moment I am porting the Squirrel code to an AVR setup, which I know can run 24/7 for years.
I noted recently that there still seem to be some plans for a “real” watchdog. If it happens, I would really like to test it firsthand.
Looking in the forum I note that I am not alone in the crowd.

Still IMP fan…

I am not sure the imp is even made to handle critical functions. I am using a few imps around my house, but all of them are for things where it does not really matter if they fail for some reason.

Would be curious to know how you designed your watchdog features.

As we all know, every system eventually fails unless it has built-in redundancy for hot swapping, which in itself relies on some sort of watchdog functionality. So to introduce watchdog features I would have started by looking at the likely availability levels of the cloud-based agent/server versus the device. It could be assumed that the agent would typically be available at around the 99% level, while for devices it would typically be a lot less.

You would then work from there in terms of developing your localised and remote notification and alarming functionality.

FWIW (and seriously, I’m not affiliated with GroveStreams other than being a very satisfied customer), you may consider creating a GroveStreams account and pushing data up frequently (every minute?). You can then easily set up a latency event that fires if/when data is not received within the specified period (perhaps 5 minutes?). This event could send you a message or email, or send a message to the agent to have it tell the device to do something. It’s really up to you. While this may not be able to handle a situation that requires a reset/power cycle of the imp, it may be adequate for your situation.
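
To make the "latency event" idea concrete, here is a minimal sketch of the check in Python. It is not GroveStreams' API or imp code: the `LatencyMonitor` class, the `record`/`check` method names, and the feed id are all made up for illustration, and the 300-second gap is just the "perhaps 5 minutes" figure from above. It only shows the logic: remember when each feed last reported, and flag any feed that has been silent for longer than the allowed gap.

```python
import time


class LatencyMonitor:
    """Flags a feed as stale when no data arrives within max_gap seconds.

    Hypothetical helper for illustration; not part of any real service's API.
    """

    def __init__(self, max_gap=300.0, clock=time.monotonic):
        self.max_gap = max_gap    # allowed silence in seconds (assumed: 5 minutes)
        self.clock = clock        # injectable clock so the logic can be tested
        self.last_seen = {}       # feed id -> time of the last data point
        self.alerted = set()      # feeds we have already raised an alert for

    def record(self, feed_id):
        """Call whenever a data point arrives from a device."""
        self.last_seen[feed_id] = self.clock()
        self.alerted.discard(feed_id)  # feed is back, so re-arm its alert

    def check(self):
        """Return the feeds that have gone stale since the last check."""
        now = self.clock()
        newly_stale = []
        for feed_id, seen in self.last_seen.items():
            if now - seen > self.max_gap and feed_id not in self.alerted:
                self.alerted.add(feed_id)
                newly_stale.append(feed_id)
        return newly_stale
```

In use, whatever receives the device's data would call `record("heating")` on every upload and run `check()` on a timer, sending an email or a message to the agent for each feed it returns. Because an alert fires only once per outage, a device that stays down does not generate a message on every check.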

Other cloud services may provide the same “latency” type of functionality. We use it to learn when a device has gone offline due to a cellular issue.

We’ve been using imps in a multitude of important monitoring and control applications to date and have yet to have one “hang”. It would be interesting to determine what is causing the devices to hang.

I’d be interested to see the code which hangs; we have imps that have been running for years without a single hang - and so have many customers.

Are you sure the issue is not related to power? Did the imp still respond to local network pings when it was hung?

The imp does indeed have a watchdog which will reboot the system if there is an issue at the impOS level. The Squirrel code watchdog, which helps where the user’s Squirrel code enters an infinite loop, is coming later this year.