Power state, how to keep it on

epor · January 23, 2013, 7:59am

Mine was doing alright for a day or so. I decided to run the watchdog every hour. The last time it went offline, I could still see watchdog messages. This time, however, it seems to be totally gone and I have not heard a peep from it in days. I can’t reset it until tonight, as I’m coming back from vacation.

January 18, 2013 1:10:19 AM EST: watchdog
January 18, 2013 2:14:40 AM EST: watchdog
January 18, 2013 3:19:02 AM EST: watchdog
January 18, 2013 4:23:23 AM EST: watchdog
January 18, 2013 5:27:43 AM EST: watchdog
January 18, 2013 6:32:03 AM EST: watchdog
January 18, 2013 6:43:12 AM EST: Light
January 18, 2013 6:43:12 AM EST: 1
January 18, 2013 6:59:22 AM EST: Light
January 18, 2013 6:59:22 AM EST: 0
January 18, 2013 7:09:33 AM EST: Power state: online=>offline

hugo · January 23, 2013, 1:47pm

We’ve noticed a common factor while replicating this bug: actually sending output data (even if it’s ignored), rather than just logging, appears to improve reliability. This is because output data is handled at a higher priority than logging data.

I’d appreciate it if you could try adding an output port and sending data to it, like this:

```
wd <- OutputPort("watchdog");

// add a dummy output port; keep whatever I/Os you currently have where they are
imp.configure("my device", [myinput], [myoutput, wd]);

// send something on the output port periodically
function watchdog() {
    imp.wakeup(5*60, watchdog);   // reschedule this function in 5 minutes
    wd.set(0);
}

watchdog();
```

(remove the previous watchdog code)

epor · January 24, 2013, 10:41am

Ugh. My last comment involved an ID10T error on my part: It got powered off by a light switch. My bad. I will still try out this code anyway.

moose · January 24, 2013, 1:35pm

@Hugo: running my Imp now with the proposed watchdog code added.
Will keep you posted.

moose · January 27, 2013, 3:32am

@Hugo: Imp running fine for a few days now…

hugo · January 27, 2013, 6:47pm

@moose: We believe we’ve found what was causing the issue, and the watchdog (or actually, any squirrel code running regularly) fixes the issue that was biting most people who had “silent” devices. The real fix for this is now in our master tree and will escape into the next release.

We’re running a test farm on various ISPs and keeping a close eye on them, as we want to ensure the corner cases are covered before we ship anything new out.

moose · January 28, 2013, 12:59pm

Good to hear it is being solved in a structural way! The second part of your post anticipates what would have been my next question, so I will not ask it.

kenkoknz · January 28, 2013, 3:21pm

Does this mean that we no longer need the watchdog timer in our code after the next firmware release?

hugo · January 28, 2013, 10:23pm

@kenkoknz: Correct.

Ramsrin · January 29, 2013, 12:46pm

That’s great news. When the fix is released, is there anything we need to do, or will devices be auto-updated? Thanks!

Ramsrin · January 29, 2013, 3:13pm

I just noticed that, after running fine for a couple of days, it went offline last night. Below are the last 3 entries in the log:
1/28/2013 10:21:09 PM: 1
1/28/2013 10:21:27 PM: 0
1/29/2013 3:14:44 AM: Power state: online=>offline

hugo · January 30, 2013, 1:46am

@ramsrin: were you running any watchdog code at all? I don’t see any log messages indicating you were. Right now, this is required to deal with a bug that is fixed in the next revision.

nigelibrown · January 30, 2013, 4:14am

Hugo,
Last night I saw the following problem. My watchdog does the following:

imp.wakeup(5*60, watchdog);
zoneOutputs[0].set(0);
server.log("watchdog");

Tue, 29 Jan 2013 16:14:39 GMT: Device configured to be "Panic"
Tue, 29 Jan 2013 16:14:39 GMT: watchdog
Tue, 29 Jan 2013 16:19:51 GMT: watchdog
Tue, 29 Jan 2013 16:25:13 GMT: watchdog
Tue, 29 Jan 2013 16:30:34 GMT: watchdog
Tue, 29 Jan 2013 16:35:58 GMT: watchdog
Tue, 29 Jan 2013 16:41:18 GMT: watchdog
Tue, 29 Jan 2013 16:46:40 GMT: watchdog
Tue, 29 Jan 2013 16:52:02 GMT: watchdog
Tue, 29 Jan 2013 16:57:27 GMT: watchdog
Tue, 29 Jan 2013 17:02:46 GMT: watchdog
Tue, 29 Jan 2013 17:08:08 GMT: watchdog
Tue, 29 Jan 2013 17:13:30 GMT: watchdog
Tue, 29 Jan 2013 17:18:54 GMT: watchdog
Tue, 29 Jan 2013 17:24:14 GMT: watchdog
Tue, 29 Jan 2013 17:29:36 GMT: watchdog
Tue, 29 Jan 2013 17:34:58 GMT: watchdog
Tue, 29 Jan 2013 17:40:20 GMT: watchdog
Tue, 29 Jan 2013 17:45:42 GMT: watchdog
Tue, 29 Jan 2013 17:51:04 GMT: watchdog
Tue, 29 Jan 2013 17:56:26 GMT: watchdog
Tue, 29 Jan 2013 18:01:50 GMT: watchdog
Tue, 29 Jan 2013 18:02:25 GMT: Power state: online=>offline

hugo · January 30, 2013, 7:53pm

@nigelibrown I’ll contact you for more details. Definitely shouldn’t have happened.

DolfTraanberg · January 30, 2013, 10:21pm

31-1-2013 01:44:51: Gem. Dagverbruik: 329.748 W/h
31-1-2013 01:44:51: Gem. Maandverbruik: 481.048 W/h
31-1-2013 01:44:59: Power state: online=>offline

This started today, but the imp keeps working: all four COSM datastreams are still updated.

When I log out of the planner and log in again, everything works again, so I suppose it is not the imp but the connection between PC and server that causes the power state offline error.

Ramsrin · January 30, 2013, 11:01pm

Happened again earlier today: off-on-off. It went off within 4 mins of the last activity. I do have the older version of the watchdog running, without any output to the server log. Does it require logging to function?
01/30/2013 19:25:54: 0
01/30/2013 19:25:54: 1
01/30/2013 19:26:08: 0
01/30/2013 19:29:51: Power state: online=>offline
01/30/2013 19:30:00: Power state: offline=>online
01/30/2013 20:06:17: Power state: online=>offline

hugo · January 31, 2013, 1:06am

@dolf: it actually sounds like it’s the browser-server connection that’s the problem there. If you refresh the page, does the logging continue?

@ramsrin: The on->off and off->on within 10 seconds would have been a server update, which currently are visible as short transitions (and totally unrelated to your code). The online->offline at the end is the one worrying me. Just PMed you.

DolfTraanberg · January 31, 2013, 5:59am

I hope I can reproduce this

kenkoknz · January 31, 2013, 6:09pm

Hi,
I understand you guys have a solution in the pipeline, but this is fundamental to the reliability and practicality of this cool device. Is it happening just to me and a few others? Also, what triggers it to go online again? In the meantime I will wait patiently for the next release.

Latest (watchdog set every minute):
Friday, February 01, 2013 11:38:34: watchdog
Friday, February 01, 2013 11:39:32: watchdog
Friday, February 01, 2013 11:40:32: watchdog
Friday, February 01, 2013 11:41:32: watchdog
Friday, February 01, 2013 11:42:32: watchdog
Friday, February 01, 2013 11:42:46: Power state: online=>offline
Friday, February 01, 2013 11:43:01: Power state: offline=>online
Friday, February 01, 2013 11:43:32: watchdog
Friday, February 01, 2013 11:44:32: watchdog
Friday, February 01, 2013 11:45:34: watchdog
Friday, February 01, 2013 11:46:32: watchdog
Friday, February 01, 2013 11:47:32: watchdog
Friday, February 01, 2013 11:48:32: watchdog
Friday, February 01, 2013 11:49:34: watchdog
Friday, February 01, 2013 11:50:34: watchdog
Friday, February 01, 2013 11:51:34: watchdog
Friday, February 01, 2013 11:52:34: watchdog
Friday, February 01, 2013 11:53:32: watchdog
Friday, February 01, 2013 11:54:33: watchdog
Friday, February 01, 2013 11:55:32: watchdog

hugo · January 31, 2013, 7:59pm

When the servers are updated, devices will drop off and then immediately reconnect (reconnect is initiated by the imp) - typically they are offline for ~10-20 seconds. Server software updates will become transparent in the near future, but right now you see these transitions.

The issue this thread is tracking is more about devices that drop off (unrelated to the short server updates) and don’t come back. Typically they are marked offline because they miss a connection maintenance update reply window. We’re still looking for a repeatable case of this that we can recreate and debug.

Note that you don’t need such a fast watchdog; every 30 mins is usually just fine. There is activity on the link, below the level you can see in the logs, that keeps NAT happy.
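
For reference, a slower watchdog along those lines might look like the sketch below. It reuses the output-port pattern posted earlier in this thread, and the 30-minute interval comes from the note above. The names `WATCHDOG_INTERVAL_S` and "watchdog demo" are only illustrative, and the empty input list assumes a device with no other I/O, so treat this as a starting point rather than a recommended implementation.

```squirrel
// Hypothetical name: interval between watchdog ticks, in seconds (30 minutes)
WATCHDOG_INTERVAL_S <- 30 * 60;

// Dummy output port, as in the earlier example in this thread
wd <- OutputPort("watchdog");

// Assumption: no other inputs or outputs; add your own ports to these lists
imp.configure("watchdog demo", [], [wd]);

function watchdog() {
    // Schedule the next tick first, then send a value on the output port
    imp.wakeup(WATCHDOG_INTERVAL_S, watchdog);
    wd.set(0);
}

watchdog();
```

If you already call `imp.configure` elsewhere in your code, add `wd` to your existing output list instead of configuring a second time.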