No data, ~05:00 to ~11:00 GMT+1, 14 March 2014

What happened this morning?

Was no data sent out during this time?

It seems to be the server. Is there anywhere to check for current problems?

Miki normally tweets at https://twitter.com/impstatus, but there is nothing there about today's issues. It looks like things are coming back online, though.

According to my logs, it was 6 hours from when the problem started until it was gone again, with no public status reports.

Hopefully it will get noticed faster next time. Six hours these days, when we have no option to restore connectivity other than to wait, does not help win over the skeptics.

I think I read somewhere on these forums or in a PDF that EI is designed to fail over across multiple servers, but maybe that's only for blessed devices.

I saw the same lack of data on my Xively feed for a device I have reporting every 15 minutes. I see no errors in my logs, and the IDE continued logging for that roughly six-hour period. It does look like my agent failed to send to Xively. All seems fine now.

Yes, I was right in the middle of testing when this problem occurred. I tore my hair out for an hour before giving up and calling it a day. Interestingly, I also encountered another problem that I couldn’t attribute to my own code. Invoking format("…%c…"…) seemed to result in lines of base64 garbage in my server log, even when I wasn’t writing to server.log()! I’m not using any base64 encoding or decoding in my agent or device. Has anyone else experienced that?

Follow Up: Running one of the strings through a base64 decoder indicates that it’s a debugging line that somehow ends up being printed in base64. Perhaps that’s how it’s wrapped when sent to the server and the decoder fails to decode it properly. Hmm.

OK, I think I know what it is now. It looks like I may have been passing a control character of some kind to server.log(). The imp device probably base64-encodes the line before sending it, but the control character screws up the decoding at the other end.
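For anyone who runs into the same thing, here is a rough sketch of how I would check a suspicious log line offline. It is plain Python rather than Squirrel, it is not how the imp server handles logs (that part is guesswork in this thread), and the names `decode_log_line`, `find_control_chars` and `escape_control_chars` are made up for illustration.

```python
import base64
import binascii


def _is_control(ch):
    """True for ASCII control characters other than tab and newline."""
    code = ord(ch)
    return (code < 0x20 and ch not in "\t\n") or code == 0x7F


def decode_log_line(line):
    """Try to base64-decode a log line; return None if it is not valid base64."""
    try:
        raw = base64.b64decode(line, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Undecodable bytes are replaced rather than raising, so the text stays inspectable.
    return raw.decode("utf-8", errors="replace")


def find_control_chars(text):
    """Return (index, code point) for every control character found in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if _is_control(ch)]


def escape_control_chars(text):
    """Replace each control character with a visible \\xNN escape before logging."""
    return "".join(
        "\\x{:02x}".format(ord(ch)) if _is_control(ch) else ch
        for ch in text
    )
```

Running a garbled line through `decode_log_line` and then `find_control_chars` shows whether a stray control character is the culprit. Escaping the string with something like `escape_control_chars` before logging it would be one way to avoid the problem on the sending side, assuming the control character really is the cause.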

Unfortunately, the 6 hours was in the middle of the night PDT, and our automatic monitoring didn’t catch it so nobody was paged. It affected one server, which is the one non-blessed devices use. When our UK office woke up they restarted the affected service and things improved.

The monitoring has been fixed, and there’s now a secondary monitor where one of my imps will text me repeatedly if it happens again and for some reason PagerDuty fails too :)

We posted to impstatus when we had a handle on the issue, but again, this was rather a lot later than was ideal and this scenario should not go unnoticed again. Hope the hair grows back!
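For those who want their own check on the receiving end, noticing that a feed has gone quiet can be sketched roughly as below. This is a minimal Python sketch, not the monitoring described above; `find_gaps`, `is_stale` and the `tolerance` factor are hypothetical names and choices for illustration.

```python
from datetime import timedelta


def find_gaps(timestamps, expected_interval, tolerance=2.0):
    """Return (last_before, first_after) pairs where reports are too far apart.

    A gap is flagged when two consecutive reports are separated by more than
    tolerance * expected_interval.
    """
    limit = expected_interval * tolerance
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


def is_stale(last_seen, now, expected_interval, tolerance=2.0):
    """True if the most recent report is older than the allowed window."""
    return now - last_seen > expected_interval * tolerance
```

With a device reporting every 15 minutes, `is_stale(last_seen, now, timedelta(minutes=15))` would start returning true a little over 30 minutes after the last report, which is the point at which an alert (a text, a page) could be raised. `find_gaps` does the same comparison after the fact over logged timestamps.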

@Hugo

"Hope the hair grows back!"
Looking at your avatar, I doubt that will solve this... :)