Device keeps on disconnecting for no apparent reason

Lately I’ve been getting random disconnects from every few minutes to tens of minutes. I’ve used the following simple device test loop to demonstrate the problem:

`
function loop() {
server.log(“going to sleep”);
// Sleep Imp for a second
imp.wakeup(1.0, loop);
}

imp.setpowersave(true);
loop();`

Typical log:

2015-10-07 16:50:03 UTC+1 [Device] going to sleep 2015-10-07 16:50:04 UTC+1 [Device] going to sleep 2015-10-07 16:50:05 UTC+1 [Device] going to sleep 2015-10-07 16:50:06 UTC+1 [Device] going to sleep 2015-10-07 16:50:07 UTC+1 [Device] going to sleep 2015-10-07 16:50:09 UTC+1 [Device] going to sleep 2015-10-07 16:50:10 UTC+1 [Device] going to sleep 2015-10-07 16:50:11 UTC+1 [Device] going to sleep 2015-10-07 16:50:58 UTC+1 [Status] Device disconnected 2015-10-07 16:50:58 UTC+1 [Status] Device disconnected 2015-10-07 16:51:14 UTC+1 [Status] Device connected 2015-10-07 16:51:14 UTC+1 [Device] going to sleep 2015-10-07 16:51:15 UTC+1 [Device] going to sleep 2015-10-07 16:51:16 UTC+1 [Device] going to sleep 2015-10-07 16:51:17 UTC+1 [Device] going to sleep 2015-10-07 16:51:19 UTC+1 [Device] going to sleep 2015-10-07 16:51:20 UTC+1 [Device] going to sleep 2015-10-07 16:51:21 UTC+1 [Device] going to sleep 2015-10-07 16:51:22 UTC+1 [Device] going to sleep 2015-10-07 16:51:23 UTC+1 [Device] going to sleep

The log stops with the device going to sleep for typically a few tens of seconds, then starts up again after one or two " Device disconnected" messages.

To try narrowing things down further I’ve tried two imp 01’s (both behave the same) in a single April. I thought the RSSI might be a factor (Hugo mentioned the imp might overpower the wifi router at close range) but I’m sure I’ve ruled this out as it behaves the same from -30dB to -70dB.

I thought it might be supply voltage so I have a scope on the 3V3 and run the April from a 5V 3A bench PSU. No glitches there. I’m al out of ideas now.

WiFi powersave mode can be unreliable depending on the router - this is not an imp thing, it’s just that some routers are better than others (plus it depends on network traffic).

Essentially the AP can randomly disassociate clients if it feels they have not been responsive enough - often this can be down to other network interference - and the imp then has to reassociate and reconnect.

Try without powersave on and see if it’s more reliable?

You’re right of course. Powersave was the only variable I could see having any bearing on this issue - which is why I included it in my simple test loop.

The random disconnection will happen with powersave in either state if the imp and router are in the same room. They just don’t seem to get on very well together. But conducting the test with a RSSI at around -70 dB with powersave off I do appear to get get a solid connection. That’s what was confusing the issue. The problem is, I could really do with the power saving!

By defaullt, when the AP disassociates the imp, it takes anything from just a couple of seconds to nearly a minute after which deep sleep is entered for 9 minutes which kills my particular app. With a reasonable signal level available I’m surprised it sometimes takes a minute or more to reassociate but it does… If I simply set the timeout to 2 minutes using:

server.setsendtimeoutpolicy(SUSPEND_ON_ERROR, WAIT_TIL_SENT, 120)

…then it seems (from only about an hour’s worth of experience!) that it can sometimes take up to around 90 seconds to sort things out between the imp and the AP. Makes you wonder what the conversation is like!

Anyway, with unfriendly routers like mine (Technicolor TG799) an increased timeout seems to ease things a bit if frequent comms and low power are needed.

If you can capture a wifi pcap of the upset (likely means running wifi with encryption off) then that’d be most helpful to debug.

You can also use https://github.com/electricimp/ConnectionManager to replace the suspend_on_error policy with one that suits your application better.

Ok, now I think I’m getting closer to the real cause of the random disconnects. So far everything I’ve done above (including the use of ConnectionManager calls) have only been having a palliative effect. That is, regaining the connection quicker when it drops.

But now I have found that if I comment out certain (time consuming?) parts of my device code I can get a constant connection even right next to the router.

My code is all event driven but when a 337 char string arrives via the uart (@128000baud) I have to lightly process it then agent.send the result (160 bytes) twice a second. I’ve stripped it all down to the bare bones but somehow it’s still blocking the maintenance of the WiFi connection.

Here’s my Device test code:

`
// regularly get value from Imp’s wifi signal strength
// and send to agent so we can see device is alive
function loop() {
agent.send(“rssi”, imp.rssi()); //receipt of rssi triggers publication
imp.wakeup(4.0, loop); // Sleep imp for another 4 seconds
}

// Function triggered by receipt of a byte from uart
function readback() {
local byte = serial.read();

// Ignore initial input / no data signal
if (byte == -1) return;

if (byte == 13) {
    agent.send("stringLen",inputString.len());
    inputString = "";
} else {
    // Add the input character to the buffer
    inputString = inputString + chr(byte);
}

}

// Convert passed integer value Ascii code into a character string
function chr(asciiValue) {
if (asciiValue < 32) return “”;
return format("%c", asciiValue);
}

//extend default retry period to 120 seconds
//server.setsendtimeoutpolicy(SUSPEND_ON_ERROR, WAIT_TIL_SENT, 120)

//imp.setsendbuffersize(8096); // a bigger buffer size
serial <- hardware.uart57;
inputString <- “”;
serial.setrxfifosize(1024); //accomodate large data xfer from catbrain
serial.configure(128000, 8, PARITY_NONE, 1, NO_CTSRTS, readback);
//imp.setpowersave(true); //This drops down to around one tenth of the normal operational power, with only a small impact on incoming message latency.
loop(); //kick-off regular updates to agent
`

And the test Agent:

`
// log connection events seen by the agent
device.onconnect(function() {
server.log(“Device connection event”);
});

// log disconnection events seen by the agent
device.ondisconnect(function() {
server.log(“Device disconnection event”);
});

// log “rssi” messages from the device
device.on(“rssi”, function(rssi) {
server.log(“RSSI=” + rssi);
});

// log how many bytes device recieved via uart
device.on(“stringLen”, function(stringLen) {
server.log(“stringLen=” + stringLen);
});

// regularly report supposed connection status
function loop() {
server.log(“Current connection state=”+device.isconnected());
imp.wakeup(4.0, loop); // Sleep for another 4 seconds
}

loop();`

And a log showing a typical disconnection (at around 12:20):

2015-10-31 12:19:42 UTC+0 [Status] Device connected 2015-10-31 12:19:42 UTC+0 [Agent] Device connection event 2015-10-31 12:19:42 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:42 UTC+0 [Agent] RSSI=-33 2015-10-31 12:19:42 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:42 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:42 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:42 UTC+0 [Agent] stringLen=347 2015-10-31 12:19:43 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:43 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:44 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:44 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:44 UTC+0 [Agent] Current connection state=true 2015-10-31 12:19:45 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:45 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:46 UTC+0 [Agent] RSSI=-38 2015-10-31 12:19:46 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:46 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:47 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:47 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:48 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:48 UTC+0 [Agent] Current connection state=true 2015-10-31 12:19:49 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:49 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:50 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:50 UTC+0 [Agent] RSSI=-42 2015-10-31 12:19:50 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:51 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:51 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:52 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:52 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:52 UTC+0 [Agent] Current connection state=true 2015-10-31 12:19:53 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:53 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:54 UTC+0 [Agent] RSSI=-32 2015-10-31 12:19:54 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:54 UTC+0 [Agent] stringLen=337 2015-10-31 12:19:56 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:01 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:05 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:09 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:13 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:17 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:21 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:25 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:29 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:33 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:37 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:41 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:45 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:49 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:53 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:54 UTC+0 [Status] Device connected 2015-10-31 12:20:54 UTC+0 [Agent] Device disconnection event 2015-10-31 12:20:54 UTC+0 [Agent] Device connection event 2015-10-31 12:20:55 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:55 UTC+0 [Agent] RSSI=-34 2015-10-31 12:20:55 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:55 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:55 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:55 UTC+0 [Agent] stringLen=347 2015-10-31 12:20:56 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:56 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:57 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:57 UTC+0 [Agent] Current connection state=true 2015-10-31 12:20:57 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:58 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:58 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:59 UTC+0 [Agent] stringLen=337 2015-10-31 12:20:59 UTC+0 [Agent] RSSI=-37

Apparently it takes nearly a minute for the Agent to spot the disconnection.
As is, the connection drops every couple of minutes. If I stop the 337 chars going into the device uart, it stays connected all the time. Oddly, if I let the chars come in but comment out the line that sends the char count to the agent, that’s also fine.

I can’t figure out what’s blocking?

This is likely lack of TX buffers in the imp; have you tried setsendbuffersize to increase the default 3k buffer?

Note that your logging (or sending data to the agent to log) is taking TX buffers too, which will impact performance.

@Hugo, yes I tried 8K - if you look in the device code there’s a line commented out:

//imp.setsendbuffersize(8096); // a bigger buffer size

But that made no difference. I don’t see how going bigger than that would help. Note that I’m only sending the len of the string to the agent, I was sending all 337 bytes in the full app but as part of chasing down the issue I changed it to just report the number of bytes it would have sent.

It’s definitely the case that if I comment out the agent.send(“stringLen”,inputString.len()); then the connection is totally stable. So I tried putting it in a function called from imp.onidle() to isolate it from the uart callback but that didn’t make any difference. Likewise I called it from a timer but predictably get the same result.

Still experimenting here: I’ve dropped the baudrate down to 19200, tried injecting just one char instead of the usual 337 payload every 0.5s but both changes make no difference. That’s really odd because if I stop the incoming uart data altogether it also stops the device disconnects completely.

So with the code above, the imp will happily stay connected for an hour or more punting out rssi via the agent to the server log - then I send a few chars into the uart every 0.5s and after a minute or so the imp disconnects and keeps reconnecting/disconnecting until I remove the serial.

Any send takes a whole packet buffer, the size isn’t important (up to ~512 bytes as I remember) - this explains why changes in size make no difference, it’s the overall packet send rate that is problematic.

Could you do:

`server.log(imp.setsendbuffersize(8096);
server.log(imp.setsendbuffersize(8096);
`

(yes, twice, so that we can verify that the buffer size has been increased correctly)

If you can leave your device running (and constantly disconnecting) then email info@ with your device ID we can take a trace and confirm what is happening.

buffer size reports:
2015-10-31 17:38:24 UTC+0 [Device] 8640
2015-10-31 17:38:24 UTC+0 [Device] 8640

Thanks, I’ve just mailed the ID.