Offline operation fails

vedecoid · August 25, 2019, 9:55pm

Our application requires seamless offline operation (both on connection loss and after a cold boot without a WiFi signal). We instantiate ConnectionManager in the very first statements of the Squirrel code:

local cmsetup = {
    "startBehavior"   : CM_START_NO_ACTION,
    "stayConnected"   : true,
    "retryOnTimeout"  : true,
    "blinkupBehavior" : CM_BLINK_ALWAYS,
    "checkTimeout"    : 5,
    "connectTimeout"  : 60.0,
    "errorPolicy"     : RETURN_ON_ERROR,
    "waitPolicy"      : WAIT_TIL_SENT,
    "ackTimeout"      : 1.0
};

cm <- ConnectionManager(cmsetup);
imp.setsendbuffersize(8096);

When using this in a very minimal application, it works as expected.
When bundled with our real application of 5000+ lines, it doesn’t: not when the WiFi signal is lost and not on cold boot.

Before I start to scrutinize all that code: are there typical things one should not do because they can cause the imp to hang, block or restart when it is not connected? I suspect the device code is crashing somewhere when not connected, but I have no clue what kind of action may cause that.
We do have a serial port that can be reconfigured to serve as a debug port. I assume a continuously rebooting imp would show up as writes to the serial port that happen very early in the code (e.g. right after the creation of cm and, of course, after first configuring the UART properly) being repeated over and over.

That being said, it does look like the device is actually going into the suspended state. After 60 seconds the connection attempts stop, and they restart after 9 minutes, which doesn’t really correlate with a continuously crashing device.

Any advice is more than welcome

PS. I’m not using any sleep or deep sleep states in this app

hugo · August 26, 2019, 5:05pm

If you are seeing issues at cold boot:

Note that if there’s no connection, and you’re not using a rescue pin setup, it’ll be 10s before a WiFi/Ethernet imp runs your user code (to allow you to recover from having deployed bad code).
The thing that almost always trips people up is that they do something (a server.log, an agent.send) before they initialize the policy. Here that’s done by the ConnectionManager init. Make sure you’re calling that before you do anything else at all.

If the device hits a runtime error, or if it tries to send when the policy is still SUSPEND_ON_ERROR (i.e. the default), then the unit will not be running your code. The easiest way to see that is to set up your debug UART and install an imp.onunhandledexception handler (https://developer.electricimp.com/api/imp/onunhandledexception) that logs the error to the UART (and flushes it!) before calling imp.reset(). If you miss the flush then the UART will get unconfigured and you won’t see the message.

vedecoid · August 26, 2019, 6:29pm

Thx.
Didn’t know about the 10-second delay, but we waited longer than that, so that’s probably not the issue. The instantiation of ConnectionManager is the very first thing that is done after the #require… section.
I assume all the necessary work is done in the constructor, right?

Good idea to use the onunhandledexception to trap the potential error. Didn’t think about that…

vedecoid · August 26, 2019, 8:42pm

Interesting observation already after first inspection with logging to serial port. The squirrel actually crashes in the ConnectionManager code

This happened when running code and then switching off the Access Point forcing connection loss.

I’ve only briefly browsed through the lib github, but it seems that it has a problem when the connection is lost and none of the callbacks are defined (which is indeed, for this test, the case - we only instantiate cm, nothing else…-)

Could someone at eImp check this and indicate what prerequisites exist with regard to defining the ConnectionManager callbacks?
Also interesting: after this error the imp doesn’t restart, although the onunhandledexception callback does ask for it…

imp.onunhandledexception(function(error) {
    ErrorLog("******Globally caught error: " + error);
    debugTx.flush();
    imp.wakeup(5, function() { server.restart(); });
});

Does that mean we can’t delay the restart with an imp.wakeup?

hugo · August 26, 2019, 8:55pm

On the connection manager - that line of code will only be reached if there is a callback registered, from what I can see. @betsyrhodes can you comment?

The unhandled exception handler, when it exits, will try to report the error as normal. If you want to restart, you have to do it within the handler - there is essentially no squirrel world left at that point for you to return to.

If you want to delay the restart, you would use a synchronous call like imp.sleep() before you call imp.reset(). I’m not sure you can use server.restart() in this context.
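A minimal sketch of that pattern, assuming debugTx is the debug UART from the handler above and has already been configured, and assuming imp.reset() is available on the impOS version in use. The 5-second delay is just the value from the original handler:

```squirrel
// Sketch only: debugTx is assumed to be a UART configured earlier in the code.
imp.onunhandledexception(function(error) {
    // Write the error to the debug UART and wait until it has been sent
    debugTx.write("Unhandled exception: " + error + "\r\n");
    debugTx.flush();

    // Block synchronously; timers set with imp.wakeup() will not fire
    // once this handler has returned
    imp.sleep(5.0);

    // Restart from inside the handler itself
    imp.reset();
});
```

Everything here happens before the handler returns, which is the point: nothing is scheduled for later.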

vedecoid · August 26, 2019, 9:00pm

You’re right. I just inspected our ‘preprocessed’ code and we do have the onConnect and onDisconnect callbacks defined, but the onDisconnect one is missing the ‘expected’ parameter. Let me check if this brings us closer to proper operation…
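For reference, a minimal sketch of the two callbacks with the parameter that onDisconnect expects. It assumes cm is the ConnectionManager instance created at the top of the code; isOnline is a hypothetical flag used only for illustration:

```squirrel
// Hypothetical global flag, not part of ConnectionManager
isOnline <- false;

// Called with no arguments once the device is connected
cm.onConnect(function() {
    ::isOnline = true;
    server.log("Device connected");
});

// Called with one argument: true if the disconnect was requested by the
// application (cm.disconnect()), false if the connection was lost
cm.onDisconnect(function(expected) {
    ::isOnline = false;
});
```

Declaring the callback as function() without the parameter is what triggers the runtime error when the library calls it with an argument.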

vedecoid · August 26, 2019, 9:17pm

Just eliminated a (stupid) bug in the onDisconnect callback, making continued operation after wifi loss now working properly.
Next test was booting without wifi signal, which actually crashed on a part of the bootMessage function I think I copied from SmittyTone (quite some time ago so could be there’s already an updated version of it) with message “the index ‘active’ does not exist: in bootMessage device_code:96”

// Get current networking information
local i = imp.net.info();

// Get the active network interface (or the first network on
// the list if there is no network marked as active)
local w = i.interface[i.active != null ? i.active : 0];

I can easily avoid that error by trapping it in a try-catch, but what would be the proper way to detect a non-existent WiFi connection right after boot?

But, the good news of the evening is that we now have proof the Imp effectively boots without wifi signal - now it’s just a matter of eliminating all the error conditions that seem to arise in that state

vedecoid · August 26, 2019, 10:27pm

All errors caught (surprising how many times one uses the “active” key in net.info) and offline operation now works well. Thx for the support !
This would have been very difficult without a serial connection to redirect the logs to and without the “onunhandledexception” handler.

smittytone · August 27, 2019, 9:15am

Current version changes that last line to:

local w = i.interface["active" in i ? i.active : 0];
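Since the “active” key turns up in many places, the same check could be wrapped in one helper. This is only a sketch: getActiveInterface is a hypothetical name, and the extra guards on “interface” and on a null value are defensive assumptions rather than documented behaviour:

```squirrel
// Hypothetical helper: returns the active interface record, the first
// interface if none is marked active, or null if none are listed.
function getActiveInterface() {
    local info = imp.net.info();

    // Defensive: bail out if no interface list is reported
    if (!("interface" in info) || info.interface.len() == 0) return null;

    // "active" is missing from the table when the imp is offline,
    // so test for the key before reading it
    local index = ("active" in info && info.active != null) ? info.active : 0;
    return info.interface[index];
}

local w = getActiveInterface();
```

Callers then only need to check w for null instead of repeating the key test.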