Did I screw up my remote imp006 so that it needs a manual power cycle to reconnect to the network?

I am using an imp006 (600a0c2a69e60474) in a remote spot with variable cellular signal for no other purpose than to upload, as reliably and consistently as possible, 117 byte minutely UART binary packets from my sensor interface MCU to the internet. Power is not an issue and cellular data is not an issue. We don’t need minimalist efficiency; we just need maximum uptime and, most of all, NEVER needing to visit the site (hours away) for the sole purpose of a hardware reset.

I am very excited about the imp006 and the imp platform as I believe it may be able to serve this need better than my current Particle implementation of my sensor network (I am looking for a more stable solution).

I got excited about @hugo's response to my other thread pointing to the ConnectionManager library, which has a “stayConnected” parameter that looked like exactly what I needed. I need the device to reconnect aggressively and use all the power and data it needs to have maximal uptime.

I am nervous that, by failing to properly understand the documentation on ConnectionManager and server.setsendtimeoutpolicy, I have permanently disconnected my imp006 until a manual power reset by using "errorPolicy": RETURN_ON_ERROR.

The device went offline at 2am (8 hours ago) and hasn’t reconnected since, whereas my Particle Boron in the same spot (which does, however, have a much better cell antenna) has reconnected.

Here is my code:
#require "Serializer.class.nut:1.0.0"
#require "SPIFlashLogger.device.lib.nut:2.2.0"
#require "ConnectionManager.lib.nut:3.1.1"
#require "Messenger.lib.nut:0.2.0"
#require "ReplayMessenger.device.lib.nut:0.2.0"

cm <- ConnectionManager({
          "startBehavior": CM_START_CONNECTED,
          "blinkupBehavior": CM_BLINK_NEVER ,
          "stayConnected": true,
          "checkTimeout": 10,
          "connectTimeout": 600.0,
          "ackTimeout": 10.0,
          "errorPolicy": RETURN_ON_ERROR 
        });
imp.setsendbuffersize(8096);

local sfl = SPIFlashLogger();
rm  <- ReplayMessenger(sfl, cm, {"resendLimit": 900});
        
imp.setpowersave(false);
uart <- hardware.uartXEFGH;

lastPacket <- 0

function read_bytes(n)
{
    local rx_buffer = blob(n);
    local data
    data = uart.read()
    while (data != -1)
    {
     rx_buffer.writen(data,'b'); 
     data = uart.read()
    }
    return rx_buffer;
}

server.log("start");

uart.setrxfifosize(117*20)
uart.configure(115200, 8, PARITY_NONE, 1, NO_TX | NO_CTSRTS | READ_READY);
local dBlob = blob(117)

function rob() {
    dBlob = read_bytes(117)
    dBlob.seek(0, 'b')
    local tHeader = dBlob.readn(105) 
    if(tHeader > 1 && tHeader != 9999) {
        if(lastPacket == 0 || tHeader != lastPacket) {
                lastPacket = tHeader
                //rm.send("0", dBlob, RM_IMPORTANCE_HIGH)
                agent.send("RM_DATA", dBlob)
            }
    }
    imp.wakeup(0.01, rob);
}

imp.net.getcellinfo(function(cellInfo) {
    server.log(cellInfo)
});

server.log("Initialized")
imp.wakeup(0.01, rob);

I copied my ConnectionManager init code from someone else on these forums who seemed to have a similar purpose.

I read the https://developer.electricimp.com/libraries/utilities/connectionmanager documentation, but it did not explain the “errorPolicy” SUSPEND_ON_ERROR/ RETURN_ON_ERROR difference.

Now, after I deployed the above code, I see that this is explained in a different article, with scary implications:

"If RETURN_ON_ERROR is in effect, the only way for your imp to reconnect is to explicitly call server.connect() or server.connectwith() (impOS 42 and up)."

Unfortunately, I did not get this memo when I read exclusively the ConnectionManager library documentation. I had a strong assumption that the addition of the ConnectionManager library would manage the connection for me and make things better and not worse. Clearly, I should have read the documentation better.

Question: Will my imp006 ever reconnect, absent a manual intervention, given the code I deployed above? If not, the ConnectionManager documentation severely needs to be updated to not totally omit an explanation on “errorPolicy” causing this catastrophic outcome. Or, will impOS eventually reset my imp006 and allow me to flash new code?

Thank you for your help. While this is potentially very frustrating (the device is 5 hours away), I am still hoping to evaluate the imp platform and, if I observe that its cellular stability is superior to Particle's, eventually switch my network over and hopefully grow it with Electric Imp.

So, a little hard to tell from this code:

  • You disabled the LED, which is what will give you status indications. As you’re powered, CM_BLINK_ALWAYS (vs CM_BLINK_NEVER) would be a better thing to pick as then if you see the device not being connected, the LED will tell you what’s going on.

  • The other settings (CM_START_CONNECTED and stayConnected) look correct.

  • What was the blue LED doing when you saw it offline? As that’s connected direct to the modem, that should have been lit.

  • It’s quite possible the device was connected but with the imp LED turned off, you couldn’t see that. I don’t think you’d be receiving data from your serial packets due to the following notes:

  • Your serial code is a bit… strange:

  • READ_READY is not a flag you pass into uart.configure, it’s a bit that’s part of a flag change callback, see docs. Luckily here it’s the same value as NO_TX (1) hence doesn’t do anything.

  • The way you’re reading bytes is a bit overly complex. uart.readblob(117) will read up to 117 bytes into a blob that is created by the call with a single operation.

  • However, that’s also not going to work because you’re likely to call rob() when bytes are coming in, so you’ll almost never get a 117 byte packet and hence never have any packets to send.

  • Best practice would be to register a callback in your uart.configure, and accumulate bytes. Hopefully the thing sending serial data has a recognizable header and hence you can regain sync - if so, then something like the rx() callback in this example would work https://gist.github.com/hfiennes/7f8892b1856961b53c26b23d14ef60ee … otherwise, if packets have a break between them, you can use UART timing mode to detect packet boundaries and sync up to your packets.

If you have more info on the packet format I can make some example code.

If you do think you’ve caught the device being offline when it shouldn’t be, press reset (don’t power cycle) and it will transmit a black box log to our servers on the next connect, which we can then use to help diagnose an issue if we have the device ID to look at.


I appreciate this helpful response Hugo; I am going to get through this, with your help, and hopefully convert my entire fleet to ElectricImp from Particle if and when I can get this stably working.

  1. Device is over 100 miles away, so I can’t see LED
  2. Thank you for your advice about the UART practices, but I can guarantee that it has been picking up the 117 byte packets with perfect synchronization and minutely uploading when connected. That is most certainly not the issue. I am asking about the cellular reconnection behaviors expected with my ConnectionManager() code and lack of an explicit onDisconnect callback.
  3. I will update to use rx callback accumulator, but I am highly certain that isn’t the current issue. There are 2x 117-byte packets per second, almost all of which have a 4-byte “9999” header meaning they are to be ignored. Once every 5 seconds a valid 117-byte packet is received, and my timestamp variable limits the uploading thereof to 60 seconds.
  4. The device has remained offline for 11 hours, which was the first time that it had naturally disconnected due to cellular interruption (common at the site) since I had deployed the code above. It had been uploading for about 4 hours perfectly before that. Since then, the Particle has come back online multiple times. I think I bricked it with bad reconnection settings and have to revisit the site.

My question is better phrased as such:

  1. What is the behavior of “stayConnected” when errorPolicy = RETURN_ON_ERROR and no onDisconnect() or server.onunexpecteddisconnect() handler is specified? Which one wins? Will stayConnected alone “aggressively attempt to reconnect when disconnected” as the documentation claims, or does that assume the user has specified SUSPEND_ON_ERROR for errorPolicy? In the latter case, must I travel to my device and physically reset it so that I can reprogram it, either with RETURN_ON_ERROR plus a disconnect handler that calls cm.connect(), or with SUSPEND_ON_ERROR, which will take care of it automatically?

For this question, the code can be cut-down to the following relevant components:

cm <- ConnectionManager({
          "startBehavior": CM_START_CONNECTED,
          "blinkupBehavior": CM_BLINK_NEVER ,
          "stayConnected": true,
          "checkTimeout": 10,
          "connectTimeout": 600.0,
          "ackTimeout": 10.0,
          "errorPolicy": RETURN_ON_ERROR
        });
imp.setsendbuffersize(8096);
        
imp.setpowersave(false);

function harry() {
//Upload minutely data from UART
}

function fred() {
//Upload minutely data from I2C sensor
}

function rob() {
    harry()
    fred()
    imp.wakeup(0.01, rob);
}

imp.wakeup(0.01, rob);

And thanks for your real-time help with this @hugo; I need to book a ferry reservation within the next hour to get to the site tomorrow so diagnosing with certainty that it won’t come back up absent manual intervention is necessary, and I do believe this is the case given the RETURN_ON_ERROR setting with no manual reconnect callback.

The UART thing is likely barely working, as you’re pulling the entire buffer out every 10ms and a 117 byte packet at 115,200bps takes… 10.15ms to arrive. You’re likely just being really lucky with timing.
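The packet time quoted above can be checked with a quick calculation. This is only a sketch, assuming 8N1 framing (1 start bit + 8 data bits + 1 stop bit, so 10 bits on the wire per byte), which matches the uart.configure() call in the original code; the variable names are illustrative.

```squirrel
// Time on the wire for one 117-byte packet at 115,200 bps.
// Assumption: 8N1 framing, i.e. 10 bits transmitted per data byte.
local packetBytes = 117;
local bitsPerByte = 10;
local baud = 115200;

// Multiply by 1000.0 to force floating-point arithmetic and get milliseconds
local packetMs = packetBytes * bitsPerByte * 1000.0 / baud;
server.log(format("Packet time: %.2f ms", packetMs));
```

Since that is almost exactly the 10ms polling interval, a poll that lands mid-packet would pull out only part of it.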

I would say deploying untested code then going 100 miles away is not ideal. I’m sure you’d be fine in production, but this is your first attempt at such code and best done on your desk initially…

The connection settings do look ok to me, but you did also note that the cellular connection in that region is spotty/low signal, so it’s possible the device is not getting network association. Device ID will help determine that, so can you provide it?

Does your device have wifi credentials set? It will be looking for wifi when the cellular connection goes down before it falls back to cellular if that’s the case. This will use some of the connection timeout period, but should generally still be ok.

stayConnected makes ConnectionManager poll at “checkTimeout” intervals and, if impOS reports that the device is not connected, kick off a connection attempt. ConnectionManager is ONLY for use with RETURN_ON_ERROR or RETURN_ON_ERROR_NO_DISCONNECT mode - you can’t manage connections in SUSPEND_ON_ERROR mode as the OS does the management then. You can see the code in github - you do not need to be calling cm.connect etc as the combination of CM_START_CONNECTED and stayConnected will do that for you.
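To see how long each outage lasts without calling cm.connect() yourself, something like the following might be registered alongside the existing settings. It is a minimal sketch, assuming the `cm` object from the original post and the ConnectionManager 3.x onConnect()/onDisconnect() registration methods with an optional callback ID; the `lastDisconnect` variable and the "outageLog" ID are hypothetical names.

```squirrel
// Hypothetical bookkeeping variable (not part of ConnectionManager)
lastDisconnect <- null;

// Record when the connection dropped; 'expected' is true if the
// application itself asked to disconnect
cm.onDisconnect(function(expected) {
    lastDisconnect = time();
}, "outageLog");

// On reconnection, report how long the device was offline
cm.onConnect(function() {
    if (lastDisconnect != null) {
        server.log("Back online after " + (time() - lastDisconnect) + "s offline");
        lastDisconnect = null;
    }
}, "outageLog");
```

Reconnection itself is still driven by stayConnected; these handlers only observe it.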

Note that you DO NOT want to use RETURN_ON_ERROR_NO_DISCONNECT in your case, despite the name sounding good. This mode prevents any send error from dropping the connection, which effectively means you need to determine in your code whether you’re connected, because any errors during attempted transmissions (eg send timeout, output buffer full, etc) are just returned to you and the connection is still marked up. The actual PPP connection dropping will still cause a disconnect, but there can be scenarios where the radio gets upset to a point where it thinks it’s online but packets go nowhere; in plain RETURN_ON_ERROR mode the send timeout will then cause a connection teardown and restart, which is what you want. The NO_DISCONNECT mode is for specific circumstances like high throughput streaming of data to the server.

Notes on your reduced code:

  • sendbuffersize() doesn’t need to be set here, that’s a bit cargo-culty from an example I guess. A bigger send buffer helps performance when you’re streaming a lot of data to the server (like, I’ve seen 20kB/sec on Cat-M sustained)… but you’re only sending hundreds of bytes, which is within the default buffer size anyway.

  • setpowersave() is for wifi, it doesn’t affect anything else (enables 802.11 PS mode). It’ll do nothing at all here, so not harmful but also not effective.


Thank you @hugo:

  1. Device ID is 600a0c2a69e60474
  2. Packets have been working fine, perhaps through luck:
  3. There is no WiFi at the site - only cellular.
  4. The Imp was reliably reconnecting for the prior day when there was no ConnectionManager. At no point did it go 11 hours without reconnecting like it has since I last deployed this change. I really think this change messed up the reconnection.
  5. If I want the most aggressive cellular reconnection possible, while still running the code that captures my minutely sensor readings during a disconnection and persists them with ReplayMessenger for upload once the connection returns, what should the structure of my code look like? Should I use ConnectionManager stayConnected, or should I omit this and let the imp handle the reconnection?

Thank you for the additional great info in your post which I will read over. I ordered another 2 imp006s but would like to get out and fix this tomorrow once and for all.

Note that you don’t get an online log line from the system (ie can just see “device disconnected” as a historical log even if the device has since reconnected).

As your other logging is only on packets being sent by the agent, if your serial packet capture code is misbehaving (which I believe it will) then - aside from the device online notification at the top - you wouldn’t be able to tell the difference between disconnected and connected-but-packets-not-being-received-due-to-serial-code.

In this case though, yes your device is offline.

The last black box indicates CELL INFO “CAT-M1”,-79,-115,87,-19, so RSRP of -115 and RSRQ of -19. Yes, that’s pretty awful for signal. It has managed to connect in the previous 24h with a tad worse (-116/-20) but that could be why the device isn’t getting back online - ie, it’s trying, but something has changed in the wireless environment that has ever so slightly worsened the signal.

You can pull this info yourself and send it to the agent (eg in your cm.onconnect handler): https://developer.electricimp.com/api/imp/net/getcellinfo
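A minimal sketch of that idea, assuming the `cm` object from the original post and the callback form of imp.net.getcellinfo() that the original code already uses. The "cellinfo" message name and callback ID are hypothetical, and the agent would need a matching device.on("cellinfo", ...) handler.

```squirrel
// Each time the device comes online, fetch the cell info string
// and forward it to the agent for logging or storage
cm.onConnect(function() {
    imp.net.getcellinfo(function(cellInfo) {
        // "cellinfo" is a hypothetical message name
        agent.send("cellinfo", cellInfo);
    });
}, "cellinfo");
```

Logging this on every connect would give a history of the signal figures at which the device does manage to get online.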

In your case I’d leave your settings as-is. They are almost certainly fine - the issue appears to be the cellular connection, not your code.

Seems like you’re aware that the signal is bad at this site and hence you may need to try and get a bit more link budget with a better antenna; how is the device powered, exactly? It may have tried to fall back to 2G if cat-M was too weak, and if so your power supply will need to be able to deal with the current peaks.


I appreciate this helpful analysis hugo. Perhaps it is just a coincidence about it having reconnected pretty good yesterday but not in the last 11 hours when I deployed this. It’s only been 36 hours at this site.

Power supply is a 36ah 12v battery with 5v buck regulator + 50w solar panel - no worries about power.

I will:

  1. Leave ConnectionManager code the same. Are you sure my settings are currently doing the most aggressive reconnects possible?

  2. Install high gain cellular antenna. Right now, it’s just a stick antenna.

  3. Change UART code to accumulate buffer on rx(), and then check-in every 117-bytes. The 117-byte packets are sent every 0.5 seconds and only a few of them are not dummy packets (12 per minute are valid). So I assume the handler will be able to take care of me.

Again, thank you for helpfully countering my incorrect belief that my settings ruined the reconnection. I will test tomorrow and, if necessary, get back to you.

Let me know if you have any other recommendations. If I switch my whole network to imp, they will all be doing the exact same thing as this station.

So, when you say “5v buck regulator”, what are the regulator specs? You can jumper the imp006 breakout to run directly from the 12v battery, which would be preferred (and likely more efficient). Look at the “primary cell” jumpering.

Your settings are for aggressive connect, yes.

On (3) it sounds like you have gaps between packets, so it would be best for you to use a rx handler which looks for gaps to re-sync. Right now, if your code started mid-packet then you would be out of sync and would never regain sync. I’ll try to post some example code shortly.


I am using this regulator, rated to 3A: https://www.amazon.com/gp/product/B07F24WGBB/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

I also have a Particle Boron connected, and a Teensy 3.5 drawing a negligible 17 mA.

I believe the 3A should be sufficient, yes?

I designed this hardware before I looked at imp for the first time, but in the future, I love how the imp itself can downregulate my 12V to 5V by itself. I don’t think here it is an issue however (as long as 3A is sufficient).

Thanks also for the UART synch consultation.

I don’t necessarily trust the transient response of some of those DCDCs. 3A is fine for 2G yes, but there’s a very fast current ramp time; I’m more trusting of the on-board regulators on the breakout board :slight_smile:

Example code might look like this. Note that strings can be any length (up to memory) and hold any character including null, so you don’t need to use a blob if you don’t want to.

// Receive buffer
buffer <- "";

// Timeout used for re-sync
rxtimeout <- null;

// Callback that's fired when new data has been received
// Here, we just pull all we can from the UART FIFO and put it in a buffer.
// We only forward the packet to the upper layer once we've seen 50ms of idle
// Not checking buffer length here, but if we see more than 117 bytes, we likely
// just want to discard anyway as we'll be in sync for the next packet.
function rx() {
    // Collect data, set timeout
    buffer += uart.readstring();

    // 50ms after the last byte received, we send
    if (rxtimeout != null) imp.cancelwakeup(rxtimeout);
    rxtimeout = imp.wakeup(0.050, sendbuffer);
}

function sendbuffer() {
    // Timer has fired, so null out the reference
    rxtimeout = null;

    // Process packet if it's the right length
    if (buffer.len() == 117) processpacket(buffer);

    // Empty the buffer for future data
    buffer = "";
}

// Configure UART with rx data handler
uart.configure(115200, 8, PARITY_NONE, 1, NO_TX | NO_CTSRTS, rx);
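The example calls processpacket() but does not define it. If the rest of the code wants a blob (for instance, to keep using readn() on the header as the original rob() did), the string can be copied into one. This is only a sketch: the function body is an assumption pieced together from the original post, including the header checks and the "RM_DATA" message name.

```squirrel
// Hypothetical processpacket(): copy the received string into a blob,
// then apply the same header checks as the original rob() function
lastPacket <- 0;

function processpacket(packet) {
    local dBlob = blob(packet.len());
    dBlob.writestring(packet);   // byte-for-byte copy of the string contents
    dBlob.seek(0, 'b');

    // 'i' = 32-bit signed integer, the same type code (105) used in the original post
    local tHeader = dBlob.readn('i');

    // Ignore dummy packets and repeats of the last packet forwarded
    if (tHeader > 1 && tHeader != 9999 && tHeader != lastPacket) {
        lastPacket = tHeader;
        agent.send("RM_DATA", dBlob);
    }
}
```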

Thank you Hugo! Are you sure the usage of a string will preserve the binary nature of the data and not try to encode it into some different format?

And I will stick with current ConnectionManager settings as well.

EDIT: Let me make sure I acknowledge how pleased I am with the quality of your support Hugo. It is quite a far-cry from Particle. I am excited to move to your platform. It sucks that you weren’t 1 year ahead of the curve with the imp006 or else I never would have wasted my time and money with the Particle Boron! With the imp006, I get:
-Way more RAM
-2G fallback
-0.1$/mb vs $0.40/mb (Particle) cell data
-Built-in-WiFi
-(Hopefully) more stable cellular reconnection

The Boron cannot touch this!

We did have the impC001 then, but different price point and performance level.

Also, we have a pile of UARTs (vs… 1? 2? on boron), lower power consumption, an RTC built-in, actual physical security even if an attacker has the device, etc etc. Hope it works out for you as it has for others.


Frustratingly, the forum software trapped my response into “Saving” and never went through, and was unrecoverable in the temporary browser HTML view.

I now attempt to reconstruct my reply, although it won’t be as good:

I am happy to report @hugo that you were right about my code above not permanently locking me out sans power cycle, that the issue was mostly cellular signal, and that there were issues with my UART code. The same Imp006 has now been up and uploading for over 12 hours in the same spot working perfectly.

The only thing I need to do now is figure out ReplayMessenger and the connection settings so that packets received when the device is disconnected are persisted/reuploaded on the next connection.

I request your help because I find the documentation on ReplayMessenger incomplete and confusing. It does not show agent code to parse a received rm.send() message. The object that gets passed differs from what happens when you go agent.send(). I’ve tried messing with the “payload” and “data” sub-objects to no avail. I always get a syntax error.

  1. Question #1. Given the code:
    cm <- ConnectionManager({
        "stayConnected": true,
        "connectTimeout": 600.0,
    });
    local sfl = SPIFlashLogger()
    rm <- ReplayMessenger(sfl, cm, {"resendLimit": 900});
    function processpacket(dBlob) {
        rm.send("0", dBlob, RM_IMPORTANCE_HIGH)
    }
    …will the Imp continue to process those packets and queue them for later reuploading with rm, even when the device is disconnected? Or will impOS shut down the imp for 9 minutes until trying to reconnect later, preventing these minutely packets from getting queued into persistence?

  2. Question #2.

Given the call rm.send("0", dBlob, RM_IMPORTANCE_HIGH), what agent code is necessary to retrieve the actual data “dBlob”? The following, which works for a raw agent.send(), does NOT work for rm.send():
function postData(data) {
    // The "data" object contains our binary data
}
device.on("0", postData);

I am really liking the ElectricImp and excited to move forward with it.

I’m not an expert on replaymessenger (read: I’ve never used it myself), so paging @terrencebarr @zandr who can likely help.


Hi Paul,

Sorry for the slow reply, I was out for a few days. Will look at your question tomorrow.

Best,
– Terrence


@terrencebarr Thank you Terrence!

Hi Paul,

It’s been a while since I used ReplayMessenger so I hope I get the details right. To answer your comments/questions:

ReplayMessenger is mostly a superset of the device-side MessageManager and is compatible with the agent-side MessageManager, so most of the MessageManager documentation also applies to ReplayMessenger. For example, both device-side MessageManager and ReplayMessenger wrap the payload into a Messenger.Message object, which contains sequence numbers and other things beyond the payload you send back and forth with agent.send()/device.on().

ReplayMessenger uses ConnectionManager, and by default (the way you initialized ConnectionManager) it uses RETURN_ON_ERROR, which means the Squirrel application continues to run if the connection fails, and ReplayMessenger will continue to accept messages and queue them. When ConnectionManager manages to reconnect, it will call ReplayMessenger _processPersistedMessages() to unspool messages to the MessageManager partner (agent).

Note that the code for MessageManager and ReplayMessenger is available on github, so when in doubt you can consult the code. E.g.:

As noted above, ReplayMessenger and MessageManager wrap the payload in a Messenger.Message object, so on the agent your MessageManager onMsg() callback receives a Message Payload Table:

https://developer.electricimp.com/libraries/utilities/messenger-replaymessenger#message-payload-table

So to access your data you would use msg.payload.data.

See also the examples folder https://github.com/electricimp/Messenger_ReplayMessenger/tree/master/examples
e.g.
https://github.com/electricimp/Messenger_ReplayMessenger/blob/master/examples/ReplayMessengerExample.agent.nut#L69
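
For the rm.send("0", dBlob, RM_IMPORTANCE_HIGH) call in the question, the agent side might look roughly like the sketch below. It assumes the agent-side Messenger class from the same library and assumes that the handler registered with on() is passed the Message Payload Table directly, as the linked example appears to do; if your callback receives the whole message object instead, the equivalent would be msg.payload.data. Please check it against the README before relying on it.

```squirrel
#require "Messenger.lib.nut:0.2.0"

// Agent-side partner for the device's ReplayMessenger
msngr <- Messenger();

// "0" is the message name used in rm.send() on the device
msngr.on("0", function(payload, customAck) {
    // Assumption: payload is the Message Payload Table and
    // payload.data holds the value passed to rm.send()
    local dBlob = payload.data;
    server.log("Received " + dBlob.len() + " bytes from device");
});
```

This takes the place of the device.on("0", postData) registration, which sees the library's wrapped message rather than the raw blob.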

Hope this helps.

Best.
Terrence


Thank you Terrence! I will take a stab at implementing this. This library is really great.

PS: Feel free to contact me at tbarr [at] twilio.com for any questions you have that go beyond the forum.
