Need help solving loss of connectivity

I’m having an intermittent problem (always the most fun) which has got me baffled. I don’t know where to look for the error (I’m assuming coding). Perhaps someone has run into something similar and can give me a suggestion?

First the scenario:
Imp3 collecting sensor data via LoRa radio connection to sensor arrays.
Collection takes place at regular intervals and is very predictable.
Sensor array turns on its radio and sends RTS to Imp and Imp sends CTS to sensor node.
Both Imp and sensor switch to private channel on LoRa.
Data is exchanged on private channel to prevent crosstalk.
Sensor shuts down radio after exchange and Imp goes back to listen for next RTS on public channel.
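
For anyone following along, the handshake above can be modelled as a small state machine. This is only a minimal Python sketch of the Imp side, not the code running on the device (which would be Squirrel). The channel numbers, the `ImpReceiver` class, the `DATA_END` marker and the timeout value are all hypothetical. The timeout that returns the Imp to the public channel is an assumption added for illustration; the description above does not say whether the real code has one.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PUBLIC_CHANNEL = 0         # hypothetical channel numbers
PRIVATE_CHANNEL = 7
EXCHANGE_TIMEOUT_S = 5.0   # hypothetical upper bound for one data exchange


@dataclass
class ImpReceiver:
    """Imp-side view of the RTS/CTS handshake (illustrative model only)."""
    channel: int = PUBLIC_CHANNEL
    exchange_started: Optional[float] = None
    sent: List[str] = field(default_factory=list)

    def on_frame(self, frame: str, now: float) -> None:
        if self.channel == PUBLIC_CHANNEL and frame == "RTS":
            # Answer the sensor, then move to the private channel for the data.
            self.sent.append("CTS")
            self.channel = PRIVATE_CHANNEL
            self.exchange_started = now
        elif self.channel == PRIVATE_CHANNEL and frame == "DATA_END":
            # Exchange finished normally: go back to listening for the next RTS.
            self._return_to_public()

    def tick(self, now: float) -> None:
        # Assumed safety net: give up on an exchange that never completes.
        if (self.exchange_started is not None
                and now - self.exchange_started > EXCHANGE_TIMEOUT_S):
            self._return_to_public()

    def _return_to_public(self) -> None:
        self.channel = PUBLIC_CHANNEL
        self.exchange_started = None
```

Calling `tick()` periodically is what guarantees the model always ends up back on the public channel, whether or not the data exchange completed.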

This works smoothly and consistently for several days, sometimes longer. I sometimes get incomplete data records (caught by the CRC check) where I don’t think I should. I’m chalking that up to interference for now, although it shouldn’t be happening and could be related.
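
To make the CRC check concrete, here is a minimal Python sketch of how a record could be validated. The actual CRC variant and frame layout are not stated in this thread, so everything here is an assumption: a 16-bit CRC-CCITT (initial value 0xFFFF) appended big-endian to the payload, with hypothetical helper names `append_crc` and `record_is_intact`.

```python
import binascii

CRC_INIT = 0xFFFF  # assumed initial value; the real protocol may differ


def append_crc(payload: bytes) -> bytes:
    """Sender side: append a 2-byte CRC-CCITT to the payload."""
    crc = binascii.crc_hqx(payload, CRC_INIT)
    return payload + crc.to_bytes(2, "big")


def record_is_intact(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the trailing 2 bytes."""
    if len(frame) < 3:
        # Too short to hold any payload plus a CRC.
        return False
    payload = frame[:-2]
    received = int.from_bytes(frame[-2:], "big")
    return binascii.crc_hqx(payload, CRC_INIT) == received
```

Logging how often `record_is_intact` fails, and at what time, would be one way to see whether the bad records cluster just before the Imp stops hearing RTS requests.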

Then at some point the Imp stops receiving RTS requests, but everything else seems to work fine.

I’ve tried reconfiguring the radio itself by sending the parameter command string. NO EFFECT.
The radio responds correctly, reporting its parameters are set as requested.

I’ve tried reconfiguring the radio UART when this is detected, NO EFFECT.
uartDM is configured to 9600, 8, PARITY_NONE, 1, NO_CTSRTS. The radio reports its parameters are set as they should be.

I’ve tried shutting down and restarting the radio when this is detected, assuming the radio itself was hanging. NO EFFECT.
Although the uart is not reconfigured this time, the radio reports the correct settings are configured.

The above can be done multiple times with no effect on the problem.

You might be thinking, ‘but of course, it must be your sensors have stopped transmissions’. BUT, when I restart the Imp, everything goes back to normal.

To avoid regular reboots, I’d really like to narrow down what’s happening, but I don’t know where to look. The Imp’s ID is 30000c2a690be27f. Perhaps there is something in the logs that might help.

So when you say “restart the imp”, is that a power cycle? Does that also power cycle the LoRa radio? Your comments make it sound like the radio itself is still responsive on the UART interface - implying the imp is doing its job just fine - but the LoRa radio has got itself into some strange state.

I very much recommend having power switches on anything that the imp can control (eg the LoRa radio). Which radio are you using?

Yes, ‘restart’ = power cycle via impCentral. That also seems to cycle the radio. But to be sure I was fully cycling the radio, I added a load switch (TPS22918) with which I can control power to the radio and cycle it independently and completely. I use the same load switch configuration at the sensor end to conserve power between transmission cycles.

I know the radio remains responsive to the uart because the config commands and responses use the same interface. The radio supplies the correct response to my command string showing a properly functioning uart interface both TX and RX.

So, like you, I assumed the radio was hanging somehow. I added the load switch, deployed, and assumed I had control of the problem. NOT! I’m going to try increasing the power-down time, but I’m not holding out much hope.

The radio is a Dorji DRF1278DM which has done a nice job for me so far. It’s based on Semtech SX1278.

A restart from impCentral is not power cycling anything on the imp side (unless a restart of your code does this).

If you’re power cycling the radio, ensure that ALL the IOs to the device are driven low before you pull power, to prevent the radio being powered by (eg) the CS, CLK or MOSI lines (current will leak through the ESD diodes). If this is happening you’ll see a low voltage on the radio side of the power switch when it’s off.
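
As a rough sketch of that ordering, with Python used as pseudocode (on the imp this would be Squirrel): the `Pin` class, the pin names, the active-high enable on the load switch and the off time are all hypothetical.

```python
import time


class Pin:
    """Stand-in for a GPIO that records every level written (hypothetical)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def write(self, level):
        self.log.append((self.name, level))


def power_cycle_radio(io_pins, power_enable, off_seconds=1.0, sleep=time.sleep):
    """Drive all signal lines low, cut power, wait, then restore power."""
    # 1. Drive every signal line low so none of them can back-power the radio.
    for pin in io_pins:
        pin.write(0)
    # 2. Only then open the load switch (assumed active-high enable).
    power_enable.write(0)
    sleep(off_seconds)
    # 3. Restore power. Reconfiguring the interface and re-sending the
    #    radio's init commands is left to the caller.
    power_enable.write(1)
```

The point of the sketch is just the order of operations: signal lines go low first, the power switch opens second.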

One other thing would be to take regular dumps of ALL SX1278 registers and see if there’s a critical bit flipping when it goes wrong.
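
Comparing two dumps is easy to automate. A minimal Python sketch, assuming each dump has already been read out into a dict of register address to byte value by whatever means the hardware allows; the function name `diff_registers` and the sample values in the usage note are made up for illustration.

```python
from typing import Dict, List, Tuple


def diff_registers(before: Dict[int, int],
                   after: Dict[int, int]) -> List[Tuple[int, int, int, int]]:
    """Return (address, old, new, flipped_bits) for every register that changed."""
    changes = []
    # Only compare addresses present in both snapshots.
    for addr in sorted(before.keys() & after.keys()):
        flipped = before[addr] ^ after[addr]  # XOR leaves a 1 where a bit changed
        if flipped:
            changes.append((addr, before[addr], after[addr], flipped))
    return changes
```

For example, `diff_registers({0x01: 0x85, 0x06: 0x6C}, {0x01: 0x81, 0x06: 0x6C})` reports a single change at address 0x01 with flipped-bit mask 0x04, which can then be looked up in the datasheet.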

Thanks for the idea, Hugo.

I hadn’t thought of parasitic power leaking to the radio, keeping it in an active state. I’ll try writing the tx and rx pins low before cycling the load switch and see if that does the trick. Will I also need to change the uart config, or can I leave that as is?

Your suggestion seems logical though. When I do the impCentral restart, I’m assuming these I/O pins go low as part of the process.

As to the SX1278 registers, I don’t have access to those at this point. The Dorji unit takes care of all of that. But if this new radio cycle method works, I’ll be contacting Dorji since I’ll know for sure that the radio is the source of the error. They’ve been good about support as long as I can identify the problem.

When there’s a restart, the pins are tristated. This will likely improve matters a little in terms of leakage, yes, though there’s no substitute for actually driving low.

btw, there is a LoRa driver here, for the HopeRF modules (which I think used the 1278 or a clone of it); these are super-cheap

Not sure why this is not yet in a release state; at first glance it appears complete.

Unfortunately, writing the tx and rx pins to 0 before power down did not do the trick. I don’t understand why doing an imp reset works, but killing the radio for a second does not. I’ll keep tinkering.

Thanks for the HopeRF reference. That looks like a very good alternative for the Dorji I’m using (almost identical characteristics) but with greater control. Also excited that a driver has already been written.

So my guidance before was on the assumption that your module was the SPI one; now I see it’s a serial device, I suspect you’re just hitting a bug in their internal firmware, which is essentially translating serial commands to SPI ones for the LoRa chip…

If build & run in impCentral fixes it then I suspect re-sending the init sequence of UART commands to the module will also fix it, as that’s all that would happen in that event.

If you want to PM code we can take a look.

Yes, it is the serial module which is why I suspected, as you suggest, that I could just resend the uart commands. Pretty much the first thing I tried. I get the proper response from the radio, but it still does not respond to incoming transmissions. Then I tried the radio power-down routine followed by the uart commands (assuming the radio’s buffers have been wiped). But still nothing changes. Only when I do the imp reset do things go back to normal.

The device is at a remote location, so I’m somewhat limited with what I can play with physically.

Thanks for offering to look at the code. I’ll dink around with it a little more to see if I can find something. But I might take you up on that if I can’t find it.