Time interval between SPI transactions is too high ~400us

shatruddha · July 19, 2014, 4:23pm

I’m reading sensor data over SPI and writing it to an SPI flash. I was getting overrun errors from my sensor, and on further debugging I found that SPI transactions are taking too long to process. I’m configuring SPI at 4000 kHz, which gets floored to 3750 kHz. At this frequency a single-byte transaction should take ~2.1 us, which my logic analyser confirms. But the time it takes to start the next transaction is ~400 us. I’m passing a blob to spi.write().
I’m attaching a waveform snapshot for reference.

shatruddha · July 19, 2014, 4:25pm

Another snapshot, this one showing multiple transactions.

shatruddha · July 19, 2014, 5:12pm

Even within a single transaction there is too much delay between the steps. For example, take the following code:
cs.write(LOW);
if (num_bytes > 1) r = r | 0x40;
r = r | 0x80;
b.writen(r, 'b');
spi.write(b);
data = spi.readblob(num_bytes);
cs.write(HIGH);

Looking at what’s happening at the hardware level (see the attached image): CS goes low at 5.285ms. spi.write(b) happens at 5.391ms, more than 100us after CS is asserted, even though there are only 3 lines of code in between; a normal microcontroller would take 1-5us to do that. Then spi.readblob() starts at 5.466ms and cs.write(HIGH) happens at 5.547ms, although the actual readblob transaction takes only 12us.
Why is it taking so much time in between, and is there a way to reduce this delay?

shatruddha · July 19, 2014, 6:00pm

Adding to the above: I’m running this code after calling server.disconnect(), so I don’t expect the delay to be caused by background internet activity.

peter · July 20, 2014, 10:30am

Here is a thread about optimising a fairly similar construct, which ends up getting it down to 11ms for 60 readings or about 200us each, though that’s a burst rate and not continuous. Basically the technique is to remove as much computation as possible from the critical path: remove maths by pre-calculating needed values, and remove lookups by pre-calculating target functions using bindenv().

Open-coded Squirrel runs in a bytecode interpreter, and so is never going to be quite as fast as assembler or C on “normal” microcontrollers. Often (though admittedly not always) it’s possible to arrange for the really time-critical things to be done below the Squirrel level – such as with the pin-triggered pulse generator, or the sampler.

Peter

shatruddha · July 21, 2014, 7:44am

Hi peter,
It looks like the last waveform I posted did not upload successfully. Take a look at it now; maybe my argument will make more sense after that.

peter · July 21, 2014, 8:17am

That waveform is about what I’d expect to see from the code you posted, yes. Is the problem you’re having with it just the overall speed, or is your device more picky about things like the time between CS going low and SCK beginning to toggle?

Peter

peter · July 21, 2014, 8:27am

As an example, here’s that snippet of code optimised using the techniques described in that other thread:

```
local cswriter = cs.write.bindenv(cs);
local spiwriter = spi.write.bindenv(spi);
local spireader = spi.readblob.bindenv(spi);

// ^ All the above can (should) be done just once for multiple reads

if (num_bytes > 1) r = r | 0x40;
r = r | 0x80;
local rchar = r.tochar();
cswriter(0);
spiwriter(rchar);
data = spireader(num_bytes);
cswriter(1);
```

Peter

shatruddha · July 21, 2014, 4:56pm

Thanks Peter,
let me try it. But even after doing that, will I be able to reduce the delay between the two SPI transactions, and the one between the last SPI transaction and pulling CS high? I doubt it.
The issue we have is this: my sensor has a built-in FIFO that can store up to 32 samples, and we read all 32 of them in one go to save power. But say the optimized code takes 200us per sample read: reading all 32 will take 6.4ms, and my sensor is filling the FIFO at 5000 Hz, i.e. it will fill all 32 entries in 6.4ms. It gets very messy for us, since we won’t even have time to store the data correctly.
Is there a way to get the whole thing down to, say, less than 20us? Some patch or some API?
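
To spell out the arithmetic in this post, here is a rough Squirrel sketch. The constant names are made up for illustration; the numbers are the ones quoted above (32-entry FIFO, 5000 Hz fill rate, an estimated 200us per sample read).

```squirrel
// Figures quoted in this post; the names are hypothetical.
const FIFO_DEPTH     = 32;    // samples the sensor FIFO can hold
const SAMPLE_RATE_HZ = 5000;  // rate at which the sensor fills the FIFO
const READ_TIME_US   = 200;   // estimated time to read one sample over SPI

// Time for the sensor to fill an empty FIFO, in microseconds
local fillTimeUs = FIFO_DEPTH * 1000000 / SAMPLE_RATE_HZ;

// Time for the imp to read FIFO_DEPTH samples, in microseconds
local drainTimeUs = FIFO_DEPTH * READ_TIME_US;

server.log(format("FIFO fills in %d us, drains in %d us", fillTimeUs, drainTimeUs));

// No time is left for anything else unless draining is faster than filling
if (drainTimeUs >= fillTimeUs) {
    server.log("No headroom: reading is not faster than the sensor fills the FIFO");
}
```

Both figures come out at 6400us, which is the break-even case: each sample read has to finish in under 1/5000 s = 200us just to keep up, before any time is spent storing the data.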

hugo · July 22, 2014, 12:48am

The obvious solution here is to read the entire FIFO in a single SPI read transaction, which you can then do at the maximum speed. Which sensor are you using?

shatruddha · July 26, 2014, 12:34pm

I’m using the ADXL345. I’m doing a burst read of all 6 data registers, but to read all 32 FIFO entries I have to re-initiate the transaction for each one.
That’s what is taking the time.

hugo · July 28, 2014, 1:02am

Yeah, that looks to be a very bad design decision by the designers of the chip; it should have auto-wrap on the FIFO registers so that (eg) if there are 10 entries in the FIFO you should be able to just read 60 bytes and get them all in a single transaction.

However, that isn’t the case - you’d need to use Peter’s code and loop tightly to read every FIFO entry after first determining the number of valid entries in the FIFO by reading FIFO_STATUS.

Should be less than 400us though by quite a way. Did you try his code?
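
A minimal sketch of that loop, reusing the bound functions from Peter's snippet. The imp001 pin choices (spi257, pin8), SPI mode 3, and the register addresses and bit masks (taken from the ADXL345 datasheet) are assumptions, not something confirmed in this thread, and readFifo is a made-up name.

```squirrel
// Assumed ADXL345 register map and SPI command bits (from the datasheet)
const REG_DATAX0      = 0x32;  // first of the 6 data registers
const REG_FIFO_STATUS = 0x39;  // low 6 bits = number of FIFO entries
const READ_BIT        = 0x80;
const MULTI_BIT       = 0x40;

// Assumed wiring: SPI on spi257, chip select on pin8
local spi = hardware.spi257;
local cs  = hardware.pin8;
spi.configure(CLOCK_IDLE_HIGH | CLOCK_2ND_EDGE, 4000);
cs.configure(DIGITAL_OUT);
cs.write(1);

// Bind once, outside the loop, to avoid per-call lookups
local cswriter  = cs.write.bindenv(cs);
local spiwriter = spi.write.bindenv(spi);
local spireader = spi.readblob.bindenv(spi);

// Build the two command bytes once
local statusCmd = (REG_FIFO_STATUS | READ_BIT).tochar();
local dataCmd   = (REG_DATAX0 | READ_BIT | MULTI_BIT).tochar();

// Hypothetical helper: returns an array of 6-byte blobs, one per FIFO entry
function readFifo() {
    cswriter(0);
    spiwriter(statusCmd);
    local status = spireader(1);
    cswriter(1);
    local entries = status[0] & 0x3F;

    local samples = array(entries);
    for (local i = 0; i < entries; i++) {
        cswriter(0);
        spiwriter(dataCmd);
        samples[i] = spireader(6);
        cswriter(1);
    }
    return samples;
}
```

Nothing is decoded inside the loop; the raw 6-byte blobs can be unpacked after the FIFO has been drained, which keeps the time-critical part as short as Squirrel allows.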

shatruddha · August 9, 2014, 2:54pm

Hi Hugo,
I tried the code that Peter suggested. It brought the overall transaction time down from 284us to 217us, but there is still a huge amount of time wasted between CS going low and the actual SPI transaction starting.
Take a look at the waveform.

hugo · August 11, 2014, 6:49pm

You could still make it faster by using writeread() (so that it’s a single transaction for addressing and data) but I’m afraid that’s as fast as you’re going to get in the VM. The time there isn’t “wasted” - the imp is busy executing your bytecode - that’s just as fast as it goes at this point in time.
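
As a rough sketch of the writeread() idea: the command byte and six dummy bytes go out in one call, and the sensor data comes back in the same call. The pin choices, SPI mode and the 0xF2 command byte (DATAX0 address 0x32 with the read and multi-byte bits set) are assumptions based on the ADXL345 datasheet, and readSample is a made-up name.

```squirrel
// Assumed wiring: SPI on spi257, chip select on pin8, SPI mode 3
local spi = hardware.spi257;
local cs  = hardware.pin8;
spi.configure(CLOCK_IDLE_HIGH | CLOCK_2ND_EDGE, 4000);
cs.configure(DIGITAL_OUT);
cs.write(1);

local cswriter     = cs.write.bindenv(cs);
local spiwriteread = spi.writeread.bindenv(spi);

// Built once: 1 command byte followed by 6 dummy bytes to clock the data out
local cmd = blob(7);
cmd.writen(0xF2, 'b');  // 0x32 | 0x80 (read) | 0x40 (multi-byte), assumed
for (local i = 0; i < 6; i++) cmd.writen(0x00, 'b');

// Hypothetical helper: one FIFO entry per call, single SPI transfer
function readSample() {
    cswriter(0);
    local rx = spiwriteread(cmd);  // returns a blob the same length as cmd
    cswriter(1);
    return rx;  // byte 0 is clocked in during the command; bytes 1..6 are data
}

// Usage: decode the three axes as little-endian signed 16-bit values
local rx = readSample();
rx.seek(1);
local x = rx.readn('s');
local y = rx.readn('s');
local z = rx.readn('s');
```

This removes one call into the imp API per sample compared with a separate write() and readblob(), but the CS writes and the call itself still run as bytecode, so the gap around the transfer shrinks rather than disappears.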

montu17 · August 12, 2014, 2:33am

Hugo/Peter, this is a critical issue for us (Shatruddha and I work together). We are building an industrial sensor and need to be able to sample at pretty high rates. Is it possible for you to create a native code hook so that we can sample over SPI in native/C code instead of Squirrel? Or maybe you could provide an SPI sampler similar to your hardware.sampler?

It seems like the bytecode-interpreted Squirrel language is not the best choice given the processing cycles it takes up. It also seems to be consuming more battery power than a compiled language would.

Thanks,
Abhinav

peter · August 12, 2014, 4:15am

I think you’re right, I think that if the 20us quoted upthread is a hard requirement, Squirrel as a bytecode-interpreted language is not the best choice. You’re going to need to use an alternative SoC, either in addition to the imp (still benefiting from the imp’s cloud service, WiFi config etc) or instead.

I’m just a bit concerned that your problems won’t be over once you’ve got those 50,000 bytes of data per second into the imp. What happens to those bytes then: do you stream them all up to the cloud, or do further calculations on them in the device? Either way you might just end up hitting a second bottleneck.

Having a bytecode-interpreted language on the imp is important mainly because of the sandboxing it provides: errors or exceptions that happen in Squirrel can be safely captured by our improm and neatly reported in the IDE, even when debugging remotely over WiFi after deployment. You wouldn’t get that with a compiled-to-native language such as C – it would just crash – which is why debugging of traditional embedded systems tends to take place six inches away over JTAG rather than half a world away over WiFi and a cloud service.

Peter

coverdriven · August 12, 2014, 7:21pm

There’s a point where you’ll probably need to bite the bullet and consider a dedicated micro to do this. This is an easy job for a PIC32 or PIC24. They have plenty of RAM to capture everything and you can pass the data through to the imp at your leisure via a UART after sampling is complete. I imagine there is already plenty of code flying around for interfacing an ADXL345 to a PIC. The PIC can then control power to the imp, further reducing your battery use.

As a final thought, have you considered using SPI FRAM instead of SPI FLASH? Lower power and much faster write times.