Accurate Periodic Timer

Hi guys,

This one has been niggling at us for a while and is perhaps the single most useful thing that could be added to the imp platform for us:

Originally to perform periodic actions we’d do something like this:

function myFunction()
{
    myTimer = imp.wakeup( 0.99, myFunction );

    // Do periodic stuff
}

We found that adjusting 1.0s down to 0.99s helps to avoid overshoot, but fundamentally we still see very large drift over time. In some instances where we've been uploading readings to a server, we drop a reading every 7s, presumably because the timer has drifted into a different 1s time bucket, with under-the-hood interrupts and other routines running at the time as contributing factors.

We’ve found much better results by applying corrective action using the hardware.millis() timer:

function myFunction()
{
    local currentTime_ms = hardware.millis();

    // Aim for the next nominal sample time rather than a fixed delay
    local nextInterval_ms = 1980 - (currentTime_ms - lastSampleTime_ms);

    if( nextInterval_ms > 0 )
    {
        myTimer = imp.wakeup( 1.0 * nextInterval_ms / 1000, myFunction );
    }
    else
    {
        // We've already overrun the next slot - fire immediately
        myTimer = imp.wakeup( 0, myFunction );
    }

    // Advance the nominal sample time by one 990ms period
    lastSampleTime_ms = lastSampleTime_ms + 990;

    // Do periodic stuff
}
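For completeness, a minimal sketch of how this gets kicked off (the seeding below is illustrative rather than our exact code - subtracting one 990ms period makes the first computed interval come out at 990ms):

myTimer <- null;
lastSampleTime_ms <- hardware.millis() - 990;

myFunction();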

This is obviously far from ideal and a lot of boilerplate, but it means we drop a reading only after many minutes rather than every few seconds - the drift is much better.

Would it be possible to have an API call for periodic timers that effectively does this under the hood?

Many thanks

If you are using the accurate counters on the imp to adjust your timers, a couple of opportunities for improvement:

  • the big one is to do your periodic stuff FIRST. If there is any variability in your processing time based on whatever your code is doing, your current timer implementation misses it.
  • you could also use hardware.micros() (it probably won't make a difference given the accuracy of imp.wakeup(), but it won't hurt either), and cache local references to all of your time-based functions to tighten up that portion of the code, so that working out how long to sleep adds as little overhead as possible - see the sketch after this list.
  • Finally - I believe that @peter has indicated in the past that if/else statements without curly braces are actually somewhat more performant than those with them, so you may want to drop those.
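As a sketch of the caching idea (using bindenv() is my assumption about the tidiest way to hold bound references on the imp; the names are illustrative):

local millis = hardware.millis.bindenv(hardware);
local wakeup = imp.wakeup.bindenv(imp);

function tick()
{
    local start = millis();    // cached reference: no table lookup per call

    // Do periodic stuff FIRST...

    // ...then schedule the next tick via the cached references
    wakeup( 0.99, tick );
}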

For measuring processing overhead without loading the imp module with a bunch of profiling Squirrel code, I've found that asserting/deasserting a pin and watching it on a logic analyzer or scope is very helpful.
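Something along these lines, for example (the pin name is a placeholder - use whatever is free on your hardware):

probe <- hardware.pin1;              // any spare GPIO on your module
probe.configure(DIGITAL_OUT, 0);

function doPeriodicStuff()
{
    probe.write(1);
    // ... code under test ...
    probe.write(0);                  // high time on the scope = processing time
}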

Hopefully some of those tips help!

Hi @deldrid1,

Thanks for responding - the above was just an example to explain the issue as simply as possible.

We already do the periodic stuff first, but the issue is that when you have multiple things scheduled, or a lot of network traffic, you get jitter outside of the function itself, and it's pretty bad. There's nothing we can do at the Squirrel level to solve the issue; we really do need a periodic timer feature implemented at the C level for any consistent level of accuracy.

Incidentally, we have a benchmark script, so I added your mention of if/else statements to the list, but we couldn't measure any difference in performance on the imp005.

For your or anyone's interest, a few benchmarks (on the imp005):

NewSlot vs Set vs Rawset (x100,000)

Using NewSlot only: 248281us / 248ms (1.00)
Using Set only: 256723us / 256ms (1.03)
If statement to determine NewSlot vs Set: 444657us / 444ms (1.79)
Using Rawset only: 1752268us / 1752ms (7.06)
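For reference (this is illustrative, not the benchmark code itself - the full script is linked later in the thread), the operations being compared are:

local t = {};
t.x <- 1;            // newslot: creates the slot
t.x = 2;             // set: requires the slot to already exist
t.rawset("x", 3);    // rawset: sets the slot directly, bypassing delegation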

Clone vs NewArray (x100,000)

Using Clone: 739us / 0ms (1.00)
Using NewArray: 877us / 0ms (1.19)

Append Array vs Set Preallocated Array (x50,000)

Using Preallocated Set: 85us / 0ms (1.00)
Using Append: 754us / 0ms (8.87)
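i.e. roughly the difference between (illustrative):

// Preallocated set: size the array up front and write by index
local a = array(50000, 0);
for (local i = 0; i < 50000; i++) a[i] = i;

// Append: grow the array one element at a time
local b = [];
for (local i = 0; i < 50000; i++) b.append(i);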

Array Append vs Array Push (x100,000)

Using Append: 1459us / 1ms (1.00)
Using Push: 1510us / 1ms (1.03)

If vs Regexp for 5x Typeof comparisons (x100,000)

Using regexp: 1636us / 1ms (1.00)
Using cached if comparison: 2629us / 2ms (1.61)
Using if comparison: 4088us / 4ms (2.50)
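For context, the regexp approach is along these lines (a reconstruction, not the benchmark code itself):

local value = 42;

// One native regexp call covering five alternatives
local types = regexp("integer|float|bool|string|blob");
local matched = types.match(typeof value);

// versus chained if-style comparisons
local t = typeof value;
local matched2 = (t == "integer" || t == "float" || t == "bool"
                  || t == "string" || t == "blob");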

Switch Statement vs Table Lookup vs Array Lookup (x30,000)

Array Lookup (top): 1196270us / 1196ms (1.00)
Array Lookup (even): 1206878us / 1206ms (1.01)
Array Lookup (bottom): 1208118us / 1208ms (1.01)
Array Sparse Lookup: 1213264us / 1213ms (1.01)
Table Lookup (even): 1241312us / 1241ms (1.04)
Table Lookup (bottom): 1241757us / 1241ms (1.04)
Table Sparse Lookup: 1247864us / 1247ms (1.04)
Table Lookup (top): 1253271us / 1253ms (1.05)
Switch Lookup (bottom): 1322331us / 1322ms (1.11)
Switch Lookup (even): 2503497us / 2503ms (2.09)
Switch Lookup (top): 3669185us / 3669ms (3.07)
Switch Sparse Lookup: 3980175us / 3980ms (3.33)
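For illustration, replacing a switch with an array of handlers looks something like this (a sketch, not the benchmark code):

local handlers = [
    function() { /* case 0 */ },
    function() { /* case 1 */ },
    function() { /* case 2 */ }
];

local code = 1;
handlers[code]();    // instead of: switch (code) { case 0: ... case 1: ... }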

Try vs In (x100,000)

Using In when key exists: 198529us / 198ms (1.00)
Using Try when key exists: 237068us / 237ms (1.19)
Using In when key doesn’t exist: 2005639us / 2005ms (10.10)
Using Try when key doesn’t exist: 3955854us / 3955ms (19.93)
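i.e. roughly these two patterns (illustrative):

local t = { temperature = 21.5 };
local v = null;

// "in" test before access
if ("temperature" in t) v = t.temperature;

// try/catch around a direct access
try {
    v = t.temperature;
} catch (e) {
    // key missing
}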

Empty Loop vs Empty Cached Loop (x100,000)

Using cached loop: 75561us / 75ms (1.00)
Using uncached loop: 222308us / 222ms (2.94)
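"Cached" here means hoisting the loop bound into a local rather than re-evaluating it on every iteration:

local data = array(100000);

// Cached: evaluate len() once
local len = data.len();
for (local i = 0; i < len; i++) {}

// Uncached: call len() on every pass through the condition
for (local i = 0; i < data.len(); i++) {}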

Foreach Loop vs Cached For Loop vs Cached While Loop (x50,000)

Using a foreach loop: 34237us / 34ms (1.00)
Using a cached while loop: 63677us / 63ms (1.86)
Using a cached for loop: 64826us / 64ms (1.89)

Class vs Static Class (x100,000)

Using a static class: 1382263us / 1382ms (1.00)
Using a class instance: 1383494us / 1383ms (1.00)

Brackets vs No Brackets (x100,000)

Using no brackets after an if statement: 151316us / 151ms (1.00)
Using brackets after an if statement: 151388us / 151ms (1.00)
Using no brackets after an if/else statement: 155971us / 155ms (1.03)
Using brackets after an if/else statement: 156120us / 156ms (1.03)

Note that there is never going to be a fully accurate timer in Squirrel; implementing it as a language primitive isn't going to help if your VM (or the OS) is busy when the timer expires - the VM is not re-entrant.

Do you collect stats on expected wakeup time vs actual? (e.g. pass the expected hardware.millis() time to the routine and compare it with the actual time.)

If you see several seconds of delay then you are likely blocking somewhere in your code - you mentioned sending large amounts of data to the server, for example. The imp will generally only block in a couple of places: sending to the server (the max block time is the one you specify as a send timeout) and for any time passed to server.flush(). There are other ones, but they're more esoteric - for example, if you're booting WiFi on the imp003/004 in the background and your code is trying to use the SPI flash at the same time.

First thing to look at is your send buffer size - it’s very small by default.
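You can raise it from Squirrel with imp.setsendbuffersize() - the 32KB figure below is just an example:

// Enlarge the outgoing send buffer; returns the size actually allocated
local actual = imp.setsendbuffersize(32768);
server.log("Send buffer is now " + actual + " bytes");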

The second thing is to use non-blocking modes. You can send data upstream in a fully non-blocking way using RETURN_ON_ERROR_NO_DISCONNECT - you'll get a WOULDBLOCK error from the send call if the data can't fit into the buffer within the send timeout (which you can then set to zero). This results in no blocking at all from upstream data transfer. Minimal example here: https://gist.github.com/hfiennes/56b9c359479b7d13245956f3d573f357
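In outline, that looks like this (a sketch - the gist above has the full version, including retry handling):

// Return an error rather than blocking, with a zero send timeout
server.setsendtimeoutpolicy(RETURN_ON_ERROR_NO_DISCONNECT, WAIT_TIL_SENT, 0);

local data = { reading = 21.5 };
local result = agent.send("reading", data);
if (result == SEND_ERROR_WOULDBLOCK) {
    // Buffer full right now: queue the reading and retry later
}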

Hi Hugo,

We don't strictly need a perfectly accurate timer (though this is something we miss - it would be interesting to see support for syncing an external sub-second RTC), but we do need one that doesn't drift, i.e. a periodic timer.

Ignoring the individual jitter of each imp.wakeup(), the issue we suffer from is a delay in scheduling the next imp.wakeup(), and a primitive periodic timer could efficiently meet this requirement. For example, we ask to wake up after 1s but wake up after 1.1s due to blocking or busy code; ten wakeups later we've drifted a whole second. So instead we keep having to realign to the millis() count and reduce our wakeup time each time to compensate - which works, but could be done much better and more efficiently by a primitive periodic timer.

We do have some blocking code, and we have some genuinely busy loops that add jitter; in both cases we can't avoid this. The blocking code is due to I2C accesses, which I don't believe have a non-blocking option unless we really slow down the transfer rates by inserting imp.wakeup()s, and our other code is necessary as part of the functionality of the device. It's not that we're hitting 100% utilisation - it's just that it runs long enough to add some jitter.

Many thanks

@kiwitech
Have you considered outputting a PWM signal from one pin into a pin that generates a change of state interrupt? I imagine that the PWM output is divided down from the oscillator and will be quite reliable.
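Something like this, assuming two spare pins wired together (the pin names are placeholders, and I haven't checked whether a 1s PWM period is supported on every module - you may need a shorter period and a software divider):

// Drive a 1Hz, 50% duty PWM out of one pin...
hardware.pin1.configure(PWM_OUT, 1.0, 0.5);

// ...and loop it back into a pin configured for state-change callbacks
function onEdge()
{
    if (hardware.pin2.read() == 1) {
        // Rising edge once per second: do the periodic work here
    }
}
hardware.pin2.configure(DIGITAL_IN, onEdge);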

Also, your stats are really helpful, although I'm not sure that I agree with all of them.

Squirrel seems to pre-allocate table slots and array spaces in chunks to mitigate constantly having to resize. This may have a bearing on the performance of newslot and push/append.

If you want fast array operations, it absolutely makes sense to pre-allocate and just maintain an integer pointer to where you’re writing.

I was surprised by your switch lookup figures. If this is right, does this mean that cases in a switch statement are tested from bottom to top?

I use try/catch in lieu of if (“x” in y) quite a bit and try/catch is definitely faster if you expect to find the key. I agree that try/catch takes longer if the key is not present.

Hi @cvrdrvn,

That's certainly not a bad shout, and it probably would do the job; sadly we don't have the ability to modify our hardware, as it's gone through a lot of costly certifications.

You can find our benchmark script here for full scrutiny (it's a VSCode project, or just copy and paste src/device.nut). Feel free to raise pull requests if you wish to have anything added; we'll happily do so: https://github.com/kiwipower/squirrel-benchmark

It might be worth mentioning that we also have a full Squirrel test framework with a lot of the Electric Imp API stubbed out and a Mocking class, which can be found here if it's of any use: https://github.com/kiwipower/nutkin

We also have a fork of the official Squirrel branch that mimics Electric Imp's bindenv() strong refs and a few hex-string behaviours (there are plenty of differences remaining, but these are the core ones for running most EI code): https://github.com/kiwipower/squirrel

Sorry, to be clear: switch statements are evaluated top to bottom; in the context of the description, "bottom" actually meant the bottom of the range of values being switched against. From memory the preallocation of slots doubles each time, but it has been a while since I looked at the source.

Many thanks