HTTP requests by agents

What’s the likelihood that HTTP requests by an agent are being delayed/batched? The cloud service I’m using, GroveStreams, is alerting that it’s randomly getting PUTs faster than 1 every 10 seconds. My agent code (well-tested and observed) clearly indicates it’s posting every 60 seconds (driven by messages from devices via imp.wakeup). However, I can’t watch it 24x7, obviously. Thoughts?

When you get the result callback from an async operation (or the sync operation completes), it’s completed: the HTTP request has been made and you have the result code.
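For reference, an asynchronous agent request looks something like this (the URL is just a placeholder, not anyone's actual endpoint); the callback only fires once the request has actually completed:

// Agent code: minimal sketch of an asynchronous HTTP request.
local request = http.get("https://example.com/api/feed", {});
request.sendasync(function(response) {
    // This runs only after the request has completed, so the status
    // code reflects the real response from the remote server.
    server.log("HTTP response: " + response.statuscode);
});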

Is it possible grovestreams are looking at requests from our server IP addresses vs requests on your account?

Whenever I’m watching the IDE log, the server.log messages with the request and result code are immediate. That tells me, as expected, EI is pushing the request immediately. What I can’t see is whether things are delayed for any reason when the IDE is not watching a model. In other words (and yes, I’m reaching), does having the IDE watching a model in any way affect the timing of HTTP requests?

The way they meter is by timing the PUTs made to a specific GroveStreams component from a specific IP address (and x-forwarded-for header, if one is included, which I am doing). They’re suggesting the calls are possibly being buffered somewhere, but I’m having trouble with that concept, unless it’s on their end.

BTW, they’re great people to work with, so I trust they’re not simply throwing out ideas to cover up a problem. I also feel the same about you guys!

Yeah, these are definitely not buffered. That would be… hard - especially returning the result from a call that hasn’t happened yet!

Wondering if their metering is actually not doing what they think it is. If they want to contact us to help debug we’re happy to help.

Thanks for your feedback. The curious part is that we’re not seeing this issue with the monitoring devices that use another platform. It seems to be associated with the Electric Imp devices for some unknown reason. I’m still in a dialogue with them and will let you know what we find. I passed along your offer of assistance.

Well, the “unknown reason” could well be that there are many other users on the imp platform pushing data there - ie, same source IP (the developer server farm is small) - but not for the other platform.

Looking in our logs, I can see almost 1000 grovestreams accesses per hour (ie, one every 3-4 seconds). Xively is running at ~50x that rate without issues.

Within GroveStreams, users create “organizations” and within organizations you can have multiple “components”. Their metering is relative to the number of calls to a particular component from the same IP address (same IP and x-forwarded-for header, if one is included). Each of my models sets a unique x-forwarded-for value, and they all talk to different components, usually within different GroveStreams organizations. We’re seeing issues where there’s only one device pushing data once per minute, yet they’re occasionally detecting pushes within 10 seconds of each other.
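For illustration, setting that header on the agent’s PUT looks roughly like this (the URL, key and forwarded ID are placeholders, not my real values):

// Agent code: sketch of a GroveStreams PUT carrying an X-Forwarded-For header.
// The URL, API key and forwardedId are placeholders for illustration only.
local forwardedId = "unique-per-model-id";
local url = "http://grovestreams.com/api/feed?compid=x&api_key=xxxx&RT=71.2";
local request = http.put(url, { "X-Forwarded-For": forwardedId }, "");
request.sendasync(function(response) {
    server.log("HTTP response: " + response.statuscode);
});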

My devices are likely responsible for 600 of the 1000 calls per hour: about 10 sites pushing data once per minute.

I can’t comment further, apart from confirming that there are that many accesses per hour going on, and that we are not buffering any outbound requests.

As I said, it sounds like grovestreams have some issues with their rate limiting code not taking notice of the user/organization - ie, the problem is, in my opinion, certainly on their side.

Thanks and I agree.

I may have had a wee bit of an epiphany (for me!) about the 10 second Grovestreams violations I’m seeing.

My code does an imp.wakeup() to call the main function once every 60 seconds. The imp.wakeup() is at the beginning of the function. I’m now thinking that the agent.send() at the bottom is a blocking call that makes the device sit and wait for the connection to come back if it ever goes offline. Since this is a cellular connection, it goes offline rather more often than a normal LAN connection would. When it comes back up, it does the delayed send, followed almost immediately by the next one, since it’s been waiting so long.

Does this explanation make any sense? If so, I might want to introduce a device.isconnected() check before calling the agent.send() and simply bypass it if the device is not online.

Duh…can’t use device.isconnected() in the device code. Hmmm…

Guess I should use server.isconnected() instead.

Well…after too many hours of working with this and wasting readers’ time, I’m pretty confident I did finally stumble across the root cause yesterday.

My code is fundamentally:

function bigLoop() {
    imp.wakeup(60, bigLoop);
    // read the inputs
    // do some calculations and build the gsString to be sent to GroveStreams
    agent.send("sendToGS", gsString);
}

What I failed to realize until yesterday is that agent.send() is a blocking call and will halt the program until the connection to the server is restored. Since imp.wakeup() has already scheduled the next run, it will occur 60 seconds after it was called. I don’t know whether that scheduled call fires while the agent.send() is still waiting. However, it’s certainly the case that the agent.send() will eventually go through, often within 10 seconds of the next scheduled function call, causing GroveStreams to reject the PUT.

What I did was change the last line of the function to:

if (server.isconnected()) {
    // skip sending this update if the connection is down
    agent.send("sendToGS", gsString);
}

This has eliminated the issue and still gives me a nicely regulated feed. I could have put the imp.wakeup() call for the next run AFTER the agent.send() call, but having it at the top of the function results in a very consistent frequency, which I like.
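For comparison, that alternative ordering would look roughly like this (a sketch, not my actual code); the period then drifts by however long the body of the function takes, but there is always a full 60 seconds between the end of one send and the next attempt:

// Device code: alternative ordering - schedule the next run only after the send.
function bigLoop() {
    // read the inputs, build gsString...
    local gsString = "";
    if (server.isconnected()) {
        agent.send("sendToGS", gsString);
    }
    imp.wakeup(60, bigLoop);  // next run is 60 seconds after this point
}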

To be clear, this was not a GroveStreams problem, but one resulting from my code. That being said, GroveStreams is re-examining how they meter feeds to make it a bit more tolerant while still protecting their platform from an excessive feed (my perception, not their statement).

That’d do it! Glad you worked it out…

Hmmm…so I thought I had quashed this issue, but it seems to persist. I’ve introduced code to check server.isconnected() prior to calling agent.send(). However, I’m continuing to see instances of what appear to be delays in agents receiving messages from devices.

My code is basically:
function poll() {
    imp.wakeup(60, poll);
    // do some data gathering
    if (server.isconnected()) {
        agent.send("data", data);
    }
}

The above approach was implemented to make sure the device didn’t get delayed waiting for a connection to come back up and then possibly have two agent.send()s within 10 seconds of one another, causing an error at GroveStreams (10 second PUT violation).

When the agent receives the message from the device, it immediately sends a PUT to GroveStreams.

If the agent receives a 403 return code from the call to GroveStreams, it sends a message to the device to introduce an 11-second delay. This code is likely no longer required, but it still functions and clearly indicates the issue.
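For illustration, the agent-side flow is roughly this (the message names and URL handling are simplified placeholders, not my exact code):

// Agent code: receive the device's reading, PUT it to GroveStreams immediately,
// and ask the device to back off if GroveStreams returns a 403.
device.on("sendToGS", function(gsString) {
    local url = "http://grovestreams.com/api/feed?" + gsString;
    server.log("GS URL=" + url);
    http.put(url, {}, "").sendasync(function(response) {
        server.log("HTTP response: " + response.statuscode);
        if (response.statuscode == 403) {
            server.log("403 error detected");
            device.send("delay", 11);  // ask the device to wait an extra 11 seconds
        }
    });
});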

Here’s the log of the agent entries, clearly indicating that it seemingly received two messages from the device, one right after the other. I don’t know how this is possible, and I hate to beat this issue to death, but I’d like to know what I’m doing wrong (or whether it’s still possible for the agent.send() to complete and yet the message to the agent be delayed, or buffered, until the next call). Component ID and API key are obscured:

2014-08-13 06:38:06 UTC-4 [Agent] GS URL=http://grovestreams.com/api/feed?compid=x&api_key=xxxx&RT=71.2&ST=61.7&FM=1&FP=1&C1=0&C2=0
2014-08-13 06:38:07 UTC-4 [Agent] HTTP response: 200
2014-08-13 06:39:06 UTC-4 [Agent] GS URL=http://grovestreams.com/api/feed?compid=x&api_key=xxxx&RT=71.1&ST=63.3&FM=1&FP=1&C1=0&C2=0
2014-08-13 06:39:07 UTC-4 [Agent] HTTP response: 200
2014-08-13 06:41:06 UTC-4 [Agent] GS URL=http://grovestreams.com/api/feed?compid=x&api_key=xxxx&RT=71.0&ST=63.0&FM=1&FP=1&C1=0&C2=0
2014-08-13 06:41:06 UTC-4 [Agent] HTTP response: 200
2014-08-13 06:41:06 UTC-4 [Agent] GS URL=http://grovestreams.com/api/feed?compid=x&api_key=xxxx&RT=71.2&ST=62.5&FM=1&FP=1&C1=0&C2=0
2014-08-13 06:41:16 UTC-4 [Agent] HTTP response: 403
2014-08-13 06:41:16 UTC-4 [Agent] 403 error detected
2014-08-13 06:42:18 UTC-4 [Agent] GS URL=http://grovestreams.com/api/feed?compid=x&api_key=xxxx&RT=71.0&ST=56.7&FM=1&FP=1&C1=1&C2=0
2014-08-13 06:42:19 UTC-4 [Agent] HTTP response: 200

The 403 is confirmed by a violation being logged at GroveStreams.

The code works flawlessly on several device-agents 99.999% of the time.

By the way, the agent logs if the device disconnects/connects and there are no entries for those events.

How is the device connected? It’s always possible that the connection has dropped and the imp tries to send, it times out, and the connection is then silently re-established.

You could determine this by using RETURN_ON_ERROR and a short timeout on sending the messages - agent.send will then return an error if it didn’t manage to send during that period.
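Roughly like this (a sketch, with an arbitrary 5-second timeout; adjust to taste):

// Device code: with RETURN_ON_ERROR, agent.send() returns a non-zero error
// code instead of blocking when the message can't be sent/ACKed within the timeout.
server.setsendtimeoutpolicy(RETURN_ON_ERROR, WAIT_FOR_ACK, 5.0);

function poll() {
    imp.wakeup(60, poll);
    local data = {};  // placeholder for the gathered readings
    local result = agent.send("data", data);
    if (result != 0) {
        // the send didn't get through within the timeout; drop this reading
        // rather than letting it queue up behind a dead connection
    }
}
poll();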

If you have a device ID then I can look in the logs at that time period and see if indeed the imp reconnected.

That reminds me, @Hugo, do you think agent.send and device.send will ever accept a callback function to report completion or an error? I appreciate that resources are limited in the device.

The device ID is 20000c2a6906bc99 and it just occurred, again, at Sun, 17 Aug 2014 23:15:16 GMT (7:15:16 EDT). I appreciate the continued support as I try to figure this out. The agent is not “seeing” any disconnects/reconnects.

BTW, the device is connected via a cellular modem and the connection is not solid. That being said, I have seen the same scenario with two different devices on cable service.

@coverdriven agent.send, if used with RETURN_ON_ERROR/WAIT_FOR_ACK, will return an error if the message was not ACKed by the server within the specified timeout. If you use WAIT_TIL_SENT then it’ll return as soon as the message is queued to send - but obviously there’s all of wifi and the internet between that point and the server.

If you want a positive acknowledgement from the agent, you can pretty easily send one yourself?
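Something along these lines (the names are made up):

// Agent side: confirm receipt back to the device.
device.on("data", function(reading) {
    // ...push the reading to GroveStreams here...
    device.send("dataAck", true);
});

// Device side: handle the agent's acknowledgement.
agent.on("dataAck", function(ok) {
    server.log("agent confirmed receipt of the last reading");
});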

@ctmorrisonhvacsp that device connected 108 times on the 17-18th. You should have seen an offline notification at 22:13:29 UTC (I’m assuming you mean 23:15 BST?). It reconnected at 22:13:55 UTC, and stayed connected until 22:29:52 UTC. From what I can see, offline and online notifications should have been delivered to your agent for all of these - are you seeing any of these notifications at all? (they’re definitely appearing in your device logs).

The main takeaway is that your cellular connection appears to be kinda flakey, though…

@Hugo, thanks, but WAIT_FOR_ACK is blocking and the device has too much to do in the meantime. My strong preference is to use WAIT_TIL_SENT in all cases. I do use positive acks for all agent-originated traffic at the moment, but not the other way round. It was just my thinking that it would be more consistent with your API if agent.send and device.send supported a callback (a la server.connect and httprequest.sendasync).

@Hugo, I see lots of disconnect/connect log entries, but not at the time when the 403 occurred (the agent did not post any log entries). That’s the part that’s confusing me. In my device code, I check server.isconnected() before I call agent.send(). I guess it’s possible the server could go offline in between those two calls. Otherwise, I’m clueless as to why I’m still getting the apparent delayed messages to GroveStreams that end up being less than 10 seconds from the subsequent call.