Agent returning HTTP 404 error

budredlight · November 19, 2018, 11:28am

Hello All,

We have an agent that returns an HTTP response with basic status information (our agent version, whether the device is online, etc.).

In production, after an end user does a BlinkUp (no plan ID, so a new agent is created), an HTTP client request to the agent sometimes returns a 404 error immediately afterwards. We suspect this is because the agent takes a moment to spin up and be ready to process requests. It is happening frequently enough to be of concern.

Is there anything we can do, or a set amount of time we should wait, to ensure the agent is ready? Is there an upper limit on the time the agent takes to become ready? Does this depend on load on the Electric Imp backend and, if so, is that customer-specific?

Any help would be appreciated!

Thanks,
Anthony

hugo · November 19, 2018, 7:06pm

If the agent isn’t up yet, then this means the imp is doing an OS upgrade; if you’re getting a 404 then just keep polling for a minute or so (OS upgrades usually take 10-30s).
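
For illustration only, here is a rough Python sketch of "keep polling for a minute or so". It is not Electric Imp code: the function names, the 2s interval and the 60s cap are all assumptions chosen to match the advice above, and the agent URL in the usage note is a made-up placeholder. It treats a 404 as "agent not ready yet" and returns as soon as any other status comes back.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """GET the URL and return (status_code, body), including HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; we want the status code, not an exception.
        return err.code, err.read()


def wait_for_agent(url, fetch=fetch_status, max_wait=60.0, interval=2.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the agent stops answering 404, or give up after max_wait seconds.

    max_wait and interval are assumed values, not limits published by Electric Imp.
    """
    deadline = clock() + max_wait
    while True:
        status, body = fetch(url)
        if status != 404:
            return status, body
        if clock() + interval > deadline:
            raise TimeoutError(f"agent still returning 404 after {max_wait}s")
        sleep(interval)
```

Usage would look something like `status, body = wait_for_agent("https://agent.example/your-agent-id")`, with the placeholder swapped for the real agent URL. The `fetch`, `sleep` and `clock` parameters are only there so the loop can be exercised without a network or real delays.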

budredlight · November 19, 2018, 9:33pm

Thanks for the quick reply. This makes sense.

But what are the circumstances in which an OS upgrade occurs? The reason I ask is that we have many devices that have been online but are doing an OS upgrade only now. Is it (for example) only done when a new agent is created?

Thanks!
Anthony

hugo · November 19, 2018, 9:47pm

OS updates (to production devices) aren’t that frequent; impOS 38 will be going out shortly, but release 36, the previous production release, has been out around a year, so that doesn’t appear to explain what you’re seeing.

The more detailed answer is that the agent is created only when the device reaches its production server, which is generally only a few seconds after the BlinkUp app gets a response - except in the case of an OS update, where the device is told to upgrade, then comes back to the welcome server and is sent on to its assigned production server. The actual time will often depend to some extent on the local network the device is connected to, though.

Do you retry at all right now? If you have specific device IDs and time periods where you’ve seen this, then PM them and we can tell you exactly what happened there and how long the delay was.

budredlight · November 19, 2018, 10:37pm

Thanks for the reply. I’ve sent you a PM with some device IDs and timestamps where the issue has occurred.

budredlight · November 19, 2018, 10:56pm

And sorry, to answer your question: we do retry at the moment. Currently it is 5 retries with exponential backoff, so it waits around 8s before the last attempt.
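
To put rough numbers on that, here is a small Python sketch of the schedule. Only "5 retries" and "around 8s before the last attempt" come from our setup as described above; the 0.5s starting delay and the doubling factor are assumptions picked so that the final wait comes out at 8s, and the function name is just illustrative.

```python
def backoff_delays(retries=5, base=0.5, factor=2.0):
    """Return the wait (in seconds) before each retry.

    base and factor are assumed values chosen so the last wait is 8s.
    """
    return [base * factor ** i for i in range(retries)]


delays = backoff_delays()
print(delays)       # [0.5, 1.0, 2.0, 4.0, 8.0]
print(sum(delays))  # 15.5
```

Under those assumptions the whole retry window is only about 15.5s, which is shorter than the 10-30s quoted for an OS upgrade and well short of polling for a minute, so a 404 surviving all five retries would not be surprising in the upgrade case.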