When agent code upload has an issue, nothing is reported

saferaging · July 26, 2013, 10:12am

There needs to be a lot more detail provided when an issue is identified within the IDE; currently the system provides zero output and no information on any issues. If there is an issue either in the device WHEN CONNECTED or in the agent when receiving data, there does not appear to be any indication that an issue occurred or what the issue is. I have experienced this numerous times, and since there isn’t a great local testing/debug environment, this quickly becomes very frustrating. Any tips?

saferaging · July 26, 2013, 10:16am

Furthermore, there isn’t a great way to get the devices out of this state. An atomic reset function that clears the system would make it easier to test in isolation. For now I am assigning to a new model, but that does not seem to work reliably for many reasons (can document).

DolfTraanberg · July 26, 2013, 10:18am

Do you have an example of such an issue, and what is an atomic reset function?

saferaging · July 26, 2013, 10:22am

Imp Mac: 0c2a69003664 would be a good example.

saferaging · July 26, 2013, 10:24am

Fri Jul 26 2013 09:57:43 GMT-0400 (EDT): Power state: asleep=>online
Fri Jul 26 2013 09:57:43 GMT-0400 (EDT): Agent Started
Fri Jul 26 2013 09:57:43 GMT-0400 (EDT): Downloading new code
Fri Jul 26 2013 09:57:43 GMT-0400 (EDT): Device configured to be "CONTACT"
Fri Jul 26 2013 09:57:43 GMT-0400 (EDT): connected
Fri Jul 26 2013 09:57:43 GMT-0400 (EDT): sleeping until 1374933353000
Fri Jul 26 2013 09:57:43 GMT-0400 (EDT): no handler for agent.send()
Fri Jul 26 2013 09:57:43 GMT-0400 (EDT): Power state: online=>asleep
Fri Jul 26 2013 09:59:45 GMT-0400 (EDT): Power state: asleep=>online
Fri Jul 26 2013 09:59:46 GMT-0400 (EDT): Downloading new code
Fri Jul 26 2013 09:59:46 GMT-0400 (EDT): Device configured to be "CONTACT"
Fri Jul 26 2013 09:59:46 GMT-0400 (EDT): Agent Started
Fri Jul 26 2013 09:59:46 GMT-0400 (EDT): connected
Fri Jul 26 2013 09:59:46 GMT-0400 (EDT): sleeping until 1374933470000
Fri Jul 26 2013 09:59:46 GMT-0400 (EDT): no handler for agent.send()
Fri Jul 26 2013 09:59:46 GMT-0400 (EDT): Power state: online=>asleep
Fri Jul 26 2013 10:13:20 GMT-0400 (EDT): Power state: asleep=>online
Fri Jul 26 2013 10:13:20 GMT-0400 (EDT): Downloading new code
Fri Jul 26 2013 10:13:20 GMT-0400 (EDT): Agent Started
Fri Jul 26 2013 10:13:20 GMT-0400 (EDT): 0c2a69003664
Fri Jul 26 2013 10:13:20 GMT-0400 (EDT): Device configured to be "FAIL CLEAR"
Fri Jul 26 2013 10:13:20 GMT-0400 (EDT): sleeping until 1374934304000
Fri Jul 26 2013 10:13:20 GMT-0400 (EDT): Power state: online=>asleep

The wake-up at Fri Jul 26 2013 09:57:43 GMT-0400 (EDT) was supposed to be the FAIL CLEAR push, but it kept returning CONTACT, which is why the “no handler for agent.send()” was reported.

DolfTraanberg · July 26, 2013, 10:38am

From what I can see, the “no handler for agent.send()” should be fixed first.
When an error occurs, there’s a good chance that the Imp will respond unpredictably.
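
For anyone unsure what that log line refers to: it generally shows up when the device calls agent.send() with a message name that the agent code currently running has no device.on() handler for. A minimal sketch of a matching pair, assuming a hypothetical message name "contact" and a simple string payload (neither is taken from the code discussed in this thread):

```squirrel
// ---- Device code (sketch) ----
// "contact" is a hypothetical message name chosen for illustration.
agent.send("contact", "open");

// ---- Agent code (sketch) ----
// The first argument must match the name used in agent.send() on the device.
// If the running agent code has no handler registered for that name,
// the message has nowhere to go.
device.on("contact", function(state) {
    server.log("contact message received: " + state);
});
```

If the agent and device code are out of step (one updated, the other not), the names can stop matching even though each half looks correct on its own.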

saferaging · July 26, 2013, 10:40am

As mentioned, that is happening because the imp service is not using the correct agent code with the correct device code despite it being set in the IDE. This is an issue not with my code, but with the service.

DolfTraanberg · July 26, 2013, 10:45am

Well, in that case I think you’ll have to wait for the Imp guys to pop in.
Could be a bug, but I never had that issue.

saferaging · July 26, 2013, 10:46am

cool, thanks for trying though! Do appreciate the community love to help out!

saferaging · July 26, 2013, 11:39am

Imp team - can you review the thread and provide some guidance? Thanks!

hugo · July 26, 2013, 1:39pm

Try extending the time before a sleep, so that there’s more time when both the imp and agent are “up” for the run command to work. Right now I believe the agent will only restart if the run command is received when the imp is online, which would explain what you’re seeing.

saferaging · July 26, 2013, 1:41pm

It was 60 seconds, I moved it down to 10 seconds for more rapid testing AFTER the event. Is 1 minute too quick?

hugo · July 26, 2013, 1:42pm

No, should only need to be a few seconds (long enough for the imp to reboot, which should restart the agent too).

Actually, your agent is restarting (assuming “agent started” is a log from your code). Are you absolutely sure it’s not running the latest code? Maybe put a version number in the print?
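
A minimal sketch of the version-number idea, assuming a hand-maintained constant (CODE_VERSION is a hypothetical name, not part of the imp API) that gets bumped on every save:

```squirrel
// CODE_VERSION is a hypothetical, hand-maintained constant.
// Bump it every time the code is edited and saved in the IDE.
const CODE_VERSION = 1;

// Log it at startup so the log shows exactly which build is running.
server.log("Agent started, code version " + CODE_VERSION);
```

The same two lines can go at the top of the device code (with its own message text), so the log shows whether the agent and the device each picked up the latest save.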

saferaging · July 26, 2013, 1:52pm

“Agent started” is from my code. The problem is that it isn’t properly downloading new code for the device that matches the agent it thinks is loaded. There have been several posts in the forum about this, so I am not alone.

–

Coincident to this, my original, root issue is that exceptions and errors seem to be getting swallowed up and I would like a little more color about why things are breaking when they break.

hugo · July 26, 2013, 6:06pm

“Downloading new code” means new code is being sent to your imp. Are you saying that this is not happening? We’re not aware of any issues like that, though if your imp is not processing the data (ie it’s sleeping without an onidle) then the packet may never get processed by the imp and hence the code won’t update.

I’ve not seen any other posts about this? There’s a known issue where agents don’t restart until a device restarts (which means you can’t update the agent for a sleeping imp), which is addressed in a future release.

What exceptions and errors are getting swallowed? If you have code to replicate an issue that we can try, that helps a lot with getting bugs fixed.

saferaging · July 29, 2013, 12:33pm

A perfect example is the current status of “20000c2a69003664” right now. I have updated the version to 4 and saved it numerous times, both while the imp was offline and while it was connected. It refuses to update to version 4 despite the logs saying “downloading new code” and going through a new configure process.

saferaging · July 29, 2013, 12:35pm

In terms of exceptions and errors, the problem is it just dies and we get zero data back. Now if I pull suspicious code out, it seems to magically work (assuming that the service updates the device in a timely fashion – see above). I can re-add the code parts until the issue is identified by human inspection and then it all works. This makes debugging incredibly hard.

hugo · July 30, 2013, 12:56am

If you’re running into errors after disconnecting from the server, then that is hard to debug right now. We have a task in to have runtime errors that happen after the server connection has been torn down reported at the next connect, which happens right after a runtime error.

If you have examples of an imp dying without reporting an error when it is connected to the server, we’d be very interested to see it (and fix it). These do exist, but are rare (like we had an issue with deeply nested tables causing issues when being sent to the agent).

The issue with imp 3664 is, I believe, because you’re sleeping before the imp has had a chance to process the new squirrel message. The server has sent the code to the device, but if it’s told to disconnect before all incoming data has been processed then it will do that. If not then we would very much like to investigate further.

saferaging · July 30, 2013, 11:47am

This is now in the ticketing system and is still an outstanding issue. We will use the ticket thread (not this forum thread) to track debugging progress on the Imp end.

hugo · July 31, 2013, 1:01am

…for anyone else watching, if you have an issue with getting a mostly-asleep imp to take new code, delay your sleep/server disconnect by maybe 0.2 seconds (ie, after onidle, have an imp.wakeup for at least 0.2 seconds later that calls the sleep).

What happens is the imp is going to sleep before the server has had a chance to tell it it’s got new firmware waiting.
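
A minimal sketch of that delay, assuming the device sleeps with server.sleepfor() (imp.deepsleepfor() could be substituted). SLEEP_SECONDS and SETTLE_SECONDS are hypothetical names with illustrative values; 0.5 is simply a margin above the 0.2 seconds mentioned above, not a tested figure:

```squirrel
// Device code (sketch). Both constants are illustrative assumptions.
const SLEEP_SECONDS = 60;    // how long to sleep between wake-ups
const SETTLE_SECONDS = 0.5;  // time left for incoming server messages

function goToSleep() {
    // Disconnect and sleep only after the settle delay has passed.
    server.sleepfor(SLEEP_SECONDS);
}

// ... take readings and agent.send() them here ...

// Once the imp is idle, wait a little longer before sleeping rather
// than sleeping straight away, so a pending "new code" message from
// the server has a chance to be processed.
imp.onidle(function() {
    imp.wakeup(SETTLE_SECONDS, goToSleep);
});
```

The cost is a slightly longer awake time on every wake-up, which matters for battery-powered devices, so the delay is worth keeping as short as still works reliably.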