Dead Imps: Blink red during blinkup

robertdowling · February 17, 2015, 6:22pm

We’ve had two modules go into this state, conveniently when we give them to the client!

Steps

1. Imp works fine, accepts blinkups over and over, for good and even bad networks. Accepts any code and runs it. Always accepts with next blinkup. Multiple iPhones and iPads all work.
2. Give device to client, client calls, “It’s broken.”
3. Device will not blinkup or run code.

Most important symptom we think:

good device pauses blinking during entire blinkup, then blinks once green (or red) after blinkup, then does what it will.
bad device starts blinking red right in the middle of blinkup, even though iPhone is still flashing.

The code is not running on bad device because the GPIOs are not toggling as expected. The module draws about 60mA @ 6V, like the WiFi is on. Note, we switch WiFi off in our app, so we expect current to go to 10mA or so at boot.

Bad device is blinking the “Joining WiFi Network” pattern (Red long, short, short, short) for 1 minute, then shuts off.

Our second device has now switched into this mode and we can’t get either of them back to accepting blinkup.

MTRobert
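For a rough sense of scale, the current figures quoted in the post above can be turned into power numbers. This is only a minimal sketch, assuming the 6 V supply and the approximate readings mentioned (about 60 mA observed, about 10 mA expected with WiFi off); the function name is illustrative and not part of any imp API.

```python
def power_mw(supply_volts: int, current_ma: int) -> int:
    """Return electrical power in milliwatts, using P = V * I."""
    return supply_volts * current_ma


# Approximate figures from the post (assumed, not measured here).
observed_mw = power_mw(6, 60)   # draw seen on the bad device
expected_mw = power_mw(6, 10)   # draw expected once the app turns WiFi off

print(observed_mw, expected_mw, observed_mw // expected_mw)
```

A draw roughly six times the expected figure is consistent with the radio being on, which is why the poster concludes the application code is not running.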

coverdriven · February 17, 2015, 8:32pm

How is your client powering the module? I saw this blinking behaviour when I mistakenly powered an imp off my monitor’s USB port. The imp starts up but can’t draw enough power to operate wifi so just flashes red. You should confirm this first.

robertdowling · February 17, 2015, 8:38pm

We have the device back. We’ve been running off a bench supply limited to 1.5A at 6V. Tried our normal batteries too.

I notice everyone says LED flashing stops during blinkup. It does not stop on these modules now, but after 1 minute, they shut down, so it doesn’t seem to be a reboot loop. They are almost normal!

I have a 3rd module and it responds correctly, pausing during blinkup.

I wonder if the photodiode has been fried (twice??) somehow. It is just the on-board one.

hugo · February 18, 2015, 2:05am

When you say modules, are you talking about imp001 or imp002? What are they plugged into, exactly?

Also note that if the customer isn’t blinking up with your account credentials, they won’t get your code.

robertdowling · February 18, 2015, 4:49pm

Thank you for your reply.

Imp002 on MakeDECK, with our own simple circuitry (leds, op amp, strain gauge). Yes, they are not using my credentials. That is useful information.

So it is possible these devices have updated to some kind of null code, and that is why they won’t let me update them again?

If I provide the client my code, is it possible the next time the module is in range of their wifi AP, it will come back to life?

Or is there some way to factory reset them in place? It’s very hard to unsolder them.

MTRobert

hugo · February 19, 2015, 12:55am

They shouldn’t need to be “factory reset”; their behavior is defined by the account they are on - this is why if they blink up on another account, they won’t get your code.

Side note: this is what our commercial service is for. You make devices, lock them to a certain model, and they will then only ever run your code (the end user cannot reprogram them). You can then also maintain the code of all devices in the field with the ops console, which allows deployment of device & agent code to the installed base, no matter where they may be and without any manual interaction.

If they are online, and your customer gave you their login info, then you could log into their IDE, paste in the code, and hit build & run to make them operational, yes.

If you can provide a MAC address we can look and see if they appear happy or not.

robertdowling · February 19, 2015, 1:14am

We’ve been trying to reproduce the failure with a 3rd good MakeDeck, by creating a new developer account, but not adding any squirrel code, and then having that account blink up the good Imp. Yes, we can get the 3-blink pattern and confirm that my firmware is no longer in there, but it still accepts blinking up after this, so it is not “permanently” broken like our first 2 appear to be. We can recover the good module by blinking up again.

Our bad device MACs are 0c2a69076f90 and 0c2a69078006. (FWIW, the new good imp is 0c2a6907703).

One extra detail, now 6f90 never stops blinking red. 8006 still pauses for about 1/2 of the blink up, but then in the middle, resumes blinking red while the iPad is still flashing. And the good imp pauses red blinking for the full duration of blinkup.

Thanks

MTRobert

hugo · February 19, 2015, 4:55am

As these are unhoused boards, blinkup will be harder to do correctly (as you need to shield from ambient light). I know that MakeDeck did have their bias a little hot on the phototransistor on some early boards - you could be suffering from that issue.

What phone are you blinking up with, iOS or Android? (Android is more sensitive to saturation of the phototransistor.)

If the LEDs are blinking regularly, I doubt either of the imps are broken.

robertdowling · February 19, 2015, 5:42pm

I doubt they are broken too.

At first we speculated static damage or phototransistor damage, but my gut says SW (I’m a SW guy, can you tell?). The device is basically running fine (blinks its patterns, shuts down in 60 seconds); the only problem is that it refuses to run blinkup. I think we’ve hit some exceptional corner case. Or even more likely, we’re doing something really dumb.

We’ve removed one bad MakeDECK from our circuit and under no conditions have we been able to blink it up. We did not have this problem before. The fresh MakeDECK, stand-alone, blinks up readily even with sloppy procedure. I don’t think that’s the issue.

We’ve used iPad Air, and a couple of different iPhones. All have worked easily on the good module, and not at all on the 2 bad modules.

We will continue to search for what the client is doing (unfortunately, without their help) but by Monday, we will have to deliver something and a recommendation on how to not have it fail in the same way a 3rd time.

Perhaps we can make a movie of good vs bad flashing to illustrate the difference, if that would help you. Any info gained from the MAC?

I appreciate your efforts. I’ll say that I’ve thoroughly enjoyed Squirrel and the very cool API you tucked in, and I’d encourage anyone who’s listening to give Imp a go. It’s very polished, professional and still very playful. It’s a joy to use, and I don’t say that very often.

MTRobert

hvacspei · February 19, 2015, 6:57pm

I’ve used a number of the MakeDeck’s P3V3 and C3V0 over the past several months with few issues. They make a very nice product. As @Hugo mentioned, I may have had one or two that were, perhaps, a bit too “hot” and I resolved this by placing a tissue over the phototransistor while doing a Blinkup. I am assuming you’re using the on-board Blinkup components and not external ones. Please advise if I’m mistaken.

jasongon · February 19, 2015, 7:21pm

I’m working with @robertdowling on this, and yes, we are using the on-board Blinkup components. I hesitate to blame the MakeDeck board, or any hardware for that matter, because these boards once worked fine, with no issues. Only after an undetermined sequence of events did the board stop responding, and it has happened twice now.

We are reading through documentation to try to find any clues, especially regarding the BlinkUp procedure. This seems like a stretch, but I am wondering if this line from the http://electricimp.com/docs/manufacturing/blinkup_faqs/ page might provide some insight into our problem. At the bottom of the page, it says “If API key and account don’t match, end-users will find the device unresponsive.” It seems like this shouldn’t be an issue. We are using the ElectricImp iOS app for BlinkUp, so we are not trying to implement this ourselves, but perhaps having our client try to BlinkUp from a different account has caused this mismatch that made it unresponsive.

gino · February 19, 2015, 7:30pm

“unresponsive” in that case means that the device won’t run your code. (That’s talking about production devices, anyway.) BlinkUp itself should never be affected by something like that.

hvacspei · February 19, 2015, 7:35pm

Are your customers blinking up to their own EI developer account or have these devices been moved into production? If they blinkup to their own developer account, they will first appear in the Unassigned Devices area and will have no application software. They will need to be moved to a model that has the agent and device code you created (assuming you’ve sent that to them).

It sounds to me like you developed and tested the code in your developer account and when they go to blinkup, they use credentials for their developer account and the device hasn’t been associated with an appropriate model in their account. Without a model that includes the agent and device code, the device is rather dumb, so to speak.

jasongon · February 19, 2015, 8:02pm

@hvacspei I definitely agree that following that procedure will cause the Imps to forget their original code, and then seem unresponsive by not doing anything. However, the problem we have is that these dead Imps are unrecoverable. We can no longer get it to respond to BlinkUp from any account, including the original developer account where the application was first created.

We have recreated those steps that you mentioned with a new, working imp002. We associated it with our developer account, ran our code on it, then used a new developer account without any code for BlinkUp. The Imp then just sits and does nothing. We even used BlinkUp to try to connect to both existing and non-existing networks (since we worried this was causing an issue). We did get it to recreate the blink pattern that the dead imps emit, but we were then able to recover it with the original account and get it to happily connect and run our code.

Our concern is that these two dead Imps no longer respond to BlinkUp, when they originally did. We have probed the OPTO_IN pin, and can see the BlinkUp signal on an oscilloscope, but the imp002 doesn’t seem to respond to it. We aren’t sure what we did to get two different Imps into this same unresponsive state, and aren’t convinced that we know enough to avoid doing it again.

hvacspei · February 19, 2015, 8:24pm

If you add them back to the original account after putting them into the second account, they will appear in the Unassigned Devices and have no code. Could this be fooling you into thinking they’re unresponsive? Each time they are introduced to an account, they will appear in the Unassigned Devices group with no code. (I’m pretty sure I’m right about this, but the EI folks will correct me if I’m wrong).

gino · February 19, 2015, 9:33pm

@hvacspei if you didn’t delete them from the IDE, they’ll just start running the original code when you blink them back up to your account. Devices will only show up as unassigned if they are brand new to your account or you’ve deleted them.

The problem is that the device doesn’t stop blinking its LED during BlinkUp (or that it stops blinking for a bit and then starts again before BlinkUp is complete?). Both behaviors indicate that the device isn’t receiving BlinkUp properly.

@jasongon Have you tried doing a “clear wireless configuration” BlinkUp with that device? It is much shorter, so more likely to succeed under adverse conditions, and if it works the device will start blinking amber, which is very obvious.

jasongon · February 19, 2015, 11:17pm

@gino Thanks for the clarification and the suggestion. We tried the “clear wireless configuration” BlinkUp, and although the LED does stay off for the duration of that process, it goes right back to the red blinking pattern afterward.

We took a video of a dead Imp next to a healthy Imp during a normal BlinkUp attempt so that you can see the difference between their behaviors:

To note: This dead imp does not turn off its LED at all during BlinkUp, but we have another Imp that does, and it also fails to accept or respond to BlinkUp attempts. Also note that the healthy Imp (the one in the foreground) is shown blinking the amber pattern after we successfully cleared its configuration. But the dead Imp, which we tried clearing at the same time, did not take to the “clear wireless configuration” BlinkUp.

gino · February 19, 2015, 11:34pm

@jasongon thanks for the video. When you probed the OPTO_IN pin, did you do it at the module itself? There is a solder jumper on the P3V3 so I just want to make sure it’s getting all the way to the imp. If you could send a scope screenshot of a blinkup that would be informative.

hugo · February 19, 2015, 11:56pm

(Also, checking VDDA on the “dead” imp would be good. If this is not powered, then the symptoms will look just like this.)

robertdowling · February 20, 2015, 6:30pm

Success! Thank you for your help.

A combination of two failures caused our two devices to be very difficult to recover. The hint to look at both OPTO_IN and VDDA got us to the solution.

1. Giving our client an independent blinkup account without populating it with our app caused our app to be erased. By itself, not enough to get us into the jammed state we had yesterday, but this exacerbated a noise problem on our opto circuit.
2. Our enclosure / perf board assembly puts a lot of spikes on OPTO_IN when WiFi is on. Our app turns WiFi off, so it is quiet enough to allow blinkup. But when our app was removed in 1) and the client’s programmed WiFi is out of range, the constant attempt to download an App creates too much noise during blinkup, and the process always fails.

We overcame noise in our prototype by carefully watching OPTO_IN while touching the antenna until the noise went away. At that point, we could blink up the first module.

In the other module, VDDA was indeed disconnected. Connecting that and blinking up outside our enclosure overcame noise and recovered it as well.
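The checks that came up over the course of this thread can be collected into a small triage helper. This is only a sketch of the reasoning discussed above, not an official Electric Imp procedure; the function name, parameters, and check wording are all illustrative assumptions.

```python
from typing import List


def next_checks(led_pauses_for_whole_blinkup: bool,
                runs_expected_code: bool) -> List[str]:
    """Suggest what to look at next, based on the symptoms seen in this thread."""
    if not led_pauses_for_whole_blinkup:
        # The LED keeps blinking (or resumes midway): per the thread, the
        # device is not receiving BlinkUp properly, so look at the optical path.
        return [
            "shield the phototransistor from ambient light (a tissue helped one poster)",
            "probe OPTO_IN at the module itself and look for noise spikes",
            "check that VDDA is powered",
            "retry BlinkUp outside the enclosure",
            "try the shorter 'clear wireless configuration' BlinkUp",
        ]
    if not runs_expected_code:
        # BlinkUp is received but the app is missing: likely an account issue.
        return ["confirm the BlinkUp account is the one that holds the device's code"]
    return []


# The two "dead" devices: the LED never paused for the full BlinkUp.
for check in next_checks(led_pauses_for_whole_blinkup=False, runs_expected_code=False):
    print("-", check)
```

The ordering reflects how the thread unfolded: account mix-ups explain missing code, but a device that keeps blinking during BlinkUp points at the optical input or its supply rather than at the account.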