BBW weird lock-up and difficult restart.

I have a BBW (Rev A6) that has been running an IoT application 24/7 for 5 or 6 months now. It’s worked great until now.

I had a weird hard lockup where it stopped with all the LEDs off and didn’t respond to the reset button. I immediately suspected failure of the 5V supply, as the 12V parts of the system were still alive.

I unplugged the power supply and it wasn’t dead. When I plugged it back in after a quick test of the power supply, one green LED and both Ethernet LEDs lit, but it still didn’t boot or respond to the reset button (which I had deliberately left accessible with a pencil point). I unplugged the power supply again and left it unplugged while I checked a few other things. About 15 minutes later, still scratching my head and fretting about the hassle of swapping in the BBG I have on hand as a spare, I plugged the power supply back in to make a few more measurements (access to the board is difficult, but key interface circuit points are probe-able). The thing booted right up and is working nominally again.

Any ideas what could have caused this need for a long interval without power before it would boot?

One of the other devices in the IoT system sent me a “no heartbeat” email message about it, as designed, but since a simple power cycle didn’t initially cure it, this could be a real problem eventually (I haven’t yet implemented the planned power cycling hardware). Needing a 15-20 minute power down to reset could be hard to deal with.
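For what it’s worth, here is a rough Python sketch of how the planned power cycling could escalate its off time when a quick cycle doesn’t bring the board back. Everything in it is an assumption for illustration: the timeout, the list of off times, and the `set_power` callback (a stand-in for whatever relay or switched outlet ends up controlling the 5V supply) are hypothetical and not part of the existing system.

```python
import time

# Assumed values, chosen only for illustration.
HEARTBEAT_TIMEOUT_S = 120        # treat the board as dead after 2 minutes of silence
OFF_TIMES_S = [10, 60, 20 * 60]  # escalate: 10 s, 1 min, then 20 min unpowered


def heartbeat_missed(last_seen, now, timeout=HEARTBEAT_TIMEOUT_S):
    """Return True when no heartbeat has arrived within `timeout` seconds."""
    return (now - last_seen) > timeout


def off_time_for_attempt(attempt, off_times=OFF_TIMES_S):
    """Return how long to hold power off on the given 0-based retry attempt.

    Attempts past the end of the list keep using the longest interval.
    """
    return off_times[min(attempt, len(off_times) - 1)]


def power_cycle(set_power, attempt, sleep=time.sleep):
    """Cut power, wait the escalated interval, then restore power.

    `set_power` is a hypothetical callback that drives the relay:
    set_power(False) removes 5V and set_power(True) restores it.
    """
    set_power(False)
    sleep(off_time_for_attempt(attempt))
    set_power(True)
```

The monitoring device would call `power_cycle` with attempt 0 after a missed heartbeat and increase the attempt number each time the heartbeat fails to return, so a lock-up like this one would eventually get the long unpowered interval it seemed to need.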

The BBW is on a UPS, the same one as the router, which never glitched, so it’s hard to see it as an AC power issue, although a thunderstorm is moving through the area (a small one by our standards, and I never noticed the lights dim or flicker). Also, the Raspberry Pi 2 which sent the “no heartbeat” message never glitched, despite not being on a UPS.

I’m mystified by this “long time constant” for the power cycling to have any effect. Nothing on the BBW gets more than barely detectably warm to a finger touch.

Any ideas?

This happens to a Rev C BBB we have here when a serial debug cable is attached and cat /dev/ttyUSB0 is running on the remote end. It does not seem to happen when using screen or minicom, but cat was used because the output from screen or minicom became garbled over time.

However, unplugging the serial debug cable (PL2303hx) on the USB side cures the glitch.
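If the goal of cat is just to capture the console so the last messages before a lock-up are preserved, a small logger along these lines might do the same job with timestamps. This is only a sketch: the function name `log_console` is made up for illustration, and it assumes the port has already been set to the console’s baud rate (for example with stty), just as plain cat would.

```python
import time


def log_console(stream, out, clock=time.time):
    """Copy lines from a binary `stream` to text `out`, one timestamp per line.

    Bytes that are not valid UTF-8 are replaced instead of raising, so
    line noise on the serial link does not stop the capture.
    Returns the number of lines written.
    """
    count = 0
    for raw in iter(stream.readline, b""):
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        out.write(f"{clock():.3f} {text}\n")
        out.flush()  # keep the log current in case the host is interrupted
        count += 1
    return count
```

On the remote end it could be run with the port opened in binary mode, e.g. `open("/dev/ttyUSB0", "rb", buffering=0)`, and a log file opened for append (the file name is up to you).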

Other than the GPIO interface pins on the P8 connector (something like 34 inputs and 4 outputs), the only other connections are the 5V power supply “barrel connector” and the Ethernet cable. It wasn’t just a communication failure: all the on-board LEDs were out, which is why I’d initially thought the 5V power supply had failed. After the power supply was measured to be putting out 5V, I expected it to boot when plugged in again, but it didn’t, although some of the LEDs did light up. Later, after sitting unplugged for 15-20 minutes, it booted normally.

I might think some kind of thermal cutout happened and needed time to cool down and “reset”, but very early on in the development I determined that nothing on the board heats up significantly in long-term operation.

Most likely, the only way you’re going to solve this is by debugging via JTAG.