Reliability issues with Beaglebone Blacks: various failures

In our lab we have 8-9 Beaglebone Black units. We use them to control high-intensity lights, with circuits drawing around 7 A at 9 V. The circuits are simple, using MOSFETs to switch the high-power side. There is water in our environment, but we protect our electronics well through both design and procedure. It’s warm (~80 °F) but not particularly humid. We haven’t actually measured the humidity, but it feels comfortable.
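To give a rough sense of scale for these power circuits, the load power and the MOSFET conduction loss can be worked out as follows. The 20 mΩ on-resistance is an assumed, illustrative value that does not come from the post. The real figure comes from the datasheet of the MOSFET in use, at the gate voltage actually applied.

```latex
\begin{aligned}
P_{\text{load}}   &= V I = 9\,\text{V} \times 7\,\text{A} = 63\,\text{W} \\
P_{\text{MOSFET}} &= I^2 R_{DS(on)} = (7\,\text{A})^2 \times 0.020\,\Omega = 0.98\,\text{W}
\quad \text{(assumed } R_{DS(on)} = 20\,\text{m}\Omega\text{)}
\end{aligned}
```

Even about a watt per switch adds heat near the electronics, and R_DS(on) rises if the gate is not driven hard enough to turn the device fully on.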

We’ve had three of our boards fail in different ways:

One is an element14 board that boots OK but reproducibly gives a mysterious error when installing Python 2.7.11 from source (a requirement for us, long ugly story). I’ve tried to reflash it from an SD card, and it seems unresponsive whether or not I hold down the S2 button. Damaged SD card socket? Linux does see the SD card’s filesystem; it just refuses to boot from it. Maybe a damaged S2 button?

Another element14 board seems to overheat. With an IR camera, we’ve seen the power regulator spike above 60 °C. It has frozen up on us a couple of times when left running: the lights stop blinking and the network goes unresponsive, but the board does not appear to power off.

Another has a Beaglebone logo with the text ‘beagleboard.org’ silkscreened on it (CircuitCo? It’s close to, but not identical to, the logo I see on their board on Adafruit). It shut down mysteriously. We have rebooted it a couple of times since and seen diminishing uptimes. Most recently it didn’t finish rebooting at all.

The rest seem to run OK. We have had other failures, but those were related to other electronics. These three seem to be hardware problems within the BBB itself. I think all of them were bought in 2016, from Adafruit, or from Amazon when Adafruit was out of stock.
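For freezes like these, and for the shrinking uptimes, a heartbeat log is one low-effort way to pin down when a board stops responding. Below is a minimal sketch that assumes Python 3 is available on the board. The function names `log_heartbeats` and `last_heartbeat` are hypothetical helpers. The code appends a UTC timestamp at a fixed interval and calls `os.fsync` so each line reaches storage before a possible freeze. After a hang, the last line tells you when the board stopped.

```python
import os
import time
from datetime import datetime, timezone
from pathlib import Path


def log_heartbeats(log_path, interval_s=60.0, count=None):
    """Append one UTC ISO-8601 timestamp per interval to log_path.

    count=None runs forever; an integer stops after that many beats.
    Returns the number of beats written.
    """
    path = Path(log_path)
    beats = 0
    while count is None or beats < count:
        stamp = datetime.now(timezone.utc).isoformat()
        with path.open("a") as f:
            f.write(stamp + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the line to storage before a possible hang
        beats += 1
        if count is None or beats < count:
            time.sleep(interval_s)
    return beats


def last_heartbeat(log_path):
    """Return the most recent logged timestamp, or None if the log is empty."""
    lines = Path(log_path).read_text().splitlines()
    return datetime.fromisoformat(lines[-1]) if lines else None
```

On a board, you might start `log_heartbeats()` from a boot-time service, pointing it at a hypothetical log file, and later compare the last timestamp against the regulator temperatures seen on the IR camera. A one-minute interval keeps writes to the eMMC modest.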

A few questions come to mind:

  1. We are looking at scaling up at some point, and a failure rate of roughly 35% over six months is not going to work for us in the long term. Is this normal for Beaglebone Blacks? We already handle these units carefully. Maybe we could do more, though it’s hard to say what. If we can’t get this under control, we might need to standardize on another board. The BBB has a lot of features we need, so we’d prefer not to leave it if possible.

  2. The industrial BBBs seem to address higher temperatures, though we don’t expose ours to temperature extremes (other than the heat they generate themselves). Might the higher-quality parts also provide better reliability in general?

  3. Also, could we expect better customer service with industrial BBBs?

  4. What could we expect in general with industrial units?
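On question 1, keep in mind how few boards are involved. Three failures out of 8 or 9 units is an observed rate of 33% to 38%, but with a sample this small the uncertainty is wide. The sketch below computes a Wilson score interval for a binomial proportion. Two assumptions apply: each board is treated as an independent pass/fail trial over the six months, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# The post reports 3 failures among 8-9 boards, so try both counts.
for n in (8, 9):
    lo, hi = wilson_interval(3, n)
    print(f"3 of {n}: observed {3 / n:.0%}, 95% interval {lo:.0%} to {hi:.0%}")
```

For 3 failures in 9 boards, this gives roughly 12% to 65% at the 95% level. The numbers so far therefore say little about the long-term rate at scale. Tracking more units over a longer period would narrow the interval.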

At this point, all I can say is that I did not design these boards for your application, and therefore I cannot guarantee that they will do what you need them to do.

From what little you have said about your application, it sounds like industrial versions would be preferable.

I would need to see a detailed schematic of your design to see if there were any issues in your application and use of the board as designed…

There can be variance across boards, depending on which corner cases you might be hitting. It could be that the boards that work fine are performing better than they should, rather than the failing boards performing worse than they should.

Gerald

I understand that. I’m just trying to work through the issues we’re seeing, and find a way forward for us. It’s a trial-and-error process. I chose the BBB initially because it’s a great design that meets a lot of our needs, known and potential. To go with another design would likely involve compromises for us.

I could provide more details, though I’d prefer not to do so in a public forum. I would also need to talk to my management about confidentiality and stuff. If you are open to a (hopefully brief) offline conversation, that would be ideal.

I appreciate any help you can give.

One is an element14 board that boots OK but reproducibly gives a mysterious error when installing Python 2.7.11 from source (a requirement for us, long ugly story).

This is more than likely a software issue. It sounds like some software requirement is driving this, but I’d very seriously consider changing that requirement. At a minimum, make a good case to management.

As to the rest of your questions, no one can really answer them because you haven’t provided enough information. That said, this is something you really need to pay someone for: someone who knows the whole situation and knows both the hardware and the software very well.

I am in a similar situation. According to the testers, the red industrial boards undergo a week of humidity testing, then work for about 10 minutes before failing.

I was wondering whether, in general, it might be more reliable to run from SD card images rather than from the eMMC. We don’t need 4 GB of eMMC, but we are stuck with its larger failure cross-section.

Right now I am pursuing the avenue of removing processes and software that are not needed.

BTW, I noticed that all those industrial Beaglebones are gone as of 2018.