BeagleBone Black with Ubuntu flasher image hanging unpredictably?

Hi all,

I’ve deployed a number of BeagleBone Black units in West Africa as part of an emergency connectivity relief effort supporting NGOs fighting the Ebola outbreak. The beaglebones provide a simple network monitoring function.

The beaglebones were imaged in November with the Ubuntu flasher image from http://elinux.org/BeagleBoardUbuntu#Flasher (image version: BBB-eMMC-flasher-ubuntu-14.04.1-console-armhf-2014-10-29-2gb.img).

I’m having an issue with a few of the beaglebones hanging unpredictably. I know I should provide more information to help diagnose it, but I’m having a hard time finding any “smoking gun” for what’s causing the hang. The beaglebones sit in remote telco sheds monitoring network equipment, so one of my challenges is that there’s no monitor connected and no one I can ask “what’s on the screen?” Fortunately, I can power cycle them remotely (see below).

Here’s what I know:

  1. The beaglebones have barely been modified from the standard flasher image. I’ve only added a few monitoring tools from apt packages (smokeping and zabbix-proxy). I use these tools elsewhere and have never had an issue with them hanging a system.
  2. The systems run fine for weeks at a time.
  3. At some point, the affected systems “hang”: they stop responding to pings, but the router Ethernet port they’re connected to still shows a link light.
  4. Because each beaglebone is connected to a remotely manageable power strip/PDU, I can power cycle it when this happens. The unit then boots and functions normally until the problem recurs a few weeks later.

Each beaglebone is powered by a dedicated 5V/1A power supply connected to its barrel connector. Other equipment at the site does not hang or reboot, so I know the beaglebone hangs do not coincide with a power issue at the site.

Can anyone give me some tips on diagnosing this? I can see the times of the hang and the power cycle in dmesg and syslog…but there’s no hint there as to what happened. Everything was “all conditions normal” right up until the hang.
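One way to pin down the last thing a unit did before hanging is to scan the syslog for boot markers and print the line logged just before each one. Below is a minimal sketch of that idea, not a tested tool. The default marker is an assumption: it matches rsyslog’s startup line (containing `rsyslogd:` and ending in `] start`), which Ubuntu 14.04’s rsyslog normally writes at boot; a manual rsyslog restart would also match it. If your logs differ, pass another extended regex as the second argument. Messages still buffered in memory when the board hung will not be in the file, so the “last line” is the last one that reached storage.

```shell
#!/bin/sh
# last_before_boot LOGFILE [MARKER_REGEX]
# Prints the line logged immediately before each boot marker, i.e. the
# last surviving message from the previous boot.
# ASSUMPTION: the default marker matches rsyslog's startup line, e.g.
#   ... rsyslogd: [origin software="rsyslogd" ...] start
last_before_boot() {
    logfile=$1
    marker=${2:-'rsyslogd: .*] start$'}
    awk -v pat="$marker" '
        $0 ~ pat { if (prev != "") print prev }  # line preceding a boot marker
        { prev = $0 }                            # remember the current line
    ' "$logfile"
}
```

For example, running `last_before_boot /var/log/syslog` on a unit that was just power cycled should show the last message written before the hang, which can then be compared against the monitoring data for that time.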

Has anyone seen this behavior before?

Thanks so much - any help greatly appreciated!

What is the ambient air temperature the BBB is operating in?
I would measure the temperature of the Sitara chip.
Perhaps it is running on the high side.
There is a built-in die temperature sensor, although I don’t know how easy it is to read.

Either based on data, or as an experiment, put a heat sink on the Sitara and/or blow some air over the BBB.
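If the die sensor is exposed through the Linux thermal framework, logging it periodically to a file would leave the last reading before a hang on disk for inspection after the power cycle. This is a minimal sketch under that assumption: the path `/sys/class/thermal/thermal_zone0/temp` is a guess, and whether the 3.14-ti kernels expose the Sitara sensor there is unverified. Thermal-zone `temp` files report millidegrees Celsius. `ZONE`, `LOG`, `RUN` and the log location are hypothetical names for this sketch. The `sync` call pushes each reading to the eMMC so it is not lost in the page cache if the board hangs.

```shell
#!/bin/sh
# ASSUMPTION: the Sitara die sensor is exposed as a Linux thermal zone at
# this path; check that it exists on your kernel before relying on it.
ZONE=${ZONE:-/sys/class/thermal/thermal_zone0/temp}
LOG=${LOG:-/var/log/die-temp.log}   # hypothetical log location

# millideg_to_c VALUE: thermal zones report millidegrees Celsius;
# print whole degrees (shell integer division truncates).
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# log_temp_once: append one timestamped reading and flush it to storage.
log_temp_once() {
    raw=$(cat "$ZONE") || return 1
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(millideg_to_c "$raw")C" >> "$LOG"
    sync
}

# Loop only when asked to, e.g.: RUN=1 ./die-temp.sh &
if [ "${RUN:-0}" = 1 ]; then
    while true; do
        log_temp_once
        sleep 60
    done
fi
```

A one-minute interval is an arbitrary choice; after a hang, the tail of the log shows the die temperature shortly before the unit stopped responding.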

— Graham

Thanks for the response! I’d thought about temperature as a possible cause…I’ll have to dig into this.

Before this deployment I tested beaglebones in a hotbox and ran them pretty hot (around 65°C for several hours) without any issues…but let me investigate and see if I can find a correlation between high temperatures and these crashes.

uname -r ?

Regards,

3.14.22-ti-r31

Hi Graham,

Regarding temperature, I looked at graphs from the temperature sensor of a router located in the same cabinet as the beaglebone. At the time of the crash, the router’s sensor was reading 40°C (this sensor is inside the router case, so it doesn’t mean the ambient air was 40°C)…it actually looks to have been one of the cooler days.

Yuck, yeah there are some issues with that old version...

Please upgrade to 3.14.37-ti-r57

sudo apt-get update ; sudo apt-get install linux-image-3.14.37-ti-r57 ; sudo reboot

and retest one of your units in those conditions.

Regards,

Thank you for the response (and sorry for the lack of updates; I’ve been traveling).

I’ll update and let you know if there’s an improvement.

Thanks!

It’s been several weeks since I made this upgrade, and stability seems very much improved.

Thank you so much for your help - this is a big relief to have this sorted!

FYI, there is now a small army of ~15 beaglebones deployed throughout Sierra Leone and Liberia, monitoring the health of connectivity there.