I’m currently diagnosing a BeagleBone Black (A6A) failure. It had been running for a few months at about 20% CPU load on average. Yesterday I was looking into some intermittent Ethernet issues (which may be unrelated) and, after power cycling the board, I was not able to communicate with it at all: not via Ethernet over the network, not via a direct Ethernet connection, and not via the onboard USB Ethernet connection. The heartbeat light still flashes.
Nothing appears on a screen connected to the microHDMI port, although the screen does seem to detect a signal.
I attached the serial port output. It seems to point to eMMC issues. Do these error messages indicate a hardware failure or could it just be a corrupt filesystem? I have not yet tried to flash the eMMC in order to keep it in its current state.
Thanks to anyone who can provide insight.
bbb_failure_serial.txt (9.42 KB)
Also, the U-Boot messages seem mangled. I’m not sure what would cause that. If it were a problem with my serial connection, I would expect to see further mangling once the kernel starts to boot and at the initramfs prompt, but all of that seems intact.
Shut down Linux before powering down.
I tried it again without the microHDMI cable and USB keyboard connected, and got intact messages from U-Boot.
bbb_failure_serial_2.txt (3.18 KB)
So you are implying this is a corrupt filesystem issue?
I won’t be able to guarantee power failures won’t occur during runtime for this application, but if this is a corruption issue I don’t have a problem making the root partition read-only.
That would be my guess. Yes, making it read-only will help. Alternatively, attach a battery that can keep the board powered long enough for the kernel to shut it down cleanly. There is a lot of discussion on this topic in the forum.
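If you go the read-only route, it is worth confirming after boot that the root filesystem really is mounted read-only. A minimal sketch that parses the contents of /proc/mounts; the function name `root_is_read_only` is only an illustration, not part of any BeagleBone image or tool:

```python
def root_is_read_only(mounts_text):
    """Return True if the last mount entry for "/" carries the "ro" option.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields (device, mount point, fs type, options, ...).
    """
    read_only = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != "/":
            continue
        options = fields[3].split(",")
        # A later entry for "/" is mounted on top of an earlier one,
        # so the last matching line decides the result.
        read_only = "ro" in options
    return read_only
```

On the board this could be called as `root_is_read_only(open("/proc/mounts").read())`. Note that an option such as `errors=remount-ro` does not count as read-only, because options are compared whole.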
Thanks for your help.
I’m now trying to re-flash the eMMC, but every time I try, after a few minutes all LEDs blink in what looks like an error pattern (two short blinks then pause). I tried using the same uSD to flash another board, which worked fine.
Attached is the serial output from the failing board while trying to flash the eMMC.
This seems to point to a hardware failure – what is the failure rate of the eMMC components?
bbb_failure_flashing.txt (35.5 KB)
Remember, eMMC and microSD are just managed NAND underneath.
They don't last forever.
But I'd double-check your power supply. During this stage:
Copying: /dev/mmcblk0p1 -> /dev/mmcblk1p1
[ 39.436176] EXT4-fs (mmcblk1p1): mounted filesystem with ordered data mode. Opts: (null)
rsync: / -> /tmp/rootfs/
You'll reach max current draw pretty quickly,
as the CPU gets maxed out, along with reading /dev/mmcblk0p1 and writing
I’m using a 5V/2A power supply, and have tried the same process with an identical setup across multiple boards (same power supply, same uSD). I’m not seeing the supply dip below 5V at any point during flashing; it stays steady at 5.20V throughout. Maybe that’s on the high end, but the spec says 5.0V +/- 0.25V is acceptable.
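To make the margin explicit, here is the tolerance arithmetic as a small Python sketch. The nominal value and tolerance are the ones quoted above; the function names are just for illustration:

```python
NOMINAL_V = 5.0     # nominal supply voltage quoted above
TOLERANCE_V = 0.25  # allowed deviation quoted above


def within_spec(measured_v, nominal_v=NOMINAL_V, tolerance_v=TOLERANCE_V):
    """True if the measured voltage lies inside nominal +/- tolerance."""
    return abs(measured_v - nominal_v) <= tolerance_v


def headroom_v(measured_v, nominal_v=NOMINAL_V, tolerance_v=TOLERANCE_V):
    """Distance in volts to the nearest limit (negative when out of range)."""
    return tolerance_v - abs(measured_v - nominal_v)
```

For the 5.20V reading, `within_spec(5.20)` is true and `headroom_v(5.20)` comes out at roughly 0.05V, so the supply is in range but close to the upper limit.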
During the few months of running there was very little reading from or writing to the eMMC. After boot and initialization, only log files would change, and there shouldn’t have been many log writes. The NAND should be well within its wear/endurance limit.
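As a rough sanity check on that, the usual back-of-the-envelope estimate is total writable data (capacity times rated program/erase cycles) divided by the data actually written to the NAND per day. A minimal sketch, assuming ideal wear levelling; every input is a parameter you would need to fill in from the part’s datasheet and your own measurements, and the function name is hypothetical:

```python
def years_to_wear_out(capacity_gb, pe_cycles, daily_writes_gb,
                      write_amplification=1.0):
    """Rough NAND lifetime in years, assuming ideal wear levelling.

    capacity_gb          -- usable capacity of the device
    pe_cycles            -- rated program/erase cycles per block (datasheet)
    daily_writes_gb      -- data the host writes per day
    write_amplification  -- NAND writes per host write (>= 1 in practice)
    """
    if daily_writes_gb <= 0 or write_amplification <= 0:
        raise ValueError("daily writes and write amplification must be positive")
    total_writable_gb = capacity_gb * pe_cycles
    nand_writes_per_day_gb = daily_writes_gb * write_amplification
    return total_writable_gb / nand_writes_per_day_gb / 365.0
```

This only bounds wear from normal writes. It says nothing about corruption from power loss during a write, which is a separate failure mode.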
I’d like to write this failure off as an odd occurrence, but I’m trying to set up a few boards that will be difficult to reach and will need to be very reliable. Is there anything I can do to track down the cause of this problem and be reasonably sure it won’t happen on other boards?
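One thing I could do to compare boards is capture the serial console from each one during boot and flashing, then pull out only the lines that mention an mmc device together with an error-like word. A minimal sketch; the keyword patterns are my own guesses at what is worth flagging, not an exhaustive list of kernel messages:

```python
import re

# Assumed patterns: device names like "mmc1" or "mmcblk1p1", plus generic
# error-like words. Adjust both to whatever the real captures contain.
DEVICE_RE = re.compile(r"mmc(blk)?\d", re.IGNORECASE)
PROBLEM_RE = re.compile(r"error|timeout|fail", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the log lines that mention an mmc device and an error-like word."""
    return [
        line
        for line in log_text.splitlines()
        if DEVICE_RE.search(line) and PROBLEM_RE.search(line)
    ]
```

Running this over the saved captures from a good board and the failing board would at least show whether the healthy boards ever log similar messages.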