BeagleBone problems with continuous operation

So we're having some trouble with our BeagleBones -- very painful to
debug since it takes hours to days for the issue to manifest -- and I
was wondering if someone could put forth some ideas.

In short, the problem is that the boards become unresponsive after a
while. They're hooked up to another device via Ethernet, and we can
tell from that device that the BeagleBone is no longer behaving
properly (it stops communicating with the device). I can't ssh in to
the board when this happens ("no route to host"), and when I plug USB
into the BeagleBone to open a tty the board seems to reboot.

It's possibly a software/OS level error, but since I can't access the
board either by ssh or tty to check when the problem comes up I don't
know. I've put quite a few error logging messages in the software I'm
running, but none of them have been emitted (or at least none have
made it all the way to disk).
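
One way to narrow down whether log messages are being lost in a write buffer rather than never being emitted is to force each line to disk as it is written. The following is a minimal sketch only, assuming the application can call a small Python helper; the function name and the log path are hypothetical and do not come from the software discussed here.

```python
import os
import time


def log_durable(path, message):
    """Append one timestamped line and force it out to the storage device."""
    line = "%s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), message)
    with open(path, "a") as f:
        f.write(line)
        f.flush()             # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write its cached data to the device
    return line


if __name__ == "__main__":
    # Hypothetical log location; choose a path on the SD card, not on a tmpfs.
    log_durable("/var/log/myapp-debug.log", "entering main loop")
```

Syncing on every line is slow and adds wear on flash storage, so this is probably only worth doing while chasing the hang.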

I've read that some boards have an issue with the reset switch. I've
checked the voltage on that line and it seems to be okay (3.3V on pin
10 of P9), but the reset line on the board still seems to be very
sensitive. While I was checking the voltage the board kept rebooting
whenever the probe contacted the pin, or when the pin was jiggled,
etc.

Actually I never saw the reset line go low, even when I had it
monitored on the scope and the board was clearly resetting due to my
prodding of pin 10.

I get why the reset line might reset the device if jostled, but why
does the USB cable freak it out enough to reset? What could be going
on here?

What is your DC power source? Is it properly grounded? In other words, is the return side of your 5V power supply grounded? Plugging in your USB cable from a different ground source could inject noise onto the ground lead.
If plugging it in causes an issue, I suggest you keep it plugged in, reset the board, run your SW, and then try to debug your issue from there.

Gerald

Yes, that could be the issue with the USB reset (board is powered off
a Li battery while the USB cable came from my desktop). Will try to
think of a way to power both from the same source without compromising
the setup -- thanks!

Peter

A battery? Well, in that case USB will not work as it requires 5V. The system is designed for 5V.

Gerald

The battery output is switched down to 5V for the BB.

Is the negative side of the battery connected to earth ground? When you plug in the USB cable, you are connecting the system ground to the PC ground. The battery and the PC need to be connected to the same ground.

Gerald

I had a problem with the systemd journal file (in /run/log/journal/...) growing without bounds. As this is a tmpfs partition, it ends up as a memory leak with the kernel ultimately killing various processes. My system lasted about a day and a half before exhibiting similar symptoms to yours. Running the following crude script is my current Band-aid, but I am pursuing an "official" fix via the systemd mailing list. Note that setting the currently undocumented limits in /etc/systemd/systemd-journald.conf did not work as expected :(

Hope this helps,

Dave.

systemd-journal_clean (142 Bytes)
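
To check whether the journal is what is eating memory on a given board, something like the sketch below can report how much space a directory uses and which of its oldest files would have to go to get under a cap. This is not the attached script, whose contents are not reproduced here; the helper names and the 8 MB limit are assumptions made for illustration, and the sketch only prints what it would remove.

```python
import os


def dir_usage(root):
    """Return (total_bytes, files), files being [(mtime, size, path), ...] oldest first."""
    files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            files.append((st.st_mtime, st.st_size, path))
    files.sort()
    return sum(size for _mtime, size, _path in files), files


def files_to_prune(root, limit_bytes):
    """List the oldest files whose removal brings usage down to limit_bytes."""
    total, files = dir_usage(root)
    doomed = []
    for _mtime, size, path in files:
        if total <= limit_bytes:
            break
        doomed.append(path)
        total -= size
    return doomed


if __name__ == "__main__":
    JOURNAL_DIR = "/run/log/journal"  # directory named in the post above
    LIMIT = 8 * 1024 * 1024           # assumed cap, not a value from this thread
    total, _files = dir_usage(JOURNAL_DIR)
    print("journal uses %d bytes" % total)
    for path in files_to_prune(JOURNAL_DIR, LIMIT):
        print("would remove", path)   # dry run: nothing is deleted
```

Logging the reported size every few minutes should show whether usage climbs steadily in the hours before a lockup.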

The updated systemd packages in Angstrom will now log to disk and not include coredumps, which should improve the situation a bit.

regards,

Koen

The last time the system locked up, I was able to see that the “heartbeat” LED on the BeagleBone had gotten stuck (in the ON state).

Do you know at what level this blinking is implemented (processor / OS / user space process)?

Thanks

The kernel blinks the heartbeat LED. If you have a true kernel panic
where it cannot continue, the LED will stop blinking.

The blink rate should speed up with increasing system load. If you put
a load on the board by running some intensive software, the blink rate
should correspond to the load average (higher load == faster blinking).

-Andrew
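
To confirm from a running board that the LED is driven by the kernel's heartbeat trigger, and to compare its blink rate against the load, the LED's sysfs trigger file and /proc/loadavg can be read. This is a rough sketch: the LED directory name below is an assumption and differs between kernel versions, so check /sys/class/leds/ on your board. The trigger file lists the available triggers with the active one in square brackets.

```python
def active_trigger(trigger_text):
    """Return the trigger name shown in [brackets], or None if none is marked."""
    for word in trigger_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def load_average(loadavg_text):
    """Return the 1-minute load average from the contents of /proc/loadavg."""
    return float(loadavg_text.split()[0])


if __name__ == "__main__":
    # Assumed LED name; list /sys/class/leds/ to find the real one.
    LED_TRIGGER = "/sys/class/leds/beaglebone:green:usr0/trigger"
    with open(LED_TRIGGER) as f:
        print("usr0 trigger:", active_trigger(f.read()))
    with open("/proc/loadavg") as f:
        print("1-minute load:", load_average(f.read()))
```

If the reported trigger is not heartbeat, a stuck LED says less about the state of the kernel.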