After working around the random reboot issue with kernel 3.14.19-ti-r28 by compiling the kernel without watchdog support (details here), I ran into a different issue on my BBB (rev. A5C) today. From what I can tell, the SD card somehow locked up during last night’s backup (with user LED1 permanently lit), and every process that had tried to access the SD card locked up as well.
Trying to kill any of the hung processes with “sudo kill pid” didn’t work in most cases. Worse, it left the process unkillable: once I had sent that signal, “sudo kill -9 pid” had no effect either. So I killed most of the processes with kill -9 straight away, but I still ended up with a bunch that were completely hosed.
SD access was completely busted, so not even a simple ls worked anymore. It would just hang as well, and I’d have to kill -9 it.
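Processes that hang like this and ignore even kill -9 are most likely in uninterruptible sleep (state D in ps). In that state a task does not act on signals, SIGKILL included, until the I/O it is waiting for completes. Below is a minimal sketch for listing such tasks and the kernel function each one is sleeping in. It assumes Linux and Python, and it is kept compatible with the older 2.7/3.2 interpreters shipped with Debian 7. It reads only /proc, so as long as it is stored on the eMMC it should not touch the SD card itself. The function names are hypothetical helpers, not part of any existing tool.

```python
#!/usr/bin/env python3
# Sketch: list processes in uninterruptible sleep (state "D").
# Assumes a Linux /proc filesystem; reads only /proc, never the SD card.
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' instead of on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])  # int() ignores the trailing space
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # one-letter state follows ") "
    return pid, comm, state


def read_wchan(proc_dir):
    """Return the kernel function the task sleeps in, or '?' if unreadable."""
    try:
        with open(os.path.join(proc_dir, "wchan")) as f:
            return f.read().strip() or "?"
    except EnvironmentError:  # covers IOError and OSError on old Pythons
        return "?"


def blocked_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm, wchan) for every task in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        proc_dir = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(proc_dir, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (EnvironmentError, ValueError, IndexError):
            continue  # process exited mid-scan or stat was unreadable
        if state == "D":
            blocked.append((pid, comm, read_wchan(proc_dir)))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print("%6d  %-16s  %s" % (pid, comm, wchan))
```

Roughly the same information is available from ps -eo pid,stat,wchan,comm by looking for a D in the STAT column. The script form is only handy for logging the result on a schedule.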
The rest of the BBB was working just fine as long as I didn’t access the SD card. The CPU was pretty much idle, except for a postgresql process that kept trying to access its database (which is also on the SD card) and obviously couldn’t.
I also discovered that my eMMC (from which I boot Debian 7; the SD card is used for storage) was almost out of space, mostly because I had installed a number of kernel versions over the last couple of days. Note that my tmp and log folders are on a USB drive, so the eMMC never filled up completely and the system kept working.
I initially thought the nearly full eMMC was the cause of the problem, and I certainly wish that were the answer, but I fear the lockup might happen again.
I do have netconsole logging running, but of course nothing showed up in the log.
The SD card I use is a SanDisk Ultra 32GB microSDHC.
So do you have any suggestions on how I can gather more information should the issue arise again? Would it be possible to take some sort of process dump, e.g. of the process handling the SD card? If so, could anyone point me to information on how that is done and which process I would have to run it on? My guess is that it’s the mmcqd/0 process. I’ve seen that one cause trouble on the 3.8.13 kernel, where it randomly froze up, and I recall that on those occasions user LED1 was also always lit. My report of that issue (with the kernel panic details) can be found here.
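On the “process dump” question: mmcqd/0 is a kernel thread, so there is no user-space core to dump. Its kernel-side call stack can be read from /proc/<pid>/stack, though, and writing w to /proc/sysrq-trigger makes the kernel log the stacks of every blocked (D state) task. Below is a minimal sketch that combines both. It assumes root privileges and a kernel built with CONFIG_STACKTRACE and CONFIG_MAGIC_SYSRQ. The default thread name mmcqd/0 is just the guess above, used as a hypothetical default, and the helper names are illustrative only.

```python
#!/usr/bin/env python3
# Sketch: print the kernel stack of a named kernel thread (hypothetical
# default "mmcqd/0") and ask the kernel to log all blocked tasks.
# Assumes Linux, root, CONFIG_STACKTRACE (for /proc/<pid>/stack) and
# CONFIG_MAGIC_SYSRQ (for /proc/sysrq-trigger).
import os
import sys


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name (ps -o comm) equals name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except EnvironmentError:
            continue  # process vanished while we were scanning
    return sorted(pids)


def kernel_stack(pid, proc_root="/proc"):
    """Return the kernel-side call stack of pid as text."""
    with open(os.path.join(proc_root, str(pid), "stack")) as f:
        return f.read()


def log_blocked_tasks():
    """Same as 'echo w > /proc/sysrq-trigger': the kernel writes the
    stacks of all uninterruptible (D state) tasks to its log."""
    with open("/proc/sysrq-trigger", "w") as f:
        f.write("w")


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "mmcqd/0"
    for pid in find_pids(name):
        print("--- %s (pid %d) ---" % (name, pid))
        try:
            print(kernel_stack(pid))
        except EnvironmentError as err:
            print("cannot read kernel stack: %s" % err)
    try:
        log_blocked_tasks()
        print("Blocked-task dump requested; check the output of dmesg.")
    except EnvironmentError as err:
        print("cannot write to /proc/sysrq-trigger: %s" % err)
```

dmesg reads the in-memory kernel log, so checking the result should work even while the SD card is unresponsive. Separately, an empty netconsole log may mean the kernel was built without CONFIG_DETECT_HUNG_TASK. When enabled, that option warns about tasks that stay blocked longer than /proc/sys/kernel/hung_task_timeout_secs (120 seconds by default).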
I appreciate any pointers and assistance you can offer.