Troubleshooting a hung SD card: processes trying to access the SD card just hang

Hi,

After working around the random reboot issue with kernel 3.14.19-ti-r28 by compiling the kernel without watchdog support (details here), I ran into a different issue on my BBB (rev. A5C) today. As far as I can tell, the SD card locked up during last night’s backup (with user LED1 permanently lit), and every process that tried to access the SD card locked up as well.
Trying to kill a hung process with “sudo kill pid” didn’t work in most cases and only left the process unkillable: afterwards, “sudo kill -9 pid” had no effect either. So I went straight to kill -9 for most of the remaining processes, but I still ended up with a number of them stuck for good.

Access to the SD card was completely broken, so not even a simple ls command worked anymore. It would just hang as well and I’d have to kill -9 it.

The rest of the BBB was working just fine, as long as I didn’t access the SD card. The CPU was pretty much idle, except for a postgresql process that tried to access its database, which is also on the SD card, but obviously couldn’t anymore.

I also discovered that my eMMC (from which I boot Debian 7, while the SD card is used for storage) was almost out of space. That was mostly due to me installing a number of kernel versions over the last couple of days. Note that my tmp and log folders are on a USB drive, so the system eMMC wasn’t completely full and was still working.
While I initially thought the low free space was the cause of the problem, and I certainly hope that is the answer, I fear it might happen again.
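A minimal sketch for checking how much room is left on the root filesystem and which kernel packages other than the running one are still around. It assumes the kernels were installed as Debian packages named linux-image-<release>, which the post does not actually state, and the helper name other_kernels is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read package names on stdin and drop the one that
# belongs to the kernel release given as $1 (exact, whole-line match).
other_kernels() {
    grep -v -x -F -- "linux-image-$1"
}

# Free space on the root filesystem (the eMMC in this setup).
df -h /

# Kernel image packages known to dpkg, minus the currently running kernel.
# Assumption: kernels were installed as linux-image-<release> packages.
dpkg-query -W -f='${Package}\n' 'linux-image-*' 2>/dev/null \
    | other_kernels "$(uname -r)"
```

dpkg-query -W can also list packages that were removed but not purged, so the output is worth checking before removing anything based on it.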

I do have netconsole logging running but, of course, nothing was in the log.

The SD card I use is a SanDisk Ultra 32GB Micro SDHC.

So I’m wondering if you have any suggestions on what I can do to gather more information on the issue, should it arise again. Would I be able to do some sort of process dump, e.g. of the process handling the SD card? If so, could anyone point me to information on how that is done and which process I would have to run it on? My guess is that it might be the mmcqd/0 process. That’s the one I’ve seen causing an issue on the 3.8.13 kernel, where it randomly froze up, and I recall that on those occasions user LED1 was always lit as well. My report of that issue (and the kernel panic details) can be found here.
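A minimal sketch of how that information could be gathered the next time it happens. It assumes a procps ps and a kernel built with magic SysRq support, and the function names are made up for illustration. Processes stuck on I/O normally show state D (uninterruptible sleep), which is also why kill -9 has no effect on them; the WCHAN column names the kernel function they are sleeping in.

```shell
#!/bin/sh
# Keep the header line plus every task whose STAT column starts with "D".
# Expects ps output with the columns: PID STAT WCHAN COMMAND.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List tasks in uninterruptible sleep and the kernel function they wait in.
list_blocked() {
    ps -eo pid,stat,wchan:32,comm | filter_blocked
}

# Ask the kernel to dump the stacks of all blocked tasks to the kernel log
# (SysRq "w"). Needs root. Not called here; run it by hand when needed.
dump_blocked_stacks() {
    echo w | sudo tee /proc/sysrq-trigger > /dev/null
}

list_blocked
```

After calling dump_blocked_stacks, the traces should appear in dmesg (and on netconsole, if that is still delivering messages). If mmcqd/0 itself shows up in state D, its kernel stack would be the first thing to look at.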

I appreciate any pointers and assistance you can offer.

Sebastian

Early this morning I had this in my netconsole log. The Beagle was still up and running fine. I unmounted the SD card and ran e2fsck -cvf /dev/mmcblk0p2 which didn’t turn up any issues.

Any ideas what this is about?

I just managed to cause the issue again by running:
sudo du -s *
while in the main directory of the SD card. So this seems to be connected to intensive SD card use.
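To catch the hang while reproducing it, the command can be run in the background and polled. This is only a sketch: the function names, the threshold and the mount point in the usage note are assumptions, and a process doing heavy I/O will legitimately sit in state D for short periods, so the threshold needs to be generous.

```shell
#!/bin/sh
# Extract the one-letter state (R, S, D, Z, ...) from /proc/<pid>/status text.
state_from_status() {
    awk '/^State:/ { print $2; exit }'
}

# Run a command in the background and poll it once per second. If it is seen
# in state D on LIMIT consecutive polls, report it and return 2; otherwise
# return the command's own exit status.
# Usage: watch_for_hang LIMIT COMMAND [ARGS...]
watch_for_hang() {
    limit=$1
    shift
    "$@" > /dev/null 2>&1 &
    pid=$!
    stuck=0
    while :; do
        state=$(cat "/proc/$pid/status" 2>/dev/null | state_from_status)
        case $state in
            D) stuck=$((stuck + 1)) ;;
            ''|Z) break ;;      # process has exited
            *) stuck=0 ;;
        esac
        if [ "$stuck" -ge "$limit" ]; then
            echo "pid $pid ($1) stuck in state D for $limit polls"
            ps -o pid,stat,wchan:32,comm -p "$pid"
            return 2
        fi
        sleep 1
    done
    wait "$pid"
}
```

From a root shell this could be called as: watch_for_hang 30 du -s /media/sdcard, where /media/sdcard stands in for wherever the card is mounted. Running it as root rather than through sudo matters, because otherwise the polled pid would be sudo itself rather than du.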

Again, there was nothing in the logs that I could identify. Since the shutdown didn’t work as expected, I had to cut power.

Shutdown just got fixed, pushed out as r30, currently building.

Regards,

It’s good to hear that the power down issue is fixed. However, I think I wasn’t very clear in my last message. The system couldn’t do a proper shutdown because the SD card had locked up, which hung every process attempting to access the card or its contents. That’s what prevented Debian from shutting down (completely unrelated to the Beagle’s power not turning off).

I believe the issue I’m seeing is a variant of the one I reported on github with the 3.8.13 kernel. This may be an issue with just my BBB, with the SD card I use, or perhaps an incompatibility between the two.

For now, I’ve rolled back to the 3.8.13 kernel, since at least there I didn’t have the random reboots that the 3.14 kernel is giving me.