Sigh, mmcqd hang in Ubuntu

Robert, I don’t know if you’re reading, but I love your work on getting Ubuntu running on the BeagleBone Black. I’ve been playing with the latest release and have discovered that whenever it gets significant disk load (I can trigger this reliably with an apt-get update) I get a panic like so:

[ 900.821426] INFO: task mmcqd/0:69 blocked for more than 60 seconds.
[ 900.828175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 900.837171] Kernel panic - not syncing: hung_task: blocked tasks
[ 900.843545] [] (unwind_backtrace+0x0/0xe0) from [] (panic+0x84/0x1e0)
[ 900.852182] [] (panic+0x84/0x1e0) from [] (watchdog+0x1d4/0x234)
[ 900.860359] [] (watchdog+0x1d4/0x234) from [] (kthread+0xa0/0xb0)
[ 900.868629] [] (kthread+0xa0/0xb0) from [] (ret_from_fork+0x14/0x3c)
[ 900.877152] drm_kms_helper: panic occurred, switching back to text console

And, well, it doesn’t really switch back to a text console. Perhaps it does, but my debug serial cable doesn’t show it; the port just goes dead. On reboot it repairs the root partition and crashes again; on the next reboot it comes up multi-user.

Presumably it is deadlocking somewhere in the mmcqd part of the world. I don’t have a cross-compilation environment up yet; it’s on my list.

–Chuck

I’m getting something like that on Angstrom too. Do you guys have any idea how to fix it?

This is the error:

[ 120.198907] INFO: task mmcqd/0:74 blocked for more than 60 seconds.
[ 120.205540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 120.213773] Kernel panic - not syncing: hung_task: blocked tasks
[ 120.220067] [] (unwind_backtrace+0x1/0x8c) from [] (panic+0x55/0x14c)
[ 120.228631] [] (panic+0x55/0x14c) from [] (watchdog+0x153/0x1a0)
[ 120.236740] [] (watchdog+0x153/0x1a0) from [] (kthread+0x61/0x72)
[ 120.244938] [] (kthread+0x61/0x72) from [] (ret_from_fork+0x11/0x34)

Same here…
Rev A5A with Angstrom as provided by the factory.

Using a stress program that reads and writes the NAND, I can leave the board running for weeks and it is fine. If I log in over ssh and ask for something else… bang, I see similar panic messages.

I stopped porting my application to the new BBB, even though it is cheaper and has better performance, because of this. Can anyone help, or does anyone know of a fix for this in newer kernels?

Hmm...your stack trace doesn't match where mine goes wrong when I get
these hangs:

http://bb-lcnc.blogspot.com/2013/10/hung-task-bug-in-xenomai-kernel.html

...but I suspect it is the same issue. The mmc hang is a known problem
with the BeagleBone 3.8.13 kernel, although it shows up a *LOT* more
once you apply the Xenomai real-time patches (which are not directly
related to the mmc hang but increase IRQ service times enough to
'tickle' the bug a lot more). You likely don't see hangs until you log
in via ssh because you need to have multiple threads talking to the SD
card to trigger the bug.
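
If you want to try provoking that multi-threaded access on purpose, something like the rough Python sketch below might help. It is only an illustration of "several threads writing and syncing at once", not a confirmed reproducer for this hang. The names `writer` and `run_stress` and all the default sizes are made up for the example, and it assumes the directory you pass in actually lives on the SD card or eMMC you want to exercise.

```python
import os
import sys
import tempfile
import threading


def writer(directory, index, rounds, chunk_size, results):
    """Write `rounds` chunks to one file, syncing each chunk to the device."""
    path = os.path.join(directory, f"stress_{index}.bin")
    chunk = os.urandom(chunk_size)
    written = 0
    with open(path, "wb") as handle:
        for _ in range(rounds):
            handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # force the data past the page cache
            written += chunk_size
    os.remove(path)  # clean up so repeated runs do not fill the card
    results[index] = written


def run_stress(directory, threads=4, rounds=20, chunk_size=64 * 1024):
    """Run several writer threads at once and return the total bytes written."""
    results = [0] * threads
    workers = [
        threading.Thread(
            target=writer, args=(directory, i, rounds, chunk_size, results)
        )
        for i in range(threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sum(results)


if __name__ == "__main__":
    # Assumption: pass a directory on the card under test as the first
    # argument. The fallback temp directory may be a RAM-backed tmpfs,
    # in which case the card is not exercised at all.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    total = run_stress(target)
    print(f"wrote {total} bytes to {target}")
```

Raising `threads` and `rounds` increases the load. If the board survives this but still hangs under ssh plus your real workload, that would suggest the trigger is more specific than plain concurrent writes.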

The solution identified by Rolf Roesch is to cherry-pick a fix for the
problem from the 3.12 kernel source:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7472bab236bdee1173412585591329e718f4d324

This commit seems to fix the problem on both normal and xenomai patched
kernels, and will hopefully fix your problem as well.

I've already added this commit to my xenomai real-time kernel build, but
I was leaving it to Rolf to let everyone know about the cherry-pick fix
and get credit for finding it since he did all the leg-work to figure it
out. Since you're asking about it, though, I figured I should share the
'magic'. :)

Now does anyone know who to push this to so it gets into the upstream
BeagleBone kernel patch-set for 3.8?

Pedro, did you ever manage to fix this? It’s been crippling us, too. Our application does a fair amount of I/O reading from FTDI devices, and it crashes after a random amount of time with this stack trace.