Possible eMMC firmware bug or hw issue - recent Seeed Studio BBBs with 6.1.x kernels

And it seems to be breaking everyone (am57xx too)… see the upstream thread: sdhci-omap: additional PM issue since 5.16

Sounds like it might finally be getting some much needed attention upstream!

One week in, really 3 weeks to go…

*************************************************
eMMC Firmware Version: 
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
*************************************************
cat /tmp/eMMC.txt
eMMC name: MK2704
eMMC date: 04/2023
eMMC rev: 0x7
eMMC hwrev: 0x0
eMMC fwrev: 0x0100000000000000
eMMC oemid: 0x0100
eMMC manfid: 0x000070
eMMC life_time: 0x01 0x01
eMMC serial: 0x5992401d
*************************************************
0x01
0x01 0x01
*************************************************
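
For anyone reading the dumps in this thread, the three health bytes can be decoded mechanically. The sketch below is a minimal illustration, assuming the value meanings from the JEDEC eMMC 5.x EXT_CSD definitions as I understand them. The function names (`decode_life_time`, `decode_pre_eol`, `decode_dump`) are made up for this example and are not part of any tool used above.

```python
import re

# Assumed meanings of EXT_CSD_PRE_EOL_INFO values (JEDEC eMMC 5.x, as I understand it).
PRE_EOL_INFO = {
    0x00: "not defined",
    0x01: "normal",
    0x02: "warning: about 80% of reserved blocks consumed",
    0x03: "urgent",
}


def decode_life_time(value: int) -> str:
    """Translate a DEVICE_LIFE_TIME_EST_TYP_A/B byte into a readable range."""
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        # Each step covers a 10% band: 0x01 is 0%-10%, 0x0A is 90%-100%.
        return f"{(value - 1) * 10}%-{value * 10}% of device life used"
    if value == 0x0B:
        return "exceeded maximum estimated device life"
    return "reserved"


def decode_pre_eol(value: int) -> str:
    """Translate a PRE_EOL_INFO byte into a readable state."""
    return PRE_EOL_INFO.get(value, "reserved")


# Matches lines such as:
#   eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
FIELD = re.compile(r"\[(EXT_CSD_[A-Z_]+)\]:\s*(0x[0-9a-fA-F]+)")


def decode_dump(text: str) -> dict:
    """Decode every EXT_CSD health field found in a pasted dump."""
    result = {}
    for name, raw in FIELD.findall(text):
        value = int(raw, 16)
        if name == "EXT_CSD_PRE_EOL_INFO":
            result[name] = decode_pre_eol(value)
        elif name.startswith("EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_"):
            result[name] = decode_life_time(value)
    return result


SAMPLE = """\
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""

for field, meaning in decode_dump(SAMPLE).items():
    print(f"{field}: {meaning}")
```

With the values reported above (0x01 in all three fields), the part reports the lowest wear band and a normal pre-EOL state, so these counters alone would not flag a board as being at risk.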
Fingers crossed! :wink:

2 weeks in…

dmesg | grep boot0
[    5.362457] mmcblk1boot0: mmc1:0001 MK2704 2.00 MiB 
*************************************************
eMMC Firmware Version: 
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
*************************************************
cat /tmp/eMMC.txt
eMMC name: MK2704
eMMC date: 04/2023
eMMC rev: 0x7
eMMC hwrev: 0x0
eMMC fwrev: 0x0100000000000000
eMMC oemid: 0x0100
eMMC manfid: 0x000070
eMMC life_time: 0x01 0x01
eMMC serial: 0x5992401d
*************************************************
0x01
0x01 0x01
*************************************************

3 weeks in…

dmesg | grep boot0
[    5.362457] mmcblk1boot0: mmc1:0001 MK2704 2.00 MiB 
*************************************************
eMMC Firmware Version: 
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
*************************************************
cat /tmp/eMMC.txt
eMMC name: MK2704
eMMC date: 04/2023
eMMC rev: 0x7
eMMC hwrev: 0x0
eMMC fwrev: 0x0100000000000000
eMMC oemid: 0x0100
eMMC manfid: 0x000070
eMMC life_time: 0x01 0x01
eMMC serial: 0x5992401d
*************************************************
0x01
0x01 0x01
*************************************************

If it fails, it should fail in the next few days… but I’ll be on vacation, so the next update will be next Wednesday…

Regards,

I think we can say this mess is fixed… after a year and a half???

[44-am335x-bbb: 6.1.83-ti-r37 (up 4 weeks, 18 hours, 20 minutes)]
*************************************************
cat /sys/kernel/debug/mmc1/ios
clock:		52000000 Hz
vdd:		21 (3.3 ~ 3.4 V)
bus mode:	2 (push-pull)
chip select:	0 (don't care)
power mode:	2 (on)
bus width:	3 (8 bits)
timing spec:	1 (mmc high-speed)
signal voltage:	0 (3.30 V)
driver type:	0 (driver type B)
*************************************************
dmesg | grep boot0
[    5.362457] mmcblk1boot0: mmc1:0001 MK2704 2.00 MiB 
*************************************************
eMMC Firmware Version: 
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
*************************************************
cat /tmp/eMMC.txt
eMMC name: MK2704
eMMC date: 04/2023
eMMC rev: 0x7
eMMC hwrev: 0x0
eMMC fwrev: 0x0100000000000000
eMMC oemid: 0x0100
eMMC manfid: 0x000070
eMMC life_time: 0x01 0x01
eMMC serial: 0x5992401d
*************************************************
0x01
0x01 0x01
*************************************************
cat /boot/uEnv.txt
uname_r=6.1.83-ti-r37

Sent out my angry message to: sdhci-omap: additional PM issue since 5.16

Regards,

I think you’re showing remarkable self-restraint… :joy:

Thanks for putting in the work to get to the bottom of this Robert. Are there any fixed kernels which have Debian binary packages available yet?

I presume that although these particular Kingston eMMCs have been failing because of these repeated resets, the regression also has the potential to negatively affect other eMMCs and SD cards too?

Thanks,

Tim.

Hi @TimSmall, at this point every kernel ‘branch’ and tag from v6.1.x → 6.14.x from me has that commit reverted (including arm64 builds).

I think we’ve been incredibly lucky with older eMMCs that just seem to handle being reset XYZ times a second!

I’ll keep the test board running in my CI farm, now at almost 5 weeks, to keep an eye on it.

[44-am335x-bbb: 6.1.83-ti-r37 (up 4 weeks, 5 days, 22 hours, 35 minutes)]

cat /sys/kernel/debug/mmc1/ios
clock:		52000000 Hz
vdd:		21 (3.3 ~ 3.4 V)
bus mode:	2 (push-pull)
chip select:	0 (don't care)
power mode:	2 (on)
bus width:	3 (8 bits)
timing spec:	1 (mmc high-speed)
signal voltage:	0 (3.30 V)
driver type:	0 (driver type B)
*************************************************
dmesg | grep boot0
[    5.362457] mmcblk1boot0: mmc1:0001 MK2704 2.00 MiB 
*************************************************
eMMC Firmware Version: 
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
*************************************************
cat /tmp/eMMC.txt
eMMC name: MK2704
eMMC date: 04/2023
eMMC rev: 0x7
eMMC hwrev: 0x0
eMMC fwrev: 0x0100000000000000
eMMC oemid: 0x0100
eMMC manfid: 0x000070
eMMC life_time: 0x01 0x01
eMMC serial: 0x5992401d
*************************************************
0x01
0x01 0x01
*************************************************
cat /boot/uEnv.txt
uname_r=6.1.83-ti-r37

Regards,

Robert

Is there a plan (on your part or anyone else’s) to release updated images for the BBB including this fix? I saw on one of the other channels your comment to the effect that ‘it is pretty late for the armhf’. I fully understand that the age and limited capabilities of this CPU mean that at some time people will stop expending effort on it, but is that time now?

Looking on Latest Software Images - BeagleBoard, the latest images offered when the filter is set to ‘Beaglebone Black’ are from September 2023. I have been able to boot the SD card variants, but I have not been able to flash any of these to the eMMC on my Seeed Studio BBB, purchased about 18 months ago.

From the u-boot log the last time I tried to do this, about 6 months ago, it looked like the eMMC and/or SD card were shuffling their /dev filesystem node names at some point between the OS being booted from the flasher image on SD and the flashing script being auto-run. I am not sure whether this was a symptom of the issues with Seeed devices covered by this ticket; there is some logging from this in this comment.

Anyway, is there any likelihood of new BBB images in the foreseeable future, or should I work my way back through older images until I find one that works for me (probably around Debian 9)?

Nothing I am working on requires bleeding edge OS, but obviously from a security PoV supported is better than out-of-support.

I should just say, whatever your response, I’m very grateful for the extensive work you’ve put into keeping this community of devices viable, and I respect your right to focus on more current devices if that is what you choose to do.

Weekly builds can be found here: Index of /rootfs (v5.10/v6.1)

Now that the eMMC bug is fixed, I’m planning to move armhf to 6.12 LTS, and I need to finish wiring in bb-imager (first user setup) support.

Thanks Robert for the rapid response and the positive news it contains.

Still lots of things to work out, but here’s a quick v6.12.x base: Index of /rootfs/debian-armhf-12-bookworm-base-v6.12. After you flash it, mount the ‘boot’ partition and change the sysconf.txt defaults…

I’ll be at EW next week, so I’m not planning any changes for a week or two…

(microSD only, no thoughts on eMMC flasher compatibility)

Regards,

Kernel developers want to try a lighter revert, so I’m going to update this node to that change and start watching it for 4 weeks…

[44-am335x-bbb: 6.1.83-ti-r37 (up 6 weeks, 4 days, 23 hours, 36 minutes)]

dmesg | grep boot0
[    5.362457] mmcblk1boot0: mmc1:0001 MK2704 2.00 MiB 
*************************************************
eMMC Firmware Version: 
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
*************************************************
cat /tmp/eMMC.txt
eMMC name: MK2704
eMMC date: 04/2023
eMMC rev: 0x7
eMMC hwrev: 0x0
eMMC fwrev: 0x0100000000000000
eMMC oemid: 0x0100
eMMC manfid: 0x000070
eMMC life_time: 0x01 0x01
eMMC serial: 0x5992401d
*************************************************
0x01
0x01 0x01
*************************************************

https://lore.kernel.org/lkml/20250312121712.1168007-1-ulf.hansson@linaro.org/

I take it there’s nothing wrong with running the “heavier” mods you already made in the meantime?

Correct… the full revert I did hurts devices with SDIO/WiFi when trying to suspend/idle…

Regards,

Thanks for this. I’m a little late to the party, but I think what’s been explained here may explain what we’re seeing on a bunch of BeagleBone Blacks we have deployed.

However, if I understand correctly, this only affects beagles running v6 kernels? All of our beagles are running v5.10.x kernels. Does that mean that the issues we’re seeing with the failing Kingston eMMCs are actually caused by something else?

Regardless, will using the later v6.12.x kernels correct the problem?

So we are still working with upstream, testing a less invasive revert, but on any kernel from v6.1.x (where the commit was first applied) to mainline you need to revert 3edf588e7fe00e90d1dc7fb9e599861b2c2cf442, “mmc: sdhci-omap: Allow SDIO card power off and enable aggressive PM” (kernel/git/torvalds/linux.git).

I’m 3 days into a 4 week verification test with the new revert modification…

In its simplest form, it resets the eMMC many times a second, until after about 3 weeks the eMMC just gives up and dies… The wear level is low, but the part gives up and, no matter what, never responds again.

All the beagleboard.org branches have 3edf588e7fe00e90d1dc7fb9e599861b2c2cf442 reverted but if you use some other tree, revert it…
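
As a rough first filter for a fleet, the kernel release string from `uname -r` (or `uname_r` in /boot/uEnv.txt) can be compared against the first release line assumed to carry the commit. This is only a sketch: the default threshold of 5.16 is taken from the upstream thread title quoted earlier, while the post above names v6.1.x as the first affected line in these trees, so the threshold is left as a parameter. The names `kernel_version` and `may_carry_commit` are hypothetical. A `True` result does not mean the board is affected, because a tree may already have the commit reverted (as the beagleboard.org branches do).

```python
import re

# Assumption: first mainline release line containing the commit,
# taken from the upstream thread title ("since 5.16").
FIRST_AFFECTED = (5, 16)


def kernel_version(uname_r: str) -> tuple:
    """Extract (major, minor) from a release string such as '6.1.83-ti-r37'."""
    match = re.match(r"(\d+)\.(\d+)", uname_r)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {uname_r!r}")
    return int(match.group(1)), int(match.group(2))


def may_carry_commit(uname_r: str, first_affected: tuple = FIRST_AFFECTED) -> bool:
    """True if the release line is new enough that the commit could be present.

    This says nothing about whether the tree has reverted it afterwards.
    """
    return kernel_version(uname_r) >= first_affected


for release in ("5.10.168-ti-r76", "6.1.83-ti-r37"):
    print(release, may_carry_commit(release))
```

For a tree you build yourself, checking the git history for the commit and for a matching revert is more reliable than any version comparison.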

Regards,

We use pre-built kernels built by rcn and downloaded from the usual place. However, so I could look at the code to be doubly-dog sure, I used the tools at BeagleBoard Kernel docs and built a kernel (5.10.168-ti-r76) that we have deployed on sites in the field. Sure enough, the PM-related patch is not part of that kernel.

However, we have 100+ BBBs that use the Kingston EMMC04G-MK27, running 5.10.x kernels, with eMMC-related failures. We purchased them from Mouser or Digikey, but the silkscreen says Seeed Studio, and they have been deployed in multiple locations for less than a year. When they were deployed, we did our typical on-site testing and they worked as expected. However, there were delays in the larger project where these were to be used, and they continued to run but were not being monitored. About 4 to 6 months went by before the project was back on track. When we went to integrate them, we had all these errors; hence, I don’t really know when they started failing.

Finding this thread led us to the Kingston chip, and it correlates with what we have observed: failures are only occurring on beagles with the MK27 eMMC.

My conclusion is that the 5.10.x kernels don’t have the PM-related commit, but they are experiencing h/w failures similar to (if not exactly the same as) what was reported with the v6.1.x kernel.

Where does that leave us?

Here’s a complete log of one of the units that won’t boot, and it looks similar to what others have reported:

U-Boot 2022.04-ge0d31da5 (Aug 04 2023 - 18:48:26 +0000)

CPU  : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM:  512 MiB
Reset Source: Power-on reset has occurred.
RTC 32KCLK Source: External.
Core:  150 devices, 14 uclasses, devicetree: separate
WDT:   Started wdt@44e35000 with servicing (60s timeout)
MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from EXT4...
** Unable to use mmc 0:1 for loading the env **
Board: BeagleBone Black
<ethaddr> not set. Validating first E-fuse MAC
BeagleBone Black:
BeagleBone Cape EEPROM: no EEPROM at address: 0x54
BeagleBone Cape EEPROM: no EEPROM at address: 0x55
BeagleBone Cape EEPROM: no EEPROM at address: 0x56
BeagleBone Cape EEPROM: no EEPROM at address: 0x57
Net:   eth2: ethernet@4a100000, eth3: usb_ether
Press SPACE to abort autoboot in 0 seconds
board_name=[A335BNLT] ...
board_rev=[00C0] ...
switch to partitions #0, OK
mmc0 is current device
SD/MMC found on device 0
Couldn't find partition 0:2 0x82000000
Can't set block device
Couldn't find partition 0:2 0x82000000
Can't set block device
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
Scanning disk mmc@48060000.blk...
Scanning disk mmc@481d8000.blk...
Found 4 disks
No EFI system partition
BootOrder not defined
EFI boot manager: Cannot load any image
gpio: pin 56 (gpio 56) value is 0
gpio: pin 55 (gpio 55) value is 0
gpio: pin 54 (gpio 54) value is 0
gpio: pin 53 (gpio 53) value is 1
switch to partitions #0, OK
mmc0 is current device
gpio: pin 54 (gpio 54) value is 1
Checking for: /uEnv.txt ...
Checking for: /boot/uEnv.txt ...
** Invalid partition 2 **
Couldn't find partition mmc 0:2
** Invalid partition 3 **
Couldn't find partition mmc 0:3
** Invalid partition 4 **
Couldn't find partition mmc 0:4
** Invalid partition 5 **
Couldn't find partition mmc 0:5
** Invalid partition 6 **
Couldn't find partition mmc 0:6
** Invalid partition 7 **
Couldn't find partition mmc 0:7
switch to partitions #0, OK
mmc1(part 0) is current device
Scanning mmc 1:1...
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
BootOrder not defined
EFI boot manager: Cannot load any image
gpio: pin 56 (gpio 56) value is 0
gpio: pin 55 (gpio 55) value is 0
gpio: pin 54 (gpio 54) value is 0
gpio: pin 53 (gpio 53) value is 1
switch to partitions #0, OK
mmc1(part 0) is current device
gpio: pin 54 (gpio 54) value is 1
Checking for: /uEnv.txt ...
Checking for: /boot/uEnv.txt ...
gpio: pin 55 (gpio 55) value is 1
1741 bytes read in 3 ms (566.4 KiB/s)
Loaded environment from /boot/uEnv.txt
Checking if uname_r is set in /boot/uEnv.txt...
gpio: pin 56 (gpio 56) value is 1
Running uname_boot ...
loading /boot/vmlinuz-5.10.168-ti-r76 ...
11129344 bytes read in 706 ms (15 MiB/s)
debug: [enable_uboot_overlays=1] ...
debug: [enable_uboot_cape_universal=1] ...
debug: [uboot_base_dtb_univ=am335x-boneblack-uboot-univ.dtb] ...
uboot_overlays: [uboot_base_dtb=am335x-boneblack-uboot-univ.dtb] ...
uboot_overlays: Switching too: dtb=am335x-boneblack-uboot-univ.dtb ...
loading /boot/dtbs/5.10.168-ti-r76/am335x-boneblack-uboot-univ.dtb ...
210706 bytes read in 18 ms (11.2 MiB/s)
Found 0 extension board(s).
uboot_overlays: [fdt_buffer=0x60000] ...
uboot_overlays: uboot loading of [BB-ADC-00A0.dtbo] disabled by /boot/uEnv.txt [disable_uboot_overlay_adc=1]...
uboot_overlays: loading /lib/firmware/BB-NARWHAL-UART4-UBLOX-GNSS.dtbo ...
1493 bytes read in 22 ms (65.4 KiB/s)
uboot_overlays: loading /boot/dtbs/5.10.168-ti-r76/overlays/BB-BONE-eMMC1-01-00A0.dtbo ...
1605 bytes read in 6 ms (260.7 KiB/s)
uboot_overlays: uboot loading of [BB-HDMI-TDA998x-00A0.dtbo] disabled by /boot/uEnv.txt [disable_uboot_overlay_video=1]...
loading /boot/initrd.img-5.10.168-ti-r76 ...
6828963 bytes read in 437 ms (14.9 MiB/s)
debug: [console=ttyS0,115200n8 bone_capemgr.uboot_capemgr_enabled=1 root=/dev/mmcblk1p1 ro rootfstype=ext4 rootwait coherent_pool=1M net.ifnames=0 lpj=1990656 rng_core.default_quality=100 quiet] ...
debug: [bootz 0x82000000 0x88080000:6833a3 88000000] ...
Kernel image @ 0x82000000 [ 0x000000 - 0xa9d200 ]
## Flattened Device Tree blob at 88000000
   Booting using the fdt blob at 0x88000000
   Loading Ramdisk to 8f97c000, end 8ffff3a3 ... OK
   Loading Device Tree to 8f8e5000, end 8f97bfff ... OK

Starting kernel ...

[    0.151280] l3-aon-clkctrl:0000:0: failed to disable
[    9.419005] debugfs: Directory '49000000.dma' with parent 'dmaengine' already present!
[    9.451853] gpio-of-helper ocp:cape-universal: Failed to get gpio property of 'P8_03'
[    9.451880] gpio-of-helper ocp:cape-universal: Failed to create gpio entry
[   10.114321] omap_voltage_late_init: Voltage driver support not added
rootfs: recovering journal
[   23.138837] sdhci-omap 481d8000.mmc: Card stuck in wrong state! card_busy_detect status: 0xe40
[   23.346262] mmc1: cache flush error -110
[   24.582813] mmcblk1: recovery failed!
[   24.586608] blk_update_request: I/O error, dev mmcblk1, sector 8192 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[   24.597436] Buffer I/O error on dev mmcblk1p1, logical block 0, lost async page write
[   24.605384] Buffer I/O error on dev mmcblk1p1, logical block 1, lost async page write
[   24.619228] sdhci-omap 481d8000.mmc: error -110 requesting status
[   24.625378] mmcblk1: recovery failed!
[   24.629146] blk_update_request: I/O error, dev mmcblk1, sector 11880 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[   24.640027] Buffer I/O error on dev mmcblk1p1, logical block 461, lost async page write
[   24.653933] sdhci-omap 481d8000.mmc: error -110 requesting status
[   24.660080] mmcblk1: recovery failed!
[   24.663833] blk_update_request: I/O error, dev mmcblk1, sector 12808 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[   24.674706] Buffer I/O error on dev mmcblk1p1, logical block 577, lost async page write
[   24.688610] sdhci-omap 481d8000.mmc: error -110 requesting status
[   24.694766] mmcblk1: recovery failed!
[   24.698498] blk_update_request: I/O error, dev mmcblk1, sector 31456 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[   24.709370] Buffer I/O error on dev mmcblk1p1, logical block 2908, lost async page write
[   24.723381] sdhci-omap 481d8000.mmc: error -110 requesting status
[   24.729528] mmcblk1: recovery failed!
[   24.733285] blk_update_request: I/O error, dev mmcblk1, sector 31504 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[   24.744160] Buffer I/O error on dev mmcblk1p1, logical block 2914, lost async page write
[   24.752331] Buffer I/O error on dev mmcblk1p1, logical block 2915, lost async page write
[   24.766301] sdhci-omap 481d8000.mmc: error -110 requesting status
[   24.772448] mmcblk1: recovery failed!
[   24.776183] blk_update_request: I/O error, dev mmcblk1, sector 3416064 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[   24.787214] Buffer I/O error on dev mmcblk1p1, logical block 425984, lost async page write
[   24.803039] sdhci-omap 481d8000.mmc: error -110 requesting status
[   24.809294] mmcblk1: recovery failed!
[   24.813064] blk_update_request: I/O error, dev mmcblk1, sector 3416064 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[   24.824113] Buffer I/O error on dev mmcblk1p1, logical block 425984, lost async page write
[   24.852682] sdhci-omap 481d8000.mmc: error -110 requesting status
[   24.860671] mmc1: cache flush error -110
[   26.094821] mmcblk1: recovery failed!
[   26.098645] blk_update_request: I/O error, dev mmcblk1, sector 8192 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[   26.115614] sdhci-omap 481d8000.mmc: error -110 requesting status
[   26.121779] mmcblk1: recovery failed!
[   26.125559] blk_update_request: I/O error, dev mmcblk1, sector 7405440 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   26.142653] sdhci-omap 481d8000.mmc: error -110 requesting status
[   26.148808] mmcblk1: recovery failed!
[   26.152574] blk_update_request: I/O error, dev mmcblk1, sector 8192 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   26.163104] Buffer I/O error on dev mmcblk1p1, logical block 0, async page read
[   26.176440] sdhci-omap 481d8000.mmc: error -110 requesting status
[   26.182593] mmcblk1: recovery failed!
[   26.192989] sdhci-omap 481d8000.mmc: error -110 requesting status
[   26.199259] mmcblk1: recovery failed!
fsck.ext4: Input/output error while trying to re-open rootfs[   26.205302] mmc1: cache flush error -110


rootfs: ********** WARNING: Filesystem still has errors **********

fsck exited with status code 12
The root filesystem on /dev/mmcblk1p1 requires a manual fsck
(initramfs)
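
When comparing many boards, it can help to reduce a console log like the one above to a count of the failure signatures it contains. The following is a minimal sketch, assuming the message texts are exactly as they appear in this log; the signature labels and the `summarize` function are illustrative names, not part of any existing tool. On Linux, error -110 corresponds to ETIMEDOUT, meaning the card did not answer in time.

```python
import re
from collections import Counter

# Patterns copied from the messages in the log above.
# The labels on the left are illustrative only.
SIGNATURES = {
    "card_stuck": re.compile(r"Card stuck in wrong state"),
    "timeout_110": re.compile(r"error -110\b"),
    "recovery_failed": re.compile(r"mmcblk\d+: recovery failed!"),
    "block_io_error": re.compile(r"blk_update_request: I/O error, dev mmcblk\d+"),
}


def summarize(log_text: str) -> Counter:
    """Count how many log lines match each failure signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


SAMPLE = """\
[   23.138837] sdhci-omap 481d8000.mmc: Card stuck in wrong state! card_busy_detect status: 0xe40
[   23.346262] mmc1: cache flush error -110
[   24.582813] mmcblk1: recovery failed!
[   24.586608] blk_update_request: I/O error, dev mmcblk1, sector 8192 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[   24.619228] sdhci-omap 481d8000.mmc: error -110 requesting status
"""

for name, count in summarize(SAMPLE).items():
    print(f"{name}: {count}")
```

A log with none of these signatures does not prove the eMMC is healthy; it only means this particular failure pattern was not seen during that boot.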