eMMC M627 issues

Sorry to resurrect this thread.

I am seeing a large number of BBBs with M627 eMMC that are failing. They either outright fail to operate under u-boot, or have I/O failures either during boot or a few seconds/minutes after boot.

These BBBs have rootfs on the eMMC and they ran the affected v6.1.38 version for some time. We had a number of dead MK27 chips, pretty much matching the issues covered in this topic, but the M627 chips were still holding up as of a couple of months ago.

I’m posting here in the hope of understanding what effect the problematic aggressive PM patch might have on the M627 in long-term operation. The rootfs is running from the eMMC and the BBBs have been updated, to the best of my knowledge, to a kernel with 3edf588e7fe00e90d1dc7fb9e599861b2c2cf442 reverted.

Is there a possibility of the M627 slowly dying over time, after exposure to the faulty kernel?

Thanks!

António

I moved this to a new thread to split the MK27 issues (now solved) from the M627 issues you see. I personally haven’t looked at long-term running tests of the M627, as I had been deep in the MK27.

Regards,

Thanks, Robert.

Our customer is seeing considerable exposure to failing M627 chips after a couple of months, and we’ll put some M627 BBBs in long-term runs with the affected kernel to get a better understanding of what the impact may be on these units.

I’ll keep you posted on what we find, if we find anything.

Regards,

António

Hi all,

We were fortunate to have logs and database records that allowed us to rebuild the wear history of these affected devices.

Based on the data that we collected, our conclusions for the M62704 are:

  • EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B increases two steps (20%) every 12 (that’s right, twelve) days of uptime on average;
  • EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A increases about 10% within the same time period;
  • The eMMCs still appear to retain data and we can access their filesystems if we mount them read-only and without a journal. Occasionally we get I/O errors or an unresponsive chip.
  • Most of the affected eMMCs in the boards that we collected freak out at some point if we try any kind of write operation on them (which is expected, given their wear level).
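
For anyone who wants to track the same two fields, here is a minimal Python sketch. It assumes the kernel exposes them through the `life_time` sysfs attribute as two hex bytes (TYP_A then TYP_B), and that the eMMC shows up as `mmcblk1`; that path is an assumption and may be `mmcblk0` on some images. The decoding follows the JEDEC convention where each step from 0x01 to 0x0A covers 10% of estimated life and 0x0B means the estimate has been exceeded. The projection helper is a naive linear extrapolation, not something the vendor specifies.

```python
from pathlib import Path

# Assumed sysfs location; the eMMC may enumerate as mmcblk0 on some images.
LIFE_TIME_PATH = Path("/sys/block/mmcblk1/device/life_time")


def describe_estimate(value):
    """Translate one DEVICE_LIFE_TIME_EST_TYP_x byte into a readable range."""
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        return f"{(value - 1) * 10}-{value * 10}% of estimated life used"
    if value == 0x0B:
        return "exceeded estimated life"
    return "reserved value"


def parse_life_time(raw):
    """Parse the sysfs text, e.g. '0x03 0x05', into (TYP_A, TYP_B)."""
    typ_a, typ_b = (int(token, 16) for token in raw.split())
    return typ_a, typ_b


def days_to_end_of_life(current, steps_per_period, period_days):
    """Linear projection of the days until the estimate reaches 0x0B."""
    remaining_steps = max(0x0B - current, 0)
    return remaining_steps * period_days / steps_per_period


if __name__ == "__main__":
    if LIFE_TIME_PATH.exists():
        typ_a, typ_b = parse_life_time(LIFE_TIME_PATH.read_text())
        print("TYP_A:", describe_estimate(typ_a))
        print("TYP_B:", describe_estimate(typ_b))
        # Rate reported above for TYP_B: two steps every 12 days.
        print("TYP_B days left:", days_to_end_of_life(typ_b, 2, 12))
    else:
        print("life_time attribute not found at", LIFE_TIME_PATH)
```

At two steps every 12 days, a fresh part (0x01) would reach 0x0B in about 60 days of uptime under this linear assumption.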

The above is, of course, contingent on the workload of the software running on the BBB, but the speed up in wear is evident from the moment that we install an affected kernel.

I hope this helps in assessing the impact of this patch on the M627. Stay safe!

Cheers,

António

Hi António,

Any further news or theories on the wear you are seeing?

We too are experiencing wear issues with the M62704 (BBG) and W62704 (BBB Industrial).

These are the majority of our 74 boards (see attached spreadsheet). A couple of the BBGs have the MK2704. These boards have been going into the field since mid-2024.

Even though some boards are showing end of life (according to wear) the boards are operating normally, but we do not know how long this will last.

Occasionally a board will become either corrupt or only allow the flash to mount read only. Both require a board change. Returned boards show eMMC wear. All the boards are running Linux 6.1.20.

Given @RobertCNelson has created a separate thread here, I assume this is a different issue to the MK2704 above.

We only notice changes in the wear metrics after a power cycle of the board.

Sometimes our application looks like a contributor (large db log index), but often this is not the case. We have not yet been able to reproduce wear increases without doing excessive db writing, yet many of our boards in the field do not write that heavily.
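
One way to separate application writes from other causes of wear is to measure how much data the host actually sends to the eMMC. The sketch below is only an illustration: it reads the block layer counters in `/sys/block/<dev>/stat`, where the seventh field is sectors written in 512-byte units since boot. The device name `mmcblk1` and the helper names are assumptions. Host writes are not the same as NAND wear, so treat the result as a lower bound on what the chip is doing internally.

```python
from pathlib import Path

SECTOR_BYTES = 512  # /sys/block/*/stat counts sectors in 512-byte units
STAT_PATH = Path("/sys/block/mmcblk1/stat")  # assumed device name


def bytes_written(stat_text):
    """Return bytes written since boot from the text of a block stat file."""
    fields = stat_text.split()
    # Field 7 (index 6) is the number of sectors written.
    return int(fields[6]) * SECTOR_BYTES


def mib_per_day(bytes_start, bytes_end, elapsed_seconds):
    """Scale the difference between two samples to MiB written per day."""
    mib = (bytes_end - bytes_start) / (1024 * 1024)
    return mib * 86400 / elapsed_seconds


if __name__ == "__main__":
    if STAT_PATH.exists():
        total = bytes_written(STAT_PATH.read_text())
        print(f"Written since boot: {total / (1024 * 1024):.1f} MiB")
    else:
        print("stat file not found at", STAT_PATH)
```

The counters reset on every boot, so a board with very long uptime gives a long averaging window, while frequent reboots require the samples to be logged somewhere persistent.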

The uptime of the boards can be very long.

Do you think the uptime is a contributing factor? Would rebooting each day help reduce wear?

Look forward to any probing questions or theories.

Thanks,

David

emmc_collection_tracker_2026-04-28_vs_2026-05-14.xlsx (186.3 KB)

Have you tried downclocking the eMMC via the device tree?

Not yet. Has that solved strange wear issues in the past?

I had to do it on an Odroid C2 that was showing CRC errors; underclocking it solved the problem there (200MHz to 150MHz). I posted elsewhere on how to do it. Probably the easiest is a device tree overlay that makes the change; if you just want to try it, it’s easiest to interrupt u-boot and then modify the device tree before boot. As it’s simple to try, I would say it’s worth trying …

We have now confirmed that we lack the fixes for the Kingston MK2704, so our first step is getting this known problem fixed. I assume Kingston may have reused much from the MK2704 in the M62704/W62704. I will see if I can get the option to do overlays put back in the build (assuming I need dtc).
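
If you try a lower clock, it helps to confirm that the change actually took effect. A small sketch, assuming debugfs is mounted and that the eMMC host is `mmc1` (both are assumptions, and reading the file needs root): the MMC core prints the requested clock, and on hosts that report it the actual clock, in the host's `ios` debugfs file.

```python
import re
from pathlib import Path

# Assumed host index; requires root and a mounted debugfs.
IOS_PATH = Path("/sys/kernel/debug/mmc1/ios")


def parse_clock_hz(ios_text):
    """Extract the 'clock' and 'actual clock' values (in Hz) from ios text."""
    clocks = {}
    for line in ios_text.splitlines():
        match = re.match(r"^(clock|actual clock):\s+(\d+)\s+Hz", line)
        if match:
            clocks[match.group(1)] = int(match.group(2))
    return clocks


if __name__ == "__main__":
    if IOS_PATH.exists():
        for name, hertz in parse_clock_hz(IOS_PATH.read_text()).items():
            print(f"{name}: {hertz / 1e6:.1f} MHz")
    else:
        print("ios file not found at", IOS_PATH)
```

Comparing the output before and after the device tree change shows whether the lower limit was picked up.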

The ‘mount’ shows (rw, relatime), which should be good for reducing wear.

It’s not so much a Kingston issue.

The sdhci-omap aggressive power management is to blame; it was fixed in 6.1.134.

Backport that fix to your v6.1.20 stack as soon as possible.

MMC_CAP_AGGRESSIVE_PM should only be used for “sdio” devices, not devices with an eMMC connected.

We had Kingston do a failure analysis on it. MMC_CAP_AGGRESSIVE_PM was resetting the bus hundreds of times a second trying to save power; eventually the “eMMC” controller just dies and gives up from the abuse.
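
As a quick fleet triage aid, a sketch like the following can flag boards whose 6.1.y kernel predates the release mentioned above. It only compares version numbers, so it is a hint rather than proof: a vendor kernel may carry the fix or a revert as a backport under an older version string, and the function name is made up for illustration. It deliberately returns `None` for anything outside the 6.1.y series, since this thread gives no fix version for other branches.

```python
import platform
import re

FIXED_6_1_PATCH = 134  # per this thread: fixed in 6.1.134


def has_fix_6_1(release):
    """Return True/False for 6.1.y kernels, None for anything else."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    if (major, minor) != (6, 1):
        return None  # no information in this thread for other series
    return patch >= FIXED_6_1_PATCH


if __name__ == "__main__":
    release = platform.release()
    print(release, "->", has_fix_6_1(release))
```

A `False` result means the board should be checked for a backport or revert before it is trusted in the field.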

Regards,

Thanks for clarifying Robert