Modified BBB-eMMC flasher for Debian is failing with ext4 errors

Hi,

I’m working with a modified version of Robert Nelson’s flasher script (https://github.com/RobertCNelson/boot-scripts/blob/master/tools/eMMC/bbb-eMMC-flasher-eewiki-ext4.sh). I am flashing Debian to the eMMC, but the result is sometimes corrupt.

The main differences are:

  • It does some things specific to my application, like performing a few web requests.
  • Instead of just creating one partition, it creates the root/boot partition, and a second read-write partition.

The partition table looks like this:

`

Formatting: /dev/mmcblk1

Disk /dev/mmcblk1: 119808 cylinders, 4 heads, 16 sectors/track
Old situation:
New situation:
Units: 1MiB = 1024*1024 bytes, blocks of 1024 bytes, counting from 0

Device         Boot Start   End   MiB  #blocks  Id  System
/dev/mmcblk1p1   *      1  1500  1500  1536000  83  Linux
/dev/mmcblk1p2       1501  3743  2243  2296832  83  Linux
/dev/mmcblk1p3          0     -     0        0   0  Empty
/dev/mmcblk1p4          0     -     0        0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table …

`

  • It was originally based on an older version of the script, but some parts have been updated. The EEPROM flashing, for instance, has been upgraded to the current version of RCN’s script.

The program flow of dd_bootloader → sfdisk → format → rsync is still the same as in RCN’s script.
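To see which of these stages first produces bad data, one option is to read back what was just written and compare it with the source. The following is only a sketch: `verify_readback` is a hypothetical helper (it is not part of RCN’s script), and the byte offset passed to it is an assumption that has to match wherever your dd_bootloader step actually wrote the file.

```shell
#!/bin/sh
# Hypothetical helper (not in RCN's script): compare a source file with
# the bytes found at a given offset of a device or image file.
# Usage: verify_readback SRC_FILE DEVICE OFFSET_BYTES
# Returns 0 when the bytes match, non-zero otherwise.
verify_readback() {
    src=$1
    dev=$2
    offset=$3
    # $(( ... )) strips any padding that wc may print around the number.
    size=$(($(wc -c < "$src")))
    # bs=1 is slow but portable; acceptable for bootloader-sized files.
    dd if="$dev" bs=1 skip="$offset" count="$size" 2>/dev/null | cmp -s - "$src"
}
```

A readback done immediately after a write may be served from the kernel’s cache rather than from the eMMC, so it is worth calling flush_cache before the comparison.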

The problem:
Sometimes (only sometimes, and with no pattern that I can discern), the flashing fails and produces an unbootable image.
When this happens, rsync hits errors while copying files to the root file system, just after the ext4 filesystems for both partitions have been created.

Here is an example:
`

Copying: /dev/mmcblk0p1 → /dev/mmcblk1p1
mount /dev/mmcblk1p1 /tmp/rootfs/ -o async,noatime
rsync: / → /tmp/rootfs/
[ 432.938083] EXT4-fs error (device mmcblk1p1): ext4_mb_generate_buddy:757: group 0, block bitmap and bg descriptor inconsistent: 32768 vs 23112 free clusters
[ 432.953982] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 432.984704] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 433.011220] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 433.041931] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 433.064714] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 433.087946] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 433.109954] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 433.131633] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[ 433.153583] EXT4-fs error (device mmcblk1p1): ext4_find_dest_de:1829: inode #2: block 6119: comm rsync: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
writing to [/dev/mmcblk1] failed…

`
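The kernel messages above could also be checked by the flasher itself, so that it stops instead of finishing with a corrupt image. A minimal sketch, assuming a hypothetical helper `count_ext4_errors` (not part of RCN’s script) that counts matching lines in log text read from stdin:

```shell
#!/bin/sh
# Hypothetical helper (not in RCN's script): count kernel "EXT4-fs error"
# lines for one partition in log text read from stdin.
# Usage: dmesg | count_ext4_errors mmcblk1p1
count_ext4_errors() {
    # -F: match the text literally; -c: print the number of matching lines.
    # grep exits non-zero when nothing matches, so mask that with "|| true".
    grep -cF "EXT4-fs error (device $1)" || true
}

# Possible use after the rsync step, relying on the script's own
# write_failure handler:
#   errors=$(dmesg | count_ext4_errors mmcblk1p1)
#   [ "$errors" -eq 0 ] || write_failure
```

This only catches errors the kernel has already logged; it is not a substitute for a filesystem check.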

After letting the flasher run to completion (write_failure/inf_loop doesn’t behave as expected) and rebooting, the following output occurs. U-Boot is unable to boot from the root partition and starts resorting to desperate measures…

`

U-Boot SPL 2015.04-dirty (Apr 14 2015 - 10:12:34)
U-Boot 2015.04-dirty (Apr 14 2015 - 10:12:34)

Watchdog enabled
I2C: ready
DRAM: 512 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Using default environment

Net: not set. Validating first E-fuse MAC
cpsw
Hit any key to stop autoboot: 0
gpio: pin 53 (gpio 53) value is 1
Card did not respond to voltage select!
Card did not respond to voltage select!
gpio: pin 56 (gpio 56) value is 0
gpio: pin 55 (gpio 55) value is 0
gpio: pin 54 (gpio 54) value is 0
switch to partitions #0, OK
mmc1(part 0) is current device
gpio: pin 54 (gpio 54) value is 1
Checking for: /uEnv.txt …
Checking for: /boot.scr …
Checking for: /boot/boot.scr …
Checking for: /boot/uEnv.txt …
** Invalid partition 3 **
** Invalid partition 4 **
** Invalid partition 5 **
** Invalid partition 6 **
** Invalid partition 7 **

FAILSAFE: U-Boot UMS (USB Mass Storage) enabled, media now available over the usb slave port …
UMS: disk start sector: 0x0, count: 0x750000
\

`

It appears that the root partition is somehow being corrupted. Attempting to mount that partition on the eMMC while booted from a non-flasher SD card gives the following results:

`

root@arm:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
mmcblk1boot0 179:16 0 1M 1 disk
mmcblk1boot1 179:24 0 1M 1 disk
mmcblk0 179:0 0 7.4G 0 disk
└─mmcblk0p1 179:1 0 7.4G 0 part /
mmcblk1 179:8 0 3.7G 0 disk
├─mmcblk1p1 179:9 0 1.5G 0 part
└─mmcblk1p2 179:10 0 2.2G 0 part
root@arm:~# mount /dev/mmcblk1p1 /media
[ 4127.908991] EXT4-fs (mmcblk1p1): no journal found
mount: wrong fs type, bad option, bad superblock on /dev/mmcblk1p1,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.
root@arm:~# mount /dev/mmcblk1p2 /media
root@arm:~# ls /media
etc logs lost+found photos var

`

mmcblk1 is the eMMC in this situation. As can be seen, it’s not possible to mount the boot partition, but it is possible to mount the second partition produced. This is consistently the case.

The sfdisk command the flasher runs is:

`

LC_ALL=C sfdisk --force --in-order --Linux --unit M "${destination}" <<-EOF
${conf_boot_startmb},${root_part_size},${sfdisk_fstype},*
,${sfdisk_fstype}
EOF

`
where conf_boot_startmb=1, root_part_size=1500 and sfdisk_fstype=0x83 (Linux partition).
The second line describes the second partition, which fills all remaining space.
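As a sanity check, the layout sfdisk reported can be reproduced from these values and the disk geometry with plain shell arithmetic. This is only a sketch; `expected_layout` is a hypothetical helper, not something from the flasher script.

```shell
#!/bin/sh
# Hypothetical helper: print "name start end size" (all in MiB) for the
# two partitions, given the start offset, root size and total disk size.
# Usage: expected_layout START_MIB ROOT_SIZE_MIB DISK_SIZE_MIB
expected_layout() {
    start=$1
    root_size=$2
    disk_size=$3
    p1_end=$((start + root_size - 1))
    p2_start=$((p1_end + 1))
    p2_end=$((disk_size - 1))            # sfdisk counts MiB from 0
    p2_size=$((p2_end - p2_start + 1))
    echo "p1 $start $p1_end $root_size"
    echo "p2 $p2_start $p2_end $p2_size"
}

# Disk size from the geometry line: cylinders * heads * sectors gives
# 512-byte sectors, and there are 2048 of those per MiB.
disk_mib=$((119808 * 4 * 16 / 2048))
expected_layout 1 1500 "$disk_mib"
```

With the geometry above this gives a 3744 MiB disk, with p1 spanning MiB 1 to 1500 and p2 spanning MiB 1501 to 3743 (2243 MiB), which matches the table sfdisk printed.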

I have tried:

  • Checking the power supply to the BeagleBone (it is powered by a daughterboard) with an oscilloscope; it appears to be stable.
  • Different SD cards (they still fail).
  • Different BeagleBones.
  • Adding more sync/flush_cache operations around everything in partition_drive and copy_rootfs.

I also added this (unexplained code in RCN’s script which claims to flush the eMMC buffers; does anyone know why this works?)

`

# https://github.com/beagleboard/meta-beagleboard/blob/master/contrib/bone-flash-tool/emmc.sh#L158-L159
# force writeback of eMMC buffers
flush_cache
dd if="${destination}" of=/dev/null count=100000
sync

`
in many places around the script.
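For scale, the amount of data that readback covers can be worked out from dd’s default block size of 512 bytes (used because no bs= is given). This only shows how much of the device is read; it does not explain why the readback helps.

```shell
#!/bin/sh
# dd reads 512-byte blocks when bs= is not specified.
blocks=100000
bytes=$((blocks * 512))
echo "bytes read: $bytes"
echo "MiB read (rounded down): $((bytes / 1024 / 1024))"
```

So the command reads about 51.2 MB, i.e. a little under 49 MiB, from the start of the eMMC.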

Does anyone have any suggestions on what might be causing these ext4 errors or what I might be doing incorrectly with setting up the partition table?

I noticed that RCN’s script only uses dd to wipe the first 108MB - should it be wiping the whole image? Why 108MB?

Thanks for making it this far :)
Joshua Collins

Important version numbers:

Kernel: 4.1.17-bone19
Debian: 8.3-minimal-armhf-2016-01-25
MLO: http://rcn-ee.com/repos/bootloader/am335x_evm/MLO-am335x_evm-v2015.01-r7
U-Boot: http://rcn-ee.com/repos/bootloader/am335x_evm/u-boot-am335x_evm-v2015.01-r7.img

Most often, these ext4 errors are actually due to power fluctuations.

They seem to happen more often when powering via USB, but I've seen it
happen via a 5V DC supply too.

Or if a cape is installed that is sharing the eMMC pins.

Regards,

Hi Robert,

I thought I had checked for power supply issues and not found anything, but I checked again and found problems. Our cape board powers the beaglebone through the VDD_5V pins on the header.

Here are the VDD_5V (blue) and VDD_3V3B (yellow) rails when the beaglebone is powered from our custom cape, captured at a point during flashing:

So, we’ve found a problem.

But here is what those same rails look like when the beaglebone is powered by a 5A lab supply through the barrel jack, without our custom cape connected - it’s interesting that there are still significant (albeit much smaller) fluctuations. Are they large enough to be a problem? I wonder whether the beaglebone has enough local capacitance to handle the eMMC’s power demands.

Thank you for your help,
Joshua Collins

First, I think the waveforms you are displaying do not look like they originate from the power rails. I suspect you have a scope probe where the ground wire and probe tip are too long and you are picking up radiated noise from other sources. Start by making the scope ground very short (I’m talking about 1/4 inch or less) and measure the power rails again. The power rails use parallel power planes, an array of low-ESR decoupling capacitors and several bulk capacitors. The design is very good, and that is why these waveforms look suspect.

Regards,
John

You can test this by simply shorting the ground clip to the probe tip and moving about the board. My guess is the ringing will still be visible.

Regards,
John

http://electronics.stackexchange.com/questions/221292/tip-barrel-test-of-oscilloscope

Regards,
John

Hi all,

I performed the test as suggested with a much shorter ground wire instead of the longer one I was using. Here is a picture of the setup:

Since then, I cannot reproduce the same magnitude of noise - this is the worst I could find:

Thank you for pointing out the issue John - I hadn’t realised the probes could pick up so much noise - this isn’t a welding workshop…

So, I’m not sure whether the small spike shown would be enough to cause problems with the eMMC. The only power supply tolerance specification I could find was a brief mention in the reference manual that the input power must be 5 VDC ± 0.25 V. However, the graph shown is actually for the regulated 5V on the board.

Thanks,
Joshua Collins

I think the spike you see is either power source related or you still have measurement issues. Start by using twisted pair to connect the power supply, or just twist the existing wires around each other. Either way, I don’t think this is an issue. From the TPS65217C datasheet (Table 8.3), the 5V input must be between 4.3 V and 5.8 V. Anyhow, it is the 3.3 V and 1.8 V rails that are important here, and I suspect those rails are clean.

http://www.ti.com/lit/ds/symlink/tps65217.pdf

Regards,
John