[Beaglebone] SD Card Corruption on Read Only File System

I also have many issues with SD cards, even though they are used in RO mode. We have tried at least 5 different brands, and they are all equally unreliable compared to CF cards. By next week, I should get a new industrial-grade micro-SD card made by ATP to test. Supposedly, ATP’s micro-SD cards use Multi-Level Cell (MLC) technology to make them more reliable than other SD cards. I’ll update you guys on the reliability of ATP’s SD card in about 10 days. I hope their SD cards will hold up; otherwise, I really need to move to NAND MTD.

Aren't SLC (single level cell) flash more durable than MLC? You can
use MLC as SLC, it just holds less data but should be more durable. At
least that was my understanding.

EEtimes seems to agree with me [1].

[1]:http://www.eetimes.com/General/PrintView/4390427

-Andrew

Someone educate me on this please - isn't NAND what is used in SD
cards, and won't any NAND suffer the same concerns with wear
levelling, bad block management, and number of write cycles?

Yes, but if you let Linux (or anything with half a wear leveling
algorithm in it) write in a civilized manner, each erase block will get
fewer writes and hence last longer.

Read Arnd's LWN column [1] to understand garbage collection and open
erase blocks.

[1]:Optimizing Linux with cheap flash drives [LWN.net]
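
To make the point about erase blocks concrete, here is a toy model of why spreading writes matters. It is only a sketch under loud assumptions: every rewrite costs exactly one erase, and the "wear leveling" is plain round-robin remapping. Real SD controllers and Linux flash filesystems are far more involved, and the block and write counts are made up.

```python
from collections import Counter

def simulate_erases(num_blocks, hot_writes, wear_leveling):
    """Count erases per physical block when one logical block is rewritten repeatedly.

    Simplified model (assumption): every rewrite costs exactly one erase.
    """
    erases = Counter()
    next_free = 0
    for _ in range(hot_writes):
        if wear_leveling:
            # Remap the logical block to the next physical block in turn.
            physical = next_free
            next_free = (next_free + 1) % num_blocks
        else:
            # No remapping: the same physical block takes every erase.
            physical = 0
        erases[physical] += 1
    return erases

# Hypothetical numbers: 1024 erase blocks, 100,000 rewrites of one hot spot.
worst_without = max(simulate_erases(1024, 100_000, wear_leveling=False).values())
worst_with = max(simulate_erases(1024, 100_000, wear_leveling=True).values())
print("worst block without leveling:", worst_without)
print("worst block with leveling:   ", worst_with)
```

In this model the most-worn block sees every erase when nothing is remapped, and only about a thousandth of them when the writes are rotated across all blocks.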

You may be moving the direct control out of the SD card's built in
controller to somewhere else, but ultimately the issues remain and
must be addressed, yes?

Yes, but SD card controllers are built to a dollar amount, not for
performance. Kingston's controllers are usually junk. SanDisk and
Samsung usually have better controllers. Having direct control of the
flash from Linux should give the best results as Linux has quite a lot
of work done on it to provide robust operation with flash file systems.

CF might be better than SD simply due to the use cases, such as pro
cameras and video, whereas SD is more targeted at consumers where
beating the living daylights out of their gear isn't the common
use-case. CF also has a long history of being expensive. SD doesn't.

-Andrew

Hi guys,
I think it’s time to report to you the results of the test I mentioned in my last post.
=> 2 boards with same sd cards installed, one of them with a scheduled reboot and both with a stress program found here: http://weather.ou.edu/~apw/projects/stress/

Hence:
after one week:

  • the board with periodic reboot was OK (I kept it on and keep monitoring it).
  • the board that was always on was inaccessible over ssh, but the network was somehow alive (ping OK).
    I connected the USB cable to get to the serial console; the login prompt was shown but there was no way to log in, so no way to soft reboot. Still connected over serial, I did a hard reboot. The bootloader failed.
    I took out the SD card and tried to read it in a card reader on my laptop. The partitions were there, but a large number of files were inaccessible, as expected. I ran fsck several times (20 or more) and some errors were corrected, but every time fsck reported at the end that the fs was still corrupted.
    I prepared a new SD card with a homemade partition table, using ext3 instead of ext4. I copied all the content from the original ext4 partition to the new one, creating a tar file to avoid symbolic link loops, and placed the new ext3 SD card into the original board.

Now it has been up and running without problems for more than 20 days, with the same disk stress process massively writing to the SD card!

So my solution is to use ext3. The cause is either a misbehavior of ext4 with SD cards or an incorrect partition table provided from the factory, I don’t know…
Up to you

byebye

P.S. The other board, with ext4 and automatic reboot, is inaccessible from time to time between two reboots… so that is not a solution

People,
I unfortunately have to withdraw the above…

A few days after I proudly wrote the report of my tests, the SD filesystem started to become corrupted somewhere and I was no longer able to connect with ssh. A manual reboot from the serial console caused the kernel to panic because it could not load some library after remounting the rootfs.

No way!
We are still looking for a solution! :(

Don’t use damn Kingston cards. Buy a fresh SanDisk class 4 or class 6.
After that, mount your filesystem read-only. If you want to keep some files temporarily, use a ramdisk.
I promise you cannot corrupt your SD card in this scenario.

Regards
Ozkaya

2013/3/14 SKiAt <theskiat@gmail.com>

A few days after I proudly wrote the report of my tests, the SD filesystem started to become corrupted somewhere and I was no longer able to connect with ssh. A manual reboot from the serial console caused the kernel to panic because it could not load some library after remounting the rootfs.

No way!
We are still looking for a solution! :(

SkiAt,

I just want to share my experience with you. Using ext3 on micro-SD flash is a bad idea because of the journaling in the ext3/ext4 filesystems. It will tend to wear out and kill your SD flash even quicker.

The approach I am taking is to use a FAT filesystem as the root partition of the SD card. I then mount the root filesystem from a squashfs archive. Next, I overlay the root filesystem with a tmpfs using the advanced union filesystem (aufs). Now I have an OS that does not write to the flash card but to RAM. This is by far the best approach I can think of to minimize flash errors. I still get flash errors, but I believe they occur at the MMC bus level (signal level), not actually in the memory chip.
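
Whichever read-only scheme is used, it may be worth confirming that nothing on the card is still mounted read-write. The sketch below parses text in the `/proc/mounts` format and lists card partitions that carry the `rw` option. The sample text, the `/dev/mmcblk` prefix and the function name are assumptions for illustration; on some kernels the root device is listed as `/dev/root`, so the prefix may need adjusting.

```python
def writable_card_mounts(mounts_text, device_prefix="/dev/mmcblk"):
    """Return (device, mount_point) pairs on the card that are mounted read-write."""
    writable = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if device.startswith(device_prefix) and "rw" in options.split(","):
            writable.append((device, mount_point))
    return writable

# Made-up sample in /proc/mounts format; on the board, pass in
# open("/proc/mounts").read() instead.
SAMPLE_MOUNTS = """\
/dev/mmcblk0p2 / ext4 ro,relatime 0 0
/dev/mmcblk0p1 /boot vfat rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid 0 0
"""

for device, mount_point in writable_card_mounts(SAMPLE_MOUNTS):
    print(f"{device} is mounted read-write on {mount_point}")
```

An empty result only says the mounts are read-only; it says nothing about errors at the bus level like the ones mentioned above.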

I jumped ship the other way, from the Arduinos to the BBone because I
needed the raw horsepower.

I think that 3.8 or whatever the stable revision number ends up being is
going to be much better than that. I'm working with 3.8 now. I can pop
out the card from a running Bone, take it back to my office and do
whatever and when I return to the lab, the Bone is still running.
They've apparently grabbed the 3rd LED to indicate RAMdisk activity
because I see that LED flicker a lot but very little activity on the SD LED.

John

What exactly is the workload you're subjecting the SD cards to?

I strongly suspect that the way you're using the cards is what's
hurting them, if you have good power and aren't doing unclean shutdowns.
Moving to some other dev kit won't change this.

-Andrew

Andrew,
In my case I just bring up Linux with my application, which uses only a couple of serial ports and some IP sockets,
and after some time the filesystem goes..

The last test I'm going to try is to mount the rootfs from a USB device (an industrial-grade CompactFlash); let's see what happens..

bye
keep in touch

Hi Andrew,

Steps to reproduce:

1. install image on sd card that comes with the bone.

2. change timezone and localtime to match where i live

3. add init.d script and rc.d to on startup change one of the gpio pins to interrupt input

4. add a folder to /var containing a script that every 5 mins hits up proc to see how many interrupts the gpio pin has had. add a link to this script to cron.d.

5. save the new number in one of the many ram backed filesystem locations.

6. use curl to upload the difference between the current and last count to a web service.

7. wait a month or two.

So... yeah. Aside from about 5 nonvolatile changes to the fs during initial system setup, nothing should be touching the flash card at all.

As an aside, the gpio pin is attached to an open collector pulse power monitor and I get really bad bounce on the beagle io pins. I've never seen bounce on digital pulse interfaces before... yet another annoyance with the bones...

Regards,
Jon
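
For readers trying to reproduce steps 4 to 6, a minimal sketch of such a cron job could look like the following. It is not Jon's script: the `power_meter` interrupt label, the sample `/proc/interrupts` text and the state file location are all hypothetical, and the upload step is left out. A temporary directory stands in for the RAM-backed location.

```python
import os
import tempfile

def interrupt_count(interrupts_text, label):
    """Sum the per-CPU counts on the first /proc/interrupts line ending with label."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1] != label:
            continue
        total = 0
        for token in fields[1:]:  # fields[0] is the IRQ number, e.g. "96:"
            if not token.isdigit():
                break  # the count columns end at the first non-numeric token
            total += int(token)
        return total
    raise KeyError(label)

def delta_since_last(current, state_path):
    """Return current minus the count stored at state_path, then store current."""
    try:
        with open(state_path) as f:
            previous = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        previous = current  # first run: report no change
    with open(state_path, "w") as f:
        f.write(str(current))
    return current - previous

# Made-up sample; on the board, read /proc/interrupts instead.
SAMPLE_INTERRUPTS = """\
           CPU0
 96:       1523      GPIO  power_meter
"""

state = os.path.join(tempfile.mkdtemp(), "pulse_count")
count = interrupt_count(SAMPLE_INTERRUPTS, "power_meter")
print(delta_since_last(count, state))       # first run
print(delta_since_last(count + 40, state))  # next run, 40 pulses later
```

The state file is the only thing this writes, so whether the job touches the card depends entirely on where that path lives.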

Steps to reproduce:

1. install image on sd card that comes with the bone.

Which other manufacturer and models of SD cards have you observed this
on?

Have you tried SanDisk Mobile Ultra 4 GB cards or Samsung Plus 8 GB
cards? Do you see the same results over time?

2. change timezone and localtime to match where i live

3. add init.d script and rc.d to on startup change one of the gpio
pins to interrupt input

4. add a folder to /var containing a script that every 5 mins hits up
proc to see how many interrupts the gpio pin has had. add a link to
this script to cron.d.

5. save the new number in one of the many ram backed filesystem
locations.

Which of "many" ram backed locations? Are you sure it's tmpfs or a
ramdisk?

Do you have swap enabled? (sorry, I don't run the stock Angstrom that
comes with bones)
If so, disable it. Even setting your swappiness properly can still end
up writing to swap when you don't expect or want (on an SD card you
_NEVER_ want to write swap).

6. use curl to upload the difference between the current and last
count to a web service.

Are you sure this isn't writing logs somewhere?

7. wait a month or two.

So... yeah. Aside from about 5 nonvolatile changes to the fs during
initial system setup, nothing should be touching the flash card at all.

As an aside, the gpio pin is attached to an open collector pulse
power monitor and I get really bad bounce on the beagle io pins. I've
never seen bounce on digital pulse interfaces before... yet another
annoyance with the bones...

Regarding GPIO bounce, the GPIOs are reasonably quick on am335x, do you
see bounce when observing them with a scope? I believe the GPIO
subsystem is on a 100 MHz clock that rotates between each bank (check
TRM for sure) so 25 MHz input is possible (although the timing would
have to line up nicely to avoid aliasing). Sorry, I'm not much help
here other than to suggest you put some hardware filtering on there or
debounce in software.

-Andrew
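
The two questions above (is the location really tmpfs, and is swap enabled) can be checked from `/proc/mounts` and `/proc/swaps`. The sketch below works on text in those formats; the sample contents, the paths and the function names are assumptions for illustration, and symlinks are not resolved.

```python
def filesystem_for(path, mounts_text):
    """Return (mount_point, fstype) of the longest mount point that contains path."""
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # ">=" lets a later mount on the same mount point win, as it does in the kernel.
        if inside and len(mount_point) >= len(best[0]):
            best = (mount_point, fstype)
    return best

def active_swap(swaps_text):
    """Return the swap devices or files listed in /proc/swaps (first line is a header)."""
    return [line.split()[0] for line in swaps_text.splitlines()[1:] if line.strip()]

# Made-up samples; on the board, read /proc/mounts and /proc/swaps instead.
SAMPLE_MOUNTS = """\
/dev/mmcblk0p2 / ext4 rw,relatime 0 0
tmpfs /var/volatile tmpfs rw,relatime 0 0
"""
SAMPLE_SWAPS = """\
Filename                Type        Size    Used    Priority
/dev/mmcblk0p3          partition   524284  0       -1
"""

print(filesystem_for("/var/volatile/pulse_count", SAMPLE_MOUNTS))
print(filesystem_for("/var/lib/pulse_count", SAMPLE_MOUNTS))
print(active_swap(SAMPLE_SWAPS))
```

If the second kind of answer comes back for the file the cron job writes, or the swap list is not empty, the card is being written to after all.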

Steps to reproduce:

  1. install image on sd card that comes with the bone.

Which other manufacturer and models of SD cards have you observed this
on?

Have you tried SanDisk Mobile Ultra 4 GB cards or Samsung Plus 8 GB
cards? Do you see the same results over time?

  2. change timezone and localtime to match where i live

  3. add init.d script and rc.d to on startup change one of the gpio
    pins to interrupt input

  4. add a folder to /var containing a script that every 5 mins hits up
    proc to see how many interrupts the gpio pin has had. add a link to
    this script to cron.d.

  5. save the new number in one of the many ram backed filesystem
    locations.

Which of “many” ram backed locations? Are you sure it’s tmpfs or a
ramdisk?

Do you have swap enabled? (sorry, I don’t run the stock Angstrom that
comes with bones)
If so, disable it. Even setting your swappiness properly can still end
up writing to swap when you don’t expect or want (on an SD card you
NEVER want to write swap).

  6. use curl to upload the difference between the current and last
    count to a web service.

Are you sure this isn’t writing logs somewhere?

  7. wait a month or two.

So… yeah. Aside from about 5 nonvolatile changes to the fs during
initial system setup, nothing should be touching the flash card at all.

Can you keep the rootfs marked as read-only? I see very little that could be influenced by the hardware here and advise you to test your use case well on any platform, since the Bone isn’t likely causing your issue.

As an aside, the gpio pin is attached to an open collector pulse
power monitor and I get really bad bounce on the beagle io pins. I’ve
never seen bounce on digital pulse interfaces before… yet another
annoyance with the bones…

Regarding GPIO bounce, the GPIOs are reasonably quick on am335x, do you
see bounce when observing them with a scope? I believe the GPIO
subsystem is on a 100 MHz clock that rotates between each bank (check
TRM for sure) so 25 MHz input is possible (although the timing would
have to line up nicely to avoid aliasing). Sorry, I’m not much help
here other than to suggest you put some hardware filtering on there or
debounce in software.

The slew rate on the digital outputs is programmable. Try setting the lower slew rate.

How exactly are you interfacing to this external device? And what
exactly is the device? It sounds like some kind of power meter or
similar?

-Andrew

Kingmax and Kingston cards are all crap. If they fail, that's kind of
expected operation. The low end SanDisk stuff is questionable, too,
especially if you're buying from anywhere that's not a reputable
retailer of SanDisk (ie: cards bought on eBay would be not reputable).

Buy some Samsung Plus series 8 GB uSD cards from a reputable location,
like NewEgg or Amazon (assuming those are available where you live).

I'm also a fan of SanDisk Mobile Ultra 4 GB class 6 uSD cards but the
Samsung should (it doesn't, but it *should*) have better performance
based on the way the controller inside works.

-Andrew

I just want to share my experience with you. Using ext3 on micro-SD flash is a bad idea because of the journaling in the ext3/ext4 filesystems. It will tend to wear out and kill your SD flash even quicker.

The approach I am taking is to use a FAT filesystem as the root partition of the SD card. I then mount the root filesystem from a squashfs archive. Next, I overlay the root filesystem with a tmpfs using the advanced union filesystem (aufs). Now I have an OS that does not write to the flash card but to RAM. This is by far the best approach I can think of to minimize flash errors. I still get flash errors, but I believe they occur at the MMC bus level (signal level), not actually in the memory chip.

The squashfs solution sounds like the best solution. Basically it runs like a LiveCD with nothing to corrupt.

Can you please elaborate on the steps you took to accomplish this?

-Ed

I don't buy memory cards off eBay :) Too many people I know get rubbish that way. We have a local supplier with small margins anyway, so it's not usually a price issue going the local retailer way.

Next time it fails I'll try one of the more expensive cards if you like, though if this turns out to be the answer I'd be a little irritated that the cards that come with the devices when you buy them are rubbish and fail within a month.

(Actually if that were the case our local supplier ought to be worried, as Australia has some pretty good consumer protections which I would think would kick in if suddenly we all decided to put the SD card issue to them... and I do know others affected; it's not just me!)

Cheers,
Jon

How exactly are you interfacing to this external device? And what
exactly is the device? It sounds like some kind of power meter or
similar?

Yep it’s a power meter with an open collector pulse output (90ms width). I have the gpio configured for internal pullup and then I have the gpio pin wired through the pulse output open collector and a nominal resistor (not large… maybe 270ohms? can’t remember exactly) to ground.

Jon
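
As an illustration of the "debounce in software" suggestion for this 90 ms pulse output, the sketch below counts edges but ignores any edge that arrives within a dead time of the last accepted one. The 150 ms dead time is an assumption, chosen to be longer than the 90 ms pulse so that bounce on both ends of a pulse falls inside the window; it only works if genuine pulses are spaced further apart than that. The edge timestamps are made up.

```python
def count_pulses(edge_times_ms, dead_time_ms=150):
    """Count edges, ignoring any that arrive within dead_time_ms of the last accepted one."""
    count = 0
    last_accepted = None
    for t in sorted(edge_times_ms):
        if last_accepted is None or t - last_accepted >= dead_time_ms:
            count += 1
            last_accepted = t
    return count

# Two pulses one second apart, each with bounce when the output
# closes (near 0 ms and 1000 ms) and when it opens again about 90 ms later.
edges = [0, 1, 3, 90, 92, 1000, 1002, 1091]
print(count_pulses(edges))
```

Hardware filtering, as also suggested, would stop the extra interrupts from firing at all; this only keeps them out of the count.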

Well, if you read the SD spec, using an SD card as a root file system
isn't really one of the designed use cases. The main goal is for
recording still images or video in very specific ways. Random access,
especially lots of small writes, isn't something SD was meant to do.
It just so happens that the bus is quite simple to implement for single
board computers and the memory itself can be found for < $1 / GB so it's
become popular.

The cards that come with Beagles are crap. All of the reasonably low
priced Kingston cards are crap (I've not seen much testing of the
expensive ones so I reserve judgment on those). Kingston cards ship
with Beagles, in my opinion, because of the pricing and availability,
not because of the quality.

There's the old adage:
1. Fast
2. Cheap
3. Good
Pick two.

Rasp Pi doesn't come with an SD card (at least not in the USA last I
checked). Many other single board computers, if they do come with an
SD card, also come with Kingston cards. Going to some other board
isn't going to help here. Spend the $10 (USD) on a decent uSD card,
it's worth it regardless of what use or board you have.

If you'd like, send me a self addressed stamped envelope and I'll mail
you a bunch of crap Kingston cards. I have at least 15 in a box under
my desk at work and I have no use for them.

-Andrew

Well, if you read the SD spec, using an SD card as a root file system
isn't really one of the designed use cases. The main goal is for
recording still images or video in very specific ways. Random access,
especially lots of small writes, isn't something SD was meant to do.
It just so happens that the bus is quite simple to implement for single
board computers and the memory itself can be found for < $1 / GB so it's
become popular.

Understood, so I'd imagine these board designers would be doing everything to ensure the software was set up in such a way as to reduce the chance the device would fail in 30 days.

If they aren't, then you can't just say "yes well their product destroys hardware but to be fair that hardware wasn't designed with their product in mind" - the Beagle makers made a design decision to use that hardware; either make it work or make it clear to users that they're going to be feeding it 12 cards a year and that they had better keep good backups.

The cards that come with Beagles are crap.

Again, that may well be true, and buying expensive cards may somehow let you do what the spec you referred to implies they weren't designed to do, but regardless of that, the problem is either the hardware or the software, and both came with the device in the default config, so....

Anyway, I assume these things are failing due to excessive (compared to their intended use as camera storage) writes, in which case what on earth is writing to the cards? I saw others commenting that ext4 is possibly to blame, but still I'd say something is still hitting the disk unintentionally...

I'll ignore the rest of your comments because they were largely and uncharacteristically unhelpful.

Jon
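
On the question of what is writing to the card: a first step could be to measure how much is written at all. The sketch below compares two snapshots in the `/proc/diskstats` format, where the tenth whitespace-separated field is the number of 512-byte sectors written. The device name `mmcblk0` and the sample numbers are assumptions; on a real board the two snapshots would be read some minutes apart.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts in 512-byte sectors

def bytes_written(diskstats_text, device):
    """Return total bytes written to device according to /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, name, 4 read fields, then writes completed,
        # writes merged, sectors written (index 9), ...
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9]) * SECTOR_BYTES
    raise KeyError(device)

# Made-up snapshots for illustration.
BEFORE = " 179       0 mmcblk0 5000 120 400000 9000 300 50 8000 4000 0 7000 13000\n"
AFTER = " 179       0 mmcblk0 5010 120 400800 9020 340 60 8800 4500 0 7300 13520\n"

delta = bytes_written(AFTER, "mmcblk0") - bytes_written(BEFORE, "mmcblk0")
print(f"{delta} bytes written to the card between snapshots")
```

A delta that keeps growing on a supposedly idle system would point at logs, swap, or filesystem housekeeping rather than at the application itself.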