Hard Poweroff Tolerance

Hello,

How is the Beagle Board running Angstrom Linux at tolerating hard
poweroffs? Of course of interest are the filesystem, the integrity of
the SD card itself, and the hardware.

I've used Voyage Linux before on x86 embedded systems and it defaults to
booting with a read-only kernel. Even so, I've seen file corruption,
which I haven't been able to correlate to anything except repeated
hard-poweroffs, even though the drive is mounted read-only. I have not
yet been able to determine whether the issue is the hardware itself
(power spikes on the flash controller during hard poweroff, maybe?).

Does anyone have specific experience or data on this? Is a Beagle Board
safe to hard power-off? What about Angstrom? Is there a good way to get
to a read-only mount of the disk? I've tried the basic remount command,
but of course files are open on the filesystem (syslog, etc), and it
fails. Does having ext3 journaling make it really poweroff tolerant for
this type of system (where hard poweroff is the norm rather than the
exception)?

I feel like I have a pretty good idea what the answers are, but the data
corruption I've experienced in the past makes me a bit hesitant about it.

Thanks for your insight,

Alan.

I of course meant "read-only filesystem" not "read-only kernel."

Hi Alan,

We've had similar concerns at my company. I'm not privy to the
construction of the SD card. However, I don't think that we've set up
a read-only filesystem (maybe Martin will chime in here...). I believe it's
the standard/default image.

I happen to be scripting a power-test setup now, using a solid-state
relay controlled by another computer's serial port. I'll be turning the
system off at random intervals (from 1 second to 5 minutes). Then I
confirm communication over Ethernet and twiddle some stuff (if the
system is 'on' long enough).
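
A rough sketch of what such a test loop could look like (this is not
Chris's actual script). The relay and the delay are passed in as plain
callables so the loop can be dry-run without hardware; the function names
and the 2-second off time are assumptions made up for illustration.

```python
import random

MIN_ON_S = 1.0        # shortest on-time from the post: 1 second
MAX_ON_S = 5 * 60.0   # longest on-time from the post: 5 minutes


def make_schedule(cycles, seed=None):
    """Return a list of random on-times (seconds), one per power cycle."""
    rng = random.Random(seed)
    return [rng.uniform(MIN_ON_S, MAX_ON_S) for _ in range(cycles)]


def run_cycles(schedule, set_relay, sleep, off_time_s=2.0):
    """Drive the relay through the schedule.

    set_relay(bool) and sleep(seconds) are injected by the caller;
    off_time_s is an assumed settle time between cycles.
    """
    for on_time in schedule:
        set_relay(True)
        sleep(on_time)       # board runs (it may or may not finish booting)
        set_relay(False)     # hard poweroff
        sleep(off_time_s)


if __name__ == "__main__":
    # Dry run: print instead of toggling a relay, and do not really sleep.
    run_cycles(make_schedule(3, seed=1),
               set_relay=lambda on: print("relay", "ON" if on else "OFF"),
               sleep=lambda s: print(f"  wait {s:.1f} s"))
```

On a real bench, set_relay would toggle whatever line drives the relay
(for example a serial-port control line) and sleep would be time.sleep.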

I'll let you know of any failures.

The benchtop setup should be running by later today. I'll leave it
running for weeks (or death).

In the course of things, I've already power-cycled the system dozens
and dozens of times. No issues.

HW setup:
BeagleBoard-xM. No Video. Communications: Ethernet, I2C Bus #2
(Expansion Port) to other gadgets. No intentional file IO (e.g. no
dynamically saved data).

-Chris

Alan said:
"Does anyone have specific experience or data on this? Is a Beagle Board
safe to hard power-off?"

> How is the Beagle Board running Angstrom Linux at tolerating hard
> poweroffs? Of course of interest are the filesystem, the integrity of
> the SD card itself, and the hardware.

Actually, it is not a Beagleboard problem, but a combination of filesystem type and flush/sync frequency.

A multitasking operating system is almost always reading or writing files somewhere. Try the lsof command to list open files and sockets: when the fourth column (“FD”) contains a number followed by “w” or “u”, that entry is open for writing (“w”) or for both reading and writing (“u”). The number of them can be astonishing (I see more than 1000 w/u entries on my old Ubuntu desktop system).
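
To put a number on that, the lsof output can be counted with a few lines of Python. This is only a sketch: it assumes the default column layout (COMMAND PID USER FD ...), it will miscount rows whose command name contains spaces, and the SAMPLE text below is invented for illustration rather than captured from a real board.

```python
import re

# FD values such as "3w" or "12u": a descriptor number followed by the
# access mode (w = write, u = read and write). A lock character may follow.
_WRITABLE_FD = re.compile(r"^\d+[wu]")


def count_writable(lsof_text):
    """Count lsof rows whose FD column shows write or read/write access."""
    count = 0
    for line in lsof_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        # Assumes the default layout: COMMAND PID USER FD ...
        if len(fields) >= 4 and _WRITABLE_FD.match(fields[3]):
            count += 1
    return count


# Invented sample output, for illustration only.
SAMPLE = """\
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
syslogd  812 root    3w   REG  179,2     4096  123 /var/log/messages
syslogd  812 root  cwd    DIR  179,2     4096    2 /
sshd     901 root    4u  IPv4   1234      0t0  TCP *:22 (LISTEN)
bash    1002 alan    0r   CHR  136,0      0t0    3 /dev/pts/0
"""

print(count_writable(SAMPLE))   # prints 2 (the "3w" and "4u" rows)
```

On a live system the text would come from running lsof itself, for example via subprocess.run(["lsof"], capture_output=True, text=True).stdout.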

Every time you cut the power, the system has no time to flush its “write” buffers. That is, a “write” operation could be aborted in the middle of updating the file allocation table (no MSDOS pun intended), thus leaving discrepancies between the actual data on disk and the actual directory hierarchy data (larger discrepancies if the “disk” has large “blocks”, like memory cards or flash memory). “Repairing” it means checking the allocation structures and trying to guess what is missing (possibly losing files). Most Linux systems just check and repair at every reboot, requiring user input only for the most dramatic cases.

You may want to go back to the stone age and mount the filesystem with the “sync” option (that is, return control to the program only after completing/syncing a write operation), but this will make the system slow and unusable, and will overstress the disk with too many “write” operations. Remember that memory card life depends on “write/erase” cycles; with different memory cards your mileage will vary. Delaying write operations means that the operating system can sort/optimize/reduce them (for example, if ten files in one directory are modified, the directory can be written once instead of ten times) and complete them when there is less CPU stress.
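
A middle ground between “sync everything” and “sync nothing” is to force only the few writes that matter out to the card. The sketch below is a different technique from the “sync” mount option: it writes to a temporary file, calls fsync, and renames the result over the target. The function name durable_write is made up for this example, and how well the guarantee holds still depends on the filesystem and on the card's own firmware.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` (bytes) and push it to the storage device.

    The data goes to a temporary file in the same directory, is fsynced,
    and is then renamed over the target, so that an interrupted update
    should leave either the old file or the new one, not a mix.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())     # force file contents out of the cache
        os.replace(tmp_path, path)     # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also sync the directory so the rename itself reaches the disk (Linux).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Everything else (logs, caches, temporary files) can keep using ordinary delayed writes, which limits both the slowdown and the extra wear.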

A possible solution could be building and tuning a complete Linux system with read-only root (a hard poweroff won’t damage the filesystem), variable data in tmpfs (a hard poweroff won’t leave broken directories), and read/write storage on a possibly “sync”-mounted partition (that is: only files that need to be frequently written/deleted go there, to minimize damage). This is quite an expert task.
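
Once such a layout is in place, it is easy to check at boot that the mounts really are what you intended. The sketch below parses text in the /proc/mounts format; the /data mount point and the SAMPLE lines are assumptions invented for this example, not taken from any actual Angstrom image.

```python
def mount_flags(mounts_text):
    """Map each mount point to its set of options, from /proc/mounts text.

    If a mount point appears more than once, the last entry wins.
    """
    flags = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # Fields: device, mount point, fs type, comma-separated options
            flags[fields[1]] = set(fields[3].split(","))
    return flags


def check_layout(mounts_text, data_mount="/data"):
    """Return a list of problems; an empty list means the layout matches."""
    flags = mount_flags(mounts_text)
    problems = []
    if "ro" not in flags.get("/", set()):
        problems.append("root filesystem is not mounted read-only")
    if "sync" not in flags.get(data_mount, set()):
        problems.append(f"{data_mount} is missing or not mounted with sync")
    return problems


# Invented sample in /proc/mounts format, for illustration only.
SAMPLE = """\
/dev/mmcblk0p2 / ext3 ro,relatime 0 0
tmpfs /var/volatile tmpfs rw,relatime 0 0
/dev/mmcblk0p3 /data ext3 rw,sync,relatime 0 0
"""

print(check_layout(SAMPLE))   # prints []
```

On the board itself, the input would be the contents of /proc/mounts read with open("/proc/mounts").read().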

A simpler solution, if the filesystem is small enough, is using a large RAM disk at boot (with the read/write user data storage on an SD card). The BeagleBoard-xM has plenty of RAM. If you don’t need to use Firefox 4 with lots of open tabs, you may consider it.

That said, you shouldn’t call it an issue. Since December my xM has been happily running in the car of a coworker, off its original microSDHC on which I installed Ubuntu 10.10. It has an internet connection (UMTS USB key) and a GPS to tweet its position from time to time. This means it has a “hard poweroff” every time the motor stops (thus, at least 20-30 times per week). It does many disk writes because it starts a full Ubuntu Unity desktop and auto-updates Ubuntu packages daily. This means that the microSDHC is quite stressed with “writes”. To date, it has never had problems. Some weeks ago I bought and prepared another 4 GB microSDHC, but it appears that the microSDHC which came with the xM is far from dead, and none of those “hard poweroffs” ever required anything more than the boot-time automatic check/repair.

Alan Ott wrote:

> Hello,
>
> How is the Beagle Board running Angstrom Linux at tolerating hard
> poweroffs? Of course of interest are the filesystem, the integrity of
> the SD card itself, and the hardware.

If you are concerned with hard power off, then make it a soft power
off instead. Add a small battery that can run the BB for a minute or
so, and sync/unmount your file systems when you lose power...

Alfonso Martone <alfonso.martone@gmail.com> [2011-03-14 21:40:22]:

> That said, you shouldn't call it an issue. Since December my xM has been
> happily running in the car of a coworker, off its original microSDHC on
> which I installed Ubuntu 10.10. It has an internet connection (UMTS USB
> key) and a GPS to tweet its position from time to time. This means it
> has a "hard poweroff" every time the motor stops (thus, at least 20-30
> times per week). It does many disk writes because it starts a full
> Ubuntu Unity desktop and auto-updates Ubuntu packages daily. This means
> that the microSDHC is quite stressed with "writes". To date, it has never
> had problems. Some weeks ago I bought and prepared another 4 GB
> microSDHC, but it appears that the microSDHC which came with the xM is
> far from dead, and none of those "hard poweroffs" ever required
> anything more than the boot-time automatic check/repair.

You've had good luck, then. I've seen some completely dead SD cards after such
sudden power-offs. I would suggest either making the whole SD card read-only or
not relying on it at all. If you need an ro/rw mix on the same SD card,
use some battery-assisted shutdown. The better option is not to rely
on some random firmware in the SD cards, and to use NAND flash directly instead
(not an option on the xM).

-- ynezz

Alan Ott <alan@signal11.us> [2011-03-14 12:24:34]:

> Does anyone have specific experience or data on this? Is a Beagle Board
> safe to hard power-off? What about Angstrom? Is there a good way to get
> to a read-only mount of the disk? I've tried the basic remount command,
> but of course files are open on the filesystem (syslog, etc), and it
> fails. Does having ext3 journaling make it really poweroff tolerant for
> this type of system (where hard poweroff is the norm rather than the
> exception)?

It's possible to run Angstrom from a read-only filesystem, but you need to
tweak it a little bit. I think that it's quite hard to make a running /
read-only.

> I feel like I have a pretty good idea what the answers are, but the data
> corruption I've experienced in the past makes me a bit hesitant about it.

It's definitely possible, but it needs some additional work (and knowledge).

-- ynezz

No complaints... just observations:
http://dl.dropbox.com/u/1861361/error_log2.png

No failures or detected errors yet. Roughly 1000 power cycles, and
still going.

It is a bit painful to watch.

Nothing too scientific here... Extended boot time, by a few seconds,
every ~90 minutes (automated fsck/repair?).

-Chris

Hi Chris,

Thanks for the info. Please keep us posted if you find anything in your
continued testing. How long do you plan to let it run?

Alan.

" How long do you plan to let it run?"

At least over this weekend; and then until I get around to going to
the back of the lab to turn it off.
Let's say 600 cycles/24 hours. How about 3,000 cycles / 5 days?

At 1.5 power-cycles/day that would be about 2,000 days, or roughly
5.5 years' worth of power switching.
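
For anyone who wants to plug in their own numbers, the arithmetic behind
that estimate is just the following. The variable names are made up for
this sketch; the figures are the ones quoted in the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60

test_cycles_per_day = 600      # test rate quoted in the post
test_days = 5
field_cycles_per_day = 1.5     # assumed real-world usage from the post

total_cycles = test_cycles_per_day * test_days         # 3000 cycles
avg_cycle_s = SECONDS_PER_DAY / test_cycles_per_day    # 144.0 s per cycle
field_days = total_cycles / field_cycles_per_day       # 2000.0 days
field_years = field_days / 365.25                      # about 5.5 years

print(total_cycles, avg_cycle_s, field_days, round(field_years, 1))
# prints: 3000 144.0 2000.0 5.5
```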

It should be noted that I'm switching the entire subsystem (includes
AC supply, xM, our custom xM I2C expansion bus with I2C Muxi, DACs,
A2D, and DIO, and ~10 I2C-coupled external gadget-cards).

Any requests?

-Chris

Nothing specific. That seems about the kind of abuse that my system will
get. Thanks for sharing!

Alan.

It would be interesting to have varying degrees of
cache fill and system activity.

Regards

cww

Alan Ott wrote: