BeagleBone problems

Hi,

I received a new A5 BeagleBone yesterday and I have had nothing but
problems.

Basically, the OS stays up, but all of a sudden the network connectivity
dies (I got booted out of SSH).
I'm running the latest demo image. I tried:

1 - opkg update; opkg upgrade
Via SSH that was a bad idea, so I tried again via USB. That took
forever for some unknown reason (4h+) and ended in a ton of "no space
left on device" errors, even though no filesystem was at 100% according
to 'df -h'.

After reboot, fsck took forever and had to nuke a ton of inodes:
[ 260.903271] EXT4-fs (mmcblk0p2): ext4_orphan_cleanup: deleting unreferenced inode 542727
...

2 - Wiped the SD card and reimaged it back to stock as per the instructions.

So I am not sure the problem is the same as on the A4 version, as
obviously R219 is not there.

Thanks,
Eric

Can you tell us a little bit about your setup? How is the board being powered? Can you send us some printouts from the serial port?

Gerald

Hi Gerald,

Yes, the setup is a bone-stock A5 right out of the box. The serial # is 0812BB00xxxxxxx, which tells me it’s not a rebuild as per the info posted.
I bought the EXACT recommended power adapter from Digi-Key, and I have tried powering it both via USB and via the power adapter.
The latest demo that shipped with the device (2012-12-14) was unusable for more than, say, half an hour (at least via Ethernet).

Basically, it would randomly stop responding to ping requests and boot me out of SSH; the longest stretch without drops was 32 min.

Steps I tried to isolate the problem:

  • Reimaged SD card with fresh 2012-12-14 copy from http://www.angstrom-distribution.org/demo/beaglebone/
    Result: No change
  • Tried patching the OS via opkg update; opkg upgrade. After 3-4h, as it was finishing, I saw a ton of “no space left on device” errors, and basically the OS was borked from that point.
    Result: Nothing useful.
  • Tried just powering via USB and just with the 5V adapter.
    Result: It does not make any difference.
  • After noticing that the ntpdate crontab entry seemed to cause issues because of the large clock skew, I disabled it.
    Result: No change, sadly
  • Just an hour ago or so, I downloaded the 2012-01-27 image.
    Result: So far, it seems a lot more reliable! I had some ping timeouts after running ntpdate manually, so I rebooted (warm). So far 0 dropouts.
  • Wondering if the “reboot” as opposed to a power cycle was somehow why I stopped having dropouts, I just did a full power cycle, and it has been running without a problem so far.

So if it is a software defect, here are some steps to reproduce:

1 - Boot the factory A5 release.
2 - Ping its IP for an extended period of time and note the packet drops.
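To make step 2 less tedious, here is a minimal Python sketch (my own, not something from this thread) that reads saved `ping` output, counts missing sequence numbers as drops, and counts replies slower than 200 ms. It assumes the usual Linux output format (`icmp_seq=` from iputils or `seq=` from BusyBox); the function names are illustrative.

```python
import re
from typing import Iterable, List, Tuple

# Reply lines look like "64 bytes from 192.168.7.2: icmp_seq=12 ttl=64 time=4.12 ms"
# (iputils) or "... seq=12 ttl=64 time=4.123 ms" (BusyBox).
REPLY_RE = re.compile(r"(?:icmp_)?seq=(\d+).*?time=([\d.]+)\s*ms")


def parse_replies(lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Return (sequence number, round-trip time in ms) for each reply line."""
    replies = []
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            replies.append((int(match.group(1)), float(match.group(2))))
    return replies


def summarize(lines: Iterable[str], slow_ms: float = 200.0) -> dict:
    """Count gaps in the sequence numbers (drops) and replies slower than slow_ms."""
    replies = parse_replies(lines)
    if not replies:
        return {"received": 0, "dropped": 0, "slow": 0, "max_ms": None}
    seen = {seq for seq, _ in replies}  # a set, so duplicate (DUP!) replies count once
    expected = max(seen) - min(seen) + 1  # assumes the 16-bit counter did not wrap
    return {
        "received": len(seen),
        "dropped": expected - len(seen),
        "slow": sum(1 for _, rtt in replies if rtt > slow_ms),
        "max_ms": max(rtt for _, rtt in replies),
    }
```

For example, run `ping 192.168.7.2 | tee ping.log` (substituting the board's real address), stop it with Ctrl-C, then call `summarize(open("ping.log"))`. Requests lost after the last reply are not counted as drops, because the script cannot see them; ping's own "packets transmitted / received" summary line covers that case.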

I hope this is enough info Gerald, I will post later tonight if I can get a “productive” evening with it.

Thanks for your help,
Eric

Thanks for the information! I will have the guys set up the same test at the factory using the A5, and see what results we get.

Gerald

Ok, corrections:
2012-02-14 instead of 2012-12-14 (typo)

dmesg shows nothing of interest on 02-14; on 01-27 I see this, but I am not sure how relevant it is:

[ 217.488428] gpio_request: gpio-56 (sysfs) status -16

[ 217.488453] export_store: status -16
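The `status -16` lines are less cryptic than they look: kernel functions report failure as a negative errno value, and on Linux errno 16 is EBUSY ("Device or resource busy"). For a sysfs GPIO export this most likely means gpio-56 had already been requested by something else (a driver or an earlier export). A small Python sketch to decode such values; run it on Linux, since the lookup uses the local errno table and numbering can differ between operating systems.

```python
import errno
import os


def decode_kernel_status(status: int) -> str:
    """Translate a kernel status such as -16 into its errno name and message.

    Kernel functions return negative errno values on failure, so the sign is dropped.
    The lookup uses the errno table of the machine running this script.
    """
    code = abs(status)
    name = errno.errorcode.get(code, "unknown errno")
    return f"{name}: {os.strerror(code)}"


print(decode_kernel_status(-16))
```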

I may have spoken too soon, even though the 2012-01-27 image is not perfect it is much better than the 02-14 one.

I still have dropouts, but they are not as long or as bad as on 02-14.
I notice a certain pattern:

All of a sudden, ping times go from 4 ms to 200+ ms, then some dropouts occur. As this happens randomly, I have very little time to capture any extra info, but I do not see anything crazy running in top or in the process list.
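Since the slow episodes are short, one option is to trigger a snapshot automatically at the moment one starts. A minimal Python sketch of such a trigger, assuming one round-trip sample per ping reply and `None` for a timeout; the class name and the 200 ms threshold (taken from the numbers above) are illustrative choices, not part of any existing tool.

```python
from typing import Optional


class SpikeDetector:
    """Signals the start of each slow episode in a stream of ping round-trip times.

    Hypothetical helper: a sample counts as slow if it timed out (None) or exceeded
    slow_ms. update() returns True only on the transition from normal to slow.
    """

    def __init__(self, slow_ms: float = 200.0) -> None:
        self.slow_ms = slow_ms
        self.in_episode = False

    def update(self, rtt_ms: Optional[float]) -> bool:
        """Feed one sample; return True only when a new slow episode begins."""
        slow = rtt_ms is None or rtt_ms > self.slow_ms
        started = slow and not self.in_episode
        self.in_episode = slow
        return started
```

When `update()` returns True, the caller could, for example, run `subprocess.run(["dmesg"], capture_output=True, text=True)` and save the output. Reporting only the start of each episode avoids taking a snapshot on every slow reply in a burst.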

Thanks again,
Eric

OK. Let us see what we come up with!

Gerald

Many thanks Gerald, sorry for not capturing all this in one reply, it’s an evolving story :smiley:
Eric

No problem!

Gerald

Any news Gerald?

I got the same behaviour from Ubuntu (http://elinux.org/BeagleBoardUbuntu).
I can also understand why it would have passed QA: it is not a constant
problem.

I'm starting to think I may have got a dud. If you can't replicate it, I
will go talk to Digi-Key about an RMA (I am in Canada, so I think that
would be my quickest option).
Please advise as soon as you can.

Thanks
Eric

Hold on that response... I brought the unit to "work" and we're ping
flooding it right now, and it looks like it holds up :S
If this was a case of a bad Ethernet cable, my pride will never
recover :smiley:

Eric

We have had it running for several hours now, and so far it looks good on our end. Keep me posted!

Gerald

OK Gerald, deeply sorry to have wasted your time. On this one I will go hide under a rock (the network engineer who forgot to change the stupid patch cord).

:beer:

Eric

No problem at all! There are a lot of variables in all of this. You never know which one it will be!

Gerald

Please note that with the stock microSD (11-11, at least on BeagleBone A4), if you just do "opkg update; opkg upgrade", it will fail.

It will take a long time (hours), not because of the (now large) number of packages to update, but because opkg runs out of "disk" space in the /tmp directory.

In the demo image, the /tmp filesystem is not stored on the microSD but in RAM (as it is a tmpfs). Thus, "forgotten" files and directories will waste precious RAM, slow the upgrade, and eventually crash the update process with "no disk space" errors.
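One way to confirm where /tmp actually lives on a given image is to look it up in /proc/mounts. A minimal Python sketch, assuming the standard whitespace-separated /proc/mounts layout (device, mount point, type, options, ...). It picks the longest mount point that contains the path, because on some OpenEmbedded-based images /tmp may be a symlink into another tmpfs mount (such as /var/volatile), so the path should be resolved first.

```python
from typing import Optional


def fs_type_for_path(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount containing `path`.

    mounts_text uses the /proc/mounts layout: device, mount point, type, options, ...
    The longest matching mount point wins; for equal ones, the later line wins.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"  # "/" stays "/", "/var" becomes "/var/"
        contains = path == mount_point or path.startswith(prefix)
        if contains and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

On the board it could be called as `fs_type_for_path(os.path.realpath("/tmp"), open("/proc/mounts").read())` (after `import os`); a result of `tmpfs` means everything written to /tmp is held in RAM.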

This is probably due to a bug in the "opkg" installation, which does not erase the packages/directories it leaves in /tmp. Manually stopping the update, rebooting, and restarting the update will work, because /tmp gets cleaned at every reboot. Writing a fresh demo image (from http://www.angstrom-distribution.org/demo/beaglebone/ ) is definitely better.
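To watch how much of the RAM-backed /tmp is left and what is filling it during an upgrade, a short Python sketch like the following could help. It only reports; it does not know which entries belong to opkg, so the names it prints are leads for investigation, not a list of things to delete. The function names are hypothetical.

```python
import os
import shutil
from typing import List, Tuple


def tree_size(path: str) -> int:
    """Total size in bytes of a file or directory tree (symlinks are not followed)."""
    if os.path.islink(path) or not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                pass  # the file was removed while we were scanning
    return total


def largest_entries(directory: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the `top` biggest top-level entries of directory as (bytes, name)."""
    sizes = [(tree_size(os.path.join(directory, n)), n) for n in os.listdir(directory)]
    return sorted(sizes, reverse=True)[:top]


def report(directory: str = "/tmp") -> None:
    """Print free space on the filesystem holding directory, then its largest entries."""
    usage = shutil.disk_usage(directory)
    print(f"{directory}: {usage.free / 2**20:.1f} MiB free of {usage.total / 2**20:.1f} MiB")
    for size, name in largest_entries(directory):
        print(f"{size / 2**20:10.1f} MiB  {name}")
```

Running `report()` every few minutes from a second SSH session during `opkg upgrade` should show whether free space in /tmp keeps shrinking as described above.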

Hi,