Boot failure "external abort on non-linefetch" in cpsw_probe with any image after Wi-Fi install

Bought a BBB [0 047132904547 A6] last month, and had about two hours of delight, followed by literal days of fruitless struggle Googling for clues. I’m about out of ideas.

Contacting Beagleboard produced:
Did you send the same info to beagleboard@googlegroups.com?

So here it is… Maybe it will help someone…

There are lots of lines in my boot log that look like errors, such as:

Your kernel is out of date, please upgrade to "v3.8.13-bone35" first..

Contact "whoever" you got the image from for directions..

Regards,

I guess ^^^ that's your problem. I can't see why the WiFi will interfere
with the CPSW. After you update the image as suggested by Robert
check if you get a similar PHY timeout message. If yes, you want
to look at https://groups.google.com/forum/#!topic/beagleboard/9mctrG26Mc8

The CPSW driver should handle the error gracefully but then that's another
problem.

@RobertCNelson: “Your kernel is out of date, please upgrade to “v3.8.13-bone35” first…”

That kernel version is from attempts to boot Ubuntu. Maybe I could find a newer Ubuntu image, but I have the same kernel panic problem trying to boot Angstrom. It only says:

Booting kernel from Legacy Image at 80007fc0 …

Image Name: Angstrom/3.8.13/beaglebone
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 4270776 Bytes = 4.1 MiB
How do I tell if that is out of date? It is specifically the image the RMA people told me to use - BBB-eMMC-flasher-2013.09.04.img.xz

There seem to be a maddening variety of images available, but I haven’t found any newer production versions.
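
One rough way to answer "is that out of date?" for kernels in the v3.8.13-boneNN series is to compare the release string the kernel reports (the value `uname -r` prints, which also appears in an oops header) against the required one. The Python sketch below assumes the -boneNN suffix simply counts upward with each newer build. The function names are made up for illustration. An image name such as "Angstrom/3.8.13/beaglebone" carries no such suffix, so it cannot be checked this way.

```python
import re

# Matches release strings such as "3.8.13-bone32" or "v3.8.13-bone35".
RELEASE_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)-bone(\d+)")


def bone_revision(release):
    """Return ((major, minor, patch), bone_number) for a -boneNN release string."""
    m = RELEASE_RE.fullmatch(release.strip())
    if m is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    major, minor, patch, bone = (int(g) for g in m.groups())
    return (major, minor, patch), bone


def is_older(installed, required):
    """True if 'installed' sorts before 'required' (assumes boneNN increases over time)."""
    return bone_revision(installed) < bone_revision(required)


if __name__ == "__main__":
    # Values taken from the messages in this thread.
    print(is_older("3.8.13-bone32", "v3.8.13-bone35"))
```

A string that does not follow the pattern raises `ValueError` rather than guessing.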

@Vaibhav:

PHY reset timed out
I guess ^^^ that’s your problem. I can’t see why the WiFi will interfere with the CPSW.

I’ve seen lots of boot logs show those lines and still boot successfully. Your links and others I’ve found are about the ethernet dying after an hour or more, or failing to start maybe 1/50 times while the rest of the board boots successfully.

My Ubuntu boots do show a later version of phy not found - just before the fatal error:

@Vaibhav,

Just found this:
http://web.archiveorange.com/archive/v/vEQ2y6LmBVoCWJwTbcnr

about a problem with clocks not being enabled before mdio: probe - which seems directly applicable to my Ubuntu fail!

“Make the driver control the device clocks. Appearantly, the Davinci
platform probes this driver with the clock all powered up, but on OMAP,
this isn’t the case.”

“Certainly, with respect to CPSW & MDIO, this patch is not enough and
requires further investigation. I have started looking at this and
hopefully will have some solution soon…”

But that was over a year ago - was it resolved?

http://elinux.org/BeagleBoardUbuntu

Kept updated once a month, and kernels can easily be upgraded as they
are released...

Regards,

Robert,

Robert,

Sorry if I seem dense or argumentative, but I can't find any prebuilt images
using the 3.8.13-bone35 kernel. The newest images say bone32. I chose the
Ubuntu 12.04 image for the LTS, which is important to my eventual project.

LOL! "LTS" doesn't mean crap on arm, good luck getting anything fixed
that does not affect x86. Canonical does not have enough arm
developers to support it.. They only support the current release...

But at this point I'd try anything that might actually boot.

I don't suppose swapping your bone35 kernel into an image I have is
simple... I found

Example install saucy: http://elinux.org/BeagleBoardUbuntu#Saucy_13.10

then run:
wget http://rcn-ee.net/deb/saucy-armhf/v3.8.13-bone35/install-me.sh
sudo /bin/bash install-me.sh

sudo reboot

Regards,

Robert,

Robert,

Sorry! Me again…

I have your Saucy image. Verified. But it appears it requires your setup_sdcard script to be run, from a Linux machine. I’ve been using Win32DiskImager… So I’ve copied it to the netbook, and I see the script, but when I plug in my uSD card it says my system can’t mount ext4. I suppose that means Intrepid can’t directly create the uSD image either. Is there some option to write your output back to a local file and use Win32DiskImager to write it from Windows?

Scroll down and you'll find both a flasher and a microSD version for you non-Linux users.

As per the commit logs of the mainline kernel a patch adding pm_runtime_*
calls in the driver went in around 3.7-rc3. I believe the images that you
have which are based on v3.8 will have them in place.

I still can't think of a reason why it would have worked earlier. Could be a race
condition or could even be a hardware thing. Try changing the kernel build
to narrow it down. And yes, the error that you see would typically come up
when the init fails (typically clock).

Vaibhav,

Robert,

Tried the prebuilt Ubuntu 13.10 Flasher image:
BBB-eMMC-flasher-ubuntu-13.10-2013-12-17-2gb.img.xz

Pretty much the same result as the Ubuntu uSD, and all the other images I’ve tried. Lots of apparently nonfatal errors:

U-Boot SPL 2013.10-00015-gab7a95a (Nov 08 2013 - 16:01:27)
reading args
spl: error reading image args, err - -1

WARNING: Caches not enabled
NAND: 0 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
*** Warning - readenv() failed, using default environment

Net: not set. Validating first E-fuse MAC
Could not get PHY for cpsw: addr 0
cpsw, usb_ether

Different boot file sizes:

reading uEnv.txt
1313 bytes read in 3 ms (426.8 KiB/s)
Importing environment from mmc … ← still this line that I fear is copying my problem
gpio: pin 55 (gpio 55) value is 1
Checking if uenvcmd is set …
gpio: pin 56 (gpio 56) value is 1
Running uenvcmd …
reading zImage
3334336 bytes read in 313 ms (10.2 MiB/s)
reading initrd.img
2996231 bytes read in 282 ms (10.1 MiB/s)
reading /dtbs/am335x-boneblack.dtb
24884 bytes read in 8 ms (3 MiB/s)
Kernel image @ 0x80200000 [ 0x000000 - 0x32e0c0 ]

Flattened Device Tree blob at 815f0000

Booting using the fdt blob at 0x815f0000
Using Device Tree in place at 815f0000, end 815f9133

Still the same MAC addresses:

[ 0.117178] cpsw.0: No hwaddr in dt. Using 90:59:af:4d:71:eb from efuse
[ 0.117199] cpsw.1: No hwaddr in dt. Using 90:59:af:4d:71:ed from efuse

Still the same kernel panic:

[ 2.683230] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[ 2.689662] davinci_mdio 4a101000.mdio: no live phy, scanning all
[ 2.696404] davinci_mdio: probe of 4a101000.mdio failed with error -5
[ 2.703442] Detected MACID = 90:59:af:4d:71:eb
[ 2.707998] Unhandled fault: external abort on non-linefetch (0x1008) at 0xe089e000
[ 2.716213] Internal error: : 1008 [#1] SMP ARM
[ 2.720957] Modules linked in:
[ 2.724162] CPU: 0 Not tainted (3.8.13-bone32 #1)
[ 2.729465] PC is at cpsw_probe+0x528/0xbc8
[ 2.733849] LR is at ioremap_page_range+0xd8/0x16c

[ 2.923102] [] (cpsw_probe+0x528/0xbc8) from [] (driver_probe_device+0xa4/0x1e4)
[ 2.932676] [] (driver_probe_device+0xa4/0x1e4) from [] (__driver_attach+0x68/0x8c)
[ 2.942529] [] (__driver_attach+0x68/0x8c) from [] (bus_for_each_dev+0x70/0x84)
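
When comparing oops output across many images, it can help to reduce each log to the few fields that matter: the kernel release, the fault text and address, and the function the PC was in. The Python sketch below assumes the line formats shown in the log above. The function name `summarize_oops` is hypothetical, and the patterns only cover an untainted kernel.

```python
import re

# Patterns for the three oops lines of interest, as they appear in the log above.
PATTERNS = {
    "release": re.compile(r"Not tainted\s+\((\S+)"),
    "fault": re.compile(r"Unhandled fault: (.+) at (0x[0-9a-fA-F]+)"),
    "function": re.compile(r"PC is at (\w+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"),
}


def summarize_oops(log_text):
    """Return a dict with the release, fault, address and faulting function found in the log."""
    summary = {}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m is None:
                continue
            summary[name] = m.group(1)
            if name == "fault":
                # The second group of the fault pattern is the faulting address.
                summary["address"] = m.group(2)
    return summary


if __name__ == "__main__":
    sample = """\
[ 2.707998] Unhandled fault: external abort on non-linefetch (0x1008) at 0xe089e000
[ 2.724162] CPU: 0 Not tainted (3.8.13-bone32 #1)
[ 2.729465] PC is at cpsw_probe+0x528/0xbc8
"""
    print(summarize_oops(sample))
```

Fields that are not present in a log are simply left out of the result, so a partial capture still produces a usable summary.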

I don’t think any image is going to boot this board. I am concerned that every one of them displays the line:

Importing environment from mmc …

If that means what it says, it seems like it is copying whatever configuration problem is killing my cpsw_probe to each new attempt at booting or flashing.

The only environment lines that jump out at me are:

ethact=cpsw
ethaddr=90:59:af:4d:71:eb

usbnet_devaddr=90:59:af:4d:71:eb

The original MAC addresses from ifconfig, before the boot failures:
eth0 Link encap:Ethernet HWaddr 90:59:AF:4D:71:EB
ra0 Link encap:Ethernet HWaddr 00:0C:43:00:7D:7F
usb0 Link encap:Ethernet HWaddr 6E:5A:F6:F0:F3:45

That “importing” line is echoed directly from the env:

fdt_high=0xffffffff
fdtaddr=0x80F80000
fdtfile=am335x-boneblack.dtb
findfdt=if test $board_name = A33515BB; then setenv fdtfile am335x-evm.dtb; fi; if test $board_name = A335X_SK; then setenv fdtfile am335x-evmsk.dtb; fi;if test $board_name = A335BONE; then setenv fdtfile am335x-bone.dtb; fi; if test $board_name = A335BNLT; then setenv fdtfile am335x-boneblack.dtb; fi
importbootenv=echo Importing environment from mmc …; env import -t $loadaddr $filesize
kloadaddr=0x80007fc0
loadaddr=0x80200000
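
Going by the env dump above, the "Importing environment from mmc" message is just an `echo` inside `importbootenv`, followed by `env import -t $loadaddr $filesize`, which reads name=value text from the memory where uEnv.txt was just loaded. The Python sketch below is a simplified model of that text import, not U-Boot's actual implementation. The function name is hypothetical, and skipping blank lines and `#` lines is an assumption made for the sketch.

```python
def import_text_env(text, env=None):
    """Merge name=value lines from 'text' into a copy of 'env' and return it."""
    merged = dict(env or {})
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption for this sketch: ignore blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, since values may themselves contain '='.
        name, sep, value = line.partition("=")
        if not sep:
            continue
        merged[name] = value
    return merged


if __name__ == "__main__":
    defaults = {"ethact": "cpsw", "fdtfile": "am335x-bone.dtb"}
    uenv_txt = "fdtfile=am335x-boneblack.dtb\nloadaddr=0x80200000\n"
    print(import_text_env(uenv_txt, defaults))
```

In this model only the variables named in the imported text change. Everything else keeps its default value, which fits the idea that the import reads the file on the card currently being booted rather than carrying state over from earlier attempts.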

Help! I need some better suggestions than to try yet another image.

Loren

Send the board in under an RMA and get it looked at.

Gerald

RMA initiated. Hope they reveal what went wrong with my board!

Found one clue, my fears about importbootenv are probably unfounded:
http://lists.busybox.net/pipermail/buildroot/2013-September/078135.html

We will check out the HW and, if it is bad, we will repair it.

Gerald

We will check out the HW and, if it is bad, we will repair it.

Hi Gerald,

In case it's not a logistics nightmare (which it could very well be), could
you let the list know if it's really a HW thing? Based on the multiple
things that Loren has tried out, my hunch is it might be. However, there's
always the possibility of this being due to some subtle s/w bug that
someone would need to chase down and make things more reliable.

Depends on if we get the board to look at.

Gerald

Hi,

I’m experiencing a similar behavior in a board based on the BeagleBone (the BB itself hasn’t been modified):

The boot log:

[ 2.155945] mmc0: new high speed SDHC card at address aaaa
[ 2.162581] mmcblk0: mmc0:aaaa SU08G 7.40 GiB
[ 2.169448] mmcblk0: p1 p2
[ 2.172606] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[ 2.179049] davinci_mdio 4a101000.mdio: no live phy, scanning all
[ 2.187284] davinci_mdio: probe of 4a101000.mdio failed with error -5
[ 2.194505] Detected MACID = bc:6a:29:84:8d:3a
[ 2.199173] Unhandled fault: external abort on non-linefetch (0x1008) at 0xd0898000
[ 2.207399] Internal error: : 1008 [#1] SMP ARM
[ 2.212149] Modules linked in:
[ 2.215364] CPU: 0 Not tainted (3.8.13-bone35 #1)
[ 2.220681] PC is at cpsw_probe+0x530/0xbcc
[ 2.225077] LR is at ioremap_page_range+0xd8/0x16c
[ 2.230101] pc : [] lr : [] psr: a0000113
[ 2.230101] sp : cf05fe38 ip : cf04d260 fp : cf43aa98
[ 2.242123] r10: 00000001 r9 : cf43ad40 r8 : d0898000
[ 2.247598] r7 : cf0d4800 r6 : 00000000 r5 : cf0d4810 r4 : cf43a800
[ 2.254436] r3 : 00000000 r2 : 00000000 r1 : 4a100e13 r0 : d0898000
[ 2.261279] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 2.268937] Control: 10c5387d Table: 80004019 DAC: 00000015
[ 2.274959] Process swapper/0 (pid: 1, stack limit = 0xcf05e240)
[ 2.281254] Stack: (0xcf05fe38 to 0xcf060000)
[ 2.285822] fe20: 00000000 00000000
[ 2.294398] fe40: cf439e08 cf43ad40 00000000 c014cff0 22222222 00000020 00000000 cf439e88
[ 2.302974] fe60: cf439e08 cf439e08 00000008 c014cee0 00000000 cf439e08 cf0d1488 cf4329c0
[ 2.311548] fe80: 00000000 c014d8a0 cf0474b8 c005e690 00000000 00000003 cf0d1488 00000000
[ 2.320123] fea0: c0a2342c cf0d4810 cf0d4818 cf0d4810 cf0d4844 c0a2342c c0998794 c09b6000
[ 2.328698] fec0: 00000000 cf05e008 00000000 c0381024 00000000 cf0d4810 cf0d4844 c0998794
[ 2.337272] fee0: 00000000 c0381210 00000000 c0998794 c03811a8 c037f860 cf047478 cf0d0c80
[ 2.345848] ff00: c0998794 cf4329c0 c098e038 c03807e8 c07f9eea c07f9eea 00000000 c0998794
[ 2.354421] ff20: c091ac78 c092d984 c090df10 c038175c 00000007 c091ac78 c092d984 c090df10
[ 2.362995] ff40: c09b6000 c0008894 c090df10 0000f434 c092d9b0 00000008 00000007 c091ac78
[ 2.371568] ff60: c092d984 c09b6000 c09b6000 000000f3 c091ac80 c08e8918 00000007 00000007
[ 2.380140] ff80: c08e8270 00000000 00000000 c0605bbc 00000000 00000000 00000000 00000000
[ 2.388711] ffa0: 00000000 c0605bc4 00000000 c000d478 00000000 00000000 00000000 00000000
[ 2.397282] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2.405854] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 e7feefe6 e6bafaae
[ 2.414456] [] (cpsw_probe+0x530/0xbcc) from [] (driver_probe_device+0xa4/0x1e4)
[ 2.424036] [] (driver_probe_device+0xa4/0x1e4) from [] (__driver_attach+0x68/0x8c)
[ 2.433888] [] (__driver_attach+0x68/0x8c) from [] (bus_for_each_dev+0x70/0x84)
[ 2.443374] [] (bus_for_each_dev+0x70/0x84) from [] (bus_add_driver+0xdc/0x218)
[ 2.452861] [] (bus_add_driver+0xdc/0x218) from [] (driver_register+0x9c/0x124)
[ 2.462350] [] (driver_register+0x9c/0x124) from [] (do_one_initcall+0x8c/0x150)
[ 2.471936] [] (do_one_initcall+0x8c/0x150) from [] (kernel_init_freeable+0x108/0x1cc)
[ 2.482071] [] (kernel_init_freeable+0x108/0x1cc) from [] (kernel_init+0x8/0xe4)
[ 2.491657] [] (kernel_init+0x8/0xe4) from [] (ret_from_fork+0x14/0x3c)
[ 2.500394] Code: e59f164c ebfe1b48 ea0000d1 e58485c0 (e5982000)
[ 2.506770] ---[ end trace 9974d47096abe9bf ]---

Some more information: I decided to try the image Angstrom-Cloud9-IDE-eglibc-ipk-v2012.02-core-beaglebone-2012.02.14.img.xz, available at http://www.angstrom-distribution.org/demo/beaglebone/.

Surprisingly, this is what I’m getting:

U-Boot 2011.09-00010-g81c8c79 (Feb 13 2012 - 14:48:03)

I2C: ready
DRAM: 256 MiB
No daughter card present
NAND: HW ECC Hamming Code selected
16 MiB
MMC: OMAP SD/MMC: 0
*** Warning - readenv() failed, using default environment

Net: cpsw
Hit any key to stop autoboot: 0
SD/MMC found on device 0
reading uEnv.txt

33 bytes read
Loaded environment from uEnv.txt
Importing environment from mmc …
reading uImage

3137440 bytes read

Booting kernel from Legacy Image at 80007fc0 …

Image Name: Angstrom/3.2/beaglebone
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 3137376 Bytes = 3 MiB
Load Address: 80008000
Entry Point: 80008000
Verifying Checksum … OK
XIP Kernel Image … OK
OK

Starting kernel …

Uncompressing Linux… done, booting the kernel.
[ 0.068294] _omap_mux_get_by_name: Could not find signal uart1_cts.uart1_cts
[ 0.068315] omap_hwmod_mux_init: Could not allocate device mux entry
[ 0.068465] _omap_mux_get_by_name: Could not find signal uart2_cts.uart2_cts
[ 0.068483] omap_hwmod_mux_init: Could not allocate device mux entry
[ 0.068644] _omap_mux_get_by_name: Could not find signal uart3_cts_rctx.uart3_cts_rctx
[ 0.068728] omap_hwmod_mux_init: Could not allocate device mux entry
[ 0.106314] cpuidle-am33xx cpuidle-am33xx.0: failed to register driver
[ 0.261139] _omap_mux_get_by_name: Could not find signal leds-gpio
[ 0.651272] omap2_set_init_voltage: unable to get clk dpll1_ck
[ 0.657458] omap2_set_init_voltage: unable to set vdd_mpu_iva
[ 0.663475] omap2_set_init_voltage: unable to get clk l3_ick
[ 0.669409] omap2_set_init_voltage: unable to set vdd_core
[ 0.927024] hub 1-0:1.0: over-current condition on port 1
[ 1.186994] hub 1-0:1.0: over-current condition on port 1
[ 1.447002] hub 1-0:1.0: over-current condition on port 1
[ 1.706975] hub 1-0:1.0: over-current condition on port 1
[ 1.966988] hub 1-0:1.0: over-current condition on port 1
[ 2.226976] hub 1-0:1.0: over-current condition on port 1
[ 2.486983] hub 1-0:1.0: over-current condition on port 1
[ 2.746982] hub 1-0:1.0: over-current condition on port 1
[ 3.006980] hub 1-0:1.0: over-current condition on port 1
[ 3.266980] hub 1-0:1.0: over-current condition on port 1
[ 3.526979] hub 1-0:1.0: over-current condition on port 1
[ 3.786975] hub 1-0:1.0: over-current condition on port 1
[ 4.046994] hub 1-0:1.0: over-current condition on port 1
[ 4.306975] hub 1-0:1.0: over-current condition on port 1
[ 4.566981] hub 1-0:1.0: over-current condition on port 1
[ 4.827045] hub 1-0:1.0: over-current condition on port 1
[ 5.086988] hub 1-0:1.0: over-current condition on port 1
[ 5.346982] hub 1-0:1.0: over-current condition on port 1
[ 5.606985] hub 1-0:1.0: over-current condition on port 1
[ 5.866990] hub 1-0:1.0: over-current condition on port 1
[ 6.126976] hub 1-0:1.0: over-current condition on port 1
[ 6.386980] hub 1-0:1.0: over-current condition on port 1
[ 6.646976] hub 1-0:1.0: over-current condition on port 1
[ 6.906974] hub 1-0:1.0: over-current condition on port 1
[ 7.166987] hub 1-0:1.0: over-current condition on port 1
[ 7.427043] hub 1-0:1.0: over-current condition on port 1
[ 7.687078] hub 1-0:1.0: over-current condition on port 1
[ 7.947139] hub 1-0:1.0: over-current condition on port 1
[ 8.207116] hub 1-0:1.0: over-current condition on port 1
[ 8.467097] hub 1-0:1.0: over-current condition on port 1
[ 8.727113] hub 1-0:1.0: over-current condition on port 1
[ 8.987448] hub 1-0:1.0: over-current condition on port 1
systemd-fsck[56]: Angstrom-Cloud9-: clean, 28959/874496 files, 748256/3494137 blocks
[ 9.247089] hub 1-0:1.0: over-current condition on port 1
[ 9.507206] hub 1-0:1.0: over-current condition on port 1
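
A flood of identical kernel messages like the one above is easier to reason about when reduced to a count and a repeat interval. The Python sketch below assumes each line starts with a bracketed timestamp in seconds, as in this log. The function names are made up for illustration.

```python
import re
from collections import defaultdict

# "[ 0.927024] message text" -> timestamp in seconds, then the message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_repeats(log_text):
    """Map each distinct message to (count, first_timestamp, last_timestamp)."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            times[m.group(2)].append(float(m.group(1)))
    return {msg: (len(ts), ts[0], ts[-1]) for msg, ts in times.items()}


def mean_interval(count, first, last):
    """Average spacing in seconds between repeats, or None for a single occurrence."""
    return (last - first) / (count - 1) if count > 1 else None


if __name__ == "__main__":
    sample = """\
[ 0.927024] hub 1-0:1.0: over-current condition on port 1
[ 1.186994] hub 1-0:1.0: over-current condition on port 1
[ 1.447002] hub 1-0:1.0: over-current condition on port 1
"""
    for msg, (count, first, last) in summarize_repeats(sample).items():
        print(count, mean_interval(count, first, last), msg)
```

Lines without a leading timestamp, such as the systemd-fsck line in the log, are ignored by this sketch.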

From what I’ve read online about the over-current message, it means either:

  1. too much current is being consumed by whatever is plugged into that port or

  2. the power FET (U13?) for the USB port has been damaged causing it to indicate an over current condition

On my design (BB-based), P1 (I’m assuming it refers to the P10 Ethernet connector) is not soldered, and I doubt the problem is the power FET.

Maybe I should try compiling the kernel without davinci_mdio?