Beaglebone Black Ethernet Phy Not Detected on Boot.

krd · July 22, 2014, 1:34pm

I updated my kernel to 3.15.6-bone5, just as you suggested, but it seems this does not fix the issue completely. What I’m observing is a lack of Ethernet connection after some boots. These boots are characterised by an initial hang: I can see ‘C’ being printed on the console until a watchdog located on the cape connected to my BBB reboots the BBB. It happens on its own, without any unusual operation running on the system; for example, it happened twice last night. I have to unplug and replug the power cable to fix the issue; a software reboot doesn’t help. Power is delivered to the BBB via our cape. Another way to reproduce it is to pull out the SD card and wait for the watchdog to reboot the BBB. After that, ‘C’ characters are printed to the console (the OS on eMMC is not bootable), and when I reinsert the SD card it instantly boots from it. There is no connection after that either. Do you have any ideas on what’s going on and how to fix this?

Below I’m attaching the log from this:

Welcome to minicom 2.7

OPTIONS: I18n
Compiled on Jan 1 2014, 17:13:19.
Port /dev/ttyUSB0, 14:22:11

Press CTRL-A Z for help on special keys

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
U-Boot SPL 2014.04-00014-g47880f5 (Apr 22 2014 - 13:23:54)
reading args
spl_load_image_fat_os: error reading image args, err - -1
reading u-boot.img
reading u-boot.img

U-Boot 2014.04-00014-g47880f5 (Apr 22 2014 - 13:23:54)

I2C: ready
DRAM: 512 MiB
NAND: 0 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
*** Warning - readenv() failed, using default environment

Net: not set. Validating first E-fuse MAC
Could not get PHY for cpsw: addr 0
cpsw, usb_ether
Hit any key to stop autoboot: 0
gpio: pin 53 (gpio 53) value is 1
mmc0 is current device
gpio: pin 54 (gpio 54) value is 1
SD/MMC found on device 0
reading uEnv.txt
1696 bytes read in 6 ms (275.4 KiB/s)
gpio: pin 55 (gpio 55) value is 1
Loaded environment from uEnv.txt
Importing environment from mmc …
using: am335x-boneblack.dtb…
Checking if uenvcmd is set …
gpio: pin 56 (gpio 56) value is 1
Running uenvcmd …
reading zImage
6224648 bytes read in 343 ms (17.3 MiB/s)
reading initrd.img
2957458 bytes read in 218 ms (12.9 MiB/s)
reading /dtbs/am335x-boneblack.dtb
31128 bytes read in 11 ms (2.7 MiB/s)
Kernel image @ 0x82000000 [ 0x000000 - 0x5efb08 ]

Flattened Device Tree blob at 88000000

Booting using the fdt blob at 0x88000000
Using Device Tree in place at 88000000, end 8800a997

Starting kernel …

[ 0.581100] omap_init_mbox: hwmod doesn’t have valid attrs
[ 2.361161] ti_reset_probe: missing ‘resets’ child node.
[ 2.381165] pinctrl-single 44e10800.pinmux: pin 44e10950.0 already requested by 48024000.serial; cai
[ 2.393187] pinctrl-single 44e10800.pinmux: pin-84 (48030000.spi) status -22
[ 2.400599] pinctrl-single 44e10800.pinmux: could not request pin 84 (44e10950.0) from group pinmuxe
[ 2.413381] omap2_mcspi 48030000.spi: Error applying setting, reverse things back
[ 2.542846] Error: Driver ‘tfp410’ is already registered, aborting…
[ 2.617805] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 2.624614] sr_init: platform driver register failed for SR
Loading, please wait…
modprobe: chdir(3.15.6-bone5): No such file or directory
modprobe: chdir(3.15.6-bone5): No such file or directory
modprobe: chdir(3.15.6-bone5): No such file or directory
systemd-fsck[211]: BBB_OS: clean, 43555/63232 files, 219629/262144 blocks
[ 8.170751] libphy: PHY 4a101000.mdio:03 not found
[ 8.175793] net eth0: phy 4a101000.mdio:03 not found on slave 1

The IP Address for eth0 is: 192.168.44.145
The IP Address for usb0 is: 192.168.7.2
beaglebone login: root
Linux beaglebone 3.15.6-bone5 #1 Fri Jul 18 14:35:50 CEST 2014 armv7l
root@beaglebone:~# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 192.168.44.250 0.0.0.0 UG 0 0 0 eth0
192.168.7.0 * 255.255.255.252 U 0 0 0 usb0
192.168.44.0 * 255.255.255.0 U 0 0 0 eth0

root@beaglebone:~# ping 192.168.44.250
PING 192.168.44.250 (192.168.44.250) 56(84) bytes of data.
From 192.168.44.145 icmp_seq=1 Destination Host Unreachable
From 192.168.44.145 icmp_seq=2 Destination Host Unreachable
From 192.168.44.145 icmp_seq=3 Destination Host Unreachable
^C
— 192.168.44.250 ping statistics —
6 packets transmitted, 0 received, +3 errors, 100% packet loss, time 5004ms
pipe 3
root@beaglebone:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 192.168.44.145 icmp_seq=1 Destination Host Unreachable
From 192.168.44.145 icmp_seq=2 Destination Host Unreachable
From 192.168.44.145 icmp_seq=3 Destination Host Unreachable
^C
— 8.8.8.8 ping statistics —
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4008ms

Jay · July 23, 2014, 10:54pm

When you see the repeating C’s, I believe that is the ROM not finding u-boot. You also have an issue with the mailbox drivers.

I use the following mailbox values for my kernel config

CONFIG_MAILBOX=y
# CONFIG_OMAP_MBOX is not set
# CONFIG_OMAP2PLUS_MBOX is not set
# CONFIG_OMAP_MBOX_KFIFO_SIZE is not set

The davinci mdio driver should report a phymask and that value is used to update the device tree.

I load the following drivers in the following order.

libphy
smsc
davinci_cpdma
davinci_mdio
ti_cpsw

I also remove the second phy slave from the device tree.

I would verify your device tree blob has the correct values for your pin setup.

Regards

Loren_Amelang · July 24, 2014, 8:10pm

The davinci mdio driver should report a phymask and that value is used to update the device tree.

Back when I had this problem I tried hard to find out where the phymask comes from, and never succeeded. At that time people who received a phymask of fffffffe booted successfully, those with fffffffb failed. Do you know where the mask is found and how to change it?

I also remove the second phy slave from the device tree.

That seems like a great idea, if only to stop all the useless messages about it never being found. Can that be done in the uEnv.txt, like when you disable HDMI, or do you have to rebuild the device tree binary? Would setting the phymask to ffffffff accomplish the same thing?

Loren

Jay · July 25, 2014, 11:01pm

The phymask comes from a hardware register read by the davinci_mdio driver, which gets passed to the Linux phy libraries. The problem is that the cpsw driver gets the PHY address from the device tree, where it is hardcoded to address 0. Usually the two agree (address 0), but sometimes the phy gets registered at a different address, in my case usually address 2. You calculate the address from the phymask. If you changed the phymask, you would just be pointing back to address 0, so that wouldn’t help you.

I rebuilt the dtb file.
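
To make the phymask arithmetic concrete, here is a minimal Python sketch. It assumes that a cleared bit in the mask marks an MDIO address where a PHY responded, which is consistent with the two masks reported in this thread (fffffffe with the PHY at address 0, fffffffb with the PHY at address 2). The function name is made up for illustration and is not part of any driver.

```python
def phy_addresses_from_mask(phy_mask: int) -> list[int]:
    """Return the MDIO addresses (0..31) whose bit is cleared in phy_mask.

    Assumption: a 0 bit means "a PHY answered at this address",
    as suggested by the masks quoted in this thread.
    """
    return [addr for addr in range(32) if not (phy_mask >> addr) & 1]


if __name__ == "__main__":
    print(phy_addresses_from_mask(0xFFFFFFFE))  # [0]  -> boots with working Ethernet
    print(phy_addresses_from_mask(0xFFFFFFFB))  # [2]  -> PHY strapped to address 2
    print(phy_addresses_from_mask(0xFFFFFFFF))  # []   -> no PHY detected at all
```

With a mask of ffffffff no address is reported, so, under the same assumption, forcing that value would hide the PHY rather than move it back to address 0.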

Jerin_George · November 4, 2014, 4:40am

Hi,
I am using a BBB Rev C with the latest Angstrom image, and I have seen this issue with eth not being detected at boot. This came up in the last stages of my project delivery. How can it be corrected? Does moving to the latest Debian image solve this issue?

regards,
Jerin George

Andrew_Glen · November 4, 2014, 4:42am

As far as I know, and as already documented in this thread, the only reliable fix is to remove C24 and C30.

John_Syn · November 4, 2014, 5:26am

As far as I know, and as already documented in this thread, the only reliable fix is to remove C24 and C30.

If you read the full thread, Gerald says that if you remove these capacitors, the board may not start at all.

Regards,
John

Andrew_Glen · November 4, 2014, 6:36am

Yes, and reading the thread even more fully you’ll find my report of running thousands of automated test restarts with these parts removed, with a 100% success rate.

We use these boards a lot, running 24-7 in this configuration, and have had zero hardware faults. With any luck we have nearly exhausted Murphy’s law with our software.

Andrew.

John_Syn · November 4, 2014, 6:53pm

Yes, and reading the thread even more fully you’ll find my report of running thousands of automated test restarts with these parts removed, with a 100% success rate.

We use these boards a lot, running 24-7 in this configuration, and have had zero hardware faults. With any luck we have nearly exhausted Murphy’s law with our software.

Hi Andrew,

I accept that you have done these tests, but removing these two capacitors from the reset line means the device will come out of reset before the power supply has stabilized, and without a capacitor the reset switch will bounce several times. Perhaps you are just lucky given your setup, but removing C24 and C30 is a bad idea. Making these capacitors smaller may fix your problem, but I suggest that you keep something there to delay the reset line.

Regards,
John

Jerin_George · November 5, 2014, 6:03am

Hi Andrew & John,

Thank you for your replies. I guess that leaves me with no choice but to tweak the hardware and also update the kernel to the latest version by Robert.
Hopefully that will fix the issue for good.
I will keep you posted on the status.

regards,
Jerin

rathod.pratik12 · November 19, 2014, 7:31am

Hello,

I am also experiencing the same issue here, with Ethernet not working on some boot-ups.

I am using an A6C board. My other constraint is that I am required to run Android on the BeagleBone Black from internal storage only, i.e. eMMC. To support all the necessary functionality of Android 4.2.2, I have to use the 3.2.0 kernel and U-Boot 2013.01.01. I think later kernels (3.8 or any other) do not support booting Android from eMMC.

When Ethernet is not working on boot, I see these logs.
U-boot:

USB Host mode controller at 47401800 using PIO, IRQ 0
Net: not set. Validating first E-fuse MAC
PHY reset timed out
cpsw
Hit any key to stop autoboot: 0

And, in kernel logs, I find these logs,

<4>[ 1.307249] davinci-mcasp.0: alias fck already exists
<6>[ 1.790881] davinci_mdio davinci_mdio.0: davinci mdio revision 1.6
<6>[ 1.797342] davinci_mdio davinci_mdio.0: detected phy mask fffffffb
<6>[ 1.804569] davinci_mdio.0: probed
<6>[ 1.808128] davinci_mdio davinci_mdio.0: phy[2]: device 0:02, driver SMSC LAN8710/LAN8720

…

<3>[ 22.106395] PHY 0:00 not found
<3>[ 22.109607] PHY 0:01 not found
<6>[ 22.122567] ADDRCONF(NETDEV_UP): eth0: link is not ready
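
The mismatch in a log like the one above can be spotted mechanically. The following is a rough Python sketch, not an existing tool: it pulls the "detected phy mask" value and the "PHY ... not found" addresses out of kernel log text and flags the case where a PHY answered on the bus but the driver looked elsewhere. It assumes a cleared mask bit means a responding address and that the number after the colon in a PHY id is hexadecimal; the function name and the returned dictionary keys are invented for this example.

```python
import re

SAMPLE_LOG = """\
<6>[ 1.797342] davinci_mdio davinci_mdio.0: detected phy mask fffffffb
<6>[ 1.808128] davinci_mdio davinci_mdio.0: phy[2]: device 0:02, driver SMSC LAN8710/LAN8720
<3>[ 22.106395] PHY 0:00 not found
<3>[ 22.109607] PHY 0:01 not found
"""


def diagnose(log_text: str) -> dict:
    """Compare the addresses the MDIO scan found with those reported missing."""
    detected = []
    mask_match = re.search(r"detected phy mask ([0-9a-fA-F]{1,8})", log_text)
    if mask_match:
        mask = int(mask_match.group(1), 16)
        # Assumption: a cleared bit marks an address where a PHY responded.
        detected = [addr for addr in range(32) if not (mask >> addr) & 1]

    # Matches both "PHY 0:00 not found" and "PHY 4a101000.mdio:03 not found".
    not_found = sorted(
        {int(m, 16) for m in re.findall(r"PHY \S*:([0-9a-fA-F]{2}) not found", log_text)}
    )

    # A PHY exists, but the driver asked for at least one address it is not at.
    mismatch = bool(detected) and bool(set(not_found) - set(detected))
    return {"detected": detected, "not_found": not_found, "mismatch": mismatch}


if __name__ == "__main__":
    print(diagnose(SAMPLE_LOG))
```

For the sample log this reports the PHY at address 2 while addresses 0 and 1 were the ones searched, which matches the failure described in this thread.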

Is it possible to apply software fixes in kernel 3.2.0? Can you tell me what changes are required to resolve this issue?

Also, I found this on “Known_Issues” of beaglebone black
---------------------8<---------------------8<---------------------8<---------------------8<---------------------8<---------------------8<---------------------8<

Ethernet PHY Default Configuration [A3,A4,A5,A6]

The mode pin setting for mode bit 2 connects to the wrong pin on the LAN8710. It goes to pin 15 and should go to pin 14 instead. This should not cause any operational issues as the internal registers are set correctly in UBoot by the default SW that is provided. If you are not using UBoot or have a custom UBoot, you will need to set the register inside the LAN8710 for proper operation. There is a preemption issue in SW that is currently being worked. There was a theory that this error was causing the issue. As long as you set the correct values in your initialization code, this will not cause this issue, as the default UBoot sets the register correctly for all modes with auto negotiate enabled, which is what the default mode was intended to be.

---------------------8<---------------------8<---------------------8<---------------------8<---------------------8<---------------------8<---------------------8<

Is this the same issue we are referring to?
I am using a custom u-boot. What changes are required inside u-boot to fix it?

Thank you for your time.

Regards,
Pratik

alexschneider250 · November 20, 2014, 6:39pm

Hello,

This Ethernet trouble also happens with my BeagleBone Black boards, quite frequently on Rev C (PCB Rev B6), and very rarely on Rev A6 (PCB Rev B5). I tried various Linux kernels, including the latest one from here (3.8.13-bone67), however that keeps happening anyway.

I read section 3.7 of the LAN8710A data sheet (Configuration Straps) and I agree with Loren Amelang: the trouble may be really caused by incorrect strap values, which depend on voltage levels at the respective LAN8710A pins during reset.

That assumption is backed by the observation that, whenever the “eth0: link not ready” thing happens, either both LEDs of the Ethernet Connector are off or only the yellow one (LED2) is off (and the green one is not blinking in both cases). Since these LEDs reflect the transceiver mode of operation, which is controlled by MODE[2:0] configuration straps, their strange behavior may indicate some wrong bit values loaded to MODE[2:0]. They are loaded when nRST pin is deasserted, and the timing is critical, according to subsection 5.5.3 of that data sheet (Power-On nRST & Configuration Strap Timing).

Also, according to the subsection mentioned above, the time interval between when external power supplies reach 80% and nRST pin is deasserted must be no less than 25 ms. Without the capacitor C24 on the board, that time is around 20 ms, I measured that. So, removing C24 does not seem to be safe.

Alex

Gerald_Coley1 · November 20, 2014, 8:50pm

If you have what you think are the correct strappings, let me know. They are the same for all revisions.

Also, if you reset the board after it is up, the strapping options are overridden by the states the processor drives on those pins.

Gerald

alexschneider250 · November 21, 2014, 10:04pm

Hi Gerald,

I meant “strap values”, not connections on the board. As far as I understand it, correct strappings alone cannot always ensure correct bits in the respective registers of the transceiver chip. The power-on and reset timing is also important, and this timing, unlike strappings, is different at least for some revisions.

In my experiments, a reset performed with the RESET button never resolved the “phy not found” problem. A power-on reset, as well as a reset with the POWER button, helped, but not always. Could the transceiver sometimes enter some unresponsive state that makes it impossible for the processor to override the strappings?

Alex

Gerald_Coley1 · November 22, 2014, 12:38am

All the SW has to do is write to the registers and not rely on the straps. Hmm, I have been saying that for 3+ years now.

Gerald

alexschneider250 · November 22, 2014, 8:55am

But the SW can do that only when the transceiver chip is always in a “writable” state, which is unfortunately not the case.

Jerin_George · November 24, 2014, 7:18am

As suggested in this discussion, I moved to the 3.14.1 kernel and everything went well for the first 48 hours.
After that I could see that the ETH stopped responding for close to 10 seconds.
Then it came back.

Test set up:-
I’m using the BBB for data acquisition through ETH. For testing I have connected the BBB and 2 PCs to a switch. One PC is running data acquisition software to collect data from the BBB. The other PC is running the software “Total Network Monitor”, which keeps logging the network status by pinging both the BBB and the other PC.

After 48 Hours :-
I could see that the Total Network monitor reported that link to BBB was lost for close to 10 seconds.

Is this a known issue? Is there any fix for this?

regards,
Jerin

alexschneider250 · November 24, 2014, 3:42pm

It appears that the issue has been known for a long time: several registers of the LAN8710A Ethernet transceiver sometimes get wrong values at power-up, in spite of correct pin strapping configurations. One of those wrong values is PHYAD (the PHY address in the Special Modes Register), which is erroneously initialized to 2, while the processor expects it to be 0. This makes it impossible for the processor to communicate with the transceiver and override those wrong values.

Here, they suggest that this issue is inherent in the LAN8710A by Microchip, and may be caused by some interference from the clock signal. However, Microchip did not admit that.

The messages within this topic propose 3 main solutions:

Rebuild the device tree to make it somehow work with a PHY address that can take on 0 or 2, or probably some other values
Remove the C24 capacitor (this is not safe)
Change the file drivers/net/ethernet/ti/davinci_mdio.c and rebuild the Linux kernel (this patch, as far as I understand, will update the device tree in such a way that other, non-zero PHY addresses also become acceptable)

I have tried another thing: rewriting the wrong register values from U-Boot, using commands like “mdio write 2 18 0xe0”, and I managed to make the content of those registers look like that of a successfully started transceiver (including the PHYAD and MODE). However, to successfully apply these changes, a reset with the RESET button is required. And if I just run the “reset” command in U-Boot, the transceiver doesn’t work properly after reboot, even though it already has the right PHY address: 0. In this case the registers look as if the auto-negotiation fails, and as if the link partner (i.e. the processor) doesn’t have auto-negotiation ability (the Auto Negotiation Expansion Register contains 0).
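
For readers wondering what the value 0xe0 written to register 18 means, here is a small Python sketch that packs and unpacks it. The field layout used here (MODE in bits 7:5, PHYAD in bits 4:0 of the Special Modes Register) is my reading of the LAN8710A data sheet and should be verified against it; the helper names are invented for illustration.

```python
MODE_SHIFT = 5      # assumed: MODE[2:0] occupies bits 7:5
MODE_MASK = 0x7
PHYAD_MASK = 0x1F   # assumed: PHYAD[4:0] occupies bits 4:0


def decode_special_modes(value: int) -> dict:
    """Split a Special Modes Register value into its MODE and PHYAD fields."""
    return {
        "mode": (value >> MODE_SHIFT) & MODE_MASK,
        "phyad": value & PHYAD_MASK,
    }


def encode_special_modes(mode: int, phyad: int) -> int:
    """Build a register value from MODE (0..7) and PHYAD (0..31)."""
    if not 0 <= mode <= MODE_MASK:
        raise ValueError("mode must be in 0..7")
    if not 0 <= phyad <= PHYAD_MASK:
        raise ValueError("phyad must be in 0..31")
    return (mode << MODE_SHIFT) | phyad


if __name__ == "__main__":
    print(decode_special_modes(0xE0))       # {'mode': 7, 'phyad': 0}
    print(hex(encode_special_modes(7, 2)))  # 0xe2
```

Under that layout, writing 0xe0 sets MODE to 7 and PHYAD to 0, while a transceiver that came up at address 2 with the same mode would read back 0xe2.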

If that worked, it would be possible to append all the required “rewriting commands” to the “bootcmd” variable, thus forcing U-Boot to rewrite the wrong values automatically. But there is another obstacle in the way: the U-Boot I have (2014.10-dirty) does not have the “saveenv” command, for some reason. So I cannot save any changes to environment variables there.

If anyone solved this problem by modifying the device tree or by some U-Boot script, please share the details.

rathod.pratik12 · November 25, 2014, 5:00am

Well, maybe you can add your mdio command to the default bootcmd in the u-boot source code and rebuild u-boot.
In my case, I updated my cpsw platform data (I use the 3.2 kernel, so no device tree) dynamically inside the kernel to fix this problem. It seems to be working for me.

Regards,
Pratik

rathod.pratik12 · November 25, 2014, 5:02am

I also believe the issue mentioned here: http://e2e.ti.com/support/arm/sitara_arm/f/791/t/366351.aspx is the same one we are facing on the BBB.

Regards,
Pratik