Beaglebone Black Ethernet Phy Not Detected on Boot.

Christopher_and_Chri · December 30, 2014, 1:37pm

Based on all the comments and discussions I’ve seen before and after purchasing my BBB, I’m sure someone will figure out how to fix the issue. Unfortunately, my knowledge isn’t that advanced yet, so while I can (usually) understand what folks are saying, I’m not at the level to figure this out myself. It does sound like folks have fixes in the works, though.

OOC, have you tried any other OSes to see if the problems with your board persist? I know openSUSE has an image as well, though not necessarily with built-in support for all the GPIO.

alexschneider250 · February 5, 2015, 6:16pm

Hello everybody,

After my colleagues and I had fruitlessly tried many things, our hardware developer came up with a working solution: delaying the PHY initialization performed by the U-Boot we use (v2014.10, git commit c43fd23cf619856b0763a64a6a3bcf3663058c49). This ensures that the U-Boot code tries to access the LAN8710A only after the chip has come out of reset. Inserting “udelay(1000000);” at the beginning of the “board_eth_init()” function in board/ti/am335x/board.c and recompiling U-Boot worked in our experiments.
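The fix described above is a fixed one-second `udelay()` before PHY setup. As a hedged alternative sketch (not what was tested in this post), one could poll the PHY until it answers and give up after a timeout. The hook types, `wait_for_phy()`, and the simulated PHY below are hypothetical stand-ins so the logic can run outside U-Boot. The only hardware assumption is that an absent or still-reset PHY reads back as 0xFFFF or 0x0000 from the standard ID register 2.

```c
#include <stdint.h>

/* Hypothetical MDIO read hook: returns the 16-bit register value. */
typedef uint16_t (*mdio_read_fn)(void *ctx, uint8_t phy_addr, uint8_t reg);
/* Hypothetical delay hook; udelay() would play this role in U-Boot. */
typedef void (*delay_us_fn)(void *ctx, uint32_t usec);

#define MII_PHYSID1 2 /* standard PHY identifier register 1 */

/*
 * Poll one PHY address until it answers or the timeout expires.
 * Returns the time waited in microseconds, or -1 on timeout.
 */
long wait_for_phy(mdio_read_fn rd, delay_us_fn dly, void *ctx,
                  uint8_t phy_addr, uint32_t timeout_us, uint32_t step_us)
{
    uint32_t waited = 0;

    for (;;) {
        uint16_t id = rd(ctx, phy_addr, MII_PHYSID1);

        if (id != 0xFFFF && id != 0x0000)
            return (long)waited;   /* PHY responded */
        if (waited >= timeout_us)
            return -1;             /* gave up */
        dly(ctx, step_us);
        waited += step_us;
    }
}

/* Simulated PHY for demonstration: answers once now_us >= ready_at_us. */
struct fake_phy {
    uint32_t now_us;
    uint32_t ready_at_us;
};

static uint16_t fake_read(void *ctx, uint8_t phy_addr, uint8_t reg)
{
    struct fake_phy *p = ctx;

    (void)phy_addr;
    (void)reg;
    return p->now_us >= p->ready_at_us ? 0x0007 : 0xFFFF;
}

static void fake_delay(void *ctx, uint32_t usec)
{
    struct fake_phy *p = ctx;

    p->now_us += usec; /* advance simulated time instead of sleeping */
}

/* Runs the poll loop against the simulated PHY in 10 ms steps. */
long demo_wait(uint32_t ready_at_us, uint32_t timeout_us)
{
    struct fake_phy p = { 0, ready_at_us };

    return wait_for_phy(fake_read, fake_delay, &p, 0, timeout_us, 10000);
}
```

With a simulated PHY that becomes ready after 250 ms, `demo_wait(250000, 1000000)` returns 250000; if the PHY needs longer than the timeout, the result is -1. Note that this polls a single address only, so it would not help if the PHY came up at a different address.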

We haven’t done many experiments yet, but the board whose PHY failed almost every time before now starts successfully every time. I would appreciate it if you could repeat this experiment with the delayed PHY initialization and report the results here.

I guess the problem was, as described in the reply from Texas Instruments mentioned by Micka (I misunderstood that earlier), that LAN8710A starts to function correctly at a slightly higher voltage level than the microprocessor, and may come out of reset too late with respect to the microprocessor. This agrees with the observation that a higher capacitance on nRESET_INOUT only worsens things, because it makes the slope of the reset pulse flatter, thus increasing the time lag between the starts of the two chips.

@ c2h2: Thank you for your reply. As far as I understand, you delayed the reset of the microprocessor with respect to the reset of LAN8710A by means of some RC circuit, right? If this is correct, could you please provide more details about that circuit?

@ Vince Caldeira: Could you please explain why you marked the topic as complete? If you know of a better solution to the problem, please share it with us.

Regards,

Alex

bkozak · February 9, 2015, 4:42pm

Alex,

I’ve compiled a boot-loader with the delay in the Ethernet initialization and have done some quick testing on it. I haven’t seen any issues yet (where I normally would) but have only tested one board so far. To be exact, I’ve power cycled a BBB 20 times and confirmed that the Ethernet was working after each boot. Normally I would have seen at least a few Ethernet lockups after 20 resets on this board. There may be something to this.

Regards,
Bill

atomiklan · February 9, 2015, 9:06pm

If this ends up being the solution, can you please post a guide for those not as up to speed as you on precisely how to implement this fix? Thanks

John_Zhang · February 13, 2015, 5:58pm

Hi Alex,

I added this 1-second delay to U-Boot v2015.01 and ran a power on/off test on a BBB. Out of 263 boots, the Ethernet interface started successfully 254 times. The other 9 times, U-Boot could not detect the PHY. Below is a comparison of the U-Boot log from a successful boot and from a boot with failed Ethernet:

Successful one:

U-Boot 2015.01-dirty (Feb 13 2015 - 00:46:39)

Watchdog enabled
I2C: ready
DRAM: 512 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Using default environment

Net: <ethaddr> not set. Validating first E-fuse MAC
cpsw
Hit any key to stop autoboot: 0
gpio: pin 53 (gpio 53) value is 1
switch to partitions #0, OK
mmc0 is current device
gpio: pin 54 (gpio 54) value is 1
Checking for: /uEnv.txt …
Checking for: /boot.scr …
Checking for: /boot/boot.scr …
Checking for: /boot/uEnv.txt …

…

Bad boot:

U-Boot 2015.01-dirty (Feb 13 2015 - 00:46:39)

Watchdog enabled
I2C: ready
DRAM: 512 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Using default environment

Net: <ethaddr> not set. Validating first E-fuse MAC
Phy 0 not found
cpsw
Hit any key to stop autoboot: 0
gpio: pin 53 (gpio 53) value is 1
switch to partitions #0, OK
mmc0 is current device
gpio: pin 54 (gpio 54) value is 1
Checking for: /uEnv.txt …
Checking for: /boot.scr …
Checking for: /boot/boot.scr …
Checking for: /boot/uEnv.txt …

…

Note the “Phy 0 not found” line shown in the bad log.

So unfortunately this idea does not work here. The Ethernet failure rate is about the same with or without the 1-second delay. The hunt goes on…

John Zhang

RobertCNelson · February 13, 2015, 7:45pm

Its ID changed those 9 times. Look at the kernel patch and port it to
the cpsw driver in U-Boot:

https://github.com/RobertCNelson/linux-dev/blob/master/patches/beaglebone/phy/0003-cpsw-search-for-phy.patch
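
The idea of searching for the PHY instead of assuming address 0 can be sketched as follows. This is not the linked patch and not U-Boot's cpsw code. `find_first_phy()`, the read hook, and the simulated bus are hypothetical names for illustration, and "present" is assumed to mean that ID register 2 reads back as something other than 0xFFFF or 0x0000.

```c
#include <stdint.h>

/* Hypothetical MDIO read hook: returns the 16-bit register value. */
typedef uint16_t (*mdio_read_fn)(void *ctx, uint8_t phy_addr, uint8_t reg);

#define MII_PHYSID1 2 /* standard PHY identifier register 1 */

/* Scan all 32 MDIO addresses; return the first one that answers, or -1. */
int find_first_phy(mdio_read_fn rd, void *ctx)
{
    for (uint8_t addr = 0; addr < 32; addr++) {
        uint16_t id = rd(ctx, addr, MII_PHYSID1);

        if (id != 0xFFFF && id != 0x0000)
            return addr;
    }
    return -1;
}

/* Simulated bus with a single PHY at the address stored in *ctx. */
static uint16_t one_phy_read(void *ctx, uint8_t phy_addr, uint8_t reg)
{
    const int *present_at = ctx;

    (void)reg;
    return phy_addr == *present_at ? 0x0007 : 0xFFFF;
}

/* Runs the scan against the simulated bus; pass -1 for "no PHY at all". */
int demo_scan(int present_at)
{
    return find_first_phy(one_phy_read, &present_at);
}
```

Here `demo_scan(0)` models a normal boot and `demo_scan(2)` models a boot where the PHY latched address 2; in both cases the scan returns the address at which the PHY actually responds.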

Regards,

John_Zhang · February 14, 2015, 7:04am

Hi Robert,

Instead of trying to port the patch to U-Boot, I updated my kernel from the BBB stock v3.8.13-bone47 to the latest v3.8.13-bone70 and am re-running the power on/off test right now. So far there have been about 200 power cycles, and U-Boot reported “Phy 0 not found” 8 times. In all 8 cases, the v3.8.13-bone70 kernel successfully started the Ethernet interface.

With the stock v3.8.13-bone47 kernel, every time U-Boot reported the “Phy 0 not found” error, the kernel could not start the Ethernet interface either.

So it seems that this problem is solved in the v3.8.13-bone70 kernel.

John Zhang

alexschneider250 · February 14, 2015, 3:48pm

If a changed ID is a problem, why does the network work after changing that ID manually in U-Boot and then rebooting, as I described here?

Alex

alexschneider250 · February 14, 2015, 5:01pm

Hi John,

Thank you for doing this experiment. Did you power off your board right after checking for “Phy 0 not found” in U-Boot console, without booting Linux?

Have you ever tried to let Linux boot after seeing that “Phy 0 not found”?

Regards,

Alex

alexschneider250 · February 14, 2015, 5:25pm

Hi Bill,

Thank you for doing that experiment.

I ran two automated tests with the “delayed PHY initialization”, each with 130 power-ups over 130 minutes (one power-up per minute), and still saw a few “Phy 0 not found” messages.

In the first test, the BBB periodically reset itself by means of a MOSFET connecting nRESET_INOUT to ground, each time after running Linux for one minute. “Phy not found” happened only once.

In the second test, the BBB periodically disconnected itself from the power supply by means of a relay, again each time after running Linux for one minute. “Phy not found” happened seven times.

So it appears that a 1 s delay in the U-Boot code doesn’t always help. However, in these two tests I never checked what happened with the network in Linux after “Phy not found”, because in both tests the BBB immediately reset itself via that MOSFET as soon as U-Boot could not read from the chip at PHY address 0 (I checked this in special test code in U-Boot). But I think it is quite possible that the network still works OK in Linux even after “Phy not found” happens in U-Boot, since 1 second is enough to ensure that the LAN8710A is OK. It could just have another PHY address, e.g. 2 instead of the normal 0. In that case the network would work, even in U-Boot, despite “Phy not found” being shown in the console, as I described here.

We need more experiments.

Regards,

Alex

John_Zhang · February 14, 2015, 6:52pm

Hi Alex,

My test is to power on the BBB for 1 minute. Right after the kernel boots, the BBB starts to ping another PC connected to the same Ethernet switch to show whether the Ethernet interface started successfully.

My first software combination was “u-boot v2015.01 with the 1-second ethernet delay start” + “linux kernel v3.8.13-bone47”. The test lasted 263 sessions and the Ethernet interface failed 9 times. Every time U-Boot reported “Phy 0 not found”, the Linux kernel also failed the ping test.

My second combination was “u-boot v2015.01 with the 1-second ethernet delay start” + “linux kernel v3.8.13-bone70”. The test lasted 543 sessions. U-Boot reported “Phy 0 not found” 25 times, but in all 25 of those sessions the Linux kernel successfully started the Ethernet interface and the ping test succeeded.
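
For comparison, the rate at which U-Boot missed the PHY in the two runs follows directly from the counts above. `failure_rate_percent()` is just an illustrative helper, not part of any test script from this thread.

```c
/* Percentage of sessions in which the PHY was not detected. */
double failure_rate_percent(unsigned failures, unsigned total)
{
    if (total == 0)
        return 0.0; /* avoid dividing by zero for an empty run */
    return 100.0 * (double)failures / (double)total;
}
```

9 of 263 is about 3.4% and 25 of 543 is about 4.6%, so U-Boot misses the PHY at a similar rate in both runs. The difference is that with bone70 none of those 25 sessions ended with failed Ethernet in Linux.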

Regards,

John

RobertCNelson · February 16, 2015, 3:22pm

Well, it was solved a short time after "bone47"...

Based on the thread, it looked like you were more interested in making
it work 100% of the time in U-Boot. That's why I pointed to the
patch, as it would still be nice to get this fixed in U-Boot.

Regards,

bkozak · February 17, 2015, 6:31pm

Hi Alex,

In my test, like John Zhang, I also had my BBB sending a ping after boot to confirm that Ethernet was working. In my experience the “Phy 0 not found” log message will occur even when the Ethernet seems to work normally.

Regards,
Bill

Richard-tx · March 9, 2015, 6:01am

I have a BBB that sometimes fails to have a usable network interface at power-up. Removing and reapplying power does resolve the issue. Is there a fix for this? Would an upgrade to Rev C fix it?

Sascha_Ittner · April 24, 2015, 9:09am

Hi Jay,

I’ve run into the same problem, so thank you very much for your work. However, it seems more natural to me to patch the cpsw driver to use the first PHY that is found rather than modifying the DT at runtime. Please see my suggestion (based on 3.12.30):

`
diff -Naur linux.orig/arch/arm/boot/dts/am335x-bone-common.dtsi linux/arch/arm/boot/dts/am335x-bone-common.dtsi
--- linux.orig/arch/arm/boot/dts/am335x-bone-common.dtsi 2014-10-09 15:46:37.000000000 +0200
+++ linux/arch/arm/boot/dts/am335x-bone-common.dtsi 2015-04-23 23:38:14.210206750 +0200
@@ -322,7 +322,7 @@
 };

 &cpsw_emac0 {
-    phy_id = <&davinci_mdio>, <0>;
+    phy_id = <&davinci_mdio>;
     phy-mode = "mii";
 };

diff -Naur linux.orig/drivers/net/ethernet/ti/cpsw.c linux/drivers/net/ethernet/ti/cpsw.c
--- linux.orig/drivers/net/ethernet/ti/cpsw.c 2014-10-09 15:46:37.000000000 +0200
+++ linux/drivers/net/ethernet/ti/cpsw.c 2015-04-24 08:16:54.401495090 +0200
@@ -1810,6 +1810,20 @@
     slave->port_vlan = data->dual_emac_res_vlan;
 }

+static int match_first_phy(struct device *dev, void *data)
+{
+    const char *dn = dev_name(dev);
+    const char *mn = (const char *) data;
+    while (*mn) {
+        if (*dn != *mn)
+            return 0;
+        dn++;
+        mn++;
+    }

Sascha_Ittner · April 24, 2015, 9:59am

On a good phy boot I see the following:

[ 2.810749] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[ 2.817206] davinci_mdio 4a101000.mdio: detected phy mask fffffffe
[ 2.833517] libphy: 4a101000.mdio: probed
[ 2.837871] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver unknown

On a ‘bad phy’ boot I see the following (differences highlighted):

[ 2.806763] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[ 2.813213] davinci_mdio 4a101000.mdio: detected phy mask fffffffb
[ 2.829512] libphy: 4a101000.mdio: probed
[ 2.833875] davinci_mdio 4a101000.mdio: phy[2]: device 4a101000.mdio:02, driver unknown
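
The two logs differ in the “detected phy mask” value and in the address at which the PHY shows up. Here is a small sketch of how the mask maps to an address, assuming (as the two logs suggest) that a cleared bit N in the printed mask means a PHY answered at address N. `first_phy_from_mask()` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Return the lowest PHY address whose bit is cleared in the printed
 * mask, or -1 if every bit is set (no PHY detected).
 */
int first_phy_from_mask(uint32_t mask)
{
    for (int addr = 0; addr < 32; addr++) {
        if (!(mask & (UINT32_C(1) << addr)))
            return addr;
    }
    return -1;
}
```

Under that assumption, fffffffe decodes to address 0 (the good boot) and fffffffb decodes to address 2 (the bad boot), matching the phy[0] and phy[2] lines in the logs.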

I’ve experienced the same issue and found your patch https://github.com/RobertCNelson/linux-dev/blob/master/patches/beaglebone/phy/0003-cpsw-search-for-phy.patch to fix it.

However, it seems more natural to me to patch the cpsw driver to use the first PHY that it finds. Please see my suggestion (based on kernel 3.12.30):

`
diff -Naur linux.orig/arch/arm/boot/dts/am335x-bone-common.dtsi linux/arch/arm/boot/dts/am335x-bone-common.dtsi
--- linux.orig/arch/arm/boot/dts/am335x-bone-common.dtsi 2014-10-09 15:46:37.000000000 +0200
+++ linux/arch/arm/boot/dts/am335x-bone-common.dtsi 2015-04-23 23:38:14.210206750 +0200
@@ -322,7 +322,7 @@
 };

 &cpsw_emac0 {
-    phy_id = <&davinci_mdio>, <0>;
+    phy_id = <&davinci_mdio>;
     phy-mode = "mii";
 };

diff -Naur linux.orig/drivers/net/ethernet/ti/cpsw.c linux/drivers/net/ethernet/ti/cpsw.c
--- linux.orig/drivers/net/ethernet/ti/cpsw.c 2014-10-09 15:46:37.000000000 +0200
+++ linux/drivers/net/ethernet/ti/cpsw.c 2015-04-24 08:16:54.401495090 +0200
@@ -1810,6 +1810,20 @@
     slave->port_vlan = data->dual_emac_res_vlan;
 }

+static int match_first_phy(struct device *dev, void *data)
+{
+    const char *dn = dev_name(dev);
+    const char *mn = (const char *) data;
+    while (*mn) {
+        if (*dn != *mn)
+            return 0;
+        dn++;
+        mn++;
+    }

zmatt · April 25, 2015, 5:26am

You can also fix most strapping options, including the PHY address, by writing to mdio register 0x12 (see datasheet page 60).
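
A minimal sketch of the register arithmetic for changing the address this way, assuming the PHYAD field sits in bits 4:0 of register 0x12; that layout should be checked against the datasheet page cited above. The macro and function names are made up for illustration, and the actual MDIO read and write calls are left out.

```c
#include <stdint.h>

#define LAN8710A_REG_SPECIAL_MODES 0x12   /* register index named above */
#define PHYAD_MASK                 0x001F /* assumed: PHYAD in bits 4:0 */

/* Extract the PHY address field from a register 0x12 value. */
uint8_t get_phyad(uint16_t special_modes)
{
    return (uint8_t)(special_modes & PHYAD_MASK);
}

/* Return the register value with only the PHY address field replaced. */
uint16_t with_phyad(uint16_t special_modes, uint8_t new_addr)
{
    return (uint16_t)((special_modes & ~PHYAD_MASK) |
                      (new_addr & PHYAD_MASK));
}
```

The intended use is a read-modify-write: read register 0x12 at the address where the PHY currently answers, pass the value through `with_phyad(value, 0)`, and write the result back to the same register. For example, a value of 0x00E2 (address 2) becomes 0x00E0 (address 0) with the other bits untouched.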

This whole issue still sounds really weird though. The RXD3/PHYAD2 line has internal pull-down in the PHY, internal pull-down in the AM335x, and external pull-down. How on earth can the PHY manage to sample it high?

This also makes no sense:

I guess the problem was, as described in the reply from Texas Instruments mentioned by Micka (I misunderstood that earlier), that LAN8710A starts to function correctly at a slightly higher voltage level than the microprocessor, and may come out of reset too late with respect to the microprocessor. This agrees with the observation that a higher capacitance on nRESET_INOUT only worsens things, because it makes the slope of the reset pulse flatter, thus increasing the time lag between the starts of the two chips.

The PHY latches its strapping options (except REGOFF) at reset deassertion (rising edge of nRESET). It has a minimum time (but no maximum) between power supplies valid and reset deassertion, so if it were true that it considers its supplies to be valid rather late then a longer reset time would be required, not a shorter one…

I have another interesting case to add to the list: I found my BBB unreachable via network. It had network on boot, but about three days later I noticed it was unreachable. On closer inspection I observed that its link led was inverted: with cable disconnected it was on, after connecting cable it blinked briefly and then turned off. Its speed led was continuously on, hence probably also inverted. The device it was connected to correctly detected link up/down on cable connect/disconnect, but reported that auto-negotiation was not supported by link partner and ended up selecting 10base-T half-duplex.

The BBB itself had just a single link down message in its log and did not notice the reconnections at all. “mii-tool -v -v” produced garbage: it showed all registers as fffb, which I discovered is its way of saying “can’t communicate with phy”. (I get the same output if I change the phy’s address via register 0x12. Incidentally, this also produces a “link down” and the cpsw driver does not recover if I change the address back… not very resilient.)

The inverted link led is really weird though: according to the phy datasheet it corresponds to the REGOFF strapping option, which (unlike all other options) is only sampled at power-on and disables the internal 1.2V core voltage regulator. However, I measured PHY_VDDCR and it was in fact at 1.2V, and after a reset the phy functioned normally again, no power cycle was needed. Very very strange.

I do hope that on the x15 the phy reset(s) will be on GPIO? (Also to avoid pointlessly resetting them when rebooting, which is especially undesirable when using the integrated switch.)

zmatt · April 25, 2015, 5:42am

The RXD3/PHYAD2 line

Sorry I meant RXCLK/PHYAD1 of course, though exactly the same is true for that line.

The inverted link led is really weird though: according to the phy datasheet it corresponds to the REGOFF strapping option, which (unlike all other options) is only sampled at power-on and disables the internal 1.2V core voltage regulator.

Also wanted to add the observation that this option of course really has no sane way of changing at runtime while everything is powered up…

My BBB has no cape btw, nor did it have any external connections other than DC power and ethernet at the time iirc

Fernando_Derkoski · May 19, 2015, 8:24pm

Hello,

This problem happens to me as well. I have three BeagleBone Black Rev C boards with the latest kernel, 3.8.13-bone70. Does anyone know how to resolve this problem?

RobertCNelson · May 19, 2015, 8:27pm

unlikely... show us the output of:

dmesg | grep mdio

sudo ifconfig -a

Regards,