Beaglebone Black Ethernet Phy Not Detected on Boot.

Hello,

I have noticed very rare cases (~1/50) of the ethernet phy on the Beaglebone Black not being detected on boot, and requiring a hard reset (as opposed to calling ‘reset’ from the command line) to get it to work/be detected again.

This problem has been mentioned in a couple of other threads (below) concerning different topics (i.e. problems getting the BBB to boot, and the ethernet phy ‘dying’ some time after initially working fine), with no solution/workaround for this specific problem being suggested - so I thought I’d start a thread specifically for it.
https://groups.google.com/forum/#!msg/beagleboard/Vp4pxwHm8BU/Iaw3p5xm0MoJ

https://groups.google.com/forum/#!topic/beagleboard/aXv6An1xfqI

In the first thread mlc/Mike discussed his response to the problem as follows:

"I had issues with the network not coming up on boot, and it was traced
down to problems with the SYS_RESETn line.

I had a level translator connected to SYS_RESETn, to drive a 5V chip.
It was powered by a 5V rail. If the 5V rail powered up “differently”
than the 3.3V rail (not sure of the exact relationship), I guess it
pulled the SYS_RESETn line to weird levels that affected the network
chip but not the main processor. I’m now using a GPIO to drive the
external 5V chip now, instead of the SYS_RESETn line.

Anyway, the moral is be very, very careful with SYS_RESETn, because it
can cause hard-to-trace problems with networking."

I see that the A6 Revision of the Beaglebone Black has some changes to the SYS_RESETn line:

“Based on notification from TI, in random instances there could be a glitch in the SYS_RESETn signal from the processor where the SYS_RESETn signal was taken high for a momentary amount of time before it was supposed to. To prevent this, the signal was ORed with the PORZn (Power On reset).” (http://elinux.org/Beagleboard:BeagleBoneBlack#Revision_A6_.28Production_Version.29)

Is it likely that this modification will improve or resolve the issue I am seeing with the ethernet phy not resetting/powering up correctly, seeing as the SYS_RESETn signal also feeds into the nRST pin on the ethernet phy? (The SYS_RESETn line is left untouched in my application.)

Some additional observations from dmesg concerning this issue:

On a good phy boot I see the following:

[ 2.810749] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[ 2.817206] davinci_mdio 4a101000.mdio: detected phy mask fffffffe
[ 2.833517] libphy: 4a101000.mdio: probed
[ 2.837871] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver unknown

Followed later by:
[ 21.286920] net eth0: initializing cpsw version 1.12 (0)
[ 21.301166] net eth0: phy found : id is : 0x7c0f1

On a ‘bad phy’ boot I see the following (the phy mask and the phy address differ):
[ 2.806763] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[ 2.813213] davinci_mdio 4a101000.mdio: detected phy mask fffffffb
[ 2.829512] libphy: 4a101000.mdio: probed
[ 2.833875] davinci_mdio 4a101000.mdio: phy[2]: device 4a101000.mdio:02, driver unknown

Followed later by:
[ 21.346861] net eth0: initializing cpsw version 1.12 (0)
[ 21.354379] libphy: PHY 4a101000.mdio:00 not found
[ 21.359469] net eth0: phy 4a101000.mdio:00 not found on slave 0

So it looks like the ‘davinci_mdio_reset’ function sees the phy in both instances, but reports it differently on the bad boot. I am not sure what to make of this.
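One way to read the two masks is sketched below. It assumes that a cleared bit in the reported phy mask marks an MDIO address where a PHY answered. That assumption comes from the logs above (fffffffe appears with phy[0], fffffffb with phy[2]) and has not been checked against the driver source here. The helper name is hypothetical.

```python
def phy_addresses_from_mask(mask_hex):
    """Return the MDIO addresses (0-31) whose bit is cleared in the phy mask.

    Assumption: a cleared bit means a PHY responded at that address.
    """
    mask = int(mask_hex, 16)
    return [addr for addr in range(32) if not (mask >> addr) & 1]


print(phy_addresses_from_mask("fffffffe"))  # good boot: [0]
print(phy_addresses_from_mask("fffffffb"))  # bad boot:  [2]
```

Read this way, the PHY responds on both boots, but on the bad boot it answers at address 2 while eth0 still looks for it at address 0.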

I am using the Debian 7.2 Rootfs and the ‘RobertCNelson’ kernel ‘3.12.0-bone8’.

Regards,
Andrew.

I am just now looking at this issue. The A6 revision was not put in place to fix this issue.

Gerald

We are experiencing the same issue, using the A5 version. Roughly 1% to 3% of the time on boot up, the unit fails to find the PHY. On the next boot up it works fine. On very, very rare occasions it will fail to find the PHY 2x in a row, but we haven’t seen that in a few days now, since we started driving SYS_RESETn as below. Before we started driving the SYS_RESETn line, we would see it miss in the 10+% range.

The startup SYS_RESETn glitch from the CPU has been observed on multiple units. To counter that we drove the SYS_RESETn line low { open collector } for 400 msec with occasional improvements, but the problem has never really gone away. Using the external open collector reset, we have also added an additional 4K7 pullup to 3V3. We are also trying driving the line with an active 3V3 device rather than depending on the pullup…

Don’t think we see any issues with the way 3V3 is coming up.

Dave

Try removing C24. See if that helps.

Gerald

Hi All,

After removing C24 and C30 (next to the large unpopulated 20-pin header P2 on the bottom of the board) we ran 1000 power cycles and had a 100% success rate - i.e. board booted and phy detected every time.

We used a programmable power supply and some scripts processing the uart output to count observed instances of “libphy: PHY 4a101000.mdio:00 not found” and “net eth0: phy found : id is : 0x7c0f1”, and momentarily interrupted the power supply after seeing either.

We ran the same test on an unmodified board and had a failure rate of 54/1000
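For anyone wanting to repeat the test, the counting step might look roughly like the sketch below. It is not the script we actually ran: the function names are made up for illustration, and the power-supply control is left out. It only tallies the two messages quoted above from captured UART lines.

```python
FOUND = "phy found : id is : 0x7c0f1"
NOT_FOUND = "libphy: PHY 4a101000.mdio:00 not found"


def count_phy_results(lines):
    """Count lines reporting the PHY as found and as not found."""
    found = sum(1 for line in lines if FOUND in line)
    missing = sum(1 for line in lines if NOT_FOUND in line)
    return found, missing


def failure_rate(found, missing):
    """Fraction of boots on which the PHY was not found."""
    total = found + missing
    return missing / total if total else 0.0


# Lines copied from the dmesg output earlier in this thread.
sample = [
    "[ 21.301166] net eth0: phy found : id is : 0x7c0f1",
    "[ 21.354379] libphy: PHY 4a101000.mdio:00 not found",
    "[ 21.359469] net eth0: phy 4a101000.mdio:00 not found on slave 0",
]
print(count_phy_results(sample))  # (1, 1)
```

With the unmodified-board numbers, failure_rate(946, 54) comes out at about 0.054, i.e. 5.4%.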

Regards,
Andrew Glen.

Guess this issue is solved, but this seems like the best place for my comment.

I always see “davinci_mdio 4a101000.mdio: detected phy mask fffffffe”, never “fffffffb”, so that may be the critical part of your error.
But I also always see

Is the C24 value change in revision A6A supposed to fix this problem? I still get this issue on my two BBB A6A boards. Do we need to increase C24 further, or have it removed completely since U16 is added?

Can I add what I’m seeing on an A5C board?

When the board boots either with or without an ethernet cable connected I don’t get any lights on the ethernet socket. Whereas my older Beaglebone (white version) lights the orange ethernet LED almost immediately after power is applied.

I can plug in different cables and different switches into the ethernet port and no go. These cables and switches work fine with the BB white.

Appropriate dmesg lines are always the same:

[ 0.762015] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6
[ 0.762046] davinci_mdio 4a101000.mdio: detected phy mask fffffffe
[ 0.763087] libphy: 4a101000.mdio: probed
[ 0.763120] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
[ 0.763468] cpsw 4a100000.ethernet: NAPI disabled
[ 5.682733] gadget: using random self ethernet address
[ 5.888585] net eth0: initializing cpsw version 1.12 (0)
[ 5.890892] net eth0: phy found : id is : 0x7c0f1
[ 5.891109] libphy: PHY 4a101000.mdio:01 not found
[ 5.896181] net eth0: phy 4a101000.mdio:01 not found on slave 1
[ 5.960077] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

Now here comes the crazy part. The only way I’ve managed to get the ethernet lights to come on is to plug in an ethernet cable directly from the BB white to this BB Black (sometimes this doesn’t work first time). Then both the orange and green lights come on and stay on on the Black. Then I get:

[ 17.528196] libphy: 4a101000.mdio:00 - Link is Up - 100/Full
[ 17.528261] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

If I set an IPv4 address manually I can’t ping between the boards tho. Even tho ifconfig is showing some packets flying around:

eth0 Link encap:Ethernet HWaddr C4:ED:BA:7B:EB:F7
inet addr:192.168.5.199 Bcast:192.168.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:41 errors:0 dropped:0 overruns:0 frame:0
TX packets:105 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:13493 (13.1 KiB) TX bytes:30970 (30.2 KiB)
Interrupt:56

Craziness does not stop there. About 50% of the time I can then unplug the ethernet cable from the BB Black and the ethernet lights (on the now empty port) stay lit. I can then plug a cable in between the BB Black and a switch and see activity on the green LED (orange LED is still lit). But still can’t actually get any packets out of it. When I do this no new dmesg’s appear.

Is this faulty or some symptom similar to what else is in this post?

I have tried computer USB power, 1A external DC power and a 2.1A USB power supply. This was observed with the original eMMC software version and I have since flashed it to 2013-09-04 version with no difference.

regards,
Damon.

Should C24 and C30 be removed from A5A BBBs that begin experiencing strange Ethernet problems?

I have two A5As that have both decided to keep their link lights on at all times (even without a cable).
Also the Ethernet switch does not recognize either of the affected BBBs (tried several known good cables and ports).

Regards,
Joe

That is up to you if you want to rip parts off. I am not recommending it.

Gerald

Correct. You can also solve the issue by using the correct SW. The issue is that the processor interferes with the default address settings when the PHY reads the pins. If the SW looks for the other addresses, it works fine.

Also, removing these caps will create issues where the board does not boot at all. Not exactly a good trade.

Gerald

My SW guy is currently out. There is a change that needs to be made. I will get with him next week to make sure the change gets done.

Gerald

My "debian" image is now hosted here:

http://beagleboard.org/latest-images

For ubuntu you can find images from the same script that generates
that image ^ here:

http://elinux.org/BeagleBoardUbuntu

But honestly, due to ubuntu's current short eol cycles (9 months),
ubuntu isn't that great anymore for long run embedded applications. So
I've been losing interest in supporting ubuntu.

Regards,

Gerald said:
The issue is that the processor interferes with the default address settings when the PHY reads the pins. If the SW looks for the other addresses, it works fine.

Could we get a clue about which “addresses” are the problem?

Maybe the addresses 4a100000 vs. 4a101000?

davinci_mdio 4a101000.mdio: phy[0]:
device 4a101000.mdio:00

The base address of the PHY. Only one address per PHY. I believe it is 0 to 7. It is my understanding that the fix was pushed up a week ago. Robert’s image should handle this.

It is not the MAC address. PHY NOT FOUND means that at the one address of 00 the PHY did not respond because the PHY has a different address.
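The idea of having the software look at the other addresses can be sketched as below. This is a plain Python simulation and not the actual kernel change: `read_phy_id` is a hypothetical stand-in for an MDIO read of the PHY identifier, the 0 to 7 range follows the belief stated above, and the expected id 0x7c0f1 is taken from the dmesg output earlier in the thread.

```python
EXPECTED_PHY_ID = 0x7C0F1  # id reported in the dmesg output in this thread


def find_phy(read_phy_id, addresses=range(8), expected_id=EXPECTED_PHY_ID):
    """Return the first address whose PHY reports expected_id, else None.

    read_phy_id(addr) is a hypothetical callable that returns the PHY id
    at addr, or None when nothing answers there.
    """
    for addr in addresses:
        if read_phy_id(addr) == expected_id:
            return addr
    return None


# Simulated "bad" boot: the PHY ended up at address 2 instead of 0.
bad_boot_bus = {2: EXPECTED_PHY_ID}
print(find_phy(bad_boot_bus.get))                 # 2

# Looking only at address 0, as on the failing boots, finds nothing.
print(find_phy(bad_boot_bus.get, addresses=[0]))  # None
```

Scanning the small address range costs little and does not depend on which address the PHY happened to latch at power-up.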

Gerald

Look in the LAN8710A data sheet from SMSC. I would cut and paste it, but Microchip has cut and paste blocked.

http://www.microchip.com/wwwproducts/Devices.aspx?product=LAN8710A Section 3.7.1

Gerald

Got it! (Chrome print to PDF, copy from the print…)

If anyone has any further information on the software fix/patch for
this issue I would be very interested in hearing about it (and
backporting into the kernel from late last year I am using) - or even
the best way to search for patches to particular files.

Regards,
Andrew.

I know what I have seen. I know what I have replicated. And I know the SW fix takes care of it.

I also know it does not happen on every board and does not happen every time. Do keep in mind that the board reset is not a HW reset. It is a SW reset.

The fix is in the latest image from Robert. http://beagleboard.org/latest-images/

Gerald