possible workaround for BeagleBone Ethernet PHY problems

Hi,

though I don't own a BeagleBone (yet), I noticed a lot of issues being reported regarding non-working or faulty Ethernet ports.

We are building AM3503/17-based boards which AFAIK use the same Ethernet block. We are also using the LAN8720A PHY, which is nearly identical to the LAN8710A used on the Bone.

I remember seeing similar problems with our device. Sometimes it just worked, sometimes it didn't and often the link detection was just extremely unreliable.

I can't guarantee that my solution will work for the bone problem, but I think it's worth a try. The required change can be found at
<https://github.com/pironex/pia-linux-kernel/commit/bd65d378c5b152ea4bde127f9d3684b0c5c0737c> (to get the patch, just add ".patch" to the URL.)

It simply disables the automatic "powerdown via link energy detection" feature (which doesn't seem to work as expected) for the PHY.
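
To make the change more concrete: the feature is controlled by a single enable bit in the PHY's Mode Control/Status register. The sketch below is not the linked commit. It only illustrates the bit operation involved, assuming register index 17 and the EDPWRDOWN bit at position 13 as defined for SMSC PHYs in the Linux kernel's SMSC PHY header. The function name and the sample register value are made up for illustration.

```python
# Assumption: register layout as in the Linux SMSC PHY header
# (Mode Control/Status register = 17, EDPWRDOWN = bit 13).
MODE_CTRL_STATUS_REG = 17
EDPWRDOWN = 1 << 13


def clear_edpwrdown(reg_value):
    """Return the 16-bit register value with energy-detect power-down disabled."""
    return reg_value & ~EDPWRDOWN & 0xFFFF


# Hypothetical register contents: EDPWRDOWN set plus one unrelated bit
before = EDPWRDOWN | 0x0002
after = clear_edpwrdown(before)
print(f"reg {MODE_CTRL_STATUS_REG}: 0x{before:04x} -> 0x{after:04x}")
```

In a real driver the value would be read from and written back to the PHY over MDIO; that part is omitted here.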

Hope I could help
Björn

Interesting...
I've not personally seen this issue pop up on my Bones but I've not
yet played with them extensively with the Ethernet link going up and
down often. For applications where power consumption isn't critical,
your patch seems to achieve the goal.

Do you know if this is an SMSC PHY errata that should be published?
Or could layout of the PHY traces or magnetics selection have an
impact? I'm interested to learn more if you're willing to share more
of what you've found hardware wise.

Thanks,
Andrew

Thanks for sharing! I've applied this patch to the BeagleBone kernel
[1]. It would be great to collect some data points from people seeing
this issue. I'm not currently reproducing it, but it could be my
network just has too much traffic. Can folks seeing this issue try
out my pre-built kernel [2] by copying uImage-beaglebone.bin as uImage
into the FAT partition?

[1] http://groups.google.com/group/beagleboard/browse_thread/thread/776a4ca8ca3c06e/09e25371b09eb2a4#09e25371b09eb2a4
[2] www.beagleboard.org/~share/beaglebone-debug-20120110/

Interesting...
I've not personally seen this issue pop up on my Bones but I've not
yet played with them extensively with the Ethernet link going up and
down often. For applications where power consumption isn't critical,
your patch seems to achieve the goal.

Do you know if this is an SMSC PHY errata that should be published?

I was not able to find one and stopped looking, because power consumption was not essential for our project. On the other hand, I remember not recognizing any measurable difference in overall system consumption after applying the patch.

Or could layout of the PHY traces or magnetics selection have an
impact? I'm interested to learn more if you're willing to share more
of what you've found hardware wise.

Unfortunately we couldn't find any hardware related issues outside the PHY IC. I noticed the behaviour on 2 different board layouts with different transmitters.

I have had problems with my ethernet port, as I have described on
another thread.

I have replaced uImage with your uImage-beaglebone.bin, and so far it
is working. But since it was an intermittent fault, it will take a few
days before I know for sure whether your patch makes a difference.
I'll leave the board running, and report back in a couple of days.

Well that was fast, the PHY is once again not responding.

If I reboot the board (either soft or hard), I get a kernel message:
[ 21.001710] PHY 0:00 not found
See also:
http://groups.google.com/group/beagleboard/browse_thread/thread/a84133aef678f1c9/58847ee0f33b060b?lnk=gst&q=ethernet#58847ee0f33b060b

I suggest you send it in under an RMA so we can look at it from an electrical point of view.

Gerald

I'd rather not. I'm in Europe (Denmark), and I looked up the shipping
cost of returning the board. It is quite a bit more than I paid for
the board (incl. shipping) in the first place :(

I suggest you do it. Just put in the RMA request and see what happens.

Gerald

I have had the board in our lab, and tried to measure if there were
any obvious problems. The only thing I found was that nRST is at about
1.4V, so it looks like the reset signal is not being driven. The PHY
is also not sending link pulses, but it doesn't necessarily need to
because of the reset signal; the PHY could be in powerdown mode. But
as can be seen from the kernel message (PHY 0:00 not found), it is not
responding on the management interface, which definitely should not
happen unless the part is in reset.

I'm going to try to solder a pullup on the reset signal, that should
help. But I won't have time until tomorrow. I'll let you know the
result.

Thanks!

I left the BeagleBone on last night to try to reproduce the PHY
problem again (without the new patch). It took about 19 hours for it
to go down.

root@beaglebone:~# dmesg | tail
[ 7.154876] gadget: high speed config #1: Linux File-Backed Storage
[ 17.944933] eth0: no IPv6 routers present
[ 6514.724823] gpio_request: gpio-56 (sysfs) status -16
[ 6514.724848] export_store: status -16
[ 6677.539971] gpio_request: gpio-38 (sysfs) status -16
[ 6677.539996] export_store: status -16
[ 6677.542518] gpio_request: gpio-56 (sysfs) status -16
[ 6677.542539] export_store: status -16
[70308.135033] PHY: 0:00 - Link is Down
[70308.188652] ip_tables: (C) 2000-2006 Netfilter Core Team
root@beaglebone:~# uptime
19:21:24 up 19:43, 1 user, load average: 0.00, 0.01, 0.05

I have now uploaded the new patched kernel and I'll see what happens.
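
For anyone collecting data points from their own boards, the time to failure can be read straight off the kernel timestamp of the "Link is Down" message. The following is a minimal sketch, assuming each kernel message sits on a single line and carries the usual `[seconds.micros]` prefix; the function name is made up for illustration.

```python
import re


def link_down_hours(dmesg_text):
    """Return hours since boot of the first 'Link is Down' message, or None."""
    for line in dmesg_text.splitlines():
        m = re.match(r"\[\s*(\d+\.\d+)\]\s+PHY: .*Link is Down", line)
        if m:
            return float(m.group(1)) / 3600.0
    return None


# The link-down line from the log above, joined onto one line
sample = "[70308.135033] PHY: 0:00 - Link is Down"
print(round(link_down_hours(sample), 2))
```

For the log above this gives roughly 19.5 hours, which matches the "about 19 hours" observation.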

I suffer from faulty Ethernet too, but only on the latest demo build or
daily build of the software. Currently I'm running 2011.12.13; nothing
newer works.

As I mentioned before, I found that on my board nRST to the PHY was
only 1.4V. According to the LAN8710A datasheet, Vih(min) is also 1.4V,
so it is not surprising that the PHY sometimes starts up, and
sometimes doesn't, since there is no noise margin at all.

It turns out that the reset signal is driven by an open-collector
buffer U16A, with a 10K pull to VDD_3V3A. I measured the VDD_3V3A
supply to be 3.3V. I replaced R23 with a 3.6K resistor, and now I
measure 2.5V for nRST. The board is currently running, and I am going
to leave it on for a couple of days to be sure that the ethernet does
not shut down again.

There must be an internal pull down on the reset pins on either the
LAN8710A or the AM3359 (perhaps both), nothing else is connected to
this net. There is no mention of an internal pull resistor in the
LAN8710A DC specs. I haven't checked the AM3359 datasheet.
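
As a rough check on the pull-down hypothesis, the two measurements can be plugged into the voltage divider formula to see what resistance to ground each one would imply. This is only a sketch: it models whatever is loading the net as a single fixed resistor to ground, which is an assumption, and the function name is made up for illustration. The supply voltage, resistor values, nRST readings and Vih(min) are the ones quoted above.

```python
VDD = 3.3       # measured VDD_3V3A supply, volts
VIH_MIN = 1.4   # LAN8710A Vih(min) as quoted above, volts


def implied_pulldown(r_pullup, v_measured, vdd=VDD):
    """Resistance to ground that would give v_measured with the given pull-up."""
    return r_pullup * v_measured / (vdd - v_measured)


# (pull-up in ohms, measured nRST in volts) for the original and replaced R23
for r_pullup, v in [(10_000, 1.4), (3_600, 2.5)]:
    r_pd = implied_pulldown(r_pullup, v)
    margin = v - VIH_MIN
    print(f"R23={r_pullup} ohm, nRST={v} V: "
          f"implied pull-down {r_pd:.0f} ohm, margin {margin:.1f} V")
```

The two readings imply different resistances (roughly 7.4K and 11K), so a fixed resistor to ground is at best an approximation of what is actually pulling the net down.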

I would suggest that you check the nRST voltage on some more boards.
If they all have marginal values, you should change the value of R23
to something lower.

My fix, where I replaced R23 (10K) with 3.6K, only worked for a short
while. The initial reset voltage was ~2.5V, but after perhaps half an
hour it had fallen to 1.6V, and the Ethernet port was once again not
working. I then replaced the pull-up resistor with 340R, and measured
3.1V for reset. This time I put a scope on the signal, and found that
the reset signal was toggling between ~3.1V and ~2.6V. I am pretty
sure that if I replaced R23 with 3.6K again, I would see the voltage
toggling between 3.3V and 1.6V (the average of these is pretty close
to the 2.5V I measured with a multimeter).
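
The averaging argument can be checked with a couple of lines of arithmetic. This sketch assumes the multimeter reports the time average of a signal that switches between two levels, and that the signal spends equal time at each level unless stated otherwise; both are assumptions, and the function names are made up for illustration.

```python
def average_voltage(v_high, v_low, duty_high=0.5):
    """Time-averaged voltage of a two-level signal."""
    return duty_high * v_high + (1.0 - duty_high) * v_low


def duty_for_average(v_high, v_low, v_avg):
    """Fraction of time at v_high needed to produce the average v_avg."""
    return (v_avg - v_low) / (v_high - v_low)


# Levels suggested above for the 3.6K pull-up: 3.3V and 1.6V
print(round(average_voltage(3.3, 1.6), 2))
print(round(duty_for_average(3.3, 1.6, 2.5), 2))
```

An equal split gives an average just under the 2.5V multimeter reading, and a slightly higher share of time at the high level would match it exactly.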

One explanation for this is that there may be a manufacturing defect
on the board, shorting the reset signal to another signal. Another
possibility is that the CPU is driving the reset signal from the
NRESET_INOUT pin, because it can't be the U16A buffer, which should
easily be able to drive the reset signal to ground, even with a 340
ohm pull to 3.3V.

I have sent full details, including a scope shot, to beagleboard RMA.
I will also be returning the board in a couple of days, so they can
take a look at it.

Yes, there may be a manufacturing defect. That is why we want the board back. We have shipped 4500 boards so far, and only have a few that show this issue, and none of them are in our possession. We may have something that we are missing in our production testing. In order to fix it, we have to figure out what it is. And based on what it is, we can figure out how to test for it and resolve the issue.

Gerald

I shipped the board today. I expect the 'Beagle Hospital' ;) will
receive it on Monday.

Thank you!

Gerald

For me, at least, the patch seems to have worked. I have been running
the patched kernel for 48+ hours on an idle BeagleBone and the
ethernet still works. This is the longest I have been able to run it
without the ethernet dropping.

Thanks,

Octavio

After a reboot this morning the ethernet stopped working again. I will
send it back also.