Beaglebone Black Ethernet Phy Not Detected on Boot.

Ok, so this just happened to me. What I think ultimately caused it in my case is the OS hung on shutdown, so I had to hard-power-off the board with the power button. When it came back up, no network :confused: Connected via the serial terminal and was seeing the same things as others had reported. I have a rev. C board, and this was the first problem I’ve had with it, running 24x7 since it arrived back in July :slight_smile:

Linux envmon 3.8.13-bone47 #1 SMP Fri Apr 11 01:36:09 UTC 2014 armv7l

From what I gleaned from what some folks commented a few posts above, this is what I did, and so far it seems to have worked for me:

  • Logged in as root to the board via the serial terminal
  • ran the command “init 1” to take the OS into emergency maintenance mode. (I wanted to be in single-user mode because of what I was about to do.)
  • Once there, I pressed the reset button on the board once.

When the board rebooted, I had network again :slight_smile:

Pratik,

Thank you very much for sharing your experience with this issue. Could you please provide more details on what you changed in the cpsw platform data, and how?

Actually, rebuilding the kernel sounds a bit dangerous to me, because it may have side effects. For instance, the “dynamical fix” you mentioned results in a situation where the processor communicates with the transceiver chip using some address other than the default 0. OK, now everything works, and that’s fine. But what if some piece of code unknown to you still relies on that default address 0 and gets executed some time later, under certain conditions? Theoretically, it could cause something even worse than the current issue.

Regards,
Alex

to myersco...@gmail.com:

Hi,

In my recent experiments, I noticed that simply pushing the RESET button sometimes helps, while pushing the POWER button sometimes does not. I have the impression that this issue behaves a bit differently from person to person.

Have you tried resetting the board with the RESET button many times, without the “init 1” command, to see whether the RESET button alone can help?

Regards,
Alex

Hi Alex,

What I tried to say is that the fix I applied uses the same logic as the one described by “Jay @ Control Module Industries” in this discussion. Since I cannot use the device tree features of the latest kernels, I made changes that fit kernel 3.2, which is supplied with the TI Android code. In my tests (which included many resets) it seems to be working fine.

I also believe that rebuilding the kernel is not dangerous as long as we know what we are doing. :slight_smile:

Regards,
Pratik

Hi!

I would assume that things would be fine without going into single-user mode. If it happens again, I can try as you suggested and post back :slight_smile:

Is there a finished flash image available that fixes the sw issues? (I won’t solder on my board, so there must be a sw-only solution!)

So I’m still wondering why even the latest revision still has problems in both hw and sw after such a long time.
It’s very annoying to have unstable hw and sw on such an old product :frowning: It seems I’ll sell this board and get an RPi instead.

Hi Pratik,

As I studied the latest kernel code (3.8.13-bone67), I noticed that the patch mentioned by “Jay @ Control Module Industries” is already there, but apparently it doesn’t help.

After a lot of experiments with U-Boot, I’m more convinced that the problem cannot be solved with U-Boot only.

Actually, I included those “rewriting” commands I mentioned earlier in the “bootcmd” variable and rebuilt the U-Boot. But that didn’t work at all, i.e. the required registers of LAN8710A transceiver weren’t rewritten (maybe some delay is required before and after those commands). I also tried rewriting them “manually” in U-Boot command line, as I mentioned earlier, but this time I tried various combinations of “Soft Reset”, “Power Down” and “Restart Auto-Negotiate” commands afterwards. It was all futile.

Even when transceiver settings like PHY address, mode, etc. are correct (as a result of rewriting them manually), the processor doesn’t recognize the transceiver after reset command by U-Boot. And the state of some transceiver registers, containing info about its link partner, indicates that this is the problem of the link partner, i.e. the processor. And in this case, only a power-on or “button” reset can bring the processor to the state where it can detect the transceiver.

Also, I read the processor Ethernet Subsystem registers in U-Boot, both when the transceiver is detected and when it is not. And the difference seems to be in the way the processor uses the content of the MDIOALIVE register (the PHY acknowledge status register), because in both cases that content is correct. For example, if the transceiver powers up with PHY address 0, then bit 0 of MDIOALIVE is set to 1; if the transceiver powers up with PHY address 2, bit 2 of MDIOALIVE is set to 1, etc. So, this means that the processor gets the acknowledge from the PHY with address 2 after trying to access it, according to page 2212 of the AM335x Sitara Processors Manual, and knows that a transceiver with PHY address 2 is around. But then, maybe some time later, the processor fails to get the data from the transceiver, because it tries to read from a wrong PHY address, which is reflected by the state of the MDIOUSERACCESS0 register (page 2216 of the AM335x manual).
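To illustrate the MDIOALIVE reading described above, here is a minimal Python sketch that lists which PHY addresses have acknowledged, given a raw register value. The function name is made up for illustration, and the sketch assumes only what is stated above: bit n set means the PHY at address n responded.

```python
def alive_phy_addresses(mdio_alive):
    """Return the PHY addresses (0..31) whose bit is set in an MDIOALIVE value."""
    return [addr for addr in range(32) if (mdio_alive >> addr) & 1]


# Transceiver strapped to PHY address 0: only bit 0 is set.
addresses_when_ok = alive_phy_addresses(0x00000001)

# Transceiver came up with PHY address 2: only bit 2 is set.
addresses_when_shifted = alive_phy_addresses(0x00000004)
```

In both cases the register itself correctly reports where the PHY is; the question raised in the post is whether later code actually uses that address.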

When the PHY address is 0, the content of MDIOUSERACCESS0 is 0x23e01058, which means that the data 0x1058 (bits 15-0) was read from PHY address 0 (bits 20-16), from PHY register 31 (bits 25-21), and the PHY acknowledged the read transaction (bit 29). When the PHY address is 2, the content of MDIOUSERACCESS0 is 0x0020ffff, i.e. the processor attempted to read from PHY address 0 (despite the content of MDIOALIVE suggesting that the PHY address is 2) and, of course, failed. This failure happens somewhere in the U-Boot code (I guess, in drivers/net/davinci_emac.c), but the same seems to happen in the kernel code (drivers/net/ethernet/ti/davinci_mdio.c) in spite of that patch, whose purpose is to update the device tree using the content of the MDIOALIVE register.
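The two register values quoted above can be unpacked with a few shifts and masks. This is only a sketch in Python, using the bit positions given in the post (data 15-0, PHY address 20-16, register 25-21, acknowledge 29); the function and key names are invented for illustration.

```python
def decode_mdio_user_access(value):
    """Split an MDIOUSERACCESS0 value into the fields named in the post."""
    return {
        "ack": (value >> 29) & 0x1,      # bit 29: PHY acknowledged the transaction
        "regadr": (value >> 21) & 0x1F,  # bits 25-21: PHY register number
        "phyadr": (value >> 16) & 0x1F,  # bits 20-16: PHY address that was accessed
        "data": value & 0xFFFF,          # bits 15-0: data read or written
    }


good = decode_mdio_user_access(0x23E01058)  # board where the PHY sits at address 0
bad = decode_mdio_user_access(0x0020FFFF)   # board where the PHY came up at address 2
```

Decoding the second value gives a PHY address field of 0, no acknowledge, and data 0xffff, which matches the reading that the access went to address 0 even though the PHY was at address 2.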

Maybe all this happens because step 4 of the MDIO module initialization procedure, mentioned on page 1972 of the AM335x manual, is not present in the code? In that step, it is necessary to save the PHY address in the MDIOUSERPHYSEL0 register, to monitor the link status of the respective PHY.

Alex

Hi Alex,

In the past couple of days, I went down the same path as you did. However, in my limited tests, I see an improvement with 3.8.13-bone68. I have a BBB Rev C from Element14 with a generic 5V 2.8A switching power supply.

- with 3.8.13-bone47 (the original image), 50 power cycles: 2 "phy not found"
- with 3.8.13-bone68 (prebuilt image from http://rcn-ee.net/), 50 power cycles: no "phy not found"
- with 3.14.22-ti-r31, 50 power cycles: no "phy not found"

I'll do more tests to see if the issue shows up. In your tests, did you see any improvement in 3.8.13-bone67 vs. earlier versions?

Thanks

KeOu

Hi KeOu,

I guess with 3.8.13-bone67 I had “phy not found” as frequently as with the original image. On one board with 3.8.13-bone67, I had just one occurrence of “phy not found” in 50 power-on resets. On the other board with the same kernel I had 22 “phy not found” out of 50 power-on resets. Then I updated the kernel to 3.8.13-bone68, and still had that error, 12 times out of 50 power-on resets.

To perform that kernel update, I followed the instructions from the error message that showed up after I attempted to run “install-me.sh” from http://rcn-ee.net/deb/wheezy-armhf/v3.8.13-bone68/ Could you please describe in more detail what prebuilt image you tried and how exactly you installed that? Did you also install some other U-Boot?

Some time ago, I installed and tried one of the rcn-ee.net images following a procedure similar to that described here. But it didn’t help.

There is one important thing to remember: “phy not found” happens before autoboot in U-Boot, and if that happens, no matter which Linux is loaded afterwards - the transceiver chip doesn’t work properly. So, both 3.8.13-bone67 and 3.8.13-bone68 fail to detect the chip, if it wasn’t detected by U-Boot.

Regards,
Alex

Alex,

I did more testing using a remote power switch, with 3 BBBs running 3 different kernels and no modifications to U-Boot or the kernel.

  • 3.8.13-bone47 - that came with the board
  • 3.8.13-bone68 - upgraded via the on-board script /opt/scripts/tools/update_kernel.sh
  • 3.14.22-ti-r31 - downloaded from rcn-ee.net

My test script turns the power switch on, waits 50 seconds, then sends a TCP SYN request once per second, 5 times. If no ACK is received, it declares a failure. It then powers off for 5 seconds and repeats.
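The test loop described above could be sketched roughly as follows in Python. This is not the actual script: the power switch control is passed in as two hypothetical callables (`power_on`, `power_off`), and the probe assumes the board has some TCP port listening (for example SSH), so that a completed connection stands in for "SYN answered with an ACK".

```python
import socket
import time


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_power_cycles(power_on, power_off, probe, cycles=600,
                     boot_wait=50, attempts=5, off_time=5, sleep=time.sleep):
    """Power-cycle the board `cycles` times and count cycles with no network."""
    failures = 0
    for _ in range(cycles):
        power_on()
        sleep(boot_wait)              # give the board time to boot
        reachable = False
        for _ in range(attempts):     # up to 5 probes, roughly one per second
            if probe():
                reachable = True
                break
            sleep(1)
        if not reachable:
            failures += 1             # no answer at all: count as a failure
        power_off()
        sleep(off_time)               # keep the board off before the next cycle
    return failures
```

A call might look like `run_power_cycles(switch_on, switch_off, lambda: tcp_probe("192.168.1.50", 22))`, where `switch_on`, `switch_off` and the address are placeholders for whatever the remote power switch and network setup provide.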

One thing I found is that with the console port connected, there were no failures in any of my tests (other than with bone47). Do you have the console port connected when testing?

These are my test results (number of failures and failure rate); each test ran 600 power cycles:

| Kernel | Test 1 | Test 2 | Test 3 | Test 4 |
|---|---|---|---|---|
| 3.8.13-bone47 | 50, 8.33% (a) | 46, 7.76% (a) | 41, 6.83% (a) | 40, 6.67% (b) |
| 3.8.13-bone68 | 20, 3.33% (a) | 0, 0% (b) | 0, 0% (b) | 21, 3.5% (a) |
| 3.14.22-ti-r31 | 0, 0% (b) | 27, 4.5% (a) | 0, 0% (c) | 24, 4% (a) |

(a) no console connected
(b) with console connected
(c) with patch - DGND to console GND, VDD_3V3 to console Tx, VDD_3V3 to console Rx.

I have just started looking into this and have not digested all the notes and manuals yet. I hope the information helps.

Regards

KeOu

KeOu,

Thank you for the data. It looks like “phy not found” does not happen very often with your boards.

I also ran that “update_kernel.sh” to update the kernel to 3.8.13-bone68. I tried 3.14.22-ti-r31 too, and had a lot of “phy not found” in that case as well. In all my experiments the console port was always connected to a PC.

Regards

Alex

Alex,

This is very interesting; in your case, the debug connection did not make any difference. Assuming the TTL connection does make a difference, can you try (c), with the patch between the GPIO header (P9) and the debug port? In my case, with this patch, I do not see any Ethernet failures.

P9 1 (GND) to J1 (serial GND)
P9 3 (VDD_3V3) to J4 (serial RXD)
P9 4 (VDD_3V3) to J5 (serial TXD)

Without the debug port connection, I just run ping continuously from a terminal; a successful response shows that Ethernet came up after the power cycle.

There was a discussion on the TI E2E forum that seems to describe the same issue we have here:
Phy Address Issue beaglebone U-boot - Sitara Processors Forum - Sitara™ Processors - TI E2E Community

Regards

KeOu

Where can I find the variable CONFIG_OMAP_PLATFORM_RESET_TIME_MAX_USEC?
Did you try to increase it, Robert Nelson? Normally this should fix the problem with the PHY not being detected.

If not, why?

Thank you,

I remember a thread in TI’s forum stating that this issue can’t be resolved by software. There is a hardware problem with the PHY which requires either a power cycle or a reset of the PHY. And on the BBB the reset line is not connected. To solve this issue via software, a connection from a GPIO to this reset line would be required.

Try to find this thread after TI has changed its webboard-software, things are way more complicated there…

OK, I found the thread I mentioned: http://e2e.ti.com/support/arm/sitara_arm/f/791/t/347189

KeOu,

I made the connections you described and powered the board up, but still had that “phy not found”, 7 times out of 10.

Also, I did another interesting experiment: in U-Boot, I manually changed the PHY address of the transceiver to 2, after it had started successfully with the address 0 (“mdio write 0 18 0xe2”). Then I restarted the CPU by the U-Boot command “reset” and noticed that the network works anyway, with PHY address 2, both in U-Boot and when Linux runs, although U-Boot shows the message “Phy not found” and fails to list MDIO buses (i.e. shows “0 - Generic PHY <--> cpsw” instead of the expected “2 - SMSC LAN8710/LAN8720 <--> cpsw” after running the command “mdio list”).
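For readers wondering why the value 0xe2 corresponds to PHY address 2, here is a small Python sketch that splits the value written to register 18. It assumes the LAN8710A Special Modes register layout from the transceiver datasheet (MODE in bits 7-5, PHYAD in bits 4-0); that layout and the function name are not stated in this thread, so treat them as assumptions.

```python
def decode_special_modes(value):
    """Split a LAN8710A register 18 value into MODE and PHYAD (assumed layout)."""
    return {
        "mode": (value >> 5) & 0x7,  # bits 7-5: transceiver mode of operation
        "phyad": value & 0x1F,       # bits 4-0: PHY address
    }


fields = decode_special_modes(0xE2)  # value used in "mdio write 0 18 0xe2"
```

With that layout, 0xe2 keeps the mode bits at 7 and sets the PHY address field to 2.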

So, that means that a “nonstandard” PHY address of a transceiver alone is not a problem, it is totally acceptable to the CPU, and I was wrong when I said that this patch doesn’t work. There is some difference between this “non-zero address case” and the case when the network really doesn’t work. In the latter case the transceiver not only has a wrong PHY address (which wouldn’t be a problem alone), but also remains in some “unhealthy” state, that could be changed only by the reset signal on the nRST.

Regards

Alex

Micka,

Yes, you’re right, the network can still work after “Phy not found” was shown, although I observed this situation very rarely (I guess, only once), when I simply powered the board up. But this situation can be created artificially, if you intentionally change PHY address to a non-zero value, as I described in my previous message (there is probably some bug in U-Boot). It’s just convenient to call this problem “phy not found”.

As for PRCM.PRM_RSTTIME, I’m studying the U-Boot code and so far haven’t found any place where this register is written, except for some code apparently intended to run on other processors, not on am33xx. And I think the macro CONFIG_OMAP_PLATFORM_RESET_TIME_MAX_USEC is also related to some other boards or processors, because the code where it is used assumes that the PRM_RSTTIME register contains a number of 32.768 kHz clock cycles in bits [9:0] RSTTIME1, while the same register on AM335x contains a number of 24 MHz clock cycles (CLK_M_OSC) in bits [7:0] RSTTIME1. I need to read the datasheet more to understand better what it is.
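The difference between the two RSTTIME1 layouts is easier to see as numbers. The Python sketch below computes the longest reset time each layout can express, using only the field widths and clock rates mentioned above; the function name is made up for illustration.

```python
def max_reset_time_usec(field_bits, clock_hz):
    """Longest duration, in microseconds, that a cycle-count field can hold."""
    max_cycles = (1 << field_bits) - 1
    return max_cycles * 1_000_000 / clock_hz


# Layout the U-Boot code assumes: bits [9:0], counted in 32.768 kHz cycles.
other_soc_max = max_reset_time_usec(10, 32_768)    # about 31 ms

# Layout on AM335x per the post: bits [7:0], counted in 24 MHz cycles.
am335x_max = max_reset_time_usec(8, 24_000_000)    # 10.625 microseconds
```

Under these assumptions the AM335x field tops out at roughly ten microseconds, so a value computed for the 32.768 kHz layout would not carry over meaningfully.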

Regards

Alex

Karl,

Thank you for the link. So, according to that thread, we cannot start the board reliably without modifying the hardware.

But what about doing something with nRESETIN_OUT pin? The datasheet says that the pin can be used to reset external devices, although it recommends using the pin as input only (page 1149). I wonder whether a special reset signal, generated on that pin to reset the transceiver chip, will work or not.

Regards

Alex