My board is back from RMA, with what looks like a new ethernet chip. It is
ever so slightly raised up along one edge, though the pin alignment and
soldering job are perfect. I guess it could have been that way before, but
usually that's a sign of manual replacement. The only info I was able to
get from the RMA Team was:
---
After running the diagnostic tests, we found that there was a Ethernet
malfunction. We have fixed the issue and everything is properly working.
---
The board was carefully solvent cleaned after the repair; a little glob of
glue or rosin I had noticed before is now gone. But I noticed lots of tiny
solder splashes on the bottom of the board, mostly along the expansion
connector pins. A couple of them could have been a real problem if the
board coating hadn't protected the traces. All popped off easily with a
fingernail or blunt plastic tool.
So far, the board boots fine and works as expected.
The differences between a successful boot (<) and the earlier panic (>):
< cpsw, usb_ether
---
> Phy not found <-- with bad ethernet, just before reading uEnv.txt
> PHY reset timed out
> cpsw, usb_ether
< [ time ] pinctrl-single 44e10800.pinmux: could not request pin 21 on device pinctrl-single
< systemd-fsck[85]: Angstrom: clean, 49509/112672 files, 354728/449820 blocks
< [ time ] libphy: PHY 4a101000.mdio:01 not found <-- with good ethernet!
< [ time ] net eth0: phy 4a101000.mdio:01 not found on slave 1 <-- last line before logo
<
< .---O---.
< | | .-. o o
< | | |-----.-----.-----.| | .----..-----.-----.
< | | | __ | ---'| '--.| .-'| | |
< | | | | | |--- || --'| | | ' | | | |
< '---'---'--'--'--. |-----''----''--' '-----'-'-'-'
< -' |
< '---'
---
> [ time ] pinctrl-single 44e10800.pinmux: could not request pin 21 on device pinctrl-single
> [ time ] Unhandled fault: external abort on non-linefetch (0x1008) at 0xe09fe000
So in both conditions it complains about "phy not found"! With a bad chip,
it complains near the beginning of U-Boot. With working ethernet, it
complains at the very end of kernel boot. It seems like someone who knows
the details of cpsw_probe needs to figure out how to make it report a
failed ethernet chip gracefully, and why libphy still reports an error when
the ethernet is good and boot is successful.
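A comparison like the one above can be reproduced with a short script. This is only a sketch, assuming each boot was captured from the serial console into a list of lines; the function names are made up for illustration, and the "[ time ]" token simply mirrors the placeholder used in the listing above. It replaces kernel timestamps so that otherwise identical lines compare equal, then keeps only the lines that differ.

```python
import difflib
import re

# Kernel log lines start with a timestamp such as "[    1.234567]".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]")


def normalize(lines):
    """Strip trailing whitespace and replace kernel timestamps with a fixed token."""
    return [TIMESTAMP.sub("[ time ]", line.rstrip()) for line in lines]


def boot_diff(good_lines, bad_lines):
    """Return only the differing lines, prefixed '< ' (good boot) or '> ' (bad boot)."""
    good = normalize(good_lines)
    bad = normalize(bad_lines)
    out = []
    for line in difflib.ndiff(good, bad):
        if line.startswith("- "):      # present only in the good log
            out.append("< " + line[2:])
        elif line.startswith("+ "):    # present only in the bad log
            out.append("> " + line[2:])
        # unchanged lines ("  ") and hint lines ("? ") are skipped
    return out
```

To use it, read the two captures with something like open(path).readlines() and print the result of boot_diff(good, bad) one line at a time. The grouping of lines may not match what the diff command prints, but the set of differing lines should be the same.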
I'm finally able to log in and view files. I'm wondering if these are
standard, or left over from the RMA testing:
---
root@beaglebone:/# cat /media/BEAGLEBONE/uEnv.txt
optargs=quiet drm.debug=7
root@beaglebone:/# cat /media/BEAGLEBONE/uEnv.txtboot
optargs=run_hardware_tests quiet
---
After receiving the board back, I couldn't use VNC or SSH, though I could
ping the ethernet ports. In both cases Wireshark showed my external request
followed by an immediate RST from the BBB. I tried re-installing the
previous VNC package, but it said "Package x11vnc (0.9.13-r0.8) installed
in root is up to date." Still, the trick to make it load itself didn't seem
to work. I found
http://feeds.angstrom-distribution.org/feeds/v2012.12/ipk/eglibc/all/angstrom-x11vnc-xinit_1.0-r2.0_all.ipk
and that installed and worked immediately after a restart. The "netstat
-lntu" command did not see it until after it was active, even though it did
seem to see all the other open ports immediately after booting.
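An immediate RST in reply to a connection attempt usually means nothing is listening on that port (the client sees "connection refused"), which is a different symptom from packets being silently dropped. A quick way to tell the cases apart from the client side, without Wireshark, is a small probe like the one below. It is a sketch only: the function name is made up, and the address in the usage note is an assumption about your setup.

```python
import socket


def probe_port(host, port, timeout=2.0):
    """Classify a TCP port as 'open', 'refused', 'timeout', or 'error'."""
    try:
        # create_connection performs the full TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # the peer answered the SYN with RST: no listener on that port
        return "refused"
    except socket.timeout:
        # no answer at all within the timeout
        return "timeout"
    except OSError:
        # anything else, e.g. host unreachable or name lookup failure
        return "error"
```

For example, probe_port("192.168.7.2", 22) would check SSH and probe_port("192.168.7.2", 5900) would check VNC, assuming the board is reachable at that address and the servers use those usual ports.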
SSH was trickier. I finally found this in another thread on Google Groups:
-----
"ssh_exchange_identification: Connection closed by remote host"
From looking at the script above (/etc/init.d/dropbear) it seems like the
identity file in /etc/dropbear/dropbear_rsa_host_key might be causing the
problem and the script recreates them if they don't exist. So I removed it
and started dropbear (/etc/init.d/dropbear start) again and it generated
new keys and then I could ssh in. It now works! (The side effect of doing
this is you also have to remove a line in the client's ~/.ssh/known_hosts
because the identity of the beaglebone has changed.)
-----
My /etc/dropbear/dropbear_rsa_host_key file was zero-length, so I removed
it. The "dropbear start" command didn't work for me; a BBB restart was
required after I manually deleted the key file. I also unchecked the
"History" box in TeraTerm, and it saved a new RSA fingerprint. SSH now
works with the default password choice and a blank password field, and also
works with Tunnelier.
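For anyone hitting the same symptom, the zero-length key check can be scripted. The sketch below only looks at the file and removes it if it is empty; it relies on the behaviour described in the quoted post, that /etc/init.d/dropbear regenerates a missing key, and in my case that only happened after a restart. The function names are made up for illustration.

```python
import os


def host_key_status(path):
    """Return 'missing', 'empty', or 'present' for a host key file."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "present"


def remove_if_empty(path):
    """Delete a zero-length key file; return True if it was removed."""
    if host_key_status(path) == "empty":
        os.remove(path)
        return True
    # a missing or non-empty key is left alone
    return False
```

On the board this would be called as remove_if_empty("/etc/dropbear/dropbear_rsa_host_key"), run as root, followed by a restart.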
Thanks for updating the thread. Good to know it's working now.
Other random things I just learned...
At least on Windows, when the USB cable is connected, there is a "Gadget
Serial" device USBSER000 from "Linux Developer Community" available as a
COM port (ttyGS0 in the BBB), alongside the "USB Serial Port" VCP0 from
FTDI which is my debug console adapter COM port (ttyO0 in the BBB). The
"gadget" port is only active after boot is complete, so I didn't have much
opportunity to see it before! But it claimed a lower COM port number, so I
assume it installed along with the ethernet gadget when I first connected
via USB.
That leaves the question, could I have somehow fried my ethernet chip? I
checked my incoming cable and it is fully DC isolated. The connector on the
BBB is fully DC isolated. It is not a POE-capable connector; there is no
diode array that could feed power into the grounded pin 8. So if I did
something to cause my failure, it was not through the ethernet cable.
Hmm, this could be a one-off case. I guess if there are more instances like
this, then someone needs to dig deeper. For now, just hack away.