Problem: A start job is running for LSB ...

g41 · July 8, 2015, 7:18pm

3 B3s, all the same rev, all running 4.1.0-bone9, configured identically to boot via TFTP/NFS (not all at the same time, I hasten to add). 1 machine works perfectly. The other 2 both reach the same point then hang. Any thoughts?

TAIA

[ 0.000000] Booting Linux on physical CPU 0x0

[ 0.000000] Initializing cgroup subsys cpuset

[ 0.000000] Initializing cgroup subsys cpu

[ 0.000000] Initializing cgroup subsys cpuacct

[ 0.000000] Linux version 4.1.0-bone9 (jevans@acermint) (gcc version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11) ) #1 Wed Jul 1 13:31:10 BST 2015

[ 0.000000] CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=50c5387d

[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache

[ 0.000000] Machine model: TI AM335x BeagleBone Black

[ 0.000000] cma: Reserved 16 MiB at 0x9f000000

[ 0.000000] Memory policy: Data cache writeback

[ 0.000000] CPU: All CPU(s) started in SVC mode.

[ 0.000000] AM335X ES2.0 (sgx neon )

[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 129920

[ 0.000000] Kernel command line: console=ttyO0,115200n8 root=/dev/nfs rw rootfstype=ext4 rootwait fixrtc nfsroot=192.168.1.110:/home/bone/rootfs,vers=3 ip=192.168.1.11:192.168.1.110:192.168.1.254:255.255.255.0::eth0:off

xxx--------elided-------------xxx

[ OK ] Started Trigger Flushing of Journal to Persistent Storage.

[ OK ] Started LSB: Tune IDE hard disks.

[ OK ] Started Create Volatile Files and Directories.

Starting Network Time Synchronization…

Starting Update UTMP about System Boot/Shutdown…

[ OK ] Started Update UTMP about System Boot/Shutdown.

[ OK ] Started Network Time Synchronization.

[ OK ] Reached target System Time Synchronized.

[ OK ] Found device /dev/ttyO0.

[ 12.709682] omap_rng 48310000.rng: OMAP Random Number Generator ver. 20

[ 12.966802] tda998x 0-0070: found TDA19988

[ 12.997206] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).

[ 13.004323] [drm] No driver support for vblank timestamp query.

[ 13.070366] tilcdc 4830e000.lcdc: No connectors reported connected with modes

[ 13.134136] [drm] Cannot find any crtc or sizes - going 1024x768

[ 13.214789] Console: switching to colour frame buffer device 128x48

[ 13.227549] tilcdc 4830e000.lcdc: fb0: frame buffer device

[ 13.233424] tilcdc 4830e000.lcdc: registered panic notifier

[ 13.339779] [drm] Initialized tilcdc 1.0.0 20121205 on minor 0

[ 13.376166] omap-sham 53100000.sham: hw accel on OMAP rev 4.3

[ 13.432581] omap-aes 53500000.aes: OMAP AES hw accel rev: 3.2

[ ***] A start job is running for LSB: Raise network interf…28s / no limit)

[ 44.972386] nfs: server 192.168.1.110 not responding, still trying
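
For anyone decoding the kernel command line in the log above: the kernel's nfsroot documentation gives the field order of `ip=` as client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf. A minimal Python sketch that splits the value from this log into those fields follows. The helper name `parse_ip_param` is hypothetical and only for illustration; it is not part of any tool mentioned in this thread.

```python
# Field order as documented for the kernel's ip= boot parameter (nfsroot).
IP_FIELDS = ("client_ip", "server_ip", "gateway_ip", "netmask",
             "hostname", "device", "autoconf")


def parse_ip_param(value):
    """Split an ip= value into named fields; missing fields become ''.

    Hypothetical helper for illustration only.
    """
    parts = value.split(":")
    parts += [""] * (len(IP_FIELDS) - len(parts))
    return dict(zip(IP_FIELDS, parts))


# Value copied from the kernel command line in the boot log.
cmdline_ip = "192.168.1.11:192.168.1.110:192.168.1.254:255.255.255.0::eth0:off"

for name, field in parse_ip_param(cmdline_ip).items():
    print(f"{name:11} {field or '(empty)'}")
```

Read this way, the command line hands the kernel a static address on eth0 with autoconfiguration switched off.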

William_Hermans · July 8, 2015, 7:44pm

Yep, take the one that does boot out of the loop, then try booting the other two. If the first of those to boot works, then you have a problem with your configuration, or possibly the network.

Are you using GbE + a GbE switch between the server and the bones?

William_Hermans · July 8, 2015, 8:10pm

Also, this seems to be a decent “guide” to help you troubleshoot.

https://access.redhat.com/solutions/28211 It’s redhat, but Linux is Linux so long as there are not any redhat-isms.

William_Hermans · July 8, 2015, 11:27pm

Oops, I focused on the aftermath and not the error.

Anyhow g4, which rootfs are you using? I was just reading a bug report for sid from last year that indicates that the error:

[ ***] A start job is running for LSB: Raise network interf…28s / no limit) can be related to systemd-sysv trying to use / load a resource ( $network ) which is not available. With that said, it is odd that one image works while the other two do not. Unless they all work fine one at a time . . . then I’m still leaning towards limited bandwidth or a server misconfiguration.

RobertCNelson · July 8, 2015, 11:30pm

He's got the jessie image, which has systemd. ( i thought systemd/nfs
would just not be working period, so i had kept my mouth shut when he
got it working.. )

Regards,

William_Hermans · July 8, 2015, 11:54pm

It would be interesting if all 3 will boot individually to get a look at ps aux from each board.

William_Hermans · July 8, 2015, 11:55pm

and pstree for that matter.

g41 · July 9, 2015, 10:32am

He's got the jessie image, which has systemd. ( i thought systemd/nfs would
just not be working period, so i had kept my mouth shut when he got it
working.. )

Yep. The Jessie minimal image.

g41 · July 9, 2015, 10:33am

and pstree for that matter.

Sadly no can do. The 2 that do not boot will not boot at all. Ever.

William_Hermans · July 9, 2015, 5:47pm

Hmm, I wonder if there is an easy way to diff the images. I’ve never used diff, so I do not rightly know.
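
On diffing the images: one way to sketch this, assuming each board's root filesystem has been mounted or copied to its own directory, is Python's standard `filecmp` module. The helper name `report_differences` and the mount points in the usage comment are hypothetical. `dircmp` compares files shallowly (by type, size and modification time first), so treat this as a rough first pass rather than a byte-exact check.

```python
import filecmp


def report_differences(left, right, prefix=""):
    """Recursively list paths that differ between two directory trees.

    Hypothetical helper; relies on filecmp.dircmp's shallow comparison.
    """
    cmp = filecmp.dircmp(left, right)
    found = []
    found += [f"only in first:  {prefix}{n}" for n in cmp.left_only]
    found += [f"only in second: {prefix}{n}" for n in cmp.right_only]
    found += [f"differs:        {prefix}{n}" for n in cmp.diff_files]
    # Descend into directories that exist on both sides.
    for name, sub in cmp.subdirs.items():
        found += report_differences(sub.left, sub.right, f"{prefix}{name}/")
    return found


# Usage (paths are placeholders for wherever the two images are mounted):
#   for line in report_differences("/mnt/board_a", "/mnt/board_b"):
#       print(line)
```

From a shell, `diff -rq` on the same two directories gives a similar listing.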

Robert ?

g41 · July 9, 2015, 7:17pm

I got down to the stage of using Wireshark to trace the DHCP requests made during boot. The good board issues one, immediately gets an address and proceeds to the TTY. The others issue 2 DHCP requests that are never acknowledged, despite apparently continuing to retry (as per the serial log content). 'Interestingly' the same boards are slow at exactly the same point when booting off an SD card. So I really am at a loss here. Bear in mind all 3 have identical flashed images, uEnv settings and are using the same TFTP server + NFS filesystem.

There is a bug report from late 2014 regarding a problem in ifup (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771943) which mentions systemd.debug-shell. I've got this added to the kernel command line but I'm not sure if it is accessible.

Does the omap-image-builder Ubuntu distro work with NFS?

TAIA.

William_Hermans · July 9, 2015, 7:32pm

Ok, that is interesting. So before making any changes (you really do not want to complicate things right now any more than you have to), from the NFS server issue the command:

sudo exportfs

and paste the output to us here. From the sound of it, assuming every board is configured exactly alike, I’m thinking that you only have one address exported, which is why you’re only getting one board to load over NFS / TFTP.

William_Hermans · July 9, 2015, 7:35pm

Also, it would be useful to know how you have DHCP set up. E.g. do you have a DHCP server external to the NFS server (a router with its DHCP server enabled)? Or is the DHCP server also running on the NFS server machine?

Mike · July 9, 2015, 8:04pm

So you see the DHCP request; bump up the logging level on your DHCP server and look at its logs. Could be a config issue with the DHCP server. Does Wireshark show a unique MAC address in each DHCP request?

Mike

William_Hermans · July 9, 2015, 8:15pm

BTW

Does the omap-image-builder Ubuntu distro work with NFS?

Last I heard, Ubuntu, period, did not work with NFS root. There were a couple of other distros that did not, too. Arch does, as does one other that I cannot recall offhand. Oh right, Angstrom + NFS root is also a no-go.

RobertCNelson · July 9, 2015, 8:17pm

and the Debian "Wheezy" 7.x works fine under nfs..

(my guess it's really sysv vs systemd...)

(If one were to switch our jessie image from systemd to the legacy
sysv it probably would work under nfs..)

Regards,

William_Hermans · July 9, 2015, 9:00pm

(my guess it’s really sysv vs systemd…)

I do not think so. I do not doubt that systemd is problematic where nfsroot is concerned. But one board is booting, and the other two are not, with DHCP requests seemingly being ignored.

What this tells me is that the two boards in question are for some reason not considered to exist as far as the server is concerned. I have very little experience with setting up / using a Debian DHCP server, so I am not sure how that would come into play. But I do know that when using static IPs, the NFS server will ignore any addresses not listed in the NFS exports file and not exported via exportfs -a.

The problem here seems very similar, but perhaps not directly related. The server for some reason is just seemingly ignoring the other two clients.

@g4 Also, if you are unable to resolve this problem otherwise, you can try to set up static IPs for each individual BBB, and then make 3 separate NFS exports in the NFS exports file. These exports could point to the same directory, but each would need to specify the static IP of one of the BBBs. You can also use the range modifier ( xxx.xxx.xxx.xx[0-9] etc). Then perhaps from there we can be closer to an explanation.
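
The per-board exports suggested above could look roughly like the lines this Python sketch prints. It assumes the exports(5) layout of directory first, then client and options. The directory and the first address come from the kernel command line earlier in the thread; the other two addresses, the option set and the helper name `exports_lines` are assumptions made purely for illustration.

```python
EXPORT_DIR = "/home/bone/rootfs"      # from nfsroot= on the kernel command line
CLIENT_IPS = [
    "192.168.1.11",                   # from ip= on the kernel command line
    "192.168.1.12",                   # hypothetical second board
    "192.168.1.13",                   # hypothetical third board
]
OPTIONS = "rw,sync,no_subtree_check,no_root_squash"   # assumed option set


def exports_lines(directory, clients, options):
    """Build one exports(5)-style line per client: '<dir> <host>(<options>)'."""
    return [f"{directory} {ip}({options})" for ip in clients]


for line in exports_lines(EXPORT_DIR, CLIENT_IPS, OPTIONS):
    print(line)
```

After editing the exports file on the server, `exportfs -a` re-exports the entries and `sudo exportfs` lists what is actually being exported.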

So, this problem is really hard to troubleshoot over the internet, because:

You have not given us enough information really. So we have no idea what steps you’ve done to get where you are now. Too many variables to consider.
You say they are all exactly alike, but we do not really know that for sure. Not a character “hit”, but there is no telling what is going on behind the scenes.

g41 · July 9, 2015, 10:41pm

So you see the DHCP request; bump up the logging level on your DHCP server and look at its logs. Could be a config issue with the DHCP server.

Agreed. But I'm going to have to rewire my network with a more dev. friendly DHCP server. Right now it's all being allocated by a BT ADSL router.

Does Wireshark show unique mac address in the dhcp request?

Yep. Even down to recognizing the Texas Instruments ID

g41 · July 10, 2015, 12:53am

and the Debian "Wheezy" 7.x works fine under nfs..

OK. I will have to give that a shot. Are there any issues using it with the 4.1.0 kernel?

William_Hermans · July 10, 2015, 1:12am

g4, I’m pretty sure the only real known issues relate to systemd, and upstart in the case of Ubuntu