Debian 8.1 / kernel 4.1.x test releases are unstable

Results from overnight test:

I used the worst rebooters for some tests:

(1) System bb1cf1 got installed with 3.19.3-bone4: no more reboot
uptime
03:23:37 up 14:50, 1 user, load average: 0.00, 0.01, 0.05

(2) System bb6c1f: installed with 4.1.1-ti-r2 #1 SMP PREEMPT Wed Jul 8 17:03:29 UTC 2015 armv7l GNU/Linux: 2 reboots
Jul 13 00:55:17 bb6c1f kernel: [ 0.000000] Booting Linux on physical CPU 0x0
Jul 13 01:51:02 bb6c1f kernel: [ 0.000000] Booting Linux on physical CPU 0x0

(3) System bb4f8e still has 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux: but cpufreq-set -g performance: no more reboot
uptime
03:29:57 up 9:01, 1 user, load average: 0.02, 0.06, 0.05

As I have to leave for the day, I will let all my systems run for at least 12h without changes.
If the results still look like this afterwards, I will apply (1) and (3) to some more devices.

@RobertCNelson: If you have further suggestions for which image to test, let me know.

— Guenter (dl4mea)
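For anyone who wants to tally reboots the same way, here is a minimal sketch that counts the "Booting Linux on physical CPU 0x0" markers shown above. The function names are made up for illustration, and the log path in the usage comment is an assumption (it depends on how syslog is set up on the image).

```shell
#!/bin/sh
# Sketch: count kernel boot markers in a syslog-style file.
# count_boots and list_boots are hypothetical helper names.

BOOT_MARKER='Booting Linux on physical CPU 0x0'

# Print the number of boot markers found in the file given as $1.
count_boots() {
    grep -c "$BOOT_MARKER" "$1"
}

# Print only the syslog timestamp (month, day, time) of each boot.
list_boots() {
    grep "$BOOT_MARKER" "$1" | awk '{print $1, $2, $3}'
}

# Usage (path is an assumption, adjust to your system):
#   count_boots /var/log/kern.log
#   list_boots /var/log/kern.log
```

Note that the count includes the intentional first boot, so the number of unexpected reboots is one less if the log starts at installation.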

(3) System bb4f8e still has 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux: but cpufreq-set -g performance: no more reboot

Interesting . . . If memory serves correctly, that was the “fix” for an older kernel. So possibly older code crept into the newer one?

OK.

I took my Rev.C unit (1c:ba:8c:d9:5e:dd) and loaded “bone-debian-8.1-lxqt-4gb-armhf-2015-07-05-4gb.img” onto a 16 GB uSD card. Unit, power supply and card are “trusted.”

Absolutely no changes to the image, just install, boot, run. No updates, additions or modifications.
No cape, only connections are 5V power and Ethernet.

Times are GMT/UTC. I define the boot completion as the time when systemd updates the internal time from the network.

Initial boot completion: Jul 12 19:12:09
Autonomous reboot: Jul 13 10:54:19

This time it took a little under 16 hours (15 h 42 min) for the autonomous reboot to occur. I’ll let this one keep going and report back.

— Graham
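The uptime before the autonomous reboot can be worked out from the two timestamps above. A small sketch, assuming a `date` that accepts `-d` with a "YYYY-MM-DD hh:mm:ss" string (GNU coreutils does); the function name is hypothetical, and the year 2015 is taken from the image name since the log timestamps do not carry it.

```shell
#!/bin/sh
# Sketch: elapsed time between two UTC timestamps, printed as HH:MM:SS.
# elapsed_hms is a made-up helper name.

elapsed_hms() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    diff=$((end - start))
    printf '%02d:%02d:%02d\n' \
        $((diff / 3600)) $((diff % 3600 / 60)) $((diff % 60))
}

# Usage with the times reported above:
#   elapsed_hms "2015-07-12 19:12:09" "2015-07-13 10:54:19"
#   prints 15:42:10
```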

What does “cpufreq-set -g performance” do?

Sounds like it would lock the BBB at max CPU clock speed, or at least, not let it go down to the lowest speeds.

I found the generic Debian docs on cpufreq-set, but not the BBB specific instruction set and meanings.

— Graham

What does "cpufreq-set -g performance" do?
Sounds like it would lock the BBB at max CPU clock speed, or at least, not
let it go down to the lowest speeds.

Correct...

I found the generic Debian docs on cpufreq-set, but not the BBB specific
instruction set and meanings.

It's a generic kernel interface, nothing bbb specific about it..

Regards,
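Since this is the generic kernel interface, the same setting can also be read and changed through the cpufreq sysfs files, without cpufrequtils. A minimal sketch: the function names are hypothetical, and the `CPUFREQ_DIR` default is the usual location for CPU 0 but should be treated as an assumption for any particular image.

```shell
#!/bin/sh
# Sketch: read and set the cpufreq governor through sysfs.
# show_governor and set_governor are made-up helper names.
# CPUFREQ_DIR can be overridden if the layout differs.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

# Print the governor currently in use.
show_governor() {
    cat "$CPUFREQ_DIR/scaling_governor"
}

# Switch to the governor named in $1, if the kernel lists it as available.
set_governor() {
    wanted="$1"
    if ! tr ' ' '\n' < "$CPUFREQ_DIR/scaling_available_governors" \
            | grep -qxF -- "$wanted"; then
        echo "governor '$wanted' is not available" >&2
        return 1
    fi
    # Writing this file needs root on a real system.
    printf '%s\n' "$wanted" > "$CPUFREQ_DIR/scaling_governor"
}

# Usage (as root):
#   show_governor
#   set_governor performance
```

A governor set this way does not survive a reboot, which matters when the board under test is rebooting on its own.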

debian@beaglebone:~$ uptime
11:58:48 up 23:22, 1 user, load average: 0.22, 0.07, 0.06
debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.0-rc8-bone9 #1 Tue Jun 16 23:45:22 UTC 2015 armv7l GNU/Linux
debian@beaglebone:~$ cat /etc/dogtag
BeagleBoard.org Debian Image 2015-03-01
debian@beaglebone:~$ cpufreq-info
cpufrequtils 008: cpufreq-info © Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
driver: cpufreq-dt
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 300 us.
hardware limits: 300 MHz - 1000 MHz
available frequency steps: 300 MHz, 600 MHz, 800 MHz, 1000 MHz
available cpufreq governors: conservative, ondemand, userspace, powersave, performance
current policy: frequency should be within 300 MHz and 1000 MHz.
The governor “ondemand” may decide which speed to use
within this range.
current CPU frequency is 300 MHz.
cpufreq stats: 300 MHz:0.04%, 600 MHz:0.00%, 800 MHz:0.00%, 1000 MHz:99.96% (4)

I’ll let it idle longer, but pretty sure it will have no problems. This board is an element14 REVC for what that’s worth.

Oh, and in case this might be relevant. The board is powered by USB, with only ethernet plugged in.

Interesting. I'd look between 4.0-rcx and 4.0. Good luck.

Ok, so what I am wondering is: if this is a kernel problem, why does the same kernel work fine for me when using a Wheezy 7.8 rootfs? It is fairly safe to say that in my own case this is not related to the kernel, or kernel modules . . . right?

I tried cpufreq-set -g performance on a BBB but got a reset after a few hours anyway. The problem appears to be some other…

Results after two days of testing:

(1) System bb1cf1 got installed with 3.19.3-bone4: still no reboot
uptime
04:19:11 up 1 day, 13:51, 2 users, load average: 0.00, 0.01, 0.05

(2) System bb6c1f: installed with 4.1.1-ti-r2 #1 SMP PREEMPT Wed Jul 8 17:03:29 UTC 2015 armv7l GNU/Linux: 2 more reboots, for a total of 4
Jul 13 00:55:17 bb6c1f kernel: [ 0.000000] Booting Linux on physical CPU 0x0
Jul 13 01:51:02 bb6c1f kernel: [ 0.000000] Booting Linux on physical CPU 0x0
Jul 13 21:46:08 rc6c1f kernel: [ 0.000000] Booting Linux on physical CPU 0x0
Jul 14 04:02:45 rc6c1f kernel: [ 0.000000] Booting Linux on physical CPU 0x0

(3) System bb4f8e still has 4.1.0-rc8-bone9 #1 Wed Jun 17 00:05:43 UTC 2015 armv7l GNU/Linux: but cpufreq-set -g performance: rebooted at 15:50 after around 20h uptime; before that it had 6 reboots within 24h

(4) Some other systems ran with cpufreq-set -g performance; my impression is that the number of reboots decreased

My conclusion:

  • cpufreq-set -g performance seems to improve the situation, but does not solve it.
  • 3.19.3-bone4 is stable
    — Guenter (dl4mea)

Still trucking along here:

debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.0-rc8-bone9 #1 Tue Jun 16 23:45:22 UTC 2015 armv7l GNU/Linux
debian@beaglebone:~$ uptime
22:18:07 up 1 day, 9:41, 1 user, load average: 0.08, 0.03, 0.05

By the way, I’m using default “ondemand” cpufreq governor

Could this possibly be related to how “clean” the supplied AC mains power is? I’m just curious, as we’ve never had any of these problems, but we’re also completely off grid. Also, for the record, our power here is very stable and clean. No blips, spikes, or any of the abnormalities one might see when connected to grid power.

Anyway, my last comment was a bit of a stretch, seeing as this only affects some boards and not all. However, it does strike me as odd that both of you are having issues with the same kernel I’m running right now, when it is running rock solid so far for me.

Which leads me to believe that something on the rootfs is perhaps somehow to blame. Either that, or something on these failing boards is somehow slightly out of tolerance. I’ll let this run a while longer just to make sure before moving on to a Jessie image.

Do also keep in mind that while we do not own hundreds of BBBs, we do own 5, and none of these has ever shut down without a reason . . . 2 A5As and 3 element14 REVCs

Please give 4.1.2-ti-r3 some testing, as it has pm/cpuidle fixes from ti.

sudo apt-get update
sudo apt-get install linux-image-4.1.2-ti-r3

Regards,

I installed 4.1.2-ti-r3 on 13 devices, without executing cpufreq-set -g performance.
First impression is not good, as I have had 3 reboots since then, but I will have more info after the night, in about 8h.

Please give 4.1.2-ti-r3 some testing, as it has pm/cpuidle fixes from ti.

sudo apt-get update
sudo apt-get install linux-image-4.1.2-ti-r3

Robert, would it be helpful if I ran this on the wheezy 7.8 image ?

err, Wheezy 7.8 rootfs is what I meant to type . . .

That shouldn't matter.. Mine rebooted after 3 hours.. I'm also
unplugging the ethernet to test an offline "npm install xyz" script..
what's odd, it rebooted on "git pull"..

Regards,

Ok. Well before I reboot and run that linux-image . .

debian@beaglebone:~$ uptime
13:57:28 up 2 days, 1:21, 1 user, load average: 0.20, 0.17, 0.11
debian@beaglebone:~$ uname -a
Linux beaglebone 4.1.0-rc8-bone9 #1 Tue Jun 16 23:45:22 UTC 2015 armv7l GNU/Linu