Hello,
We are running tens of BeagleBones, all with the same 3.2.30-psp23 kernel and all powered by a 5V/2A power adaptor. On almost one third of the nodes the system clock ticks too fast, approximately 4 times faster than real time (e.g. a node reports Mon Dec 3 17:02:37 UTC 2012 when the correct time is Fri Nov 30 16:15:19 UTC 2012). We also run ntpd on all nodes, but the large time difference causes it to panic and exit (the -g flag helps, but only for a very short time).
After some investigation I noticed that some nodes boot with a 32 kHz clock and some with a 24 MHz clock, but there is no rule that a particular clock causes the time skew:
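To put a number on "approximately 4 times faster", the speed of a node's clock can be estimated from two pairs of timestamps taken some time apart: one pair from a trusted reference, one pair from the node. The following is a minimal sketch; the function name and the sample timestamps are hypothetical and are not measurements from our nodes.

```python
from datetime import datetime, timezone


def clock_rate_ratio(ref_start, local_start, ref_end, local_end):
    """Return how many local seconds elapse per reference second.

    A value of 1.0 means the local clock keeps correct time;
    a value of 4.0 means it runs four times too fast.
    """
    ref_elapsed = (ref_end - ref_start).total_seconds()
    local_elapsed = (local_end - local_start).total_seconds()
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    return local_elapsed / ref_elapsed


# Hypothetical samples: both clocks agree at the first reading, then
# one hour of reference time passes while the node advances four hours.
ref_start = datetime(2012, 11, 30, 15, 15, 19, tzinfo=timezone.utc)
ref_end = datetime(2012, 11, 30, 16, 15, 19, tzinfo=timezone.utc)
local_start = ref_start
local_end = datetime(2012, 11, 30, 19, 15, 19, tzinfo=timezone.utc)

ratio = clock_rate_ratio(ref_start, local_start, ref_end, local_end)
```

Measuring the ratio over a known interval, rather than only the absolute offset, makes it possible to tell a clock that runs fast from one that was merely set wrong at boot.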
[node1]
$ grep -i clock /var/log/dmesg
[ 0.000000] OMAP clockevent source: GPTIMER2 at 24000000 Hz
[ 0.000000] omap_dm_timer_switch_src: Switching to HW default clocksource(sys_clkin_ck) for timer1, this may impact timekeeping in low power state
[ 0.000000] OMAP clocksource: GPTIMER1 at 24000000 Hz
[ 0.000000] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956ms
[ 0.173176] Switching to clocksource gp timer
[ 2.273396] omap_rtc omap_rtc: setting system clock to 2000-01-01 00:00:00 UTC (946684800)
[node2]
$ grep -i clock /var/log/dmesg
[ 0.000000] OMAP clockevent source: GPTIMER2 at 24000000 Hz
[ 0.000000] OMAP clocksource: GPTIMER1 at 32768 Hz
[ 0.000000] sched_clock: 32 bits at 32kHz, resolution 30517ns, wraps every 131071999ms
[ 0.174926] Switching to clocksource gp timer
[ 2.253112] omap_rtc omap_rtc: setting system clock to 2012-10-25 19:56:51 UTC (1351195011)
I compared the /sys/kernel/debug/clock/summary files across nodes (after mounting debugfs as per http://processors.wiki.ti.com/index.php/AM335x_PSP_User’s_Guide#Clock_Management_details) and noticed that timer1_fck has a different parent clock:
dziugas@evalg-data1:/tmp/clock$ diff -u1 node1.summary node2.summary
--- node1.summary 2012-11-09 10:40:01.660509296 +0100
+++ node2.summary 2012-11-09 10:42:02.932847298 +0100
@@ -73,3 +73,3 @@
timer2_fck sys_clkin_ck 24000000 1
-timer1_fck sys_clkin_ck 24000000 1
+timer1_fck clk_32768_ck 32768 1
timer0_fck clk_rc32k_ck 32000 0
@@ -170,3 +170,3 @@
tclkin_ck none 12000000 0
-sys_clkin_ck virt_24m_ck 24000000 8
+sys_clkin_ck virt_24m_ck 24000000 7
virt_26m_ck none 26000000 0
@@ -177,2 +177,2 @@
clk_32khz_ck clkdiv32k_ick 32768 0
-clk_32768_ck none 32768 1
+clk_32768_ck none 32768 2
So I suspect that for some reason the parent clock is changed after boot, while the system clock keeps calculating jiffies according to the frequency that was available at boot, and that this causes the time drift.
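One way to test this suspicion on every node is to compare the clocksource rate logged at boot with the rate that timer1_fck currently reports. The following is a minimal sketch, assuming the dmesg line format and the column layout of the clock summary (name, parent, rate, use count) are exactly as in the excerpts above; the function names are hypothetical.

```python
import re


def boot_clocksource_hz(dmesg_text):
    """Extract the GPTIMER1 clocksource rate logged at boot, or None."""
    match = re.search(r"OMAP clocksource: GPTIMER1 at (\d+) Hz", dmesg_text)
    return int(match.group(1)) if match else None


def timer1_parent_and_rate(summary_text):
    """Return (parent, rate_hz) for timer1_fck from the clock summary, or None."""
    for line in summary_text.splitlines():
        fields = line.split()
        # Assumed columns: clock name, parent name, rate in Hz, use count.
        if len(fields) >= 3 and fields[0] == "timer1_fck":
            return fields[1], int(fields[2])
    return None


def rates_agree(dmesg_text, summary_text):
    """True if the boot-time rate matches the current timer1_fck rate.

    Returns None when either value cannot be found in the input.
    """
    boot_hz = boot_clocksource_hz(dmesg_text)
    current = timer1_parent_and_rate(summary_text)
    if boot_hz is None or current is None:
        return None
    return boot_hz == current[1]
```

Feeding it the contents of /var/log/dmesg and /sys/kernel/debug/clock/summary from each node would show whether the skewed nodes are the ones where the two rates disagree.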
Has anybody else seen the same problem, and could you help find a solution? Recent clock changes by TI mainly deal with the suspend feature (http://e2e.ti.com/support/dsp/sitara_arm174_microprocessors/f/791/p/180128/654964.aspx), which we do not use, so something else is probably causing this clock problem. Our nodes are placed in different physical locations, so an unreliable power source can hardly be the cause. For now we plan to run ntpdate every minute instead of ntpd, but this is a workaround rather than a solution.
Thanks,
Džiugas