GPT1 timer issue aka UART3 hang issue on Beagle Board

Syed Mohammed, Khasim wrote:

Hi all,

History:

On the Beagle board, the UART3 port suddenly hangs, but the board remains alive with other connected peripherals such as LCD or USB.

Analysis done so far:

1) Initially we thought it was a UART3 issue.
2) After connecting a Lauterbach debugger, I observed that the GPT1 timer, which is configured as the system timer for the kernel, doesn't generate interrupts at the required intervals (HZ=128).
3) Generally the kernel operates this timer in both auto-reload and one-shot modes.
4) When the dynamic tick feature is disabled, GPT1 operates only in auto-reload mode.
5) In auto-reload mode, the timer counter register is automatically loaded with the value present in the timer load register and counts up until it overflows.
6) Every overflow raises an interrupt that updates the kernel timers.
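
To make the numbers above concrete: with a 32.768 kHz source and HZ=128, one tick is 256 input clock cycles, so an up-counting 32-bit timer has to be reloaded 256 counts below its wrap point. The following is a minimal C sketch of that arithmetic, not the kernel's code; `gpt_load_value` is a hypothetical helper name, and the 32768 Hz rate is an assumption about the nominal 32K clock.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): reload value for a 32-bit
 * up-counting timer so that it overflows once every clk_hz / tick_hz
 * input clock cycles.
 */
static uint32_t gpt_load_value(uint32_t clk_hz, uint32_t tick_hz)
{
    /* Assumed nominal values: 32768 / 128 = 256 cycles per tick. */
    uint32_t cycles_per_tick = clk_hz / tick_hz;

    /*
     * The counter overflows on the step from 0xFFFFFFFF to 0, so to count
     * cycles_per_tick steps it must start that many counts below 2^32.
     */
    return 0xFFFFFFFFu - cycles_per_tick + 1u;
}
```

With these inputs, `gpt_load_value(32768, 128)` evaluates to 0xFFFFFF00, which matches the reload value expected in the counter register.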

The issue:

1) When the UART3 console hangs and I look at the GPT1 register set, the counter register is counting from an unknown value. It should count from 0xFFFFFF00 and, after overflow, be auto-reloaded to 0xFFFFFF00, but for some reason I see it reloaded to 0 or some random value.
2) When the counter starts counting from a random value, there are no more overflow interrupts until it overflows, which might take a long time (I haven't checked this yet).
3) As the counter values are wrong, there is no overflow interrupt, so the kernel timer doesn't get updated and the UART console, which I believe is waiting on jiffies, hangs.
4) NOTE: This issue has been observed only on the Beagle board so far. Other OMAP3 boards such as Overo, EVM, and Zoom are doing fine.
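
For a rough sense of how long "a long time" in point 2 could be, the sketch below computes the time to the next overflow from an arbitrary counter value. It assumes a nominal 32768 Hz clock and a 32-bit up-counter; `seconds_until_overflow` is a hypothetical name, and this is only arithmetic, not a measurement on hardware.

```c
#include <stdint.h>

/*
 * Hypothetical helper: seconds until a 32-bit up-counter overflows when it
 * starts counting from `start` and is clocked at clk_hz.
 */
static double seconds_until_overflow(uint32_t start, uint32_t clk_hz)
{
    /* Number of increments needed to step past 0xFFFFFFFF. */
    uint64_t cycles = (UINT64_C(1) << 32) - start;

    return (double)cycles / (double)clk_hz;
}
```

Under these assumptions, a counter starting at 0xFFFFFF00 overflows after 256 cycles, i.e. 1/128 s, while a counter that has been reloaded to 0 needs 2^32 cycles, which is 131072 s or roughly 36 hours. That would be consistent with a console that appears to hang indefinitely.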

Observation:

1) When the UART console hangs, I correct the GPT1 counter register from the debugger by writing 0xFFFFFF00, and then let the target run again.
2) This brings the console back to a live state.

Immediate fix available:

1) If we use the MPU timer instead of the 32K timer for GPT1, this hang does not happen.

CONCLUSION:

1) The ISSUE to be resolved for Beagle is why the GPT1 counter register gets corrupted, or reloaded to a junk value, when running with the 32K clock source.

I changed OE to use MPU_TIMER:
http://gitweb.openembedded.net/?p=org.openembedded.dev.git;a=commitdiff;h=070e0ebc5812218fc2a26c807d236e6a769de960

regards,

Koen

Hi friends,
Can anyone tell me how to use the character LCD display on the OMAP 2430 board? I want to test it and use it at assembly level.

(adding several people to the cc)

Hello,

On Beagle board, the UART3 port suddenly hangs but the board remains
alive with other peripherals connected like LCD or USB

(and sysrq-q works)

1) The issue as to why GPT1 counter register is getting corrupted or
reloaded to a junk value when running with 32K clock source is the
ISSUE to be resolved for beagle.

Several of us discussed this briefly yesterday. Richard Woodruff
commented that a similar issue was seen on some OMAP boards due to
glitches on the 32kHz clock input. Richard, perhaps you could provide
more detail about the glitch waveforms, what they were caused by, and how
they were fixed (board redesign?)

Since these timer problems currently only appear to affect some (but not
all) BeagleBoards, this seems like the most promising approach. Richard,
it would be ideal if you could ensure that something is added to the
Timers section of the TRM about this issue.

Kevin, do you recall why we are using the 32kHz timer for clockevents
rather than sys_ck? ("MPU TIMER" seems to be a misnomer here.) Is
this due to resonator jitter on sys_ck?

A few comments on Khasim's notes - several people from #beagle have been
kind enough to send GPTIMER register dumps to me when this problem
occurred. Some of them did show "junk" values in TCRR. However, on the
BeagleBoard here, I personally haven't observed this. Here generally the
timer does not raise an interrupt when the counter passes either the
overflow condition or the match condition.

Appreciation also to Philip Balister, Dirk Behme, Koen Kooi, and Steve
Sakoman, all of whom tested patches for this problem and took the time
to send debug data.

- Paul

Several of us discussed this briefly yesterday. Richard Woodruff
commented that a similar issue was seen on some OMAP boards due to
glitches on the 32kHz clock input. Richard, perhaps you could provide
more detail about the glitch waveforms, what they were caused by, and
how they were fixed (board redesign?)

Yes, I've seen the described behavior a few times over the years on OMAP. In some cases it was a programming error and in others it was a hardware issue.

Programmatically:

- Improper accesses before posting is complete will cause unpredictable results in the GPTIMER.

- There used to be a restriction on the bus-to-functional-clock ratio that would cause an issue if posting was enabled.

- Bugs in early HLOS timer management code would stop the timer wrongly.

Hardware:

- The 32KHz clock signal back on 2430 was shared among a few things, and it was seen that some noise could be induced on the line. When this happened, a glitch was felt in the GPTIMER and the counter was seen to jump.
        - Using a sys_clk input instead of 32K never had errors.
        - Using a clean bench 32KHz reference was OK.
        [x] Finally the hardware clock distribution was changed on 2430SDP and the error went away. I think the 3430SDP design followed this.

- On the old 1510, which used a different timer block, there were also some restrictions on when you could update registers, due to timer module re-synchronization issues with the 32K clock.

Since these timer problems currently only appear to affect some (but
not all) BeagleBoards, this seems like the most promising approach.
Richard, it would be ideal if you could ensure that something is added
to the Timers section of the TRM about this issue.

The TRM section does have warnings about problem areas related to posting. It might also be kind of expected that signals are all clean in any final design.

Kevin, do you recall why we are using the 32kHz timer for clockevents
rather than sys_ck? ("MPU TIMER" seems to be a misnomer here.) Is
this due to resonator jitter on sys_ck?

Well, you can run on a 32KHz timer and go to OFF mode, but I don't think you can do the same on sys_ck. This functional clock will keep a request pending for sys_ck. When it's on the 32K it's not an issue, as that stays on.

You would have to do some messy timer handoff to get it working.

Regards,
Richard W.

Paul Walmsley <paul@pwsan.com> writes:

(adding several people to the cc)

Hello,

On Beagle board, the UART3 port suddenly hangs but the board remains
alive with other peripherals connected like LCD or USB

(and sysrq-q works)

1) The issue as to why GPT1 counter register is getting corrupted or
reloaded to a junk value when running with 32K clock source is the
ISSUE to be resolved for beagle.

Has anyone tried with tickless and/or hrtimers disabled? If either of
these are enabled, the clockevent timer will be continuously
reprogrammed by the generic timer subsystem. Maybe there is still a
bug/race in the gptimer/dm-timer programming code.

There were some recent changes around using posted mode that should
probably be investigated.

Several of us discussed this briefly yesterday. Richard Woodruff
commented that a similar issue was seen on some OMAP boards due to
glitches on the 32kHz clock input. Richard, perhaps you could provide
more detail about the glitch waveforms, what they were caused by, and how
they were fixed (board redesign?)

Since these timer problems currently only appear to affect some (but not
all) BeagleBoards, this seems like the most promising approach. Richard,
it would be ideal if you could ensure that something is added to the
Timers section of the TRM about this issue.

Kevin, do you recall why we are using the 32kHz timer for clockevents
rather than sys_ck? ("MPU TIMER" seems to be a misnomer here.) Is
this due to resonator jitter on sys_ck?

Terminology is a bit confusing here, since there is a 32k sync timer
and a 32k source clock for the GPtimers. Here's a simplified summary:

#ifdef CONFIG_OMAP_32K_TIMER
clockevent: GPT1 with 32k source
clocksource: 32k sync timer (free-running)
#else
clockevent: GPT1 with sys_clk source
clocksource: GPTn with sys_clk source (free-running)
#endif

When I configured things this way, I was aiming for the ability to hit
retention or even OFF in idle when using CONFIG_OMAP_32K_TIMER. And
CONFIG_OMAP_MPU_TIMER was to be more for the case when you want
real high-resolution timers.

Using GPT1 (in WKUP) with a 32k source, it will always be on and able
to wake up the system. Since this is the clockevent, the
dynticks/tickless subsystem will "just work" and retention can be hit
in tickless.

If using sys_ck as the GPT1 source clock, I don't believe you can hit
retention or OFF mode.

From: Kevin Hilman [mailto:khilman@deeprootsystems.com]
Sent: Thursday, August 07, 2008 5:53 AM
To: Paul Walmsley
Cc: Syed Mohammed, Khasim; discussion@beagleboard.org; Dirk Behme; Koen Kooi; Steve Sakoman; Coley,
Gerald; Kridner, Jason; igor.stoppa@nokia.com; Woodruff, Richard; tony@atomide.com
Subject: Re: GPT1 timer issue aka UART3 hang issue on Beagle Board

Paul Walmsley <paul@pwsan.com> writes:

> (adding several people to the cc)
>
> Hello,
>
>
>> On Beagle board, the UART3 port suddenly hangs but the board remains
>> alive with other peripherals connected like LCD or USB
>
> (and sysrq-q works)
>
>> 1) The issue as to why GPT1 counter register is getting corrupted or
>> reloaded to a junk value when running with 32K clock source is the
>> ISSUE to be resolved for beagle.

Has anyone tried with tickless and/or hrtimers disabled?

I have tried this combination, disabling dynamic tick (tickless timers) and HRT. I also tried enabling them all. The observation is:

With dynamic tick disabled, the timer is operated in auto-reload mode. This is the main cause of trouble, as it reloads from a random value instead of the value present in the load register.

If dynamic tick is enabled, the timer operates in one-shot or auto-reload mode depending on the cycle time computed in the upper layer. Here again one-shot seems to work fine, but when the timer enters auto-reload mode, the same problem (described above) occurs.

If either of
these are enabled, the clockevent timer will be continuously
reprogrammed by the generic timer subsystem. Maybe there is still a
bug/race in the gptimer/dm-timer programming code.

<snip>

Regards,
Khasim

"Syed Mohammed, Khasim" <khasim@ti.com> writes:

Has anyone tried with tickless and/or hrtimers disabled?

I have tried this combination, disabling dyn tick or tickless timers
and HRT. Also tried enabling them all. The observation is:

With Dynamic tick disabled, the timer is operated in auto reload -
this is the main cause for trouble as it loads from a random value
instead of getting loaded with what is present in Load register.

OK, if it is happening in auto-reload mode, at least this makes the
gptimer/dm-timer stop/start/reprogram code less suspect since the
problem is happening even when the timer is not being reprogrammed.
My next guess is that something is not being properly saved/restored
across retention.

Is there any pattern to when you see the bogus value being reloaded?

e.g., is there any correlation between hitting MPU retention in idle
and this bogus value? I think it is worth a simple experiment to
disable retention and see if the problem still happens.

In theory, this shouldn't have any effect since GPT1 is in the WKUP
domain, but I think it is worth an experiment.

Other than that, I'm out of ideas until I can experiment on my own
board. Do you have a .config or test setup where this is easy to
reproduce? Or does it happen all the time with
CONFIG_OMAP_32K_TIMER?

Kevin

Hello,

By the way, here's a link to Khasim's full original post:

http://groups.google.com/group/beagleboard/browse_thread/thread/566283f7496d19eb/6af9e338ea5104d5?lnk=gst&q=khasim+issue#6af9e338ea5104d5

Looks like Gerald may be traveling and unavailable for the moment to
comment on 32kHz clock routing.

Paul Walmsley <paul@pwsan.com> writes:

>
>> On Beagle board, the UART3 port suddenly hangs but the board remains
>> alive with other peripherals connected like LCD or USB
>
> (and sysrq-q works)
>
>> 1) The issue as to why GPT1 counter register is getting corrupted or
>> reloaded to a junk value when running with 32K clock source is the
>> ISSUE to be resolved for beagle.

Has anyone tried with tickless and/or hrtimers disabled? If either of
these are enabled, the clockevent timer will be continuously
reprogrammed by the generic timer subsystem.

All of the testing on my Beagle has been with NO_HZ, so the clockevent
timer is running in one-shot mode and is being reprogrammed regularly.

There were some early suspicions that TCRR was not being loaded correctly,
so I modified omap_dm_timer_set_load_start() to read TCRR after the write
and to loop until the read value matched the write. This didn't resolve
the hang.
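
The write-then-read-back check described above could look roughly like the sketch below. This is not the actual change made to omap_dm_timer_set_load_start(); `fake_tcrr` and `write_counter_verified` are hypothetical names, a plain variable stands in for the memory-mapped register, and the retry count is bounded here, whereas the experiment looped until the values matched.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in for a memory-mapped counter register. A real driver would go
 * through the kernel's MMIO accessors rather than a plain variable.
 */
static volatile uint32_t fake_tcrr;

/*
 * Hypothetical helper: write `value` to the register, read it back, and
 * retry up to max_tries times. Returns true once the read-back matches.
 */
static bool write_counter_verified(volatile uint32_t *reg, uint32_t value,
                                   unsigned int max_tries)
{
    for (unsigned int i = 0; i < max_tries; i++) {
        *reg = value;
        if (*reg == value)
            return true;
    }
    return false;
}
```

On real hardware the counter keeps running, so a read-back can legitimately differ from the written value by a few counts; an exact-match comparison like this one is only a simplification for illustration.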

Maybe there is still a bug/race in the gptimer/dm-timer programming
code.

Could be, but strange that it would only affect certain OMAP3 chips?

There were some recent changes around using posted mode that should
probably be investigated.

The problem still happened when 7622c36330e1fd365dc258bbb40996ec9a1539b6
and f620756fb38895c1095b797f8fdf74243e128a1d were reverted here.

Using GPT1 (in WKUP) with a 32k source, it will always be on and able
to wake up the system. Since this is the clockevent, the
dynticks/tickless subsystem will "just work" and retention can be hit
in tickless.

If using sys_ck as the GPT1 source clock, I don't believe you can hit
retention or OFF mode.

Ah, thanks, that makes sense.

It would be interesting to try GPTIMER12 instead, which supposedly has its
own 32kHz internal oscillator and might not be affected by noise on the
external 32kHz line. Unfortunately, attempting to use it seems to wedge
the Beagle very early during boot - perhaps there is some firewall issue?

- Paul

A quick followup to this; the GPTIMER12 issue was caused by a bad IRQ
assignment - patch posted to linux-omap.

- Paul

Hello,

thanks for the shared experience, Richard,

Hardware:

- The 32KHz clock signal back on 2430 was shared among a few things and
it was seen that some noise could be induced on the line.

It will be good when Gerald gets back. Perhaps he can give this issue a
look.

Looking at the BeagleBoard rev B4 Hardware Reference Manual[1], the 32kHz
clock is not shared. That is, it runs from the oscillator[2] to the
TWL4030 to the OMAP. There is a testpoint spur TP23 between the TWL4030
and the OMAP which may be worth some attention; also the values of C68/C69
might bear a second look.

> Richard, it would be ideal if you could ensure that something is added
> to the Timers section of the TRM about this issue.

The TRM section does have warnings about problem areas related to
posting. It might also be kind of expected that signals are all clean
in any final design.

It seems harmless to add an explicit note, since this issue apparently is
not expected and can recur, and thus hopefully save time and money:

"WARNING: GPTIMER counters are sensitive to noise on the 32kHz clock input
and can skip ticks or warp to unpredictable values if the sys_32k input
clock is out of specification when the 32kHz clock source is used. See
the OMAP34xx Data Manual or the power companion IC Data Manual for clock
input specifications."

- Paul

[1]. <http://www.beagleboard.org/uploads/Beagle_HRM_B4.pdf> pp. 117,
119.

[2]. FX135 data sheet, <http://www.foxonline.com/pdfs/fx135.pdf>

Hello,

> It would be interesting to try GPTIMER12 instead, which supposedly has its
> own 32kHz internal oscillator and might not be affected by noise on the
> external 32kHz line. Unfortunately, attempting to use it seems to wedge
> the Beagle very early during boot - perhaps there is some firewall issue?

A quick followup to this; the GPTIMER12 issue was caused by a bad IRQ
assignment - patch posted to linux-omap.

This workaround patch set switches Beagle to use GPTIMER12 (the "secure
GPTIMER"):

   http://www.pwsan.com/omap/gptimer_workaround_4.tar.gz

It survived 20 hours on my Beagle; further testing would be appreciated.
Koen has graciously built a kernel for OE with the patch included; this is
available here:

   http://ewi546.ewi.utwente.nl/~koen/uImage-2.6.26-r59-beagleboard.bin

...

If there is an issue with the 32kHz clock input, this workaround will only
address one affected component; there could be other affected features.
A superficial look at the clock tree and the TWL4030 TRM suggests that
GPIOs, other GPTIMERs, and USB on Beagle might be affected.

- Paul