4.1.16-bone18 #26 PREEMPT & CPU stuck for 90 second

Hi,

I run 3 process doing serial test ( write on ttyO with a 5ms sleep between each send ) and i got this yesterday :

[28744.255809] INFO: rcu_preempt self-detected stall on CPU
[28744.255852] 0: (1 ticks this GP) idle=af5/1/0 softirq=1389832/1389832 fqs=0
[28744.255862] (t=9593 jiffies g=880748 c=880747 q=3814)
[28744.255874] rcu_preempt kthread starved for 9593 jiffies!
[28744.255881] Task dump for CPU 0:
[28744.255889] swapper R running 0 0 0 0x00000002
[28744.255962] [] (unwind_backtrace) from [] (show_stack+0x11/0x14)
[28744.255990] [] (show_stack) from [] (rcu_dump_cpu_stacks+0x6b/0xac)
[28744.256011] [] (rcu_dump_cpu_stacks) from [] (rcu_check_callbacks+0x3fb/0x704)
[28744.256030] [] (rcu_check_callbacks) from [] (update_process_times+0x2f/0x54)
[28744.256051] [] (update_process_times) from [] (tick_sched_handle+0x21/0x4c)
[28744.256065] [] (tick_sched_handle) from [] (tick_sched_timer+0x3f/0x68)
[28744.256078] [] (tick_sched_timer) from [] (__run_hrtimer+0x61/0x1e0)
[28744.256090] [] (__run_hrtimer) from [] (hrtimer_interrupt+0xcb/0x220)
[28744.256109] [] (hrtimer_interrupt) from [] (omap2_gp_timer_interrupt+0x23/0x28)
[28744.256136] [] (omap2_gp_timer_interrupt) from [] (handle_irq_event_percpu+0x5b/0x174)
[28744.256151] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x39/0x58)
[28744.256168] [] (handle_irq_event) from [] (handle_level_irq+0x8b/0xd8)
[28744.256183] [] (handle_level_irq) from [] (generic_handle_irq+0x23/0x2c)
[28744.256195] [] (generic_handle_irq) from [] (__handle_domain_irq+0x57/0xac)
[28744.256208] [] (__handle_domain_irq) from [] (omap_intc_handle_irq+0x85/0x8c)
[28744.256236] [] (omap_intc_handle_irq) from [] (__irq_svc+0x3b/0x80)
[28744.256245] Exception stack(0xc0a4ff50 to 0xc0a4ff98)
[28744.256255] ff40: 00000001 00000000 00000000 c001c041
[28744.256265] ff60: c0a4e000 c0a510f8 00000000 00000000 c0b01a48 00000001 c0aebb70 c0a51104
[28744.256275] ff80: c0a4e03c c0a4ff98 c000efff c000f000 40000033 ffffffff
[28744.256294] [] (__irq_svc) from [] (arch_cpu_idle+0x28/0x2c)
[28744.256314] [] (arch_cpu_idle) from [] (cpu_startup_entry+0x1e1/0x248)
[28744.256346] [] (cpu_startup_entry) from [] (start_kernel+0x33f/0x348)
[28744.256441] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 90s! [swapper:0]
[28744.256450] Modules linked in: tda998x usb_f_acm u_serial usb_f_rndis g_multi usb_f_mass_storage u_ether libcomposite ti_am335x_adc rtc_ds1307 tmp102 mma8452 snd_soc_tlv320aic3x snd_soc_davinci_mcasp snd_soc_edma snd_soc_omap snd_soc_core ad799x industrialio_triggered_buffer kfifo_buf snd_compress snd_pcm_dmaengine industrialio gpio_mcp23s08 snd_pcm snd_seq snd_seq_device uio_pruss tilcdc snd_timer omap_aes omap_sham omap_rng drm_kms_helper rng_core snd spi_omap2_mcspi ti_am335x_tsc soundcore ti_am335x_tscadc uio_pdrv_genirq leds_gpio uio
[28744.256588] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.16-bone18 #26
[28744.256594] Hardware name: Generic AM33XX (Flattened Device Tree)
[28744.256603] task: c0a552f0 ti: c0a4e000 task.ti: c0a4e000
[28744.256615] PC is at arch_cpu_idle+0x28/0x2c
[28744.256624] LR is at arch_cpu_idle+0x27/0x2c
[28744.256633] pc : [] lr : [] psr: 40000033
[28744.256633] sp : c0a4ff98 ip : c0a4e03c fp : c0a51104
[28744.256642] r10: c0aebb70 r9 : 00000001 r8 : c0b01a48
[28744.256649] r7 : 00000000 r6 : 00000000 r5 : c0a510f8 r4 : c0a4e000
[28744.256657] r3 : c001c041 r2 : 00000000 r1 : 00000000 r0 : 00000001
[28744.256667] Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment kernel
[28744.256675] Control: 50c5387d Table: 9deb0019 DAC: 00000015
[28744.256684] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.16-bone18 #26
[28744.256690] Hardware name: Generic AM33XX (Flattened Device Tree)
[28744.256707] [] (unwind_backtrace) from [] (show_stack+0x11/0x14)
[28744.256732] [] (show_stack) from [] (watchdog_timer_fn+0x191/0x1c4)
[28744.256747] [] (watchdog_timer_fn) from [] (__run_hrtimer+0x61/0x1e0)
[28744.256759] [] (__run_hrtimer) from [] (hrtimer_interrupt+0xcb/0x220)
[28744.256773] [] (hrtimer_interrupt) from [] (omap2_gp_timer_interrupt+0x23/0x28)
[28744.256791] [] (omap2_gp_timer_interrupt) from [] (handle_irq_event_percpu+0x5b/0x174)
[28744.256804] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x39/0x58)
[28744.256819] [] (handle_irq_event) from [] (handle_level_irq+0x8b/0xd8)
[28744.256832] [] (handle_level_irq) from [] (generic_handle_irq+0x23/0x2c)
[28744.256844] [] (generic_handle_irq) from [] (__handle_domain_irq+0x57/0xac)
[28744.256857] [] (__handle_domain_irq) from [] (omap_intc_handle_irq+0x85/0x8c)
[28744.256873] [] (omap_intc_handle_irq) from [] (__irq_svc+0x3b/0x80)
[28744.256880] Exception stack(0xc0a4ff50 to 0xc0a4ff98)
[28744.256889] ff40: 00000001 00000000 00000000 c001c041
[28744.256900] ff60: c0a4e000 c0a510f8 00000000 00000000 c0b01a48 00000001 c0aebb70 c0a51104
[28744.256909] ff80: c0a4e03c c0a4ff98 c000efff c000f000 40000033 ffffffff
[28744.256923] [] (__irq_svc) from [] (arch_cpu_idle+0x28/0x2c)
[28744.256936] [] (arch_cpu_idle) from [] (cpu_startup_entry+0x1e1/0x248)
[28744.256951] [] (cpu_startup_entry) from [] (start_kernel+0x33f/0x348)

is it a bug on the kernel ?

Micka,

It is not a bug.

https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt

Regards,
John

But the CPU is stuck for 90 second, is that normal ?

Micka,

Best to read this article to understand why this happens. My guess is your app is probably starving the kernel of time to run its other functions and hence you get this warning. Most likely you are disabling interrupts for extended periods of time. Running multiple threads, each running every 5ms is not a good idea considering the thread switching time.

Regards,
John

Ok, thank you for your answer .

Micka,