PRU "DMA" signalling to Linux Kernel cache issue

Hello,

I am running into a caching issue with my application. I will describe what I want to do, how I planned to do it, and what goes wrong:

WHAT:

By using the PRU on my BBB, I want to timestamp a periodic rising edge on one input pin with nanosecond resolution and signal it to a Linux kernel module on the ARM.

HOW:

To receive an interrupt in my kernel module, I bridged the pin with the rising edge to a second one (timer4 interrupt).
This interrupt fires a few microseconds after the event happened.

To read values from the PRU with best determinism and lowest latency, I allocated some DDR memory with dma_alloc_coherent() in my kernel module and hand the address to the PRU via debugfs.

The PRU runs an endless loop:

wait for a rising edge, read out the PRU cycle counter and write the cycle counter to the DDR memory address.
This works like a charm and I get the event’s cycle counter snapshot in my kernel module!

The kernel module interrupt is firing a few microseconds after the event and has some jitter I want to avoid.

So I decided it would be best to have the PRU write the current cycle counter to a second RAM address in a burst of ten thousand writes, so that when the kernel module reads this DDR location, it knows the difference between the event’s cycle counter and the current cycle counter.

This does not work!

WRONG:

Initially everything appeared to be working. I was reading out, for example:

event cycles: 1000
now cycles: 4300

great!

But to test the “now cycles” counter, I added a loop to the kernel module that reads it a thousand times. Guess what? It is the same value a thousand times.

I tried a few options, for example having the kernel and the PRU write and read two alternating memory addresses for this “now counter”, but nothing gives the results I expected.
It seems like something is caching the results…

Any help? So many thanks… I hope the problem can be understood.
Tom

Hello,

I am running into a caching issue with my application. I will describe what
I want to do, how I planned to do it, and what goes wrong:

WHAT:

By using the PRU on my BBB, I want to timestamp a periodic rising edge on
one input pin with nanosecond resolution and signal it to a Linux kernel
module on the ARM.

OK, I think I understand what you want to do.

HOW:

To receive an interrupt in my kernel module, I bridged the pin with the
rising edge to a second one (timer4 interrupt).
This interrupt fires a few microseconds after the event happened.

To read values from the PRU with best determinism and lowest latency, I
allocated some DDR memory with dma_alloc_coherent() in my kernel module and
hand the address to the PRU via debugfs.

The PRU runs an endless loop:

wait for a rising edge, read out the PRU cycle counter and write the cycle
counter to the DDR memory address.
This works like a charm and I get the event's cycle counter snapshot in my
kernel module!

That's pretty clever.

The kernel module interrupt is firing a few microseconds after the event and
has some jitter I want to avoid.

So I decided it would be best to have the PRU write the current cycle
counter to a second RAM address in a burst of ten thousand writes, so that
when the kernel module reads this DDR location, it knows the difference
between the event's cycle counter and the current cycle counter.

This does not work!

WRONG:

Initially everything appeared to be working. I was reading out, for example:

event cycles: 1000
now cycles: 4300

great!

But to test the "now cycles" counter, I added a loop to the kernel module
that reads it a thousand times. Guess what? It is the same value a thousand
times.

Just to make sure, did you declare the variables as volatile? If you
forgot, the compiler could be playing tricks on you by not reloading
them.
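
To illustrate what I mean, here is a minimal sketch in plain C rather than
real kernel code. The function name count_value_changes and its parameters
are made up for this example. Because the pointer is volatile-qualified,
the compiler has to perform one real load per iteration instead of reading
the value once and reusing it:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the original module): poll a location that
 * another agent, such as the PRU, may update, and count how often two
 * consecutive reads differ.
 */
size_t count_value_changes(const volatile uint32_t *now_cycles, size_t reads)
{
    size_t changes = 0;
    uint32_t prev;
    size_t i;

    if (reads == 0)
        return 0;

    prev = *now_cycles;              /* first real load */
    for (i = 1; i < reads; i++) {
        uint32_t cur = *now_cycles;  /* volatile: loaded again every time */
        if (cur != prev)
            changes++;
        prev = cur;
    }
    return changes;
}
```

If this returns 0 for a thousand reads while the PRU is supposedly
bursting, the compiler is not the culprit. Note that volatile only stops
the compiler from reusing an earlier load; it says nothing about hardware
caches. In a kernel module, READ_ONCE() would play the same role as the
volatile access here.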

So, I am a little suspicious of the numbers, both of their magnitude
and of the relative difference. If the interrupt latency is a few us,
then the cycle count difference should be a few hundred: the PRU runs
at 200 MHz, and your code presumably does the cycle count read, the
DDR write and a loop/backward jump.
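
The arithmetic behind that, as a small sketch: at 200 MHz one PRU cycle is
5 ns, so the example difference of 3300 cycles (4300 - 1000) would be
16.5 us rather than a few us. The helper names below are invented for
illustration, and the code assumes the counter is handled as a 32-bit
value and was not reset between the two snapshots:

```c
#include <stdint.h>

#define PRU_CLOCK_HZ     200000000u                    /* 200 MHz PRU clock */
#define NS_PER_PRU_CYCLE (1000000000u / PRU_CLOCK_HZ)  /* 5 ns per cycle */

/* Cycles elapsed between the event snapshot and the "now" snapshot.
 * Assumes now_cycles was sampled after event_cycles. */
uint32_t pru_cycles_elapsed(uint32_t event_cycles, uint32_t now_cycles)
{
    return now_cycles - event_cycles;
}

/* Convert a PRU cycle count to nanoseconds; 64-bit result avoids overflow. */
uint64_t pru_cycles_to_ns(uint32_t cycles)
{
    return (uint64_t)cycles * NS_PER_PRU_CYCLE;
}
```

For example, pru_cycles_to_ns(pru_cycles_elapsed(1000, 4300)) gives
16500 ns, while a latency of 3 us would correspond to only 600 cycles.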

To check the exact timing, you'd have to provide the actual code,
either the relevant loop snippet or put the whole thing on pastebin.

Are you sure that you're using the correct word size?
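
One way to rule out a word size mismatch is to describe the shared buffer
with fixed-width types on both sides and let the compiler check the
layout. This is only a sketch: the struct and field names are hypothetical,
and it assumes both values are 32-bit cycle counts stored back to back:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the shared DDR buffer (names are invented). */
struct pru_shared_ts {
    uint32_t event_cycles;  /* snapshot taken at the rising edge */
    uint32_t now_cycles;    /* rewritten by the PRU during the burst */
};

/* Compile-time checks that both sides agree on size and offsets. */
_Static_assert(sizeof(struct pru_shared_ts) == 8,
               "shared buffer must be two 32-bit words");
_Static_assert(offsetof(struct pru_shared_ts, now_cycles) == 4,
               "now_cycles must sit directly after event_cycles");
```

If the PRU firmware writes 32-bit words but the kernel side reads a
different width or offset, the values will look plausible but wrong, which
would fit the odd magnitudes.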