PRU Clocks phase shifting with respect to each other??

Hi,

I am trying to debug a weird behavior.

I have code running on PRU 1 that creates a specific waveform on 8 pru output pins (pru1_0 - pru1_7).

I am trying to confirm the accuracy of my PRU 1 code by running code on PRU 0 that watches the pins on PRU 1 (physically linked) and measures their period, duty, etc. against the CYCLE counter in the PRU0 control register.

All is well and good except that some of my measurements are off by ±1 instruction… so where I hope to see 1280 instructions in a period, I am seeing 1279 or 1281.

Interestingly this changes… apparently regularly? If I take the same measurement over and over I will get measurements for about 2 seconds where the result is -1 and then 2 seconds of measurements where the result is +1… then back to -1, etc.

One explanation for this is that there is a slight phase shift between the PRU1 and PRU0 clocks.

…But that goes against my understanding that the PRU clocks are both derived directly from the main CPU clock.

Still, the phase shift explanation fits the data really really nicely.

Is a phase shift between the PRU clocks even possible?

Thanks,

Bill

You should really have your test equipment running in a different clock domain than your device under test, and then trigger off your unit under test.
Very small phase or time differences between the unit under test and the test equipment can lead to major data differences.
You are trying to generate and measure on the same clock edges. Very tiny time shifts can move you a whole clock cycle.
— Graham
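
To make the "very tiny time shifts can move you a whole clock cycle" point concrete, here is a minimal sketch of an idealized sampler. It assumes an edge is registered on the first 200 MHz clock tick at or after it arrives. The function names and the picosecond delay values are hypothetical, chosen only for illustration, and this is not a model of the actual PRU input path.

```python
def first_tick_at_or_after(t_ps: int, clock_ps: int) -> int:
    """Index of the first sampling clock edge at or after time t_ps (integer ceil)."""
    return -(-t_ps // clock_ps)


def measured_period(true_period_ps: int, delay_a_ps: int, delay_b_ps: int,
                    clock_ps: int = 5000) -> int:
    """Ticks counted between two consecutive signal edges.

    delay_a_ps and delay_b_ps are the (hypothetical) propagation delays seen by
    the first and second edge; clock_ps = 5000 models a 200 MHz sampler.
    """
    tick_a = first_tick_at_or_after(delay_a_ps, clock_ps)
    tick_b = first_tick_at_or_after(true_period_ps + delay_b_ps, clock_ps)
    return tick_b - tick_a


PERIOD_PS = 1280 * 5000  # the 1280-cycle period from the original post

print(measured_period(PERIOD_PS, 4990, 4990))  # delay steady          -> 1280
print(measured_period(PERIOD_PS, 4990, 5010))  # delay grows by 20 ps  -> 1281
print(measured_period(PERIOD_PS, 5010, 4990))  # delay shrinks by 20 ps -> 1279
```

In this toy model a constant delay always gives 1280, and the count only changes when the delay crosses a tick boundary between the two edges, so it illustrates the mechanism rather than reproducing the two-second pattern Bill describes.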

Thanks Graham.

Yeah, I know. The thing is that having the two share the same chip makes things really fast. In my app I need to make about 125M different measurements, so the difference between a 1 ms test and a 10 ms test is huge.

In fact, I don’t need accuracy down to a single instruction. The ±1 thing is not going to be a problem from an application perspective, I just want to know what is going on so that I can properly account for it.

There are a number of things that it could be, but a phase shift seems like the best fit at this point.

I thought about the notion that we are on the same clock edge, but that doesn’t fit quite right in my mind. If things were truly on the same clock edge… or even 2ps off, then given that everything is running in the same way every time, that 2ps should hit and get registered the same way each time. It just doesn’t make sense that it is dancing this way and that… or at least that doesn’t make sense given what I know about how microcontrollers work. Perhaps there is more to the story than I am aware of.

B

Bill:

If you are on the clock edge, and by definition all transitions occur on the clock edges and most data is clocked in on a clock edge, then a slight drift in power supply voltage or temperature can change things by a few nanoseconds, and you will move ahead or back one count.

You are living on the edge so to speak.

Or, if you can, throw an inverter in the external clock line between the source and measurement PRU, so they run a half clock cycle apart.

If you can’t fix it, break it big.

If you read the spec on a frequency counter, you see an accuracy spec, and a disclaimer plus/minus one count to deal with these effects.

You really see it if you try to measure a counter’s time base with the same counter.

— Graham
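
For a sense of scale, the plus/minus one count ambiguity can be turned into numbers. This is a small sketch that assumes a 200 MHz counting clock and the 1280-count period from the original post; the function name is made up for the example.

```python
def count_uncertainty(expected_counts: int, clock_hz: float = 200e6):
    """Return (absolute error in ns, relative error) for a +/-1 count ambiguity."""
    tick_ns = 1e9 / clock_hz            # duration of one count
    relative = 1.0 / expected_counts    # one count as a fraction of the period
    return tick_ns, relative


abs_ns, rel = count_uncertainty(1280)
print(f"+/-{abs_ns:.1f} ns, +/-{rel * 100:.3f} % of the period")
```

Under those assumptions a one-count wiggle is 5 ns, or a little under 0.08 % of a 1280-count period.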

Have you looked at the wall clock differences, or just the number of
instructions executed as a proxy for the time taken? It's possible
for PRU instructions to stall (accessing resources outside the PRU
domain, or when both PRU cores try to access the same resource). So
you might be seeing a timing effect (real-world signal propagation
time between the two PRU cores moving forward/backward a clock edge),
or you may be running a different number of instructions in the same
amount of time. Or both.

You can use the Ethernet timer in the PRU as a "wall clock" that is
unaffected by any PRU instruction stalls, but if you're wanting to
rapidly and accurately measure time periods, use the hardware timers
in capture mode. That's what they're for, and you'll have single
clock cycle accuracy. The PRU will always have at least a few
instructions of uncertainty since it takes more than one instruction
to sample the value and capture the current "time".

Fascinating! Thanks guys.

Graham,
Frequency counting is exactly what I am trying to do here, and ±1 is exactly what I am seeing, so that makes perfect sense. I was not aware that the PRU clocks were physically available on the BBB! What fun! Where can I find the leads to mess with? Perhaps I could overclock them? Oh the possibilities!

Charles,
I’m not sure that I am understanding your suggestions correctly.

I think that the hardware timers present a couple of problems. Please correct me where I am wrong. First, I think they run at 25 MHz max, so they have a resolution of 40 ns. This is 4x worse than the ±5 ns resolution I’m currently getting using my CYCLE clock scheme… clock edge problems included. Also, and perhaps more importantly, I need to measure the phase shift between the waveforms on various pins, so I would need to start and stop the hardware timer with 2 different signals. Is this possible? Simple? My uninformed instinct is that it would be hard and slow.

I’m also interested in the Ethernet “wall clock” idea. I just read up on it, and it seems to be another 200 MHz clock, just like the CYCLE counter. I’m under the impression that the CYCLE counter does not stall (that is what the STALL counter is for), so I’m not sure what the Ethernet counter would provide… if I needed a second counter it would be super great, but for this app one is enough.
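
The CYCLE versus STALL distinction can be written out as arithmetic. The sketch below assumes a 200 MHz PRU clock, that CYCLE increments on every clock while the counter is enabled, that STALL counts the cycles in which the PRU could not fetch an instruction, and that the counter does not overflow between the two reads. The helper names are hypothetical and the values would come from reads of the PRU control registers.

```python
PRU_CLOCK_HZ = 200_000_000  # assumed PRU core clock, 5 ns per cycle


def elapsed_ns(cycle_start: int, cycle_end: int) -> int:
    """Wall-clock time between two CYCLE reads (assumes no overflow in between)."""
    return (cycle_end - cycle_start) * 1_000_000_000 // PRU_CLOCK_HZ


def non_stalled_cycles(cycle_delta: int, stall_delta: int) -> int:
    """Cycles in which the core was not stalled, under the assumptions above."""
    return cycle_delta - stall_delta


print(elapsed_ns(0, 1280))          # 1280 cycles at 5 ns each
print(non_stalled_cycles(1280, 3))  # hypothetical: 3 of those cycles were stalls
```

If those assumptions hold, a CYCLE delta is a time measurement that stalls do not change, while a nonzero STALL delta means the sampling instructions themselves ran later than expected.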

Thanks again,

Bill

Could you use a second BBB as the capture device instead? That would isolate the PRUs from each other and possibly allow for the HW capture that was mentioned.

Bill:

I don’t think the PRU clocks are externally available.

I don’t know what your interface looks like, but if there was a clock line involved in the data transfer, or marking when data is valid, I would look at inverting or adding some delay in that line going to the receiving PRU. If your interface does not look like that, then I don’t have a suggestion, other than using a second BBB for monitoring.

— Graham

Thanks so much guys.

The suggestion of using two BBB is great for getting the test program off the same clock domain. The issue there for me is speed. The easy way to get two BBBs working together would be to have them talk via ethernet. This would be quite slow compared to my current setup which swaps all info through PRU shared memory… perhaps an order of magnitude slower, perhaps 2? I really need the speed in order to keep this test in the realm of hours rather than days or weeks.

It is possible that I could speed things up using SPI or some such? If I find myself needing that last drop of accuracy I’ll go for it.

For the time being I think you all have my weird behavior… ehm, my program’s weird behavior… pretty well explained, and so I will just live with the ±1 instruction wiggle in my frequency calculations.

Thanks so much for the help. I learned a lot.

Best,

Bill

The PRU clock is not available outside the chip. It is driven by an
internal PLL (see the TRM sections 8 and 15 for details).

The eCAP timers are in the PD_PER_L4LS_GCLK domain, which typically
runs at 100 MHz. So you should have the same or better timing
granularity as with the PRU. Yes, the PRU runs at 200 MHz, but you
need at least a 2 instruction inner loop, which has a maximum
effective frequency of 100 MHz.
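
The resolution comparison above can be checked with a quick calculation. This sketch assumes one clock cycle per PRU instruction and treats a hardware timer as a "loop" of one tick; the function name is invented for the example.

```python
def sampling_resolution_ns(cycles_per_sample: int, clock_hz: float) -> float:
    """Time between successive samples for a given clock and cycles per sample."""
    return cycles_per_sample * 1e9 / clock_hz


print(sampling_resolution_ns(2, 200e6))  # 2-instruction PRU polling loop
print(sampling_resolution_ns(1, 100e6))  # timer clocked at 100 MHz
print(sampling_resolution_ns(1, 25e6))   # timer clocked at 25 MHz
```

With these numbers the two-instruction PRU loop and a 100 MHz capture timer both come out at 10 ns per sample, while a 25 MHz timer would give the 40 ns Bill was worried about.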

To use the hardware timers, refer to section 15.3 of the AM335x
Technical Reference Manual (TRM), which shows how you can configure
the eCAP timer to measure pulse timings. I suspect one of the methods
shown in section 15.3.3.x will do what you need.

Also note that in addition to the "standard" eCAP modules connected to
the ARM core (referred to above), there is a dedicated eCAP module in
the PRU domain which you might find easier to deal with (since you're
writing PRU code for this already). The documentation is a bit fuzzy
on the clock for the PRU eCAP module (or I didn't find the correct
page/section). But it should be at least 100 MHz, and might even run
at the 200 MHz PRU clock rate.

Regardless, it's a *LOT* easier to measure pulse timing using hardware
designed for that purpose vs. writing code and dedicating an entire
PRU core to just measuring edges. And fortunately, the AM3358 is
aimed at industrial control so it has several fancy timers (and
capture/compare units, and quadrature encoder inputs, and PWM outputs,
and other great stuff!) ready for you to use! :)

A few other bits of information are needed to get to the bottom of this.
- When you run the code on PRU0, what is running on PRU1? Do you put it in
reset or at least have it doing something that cannot interfere?

- Is everything in the local memory of the PRU in use, or in the shared memory?

- Is there any access outside of the PRUSS (i.e. DRAM, non PRUSS peripherals)?

- Is the ARM side also reading/probing the memory belonging to the PRU in use?

- Does the STALL register indicate that stalls are occurring for some reason,
and often enough to account for the difference?

There should be no shift between the clocks of the two PRUs.

Other possibilities: are your PRUs communicating with each other or with the main application e.g. via PRU shared RAM? In this case it may be possible there are some wait states necessary to synchronise in case of parallel accesses.