BeagleBone Black GPIO max frequency/bus latency (without using PRU)?

Andrew_P_Lentvorski · August 24, 2015, 11:12pm

I’ve been trying to hunt down the maximum frequency on the BeagleBone Black GPIO pins.

This seems to be dominated by the transaction latency across the L3/L4 interconnect. Fair enough. So …

What’s the latency number?

I’ve measured about 166ns per transaction (I can create a roughly 3MHz toggle which is 2 pin flips which requires 6MTransactions/s which is 166.66ns per transaction). But I don’t know how to calculate that number from the documentation.
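The arithmetic behind the 166 ns figure can be sketched as below. It only restates the numbers in the post. The function name `transaction_time_ns` is made up for illustration, and the sketch assumes each pin flip costs exactly one bus transaction.

```python
def transaction_time_ns(toggle_hz, writes_per_cycle=2):
    """Return the time per GPIO write in nanoseconds.

    Assumes (as the post does) that one full square-wave cycle needs
    two pin flips and that each flip is a single bus transaction.
    """
    transactions_per_second = toggle_hz * writes_per_cycle
    return 1e9 / transactions_per_second


# A roughly 3 MHz toggle means about 6 million transactions per second.
print(f"{transaction_time_ns(3e6):.2f} ns per transaction")
```

The same helper shows what a 1 MHz toggle would need: `transaction_time_ns(1e6)` gives a budget of 500 ns per write.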

I’ve been through the TI reference manuals, the TI support forums, and a bunch of other things, but nobody seems to be able to cough up an actual number for this.

Anybody have some references to frequencies and bus wait numbers? They may be out there, but GTMF/RTFM doesn’t seem to be sufficient.

I don’t need turbo speed, but the fact that it’s entirely possible that I may not even be able to run at 1 MHz (something painfully easy for most M0 or M3/M4 cores) is, frankly, a bit of a shock.

Thanks.

William_Hermans · August 25, 2015, 12:26am

Andrew,

So this is word of mouth from these very forums, perhaps a couple of years ago, but I do recall someone (as in someone qualified to know; I don’t remember exactly who) saying that, technically, GPIOs can be toggled on/off at 100-200 MHz using a PRU.

Without using the PRUs . . . well, the link you posted in your other post was featured on HaD months ago, and is still the highest rate I’ve seen to date. Frankly, I do not think this is a hardware limitation so much as an OS or software limitation, so there is really no way to document it generically in the hardware specs.

Another thing to keep in mind is that the AM335x processors are general-purpose processors, whereas many/all M0, M0+, M3/M4 parts are more special-purpose: geared towards doing a few things very well, and fast.

Anyway, if you need more speed, perhaps an RTOS, or even just tightening up the Linux “message pump” (e.g. an RT kernel), might prove helpful?

Charles_Steinkuehler · August 25, 2015, 12:39am

I've *measured* about 166ns per transaction (I can create a roughly 3MHz
toggle which is 2 pin flips which requires 6MTransactions/s which is
166.66ns per transaction). But I don't know how to *calculate* that number
from the documentation.

I've measured 40 ns from the PRU. I'm not sure if the CPU can match
this, but I'd be surprised if it couldn't.

I've been through the TI reference manuals, the TI support forums, and a
bunch of other things, but *nobody* seems to be able to cough up an actual
number for this.

This is a bit lower level than you'll find in most reference manuals,
and falls into the category of "if it's _really_ important to you,
contact the manufacturer and verify"...and I hope you're buying a
*LOT* of parts, because this is the sort of thing that is subject to
change with die revisions.

Anybody have some references to frequencies and bus wait numbers? They may
be out there, but GTMF/RTFM doesn't seem to be sufficient.

I don't need turbo speed, but the fact that it's entirely possible that I
may not even be able run at 1MHz (something *painfully* easy for most M0 or
M3/M4 cores) is, frankly, a bit of a shock.

So from my measurements with the PRU, the sustained write speed for
the GPIO is 40 ns per write. With the GPIO clock at 100 MHz, that's
4 clocks per write, which is actually a pretty respectable
round-trip synchronization delay for crossing the various clock
domains involved.

Charles_Steinkuehler · August 25, 2015, 12:41am

That's using the PRU direct I/O, *NOT* the GPIO pins. There's a
difference.

Even the PRU can only get about 12.5 MHz out of a GPIO pin, although
it can _easily_ get 100 MHz toggle rates (200 MHz update rate) from
the PRU direct outputs.
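As a rough cross-check, the 12.5 MHz figure is what a 40 ns sustained write time gives if one output cycle takes two writes (one to set the pin, one to clear it). The helper name `max_toggle_hz` is hypothetical, and the sketch ignores any loop or instruction overhead.

```python
def max_toggle_hz(write_time_ns, writes_per_cycle=2):
    """Highest square-wave frequency for a given time per output update.

    Assumes a full cycle is exactly `writes_per_cycle` back-to-back
    updates with no other overhead.
    """
    cycle_time_ns = write_time_ns * writes_per_cycle
    return 1e9 / cycle_time_ns


# GPIO written from the PRU: 40 ns per write (measured figure from the thread).
print(f"GPIO via PRU:       {max_toggle_hz(40) / 1e6:.1f} MHz")

# PRU direct outputs: a 200 MHz update rate is 5 ns per update.
print(f"PRU direct outputs: {max_toggle_hz(5) / 1e6:.1f} MHz")
```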

William_Hermans · August 25, 2015, 12:46am

That’s using the PRU direct I/O, NOT the GPIO pins. There’s a
difference.

Ah OK. I think I remember Gerald (though as usual my memory is hazy here) saying that the PRU can generate a 100 MHz or 200 MHz square wave. I assumed this was via GPIO, but it seems I was wrong. My bad.

Andrew_P_Lentvorski · August 25, 2015, 2:43am

I’ve measured 40 ns from the PRU. I’m not sure if the CPU can match
this, but I’d be surprised if it couldn’t.

Well, there could be some silliness involving the fact that the memory is mmap’d in Linux. A TLB access or something similar might be required that could add overhead.

This is a bit lower level than you’ll find in most reference manuals,
and falls into the category of “if it’s really important to you,
contact the manufacturer and verify”…and I hope you’re buying a
LOT of parts, because this is the sort of thing that is subject to
change with die revisions.

It actually surprises me that this information isn’t documented. However, I presume that’s because most people using a processor this high-end really only use the peripherals. The only thing most people really use the GPIOs for is generating interrupts.

I suspect I could live with things as they stand, but this is really going to make things … annoying.

I may be better off just trying to do nasty things to the McSPI subsystem. SPI really doesn’t like bi-directional data lines, though.

Andrew_P_Lentvorski · August 30, 2015, 9:13am

I can confirm this with a scope. After porting the kernel module from this topic:
https://groups.google.com/forum/#!topic/beagleboard/dyuax5415dc

I measured exactly 12.5MHz. Apparently mmap is doing something stupid that slows down the access from user space.

zmatt · November 3, 2015, 12:12pm

It maps the memory as “strongly ordered”, which means each write gets flagged with a “delivery confirmation required” bit and the CPU sits and waits for that confirmation.

chaitanyakumarreddym · February 27, 2017, 7:05am

Hi

I am trying to toggle a PRU GPIO at high frequency, but I am only getting output at 10 MHz. Also, it is not a square wave: it has a rise time of approximately 50 ns and a fall time of approximately 70 ns. How can I reduce the rise and fall times? Please reply to my post. It’s very urgent.

CEinTX · February 27, 2017, 2:43pm

Somehow I doubt the processor pins would have rise and fall times of 50/70 ns: combined, that is more than the 100 ns period of a 10 MHz signal.
I suspect your testing may be flawed. Either that, or you have too much load/capacitance on your output pins.
Look at the AM3358 datasheet for the pins you are using and see what it specifies for output drive strength and switching times (tr/tf).
10 MHz is about the max you will be able to achieve without using the PRU. There have been at least a couple of threads on this; search for them.

GL,
Matt
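The objection to the 50/70 ns rise and fall times in the reply above comes down to a simple budget check, sketched here. The function name `edge_budget` is hypothetical; the numbers are the ones quoted in the thread.

```python
def edge_budget(freq_hz, rise_ns, fall_ns):
    """Compare combined rise + fall time against one signal period.

    Returns (period_ns, edges_ns, fits), where `fits` is True only if
    both edges can complete within a single period.
    """
    period_ns = 1e9 / freq_hz
    edges_ns = rise_ns + fall_ns
    return period_ns, edges_ns, edges_ns <= period_ns


period_ns, edges_ns, fits = edge_budget(10e6, rise_ns=50, fall_ns=70)
print(f"period {period_ns:.0f} ns, rise+fall {edges_ns} ns, fits: {fits}")
```

If the edges really took 120 ns in total, the signal could never reach both rails within a 100 ns period, which is why a measurement or loading problem is the more likely explanation.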

Dennis_Lee_Bieber · February 27, 2017, 2:45pm

On Sun, 26 Feb 2017 23:05:33 -0800 (PST),
chaitanyakumarreddymallu@gmail.com declaimed
the following:

I am trying to toggle PRU GPIO at high frequency but I am getting the

You should have started a fresh thread rather than posting under an
older subject.

Note that the subject thread you posted under states "without using
PRU" yet YOUR first sentence "toggle PRU GPIO" implies that you ARE using
the PRU. As a result, your questions may not be looked at by people who
might know an answer (I'm not one -- I'm just extrapolating from theory in
the following)

output at 10Mhz also its not a square wave it has a rise time of 50ns and
fall time of 70ns approximately. How to reduce the rise and fall time.

	I'm not an expert at wiring, but I'd think reducing rise/fall times
would entail using circuits with less loading (higher impedance, lower
capacitance). So... shorter leads may help; if you've got a few feet of
wire carrying the signal, it's going to take some time for the inherent
capacitance (and maybe some inductance too) to let the signal settle at
its final high or low level.

As for the output frequency... How many instructions are in your PRU
toggling code? As I recall the PRU runs at 200MHz. If it requires one
instruction to set the output high, one to set it low, and one to loop back
that makes three instructions for a net of 66MHz, and a non-square wave
(high one clock, low two clocks). Put in a NOP of some type to make it a
square (high, NOP, low, jump) and the minimum code loop is now 50MHz. If
you have any other processing in that loop, your maximum pulse frequency
will drop correspondingly -- a 20 instruction loop would give you a
programmed 10MHz.
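The loop-length reasoning above can be written out as a short calculation. This is a sketch under the post's own assumptions: a 200 MHz PRU clock (recalled from memory in the post), one instruction per clock, and one output cycle per loop pass. The names `loop_frequency_hz` and `duty_cycle` are made up for illustration.

```python
PRU_CLOCK_HZ = 200e6  # as recalled in the post; treat as an assumption


def loop_frequency_hz(instructions_per_loop, clock_hz=PRU_CLOCK_HZ):
    """Output frequency when each loop pass produces one output cycle,
    assuming every instruction takes exactly one clock."""
    return clock_hz / instructions_per_loop


def duty_cycle(high_clocks, instructions_per_loop):
    """Fraction of each loop pass during which the output is high."""
    return high_clocks / instructions_per_loop


# (instructions per loop, clocks spent high) for the cases in the post:
# set/clear/jump, set/NOP/clear/jump, and a 20-instruction loop
# (the 10-clock high time for the last case is an arbitrary example).
for n, high in ((3, 1), (4, 2), (20, 10)):
    mhz = loop_frequency_hz(n) / 1e6
    print(f"{n:2d} instructions -> {mhz:5.1f} MHz, duty {duty_cycle(high, n):.0%}")
```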

Please reply to my post. Its very urgent

This is, for the most part, a user-to-user forum (I believe the BBB
designer and the maintainer of the OS image do respond at times). It isn't
really a place for "urgent" matters.

William_Hermans · February 27, 2017, 3:57pm

Well, you won’t be toggling a GPIO pin at 10 MHz unless you are using the PRU. /dev/mem + mmap(), or a kernel module, at best *may* be able to achieve ~1.5 MHz.

John_Syn · February 27, 2017, 6:27pm

What is the bandwidth and sample rate of your oscilloscope? I suspect that your oscilloscope/probe doesn’t have the bandwidth required to measure the rise/fall times accurately. To test, you could buffer your signal with an AHC or AHCT buffer, which will have switching times of less than 6 ns, but then you need to use proper probing techniques or you will get ringing from long probe ground connections, or ground bounce from poor power distribution.

Regards,
John
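To illustrate the bandwidth point, here is a rough estimate of how a scope's own rise time inflates a measurement. It relies on two common rules of thumb that are not from this thread and only hold approximately: scope rise time is about 0.35 / bandwidth, and the displayed rise time is about the root-sum-square of the signal and scope rise times. The function names and the example bandwidths are hypothetical.

```python
import math


def scope_rise_time_ns(bandwidth_hz):
    """Approximate 10-90% rise time of the scope/probe itself,
    using the 0.35 / bandwidth rule of thumb."""
    return 0.35 / bandwidth_hz * 1e9


def measured_rise_time_ns(signal_rise_ns, bandwidth_hz):
    """Approximate rise time shown on screen: root-sum-square of the
    real signal rise time and the scope's own rise time."""
    return math.hypot(signal_rise_ns, scope_rise_time_ns(bandwidth_hz))


# A 6 ns edge (the AHC/AHCT figure above) seen through two example bandwidths.
for bw_hz in (10e6, 100e6):
    shown = measured_rise_time_ns(6, bw_hz)
    print(f"{bw_hz / 1e6:5.0f} MHz scope: 6 ns edge appears as ~{shown:.1f} ns")
```

Under these assumptions, a low-bandwidth scope or probe alone could make a fast edge look tens of nanoseconds long.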