slow GPIO access

jmelson · September 30, 2009, 9:23pm

I hate to keep flogging a dead horse, but I hope somebody knows
something about this area.
I want to interface to an existing parallel device using the BB's
GPIO. Some early tests indicate the GPIO6 section is running with a 4
MHz clock, so even though the OMAP CPU is running at 600 MHz, I can't
make anything happen on the GPIO pins faster than every 250 ns. It
also appears that accessing the GPIO pins probably freezes the CPU for
250 ns, or 150 clock cycles at 600 MHz. The 250 ns is very clearly some kind of
clock, as the timing edges are ROCK SOLID on a scope. I am using mmap
to directly map the IO ports to user space.

So, is there some way to turn up this clock without fouling up the
rest of the devices that use GPIO? I only need to speed up about 13
lines connected to the expansion header. I have read the sprufxx.pdf
section on GPIO thoroughly, and don't think there is anything in there
that can be adjusted. There are debounce timers and such, but my
reading seems to say that is all related to input only. If I'm wrong
on that, please correct me.

I have tried to plow through the PRCM section, but it is VERY complex
and difficult to figure out what side effects would happen if I
changed anything. I am using the USB and SD card interfaces, and
don't want to kill them.

I see that the CPU clock can be divided by 1 to 4 by a pair of chained
div-by-1/div-by-2 circuits, but I can't find anywhere that the CPU
clock could be divided by 150 before being fed to the GPIO, so
obviously I'm missing something. Also, the doc seems to indicate the
WHOLE GPIO system runs from one clock, and I think there MUST be stuff
like USB and SD cards that get a higher clock. That too suggests I've
missed something.

Thanks,

Jon

Jerry_Johns · September 30, 2009, 11:18pm

What kind of parallel interface is this? Is it n-bit data + clock?
Custom?
Any chance you can use the stock peripherals on the OMAP?

Charles_Krinke · October 1, 2009, 2:55pm

Dear Jon:

In thinking about your post and your need for timely toggling of hardware pins, I would say that working from user space is probably always going to be a problem.

If one wants the closest control over hardware, one needs to get down to the kernel driver level. At that point, you can tailor your software to toggle at machine language speed (to the limit of the available bandwidth, and to the extent you wish to stop all other programs while toggling pins).

Personally, if I were to tackle a project like this, I would make a small FPGA and put a few 8 or 16 bit registers in it and just read/write to the registers. Then I would let the FPGA do the toggling at bus speed. But that's just how I might tackle such a project.

I have always found that toggling GPIO on any project ends up being a drain on system resources.

Charles

jmelson · October 1, 2009, 5:00pm

> In thinking about your post and your need for timely toggling of hardware pins, I would say that working from user space is probably always going to be a problem.

These are just tests, although there will always be a user-mode diagnostic program. The real driver is to be a kernel module running as a real-time process under the RTAI scheduler.

> If one wants the closest control over hardware, one needs to get down to the kernel driver level. At that point, you can tailor your software to toggle at machine language speed (to the limit of the available bandwidth, and to the extent you wish to stop all other programs while toggling pins).

Well, that is the problem, this is NOWHERE NEAR "machine language"
speeds, about 150 X slower.

> Personally, if I were to tackle a project like this, I would make a small FPGA and put a few 8 or 16 bit registers in it and just read/write to the registers. Then I would let the FPGA do the toggling at bus speed. But that's just how I might tackle such a project.

I am going to attach an FPGA to the OMAP. The FPGA hardware is a mature product, and it uses the EPP mode of a PC parallel port for communication. Since the EPP handshaking is done by hardware in the PC, it is fairly fast: I can do a byte every 600 - 800 ns, depending on whether it is a motherboard port or a PCI port card. I need to do at least as well as this, but it seems the OMAP should be able to go quite a bit faster. To control a CNC machine, the CPU reads position 1000 times a second and sends new velocity commands to the output section. The communication overhead needs to be kept to a reasonable minimum.

Thanks,

Jon

jmelson · October 1, 2009, 5:05pm

I want to stay with a Beagle Board, but the expansion header only brings out a limited number of balls from the OMAP. One annoying thing is that they do not bring out any byte-aligned group of 8 contiguous bits. There is a non-aligned string of 8 contiguous bits, though, so I can shift my 8-bit byte over to match that group.

There is a data strobe and an address strobe, plus a write/read bit,
and an acknowledge. Read up on the IEEE-1284 (EPP mode) protocol to
see what it looks like.

I will need to provide a voltage level translator and a direction
control signal to that to turn the bus around.

It SHOULD be pretty simple, but I'd like to get a bit more performance
out of it if that is possible.

Thanks,

Jon

Keith_Williams · October 1, 2009, 6:26pm

Since what you really seem to want is an IEEE-1284 port, have you looked
into using a USB-Parallel adapter?


jmelson · October 1, 2009, 9:09pm

USB does not guarantee real-time delivery of packets. Oh, if it were only that simple! We need to have a real-time module make the device sample the current encoder position, and that is a pretty tight real-time constraint. Once it is sampled, we have to read a bunch of bytes, process them and send back a bunch of velocity command bytes. All of that has to happen within a few hundred µs for sure, and I would prefer to do it even faster, if at all possible. You really can't do this back-and-forth with USB very easily. Making the program dispatch when the hard-real-time external hardware clock ticks would be even better. At the moment the program is not set up to do that, though; it wants to be the master.

Jon

Jacob_Leemaster · October 1, 2009, 9:34pm

Hi Jon,
I actually have a similar issue where I need to place a byte
(hopefully two) on the GPIO with as fast a signaling rate as
possible. I got my board (Rev C3) just yesterday, so I am still
in the process of getting everything set up for development. Would
you mind sending me a few commented code snippets on how you're
configuring and using the GPIO pins? The sooner I get up and running,
the sooner I can help make progress with this issue.
--Jacob

P.S. I was going over the TRM and it looks like the GPIO gets its
clock speed from the L4_IFACE clock, but I can't find where that speed is
set, or how GPIO speed is controlled.

Keith_Williams · October 1, 2009, 9:46pm

I thought that you might say that. So, here is my other thought...

There are three other fairly inexpensive OMAP3 platforms: the Overo, one by
Cogent Computing, and the Embest board.

All of those offer different IO interfaces, and some even allow for
direct address/data bus connection. If shoe-horning the Beagle into
your application is too much of a headache, but you still want to use
the 3530, then maybe one of those would be a better fit?


jmelson · October 1, 2009, 10:14pm

Well, I already have the Beagle, and it works as far as I have gone with it. All I want to do is turn up the clock that seems to be limiting GPIO6 by a modest factor. I am going to write a program to read out a bunch of the clock selection registers to find out what the default settings are; some of them look like they might be responsible for this slow clock. I think I have figured out how to make the Beagle's expansion header work for this application, and it will only take a couple of lines of code to deal with the non-aligned byte.

I have seen the other OMAP boards, and I don't think they really offer any great advantage over the Beagle. I have this thing running Debian, and I am totally blown away by the possibilities! I also have some other, less time-critical applications in mind for it that also require the parallel port emulation. They should be able to use the same voltage translator board.

Thanks,

Jon

jmelson · October 2, 2009, 4:09am

> Would you mind sending me a few commented code snippets on how you're configuring and using the GPIO pins?
>
> P.S. I was going over the TRM and it looks like the GPIO gets its clock speed from the L4_IFACE clock, but I can't find where that speed is set, or how GPIO speed is controlled.

Yes, that seems right. There are a bunch of registers in the PRCM section that generate these clocks. The worrisome part is that there is only ONE clock for the whole GPIO system, so changing it might have wide-ranging side effects on the USB and SD card interfaces.

I have found a divider in the GPIO_CTRL register, but have not found that it has any effect on output speed.

Here's a short program that toggles pin 24 of the expansion header (GPIO_168):
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(void) {
  uint32_t x;
  int fd = open("/dev/mem", O_RDWR | O_SYNC);

  if (fd < 0) {
    perror("Could not open /dev/mem");
    return 1;
  }

  // Pad configuration (system control module at 0x48000000)
  volatile uint32_t *pinconf;
  pinconf = (volatile uint32_t *) mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0x48000000);
  if (pinconf == MAP_FAILED) {
    perror("Pinconf mapping failed");
    close(fd);
    return 1;
  }

  // Configure the expansion header pad used by GPIO_168 (upper 16 bits).
  x = pinconf[0x21bc/4];
  printf("pinconf[0x21bc/4] = %x\n", x);
  x = x & 0x0000ffff; // mask off high bits for GPIO 168
  x = x | 0x011C0000; // set pulltype, bi-dir and mux mode 4 (GPIO)
  pinconf[0x21bc/4] = x;

  // GPIO configuration: map GPIO banks 2..6 (GPIO6 is at offset 0x8000)
  volatile uint32_t *gpio;
  gpio = (volatile uint32_t *) mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0x49050000);
  if (gpio == MAP_FAILED) {
    perror("GPIO mapping failed");
    close(fd);
    return 1;
  }
  close(fd); // existing mappings stay valid after the descriptor is closed

  // Configure bit 8 of bank 6 as output (GPIO_OE: 0 = output).
  x = gpio[0x8034/4];
  printf("gpio[0x8034/4] = %x\n", x);
  x = x & ~0x00000100u; // change bit 8 to zero
  gpio[0x8034/4] = x;

  // Toggle GPIO_168 (expansion connector pin 24).
  // Writing GPIO_DATAOUT directly also drives the other 31 bits of bank 6;
  // 0x8090/4 (GPIO_CLEARDATAOUT) could be used to clear single bits instead.
  const int dataout = 0x803c/4; // addr of gpio data out reg

  for (;;) {
    gpio[dataout] = 0x00000100; // set bit 8
    gpio[dataout] = 0x00000000; // clear bit 8
    gpio[dataout] = 0x00000100; // set bit 8
    gpio[dataout] = 0x00000000; // clear bit 8
    gpio[dataout] = 0x00000100; // set bit 8
    gpio[dataout] = 0x00000000; // clear bit 8
    gpio[dataout] = 0x00000100; // set bit 8
    gpio[dataout] = 0x00000000; // clear bit 8
  }
}

Jerry_Johns · October 2, 2009, 4:15am

Based on a cursory glance at the EPP spec, it seems you can use the
GPMC peripheral for this: an 8-bit asynchronous data bus, with address/data
strobe, write enable, ....
Since the GPMC also gives you the ability to add non-deterministic waits
using the WAIT pin, this is right up EPP's alley.
And you get completely controlled behaviour from this peripheral, with
microsecond accuracy in terms of transaction timing.

Jerry

jmelson · October 2, 2009, 3:40pm

Is the GPMC brought out to any pins I can get to on the Beagle Board? If not, then it doesn't do me much good. "Microsecond accuracy" doesn't sound very good; I've got 4 times better than that already with the GPIO pins that are available. Looking in the docs, GPMC is the General Purpose Memory Controller. It seems like it might be a bit more specific than I was looking for, with a lot of fixed protocol built into the logic. Anyway, if I can't get to the OMAP balls from the Beagle Board, it is not much use.

Thanks,

Jon

Gerald_Coley1 · October 2, 2009, 4:14pm

There are GPIO pins brought out onto the expansion header of the Beagle. You can find the information in the System Reference Manual http://beagleboard.org/hardware/design .

Gerald

Jerry_Johns · October 2, 2009, 4:45pm

By microsecond accuracy, I meant the accuracy with which you can
trigger a transaction. The timing of the actual transaction waveforms
can be adjusted to within 5ns, more than sufficient for a parallel
port. Truly, this is how you should be doing it, instead of loading
down the ARM with GPIO toggling, which is not at all efficient. And the
GPMC peripheral is actually much more customizable than you think.
Since all timing parameters are configurable with 5ns clock-period
accuracy, you can choose to match the EPP standard quite closely. The
only thing that might be tricky is the distinction between the address
and data phases, both of which might be coupled together in one
transaction in the GPMC.

However, it looks like the pins are not brought out on the Beagle, in
which case.....unless you're willing to go to the OMAP3EVM...

The PRCM clock module changes would be your best bet if you still
stick with GPIOs - see clock34xx.h in arch/arm/mach-omap2/ for
tweaking details. You're looking to change the EMU_PER_ALWON_CLK
output speed coming from dpll4m6.

Jerry

jmelson · October 3, 2009, 4:05am

Yes, of course, a small selection of GPIO6 pins, roughly from GPIO135 to GPIO161, with a number missing in the middle of this range, are brought out. I don't believe any of these can be MUX'ed to the GPMC section.

jmelson · October 3, 2009, 4:10am

> By microsecond accuracy, I meant the accuracy with which you can trigger a transaction. The timing of the actual transaction waveforms can be adjusted to within 5ns, more than sufficient for a parallel port. Truly, this is how you should be doing it, instead of loading down the ARM with GPIO toggling, which is not at all efficient. And the GPMC peripheral is actually much more customizable than you think. Since all timing parameters are configurable with 5ns clock-period accuracy, you can choose to match the EPP standard quite closely. The only thing that might be tricky is the distinction between the address and data phases, both of which might be coupled together in one transaction in the GPMC.

Well, this is interesting, but what I am doing is not long block transfers, but blocks of 12 - 24 bytes at a time. Setting up a DMA transfer for this is not efficient. There are also a lot of single-byte reads and writes.

> However, it looks like the pins are not brought out on the Beagle, in which case.....unless you're willing to go to the OMAP3EVM...

Yes, that makes me want to stay with the Beagle for the moment.

> The PRCM clock module changes would be your best bet if you still stick with GPIOs - see clock34xx.h in arch/arm/mach-omap2/ for tweaking details. You're looking to change the EMU_PER_ALWON_CLK output speed coming from dpll4m6.

GREAT, thanks for the pointer! I will look this up; it may be exactly the info I needed. I never thought it would be the EMU clock I needed to work with. Sheesh, the GPIO is sure complicated!

Thanks again,

Jon

jmelson · October 3, 2009, 6:03am

Looking in register CM_IDLEST_CKGEN (0x48004D20), bit 13 is zero, which
indicates that clock EMU_PER_ALWON_CLK is not active. I wish I'd checked
that first. I tried all sorts of things to DPLL4 that had no effect, and
now I know why. For instance, I changed the DIV_DPLL4 field of
CM_CLKSEL1_EMU (0x48005140) from 3 to 2, 1 and 16 with no effect. So,
the GPIO system must be clocked off something else, but I haven't been
able to track down where the clock actually comes from. I am pretty
sure I changed the multiplier setting of DPLL4, so it must be coming
from a different DPLL.

clock34xx.h and clock34xx.c are not small files; it will take a while
to dig through them.
Poking the registers directly is certainly more dangerous, but I can
also read them first to see what they are set to.

And, maybe what is throttling the GPIO pins has nothing to do with the
peripheral clock, but is some other part of the chip that is delaying
things.

Thanks,

Jon

Jacob_Leemaster · October 7, 2009, 11:44pm

One thing that might be worth trying (I haven’t had a chance to test this; I'm still getting my stuff set up amid midterms) is speeding up the CPU clock. I think I read somewhere on the Angstrom wiki that the BeagleBoard CPU defaults to 500MHz operation, and that a certain flag needs to be set in u-boot (before the kernel boots) to switch to 600MHz operation. I’ll see if I can dig up the original doc.
–Jacob

jmelson · October 8, 2009, 4:19am

Going from 500 to 600 MHz is only a 20% speed increase, hardly worth the effort. I was hoping to find a way to get a 2X to 5X speedup of the GPIO pins. It just doesn't make sense for such a fast CPU to be saddled with such slow I/O. It may have been set up this way for battery-operated systems, but my applications are line-powered for the most part.

I have sent two messages to TI technical support, but haven't heard anything back from them.

Jon