slow GPIO access

jmelson wrote:

>> The PRCM clock module changes would be your best bet if you still
>> stick with GPIOs - see clock34xx.h in arch/arm/mach-omap2/ for
>> tweaking details. You're looking to change the EMU_PER_ALWON_CLK
>> output speed coming from dpll4m6.

According to the TRM, GPIO banks 2-6 are driven from the PER_L4_ICLK,
which is L4_ICLK, so no surprise EMU_PER_ALWON_CLK is not used...

> clock34xx.h and clock34xx.c are not small files, it will take a while
> to dig through them.

May I suggest looking into the TRM first, which is also no small file
but might be a better starting point.

From: beagleboard@googlegroups.com
[mailto:beagleboard@googlegroups.com] On Behalf Of Vladimir Pantelic
Sent: Thursday, October 08, 2009 12:03 AM
To: beagleboard@googlegroups.com
Subject: [beagleboard] Re: slow GPIO access

jmelson wrote:
>> The PRCM clock module changes would be your best bet if you still
>> stick with GPIOs - see clock34xx.h in arch/arm/mach-omap2/ for
>> tweaking details. You're looking to change the EMU_PER_ALWON_CLK
>> output speed coming from dpll4m6.

according to the TRM, GPIO banks 2-6 are driven from the PER_L4_ICLK
which is L4_ICLK. so no surprise EMU_PER_ALWON_CLK is not used...

> clock34xx.h and clock34xx.c are not small files, it will take a while
> to dig through them.

may I suggest looking into the TRM 1st, which is also no small file,
but might be a better starting point.

Have you tried changing the gating ratio (TRM p. 3428)? The default is
0x01, which means the GPIO functional clock is the interface clock divided by 2.

From: beagleboard@googlegroups.com
[mailto:beagleboard@googlegroups.com] On Behalf Of John (USP)
Sent: Thursday, October 08, 2009 9:34 AM
To: beagleboard@googlegroups.com
Subject: [beagleboard] Re: slow GPIO access

Never mind, in gpio.c line 1435, GPIO_CTRL is set to 0.
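
To make the gating-ratio point concrete, here is a minimal sketch that decodes a GPIO_CTRL value. The field layout used (bit 0 = DISABLEMODULE, bits 2:1 = GATINGRATIO, divisor = 2^GATINGRATIO) is an assumption based on one reading of the OMAP35x TRM and should be checked against your TRM revision; the function name `decode_gpio_ctrl` is made up for this example.

```python
def decode_gpio_ctrl(value):
    """Split a 32-bit GPIO_CTRL value into its fields (assumed layout)."""
    disabled = bool(value & 0x1)        # bit 0: assumed DISABLEMODULE
    gating_ratio = (value >> 1) & 0x3   # bits 2:1: assumed GATINGRATIO
    divisor = 1 << gating_ratio         # 0 -> /1, 1 -> /2, 2 -> /4, 3 -> /8
    return {"disabled": disabled, "gating_ratio": gating_ratio, "divisor": divisor}


# GPIO_CTRL written as 0 (as gpio.c is said to do): enabled, clock not divided.
print(decode_gpio_ctrl(0x0))
# Gating ratio field of 0x1 (register value 0x2 under this layout): divide by 2.
print(decode_gpio_ctrl(0x2))
```

Under that assumed layout, a GPIO_CTRL of 0 already selects the undivided interface clock, so there would be nothing left to gain from this field.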

I noticed someone was playing with DPLL4 to speed up L4_ICLK and, therefore, the GPIO.
After going over the TRM for the 3530, it looks like L4_ICLK comes from DPLL3, not DPLL4 (pg 185 of the 3530 TRM), unless I’m misunderstanding something. Also, I found the section of the TRM that discusses how to reconfigure the L4 clock (section 1.7.8.2 of the PRCM chapter, page 231 of the OMAP 353x TRM). I haven’t had a chance to dig into how to use this, but I think it might help.
Let me know what you guys figure out and I’ll do the same.
Also, has anyone checked that the GPIO clock is free-running and not gated? I think it’s set in GPIO_SYSCONFIG bit 0 (AUTO_IDLE).

Hope this helps
–Jacob

Right, and my attempt to adjust the L4_ICLK crashed the system, which
wasn't much of a surprise. It seems the L4_CLK which creates PER_L4_ICLK
runs a LOT of different parts of the chip, so changing it may affect some
timing, like for the SD card or USB, that can't be changed.

It is set to a ratio of 1 on my system when Debian is running, but changing
it had no effect. I thought that was curious, but I can't really determine
what it does, and I have read the GPIO section several times. It may also be
that the bottleneck is somewhere else, in the L3 interconnect, L4
interconnect, or IO firewall. Geez, why do they need a FIREWALL in there,
isn't this all supposed to be trusted software?

Jon

> I noticed someone was playing with DPLL4 to speed up L4_ICLK and therefore,
> the GPIO.
> After going over the TRM for the 3530, it looks like L4_ICLK come from
> DPLL3, not DPLL4 (pg 185 of the 3530 TRM) unless i'm misunderstanding
> something.

That was my understanding, too.

> Also, I found the section of the TRM that discusses how to
> reconfigure the L4 clock (section 1.7.8.2 of the PRCM chapter, page 231 of
> the OMAP 353x TRM). I haven't had a chance to dig into how to utilize this
> but i think it might help.
> let me know what you guys figure out and i'll do the same
> Also, has anyone checked that the GPIO clock is free_running and not gated,
> i think it's set in GPIO_SYSCONFIG bit 0 (AUTO_IDLE)

All of this is quite confusing. So, if GPIO6 was in AUTO_IDLE, it
would power down after every access, and then take 250 ns to power
back up? That could be it, I will have to see what mode it is in on
my system. (I know I have looked at it, but didn't write it down.)

Thanks,

Jon

Looks like my AUTOIDLE idea didn’t pan out. I tried turning off AUTOIDLE for GPIO6, but it had no effect.
–Jacob
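
For anyone wanting to repeat the AUTOIDLE experiment, the following is a rough sketch of the read-modify-write involved. It assumes GPIO_SYSCONFIG sits at offset 0x10 within a GPIO bank (not stated in this thread, so verify it in the TRM) and that registers are little-endian 32-bit words. The helper names are hypothetical. The code works on any writable buffer, so it is shown here against a `bytearray`; on a real board `regs` would be an `mmap` of `/dev/mem` covering the bank.

```python
import struct

GPIO_SYSCONFIG = 0x10   # assumed register offset within a GPIO bank
AUTOIDLE = 1 << 0       # bit 0, as mentioned in the thread


def read_reg(regs, offset):
    """Read a little-endian 32-bit register from a buffer (bytearray or mmap)."""
    return struct.unpack_from("<I", regs, offset)[0]


def write_reg(regs, offset, value):
    """Write a little-endian 32-bit register into a buffer."""
    struct.pack_into("<I", regs, offset, value & 0xFFFFFFFF)


def clear_autoidle(regs):
    """Clear the AUTOIDLE bit and return the previous register value."""
    old = read_reg(regs, GPIO_SYSCONFIG)
    write_reg(regs, GPIO_SYSCONFIG, old & ~AUTOIDLE)
    return old


# Dry run on a fake register block instead of real hardware.
fake_bank = bytearray(0x100)
write_reg(fake_bank, GPIO_SYSCONFIG, 0x15)
clear_autoidle(fake_bank)
print(hex(read_reg(fake_bank, GPIO_SYSCONFIG)))  # 0x14
```

Note that Python gives no guarantee that `struct` accesses reach the bus as single 32-bit transactions, so a C program using a `volatile` pointer remains the more realistic tool for timing tests.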

So, you are seeing the same 250 ns limit on the GPIO? What program are you
using? (I think I included my test program in an earlier message in this
thread.) I really hope the 250 ns speed is not a fundamental limitation of
the L3 and L4 interconnect hardware. It seems like maybe the system could
run at that speed, with the intelligent peripherals making most things work
acceptably, but it is hell for embedded sorts of jobs.

I was wondering if it is the particular kernel I was using, so what kernel
are you testing with? I also have Angstrom here, but couldn't get the mmap
function to compile on that; it probably needs some additional set of
include files.

Thanks,

Jon

The GPMC turns out to be used to communicate with the NAND flash memory
chip that is piggybacked on top of the CPU, so it is not available for
other things.

Jon

Oh, and I forgot to mention, I’m using the CodeSourcery toolchain with the pre-built rootfs from Koen’s blog, not the OpenEmbedded build system.
–J

As I said earlier today:

I feel compelled to repeat that using CodeSourcery when targeting Angstrom is a bad idea. Your development toolchain should match the one the system was built with, as should all the CFLAGS and LDFLAGS. Unless you know what you’re doing (hi mru!), you shouldn’t be mixing toolchains.

regards,

Koen


> Yeah, I'm seeing the same 250ns switching rate, with a lot of ringing as
> well (that's probably just from the cable) and I'm using the example code
> that you sent me earlier

OK, glad to hear this is a reliable measurement, not something due to
bad system configuration on just my board. I have a scope probe poked
into the expansion header holes and the signals look fine that way.

> (I still haven't found a good reference for mapping
> the GPIO pin numbers to the bits in the GPIO6 register, can't find it in the
> TRM)

Yeah, the TRM is a totally INSANE piece of work! 3700 pages! Nobody
can possibly wade through all of that.

The pin mapping is in 3 places. There is a GPIO chapter, I think the next
to last one in the TRM. Page 30 of that chapter (section 1.6.1) lists
all the register addresses. GPIO1 is pins 0-31, GPIO2 is 32-63, etc.
So, GPIO6 is pins 160-191, and its register addresses are in the block
0x4905 8xxx. If you write to bit zero of 0x4905 803C (the dataout
register for GPIO6) it will come out on GPIO 160, assuming you have
that pad set to the raw GPIO rather than some other I/O module. So
there are also the pad configuration registers, which are in a different
chapter, the System Control Module. You have to set the pad
multiplexer to connect each I/O ball to one of several possible modules
for that specific ball.
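
The numbering rule above (32 pins per bank, GPIO1 starting at pin 0) can be sketched as a small lookup. Only the GPIO6 base and its dataout address come from this thread; the other bank base addresses and the 0x3C offset being common to all banks are assumptions to check against the TRM, and `locate_gpio` is a made-up helper name.

```python
# Bank base addresses; only bank 6 is confirmed by the text above,
# the rest are assumptions to be checked against the TRM.
GPIO_BANK_BASE = {
    1: 0x48310000,
    2: 0x49050000,
    3: 0x49052000,
    4: 0x49054000,
    5: 0x49056000,
    6: 0x49058000,
}
GPIO_DATAOUT = 0x3C  # 0x49058000 + 0x3C gives the 0x4905 803C quoted above


def locate_gpio(gpio):
    """Return (bank, bit, DATAOUT address) for a global GPIO number."""
    if not 0 <= gpio <= 191:
        raise ValueError("six banks of 32 pins give GPIO numbers 0 to 191")
    bank = gpio // 32 + 1   # GPIO1 holds pins 0-31, GPIO2 holds 32-63, ...
    bit = gpio % 32
    return bank, bit, GPIO_BANK_BASE[bank] + GPIO_DATAOUT


bank, bit, addr = locate_gpio(160)
print(bank, bit, hex(addr))  # 6 0 0x4905803c
```

This covers only the GPIO module side; the pad multiplexer still has to route the ball to the GPIO function before the bit shows up on a pin.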

And, finally, the beagle board's own manual has two pages that are
somewhat confusing. Table 20 and table 30 both show the mapping from
specific I/O functions to expansion header pins. I think table 20 is
the one that has been updated for the Rev C Beagle.

> I'm using the latest kernel from koen's blog (kernel version 2.6.29-r44)

> I think the next step for me is to go over the gpio.txt and gpio.h files in
> the kernel tree and see if they reveal anything. Last resort, I think, is
> going to be to just load the entire distro into RAM (yay ramdisk) so no other
> system IO is needed and then mess around with DPLL3, 4, etc.

Well, that is an interesting thought. People have revved up the CPU
clock, and I wonder if that would show a speed-up in the GPIO as well. I
think these speed-up methods have been published.

Jon

Hi Jon,

Sorry for only kicking in on this thread now, but better late than never :-)
I have been away from my BeagleBoard list account for about 6 weeks and
I'm now slowly catching up on the previous ~1,500 emails :-)

> Yeah, the TRM is a totally INSANE piece of work! 3700 pages! Nobody
> can possibly wade through all of that.

You are right - it's insane. Nevertheless, I have been through most parts
of it at least twice (some of the chapters many more times) - but I have also
spent the last ~7 years doing this, starting with OMAP1 a long time ago :-)

> The GPMC turns out to be used to communicate with the NAND flash memory
> chip that is piggybacked on top of the CPU, so it is not available for
> other things.

Even though the GPMC is connected to the NAND, it can be used to communicate
with other devices as well, since it has several chip selects, which can
be mapped to different memory regions and configured individually. But you
are right: you won't have exclusive access to the GPMC. That being said, the
GPMC isn't accessible on the Beagle, so it isn't relevant :-)

> It may also be that the bottleneck is somewhere else, in the L3
> interconnect, L4 interconnect, or IO firewall.

The L4, L3 and IO firewalls won't be the bottlenecks. Unfortunately, it's the
GPIO module itself, which isn't made for the purpose you have in mind. I.e.
it's *not* supposed to be used as an 8-bit parallel IO bus interface, but as
individual GPIOs for control.

In case you need a bus interface on OMAP3, you need to use either the GPMC
(IO), ISP (I), LCD (O), or MMC (IO) interface. The newly introduced OMAP
L138 has a Universal Parallel Port (uPP), which is basically what you want,
but to be honest I don't know that much about it yet, since it's still
relatively new in the OMAP world and I haven't dealt with it :-)

So the short answer to your GPIO trouble is: unfortunately, you won't be
able to go higher than the ~4 MHz you have found. It's limited by the IP
block design AFAIK. I know this isn't the answer you are looking for, but
unfortunately it's the truth :-)
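
For reference, the ~4 MHz figure lines up with the 250 ns measured earlier in the thread. The sketch below just does that arithmetic, assuming the 250 ns is the time per register write; the function names are made up for illustration.

```python
def access_rate_hz(period_ns):
    """Register accesses per second for a given time per access in ns."""
    return 1_000_000_000 / period_ns


def square_wave_hz(period_ns):
    """Output frequency when toggling a pin: one full cycle needs two writes."""
    return access_rate_hz(period_ns) / 2


print(access_rate_hz(250))  # 4000000.0
print(square_wave_hz(250))  # 2000000.0
```

So 250 ns per write corresponds to 4 million writes per second, or a 2 MHz square wave if every write toggles the pin.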

Hope you find another way around this. Using the MMC interface together with
an FPGA for converting into the format you need should let you
go to 8-bit@48MHz minus the MMC protocol overhead. This might be a
solution while still using the BeagleBoard? Alternatively, you can access
the GPMC on a Gumstix Overo board, but again this requires you to add
extra/other HW to your setup.

Best regards - sorry about the bad news :-) - good luck
  Søren

> So the short answer to your GPIO trouble is: unfortunately, you won't be
> able to go higher than the ~4 MHz you have found. It's limited by the IP
> block design AFAIK. I know this isn't the answer you are looking for, but
> unfortunately it's the truth :-)

Well, that certainly is not the best news. I have a number of options. One
is that the protocol I'm using was designed for long cables, and the Beagle
would be mounted to an adaptor board that will plug directly into the target
device. So, I can likely just skip the handshaking and know that the target
device will respond within one or two I/O clocks.

> Hope you find another way around this. Using the MMC interface together with
> an FPGA for converting into the format you need should let you
> go to 8-bit@48MHz minus the MMC protocol overhead. This might be a
> solution while still using the BeagleBoard? Alternatively, you can access
> the GPMC on a Gumstix Overo board, but again this requires you to add
> extra/other HW to your setup.

The GPIO that is there will work for the initial experiments, but doesn't
provide any improved performance above a PC's already slow parallel port.
Oh, I see you CAN get to all 8 data bits on MMC2 of the Beagle's expansion
header. That would certainly handle the data rate, but I don't know
ANYTHING about the interface. This is not a job for long block transfers,
some will be as short as write one address, read one byte. The longest
block transfer will be write one address, read 12 bytes. So, the MMC may
not be a great help there, if the setup of the controller takes a lot of
time.

Thanks for your info! I can't believe TI tech support has taken 3
weeks so far and can't tell me this! But, maybe they know something
that can help, and are trying to code it up. I don't need a massive
speedup, just X2 or X4 would make me happy in this first experiment.

Jon

Hi Jon,

> Oh, I see you CAN get to all 8 data bits on MMC2 of the Beagle's expansion
> header. That would certainly handle the data rate, but I don't know
> ANYTHING about the interface. This is not a job for long block transfers,
> some will be as short as write one address, read one byte. The longest
> block transfer will be write one address, read 12 bytes. So, the MMC may
> not be a great help there, if the setup of the controller takes a lot of
> time.

Hmm - MMC is really designed for longer transfers, although you can do short
transfers as well. It's not designed for a write/ack kind of communication,
though, and you will take a severe protocol overhead hit (around 5-10
bytes per write AFAIR => you are again down around 4 MHz). The way you
communicate in MMC is that you write all the data in one MMC
packet, and the recipient then indicates whether the data was received
correctly (by a CRC calculation). This is a very simplified and
rough explanation, but in short it doesn't fit your type of EPP
communication very well, and you would need an FPGA in between doing
some kind of protocol conversion.
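
As a back-of-the-envelope check of the overhead estimate, the sketch below computes the useful data rate when every packet carries a fixed number of overhead bytes. It assumes the 8-bit, 48 MHz bus moves one byte per clock and that the overhead can be counted simply as extra byte times, ignoring command and response turnaround; the function name is hypothetical.

```python
def effective_payload_rate(bus_bytes_per_s, payload_bytes, overhead_bytes):
    """Payload bytes per second when each packet carries fixed overhead."""
    return bus_bytes_per_s * payload_bytes / (payload_bytes + overhead_bytes)


BUS = 48_000_000  # 8-bit bus at 48 MHz: 48 MB/s raw (figure from the thread)

for overhead in (5, 10):
    single = effective_payload_rate(BUS, 1, overhead)   # one-byte transfer
    burst = effective_payload_rate(BUS, 12, overhead)   # 12-byte block
    print(f"overhead {overhead}: 1-byte {single / 1e6:.1f} MB/s, "
          f"12-byte {burst / 1e6:.1f} MB/s")
```

With 10 bytes of overhead a single-byte transfer lands at roughly 4.4 MB/s, in line with the "down around 4 MHz" estimate, while grouping bytes into larger blocks recovers most of the raw rate.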

Setting up and restarting the MMC module between transfers doesn't take
long. AFAIR it's just a matter of feeding the data you want to send into the
MMC FIFO, programming the MMC packet type and setting a start bit (again a
very rough simplification). You should be able to send packets at a
reasonable speed, but the packet overhead itself will "kill you",
unfortunately.

The above being said, I think your best option for something like this really
is to use either the GPMC bus (on a Gumstix Overo or similar) or the
uPP interface on an OMAP L138, which is designed for this kind of
communication AFAIK...

The BeagleBoard really isn't very well suited to a task like the one you
need, unfortunately...

Good luck
  Søren

> Setting up and restarting the MMC module between transfers doesn't take
> long. AFAIR it's just a matter of feeding the data you want to send into the
> MMC FIFO, programming the MMC packet type and setting a start bit (again a
> very rough simplification). You should be able to send packets at a
> reasonable speed, but the packet overhead itself will "kill you",
> unfortunately.

Well, this might not be so bad. If the transfer rate is quite fast on the
MMC side, I could have a packet format that looks like address:r/w:data for
each byte or contiguous block to be transferred, and an FPGA or CPLD on the
interface board would do the conversion to a sped-up EPP transfer to the
target device.

The way the PC driver works now, it wraps up all the reads to be done in one
group and does them, then the program performs calculations for the motion
control and sends data to the driver to be written all in one block to the
target. If I set up the FPGA (hmm, now it seems a CPLD might not have enough
memory) so you could pre-load a format, then every read request would send
you a packet containing data corresponding to that format: you just ask for
a read, and the data comes all pre-formatted as required. Putting all the
transfers together in blocks is already done by the existing driver, so this
wouldn't be hard to do. With appropriate programming of the FPGA, this
should all boil down to two blocks to be transferred, first a read, then a
write.
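
One possible shape for such an address:r/w:data block is sketched below. The layout is entirely hypothetical and not something settled in this thread: one address byte, one control byte whose top bit marks a read and whose low 7 bits give the byte count, followed by the data bytes for writes. The FPGA side would walk the block the way `decode_requests` does.

```python
READ_FLAG = 0x80  # hypothetical: top bit of the control byte marks a read


def encode_requests(requests):
    """Pack (address, 'r' or 'w', payload) tuples into one block.

    For reads, payload is the number of bytes wanted; for writes, the bytes.
    """
    out = bytearray()
    for address, direction, payload in requests:
        if direction == "r":
            count, data = payload, b""
            control = READ_FLAG | count
        else:
            data = bytes(payload)
            count = len(data)
            control = count
        if not (0 <= address <= 0xFF and 1 <= count <= 0x7F):
            raise ValueError("address must fit in 8 bits, count in 7 bits")
        out += bytes([address, control]) + data
    return bytes(out)


def decode_requests(block):
    """Inverse of encode_requests: recover the list of requests."""
    requests, i = [], 0
    while i < len(block):
        address, control = block[i], block[i + 1]
        i += 2
        count = control & 0x7F
        if control & READ_FLAG:
            requests.append((address, "r", count))
        else:
            requests.append((address, "w", block[i:i + count]))
            i += count
    return requests


# One 12-byte read from address 0x00, then a 2-byte write to address 0x10.
block = encode_requests([(0x00, "r", 12), (0x10, "w", b"\x55\xaa")])
print(block.hex())  # 008c100255aa
```

Batching all requests of one cycle into a single block like this is what keeps the per-packet overhead from dominating.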

> The above being said, I think your best option for something like this really
> is to use either the GPMC bus (on a Gumstix Overo or similar) or the
> uPP interface on an OMAP L138, which is designed for this kind of
> communication AFAIK...

So, is there going to be an affordable, Beagle-like board for the L138? If
the Beagle can be a stepping stone to a better system, that will be OK.
Right now we don't even have an RTAI kernel, but that is being worked on.

Thanks,

Jon