[beagleboard] How to get a high frequency Gpio Input Sampling Rate

randy_rodes · April 6, 2013, 2:13am

tl;dr:
How to achieve a high sampling frequency on gpio input data reading
(beaglebone) ?

Theoretically:
It is best to remain inside the kernel when capturing data at high
frequency, to avoid context switches to userspace.

Fastest would be to put your code in some existing debugfs "cat
/sys/kernel/debug/pm_debug/... "
Hack some existing infrastructure.

Start a timer, print your gpio value, and see how well it's sampling.
Next you could try having dmtimer send an interrupt to the ARM and do
the same in the interrupt handler.

If you have to pump data to userspace, some kind of poll/select will
have to be implemented to wake up the userspace app when the sampling timer
expires.

Rafael_Machado · April 8, 2013, 3:59pm

Thanks Randy.

So you are basically suggesting that I put my code into some RAM-based fs, like debugfs/tmpfs/ramfs,
in order to avoid file write cycles. Is this any better than mmapping my gpio’s file descriptors ?

I’m not quite sure which timer I should use to collect interrupts in a userspace app: dmtimer or dmtimer2 ?

I’m gonna play around with poll/select/epoll eventually. Thanks for the tip.

The main issue bothering me is that
I cannot get ubuntu arm to interrupt my userspace app at periods smaller than ~30ms.

Even if I blank out the loop internals (i.e., remove the entire gpio read overhead),
I cannot get shorter interrupt periods, even with dmtimer2 mmapped.

Do you think plain poll/select/epoll techniques or another (patched) distro can be of any use here ?

To clarify, an interrupt snippet in my code would be something like this (got it from some topic here… can’t find the link right now) :

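`
#define PERIOD 1000  /* number of timer ticks to wait */
/* Map the DMTimer2 register block (physical base 0x48040000) into this process. */
int fd = open("/dev/mem", O_RDWR | O_SYNC);
volatile u_int32_t *dmtimer2_regs = (u_int32_t *)mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0x48040000);

u_int32_t t0 = 0, t1 = 0;
dmtimer2_regs[0x3c / 4] = 0;  /* reset the counter register (offset 0x3c) */
/* Busy-wait until PERIOD ticks have elapsed. */
while ((t1 - t0) < PERIOD) {
    t1 = dmtimer2_regs[0x3c / 4];
}
printf("delta=%u. t0=%u. t1=%u\n", t1 - t0, t0, t1);
`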

Or something like this (via clock_gettime):

`
struct sched_param sch;
sch.sched_priority = sched_get_priority_max(SCHED_FIFO);
if (sched_setscheduler(0, SCHED_FIFO, &sch) == -1) perror("Scheduling error");

struct timespec spec_start, spec_end;

clock_gettime(CLOCK_REALTIME, &spec_start);
clock_gettime(CLOCK_REALTIME, &spec_end);
printf("delta_gettime=%lf us\n", diff_nsec(&spec_start, &spec_end) / 1000.0);
`

Anyway, on my computer (arch linux 64b) I can get results down to 0.5us.
On the bbone (w/ ubuntu arm 32b) I cannot get below 30ms.
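
The `diff_nsec` helper called in the second snippet is not shown in the post. A minimal sketch of what such a helper might look like, assuming it returns the elapsed nanoseconds between two `struct timespec` values. The name `min_delta_nsec` and the switch to `CLOCK_MONOTONIC` (which is not affected by wall-clock adjustments) are illustrative assumptions, not part of the original code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: elapsed nanoseconds from *start to *end. */
int64_t diff_nsec(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/* Smallest non-zero gap seen between two back-to-back clock reads,
 * over `iterations` attempts. Returns -1 if no non-zero gap was seen. */
int64_t min_delta_nsec(int iterations)
{
    int64_t best = -1;
    for (int i = 0; i < iterations; i++) {
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        clock_gettime(CLOCK_MONOTONIC, &b);
        int64_t d = diff_nsec(&a, &b);
        if (d > 0 && (best < 0 || d < best))
            best = d;
    }
    return best;
}
```

Printing `min_delta_nsec(100000) / 1000.0` on both machines would give the smallest timestamp step each one can actually observe, which separates clock granularity from scheduling delay.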

Thank you

jkridner · April 9, 2013, 3:45am

Under JavaScript (node.js) under Angstrom using multiple processes and
messages in userspace I'm getting single digit ms responses (epoll
POLLPRI). I'm kinda confused why yours is so slow.

Of course, I think we should really try to figure out what you are doing.
If you are trying to count fast events, you probably want to use the eCAP
or PRU hardware and not thrash the main CPU.
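
The `POLLPRI` approach mentioned above can be sketched in C with plain `poll()`. This assumes the legacy sysfs GPIO interface, with the pin already exported and its `edge` attribute set to `rising`, `falling` or `both`. The function name `wait_for_edge` and the `gpio60` path in the note below are illustrative, not from the thread:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Block until the sysfs GPIO value file signals an edge, or until
 * timeout_ms elapses. Returns the character '0' or '1' (the pin level
 * read after the event), 0 on timeout, or -1 on error. */
int wait_for_edge(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR, .revents = 0 };
    char level;

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;

    /* Rewind and read: this fetches the level and clears the pending event. */
    if (lseek(fd, 0, SEEK_SET) < 0 || read(fd, &level, 1) != 1)
        return -1;
    return level;
}
```

Typical use would be to open `/sys/class/gpio/gpio60/value` with `O_RDONLY`, do one initial read to clear the pending state, then call `wait_for_edge(fd, 1000)` in a loop. Each wakeup still costs a context switch, so this suits event notification better than fixed-rate sampling.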

Rafael_Machado · April 9, 2013, 5:39am

Thanks Jason.

This eCAP idea seems quite promising … but I’m getting mixed information about it from the web:

- eCAP driver for capture mode not supported:
eCAP is described in http://processors.wiki.ti.com/index.php/AM335x_PWM_Driver's_Guide#eCAP_2

and discussed in http://e2e.ti.com/support/arm/sitara_arm/f/791/t/249460.aspx (a decently recent topic)
as not available through a driver (“The current release of the driver supports only PWM mode”).
As I understand it, this would basically mean writing my own eCAP capture assembly code (guided by the am3359 technical reference).

- eCAP driver for capture mode supported:

However, an eCAP driver is described in http://comments.gmane.org/gmane.linux.ports.arm.omap/80653 as available
and I can see the code (also very recent) for doing this in
https://gitorious.org/linux-pwm/linux-pwm/blobs/blame/bdd7cf97153d354f654379563483bdb5a774ef16/drivers/pwm/pwm-tiecap.c

Is this eCAP capture driver available in Angstrom upstream ?
Is it worth changing my embedded OS (ubuntu arm) to Angstrom (so I can get the cutting edge patches) ?

In addition to this plethora of eCAP questions, I’d like to know whether the PRU is more suitable for my needs
(reading gpios at high, well-controlled frequencies) ?
I’m just getting started with embedded systems and don’t know yet how to evaluate this decision (PRU vs eCAP).

Thank you very much.

Chris_Micali · April 10, 2013, 8:40pm

I’m using the PRU to do something like this now at rates between 1-2MS/s (16-bit samples.) It has been non-trivial though… prepare for some work if you end up going down this route. That said, apart from eCAP (which I don’t know much about) I could not find another way to do this.

Rafael_Machado · April 11, 2013, 4:12am

Chris,
This just gave me some hope. At least I know it is possible to do this with a Beaglebone.

Do you have any sources or materials to share with me
besides the resources on https://github.com/beagleboard/am335x_pru_package ?

I still don’t know whether I should go down this route or the eCAP one
(I also haven’t yet tried putting xenomai or some RTOS on top of my current ubuntu arm)

Thank you

Chris_Micali · April 11, 2013, 5:56pm

Rafael,

This blog post helped me a lot: http://blog.boxysean.com/2012/08/12/first-steps-with-the-beaglebone-pru/

I’ve also posted a DDR example here: https://github.com/sagedevices/am335x_pru_package/tree/master/pru_sw/example_apps

Also the TI wiki has been helpful: http://processors.wiki.ti.com/index.php/Programmable_Realtime_Unit and http://processors.wiki.ti.com/index.php/PRU_Assembly_Instructions

http://hipstercircuits.com/ has had a couple great PRU posts also

-c

Rafael_Machado · April 12, 2013, 3:26am

Thanks Chris.
I’m certainly going to dig into those links.

In addition to that, I’d like to add another complementary question to this thread:

The 30us bottleneck I mentioned in the very first message is actually T=30.5717 us
Well … T^-1 = 32.768KHz, which is somewhat of a canonical number, mentioned several times in the TRManual (http://www.ti.com/lit/ug/spruf98x/spruf98x.pdf ),
such as precisely the de-bouncing timer frequency for a given GPIO in input mode.

Do you guys have any thoughts on this ? It cannot be mere coincidence.
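
For reference, the period of a 32.768 kHz clock is 1/32768 s ≈ 30.5176 µs, so software timestamps taken from a clock ticking at that rate can only change in steps of about 30.5 µs. Whether that is what is happening here is an assumption. One way to start checking would be to ask the kernel what resolution it reports for the clock in use. A minimal sketch (the function names are illustrative):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Period, in nanoseconds, of a clock ticking at freq_hz. */
double tick_period_nsec(double freq_hz)
{
    assert(freq_hz > 0.0);
    return 1e9 / freq_hz;
}

/* Resolution the kernel reports for clock `id`, in nanoseconds,
 * or -1 if clock_getres() fails. */
long long clock_resolution_nsec(clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Comparing `clock_resolution_nsec(CLOCK_REALTIME)` with `tick_period_nsec(32768.0)` on the board may show whether the measured floor is just timer granularity. With high-resolution timers enabled the kernel can report 1 ns regardless of the underlying clocksource, in which case measuring the smallest non-zero step between consecutive `clock_gettime` readings is more telling. On Linux the active clocksource can also be read from `/sys/devices/system/clocksource/clocksource0/current_clocksource`.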

@Chris:
How many input gpios are you controlling with the PRU ?
I think I read in another topic about a 48-gpio limit.

Thank you

Rafael_Machado · April 12, 2013, 2:25pm

Oops… wrong technical reference link
http://www.ti.com/litv/pdf/spruh73g is the correct one

Rafael_Machado · April 13, 2013, 4:05am

Hi. I’ve (re)tried the experiment.
This time I’m reading values on the scope instead of relying on plain software sampling time calculation (nanosleep, clock_gettime, etc).

I’d like to share some results.

Scope Images: http://share.pho.to/1oteV

The experiment idea is quite simple:
An input gpio reads an externally generated square wave (scope CH2 - blue wave)
and another output gpio is configured to “mirror” the input just read (this gpio is connected to scope CH1 - yellow wave).
Conceptually, there are just two consecutive lines of code inside a main infinite loop.

The conclusions are as follows:

- The initial overhead to read the square wave and write it back to another gpio is always >=760ns.
- For each additional input code line before the output code line (i.e., increasing the gpio input payload) there is an observed 80ns increase in the phase difference between CH2 and CH1.
  This 80ns increase is pretty much constant across experiments (e.g., for 4 additional inputs, there is an increase of ~300ns).
- Any external square wave (CH2) of f<500KHz is captured and mirrored back to CH1 with
  a good frequency match (approximately the same as the input square wave frequency).
  Anything beyond that and the sampling is compromised (frequency mismatch, high level for too long, low level for too long, etc).
- For any input square wave (CH2) with f<=200KHz, the phase difference (between CH1 and CH2) is small (<25% of the total period).
  Beyond that frequency, the phase difference is >25% of the total period.

Now, the most intriguing observed fact:

The time values observed by the software are always >=30.571 us (32.768KHz).
This sampling period is obviously wrong, since I’ve read the actual value on the scope.
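
The post does not show the mirroring code, so the following is only a sketch of how such a two-line loop body might look with the GPIO registers mapped through `/dev/mem`. The register offsets are the AM335x `GPIO_DATAIN` (0x138), `GPIO_CLEARDATAOUT` (0x190) and `GPIO_SETDATAOUT` (0x194) values from the technical reference manual. The function name and bit masks are illustrative, and the pins are assumed to be already muxed and configured as input and output:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word indices of AM335x GPIO registers within one bank's register block. */
#define GPIO_DATAIN        (0x138 / 4)
#define GPIO_CLEARDATAOUT  (0x190 / 4)
#define GPIO_SETDATAOUT    (0x194 / 4)

/* One loop iteration: read the input pin and drive the output pin to
 * the same level. Returns the level that was read (0 or 1). */
int mirror_once(volatile uint32_t *gpio, uint32_t in_mask, uint32_t out_mask)
{
    assert(gpio != NULL);
    int level = (gpio[GPIO_DATAIN] & in_mask) != 0;
    gpio[level ? GPIO_SETDATAOUT : GPIO_CLEARDATAOUT] = out_mask;
    return level;
}
```

On real hardware `gpio` would be the pointer returned by `mmap()` on `/dev/mem` at the bank's base address (for example 0x4804C000 for GPIO1), and the call would sit inside a `for (;;)` loop. Using the set and clear registers avoids a read-modify-write on the data-out register.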

Chris_Micali · April 14, 2013, 6:27pm

Rafael,

I’m using about 10 pins for the PRU I think, mostly outputs but a couple of inputs. The beaglebone only has a subset of the PRU pins brought out and available, but I think you could get up to 16-20 PRU I/O pins… don’t fully remember. You can find the exact # by using the TI Pinmux Tool and the beaglebone SRM to see which pins are available on the beaglebone headers

-c

randy_rodes · April 15, 2013, 5:02pm

Can anyone share some _REAL_ code for the PRU work with GPIO sampling you are doing?
I am quite interested to see how it works and may be able to make use
of it as well.

thanks in advance
Randy