Slow Code Execution on the Beaglebone

Hello,

I have an algorithm that I need to run on the Beaglebone for a project that I am working on. Specifically, it is an IMU algorithm. It uses I2C to read a couple sensors and then does some calculus and linear algebra to produce a result.

I tested this code first on an mbed (http://mbed.org) while I was waiting for my Beagle to come in and had it running on a 1.2ms timer interrupt with no problems. The specs on the mbed are: 96MHz, 32KB RAM, 512KB FLASH.

When I run the same code on the Beaglebone, the algorithm, on average, runs at about 22ms. I have an A5 Beaglebone running the latest Angstrom image from the demo site.

Why would there be such a big slow down? Why would the Beagle run so much slower? I understand that there is some overhead from the OS, but 20ms of it seems a bit extreme.

Thanks for your help,
Joey

Ok, it seems that the issue is the I2C driver. I took out the I2C calls and faked the values they output and clocked the algorithm. On average it ran in about 73us, which is fast enough. Adding the I2C calls (reading 12 bytes total from 2 devices) added the extra 20ms or so of execution time.

Does anyone know how this might be optimized or if there is a way to speed up these calls? I am already doing multibyte reads, so I issue a request for all 6 bytes at once to each device. Is there something in the OS that can be fixed?

Thanks,
Joey

Ok, it seems that the issue is the I2C driver. I took out the I2C calls and
faked the values they output and clocked the algorithm. On average it ran
in about 73us, which is fast enough. Adding the I2C calls (reading 12 bytes
total from 2 devices) added the extra 20ms or so of execution time.

Does anyone know how this might be optimized or if there is a way to speed
up these calls? I am already doing multibyte reads, so I issue a request
for all 6 bytes at once to each device. Is there something in the OS that
can be fixed?

I'd be on the lookout for any sleep() calls within the I2C functions.
[1] What method precisely are you using to perform the I2C calls?

You might try out StarterWare [2] which has some I2C functions [3] if
you are interesting in trying it out without an OS.

[1] Kernel Korner - The New Work Queue Interface in the 2.6 Kernel | Linux Journal
[2] http://processors.wiki.ti.com/index.php/StarterWare
[3] http://processors.wiki.ti.com/index.php/StarterWare_HSI2C

Thanks for your reply, I am using a c++ wrapper around the ioctl read/write methods. There are no sleep calls in my code, so unless it happens at the driver level, that should not be a problem.

I optimized my calls a little bit which significantly decreased the execution time to 3.6ms. This was doing a single 6 byte read instead of 6 single byte reads.

The 3.6ms still seems a little high for reading only 6 bytes, however. The I2C bus with device overhead should be able to do that in under 100us. This would mean the OS overhead is about 3.5ms, which does not seem quite right.

I will look into the starterware stuff, but I believe I will need an OS for the other parts of my application.

Do you have any suggestions on some further optimizations for the I2C calls?

-Joey

Joey Yore wrote:

Thanks for your reply, I am using a c++ wrapper around the ioctl read/write methods. There are no sleep calls in my
code, so unless it happens at the driver level, that should not be a problem.

I optimized my calls a little bit which significantly decreased the execution time to 3.6ms. This was doing a single 6
byte read instead of 6 single byte reads.

The 3.6ms still seems a little high for reading only 6 bytes, however. The I2C bus with device overhead should be able
to do that in under 100us. This would mean the OS overhead is about 3.5ms, which does not seem quite right.

I will look into the starterware stuff, but I believe I will need an OS for the other parts of my application.

Do you have any suggestions on some further optimizations for the I2C calls?

maybe the I2C on the bone is setup to use a rather slow
clock per default, Jason?

The kernel scheduler or other i2c activity on that bus may be causing
the delay. What does your kernel have HZ set to? What kernel are you
using? Are there any other devices on that i2c bus?

Do you have a scope / logic analyzer to verify the actual on-bus time
and frequency of the clock line?

How long does 2 of these i2c transmissions sent back to back take? How
long does 100? Is the increase linear with the number of transmissions
you send? That might shed some light on it.

-Andrew

Thanks for your reply, I am using a c++ wrapper around the ioctl read/write
methods. There are no sleep calls in my code, so unless it happens at the
driver level, that should not be a problem.

I assume you are using I2C_RDWR per [1], but it wouldn't hurt to give
us a bit more of a snippet and to explain where you are taking the
time measurements.

I'd suggest doing a kernel trace to see where all the call goes, but
my understanding is that there will always be a write before the read.
The delay might be between those.

[1] http://kernel.org/doc/Documentation/i2c/dev-interface

I optimized my calls a little bit which significantly decreased the
execution time to 3.6ms. This was doing a single 6 byte read instead of 6
single byte reads.

The 3.6ms still seems a little high for reading only 6 bytes, however. The
I2C bus with device overhead should be able to do that in under 100us. This
would mean the OS overhead is about 3.5ms, which does not seem quite right.

I will look into the starterware stuff, but I believe I will need an OS for
the other parts of my application.

Do you have any suggestions on some further optimizations for the I2C calls?

What did you set the bit clock to be and have you verified it with a scope?

"c++ wrapper" should give you an idea where the bottleneck is.

I'd give the *like* to Jason as he pointed to the necessary document:
[1] http://kernel.org/doc/Documentation/i2c/dev-interface

No need to use ioctl to read/write i2c bus. Plain i2c I/O is performed
by write/read system calls which are transparently translated right
into the i2c_omap.c driver. No bottleneck here. I'm not so sure about
ioctl calls. I'd have a look at sources of that slow program.

Regarding "c++ wrapper" - this may be the issue, but c++ is not that
slow to add 20ms to any algorithm. Even for pure virtual functions
there are only two pointer dereferences, if I'm not mistaken.

Thank you for all the replies, I will be looking into everything.

The c++ is not the problem. It doesn’t add milliseconds of difference. Plus I am using the same wrapper with the low level calls different on the other device with no problems.

I’ll post my findings as I find them,
Joey

Joey,

Not sure if this will be related, but there was a thread that suggested that maybe there is a problem with the i2c bus frequencies. I have not verified this myself, but it’s on my list. Here’s a link: https://groups.google.com/forum/?fromgroups#!topic/beagleboard/Q1pDr1lT7Gk

Steve

Hello,

I have confirmed that the i2c clock is running at 25kHz instead of 100kHz.
This would certainly account for some of the slow down.

From what I have seen, you have to recompile the Kernel in order to get
this fixed. There may be a u-boot alternative, but I have not gotten that
to work yet.

Is there a way you can do this from userland?

Thanks,
Joey

Here is the c++ code snippets...pretty basic and follows the documentation
that I have seen.

I2C::I2C(int channel, int addr) {
        char file[40];
        sprintf(file,"/dev/i2c-%d",channel);

        if((fd = open(file,O_RDWR)) < 0) {
                cout << "ERROR Opening File" << endl;
                status = -1; return; }

        if(ioctl(fd,I2C_SLAVE,addr) < 0) {
                cout << "ERROR Opening Device" << endl;
                status = -1; return; }

status = 0;
}

int I2C::get_status() {
        return status;
}

int I2C::write_address(unsigned char reg) {
        buf[0] = reg;

        if(write(fd,buf,1) != 1) {
                return -1;
        }
        return 0;
}

int I2C::write_byte(unsigned char reg, unsigned char data) {
        buf[0] = reg;
        buf[1] = data;

        if(write(fd,buf,2) != 2) {
                return -1;
        }
        return 0;
}

int I2C::write_masked_byte(unsigned char reg, unsigned char data, unsigned
char mask) {
        unsigned char currentData;

        if(write_address(reg) != 0) {
                return -1;
        }

        if(read_current_byte(currentData) != 0) {
                return -2;
        }

        data = (currentData & ~mask) | (data & mask);
        return write_byte(reg,data);
}

int I2C::read_current_byte(unsigned char &data) {
        if(read(fd,buf,1) != 1) {
                return -1;
        }

        data = buf[0];
        return 0;
}

int I2C::read_current_bytes(unsigned char *data, int num_bytes) {
        if(read(fd,data,num_bytes) != 1) {
                return -1;
        }

        return 0;
}

int I2C::read_byte(unsigned char reg, unsigned char &data) {
        if(write_address(reg) != 0) {
                return -1;
        }

        return read_current_byte(data);
}

int I2C::read_multiple_bytes(unsigned char reg, unsigned char *data,
intnum_bytes
) {
        if(write_address(reg) != 0) {
                return -1;
        }

        return read_current_bytes(data,num_bytes);
} I do not really see what would cause such a delay here, other than the
actual system calls (and that i2c-clock issue in my last post). Anything
that I am missing? -Joey

I forgot to mention, as pointed out in the thread at
https://groups.google.com/forum/?fromgroups#!topic/beagleboard/Q1pDr1lT7Gk

dmesg reports the clock to be 100kHz

# dmesg | grep omap_i2c.3
[ 0.233241] omap_i2c.3: alias fck already exists
[ 0.250713] omap_i2c omap_i2c.3: bus 3 rev2.4.0 at 100kHz

But when measured on the scope, I get ~25kHz.
See image here: http://jyore.com/site_media/img/beagle/IMAG0198.jpg

int I2C::read_current_bytes(unsigned char *data, int num_bytes) {
         if(read(fd,data,num_bytes) != 1) {

are you sure read() returns "1" here?

Read returns the number of bytes read. There is a slight bug here that I
thank you for pointing out.

Before I had single byte reads only, so yes, read would return 1 on success.
Now that I have multibyte reads too, I need to support reading more than 1
byte.

I will make this change to my code source. Thanks again.

from Linux man pages for read:
"On success, the number of bytes read is returned (zero indicates end of
file), and the file position is advanced by this number. It is not an error
if this number is smaller than the number of bytes requested; this may
happen for example because fewer bytes are actually available right now
(maybe because we were close to end-of-file, or because we are reading from
a pipe, or from a terminal), or because *read*() was interrupted by a
signal. On error, -1 is returned, and *errno* is set appropriately. In this
case it is left unspecified whether the file position (if any) changes."