Starterware slower than Linux

Hi there,

I am testing the performance of the BeagleBone with the no-OS Starterware from Texas Instruments and with the Ångström Linux distribution.

My test program, which I wrote for both Starterware and Linux, does the following: it configures GPIO_1_28 as an output pin and then, in an endless loop:

  1. set the pin HIGH
  2. for(i=0; i<35000; i++)
    { x = sqrt(3.141592654); }
  3. set the pin LOW
  4. for(i=0; i<35000; i++)
    { x = sqrt(3.141592654); }
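Because the same test was written for both Starterware and Linux, one option is to keep the loop itself platform-neutral and plug in the pin and workload code. This is a minimal sketch only: run_square_wave, the hook types and the host_* stubs are hypothetical names, and the stubs just count calls instead of driving GPIO_1_28.

```c
/* Hypothetical hooks: on the board, the pin hooks would write GPIO_1_28
 * (through Starterware or Linux) and the work hook would run the sqrt loop.
 * The host stubs below only count calls so the sketch runs on a PC. */
typedef void (*action_fn)(void);
typedef void (*work_fn)(long iterations);

static long edges;       /* pin changes seen by the stubs          */
static long work_items;  /* loop iterations "done" by the stub     */

static void host_pin_high(void) { edges++; }
static void host_pin_low(void)  { edges++; }
static void host_work(long n)   { work_items += n; }

/* Steps 1-4, repeated `periods` times. The real program loops forever;
 * the bound only exists so the sketch can be tested. */
void run_square_wave(action_fn pin_high, action_fn pin_low,
                     work_fn work, long iterations, long periods)
{
    for (long p = 0; p < periods; p++) {
        pin_high();        /* 1. pin HIGH          */
        work(iterations);  /* 2. first busy loop   */
        pin_low();         /* 3. pin LOW           */
        work(iterations);  /* 4. second busy loop  */
    }
}
```

Calling run_square_wave(host_pin_high, host_pin_low, host_work, 35000, 3) records 6 edges and 210000 iterations: one output period on the scope always covers two edges and 2 × 35000 loop iterations.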

Then I used an oscilloscope to measure the period / frequency of the output signal. My first results:

Using Linux, the period is T = 1.1 ms (with a jitter of about 70 µs).
Using Starterware, the period is T = 18 s!!!
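Converting the scope readings into time per loop iteration makes the gap more concrete. One output period contains both busy loops, so the time per iteration is T / (2 × 35000). A small sketch of that arithmetic (the function names are illustrative):

```c
/* Time per busy-loop iteration in nanoseconds: one output period
 * contains two loops of `iterations_per_half` iterations each. */
double ns_per_iteration(double period_s, long iterations_per_half)
{
    return period_s * 1e9 / (2.0 * (double)iterations_per_half);
}

/* Output frequency in Hz for a measured period in seconds. */
double frequency_hz(double period_s)
{
    return 1.0 / period_s;
}
```

For the two periods above this gives about 15.7 ns per iteration on Linux and about 257 µs per iteration on Starterware, a factor of roughly 16,000. A quarter of a millisecond for one sqrt() is far more than the arithmetic itself should need, which points at something else, such as disabled caches or software-emulated floating point.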

Wondering why Starterware is so amazingly slow, I started googling and found the following post in a TI forum: http://e2e.ti.com/support/embedded/starterware/f/790/t/208033.aspx
The people on that forum managed to speed up Starterware by enabling the cache and the MMU, as done in the example \StarterWare\examples\evmskAM335x\uart\uartEcho.c
So I also integrated the functions MMUConfigEnable() and CacheEnable(CACHE_ALL) into my program. It kind of worked; my new results for Starterware:

T = 500 ms

But that is still roughly 450 times slower than on Linux!!!

Has anyone had similar experiences, or do you have a clue what could be wrong with Starterware? In the past I only programmed simple microcontrollers, so I don't really know how to handle all this cache and MMU stuff.

Is the Starterware toolchain using the FPU?

Oh damn, you're right. I didn't turn on the FPU. I should have read this first:

http://processors.wiki.ti.com/index.php/StarterWare_NeonVFP
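For reference, a minimal sketch of the usual ARMv7-A recipe for switching the VFP/NEON unit on: grant access to coprocessors cp10/cp11 in CPACR, synchronize with ISB, then set the EN bit in FPEXC. This is a general summary, not code copied from the wiki page; fpu_enable, cpacr_with_fpu_access and the macro names are hypothetical, and Starterware may already ship its own helper. It has to run in a privileged mode before the first floating-point instruction, and the compiler must also be told to emit VFP instructions (for GCC, e.g. -mfpu=neon with -mfloat-abi=softfp or hard).

```c
#include <stdint.h>

/* CPACR bits 20-23 control access to cp10 and cp11 (VFP/NEON).
 * 0b11 for each gives full access in privileged and user mode. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* FPEXC bit 30 (EN) enables the VFP/NEON unit. */
#define FPEXC_EN (1u << 30)

uint32_t cpacr_with_fpu_access(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

/* Only compiled for ARM targets built with an FPU; on a PC it does nothing. */
void fpu_enable(void)
{
#if defined(__arm__) && defined(__ARM_FP)
    uint32_t cpacr;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(cpacr)); /* read CPACR  */
    cpacr = cpacr_with_fpu_access(cpacr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 2" : : "r"(cpacr)); /* write CPACR */
    __asm__ volatile("isb");                                   /* let it take effect */
    __asm__ volatile("vmsr fpexc, %0" : : "r"(FPEXC_EN));      /* FPEXC.EN = 1 */
#endif
}
```

Until this has run, the first VFP instruction raises an undefined-instruction exception, so the call belongs at the very start of main(), before any float or double code.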

What I will do now:
First I will change my program to do integer calculations instead of float/double and repeat my comparison between Linux and Starterware. Next I will try to enable the FPU (NEON, VFP). I'll post my results when I'm done.

Here are my results:

  1. I changed my program loop to the following integer calculation:

int i;
int a = 0;
int b = 23210;

for(i = 0; i<35000; i++)
{
    a = a + b;
}
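With optimization turned on, a compiler is free to replace this loop by a single multiplication, since the result is simply 35000 × b. A minimal sketch of one common countermeasure, a volatile accumulator, is below; integer_kernel and integer_closed_form are hypothetical names. The expected sum, 35000 × 23210 = 812,350,000, still fits in a 32-bit int, so the loop does not overflow.

```c
/* Timing kernel: `volatile` forces every addition to be performed and
 * stored, so the loop cannot be collapsed into one multiplication. */
int integer_kernel(int iterations, int b)
{
    volatile int a = 0;
    for (int i = 0; i < iterations; i++) {
        a = a + b;
    }
    return a;
}

/* What an optimizer may compute instead of running the loop. */
long long integer_closed_form(int iterations, int b)
{
    return (long long)iterations * b;
}
```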

  2. I changed my FPU testing loop to the following:

int i;
float x = 0;
float y = 2.812;

for(i = 0; i<35000; i++)
{
x = x + y;
}
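In exact arithmetic this loop produces 35000 × 2.812 = 98420, but every single-precision addition rounds, so the accumulated value drifts slightly away from that; it is a good timing load but a poor checksum. When built without FPU support, each x + y typically becomes a call to a software helper (on ARM EABI soft-float, __aeabi_fadd), which is why this loop should show the FPU effect clearly. A sketch with a volatile accumulator (float_kernel is a hypothetical name):

```c
/* Float timing kernel: `volatile` keeps every addition and store,
 * so the compiler cannot precompute or drop the loop. */
float float_kernel(int iterations, float y)
{
    volatile float x = 0.0f;
    for (int i = 0; i < iterations; i++) {
        x = x + y;  /* rounded to float on every iteration */
    }
    return x;
}
```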

Are you using the same compiler and flags?

The A8 has hardware square-root support in its FPU. Given the difference
in your numbers, I suspect the Starterware compiler isn't using it for
some reason and is emulating sqrt with simpler FPU instructions.

...but that's just a guess. Have you compared the actual code produced
for the loop by the two compilers?

Starterware and Linux use different compilers. When I set the flag -mfpu=vfpv3 under Linux and rebuild the project, the result doesn't change. What I can set in Eclipse under Linux is the optimization level. Yesterday I changed my program to the following:

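#include <math.h>

void toggle_gpio(void);  /* my GPIO helper, defined elsewhere */

int main(void)
{
    int i;
    double x = 3.1415;

    while (1)
    {
        toggle_gpio();

        /* each sqrt() uses the previous result, so it can't be precomputed */
        for (i = 0; i < 35000; i++)
        {
            x = sqrt(x);
        }
    }
}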

For Starterware I measured 85.5 Hz and for Linux 19.7 Hz. In both cases I turned off optimization. This result makes a bit more sense, although I would actually expect both to be the same. The problem with my earlier program was probably that the compiler on Linux saw that I wanted to compute sqrt() of the same constant 35000 times and did some kind of optimization (most likely computing it only once), even though optimization was turned off. In my new program, each iteration of the loop works on the result of the previous one.
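The two frequencies can also be turned into time per sqrt() call. Assuming the scope measured the full square-wave period, and since toggle_gpio() runs once per pass, one period covers two passes of 35000 iterations. The clock rate is an input, not a measurement; the 720 MHz used below is an assumed value for the original BeagleBone, and the function names are illustrative.

```c
/* Nanoseconds per sqrt() call: one square-wave period spans two passes
 * of `iterations_per_pass` calls (toggle_gpio() runs once per pass). */
double ns_per_sqrt(double freq_hz, long iterations_per_pass)
{
    return 1e9 / (freq_hz * 2.0 * (double)iterations_per_pass);
}

/* Estimated CPU cycles per call at an assumed clock rate in MHz. */
double cycles_per_sqrt(double freq_hz, long iterations_per_pass,
                       double clock_mhz)
{
    return ns_per_sqrt(freq_hz, iterations_per_pass) * clock_mhz / 1000.0;
}
```

With 85.5 Hz and 19.7 Hz this gives about 167 ns and 725 ns per sqrt(), a ratio of about 4.3; at an assumed 720 MHz that is roughly 120 and 520 cycles per call.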
But I think I will stop testing here. In most of my tests Starterware gave much better results, and you also don't get the jitter you see with Linux.