Hi there,
I am testing the performance of the BeagleBone, both with the no-OS StarterWare from Texas Instruments and with the Linux distribution Ångström.
My test program, which I wrote for both StarterWare and Linux, does the following: I configure GPIO_1_28 as an output pin and in an endless loop I do…
- set the pin HIGH
- for(i=0; i<35000; i++)
{ x = sqrt(3.141592654); }
- set the pin LOW
- for(i=0; i<35000; i++)
{ x = sqrt(3.141592654); }
Then I used an oscilloscope to measure the period / frequency of the output. My first results:
using Linux, the period is T = 1.1 ms (with a jitter of about 70 us)
using StarterWare, the period is T = 18 s !!!
Wondering why StarterWare is so amazingly slow, I started googling and found the following post in a TI forum: http://e2e.ti.com/support/embedded/starterware/f/790/t/208033.aspx
The guys on that forum managed to speed up StarterWare by enabling the cache and MMU, as in the example \StarterWare\examples\evmskAM335x\uart\uartEcho.c
So I also integrated the functions MMUConfigEnable() and CacheEnable(CACHE_ALL) into my program. It kind of worked. My new result for StarterWare:
T = 500 ms
But that is still roughly 450 times slower than on Linux (500 ms vs. 1.1 ms) !!!
Has anyone had similar experiences, or do you have a clue what could be wrong with StarterWare? In the past I only programmed simple microcontrollers, so I don’t really know how to handle all this cache and MMU stuff.
Is the StarterWare toolchain using the FPU?
Oh damn you’re right. I didn’t turn on the FPU. I should have read this first:
http://processors.wiki.ti.com/index.php/StarterWare_NeonVFP
What I will do now:
First I will change my program to do integer calculations instead of float/double and repeat my comparison between Linux and StarterWare. Next I will try to enable the FPU (NEON, VFP). I’ll post my results when I’m ready.
Here are my results:
- I changed my program loop to the following integer calculations:
int i;
int a = 0;
int b = 23210;
for(i = 0; i<35000; i++)
{
a = a + b;
}
- I changed my FPU testing loop to the following:
int i;
float x = 0;
float y = 2.812;
for(i = 0; i<35000; i++)
{
x = x + y;
}
Are you using the same compiler and flags?
The A8 FPU has hardware sqrt support in the FPU. Given the difference
in your numbers I suspect the Starterware compiler maybe isn't using
that for some reason and is emulating sqrt with simpler FPU instructions.
...but that's just a guess. Have you compared the actual code produced
for the loop by the two compilers?
StarterWare and Linux use different compilers. When I set the flag -mfpu=vfpv3 on Linux and rebuild the project, I don't get a different result. What I can do in Eclipse under Linux is set the optimization level. What I did yesterday was to change my program to the following:
double x = 3.1415;
while(1)
{
toggle_gpio();
for(i=0; i<35000; i++)
{
x = sqrt(x);
}
}
For StarterWare I measured 85.5 Hz and for Linux 19.7 Hz. In both cases I turned off optimization. This result makes a bit more sense, but actually both should be the same. The problem with my other program was probably that the compiler on Linux saw that I wanted to do the same operation sqrt(3.1415) 35000 times and did some kind of optimization, even though I told it not to optimize. In my new program each iteration of the loop uses the result of the previous one, so it cannot be skipped.
But I think I will finish this testing now. In most of my tests StarterWare got much better results, and moreover you don't get the jitter that you see with Linux.