Same program runs 4x slower on Linux than on Windows CE

I am porting a program from Windows to the OMAP platform. We need to
know how fast the program can run on OMAP.
First, I ported the program to Windows CE on the OMAP 3530; it took
1000 ms to finish the job. We realized that the Windows compiler only
supports ARMv4/ARMv5 and has no support for the ARMv7 Cortex-A8. Since
newer compilers on Linux do support the Cortex-A8, we thought the
program might run faster on Linux.
So I installed the Angstrom demo image (with X) on the BeagleBoard and
booted into Angstrom from a dual-partition SD card. I ported the same
program to the BeagleBoard with the CodeSourcery 2007q3 arm-gnu
toolchain. Since the X server took a lot of memory, I turned it off
before running the program.
To my surprise, this time the program took 4000+ ms to finish. I am
really confused: why does the program run even slower when compiled
with a compiler that supports the Cortex-A8 on Linux? Is there
anything I am missing?
Could anybody provide some advice?
Thanks!

lwpcse wrote:

To my surprise, this time the program took 4000+ ms to finish. I am
really confused: why does the program run even slower when compiled
with a compiler that supports the Cortex-A8 on Linux? Is there
anything I am missing?
Could anybody provide some advice?

Well, what does it do?

Did you use -O2 or similar options for gcc?

Are you sure your L2 cache is enabled? (If you're running this on
the same Beagle that runs WinCE, then forget this question.)

Laurent

What does your programme do? What does /proc/cpu/alignment contain
after running it?

Yes, I compiled my program, as well as the Qt library that it needs,
with optimization enabled.
How can I tell whether my L2 cache is enabled?
I am not using the same board, but the two boards have exactly the
same specification in terms of CPU and memory.
The program reads an image and processes it.

Thanks.

And store the image? Display the image? Are your FS caches hot?

No, the program only does some calculation based on the image; the
image is not saved. The calculation is performed after the image is
displayed, but the problem is that the calculation and other
operations, such as opening the file and displaying the image, all
seem to run much slower on Linux.
What do you mean by "FS caches hot"?

Can you get timings for the individual steps? Which ones are slower?

I did get some numbers for the calculation part: it took 4x longer to
finish. Image loading and displaying were not timed; they just feel a
little slower.

Now then, what kind of calculations are those? Also, for the second
time, what does /proc/cpu/alignment say?