Math NEON and related questions.

Hello,

I'm near completion on my sponsored math-neon project:
http://code.google.com/p/math-neon/
All the basic cmath-like functions are done, I'm adding matrix and
vector functions at the moment.

However I have a few questions:
1. Is it possible to produce a hardfp binary with CS 2009q3? Linking
fails with a message like "this is not a hardfp binary", even though
all my object files were compiled with -mfloat-abi=hard.
2. I'm getting some strange performance differences between kernels. I
should make it clear that I'm using a Pandora devboard; I lack the
time and a serial-to-10-pin cable to set up my BeagleBoard, but the
question still applies (both are OMAP3 + Angstrom). For instance, on
the older kernel my purely NEON sinf is about 2x faster than the cmath
sinf; on the newer kernel it's ~5x faster. The second value is what I
would have expected/hoped for.
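
On question 1, one quick check (a sketch; the filenames are examples, not the project's actual objects) is whether every input object, and every implicitly linked library, actually carries the hardfp attribute the linker is looking for:

```shell
# Sketch: check which float ABI each object/library was built with.
# A hardfp object records Tag_ABI_VFP_args in its ARM attribute section;
# if any input (including implicitly linked libraries like libgcc or
# glibc) lacks it, the linker refuses to mix the ABIs.
for f in math_sinf.o math_debug.o; do
  if readelf -A "$f" 2>/dev/null | grep -q Tag_ABI_VFP_args; then
    echo "$f: hardfp (VFP register arguments)"
  else
    echo "$f: softfp/soft, missing, or not an ARM object"
  fi
done
```

If the objects all show the tag, the culprit is usually a prebuilt softfp libgcc/libc being pulled in implicitly.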

You can see how I'm testing here: http://code.google.com/p/math-neon/source/browse/trunk/math_debug.c

From what I can deduce there are two possibilities: my testing routine
is broken (either kernel-side or in my code), or the cmath on the older
kernel is already NEON-optimised (VFPlite could not be that fast). If
someone could compile my project on a BeagleBoard (just compile and
link all files plus libm) and post its console output, it would be very
helpful. Otherwise, any random ideas would be appreciated.

Cheers,

> Hello,
>
> I'm near completion on my sponsored math-neon project:
> http://code.google.com/p/math-neon/
> All the basic cmath-like functions are done, I'm adding matrix and
> vector functions at the moment.
>
> However I have a few questions:
> 1. Is it possible to produce a hardfp binary with CS 2009q3? Linking
> fails with a message like "this is not a hardfp binary", even though
> all my object files were compiled with -mfloat-abi=hard.

Including all the libraries that are implicitly linked?

> 2. I'm getting some strange performance differences between kernels. I
> should make it clear that I'm using a Pandora devboard; I lack the
> time and a serial-to-10-pin cable to set up my BeagleBoard, but the
> question still applies (both are OMAP3 + Angstrom). For instance, on
> the older kernel my purely NEON sinf is about 2x faster than the cmath
> sinf; on the newer kernel it's ~5x faster. The second value is what I
> would have expected/hoped for.
>
> You can see how I'm testing here:
> http://code.google.com/p/math-neon/source/browse/trunk/math_debug.c
>
> From what I can deduce there are two possibilities: my testing routine
> is broken (either kernel-side or in my code), or the cmath on the older
> kernel is already NEON-optimised (VFPlite could not be that fast). If
> someone could compile my project on a BeagleBoard (just compile and
> link all files plus libm) and post its console output, it would be very
> helpful. Otherwise, any random ideas would be appreciated.

One random idea: perhaps the kernel interacts with the fast
mode settings.

BTW, regarding the speedup: why don't you measure absolute time on
each kernel instead of the ratio? That way you'd know whether it's
cmath/libm that changes speed depending on the kernel.
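
Concretely, that could be as simple as tagging each run's output with the running kernel version (a sketch; "math_debug" is the project's test binary, path assumed):

```shell
# Tag each benchmark run with the kernel it ran under, so absolute
# times, not just speedup ratios, can be compared across kernels.
# "./math_debug" is the project's test binary; adjust the path as needed.
kernel="$(uname -r)"
if [ -x ./math_debug ]; then
  ./math_debug > "results-$kernel.txt"
fi
echo "results tagged with kernel $kernel"
# After rebooting into the other kernel and re-running, compare the
# two results-*.txt files directly.
```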

Laurent

Laurent Desnogues <laurent.desnogues@gmail.com> writes:

>> Hello,
>>
>> I'm near completion on my sponsored math-neon project:
>> http://code.google.com/p/math-neon/
>> All the basic cmath-like functions are done, I'm adding matrix and
>> vector functions at the moment.
>>
>> However I have a few questions:
>> 1. Is it possible to produce a hardfp binary with CS 2009q3? Linking
>> fails with a message like "this is not a hardfp binary", even though
>> all my object files were compiled with -mfloat-abi=hard.

> Including all the libraries that are implicitly linked?

I have managed to build a hardfp gcc+glibc, but it wasn't easy.
Unfortunately I don't remember exactly what I had to do.

>> 2. I'm getting some strange performance differences between kernels. I
>> should make it clear that I'm using a Pandora devboard; I lack the
>> time and a serial-to-10-pin cable to set up my BeagleBoard, but the
>> question still applies (both are OMAP3 + Angstrom). For instance, on
>> the older kernel my purely NEON sinf is about 2x faster than the cmath
>> sinf; on the newer kernel it's ~5x faster. The second value is what I
>> would have expected/hoped for.
>>
>> You can see how I'm testing here:
>> http://code.google.com/p/math-neon/source/browse/trunk/math_debug.c
>>
>> From what I can deduce there are two possibilities: my testing routine
>> is broken (either kernel-side or in my code), or the cmath on the older
>> kernel is already NEON-optimised (VFPlite could not be that fast). If
>> someone could compile my project on a BeagleBoard (just compile and
>> link all files plus libm) and post its console output, it would be very
>> helpful. Otherwise, any random ideas would be appreciated.

> One random idea: perhaps the kernel interacts with the fast
> mode settings.

I don't think so. The kernel does context save/restore, but not much
else.

> BTW, regarding the speedup: why don't you measure absolute time on
> each kernel instead of the ratio? That way you'd know whether it's
> cmath/libm that changes speed depending on the kernel.

Yes, never change more than one thing at a time.