Hello,
Being curious about hardfp, I've used the POV-Ray benchmark to get some numbers. I've used povray 3.6.1, see http://www.povray.org/download/benchmark.php for an explanation. I think that will give an impression of how much applications with heavy floating point usage might gain from hardfp.
I've run those tests on a BeagleBoard C4 (w/o XM, 720 MHz) using the same (vanilla) kernel 2.6.37.3. Both systems were on the same USB hard disk, on different ext4 partitions of the same size.
The whole softfp-system was compiled using
CFLAGS="-Os -pipe -mtune=cortex-a8 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer"
CXXFLAGS="${CFLAGS} -std=gnu++0x -fvisibility-inlines-hidden"
CFLAGS="${CFLAGS} -std=gnu99"
LDFLAGS="-Wl,-O1 -Wl,--enable-new-dtags -Wl,--sort-common -Wl,--as-needed"
and the hardfp-system was compiled using
CFLAGS="-Os -pipe -mtune=cortex-a8 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard -fomit-frame-pointer"
CXXFLAGS="${CFLAGS} -std=gnu++0x -fvisibility-inlines-hidden"
CFLAGS="${CFLAGS} -std=gnu99"
LDFLAGS="-Wl,-O1 -Wl,--enable-new-dtags -Wl,--sort-common -Wl,--as-needed"
All package versions were the same and the same patches (if any) were used. The gcc version was 4.5.2, binutils was 2.21 and glibc was 2.11.2.
Here are the times for "time povray benchmark.ini":
softfp:
Total Time: 10 hours 39 minutes 23 seconds (38363 seconds)
real 639m23.292s
user 639m17.914s
sys 0m0.430s
hardfp:
Total Time: 10 hours 3 minutes 25 seconds (36205 seconds)
real 603m24.803s
user 603m21.188s
sys 0m0.422s
Being curious about the compiler optimisations, I've run the same benchmark on the same systems, just using -O3 instead of -Os to compile povray:
softfp:
Total Time: 9 hours 49 minutes 29 seconds (35369 seconds)
real 589m29.634s
user 589m24.016s
sys 0m0.422s
hardfp:
Total Time: 9 hours 22 minutes 13 seconds (33733 seconds)
real 562m12.603s
user 562m9.320s
sys 0m0.469s
So it looks like using hardfp instead of softfp might gain about 5 % (5.6 % with -Os, 4.6 % with -O3) for applications that make heavy use of floating point.
I don't want to judge whether -Os, -O2 or -O3 is better for your use case. Those optimizations can have heavy implications, especially with regard to floating point, and the fastest optimization level won't always fit.
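For anyone who wants to check the arithmetic, here is a small Python sketch that derives the gain from the "Total Time" values above. The variable and function names are only illustrative, and "gain" is taken here to mean the reduction in run time relative to the softfp run, which is an assumption about how to read the numbers.

```python
# Total run times in seconds, copied from the "Total Time" lines above.
totals = {
    "-Os": {"softfp": 38363, "hardfp": 36205},
    "-O3": {"softfp": 35369, "hardfp": 33733},
}


def reduction_percent(slow, fast):
    """Percentage by which `fast` shortens the run time relative to `slow`."""
    return 100.0 * (slow - fast) / slow


# Gain of hardfp over softfp for each optimisation level.
gains = {
    opt: reduction_percent(t["softfp"], t["hardfp"])
    for opt, t in totals.items()
}

for opt, gain in gains.items():
    print(f"{opt}: hardfp needs {gain:.1f} % less time than softfp")
```

This gives roughly 5.6 % for -Os and 4.6 % for -O3. It is a single run per configuration, so small differences between the two figures should not be over-interpreted.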
Regards,
Alexander Holler
PS: Before someone asks why I'm using -std=gnu++0x: I'm using it because C++0x offers some nice new features, especially "perfect forwarding", and I think almost all C++ programs might benefit from that if those new features are used, e.g. by the STL. I haven't checked whether those new features are already used somewhere in the standard libraries (or templates), but ...
Be aware that using -std=gnu++0x actually breaks compilation of a few C++ programs.