Best linpack result on Beagleboard-XM

Hello,

On Android Froyo, with the Linpack program
(http://www.greenecomputing.com/apps/linpack/), I'm topping out at around
14 MFLOPS on the Beagleboard-XM at 1GHz, which is very far from what other
devices such as the Nexus achieve. I thought that Sitara technology at
1GHz would compare well against the other alternatives...

Has anyone managed to go higher on their Beagleboard-XM? Obviously,
frying up the chip with mpurate=2000 followed by a double explosion of
the TPS and the memory doesn't count!

Share your best results,

Grégoire

Does it use the NEON SIMD unit?

You could (ideally) speed up this result by a factor of 4!

NEON can perform 4 floating-point operations at the same time, in real parallelism.

So, try to check whether your benchmark uses the NEON unit.

BTW, maybe you’re losing some performance to the virtual machine, etc. If you really want to measure performance, I’d recommend using C.
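
For anyone who does want a native reference point, here is a minimal C sketch of the kind of measurement involved. It times a daxpy loop (the y += a*x update that classic Linpack spends most of its time in) and converts the result to MFLOPS. It is not the method used by the Android Linpack app, so its numbers are not comparable with scores from that app. The names `daxpy`, `mflops` and `bench_daxpy` are made up for this illustration.

```c
#include <stddef.h>
#include <time.h>

/* y[i] += a * x[i]: one multiply and one add, so 2 flops per element. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Convert a flop count and an elapsed time in seconds to MFLOPS. */
double mflops(double flops, double seconds)
{
    return flops / (seconds * 1e6);
}

/* Time `reps` daxpy passes over n elements and return the measured MFLOPS.
   Returns 0.0 when the run is too short for clock() to measure. */
double bench_daxpy(size_t n, int reps, const double *x, double *y)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        daxpy(n, 1e-9, x, y);
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return mflops(2.0 * (double)n * (double)reps, seconds);
}
```

Calling `bench_daxpy` on two arrays of a few thousand doubles, with enough repetitions to run for about a second, should give a reasonably stable figure. Compile with optimisation (for example `-O2`), otherwise loop overhead dominates.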

The whole point of benchmarking is to use the same method as
everybody!!! If you use C, you are changing everything.

I see two basic areas of improvement: the obvious one, kernel frequency,
and the compilation flags of Android. I do think that I have the right
NEON flags during the compilation of Android.

There is an interesting article here:
http://www.greenecomputing.com/2010/09/24/mea-culpa-why-are-the-nexus-one-linpack-scores-so-much-higher-than-on-my-phone/

Is there any Beagleboard/TI guru who can tell whether we are at 64 bit
or 128 bit for SIMD?

Grégoire

I’m not a NEON guru, but I have already used SIMD on my BB-XM. The NEON registers are 128 bits wide, so you can operate on 16 bytes at the same time, or 4 floats.

It’s quite simple to use with gcc intrinsics, although hand-tuned assembly gives better performance.
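
As a rough illustration of the intrinsics route, the sketch below adds two float arrays 4 lanes at a time using `vld1q_f32`, `vaddq_f32` and `vst1q_f32` from `arm_neon.h`, and falls back to a plain scalar loop when the compiler is not targeting NEON. The function name `add_f32` and the `HAVE_NEON` macro are invented for this example. Note that ARMv7 NEON only handles single-precision floats, so this does not by itself help a double-precision benchmark.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical helper macro for this sketch */
#else
#define HAVE_NEON 0
#endif

/* dst[i] = a[i] + b[i] for single-precision floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    /* One 128-bit Q register holds 4 floats, so each iteration adds 4 lanes. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

With GCC on ARMv7 this typically needs `-mfpu=neon` plus a matching `-mfloat-abi` setting, though the exact flags depend on the toolchain.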

Rafael

Taking a look at the number against Snapdragon, my feeling is that I
don't use 128 bit. And I have downloaded various Android images
including from TI website and they give the same kind of performance.

It's probably a question of Android gcc/toolchain optimisation.

I think that it would make sense for TI to take a look. Perhaps
somebody at TI has already done some work on this, because right
now TI looks completely ridiculous against Freescale!

Grégoire

G2 <gregoire@gentil.com> writes:

> Taking a look at the number against Snapdragon, my feeling is that I
> don't use 128 bit. And I have downloaded various Android images
> including from TI website and they give the same kind of performance.

Snapdragon probably has a pipelined VFP unit, unlike the A8. It makes
a huge difference. Try running the same test on an A9 and compare.

A 1 GHz A9 reaches about 150 DP MFLOPS on Linpack.

Laurent

Laurent: Is it a TI A9 chip? OMAP4? Pandaboard?

Grégoire

It's an interesting point but my initial question was more "is there
anything more to do in software to improve the situation on
Beagleboard-XM?". Reading between the lines, I guess that no. In other
words, there is no crazy GCC optimization in Angstrom/AIOS OE-based OS
that could be ported to Android toolchain to get 128 bit in SIMD and
then improve the number? Or it's already at 128 bit - then how to know
and check?
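
One partial check, sketched here in C under the assumption of a Linux-style `/proc/cpuinfo` with a "Features" line (as ARM Linux kernels provide), is to see whether the kernel reports `neon` at all. The function names `line_has_word` and `cpu_has_feature` are made up for this sketch. It only tells you that the CPU and kernel expose NEON; it says nothing about whether the benchmark's code actually issues NEON instructions.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` appears as a whole whitespace-separated word in `line`. */
int line_has_word(const char *line, const char *name)
{
    size_t len = strlen(name);
    const char *p = line;
    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p++;  /* keep searching after a partial match such as "vfpv3" for "vfp" */
    }
    return 0;
}

/* Return 1 if the "Features" line of /proc/cpuinfo lists `name`,
   0 if it does not, and -1 if the file cannot be opened. */
int cpu_has_feature(const char *name)
{
    char line[1024];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Features", 8) == 0 && line_has_word(line, name)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

To see whether a given native binary really uses NEON, disassembling it (for example with `objdump -d`) and looking for instructions on the q registers is a more direct test.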

Grégoire

G2 <gregoire@gentil.com> writes:

> It's an interesting point but my initial question was more "is there
> anything more to do in software to improve the situation on
> Beagleboard-XM?". Reading between the lines, I guess that no. In other
> words, there is no crazy GCC optimization in Angstrom/AIOS OE-based OS
> that could be ported to Android toolchain to get 128 bit in SIMD and
> then improve the number? Or it's already at 128 bit - then how to know
> and check?

To improve performance of specific pieces of code, you hand-optimise
them in assembler. If you don't want to do it yourself, you hire me
to do it.

Mans, good solution, but you do not scale well :)

Philip