Dhrystone 2.1

Hi,

I've compiled the Dhrystone 2.1 benchmark for the BeagleBoard and run
it, using CodeSourcery G++ Lite 2007q3-51 and the kernel from
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6.git.

The result I get is this:

Dhrystone Benchmark, Version 2.1 (Language: C)

Program compiled without 'register' attribute

Please give the number of runs through the benchmark: 100000000

Execution starts, 100000000 runs through Dhrystone

Execution ends

Final values of the variables used in the benchmark:

Int_Glob: 5
        should be: 5
Bool_Glob: 1
        should be: 1
Ch_1_Glob: A
        should be: A
Ch_2_Glob: B
        should be: B
Arr_1_Glob[8]: 7
        should be: 7
Arr_2_Glob[8][7]: 100000010
        should be: Number_Of_Runs + 10
Ptr_Glob->
  Ptr_Comp: 90120
        should be: (implementation-dependent)
  Discr: 0
        should be: 0
  Enum_Comp: 2
        should be: 2
  Int_Comp: 17
        should be: 17
  Str_Comp: DHRYSTONE PROGRAM, SOME STRING
        should be: DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp: 90120
        should be: (implementation-dependent), same as above
  Discr: 0
        should be: 0
  Enum_Comp: 1
        should be: 1
  Int_Comp: 18
        should be: 18
  Str_Comp: DHRYSTONE PROGRAM, SOME STRING
        should be: DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: 5
        should be: 5
Int_2_Loc: 13
        should be: 13
Int_3_Loc: 7
        should be: 7
Enum_Loc: 1
        should be: 1
Str_1_Loc: DHRYSTONE PROGRAM, 1'ST STRING
        should be: DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: DHRYSTONE PROGRAM, 2'ND STRING
        should be: DHRYSTONE PROGRAM, 2'ND STRING

Microseconds for one run through Dhrystone: 1.0
Dhrystones per Second: 1022181.3

That's raw Dhrystones per second, so I divide by 1751 to get the
Dhrystone VAX MIPS (as per http://www.arm.com/support/faqdev/4160.html).
This gives 584 DMIPS, which isn't anywhere close to the 1200 DMIPS
often quoted for the Cortex-A8.
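
The conversion above is simple enough to sketch in a few lines of Python. Note that the reference score usually quoted for the VAX 11/780 is 1757 Dhrystones per second, whereas the post divides by 1751; the sketch shows both. The 500 MHz clock is an assumption (it has not been verified at this point in the thread), and the function names are made up for illustration.

```python
# Sketch: convert a raw Dhrystone score into DMIPS and DMIPS/MHz.
# 1757 Dhrystones/s is the commonly cited VAX 11/780 score (= 1 DMIPS).
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dhrystones_per_sec, divisor=VAX_11_780_DHRYSTONES_PER_SEC):
    """Dhrystone VAX MIPS: raw score relative to the VAX 11/780."""
    return dhrystones_per_sec / divisor


def dmips_per_mhz(dhrystones_per_sec, clock_mhz,
                  divisor=VAX_11_780_DHRYSTONES_PER_SEC):
    """Normalise the DMIPS figure by the core clock in MHz."""
    return dmips(dhrystones_per_sec, divisor) / clock_mhz


if __name__ == "__main__":
    score = 1022181.3          # "Dhrystones per Second" from the run above
    assumed_clock_mhz = 500.0  # assumption: the clock is not yet verified
    print(f"DMIPS (divisor 1757): {dmips(score):.1f}")
    print(f"DMIPS (divisor 1751): {dmips(score, 1751):.1f}")
    print(f"DMIPS/MHz at {assumed_clock_mhz:.0f} MHz: "
          f"{dmips_per_mhz(score, assumed_clock_mhz):.2f}")
```

Either divisor lands in the low 580s, so the choice of constant does not explain the gap to the published figure.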

How can I verify the Beagle is being clocked at ~500MHz, and is 584
really the best the Beagle can do?
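
One possible way to check the clock from user space, as a rough sketch only: read the cpufreq sysfs entry if the kernel exposes one, and fall back to the BogoMIPS line in /proc/cpuinfo. Both paths are assumptions about the kernel configuration; cpufreq may well be absent on this kernel, and BogoMIPS is only a coarse indicator, not a clock measurement.

```python
import re
from pathlib import Path

# Assumed locations; both depend on how the kernel was configured.
CPUFREQ_CUR = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")
CPUINFO = Path("/proc/cpuinfo")


def parse_bogomips(cpuinfo_text):
    """Return the first BogoMIPS value found in /proc/cpuinfo text, or None."""
    match = re.search(r"^bogomips\s*:\s*([0-9.]+)", cpuinfo_text,
                      re.IGNORECASE | re.MULTILINE)
    return float(match.group(1)) if match else None


def current_clock_mhz(path=CPUFREQ_CUR):
    """Return the cpufreq-reported clock in MHz, or None if unavailable."""
    try:
        return int(path.read_text().strip()) / 1000.0  # sysfs value is in kHz
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mhz = current_clock_mhz()
    print("cpufreq clock:",
          f"{mhz:.0f} MHz" if mhz is not None else "not available")
    try:
        print("BogoMIPS:", parse_bogomips(CPUINFO.read_text()))
    except OSError:
        print("BogoMIPS: /proc/cpuinfo not readable")
```

If neither source is available, the boot log or the bootloader's clock configuration would have to be consulted instead.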

Regards,

Mike

How did you compile it?

BTW, the 1200 DMIPS is for Dhrystone compiled with armcc which is
sometimes better than gcc, and very well tuned for Dhrystone (among
other things).

Laurent

On Tue, Mar 10, 2009 at 9:25 PM, Mike McTernan
> So this gives 584 DMIPs, which isn't anywhere close to the 1200 DMIPS
> often specified for the A8.

> How did you compile it?

arm-none-linux-gnueabi-gcc -DNO_PROTOTYPES=1 -O2 -c -o dhry_1.o dhry_1.c
arm-none-linux-gnueabi-gcc -DNO_PROTOTYPES=1 -O2 -c -o dhry_2.o dhry_2.c
arm-none-linux-gnueabi-gcc -o dhrystone dhry_1.o dhry_2.o
arm-none-linux-gnueabi-strip -x dhrystone

$ file dhrystone
dhrystone: ELF 32-bit LSB executable, ARM, version 1 (SYSV),
dynamically linked (uses shared libs), for GNU/Linux 2.6.14, not
stripped

> BTW, the 1200 DMIPS is for Dhrystone compiled with armcc which is
> sometimes better than gcc, and very well tuned for Dhrystone (among
> other things).

600 DMIPS is a massive difference.

Mike

Mike McTernan <Michael.McTernan@gmail.com> writes:

> arm-none-linux-gnueabi-gcc -DNO_PROTOTYPES=1 -O2 -c -o dhry_1.o dhry_1.c
> arm-none-linux-gnueabi-gcc -DNO_PROTOTYPES=1 -O2 -c -o dhry_2.o dhry_2.c
> arm-none-linux-gnueabi-gcc -o dhrystone dhry_1.o dhry_2.o
> arm-none-linux-gnueabi-strip -x dhrystone

Does that really compile? For dhry 2.1 I have to use -DTIME.
Also you should use -mcpu=cortex-a8.

> $ file dhrystone
> dhrystone: ELF 32-bit LSB executable, ARM, version 1 (SYSV),
> dynamically linked (uses shared libs), for GNU/Linux 2.6.14, not
> stripped

> BTW, the 1200 DMIPS is for Dhrystone compiled with armcc which is
> sometimes better than gcc, and very well tuned for Dhrystone (among
> other things).

> 600 DMIPS is a massive difference.

Given that dhrystone is really a dumb benchmark, it spends a
non-negligible amount of time in the C library, so the C library should
have a very good implementation of strcmp. What libc are you using?

And yes, over-tuning of compilers can give huge speedups for
specific benchmarks. OTOH, this can bring some speedup for
other programs.

Laurent

> arm-none-linux-gnueabi-gcc -DNO_PROTOTYPES=1 -O2 -c -o dhry_2.o
> dhry_2.c
> arm-none-linux-gnueabi-gcc -o dhrystone dhry_1.o dhry_2.o
> arm-none-linux-gnueabi-strip -x dhrystone

> Does that really compile? For dhry 2.1 I have to use -DTIME.
> Also you should use -mcpu=cortex-a8.

Yes, it really compiles! TIMES is defined in dhry.h -
apologies for not mentioning that.

Good point on -mcpu. I've retried with that but got no significant
improvement in performance.

> 600 DMIPS is a massive difference.

> Given that dhrystone is really a dumb benchmark, it spends a
> non-negligible amount of time in the C library, so the C library should
> have a very good implementation of strcmp.

Agreed that it's dumb. And I wouldn't be concerned if it were out by
<100 DMIPS for example, but I kinda expect to be able to get close to the
published 'performance' metric provided by the benchmark.

For example, on the Blackfin they have benchmarked it and published
figures:

http://docs.blackfin.uclinux.org/doku.php?id=uclinux-dist:dhrystone

Using boards and the same set-up here I can almost reproduce those
*exact* figures.

> What libc are you using?

The CodeSourcery one, GLIBC. Not sure on the exact version as they
don't seem to list it:
http://www.codesourcery.com/sgpp/lite/arm/portal/release313?@template=datasheet

What else can I check?

Regards,

Mike

> Agreed that it's dumb. And I wouldn't be concerned if it were out by
> <100 DMIPS for example, but I kinda expect to be able to get close to the
> published 'performance' metric provided by the benchmark.

Using different compilers? If you want to reproduce those results, buy
RVCT. But beware: since RVCT will be using the host C library,
you might be disappointed.

> For example, on the Blackfin they have benchmarked it and published
> figures:
>
> http://docs.blackfin.uclinux.org/doku.php?id=uclinux-dist:dhrystone
>
> Using boards and the same set-up here I can almost reproduce those
> *exact* figures.

Well, given that the published Blackfin score uses gcc, there's no doubt
you can reproduce it :-)

I don't understand why people are obsessed by dhrystone. It doesn't
measure anything useful.

Laurent

Laurent Desnogues <laurent.desnogues@gmail.com> writes:

dhrystone is mostly a test of some library functions (if I recall
correctly mostly str* and mem* functions).

Way back in the old days I had Turbo C for the Atari ST. This had the
best dhrystone scores for atari.
Reason: not because they produced better code.
They just analysed the benchmarks and made handcrafted versions of the
library functions involved, including loop unrolling etc.
Needless to say that if you used other functions the difference
between Turbo C and other compilers was a lot smaller.

Lesson: there are lies, damned lies and benchmarks.

Despite not measuring anything of use, it does have its uses. For
example, I want to see how much more powerful the OMAP is than a
Blackfin BF537. Since I can run Dhrystone with similar compilers and
libraries on both platforms, it's a quick way to get a comparison
without porting the actual app (which is quite big and has a lot of
dependencies).

I agree that the actual number doesn't represent much, but having done
the experiment on both platforms my conclusion is that the OMAP is
only about twice as fast as the Blackfin.

Marketing would have you believe something quite different...

Regards,

Mike

Mike McTernan <Michael.McTernan@gmail.com> writes:

> I don't understand why people are obsessed by dhrystone. It doesn't
> measure anything useful.

> Despite not measuring anything of use, it does have its uses. For
> example, I want to see how much more powerful the OMAP is than a
> Blackfin BF537. Since I can run Dhrystone with similar compilers and
> libraries on both platforms, it's a quick way to get a comparison
> without porting the actual app (which is quite big and has a lot of
> dependencies).

Chances are dhrystone is not at all representative of your actual
workload. A much better approach is to extract a few heavy functions
from your actual application and test those on various platforms. For
instance, if your app uses FFTs heavily, test *your* FFT function on
each platform of interest, and also check for availability of
specially tuned FFT implementations.
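
The approach described above (pull out a hot function and time it in isolation on each platform) can be sketched as a small harness. This is an illustration of the structure only: for a real comparison the extracted function would be C, built with each platform's own toolchain, and `control_heavy_kernel` is a hypothetical stand-in, not code from the application discussed in this thread.

```python
import time


def best_time_per_call(func, *args, calls=1000, repeats=5):
    """Time `calls` invocations of func, `repeats` times over, and return
    the best per-call time in seconds (the minimum is the run least
    disturbed by other activity on the system)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(calls):
            func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / calls)
    return best


def control_heavy_kernel(data):
    """Hypothetical stand-in for a function extracted from the real app:
    branchy integer code rather than DSP-style arithmetic."""
    state = 0
    for value in data:
        if value & 1:
            state = (state * 31 + value) & 0xFFFF
        elif value % 3 == 0:
            state ^= value
        else:
            state = (state + 7) & 0xFFFF
    return state


if __name__ == "__main__":
    sample = list(range(1000))
    seconds = best_time_per_call(control_heavy_kernel, sample, calls=200)
    print(f"best time per call: {seconds * 1e6:.1f} us")
```

Running the same harness with the same input on both boards gives a ratio that reflects the application's own code rather than the library routines Dhrystone leans on.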

> I agree that the actual number doesn't represent much, but having done
> the experiment on both platforms my conclusion is that the OMAP is
> only about twice as fast as the Blackfin.

The Blackfin is a very different architecture. Comparing the speed of
C code unrelated to your real problem is mostly meaningless.

> Marketing would have you believe something quite different...

That's their job.

> Chances are dhrystone is not at all representative of your actual
> workload.

Of course it isn't! Dhrystone is a toy.

> A much better approach is to extract a few heavy functions
> from your actual application and test those on various platforms. For
> instance, if your app uses FFTs heavily, test *your* FFT function on
> each platform of interest, and also check for availability of
> specially tuned FFT implementations.

The app is just C code, mainly control code, no FFT, no DSP functions,
actually a bad fit for a Blackfin. While Dhrystone doesn't represent
much, I don't think it's too bad in this case.

I'll look for some arbitrary bit of C code with which to compare
performance, but the Dhrystone result makes me think I'll see only a
~2x speed up from Blackfin to OMAP - quite disappointing.

> Marketing would have you believe something quite different...

> That's their job.

Indeed. ARM/TI's marketing department must be doing a better job than
ADI's in that respect ;-)

Regards,

Mike

Mike McTernan wrote:

> Chances are dhrystone is not at all representative of your actual
> workload.

> Of course it isn't! Dhrystone is a toy.

> A much better approach is to extract a few heavy functions
> from your actual application and test those on various platforms. For
> instance, if your app uses FFTs heavily, test *your* FFT function on
> each platform of interest, and also check for availability of
> specially tuned FFT implementations.

> The app is just C code, mainly control code, no FFT, no DSP functions,
> actually a bad fit for a Blackfin. While Dhrystone doesn't represent
> much, I don't think it's too bad in this case.

> I'll look for some arbitrary bit of C code with which to compare
> performance, but the Dhrystone result makes me think I'll see only a
> ~2x speed up from Blackfin to OMAP - quite disappointing.

But speed is not the only thing that matters: power consumption,
peripherals, memory interfaces, etc. count too. IMHO, the BF537 and the
OMAP35xx target different application areas. At the very least they have
different programming models.

> Marketing would have you believe something quite different...

> That's their job.

> Indeed. ARM/TI's marketing department must be doing a better job than
> ADI's in that respect ;-)

They just show us one side of the coin :-)

Best regards,
Caglar