How to optimize compiler flags to speed up execution of program

Hi,

-O3 -mtune=cortex-a8 -march=armv7-a -ftree-vectorize -funroll-loops
Could anybody suggest how I can set the above compiler flags to
speed up code execution?

I am using OpenEmbedded to cross-compile the application. I set
the compiler flags in the recipe, before the C program is compiled,
like below:

DESCRIPTION = "sample program"
CFLAGS += -O3 -mtune=cortex-a8 -march=armv7-a -ftree-vectorize -mfpu=neon
DEPENDS = "my-opencv"
SRC_URI = " file://detect.c "
do_compile () {
        ${CXX} ${CFLAGS} ${LDFLAGS} -ggdb `pkg-config --cflags opencv` \
                -o bgface $i `pkg-config --libs opencv`;
}

After compiling, when I run the application on the BeagleBoard there is
no speed improvement. Please tell me if I am doing something wrong.
How can I check on the BeagleBoard whether the flags were actually applied?
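
One way to check this from inside the program itself is to report the macros GCC predefines for the options it was given. The sketch below assumes GCC (or a compatible compiler); the function name build_flags_summary is made up for illustration. It only tells you how this one source file was compiled, not how OpenCV or other libraries were built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes a one-line summary of how this file was compiled into buf
   and returns buf. Relies on macros that GCC predefines:
   __OPTIMIZE__ (any -O level above -O0), __ARM_NEON__ (NEON enabled)
   and __ARM_ARCH_7A__ (-march=armv7-a). */
const char *build_flags_summary(char *buf, size_t len)
{
    const char *opt = "no";
    const char *neon = "no";
    const char *v7a = "no";

    assert(buf != NULL && len > 0);
#ifdef __OPTIMIZE__
    opt = "yes";
#endif
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    neon = "yes";
#endif
#ifdef __ARM_ARCH_7A__
    v7a = "yes";
#endif
    snprintf(buf, len, "optimize=%s neon=%s armv7-a=%s", opt, neon, v7a);
    return buf;
}
```

Calling it once at startup, for example puts(build_flags_summary(buf, sizeof buf)) with a 64-byte buf, prints the summary on the board. If every field says "no", the CFLAGS from the recipe most likely never reached the compile command.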

Going by the thread below, do I need to build a kernel module to set the
compiler flags?

http://groups.google.com/group/beagleboard/browse_thread/thread/6c465086e11ee24/7fc64a3343fa3629?lnk=gst&q=neon+set+compiler+flags+&pli=1

Thank you,
srikanth

sri <srikanthparupati@gmail.com> writes:

Hi,

-O3 -mtune=cortex-a8 -march=armv7-a -ftree-vectorize -funroll-loops
Could anybody suggest how I can set the above compiler flags to
speed up code execution?

-O3 already enables -ftree-vectorize. Unfortunately, the vectorizer
does more harm than good, so adding -fno-tree-vectorize is generally
advisable.
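
Whether the vectorizer helps or hurts can be measured on your own hot loops instead of being assumed. A minimal sketch, assuming a simple byte-wise loop of the kind image code tends to contain (add_u8 is a hypothetical example, not taken from detect.c): build it once with -O3 and once with -O3 -fno-tree-vectorize, then compare timings on the board. Depending on the GCC version, -ftree-vectorizer-verbose=1 (older releases) or -fopt-info-vec (newer ones) reports which loops were vectorized. Note also that the GCC manual says floating-point loops are not auto-vectorized for NEON unless -funsafe-math-optimizations is given, so an integer loop is used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise add of two byte buffers; results wrap modulo 256.
   'restrict' promises the buffers do not overlap, which lets the
   vectorizer skip runtime overlap checks. */
void add_u8(uint8_t *restrict dst, const uint8_t *restrict a,
            const uint8_t *restrict b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```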

Has anyone tested the thumb instruction set?

-mthumb -mthumb-interwork

Does it really speed up the code if it only uses short integers?

Thumb2 has nothing to do with short integers. It's just another
instruction set which is similar to ARM instruction set, but with
instruction sizes that can be 2- or 4-byte long, instead of always
4-byte long. It operates with the same registers as ARM mode.

Laurent

"Sebastian Kruber" <Sebastian.Kruber@hedo.de> writes:

Has anyone tested the thumb instruction set?

Yes.

-mthumb -mthumb-interwork

Does it really speed up the code if it only uses short integers?

I think you misunderstand what Thumb is. Thumb instructions operate
on 32-bit numbers just like ARM instructions. The difference is that
most _instructions_ are 16 bits in size, half the size of an ARM
instruction, at the expense of not being as versatile. Thumb2 code
compiled with ARM's compiler is, compared to ARM code, almost as fast
and a bit smaller. The original Thumb instruction set available in
ARMv4-6 is almost always slower than the corresponding ARM
instructions. It can save some space in many cases, which is why it
might be used in non-speed-critical functions.
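
To see which instruction set a given build actually targets, the macros GCC predefines can be checked at compile time. This is a rough sketch that assumes GCC's __thumb2__, __thumb__ and __arm__ macros; the function names are invented for illustration. The second function makes the point above concrete: the width of int is the same whether or not -mthumb is used.

```c
#include <limits.h>

/* Names the instruction set this file was compiled for, based on
   macros predefined by GCC for 32-bit ARM targets. */
const char *instruction_set(void)
{
#if defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";
#else
    return "not a 32-bit ARM target";
#endif
}

/* Width of int in bits; -mthumb changes instruction encoding,
   not the size of the data types. */
int int_width_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```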

Current gcc versions produce Thumb2 code that crashes on the A8
revision used in current Beagle boards. This is caused by a bug in
the CPU that needs a compiler workaround.

Could anyone please suggest how to set the compiler flags?

Thank you,
srikanth parupati

sri wrote:

Could anyone please suggest how to set the compiler flags?

how 'bout profiling the app and finding out what is "slow", then optimizing that?
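
A profiler such as gprof (build with -pg) is the usual route. As a cruder first step, individual functions can be timed by hand. The sketch below uses only the standard clock() call, so it measures CPU time rather than wall-clock time; cpu_ms_per_call and sample_work are hypothetical names, and sample_work merely stands in for whatever function you suspect is slow.

```c
#include <assert.h>
#include <time.h>

typedef void (*work_fn)(void *ctx);

/* Average CPU time per call in milliseconds, measured with clock(). */
double cpu_ms_per_call(work_fn fn, void *ctx, int reps)
{
    assert(fn != NULL && reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; ++i)
        fn(ctx);
    clock_t stop = clock();
    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC / reps;
}

/* Hypothetical stand-in for a hot function: adds 1..10000 to *ctx.
   The volatile access keeps the compiler from deleting the loop. */
void sample_work(void *ctx)
{
    volatile unsigned long *acc = ctx;
    for (unsigned long i = 1; i <= 10000UL; ++i)
        *acc += i;
}
```

Timing the same function in builds made with different flag sets shows whether a flag changes anything where it matters. Use enough repetitions that the total run is well above the clock() resolution.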

Hi Vladimir,