compiler and ffmpeg question

Hi All,
I use gentoo(-pandora) linux on the beagleboard, with native gcc
4.3.3-r2 on it. It's the same version that OE on my desktop uses for
Angstrom builds and the set of patches applied when building this
compiler is almost the same in gentoo and OE. There are a few
differences, but they didn't look like arm/neon related to me.

The compiler works fine in most cases, I could build a gnome desktop,
firefox, pidgin with it, but when it comes to neon I'm not sure it
works right. For example pixman 0.16.2 compiled, saying all arm
optimizations are enabled, but the xserver crashed as soon as I
started a GTK app (xterm was ok). With a really old pixman (0.10.x, no
neon at all), everything was ok.

I also tried ffmpeg (svn), passing --cpu=cortex-a8 to the configure
script, but the test programs failed and neither neon nor vfp was enabled.

So my questions are:
- Is there an easy (fool-proof) way to figure out if the compiler is
broken? I mean some test: "compile this, assembly list should look
like this, and program output should be this" Unfortunately I'm too
new to arm and neon, and can't determine these on sight.
- Do I need some magic to build ffmpeg with neon? I was wondering if
I'm simply missing some ./configure options or env vars. I know this is not
the ffmpeg list, but I know some experts are here. :)

Almost forgot: the binutils package is different from OE, I use 2.19.1
with some patches, I haven't reviewed them yet.

Regards,
Gyorgy

I had ffmpeg test program results like this:

check_asm neon "vadd.i16 q0, q0, q0"
check_as
BEGIN /tmp/ffconf.rfSJ7sky.c
    1 void foo(void){ __asm__ volatile("vadd.i16 q0, q0, q0"); }
END /tmp/ffconf.rfSJ7sky.c
gcc -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -mcpu=cortex-a8 -c -o /tmp/ffconf.k0U9yBVk.o
/tmp/ffconf.rfSJ7sky.c
/tmp/ccvfAIyD.s: Assembler messages:
/tmp/ccvfAIyD.s:24: Error: bad instruction `vadd.i16 q0,q0,q0'

check_asm armvfp "fadds s0, s0, s0"
check_as
BEGIN /tmp/ffconf.rfSJ7sky.c
    1 void foo(void){ __asm__ volatile("fadds s0, s0, s0"); }
END /tmp/ffconf.rfSJ7sky.c
gcc -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -mcpu=cortex-a8 -c -o /tmp/ffconf.k0U9yBVk.o
/tmp/ffconf.rfSJ7sky.c
/tmp/ccvw5GSe.s: Assembler messages:
/tmp/ccvw5GSe.s:24: Error: selected processor does not support `fadds s0,s0,s0'

check_cc
BEGIN /tmp/ffconf.rfSJ7sky.c
    1 __asm__ (".eabi_attribute 28, 1");
    2 int main(void) { return 0; }
END /tmp/ffconf.rfSJ7sky.c
gcc -D_ISOC99_SOURCE -D_POSIX_C_SOURCE=200112 -D_FILE_OFFSET_BITS=64
-D_LARGEFILE_SOURCE -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -c
-o /tmp/ffconf.k0U9yBVk.o /tmp/ffconf.rfSJ7sky.c
gcc -o /tmp/ffconf.SX32O7xj /tmp/ffconf.k0U9yBVk.o
/usr/lib/gcc/armv7a-softfloat-linux-gnueabi/4.3.3/../../../../armv7a-softfloat-linux-gnueabi/bin/ld:
ERROR: /tmp/ffconf.k0U9yBVk.o uses VFP register arguments,
/tmp/ffconf.SX32O7xj does not
/usr/lib/gcc/armv7a-softfloat-linux-gnueabi/4.3.3/../../../../armv7a-softfloat-linux-gnueabi/bin/ld:
failed to merge target specific data of file /tmp/ffconf.k0U9yBVk.o
collect2: ld returned 1 exit status

env:
CFLAGS="-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
-fomit-frame-pointer -Os -pipe"
CXXFLAGS <same>
CHOST="armv7a-softfloat-linux-gnueabi"
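
The failing checks in the log above can be reproduced by hand, which also gives a rough answer to the "is the compiler broken" question. What follows is only a minimal sketch: it assumes a POSIX-like shell with mktemp, and that $CC (default gcc) is the native compiler. The check_asm function here is a hypothetical helper modelled on the log output, not ffmpeg's own implementation.

```shell
#!/bin/sh
# Hypothetical helper: wrap one instruction in inline asm, as in the
# configure log, and try to compile it with the given flags.
# Usage: check_asm "<instruction>" [compiler flags...]
check_asm() {
    insn=$1; shift
    tmpdir=$(mktemp -d) || return 1
    printf 'void foo(void){ __asm__ volatile("%s"); }\n' "$insn" \
        > "$tmpdir/t.c"
    "${CC:-gcc}" "$@" -c -o "$tmpdir/t.o" "$tmpdir/t.c" 2> "$tmpdir/err"
    status=$?
    rm -rf "$tmpdir"
    return $status
}

# Flags taken from the CFLAGS quoted in the env section.
FPU_FLAGS="-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp"

for insn in "vadd.i16 q0, q0, q0" "fadds s0, s0, s0"; do
    if check_asm "$insn" $FPU_FLAGS; then
        echo "ok:   $insn"
    else
        echo "FAIL: $insn"
    fi
done
```

Running the loop a second time with only -mcpu=cortex-a8 in FPU_FLAGS mirrors what the configure script did in the log, so the two runs can be compared directly.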

If you have a reproducible testcase for the pixman 0.16.2 problems, it is better
to report a bug (explaining what kind of GTK application was causing the crash
and what you did to trigger it).

Pixman has some regression tests included and you may want to try them first
(at least try to run 'test/blitters-test' program after compiling pixman).

As for potential sources of instability: make sure that your compiler does
not try autovectorization. It has been broken for a long time on all platforms
(to varying extents), but recently gcc started to use it by default at the -O3
optimization level. It may be a good choice in the long run (bugs will be
found and hopefully fixed), but everyone trying -O3 optimizations is used as a
guinea pig at the moment, unless autovectorization is explicitly
disabled with the -fno-tree-vectorize option.
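
One way to see whether a given set of flags turns the vectorizer on is to ask gcc itself. This is a rough sketch, assuming GCC 4.3 or later (where -Q --help=optimizers is available) and assuming the two-column "option, state" layout that this option prints; parse_vectorize and vectorize_state are hypothetical helper names.

```shell
#!/bin/sh
# Pick the state column for -ftree-vectorize out of
# "gcc -Q --help=optimizers" output read from stdin.
parse_vectorize() {
    awk '$1 == "-ftree-vectorize" { print $2 }'
}

# Report the vectorizer state for the flags given as arguments.
vectorize_state() {
    "${CC:-gcc}" "$@" -Q --help=optimizers 2>/dev/null | parse_vectorize
}

echo "-O3:                     $(vectorize_state -O3)"
echo "-O3 -fno-tree-vectorize: $(vectorize_state -O3 -fno-tree-vectorize)"
```

If the compiler does not support the query, the state is simply printed as empty.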

Also, your kernel may have problems with properly saving/restoring the NEON/VFP
context in all cases, which may make the system unstable. Having more NEON
optimizations surely increases the chances of triggering these bugs.

Additionally, the ARM Cortex-A8 core has some HW bugs which need to be
worked around (in the bootloader). But the CPU just deadlocks if you don't have
the needed fixes, so it is relatively easy to differentiate from the other
types of problems :)

Using thumb2 is a potential source of problems too, but hopefully you are not
touching it yet.

Hi,
I'll run the pixman tests and report the results to the developers.
BTW, I'm using a recent bootloader and kernel (2.6.29-r46) from OE,
following the Angstrom build steps, so the issue is probably not
related to these.

Regards,
Gyorgy

Hi,
Issues kind of solved:

- pixman blitter test runs fine and as far as I can tell other tests
too. I've built pixman-0.16.2 with no optimizations, arm simd, and arm
neon. All 3 gave the result:
crc32=06D8EDB6
blitters test passed
What I don't understand is why the whole xserver crashes when I
start something that uses pixman. Did the ABI change? Do I need to
rebuild cairo or another lib? I'll investigate this further,
tips/ideas where to look are welcome.

- the ffmpeg VFP and NEON tests fail because no matter what --cpu
--arch and CFLAGS I set when calling ./configure the magic options
(-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp) don't
get passed to the compiler when doing the test programs. I have to set
ASFLAGS for the test programs to compile which is a bit strange to me,
cause -m* is not intended for the assembler.
Is this a bug or a feature?

Regards,
Gyorgy

pixman-0.16 has different semantics for source image clipping and may
be incompatible with some versions of xserver. But this isn't ARM/NEON
specific, so disabling NEON optimizations would not have any effect if this
was the source of your problem.

Gyorgy Szekely <hoditohod@gmail.com> writes:

- the ffmpeg VFP and NEON tests fail because no matter what --cpu
--arch and CFLAGS I set when calling ./configure the magic options
(-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp) don't
get passed to the compiler when doing the test programs. I have to set
ASFLAGS for the test programs to compile which is a bit strange to me,
cause -m* is not intended for the assembler.
Is this a bug or a feature?

It's a feature.

For those who were not present on the #ffmpeg-devel today :), but
would like a more detailed explanation:
Not using CFLAGS in this case is the intended behaviour. The
./configure script decides what cpu flags to use, and for a well
behaved compiler passing --cpu=cortex-a8 is enough. (At least, this is
what I see in OE.)
In my case the solution was: --extra-cflags='-mfpu=neon -mfloat-abi=softfp'
With this, VFP and NEON are correctly detected.
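
Put together, the configure step can be sketched as below. The ./configure options are the ones given above. The check afterwards is an assumption: it supposes that configure records detected features in config.h as "#define HAVE_<NAME> 1" (for example HAVE_NEON and HAVE_ARMVFP), and feature_enabled is a hypothetical helper.

```shell
#!/bin/sh
# Configure line from the message above; only meaningful inside an
# ffmpeg source tree on the target.
configure_ffmpeg() {
    ./configure --cpu=cortex-a8 \
        --extra-cflags='-mfpu=neon -mfloat-abi=softfp'
}

# Assumption: enabled features appear as "#define HAVE_<NAME> 1".
# $1 = feature macro name, $2 = path to config.h
feature_enabled() {
    grep -Eq "^#define[[:space:]]+$1[[:space:]]+1\$" "$2"
}

# Does nothing unless run from a directory that has ./configure.
if [ -x ./configure ] && configure_ffmpeg && [ -f config.h ]; then
    for f in HAVE_ARMVFP HAVE_NEON; do
        if feature_enabled "$f" config.h; then
            echo "$f: yes"
        else
            echo "$f: no"
        fi
    done
fi
```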

Regards,
Gyorgy

Gyorgy Szekely <hoditohod@gmail.com> writes:

Hi,
I'm slowly, very slowly, but progressing. I managed to type make after
last week's ./configur(e)ation. And this is what I got. (still ffmpeg)

armv7a-softfloat-linux-gnueabi-gcc -DHAVE_AV_CONFIG_H -I.
-I"/var/tmp/portage/media-video/ffmpeg-0.5_p20373/work/ffmpeg-0.5_p20373"
-mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -c -o
libavcodec/arm/dsputil_vfp.o libavcodec/arm/dsputil_neon.S

libavcodec/arm/dsputil_neon.S: Assembler messages:
libavcodec/arm/dsputil_neon.S:255: Error: bad instruction `vld1.64
{d0,d1},[r1],r2'
(and so on... with every line of source and every neon instruction used)

It works if the assembly is inlined in a *.c file (as the ./configure
script tests the feature), but fails for *.S. It looks as if the
compiler didn't pass the right options to the assembler in the case of
.S files. I don't know why, or how I can pass these options...

So, my compiler is junk?
I'll try to figure out how OE builds the native toolchain, and learn
from that (it doesn't look trivial ;) ). Where do I pass the
default options Mans has been talking about?

Regards,
Gyorgy

Gyorgy Szekely <hoditohod@gmail.com> writes:

Hi,
I'm slowly, very slowly, but progressing. I managed to type make after
last week's ./configur(e)ation. And this is what I got. (still ffmpeg)

armv7a-softfloat-linux-gnueabi-gcc -DHAVE_AV_CONFIG_H -I.
-I"/var/tmp/portage/media-video/ffmpeg-0.5_p20373/work/ffmpeg-0.5_p20373"
-mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -c -o
libavcodec/arm/dsputil_vfp.o libavcodec/arm/dsputil_neon.S

Please post the exact command that's failing.

libavcodec/arm/dsputil_neon.S: Assembler messages:
libavcodec/arm/dsputil_neon.S:255: Error: bad instruction `vld1.64
{d0,d1},[r1],r2'
(and so on... with every line of source and every neon instruction used)

It works if the assembly is inlined in a *.c file (as the ./configure
script tests the feature), but fails for *.S. It looks as if the
compiler didn't pass the right options to the assembler in the case of
.S files. I don't know why, or how I can pass these options...

So, my compiler is junk?

Probably. What compiler are you using?

Måns Rullgård wrote:

Probably. What compiler are you using?
  
Might want to throw in a -v on your gcc command line, so you can see
exactly what it is passing to the assembler and linker. Then you can
tweak that with the -Wa option.

Gcc is driven internally by a "specs file", among other things. That
file, if you can find it, would control in large part what goes between
the compiler frontend and the assembler. The fact that the instructions
work for .c but not .S may suggest a spec file issue as the same
assembler program would be used in both cases.

Gcc's -dumpspecs option will have it show you what the current specs
look like. You can capture that output to a file, modify it, and
then feed it back to the compiler using -specs=<file>. But you'd want
to do that with help from the gcc guys, since the file format is almost
unbearably cryptic and minimally documented. Good luck, and godspeed. :)
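
The three suggestions above can be sketched roughly as follows. test.S is a hypothetical file name standing in for whichever .S file fails, and show_as_cmd is a hypothetical filter; -v, -Wa, -dumpspecs and -specs= are regular gcc driver options. Whether -Wa,-mfpu=neon actually overrides what the driver passes depends on the order in which the options reach the assembler, so treat step 2 as an experiment.

```shell
#!/bin/sh
# Filter "gcc -v" output (read from stdin) down to the assembler
# invocation: the line whose first word is "as" or a path ending in "/as".
show_as_cmd() {
    awk '$1 ~ /(^|\/)as$/'
}

# Hypothetical file name; substitute the .S file that fails to build.
SRC=test.S
FLAGS="-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp"

if [ -f "$SRC" ]; then
    # 1. See what the driver really hands to the assembler.
    "${CC:-gcc}" -v $FLAGS -c "$SRC" -o test.o 2>&1 | show_as_cmd

    # 2. Try to force the FPU choice on the assembler with -Wa.
    "${CC:-gcc}" $FLAGS -Wa,-mfpu=neon -c "$SRC" -o test.o

    # 3. Capture the built-in specs for inspection.
    "${CC:-gcc}" -dumpspecs > gcc.specs
fi
```

An edited copy of gcc.specs could then be tried with -specs=gcc.specs on the same command line.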

b.g.

Hi,
Here are my latest results! :)

Compiler:
- 4.3.3-r2, version and patches are almost exactly the same as OE
Angstrom/beagle gcc4.3.3
- configuration is different from OE, most notably it has the --with-float=soft
- binutils is 2.19.1-r1 (I haven't checked the patches, but version is
different from OE)

The ffmpeg build error mentioned in my last letter can be seen in more
detail here:
- build.log [1] (contains all output from the build)
- config.err [2] (./configure options among other stuff)
I think everything is ok with these.

But more important: I played around with the -v flag Bill mentioned,
and I've seen a few interesting things. I created a .c file with 2
inline assembly instructions (one vfp and one neon), and also an .S
with the same instructions. With my usual set of flags
(-mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp) the first compiles
fine, the second dies with bad instruction. With the -v flag I checked the
compilation process and (to my surprise) the assembler always gets
invoked with the same options:

[...]/bin/as -mcpu=cortex-a8 -mfpu=softvfp -meabi=4 [...]

So these are the options for both the .c and .S. I'm not sure if these
are right... Probably not because the -mfpu option passed to gas is
definitely not what I passed to gcc.

To investigate a bit further I used the -S option to get the assembly
listing of the c file, and found the main difference between my .S
file and the one gcc generates from the c source. The generated
version starts with these:

        .cpu cortex-a8
        .eabi_attribute 27, 3
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 6
        .eabi_attribute 18, 4

I don't know what the eabi_attribute lines mean, but the .cpu .fpu
most probably override the gas commandline options. With these 2 lines
inserted into the right ffmpeg .S files, all the neon and vfp specific
code compiles. (I haven't yet tried to execute it)

So my analysis (as I'm no expert on this, correct me if I'm wrong):
- the .S files don't compile because gcc doesn't pass the right -mfpu
flag to the assembler
- it's probably because of the --with-float=soft option gcc was configured with
- as a workaround inserting the .cpu and .fpu lines (and
eabi_attributes?) can help
Is this right?

The next step will be to rebuild the toolchain without the
--with-float option, and see if that helps, if not I'll use the
workaround.
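
The workaround can be tried in isolation with a small standalone file. This is only a sketch: fpu_test.S and the foo label are hypothetical, the two instructions are the ones from the ffmpeg configure checks, and the assembly step is skipped when the compiler does not target ARM.

```shell
#!/bin/sh
# Write a two-instruction assembly file that selects the CPU and FPU
# itself, so it does not depend on what the driver passes to "as".
cat > fpu_test.S <<'EOF'
        .cpu cortex-a8
        .fpu neon
        .text
        .global foo
foo:
        fadds    s0, s0, s0       @ VFP
        vadd.i16 q0, q0, q0       @ NEON
        bx       lr
EOF

# Assemble only when the compiler actually targets ARM.
case "$("${CC:-gcc}" -dumpmachine 2>/dev/null)" in
    arm*) "${CC:-gcc}" -mcpu=cortex-a8 -c -o fpu_test.o fpu_test.S \
              && echo "assembled fpu_test.S" ;;
    *)    echo "not an ARM compiler, skipping assembly" ;;
esac
```

Removing the .cpu and .fpu lines from fpu_test.S and assembling again should show whether the directives are what makes the difference on a given toolchain.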

Thanks for all your help so far. :)

Regards,
Gyorgy

[1] http://blackmilk.extra.hu/logs/build.log
[2] http://blackmilk.extra.hu/logs/config.err

Gyorgy Szekely <hoditohod@gmail.com> writes:

Hi,
Here are my latest results! :)

Compiler:
- 4.3.3-r2, version and patches are almost exactly the same as OE
Angstrom/beagle gcc4.3.3
- configuration is different from OE, most notably it has the --with-float=soft

Make that --with-float=softfp.

Hi,
I have rebuilt the compiler with the suggested --with-float=softfp, I
also noticed that gentoo applies a "softfloat" patch that hacks in the
specs file, I removed it. With the new configuration ffmpeg compiled
with vfp and neon cleanly, and works fine. I recompiled (most of) the
gnome-light 2.26 desktop without any issue.
So my problem is solved, thanks for all your help.

Regards,
Gyorgy