CodeSourcery G++ Lite doesn't save Neon upper registers

bob.feretich · December 8, 2009, 6:10am

I'm using CodeSourcery G++ Lite 2009q1, version 4.3.3.
I have been examining the assembly output of a program which makes
heavy use of the Neon floating point instructions. The program uses
the Neon intrinsics.

Although G++ uses most of the available register set,
it only seems to save/restore registers d8 to d14.

The code segment below is extracted from the entry point of a function
that is declared as external.

3076 0000 F04F2DE9 stmfd sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr} @,
3077 .save {r4, r5, r6, r7, r8, r9, sl, fp, lr}
3078 .LCFI0:
3079 0004 0E8B2DED fstmfdd sp!, {d8, d9, d10, d11, d12, d13, d14} @,
3080 .vsave {d8, d9, d10, d11, d12, d13, d14}
. . .
4605 07a8 140BC8ED vstr d16, [r8, #80] @, acceleration
4606 07ac 161BC8ED vstr d17, [r8, #88] @, acceleration
4607 .loc 1 675 0
4608 07b0 306B82ED vstr d6, [r2, #192] @, accel_calibration
4609 07b4 327B82ED vstr d7, [r2, #200] @, accel_calibration

As you can see, the compiler is using registers both above and below
the ones that it decided to save.

I have seen some reports of registers being corrupted after neon code
is executed. I suspect that this is the reason.

Is anyone aware of a fix or workaround for this apparent bug?
Regards,
Bob

mansr · December 8, 2009, 3:48pm

bob.feretich@prodigy.net writes:

> I'm using CodeSourcery G++ Lite 2009q1, version 4.3.3.
> I have been examining the assembly output of a program which makes
> heavy use of the Neon floating point instructions. The program uses
> the Neon intrinsics.
>
> Although G++ uses most of the available register set,
> it only seems to save/restore registers d8 to d14.

The ARM ABI specifies registers D8-D15 as callee-saved. All others
are caller-saved. The compiler is working correctly (this time).

Siarhei_Siamashka · December 8, 2009, 4:10pm

> I'm using CodeSourcery G++ Lite 2009q1, version 4.3.3.
> I have been examining the assembly output of a program which makes
> heavy use of the Neon floating point instructions. The program uses
> the Neon intrinsics.
>
> Although G++ uses most of the available register set,
> it only seems to save/restore registers d8 to d14.
>
> The code segment below is extracted from the entry point of a function
> that is declared as external.
>
> 3076 0000 F04F2DE9 stmfd sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr} @,
> 3077 .save {r4, r5, r6, r7, r8, r9, sl, fp, lr}
> 3078 .LCFI0:
> 3079 0004 0E8B2DED fstmfdd sp!, {d8, d9, d10, d11, d12, d13, d14} @,
> 3080 .vsave {d8, d9, d10, d11, d12, d13, d14}
> . . .
> 4605 07a8 140BC8ED vstr d16, [r8, #80] @, acceleration
> 4606 07ac 161BC8ED vstr d17, [r8, #88] @, acceleration
> 4607 .loc 1 675 0
> 4608 07b0 306B82ED vstr d6, [r2, #192] @, accel_calibration
> 4609 07b4 327B82ED vstr d7, [r2, #200] @, accel_calibration
>
> As you can see, the compiler is using registers both above and below
> the ones that it decided to save.

As Måns replied earlier, there seems to be nothing wrong in this particular
case.

> I have seen some reports of registers being corrupted after neon code
> is executed. I suspect that this is the reason.

Maybe these reports originate from people who suffered from this bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42321

> Is anyone aware of a fix or workaround for this apparent bug?

If using inline assembly, don't specify 'q' registers in the clobber list (use
the equivalent 'd' register pairs instead). Also, if using registers from the
d8-d15 range in an inline assembly clobber list, select a 'contiguous' set of
them and don't leave 'gaps'.
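
A minimal sketch of that workaround, assuming GCC-style extended inline
assembly on a 32-bit ARM target with NEON enabled. The function name
add4_i32 and the plain C fallback are hypothetical, added only so the
example also builds on other hosts. The code uses q4 and q5, but the
clobber list names them as the 'd' register pairs d8/d9 and d10/d11,
which together form a contiguous range:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for four 32-bit integers. */
static void add4_i32(int32_t *dst, const int32_t *a, const int32_t *b)
{
    assert(dst && a && b);
#if defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
    __asm__ volatile(
        "vld1.32  {d8, d9}, [%1]    \n\t"   /* q4 = a[0..3] */
        "vld1.32  {d10, d11}, [%2]  \n\t"   /* q5 = b[0..3] */
        "vadd.i32 q4, q4, q5        \n\t"   /* q4 = q4 + q5 */
        "vst1.32  {d8, d9}, [%0]    \n\t"   /* dst[0..3] = q4 */
        :
        : "r"(dst), "r"(a), "r"(b)
        /* q4/q5 are listed as 'd' pairs; d8-d11 is contiguous, no gaps. */
        : "d8", "d9", "d10", "d11", "memory");
#else
    /* Portable fallback so the sketch also builds on non-ARM hosts. */
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
#endif
}
```

Since d8-d11 are callee-saved, listing them as clobbers should make the
compiler save and restore them in the enclosing function's prologue and
epilogue.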

bob.feretich · December 8, 2009, 7:20pm

The Neon intrinsics are not the same as inline assembly.
They are written as C function calls operating on C variables, and
the compiler does the register assignment. If the compiler
chooses to use a register, shouldn't the compiler save it first? The
calling routine could also be using that register for a different
purpose.

Regards,
Bob

mansr · December 8, 2009, 7:25pm

bob.feretich@prodigy.net writes:

> The Neon intrinsics are not the same as inline assembly.
> They are written as C function calls operating on C variables, and
> the compiler does the register assignment. If the compiler
> chooses to use a register, shouldn't the compiler save it first? The
> calling routine could also be using that register for a different
> purpose.

Only if it's in the d8-d15 range. Other registers will have been
saved by the caller if it cares.

Siarhei_Siamashka · December 8, 2009, 7:47pm

[...]

> If the compiler chooses to use a register, shouldn't the compiler save it
> first? The calling routine could also be using that register for a different
> purpose.

There is a difference between "callee-saved" and "caller-saved" registers.
You can search for "AAPCS" and download the PDF file from the first result. It
has all the details you need.

bob.feretich · December 8, 2009, 8:08pm

You are correct. The "Procedure Call Standard for the ARM
Architecture" states...

"Registers s16-s31 (d8-d15, q4-q7) must be preserved across subroutine
calls; registers s0-s15 (d0-d7, q0-q3) do not need to be preserved
(and can be used for passing arguments or returning results in
standard procedure-call variants). Registers d16-d31 (q8-q15), if
present, do not need to be preserved."
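
The quoted rule can be sketched as a pair of small C predicates. The
function names are hypothetical and only illustrate the classification;
they are not part of any toolchain. The 'q' variant relies on the
register aliasing in which qN overlays the pair d(2N), d(2N+1):

```c
#include <assert.h>
#include <stdbool.h>

/* True if VFP/NEON register dN must be preserved by the callee (d8-d15). */
static bool d_reg_is_callee_saved(int n)
{
    assert(n >= 0 && n <= 31);
    return n >= 8 && n <= 15;
}

/* qN aliases d(2N) and d(2N+1); it is callee-saved iff both halves are. */
static bool q_reg_is_callee_saved(int n)
{
    assert(n >= 0 && n <= 15);
    return d_reg_is_callee_saved(2 * n) && d_reg_is_callee_saved(2 * n + 1);
}
```

Applied to the listing from the first post: d8-d14 fall in the
callee-saved range, so the function saves them on entry, while d6, d7,
d16 and d17 are caller-saved and may be used without saving.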

So the upper registers (d16-d31), like d0-d7, must be saved by the caller
if it needs their contents after the call. In my test case, the
caller didn't use the upper registers, so everything is probably
working correctly.
Thanks.

Regards.
Bob