Where is a manual for GCC asm coding (syntax etc.)?

RJ_Wang · June 14, 2015, 4:49am

Hi,
I want to write ARM asm code with the GCC toolchain. Previously I used TI CGT, whose asm syntax is different from
GCC's. For example, the last line below produces an error with the GCC toolchain (bad instruction type).

.global asmfunc
.global gvar
asmfunc:
LDR r1, gvar_a
LDR r2, [r1, #0]
ADD r0, r0, r2
STR r0, [r1, #0]
MOV pc, lr
gvar_a .field gvar, 32

Could you point me to a tutorial, manual etc. on the syntax for writing ARM asm code with GCC?

Thanks,

zmatt · June 14, 2015, 10:45am

I want to write ARM asm code with GCC toolchain. Previously, I use TI CGT, whose asm syntax is different from GCC.

The assembler is actually part of “Binutils”, not of GCC itself. Its manual is titled “Using as” (the GNU assembler manual) and is part of the Binutils documentation.

For example, the last line has an error with GCC toolchain (bad instruction type).

gvar_a .field gvar, 32

The main issue is the lack of colon after the label gvar_a. This makes the assembler think it is some (unknown) instruction. Also, I’ve never seen the “.field” directive. You probably want “.long gvar” here.

An even easier solution is using a literal load, i.e. “ldr r1, =gvar”.

Also, I should note that “mov pc, lr” is a deprecated way to return from a function, “bx lr” should be used instead.

One way to see how to write “proper” assembler is by having GCC produce some for you (although you may need to weed through some redundant directives and unreadable randomly-generated labels). For example, if I understand your intention correctly, your code aims to do the equivalent of:

extern int gvar;
void asmfunc( int arg ) {
        gvar += arg;
}

If I put that in a file “foo.c” and compile it with “arm-linux-gnueabihf-gcc -Og -S -o- foo.c” then several things can be noticed about the output:

- the “.syntax unified” directive, to select UAL syntax (which is the syntax currently used by the ARM Architecture Reference Manual, so selecting it is highly recommended)
- some declarations of the target architecture, with many finer points defined in rather poorly readable “eabi attributes”
- the use of Thumb mode by default, which I’d agree with: since ARMv7 there’s really little reason not to
- the use of movw+movt to produce the address of gvar, instead of loading it from some location

It is interesting to note that if I compile with “clang -target arm-linux-gnueabihf -O -S -o- foo.c” then it will generate code essentially identical to yours, however the directives at the top show it is being conservative and targeting arm1136jf-s. If I add “-march=armv7-a” or “-mcpu=cortex-a8” to the commandline options then clang will also use movw+movt, so apparently this really is preferred (and considering the effects on separate L1 instruction and data caches, I can imagine why). Also, clang conveniently includes some comments on what all those eabi-attributes mean.
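To make the movw+movt point concrete, here is a minimal C model of what the two instructions do to a register: movw writes a 16-bit immediate and clears the upper half, movt replaces only the upper half. The function names are purely illustrative (they are not compiler intrinsics), and load_const32 is a hypothetical helper that mimics the pair the compiler emits for the address of gvar.

```c
#include <stdint.h>

/* movw rd, #imm16: write the low halfword and clear the upper 16 bits. */
static inline uint32_t movw(uint16_t imm16)
{
    return (uint32_t)imm16;
}

/* movt rd, #imm16: replace the upper halfword, keep the lower 16 bits. */
static inline uint32_t movt(uint32_t rd, uint16_t imm16)
{
    return (rd & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}

/* Hypothetical helper: build a 32-bit constant (such as an address)
 * the way a movw+movt pair would, without any load from memory. */
static inline uint32_t load_const32(uint32_t value)
{
    uint32_t rd = movw((uint16_t)(value & 0xFFFFu)); /* movw rd, #:lower16:value */
    return movt(rd, (uint16_t)(value >> 16));        /* movt rd, #:upper16:value */
}
```

Because the constant lives entirely in the instruction stream, no literal has to be fetched through the data cache, which is presumably the effect alluded to above.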

BTW, you haven’t mentioned what your motivation is to write in assembly, but note that GCC has powerful functionality for including inline assembly into C/C++ source code. For example, if you want to use the “rbit” instruction (for which no intrinsic is available I think) you can easily wrap it in a function:

__attribute__((const))
static inline unsigned bitreverse( unsigned x ) {
        unsigned result;
        asm( "rbit %0, %1" : "=r"( result ) : "r"( x ) );
        return result;
}

(The “const” attribute indicates that the function is free of side-effects and doesn’t depend on global memory, giving the optimizer a lot of liberty to move the instruction around.)
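As a rough sketch of how such a wrapper could be kept buildable on other targets too, the variant below falls back to plain C when not compiling for 32-bit ARM. The name bitreverse_portable and the preprocessor guard are my own assumptions: the guard relies on the ACLE macro __ARM_ARCH being defined by the compiler, and it is deliberately conservative (rbit also exists on some older cores that it excludes).

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit word.
 * Assumption: on ARMv7 and later the "rbit" instruction is available;
 * everywhere else a portable swap network is used instead. */
__attribute__((const))
static inline uint32_t bitreverse_portable(uint32_t x)
{
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    uint32_t result;
    __asm__("rbit %0, %1" : "=r"(result) : "r"(x));
    return result;
#else
    /* Swap adjacent bits, then pairs, nibbles, bytes and halfwords. */
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
#endif
}
```

The fallback also documents what the single instruction computes, which can be handy when checking the assembly version on a host machine.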

RJ_Wang · June 14, 2015, 3:59pm

Thanks Matthijs, you have given me a lot of helpful information. I know that a C compiler generally does excellent work, and that manual
assembly code is only necessary in special cases. Still, I enjoy such work when it is needed. I have done asm coding on one DSP
core and on one Synopsys ARC600 core. The ARM processor feels very different from those. I am especially interested in digital
signal processing projects. Do you know of any small project that would be a good exercise for learning ARM asm coding? For me, it is
important to know the goal first and then work toward it by coding.

Second, I have used SIMD on other cores in the past. When I compile a project containing a for loop, I do not see any ARM NEON
instructions in the generated code in the disassembly window. I have used these options:

“-march=armv7-a -mtune=cortex-a8 -mfpu=neon -ftree-vectorize -ffast-math -mfloat-abi=softfp”

Thanks,

zmatt · June 14, 2015, 5:57pm

I know generally C compiler can do an excellent work.

Well, to be honest I've also seen both GCC and Clang produce pretty idiotic
output often enough. However, they do save you from the large amount
of boilerplate you'd need for example to produce code that supports proper
debugging and/or stack unwinding (see compiler output when compiling with
-g and/or -fexceptions). Also, manual instruction scheduling is tedious,
and a processor like the cortex-a8 can be quite sensitive to it.

I have done asm coding on one DSP core, one Synopsys ARC600 core. Now I
feel that ARM processor is still very different from those.

I am unfamiliar with that particular architecture. The only DSP I have
experience with is TI's C674x which I do like, but it is pretty weird in
comparison to a "regular" CPU architecture. ARM is a fairly clean and
standard RISC architecture though, although it accumulated more cruft over
time (as any architecture does).

Do you know any small project which can be a good exercise to grasp ARM asm
coding?

I'm sorry, I don't really know how to answer that.

Second, I used SIMD on other cores in the past. When I compile a project
having for loop, I do not see the generated ARM NEON
assembly code in the disassembly window.

I get clearly vectorized output if I compile some simple dst += a * src
loop with: -mcpu=cortex-a8 -mfpu=neon -Ofast
(-Ofast is basically short for -O3 -ffast-math). I don't understand why
you're specifying a softfp ABI: hardfloat is the standard on modern ARM
targets.

I've tested with both gcc 4.9.3 (linaro) and 5.1.1 (debian), and it works
for me both for floats and for int16. The vectorized kernel loop is usually
buried among a mass of code dealing with "leftovers" and/or misaligned
cases: much of it can be removed using things like alignment attributes,
the "restrict" qualifier, making sure the loop count is always a multiple
of some nice power of two, etc.
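A minimal sketch of such a cleaned-up loop is shown below. The function name scaled_add and the block size of 8 are my own choices for illustration, and instead of a type attribute it uses the GCC/Clang builtin __builtin_assume_aligned to state the alignment. Whether the leftover and misalignment code really disappears depends on the compiler version and options, so check the -S output.

```c
#include <stddef.h>

/* Hypothetical kernel: dst[i] += a * src[i] for nblocks * 8 elements.
 * "restrict" promises that dst and src never overlap; the element count
 * is expressed in blocks of 8 so it is always a multiple of 8. */
void scaled_add(float *restrict dst, const float *restrict src,
                float a, size_t nblocks)
{
#if defined(__GNUC__)
    /* Promise 16-byte alignment to the optimizer (GCC/Clang builtin).
     * The caller must really pass 16-byte aligned buffers. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);
#else
    float *d = dst;
    const float *s = src;
#endif
    for (size_t i = 0; i < nblocks * 8; i++)
        d[i] += a * s[i];
}
```

Compiling this with the options mentioned above (-mcpu=cortex-a8 -mfpu=neon -Ofast -S) should make it easier to spot the vectorized kernel in the output.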

Still, the output doesn't look particularly good to me. Uhh, in fact the
output looks pretty invalid to me: it's putting :64 alignment specifiers on
vector loads, yet those addresses only increment by 16 each loop
iteration...

I would say GCC's auto-vectorization still looks like work-in-progress to
me.

RJ_Wang · June 15, 2015, 4:52am

I don’t know how to get an assembly listing file out of the compiler. I think the cross compiler is ‘arm-linux-gnueabihf-gcc-4.7.3’. This is the first time I have used
this compiler and I have not yet found the relevant switches.

Thanks,

RJ_Wang · June 15, 2015, 5:12am

Here is the help text I found. I don’t see which compiler setting generates an asm listing file, although the binutils documentation (another source) describes a switch:

Command-Line Options

This chapter describes command-line options available in all versions of the gnu assembler; see Machine Dependencies, for options specific to particular machine architectures.

If you are invoking as via the gnu C compiler, you can use the ‘-Wa’ option to pass arguments through to the assembler. The assembler arguments must be separated from each other (and the ‘-Wa’) by commas. For example:

     gcc -c -g -O -Wa,-alh,-L file.c

zmatt · June 15, 2015, 4:41pm

I don't know how to get assembly code list file from compiling.

I actually used it in my reply to your original post:

[..] compile it with "arm-linux-gnueabihf-gcc -Og -S -o- foo.c"

The options I use here:
-Og   a very mild optimization level (milder even than -O1), most likely
      to produce very straightforward assembly output (introduced in gcc 4.8)
-S    produce assembly output instead of object code
-o-   send output to stdout (you can of course send it to a file instead)

I do suggest you upgrade your rather dated 4.7.x compiler to a more recent
one, either 4.9.x or 5.1.x. Debian "stretch" (testing) and "sid" include a
"gcc-5-arm-linux-gnueabihf" package.

The distro-independent Linaro toolchains are a useful alternative. They
provide good toolchains for various ARM targets, for both Linux (64-bit) and
Windows (32-bit) hosts, and are completely standalone (just unpack the archive
anywhere and put its "bin" subdir in your PATH).

They are a bit annoying to locate in their download area however. I don't
know of a better way to find the most recent version than checking the
directory for each month to see which one contains a
"components/toolchain/binaries/arm-linux-gnueabihf" subdir. Their most
recent version currently seems to be the 2015.02 release
<http://releases.linaro.org/15.02/components/toolchain/binaries/arm-linux-gnueabihf>
(you only need the gcc download).

Finally, although cross-compiling is faster you can of course also just
compile on the beaglebone itself. The native gcc is 4.9.2 in debian
"jessie" (stable) and gcc-5 is available in "stretch".

William_Hermans · June 15, 2015, 5:29pm

Finally, although cross-compiling is faster you can of course also just compile on the beaglebone itself. The native gcc is 4.9.2 in debian “jessie” (stable) and gcc-5 is available in “stretch”.

I concur. Compiling something like the kernel natively would be all but out of the question, and the same goes for Wireshark, Qt, and similarly large projects. Something like Nodejs, however, I would consider passable (it only takes about an hour to compile natively).

But for 95%+ of user-written executables, compiling natively is more than good enough. For example, I have a CANBus app I’m writing. On a 4-core, 4 GB virtual machine the executable compiles instantly; on the BBB it takes about a second. Granted, the code base is rather small at the moment: less than 500 lines of C (so far).

I would, however, recommend using a working directory on non-flash media, such as an NFS or USB hard drive mount.

zmatt · June 19, 2015, 2:37am

Never mind my remark about the :64 alignment specifiers looking invalid: alignment
specs in Neon load/store instructions are in bits, not in bytes... *slaps forehead*
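The arithmetic behind that correction can be sketched in a few lines of C. Both helper names are made up for illustration, and the qualifier is assumed to be a multiple of 8 bits (such as 64, 128 or 256). A :64 qualifier only asks for 8-byte alignment, which an address advancing by 16 bytes per iteration keeps, provided the starting address is 8-byte aligned.

```c
#include <stdint.h>

/* Returns 1 if addr satisfies an alignment qualifier given in bits.
 * Assumption: align_bits is a multiple of 8 and at least 8. */
static inline int aligned_to_bits(uintptr_t addr, unsigned align_bits)
{
    uintptr_t align_bytes = align_bits / 8u;   /* :64 means 8 bytes */
    return (addr % align_bytes) == 0;
}

/* Returns 1 if base, base + step, base + 2*step, ... (count addresses)
 * all satisfy the qualifier. */
static inline int stride_keeps_alignment(uintptr_t base, uintptr_t step_bytes,
                                         unsigned count, unsigned align_bits)
{
    for (unsigned i = 0; i < count; i++) {
        if (!aligned_to_bits(base + (uintptr_t)i * step_bytes, align_bits))
            return 0;
    }
    return 1;
}
```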