Porting from x86 to OMAP3 (ARM)

All,

I was wondering if anyone had suggestions as to what to look out for
when porting a Linux application written in C from x86 to the ARM in
the OMAP3.

I have found several sites about issues with porting to an ARM
processor in general but wasn't sure how many still apply to the
ARM Cortex-A8.

I have found some information about unsigned char vs. signed char that
does still seem to apply.

I have also read a lot about alignment issues that I haven't really seen
in my tests.

Any ideas/suggestions/pointers are most welcome.

Ben Anderson

Ben Anderson wrote:

All,

I was wondering if anyone had suggestions as to what to look out for
when porting a Linux application written in C from x86 to the ARM in
the OMAP3.

I have found several sites about issues with porting to an ARM
processor in general but wasn't sure how many still apply to the
ARM Cortex-A8.

I have found some information about unsigned char vs. signed char that
does still seem to apply.

I have also read a lot about alignment issues that I haven't really seen
in my tests.

Those two are the big ones. Unless you are dealing with hardware I/O, you
probably don't need to worry about anything else.

If you aren't seeing alignment issues, then you must be writing some pretty
decent C code. In particular, anytime you are casting pointers you are probably
asking for trouble. x86 is very forgiving, ARM isn't.

Signed vs. unsigned is a compiler issue. You can throw switches in gcc to make
unspecified chars be signed or unsigned, it's just that the defaults for ARM are
opposite to x86. Again, writing portable C code is the way to avoid this
problem. Dan Saks has some columns on embedded.com about this topic.
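
For what it's worth, here's a minimal sketch (mine, just to illustrate) of the
kind of code whose behaviour flips with the default char signedness, and which
gcc's -fsigned-char / -funsigned-char switches control:

#include <stdio.h>

int main(void)
{
    char c = (char)0xFF;   /* all bits set */

    /* With plain char signed (the usual x86 default) this prints -1;
       with plain char unsigned (the usual ARM default) it prints 255. */
    printf("%d\n", (int)c);

    return 0;
}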

b.g.

Bill Gatliff wrote:

Ben Anderson wrote:

All,

I was wondering if anyone had suggestions as to what to look out for
when porting a Linux application written in C from x86 to the ARM in
the OMAP3.

I have found several sites about issues with porting to an ARM
processor in general but wasn't sure how many still apply to the
ARM Cortex-A8.

I have found some information about unsigned char vs. signed char that
does still seem to apply.

I have also read a lot about alignment issues that I haven't really seen
in my tests.

Those two are the big ones. Unless you are dealing with hardware I/O, you
probably don't need to worry about anything else.

If you aren't seeing alignment issues, then you must be writing some pretty
decent C code. In particular, anytime you are casting pointers you are probably
asking for trouble. x86 is very forgiving, ARM isn't.

Signed vs. unsigned is a compiler issue. You can throw switches in gcc to make
unspecified chars be signed or unsigned, it's just that the defaults for ARM are
opposite to x86. Again, writing portable C code is the way to avoid this
problem. Dan Saks has some columns on embedded.com about this topic.

Forgot one more: endianness.

Any time you do anything that assumes a certain byte ordering for multi-byte
values, the differences between ARM and x86 will trip you up. This includes
casting integer pointers to character pointers and then dereferencing the char
pointer via []'s, using unions to pick apart the bytes of an integer, and
memcpy'ing structures to storage on one machine and then reading them back on
another.
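
A small sketch of the union case, just to illustrate (nothing here is from
Ben's application):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    union {
        uint32_t word;
        uint8_t  bytes[4];
    } u;

    u.word = 0x11223344;

    /* On a little-endian machine bytes[0] is 0x44; on a big-endian one
       it is 0x11.  Code that picks integers apart this way is relying on
       the byte order of whatever machine it happens to run on. */
    printf("first byte in memory: 0x%02x\n", u.bytes[0]);

    return 0;
}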

The << and >> operators work properly, because the C language specification is
clear as to how they are supposed to work. If you look at the assembly language
and memory representations of values as you shift them, however, you'll see they
behave differently in ARM vs. x86.

As with the others, portable C code is your friend here too.
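
For the write side, a shift-based sketch that stores a value in one fixed
byte order no matter what the host machine uses (the function name is just
illustrative):

#include <stdint.h>

/* Store a 32-bit value in big-endian ("network") order, regardless of
   the byte order of the machine this runs on. */
static void put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)(v);
}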

b.g.

You're in luck. Cortex-A8 allows unaligned access, according to the
Cortex-A8 Technical Reference at www.arm.com. Earlier ARM processors
did not allow unaligned access. Unaligned accesses usually cause a
loss of performance, so it's best to avoid them.

You're also in luck regarding little versus big-endian byte
numbering. ARM is an "either-endian" architecture, but specific
implementations usually hard-wire one or the other. As far as I can
tell, OMAP uses little-endian like the x86. (Or else it's either-
endian and software sets it to little-endian by default.) Little
versus big-endian would be an issue if you wanted to port to a big-
endian architecture like PowerPC.

As Bill mentioned, signed versus unsigned char is a compiler option
and always an issue when porting code from one compiler to another,
even on the same architecture. Other gotchas along these lines
include how structures are packed, and what byte size is used to
implement enums.
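
As a rough illustration of the packing point (the packed attribute here is
gcc-specific, and the exact sizes are up to the compiler):

#include <stdio.h>

struct natural {
    char c;
    int  i;     /* the compiler may insert padding before this member */
};

struct squeezed {
    char c;
    int  i;     /* no padding, but 'i' may end up unaligned */
} __attribute__((packed));

int main(void)
{
    printf("natural: %zu, packed: %zu\n",
           sizeof(struct natural), sizeof(struct squeezed));
    return 0;
}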

Here's another subtle one: multi-byte character constants.

Modern C compilers let you define a multi-byte character constant like
'ab', which results in an integer value holding both characters. However,
which character ends up in which byte is not well specified, so it depends
on the compiler. Borland C and GCC do it differently even when both are
compiling for x86.
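
For example (illustrative only; gcc will warn about the multi-character
constant, which is a hint in itself):

#include <stdio.h>

int main(void)
{
    int v = 'ab';   /* multi-character constant: the value is
                       implementation-defined, so which character ends up
                       in which byte depends on the compiler */
    printf("0x%x\n", v);
    return 0;
}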

You're in luck. Cortex-A8 allows unaligned access, according to the
Cortex-A8 Technical Reference at www.arm.com. Earlier ARM processors
did not allow unaligned access. Unaligned accesses usually cause a
loss of performance, so it's best to avoid them.

You're also in luck regarding little versus big-endian byte
numbering. ARM is an "either-endian" architecture, but specific
implementations usually hard-wire one or the other. As far as I can
tell, OMAP uses little-endian like the x86. (Or else it's either-
endian and software sets it to little-endian by default.) Little
versus big-endian would be an issue if you wanted to port to a big-
endian architecture like PowerPC.

You can still run FPA code, which is mixed-endian, on the Cortex. Why
one would do that, I don't know :-)

regards,

Koen

Koen Kooi <koen.kooi@gmail.com> writes:

You're in luck. Cortex-A8 allows unaligned access, according to the
Cortex-A8 Technical Reference at www.arm.com. Earlier ARM processors
did not allow unaligned access. Unaligned accesses usually cause a
loss of performance, so it's best to avoid them.

Unaligned accesses cost one cycle extra on Cortex-A8. This is much
less than manually shifting the bytes into place. Writing the code
properly avoids most unaligned data, but sometimes it's unavoidable,
for instance in networking code.
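
When an unaligned read really is unavoidable, one portable way out is to go
through memcpy and let the compiler pick the instructions; a sketch (the
function name is made up):

#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a buffer position that may not be aligned,
   e.g. a field inside a packed network header. */
static uint32_t read_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* no alignment assumption is made here */
    return v;
}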

You're also in luck regarding little versus big-endian byte
numbering. ARM is an "either-endian" architecture, but specific
implementations usually hard-wire one or the other. As far as I can
tell, OMAP uses little-endian like the x86. (Or else it's either-
endian and software sets it to little-endian by default.) Little
versus big-endian would be an issue if you wanted to port to a big-
endian architecture like PowerPC.

You can still run FPA code, which is mixed-endian, on the Cortex. Why
one would do that, I don't know :-)

You can't run FPA code on a Cortex CPU. It doesn't have an FPA unit.
You probably meant soft-fpa.

Ben Anderson wrote:
> All,
>
> I was wondering if anyone had suggestions as to what to look out for
> when porting a Linux application written in C from x86 to the ARM in
> the OMAP3.
>
> I have found several sites about issues with porting to an ARM
> processor in general but wasn't sure how many still apply to the
> ARM Cortex-A8.
>
> I have found some information about unsigned char vs. signed char that
> does still seem to apply.
>
> I have also read a lot about alignment issues that I haven't really seen
> in my tests.

Those two are the big ones. Unless you are dealing with hardware I/O, you
probably don't need to worry about anything else.

If you aren't seeing alignment issues, then you must be writing some pretty
decent C code. In particular, anytime you are casting pointers you are probably
asking for trouble. x86 is very forgiving, ARM isn't.

What is it about casting of pointers that is bad? Is it de-referencing
pointers to un-aligned data elements?

I don't mean to say that casting of any type in general is good. I am
just trying to get a firm grasp on what the issues are.

Signed vs. unsigned is a compiler issue. You can throw switches in gcc to make
unspecified chars be signed or unsigned, it's just that the defaults for ARM are
opposite to x86. Again, writing portable C code is the way to avoid this
problem. Dan Saks has some columns on embedded.com about this topic.

b.g.

I looked through some of Dan Saks' articles, and in one he does caution
against casting of pointers, but without any real details.

I'm still having a hard time finding this info via Google. So if anyone
knows of a site that goes through the alignment/pointer issues, possibly
with some examples, let me know. It would be much appreciated.

Thanks all of you for your input!

Ben Anderson

I'm still having a hard time finding this info via Google. So if anyone
knows of a site that goes through the alignment/pointer issues, possibly
with some examples, let me know. It would be much appreciated.

Wikipedia to the rescue: http://en.wikipedia.org/wiki/Data_structure_alignment

Which points to this nifty page: http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm

For tech stuff, I always start with Wikipedia.

Actually most of the "fun" is in the cross-compiling setup.
If you put in a SATA-based disk or network-mount a nice shared drive,
there's no reason you cannot build a native compiler (crosstool
supports it).
Then you compile on the Beagle itself, and most of the cross-compile
issues go away.
Of course, if you use something like arm-debian, most packages come as
binaries anyway (including the compiler).

It may be slower than your quad core, but not that much.

Also, I like working in Ruby for GUI development, which the OMAP can
run easily. And once Ruby is compiled, the rest is interpreted.

Isn't the NEON coprocessor kinda like an FPA?

Albert Nguyen wrote:

Isn't the NEON coprocessor kinda like an FPA?

As I've tinkered with it, NEON provides major acceleration for media-intensive
activities. But for more general-purpose floating-point operations, it doesn't
outperform a true FPA.

So is that a yes, or a no? It depends. :-)

b.g.

Ben Anderson wrote:

What is it about casting of pointers that is bad? Is it de-referencing
pointers to un-aligned data elements?

Correct.

Generally speaking, if you have to cast a pointer then you must have lied to the
compiler at some point about what the object in question actually is. That's
almost always a bad idea, particularly so when writing portable code. Better to
clearly convey to the compiler what's going on, and let it and the rules of C
work to your advantage.

I don't mean to say that casting of any type in general is good. I am
just trying to get a firm grasp on what the issues are.

If you cast a char* to an int*, then you risk problems on machines where the
alignment restrictions differ between the two data types. x86 doesn't care,
so if your char wasn't word-aligned, nothing bad happens when you dereference
the cast pointer to int. On ARM, however, you get an exception (*).

* - except on some of the newest ARM cores, apparently. I avoid the problem so
that I don't have to care which core I'm running on!
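
A minimal example of the sort of cast that bites (illustrative only, not
Ben's code):

int main(void)
{
    char buf[8] = {0};
    int *p = (int *)(buf + 1);   /* buf + 1 is almost certainly not int-aligned */

    /* x86 quietly performs the unaligned load; on ARM this is exactly the
       kind of access that traps (or silently misbehaves on older cores). */
    return *p;
}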

These kinds of problems can be nasty to test for, because they seem to travel
along with dynamically-allocated data structures, buffers, etc. that are very
sensitive to system state. So you might make several passes over the same code
successfully before the !kaboom! happens. Better to prove the code right by
inspection beforehand, which is only possible if you follow C's rules carefully.

I looked through some of Dan Saks articles and in one he does mention
caution against casting of pointers but no real details.

I still having hard time finding this info via google. So if anyone
knows of some site that goes through the alignment/pointer issue with
possibly some examples let me know. It would be much appreciated.

Just don't cast pointers, and you should be fine. Here's a bad one I see from
time to time:

int i;
char *ibuf = (char*)&i;
char a, b, c, d;

/* break apart i into its four bytes */
a = ibuf[0];
b = ibuf[1];
c = ibuf[2];
d = ibuf[3];

The values of a, b, c, and d will be different on x86 vs. ARM. Ditto if you go
the opposite way:

char cbuf[sizeof(int)];
int i;

i = cbuf[0];
i = (i << 8) + cbuf[1];
i = (i << 8) + cbuf[2];
i = (i << 8) + cbuf[3];

This code is actually portable if you load up cbuf the right way each time,
regardless of the endianness of the machine (which can be tricky). BUT, you
will still get different values for i if chars are signed on one machine but
unsigned on another. BUT BUT, you won't see the problem until the
most-significant bit of a byte in cbuf is set.

The two above examples aren't casts per se, but they are definitely
representation transformations of the same kind that casts cause. So I lump
them together.

Just watch out for stuff like that. You tend to know when you're in risky
territory, because the code starts to look very much like the above.

b.g.

Bill Gatliff wrote:

BUT, you will still get different values for i if chars are signed on one
machine but unsigned on another. BUT BUT, you won't see the problem until the
most-significant bit of a byte in cbuf is set.

Forgot to mention: fix this by making cbuf an _unsigned_ char in all cases.
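
Spelled out, just applying that one change to the earlier snippet:

unsigned char cbuf[sizeof(int)];
int i;

i = cbuf[0];
i = (i << 8) + cbuf[1];
i = (i << 8) + cbuf[2];
i = (i << 8) + cbuf[3];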

b.g.

I would be surprised to see single precision code running faster on an
FPA unit than on a NEON unit at the same frequency. For double
precision, well NEON doesn't support it.

Laurent

Albert Nguyen wrote:

Indeed.. and of course the Cortex-A8 also has VFPv3. VFP is the
successor to ARM's ancient FPA coprocessor, and can do single and double
precision operations. For general floating point you can use a mixture
of VFPv3 and NEON instructions, since these use the same registers.

Torne Wuff wrote:

>
> Albert Nguyen wrote:
>> Isn't the NEON coprocessor kinda like an FPA?
>
> As I've tinkered with it, NEON provides major acceleration for
> media-intensive activities. But for more general-purpose floating-point
> operations, it doesn't outperform a true FPA.
>
> So is that a yes, or a no? It depends. :-)

I would be surprised to see single precision code running faster on an
FPA unit than on a NEON unit at the same frequency. For double
precision, well NEON doesn't support it.

Indeed.. and of course the Cortex-A8 also has VFPv3. VFP is the
successor to ARM's ancient FPA coprocessor, and can do single and double
precision operations. For general floating point you can use a mixture
of VFPv3 and NEON instructions, since these use the same registers.

The Cortex-A8 has a VFPlite unit, which implements the full VFPv3
instruction set but is not pipelined, so floating-point-heavy code runs
rather slowly. Thanks to the instruction FIFO, though, it shouldn't impact
execution speed too much when the bulk of the code runs in the ARM pipeline.

When in "runfast" mode, single-precision VFP instructions execute in the
NEON pipeline. If using gcc, adding the flags -ffast-math -fno-math-errno,
and avoiding double precision is advisable, assuming results are still
accurate enough.

Hi Bill,

Here's a bad one I see from
time to time:

int i;
char *ibuf = (char*)&i;
char a, b, c, d;

/* break apart i into its four bytes */
a = ibuf[0];
b = ibuf[1];
c = ibuf[2];
d = ibuf[3];

The values of a, b, c, and d will be different on x86 vs. ARM.

Apologies if this is a stupid question, but just to clarify: Do you
mean a, b, c, d will be different in their signedness (though, only
different if you say something like anInt = (int)a and would therefore
see their sign)? It looks instead like your example is talking about
endianness, and I'm having trouble seeing how that could be the case
going from one LE machine to another LE machine. (Unless GCC ARM does
something freaky I don't know about...) :-)

Cheers,

Matt

The issue I am having is that an application developer is casting an
(unsigned char *) to a (double *) and then de-referencing it into a double.
Of course this has historically worked fine on x86 but is hanging on the
Cortex-A8.

Here is a sample of what the application does.