Working with NEON at X-loader level

I'm modifying x-loader to work with NEON, and in the process I get an
undefined instruction exception when trying to use NEON instructions.
I've enabled access to cp10 and cp11 in c1, the Coprocessor Access
Control Register, using the "mcr p15, etc..." instruction.

However, I still get the exception despite doing the above. I'm using
the arm-2008q1 compiler with the "-mcpu=cortex-a8 -mfpu=neon
-mfloat-abi=softfp -march=armv7-a" options. A disassembly of the
binary shows that the NEON instructions remain intact.

Any ideas are absolutely appreciated!

Thanks a ton,

Jerry

What value did you use for that mcr p15, 0, <rd>, c1, c0, 2 instruction?
<rd> should contain 0x00f00000.

Note the mcr has to be followed by an imb before using a NEON
or VFP instruction.
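
For reference, the sequence described above can be sketched in C with GCC
inline assembly. This is only an illustration: the helper names are made up
here and are not part of x-loader, and the hardware access is compiled only
for ARM targets. It assumes the code runs in a privileged mode.

```c
#include <stdint.h>

/* Each coprocessor has a 2-bit access field in CPACR; 0x3 = full access. */
#define CPACR_ACCESS_FULL 0x3u

/* Return `cpacr` with full access granted to cp10 and cp11. */
static uint32_t cpacr_enable_cp10_cp11(uint32_t cpacr)
{
    cpacr |= CPACR_ACCESS_FULL << 20;  /* cp10: bits [21:20] */
    cpacr |= CPACR_ACCESS_FULL << 22;  /* cp11: bits [23:22] */
    return cpacr;
}

#if defined(__arm__) && defined(__GNUC__)
/* Hypothetical helper (not from x-loader): read-modify-write CPACR,
 * then issue a barrier before any NEON or VFP instruction runs. */
static void enable_cp10_cp11_access(void)
{
    uint32_t cpacr;

    __asm__ volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(cpacr));
    cpacr = cpacr_enable_cp10_cp11(cpacr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 2" : : "r"(cpacr) : "memory");
    __asm__ volatile("isb" : : : "memory");
}
#endif
```

Starting from a CPACR of zero, the pure helper yields 0x00f00000, which is
the value quoted above.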

Ref: http://www.phoronix.com/scan.php?page=article&item=fedora_r600_3d&num=1

Laurent

Jerry Johns <jerry.johns@gmail.com> writes:

I'm modifying x-loader to work with NEON,

Why? X-loader runs for a split second during bootup, so its
performance is largely irrelevant. Furthermore, I strongly doubt
there is any performance advantage to using NEON there.

I'm trying to speed up the loading of various images from NAND
flash (uImage, fs, etc.) and unfortunately the existing memcpy is
absolutely horrendous (byte-wide, no alignment checks, etc.).
I've got my hands on a memcpy optimized for NEON
(http://sourceware.org/ml/libc-ports/2009-07/msg00000.html) and would
like to get it working within x-loader to speed up copies. I realize
NAND flash is slow enough that this won't be a gain, but I intend
to use x-loader for other purposes down the road.
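
As a point of comparison with the byte-wide copy complained about above,
here is a minimal sketch of a copy that checks alignment and moves 32-bit
words when it can. It is not the NEON memcpy from the linked post and not
x-loader code; the name memcpy_words is made up, and the may_alias
attribute assumes a GCC-compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC extension: lets us access arbitrary memory through this word type. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/* Hypothetical word-wise copy: uses 32-bit moves when both pointers are
 * 4-byte aligned, and falls back to bytes otherwise and for the tail. */
void *memcpy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;

        for (; n >= 4; n -= 4)
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}
```

This needs no coprocessor setup at all, so it could serve as a baseline
while the NEON exception is being tracked down.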

Laurent, I've done mcr p15, 0, <rd>, c1, c0, 2 with 0xf00000 and it
does not make any difference. I've also looked at the disassembly to
make sure the register isn't being trampled.
Regarding instruction memory barriers, would an "isb" instruction take
care of that? I notice that's what they do in the Linux kernel (see
arch/arm/include/asm/system.h, set_copro_access).

All of this is still to no avail. I notice that the VFP code in the
kernel handles the undefined instruction exception... do I have
to do that as well?
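
One step that has not come up in this thread so far: on ARMv7 cores with
VFP/NEON, granting cp10/cp11 access is not sufficient by itself. The EN bit
(bit 30) of FPEXC must also be set, otherwise NEON and VFP instructions
still take the undefined instruction exception. A minimal sketch, with
made-up helper names, assuming privileged mode and that cp10/cp11 access
has already been granted:

```c
#include <stdint.h>

/* FPEXC.EN: global enable for the VFP/NEON unit. */
#define FPEXC_EN (1u << 30)

/* Return `fpexc` with the enable bit set, other bits untouched. */
static uint32_t fpexc_with_enable(uint32_t fpexc)
{
    return fpexc | FPEXC_EN;
}

#if defined(__arm__) && defined(__GNUC__)
/* Hypothetical helper (not from x-loader). Older assemblers spell
 * these instructions fmrx/fmxr instead of vmrs/vmsr. */
static void neon_vfp_enable(void)
{
    uint32_t fpexc;

    __asm__ volatile("vmrs %0, fpexc" : "=r"(fpexc));
    fpexc = fpexc_with_enable(fpexc);
    __asm__ volatile("vmsr fpexc, %0" : : "r"(fpexc) : "memory");
}
#endif
```

Whether this is the cause of the exception here is a guess; it is simply
the other enable that sits between the access control register and the
first NEON instruction.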

Jerry

Does it also matter that when I modify the Coprocessor Access Control
Register, I'm in non-secure, privileged mode? The Cortex TRM seems to
imply that this only works in privileged mode, or if the
corresponding Non-secure Access Control Register is set properly (which
is sort of a chicken-and-egg problem, since I can't modify
that register unless I'm in secure mode).

The Linux kernel just goes right ahead and modifies the Coprocessor
Access Control Register without worrying about this secure mode stuff.
I've checked and verified that it is also in non-secure mode
(read-write-read gives me zeros).
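
The Non-secure Access Control Register can at least be read from
non-secure privileged mode, even though it can only be written from secure
mode. A small sketch of decoding it, with made-up helper names: bits 10 and
11 are the non-secure access bits for cp10 and cp11.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSACR_CP10 (1u << 10)  /* non-secure access to cp10 */
#define NSACR_CP11 (1u << 11)  /* non-secure access to cp11 */

/* True if the non-secure world may use both cp10 and cp11. */
static bool nsacr_allows_neon(uint32_t nsacr)
{
    uint32_t need = NSACR_CP10 | NSACR_CP11;

    return (nsacr & need) == need;
}

#if defined(__arm__) && defined(__GNUC__)
/* Hypothetical helper: NSACR is read-only in non-secure privileged modes. */
static uint32_t read_nsacr(void)
{
    uint32_t nsacr;

    __asm__ volatile("mrc p15, 0, %0, c1, c1, 2" : "=r"(nsacr));
    return nsacr;
}
#endif
```

If nsacr_allows_neon(read_nsacr()) comes back false in non-secure mode,
writes to the cp10/cp11 fields of the access control register are ignored,
which would match a read-write-read that returns zeros.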

Jerry

Mans? Laurent? any ideas guys?

I'm not all that familiar with NEON yet, but the floating-point
coprocessor in ARM11 required selecting RunFast mode instead of full
IEEE mode if you did not want to handle undefined instruction exceptions.
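
For the ARM11 case mentioned above, RunFast mode is selected through
FPSCR: flush-to-zero and default-NaN are turned on and all floating-point
exception trap enables are turned off. A sketch with made-up helper names;
whether this matters for NEON on Cortex-A8 is not established in this
thread.

```c
#include <stdint.h>

#define FPSCR_FZ (1u << 24)  /* flush-to-zero mode */
#define FPSCR_DN (1u << 25)  /* default NaN mode */
/* Trap enable bits: IOE(8) DZE(9) OFE(10) UFE(11) IXE(12) IDE(15). */
#define FPSCR_TRAP_ENABLES 0x00009f00u

/* Return `fpscr` adjusted for RunFast mode, other bits untouched. */
static uint32_t fpscr_runfast(uint32_t fpscr)
{
    fpscr &= ~FPSCR_TRAP_ENABLES;
    fpscr |= FPSCR_FZ | FPSCR_DN;
    return fpscr;
}

#if defined(__arm__) && defined(__GNUC__)
/* Hypothetical helper; requires the VFP unit to be enabled already. */
static void vfp_select_runfast(void)
{
    uint32_t fpscr;

    __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
    fpscr = fpscr_runfast(fpscr);
    __asm__ volatile("vmsr fpscr, %0" : : "r"(fpscr));
}
#endif
```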