code reorder of x-loader using arm-none-linux-gnueabi-gcc

Hi everyone,

I am the developer of qemu-omap3(http://code.google.com/p/qemu-omap3/)
and I am doing MMC card emulation right now.

I met a strange code reordering problem while compiling x-loader for
qemu-omap3.
The x-loader source code is downloaded from
http://beagleboard.googlecode.com/files/u-boot_beagle_revb.tar.gz.

The arm-none-linux-gnueabi-gcc I use is gcc version 4.2.1
(CodeSourcery Sourcery G++ Lite 2007q3-51).

In the function mmc_send_cmd() (cpu/omap3/mmc.c), lines 145-150:

OMAP_HSMMC_BLK = BLEN_512BYTESLEN | NBLK_STPCNT;
OMAP_HSMMC_STAT = 0xFFFFFFFF;
OMAP_HSMMC_ARG = arg;
OMAP_HSMMC_CMD = cmd | CMD_TYPE_NORMAL | CICE_NOCHECK |
                 CCCE_NOCHECK | MSBS_SGLEBLK | ACEN_DISABLE | BCE_DISABLE |
                 DE_DISABLE;

That means the CPU writes to the MMC registers in the sequence
OMAP_HSMMC_BLK -> OMAP_HSMMC_STAT -> OMAP_HSMMC_ARG -> OMAP_HSMMC_CMD.

But the compiler reorders the register access sequence:
r2= 0x4809c000
40201d78: e3a03c02 mov r3, #512 ; 0x200
40201d7c: e5821108 str r1, [r2, #264] <-0x108 OMAP_HSMMC_ARG
40201d80: e5823104 str r3, [r2, #260] <-0x104 OMAP_HSMMC_BLK
40201d84: e582010c str r0, [r2, #268] <-0x10C OMAP_HSMMC_CMD
40201d88: e3e03000 mvn r3, #0 ; 0x0
40201d8c: e5823130 str r3, [r2, #304] <-0x130 OMAP_HSMMC_STAT

It will confuse the MMC emulation.

Can anyone tell me why the compiler reorders the register access sequence?
How can I stop it?

The information at http://elinux.org/BeagleBoard#Cortex_A8_ARM says:
"Note: CodeSourcery 2007q3 has an issue with -Os option. There is a fix
(needs recompilation)."

Is it related to the -Os option?

Thanks.

Try this:

CFLAGS_KERNEL = -fno-schedule-insns -fno-schedule-insns2

Regards,

From: beagleboard@googlegroups.com
[mailto:beagleboard@googlegroups.com] On Behalf Of yajin
Sent: Friday, December 19, 2008 9:42 PM
To: Beagle Board
Cc: yajin@vm-kernel.org
Subject: [beagleboard] code reorder of x-loader using arm-none-linux-gnueabi-gcc

A C compiler is free to reorder operations as long as the semantics are the same. In this case, the compiler believes the semantics are the same since the memory operations are being done to different locations. As to why the compiler chose to do so in this case, it is likely because, without knowledge of the temporal side-effects, the compiler chose to hide instruction latencies. If there were no side-effects, the immediate move into r3 would take a few cycles due to the Cortex-A8 pipeline. The next instruction specified, the write to OMAP_HSMMC_BLK, has a data dependency on r3 and so must wait until the immediate move has finished before it can be executed. Rather than have a bubble in the pipeline, the compiler has placed what it believes to be an independent operation in that slot (the store of r1 to OMAP_HSMMC_ARG).

Now, how to fix it. You can attempt to play with the optimizer (-O0) or with compiler flags to turn off instruction scheduling, but those are hackish and are not guaranteed to resolve the issue on anything other than the exact toolchain you are using. A way to do this so that the compiler knows better is to do these few operations as volatile inline functions or asm blocks. Essentially, by breaking up each line into a separate inline function that is marked volatile, or into a single asm block that is marked volatile, the compiler is informed that the calls to the functions or the position of the asm block _cannot_ be reordered. I personally prefer using an inline volatile asm block since the entire operation can be done in a single block.
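For illustration, a minimal sketch of the single-volatile-asm-block idea for the four writes in mmc_send_cmd() (the function name here is made up; the base address, offsets and the 0x200/0xFFFFFFFF values are taken from the listing and disassembly above, and real code would pass in the existing x-loader flag defines):

static inline void hsmmc_issue_cmd(unsigned int cmd, unsigned int arg)
{
    unsigned int base = 0x4809C000;

    __asm__ volatile (
        "str %1, [%0, #0x104]\n\t"   /* OMAP_HSMMC_BLK  */
        "str %2, [%0, #0x130]\n\t"   /* OMAP_HSMMC_STAT */
        "str %3, [%0, #0x108]\n\t"   /* OMAP_HSMMC_ARG  */
        "str %4, [%0, #0x10C]\n\t"   /* OMAP_HSMMC_CMD  */
        : /* no outputs */
        : "r" (base), "r" (0x200), "r" (0xFFFFFFFF), "r" (arg), "r" (cmd)
        : "memory");   /* the memory clobber keeps other accesses from being moved across the block */
}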

This is why using structures or suchlike to access IO registers is
"teh l0ss". You are probably better off defining a set of IO accessor
functions (they can be inline) with the proper compiler code to ensure
that the compiler will issue the instructions in the right order (this
may require a write-memory-barrier to be issued).
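One possible shape for such an accessor (just a sketch; the name and the choice of barrier are illustrative, not x-loader's actual API):

static inline void io_write32(unsigned int val, unsigned long addr)
{
    *(volatile unsigned int *)addr = val;   /* volatile access, emitted where it is written */
    __asm__ volatile ("" : : : "memory");   /* compiler barrier; a real DMB could follow if the bus also reorders */
}

/* usage, with the OMAP_HSMMC_STAT address from the disassembly above: */
/* io_write32(0xFFFFFFFF, 0x4809C130); */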

Hi,

As far as I know it should be enough to simply declare the registers
OMAP_HSMMC_* as volatile in the file
include/asm/arch-omap3/mmc_host_def.h.

Currently the definitions are as follows, which clearly doesn't put any
constraints on the accesses:
#define OMAP_HSMMC_CMD (*(unsigned int *) 0x4809C10C)

Please change it to:
#define OMAP_HSMMC_CMD (*(volatile unsigned int *) 0x4809C10C)
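For the registers visible in the disassembly above, the patched definitions might look like this (addresses taken from the listing; the other OMAP_HSMMC_* registers in the file would of course need the same change):

#define OMAP_HSMMC_BLK (*(volatile unsigned int *) 0x4809C104)
#define OMAP_HSMMC_ARG (*(volatile unsigned int *) 0x4809C108)
#define OMAP_HSMMC_CMD (*(volatile unsigned int *) 0x4809C10C)
#define OMAP_HSMMC_STAT (*(volatile unsigned int *) 0x4809C130)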

In case this doesn't solve the problem, it would basically suggest that all
low-level register access needs to be done in ASM in order to really control
the order of execution on a Cortex-A8. I have never heard that this should
be required and I really can't/won't believe it, and I therefore bet that
just adding volatile will solve the problem. For sure it may/will fail
with no volatile included in the definition...

As far as I know the C standard, the compiler isn't allowed to change the
execution order of two statements that both involve a volatile access,
regardless of the optimization setting...

Best regards
  Søren

-----Original Message-----
On behalf of Ben Dooks

Hi,

I think Søren has made a good point on this.
If there are no strong reasons, just keep away from machine code.
To me the only reason could be that the compiler is not at all aware of the instruction scheduling and execution scheme of the target CPU (better not use such a compiler), or the fact that the target routine is executed very frequently, so it really makes sense to do some hand assembly for the sake of runtime optimization.
If the routine is only called once in a while (compared to its execution time) it will give no profit but instead compromise portability…
regards,
AdT

Søren Steen Christensen wrote:

Hi everyone,

I am the developer of qemu-omap3(http://code.google.com/p/qemu-omap3/)
and I am doing MMC card emulation right now.

I met a strange code reordering problem while compiling x-loader for
qemu-omap3.
The x-loader source code is downloaded from
http://beagleboard.googlecode.com/files/u-boot_beagle_revb.tar.gz.

Don't use that code, use http://www.sakoman.net/cgi-bin/gitweb.cgi?p=x-load-omap3.git;a=shortlog instead. Same goes for u-boot.

The arm-none-linux-gnueabi-gcc I use is gcc version 4.2.1
(CodeSourcery Sourcery G++ Lite 2007q3-51).

In the function mmc_send_cmd() (cpu/omap3/mmc.c), lines 145-150:

OMAP_HSMMC_BLK = BLEN_512BYTESLEN | NBLK_STPCNT;
OMAP_HSMMC_STAT = 0xFFFFFFFF;
OMAP_HSMMC_ARG = arg;
OMAP_HSMMC_CMD = cmd | CMD_TYPE_NORMAL | CICE_NOCHECK |
                 CCCE_NOCHECK | MSBS_SGLEBLK | ACEN_DISABLE | BCE_DISABLE |
                 DE_DISABLE;

That means the CPU writes to the MMC registers in the sequence
OMAP_HSMMC_BLK -> OMAP_HSMMC_STAT -> OMAP_HSMMC_ARG -> OMAP_HSMMC_CMD.

But the compiler reorders the register access sequence:
r2= 0x4809c000
40201d78: e3a03c02 mov r3, #512 ; 0x200
40201d7c: e5821108 str r1, [r2, #264] <-0x108 OMAP_HSMMC_ARG
40201d80: e5823104 str r3, [r2, #260] <-0x104 OMAP_HSMMC_BLK
40201d84: e582010c str r0, [r2, #268] <-0x10C OMAP_HSMMC_CMD
40201d88: e3e03000 mvn r3, #0 ; 0x0
40201d8c: e5823130 str r3, [r2, #304] <-0x130 OMAP_HSMMC_STAT

It will confuse the MMC emulation.

Can anyone tell me why the compiler reorders the register access sequence?
How can I stop it?

http://www.sakoman.net/cgi-bin/gitweb.cgi?p=x-load-omap3.git;a=commitdiff;h=f87ad1d2229115444ee0d754dfab7393f95942df

regards,

Koen

Please fix the problem the proper way (using volatile in the definitions) and not by using the patch/'hack' suggested by Koen…

– I really think it's ugly to introduce a new volatile variable in order to work around the definition error in the header file.

Furthermore the patch doesn't fix the problem – it just hides it. If, for example, a new revision of GCC would like to switch the order of the lines:

OMAP_HSMMC_STAT = OMAP_HSMMC_STAT;

OMAP_HSMMC_CON &= ~INIT_INITSTREAM;

at the bottom of the patch, it is still free to do so as long as these aren't declared volatile as well. This might therefore still cause other unforeseen issues…

(I haven't checked whether the particular order of these two lines is really important – the two lines are only meant as an example.)

Therefore please change the code to use 'volatile' for each and every register access – it's the only way to ensure the HW will work properly.

Furthermore you of course have to mark the register address region(s) as non-cacheable and non-bufferable in the MMU configuration, but that is a completely different story :-) I think you can safely assume that this is set right – in case it isn't, everything would work 'strangely' :-)

Best regards – Enjoy X-mas everybody

Søren

PS: I totally agree with Koen, that you should use the GIT repository instead of the old tar.gz file.

The only problem is that the definition still doesn't contain 'volatile' in the GIT tree either :-)

Hi all,

Thanks to all of you for your kind, detailed explanations and suggestions.

I wonder how the x-loader code on the Beagle Board Google Code site can execute correctly on the real hardware!

Writing 0xFFFFFFFF to OMAP_HSMMC_STAT after the OMAP_HSMMC_CMD register clears all the status bits in OMAP_HSMMC_STAT, including the command complete bit (bit 0). x-loader should then loop forever checking the command complete bit (bit 0) of OMAP_HSMMC_STAT to see whether the last command has completed…

PS: I removed -Os from OPTFLAGS and gcc now generates stupid but correct code.


Without having investigated the issue in detail, I would expect that the reason it works is that the write to OMAP_HSMMC_STAT happens before the command complete bit is set by the HW. In order for the HW to set the command complete bit, it needs several clock cycles on the MMC clock (maximum 52 MHz AFAIR), which is much slower than the ARM core running at 500-600 MHz.

Although it's clearly a sequencing mistake, it seems to be saved by the delays introduced by the real HW…

I hope the above explanation was clear/understandable?

Søren

I'd need to consult the C standard, but I'm fairly certain that marking a pointer as volatile only means that the value that pointer points to cannot be cached in a register. It shouldn't change the compiler's ability to reorder things. That is what write barriers and synchronization instructions are for.

Hi Rick,

A volatile pointer by definition is a memory address containing a pointer
that may be changed from outside the current program flow (e.g. another
task or an ISR routine);
this is not the same as a pointer to a volatile memory location (or to any
other type).

But I also don't know what you get when you succeed in defining a
pointer to a volatile function.
Besides, I've never seen or heard of it before... (probably due to my
lack of experience with gcc :-))

Anyway, the C compiler is not allowed to change the order of C instructions
(";"-separated...).
This has nothing to do with volatiles being used inside each
instruction.
The order inside a single instruction is of course up to the compiler.

For easier understanding: the volatile keyword shall cause the compiler
to read the contents of the variable's memory location directly before
using it within a C instruction (not the same as an operation!), and to
write the contents immediately back to the memory location when finishing
the instruction.
Thus it cannot be kept in a working register; in turn, declaring
something "volatile register" must give a compilation error.

When it comes to CPUs with caching and pipelines, or with different execution
units that can act in parallel, it is up to the compiler to ensure
that the result has been established before reading it in the next instruction.
In this case think of volatile hardware registers that even change
without program flow... It must be ensured that these are not being cached.

However, there are processors that cannot directly modify memory contents...
For each C instruction these will copy the contents to a working register
(e.g. an accumulator), modify it according to the C instruction and write it
back before the next instruction is executed.
Assuming: volatile int a;
a++; a++; // this will (read value - increase value - write value) from/to memory twice
whereas:
a = (a * 5) + (a * 3) + a; // this shall (but may not) read the value once - calculate - write the value

The latter largely depends on the implementation of the compiler; some
compiler implementers consider it safer to read a from memory each time,
even within one C instruction.
Keep this in mind when porting between platforms.

So if you intentionally mean:
get value - calculate result - write result back, without getting
messed up by an interrupt service routine, then _do_ code it this way...
*prevent ISR / dispatching*
get value - calculate - write value;
*enable ISR / dispatching*

Even the instruction
a++;
although looking quite harmless, may not be safe against interrupt
service routines; if 'a' is not atomic on the specific platform you may
end up in big trouble.
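A small sketch of that pattern in C (the IRQ enable/disable helpers are illustrative and assume an ARMv6+/Cortex-A8 style core running in a privileged mode; a real system would use whatever primitives the platform provides):

volatile int a;

static inline void irq_disable(void) { __asm__ volatile ("cpsid i" ::: "memory"); }
static inline void irq_enable(void)  { __asm__ volatile ("cpsie i" ::: "memory"); }

void safe_increment(void)
{
    irq_disable();   /* prevent ISR / dispatching */
    a = a + 1;       /* read - modify - write without being interrupted */
    irq_enable();    /* enable ISR / dispatching */
}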

Rick Altherr wrote:

Hi Rick,

A volatile pointer by definition is a memory address containing a pointer
that may be changed from outside the current program flow (e.g. another
task or an ISR routine);
this is not the same as a pointer to a volatile memory location (or to any
other type).

Right, all of the register macros should be defined as volatile pointers.

But I also don't know what you get when you succeed in defining a
pointer to a volatile function.
Besides, I've never seen or heard of it before... (probably due to my
lack of experience with gcc :-))

No idea, but a pointer to a volatile function isn't necessary in this case.

Anyway, the C compiler is not allowed to change the order of C instructions
(";"-separated...).
This has nothing to do with volatiles being used inside each
instruction.
The order inside a single instruction is of course up to the compiler.

The C compiler isn't allowed to change the semantic order of C "instructions." That does allow the C compiler to intermix two C "instructions" as long as they are completed in the semantic order. That is exactly what the C compiler is doing in this case. For ARM, a *(unsigned *)REGISTER = 0x1 C "instruction" assembles down to 2 ASM instructions: a mov immediate and a store. The subsequent *(unsigned *)REGISTER2 = arg assembles as a single instruction since arg is already in a register. Since the two C "instructions" share no data, they have no semantic ordering between the two operations. Thus, the compiler intermixes them to avoid a pipeline bubble. Now, if the registers were marked with volatile, aliasing rules may come into play and the compiler may not be allowed to assume that the two pointers are not equivalent and thus a data dependency enforces a semantic order.

When it comes to CPUs with caching and pipelines, or with different execution
units that can act in parallel, it is up to the compiler to ensure
that the result has been established before reading it in the next instruction.
In this case think of volatile hardware registers that even change
without program flow... It must be ensured that these are not being cached.

Right, but in this case there is no data dependency so there is no semantic ordering between the two register operations at the C level. C is unaware of the hardware side effects of the operations.

However, there are processors that cannot directly modify memory contents...
For each C instruction these will copy the contents to a working register
(e.g. an accumulator), modify it according to the C instruction and write it
back before the next instruction is executed.
Assuming: volatile int a;
a++; a++; // this will (read value - increase value - write value) from/to memory twice
whereas:
a = (a * 5) + (a * 3) + a; // this shall (but may not) read the value once - calculate - write the value

The latter largely depends on the implementation of the compiler; some
compiler implementers consider it safer to read a from memory each time,
even within one C instruction.
Keep this in mind when porting between platforms.

So if you intentionally mean:
get value - calculate result - write result back, without getting
messed up by an interrupt service routine, then _do_ code it this way...
*prevent ISR / dispatching*
get value - calculate - write value;
*enable ISR / dispatching*

Even the instruction
a++;
although looking quite harmless, may not be safe against interrupt
service routines; if 'a' is not atomic on the specific platform you may
end up in big trouble.

All good points when dealing with volatiles. You still need to watch when a specific set of operations must happen in a specific order. You either need to use a volatile asm block or a synchronizing instruction between the C operations.
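As a sketch of the latter, a plain compiler barrier placed between the two C operations whose order matters (the macro and function names here are illustrative):

#define barrier() __asm__ volatile ("" : : : "memory")

static void write_in_order(volatile unsigned int *first, unsigned int a,
                           volatile unsigned int *second, unsigned int b)
{
    *first = a;
    barrier();       /* the compiler may not move memory accesses across this point */
    *second = b;
}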

Rick

Hi Rick,

I almost agree, but to me the compiler for a specific platform _must
know and deal with the special considerations_ imposed by the target CPU.
Example: take a look at Motorola DSP CPUs (I remember the 56k CPUs) and a 'good'
compiler for them.
If you take a look at the resulting ASM code you will find up to 3
instructions per row!
The compiler I used then would even take care of NOP'ing the 2nd/3rd
instruction or the following row(s) if necessary (it was the DSP 56k
compiler on the NeXT Station..).
The same (not three instructions per ASM row, of course) goes for the
Pentium, as for almost any last-generation CPU that has arithmetic
coprocessors, pipelines and caches.

However, with a new compiler, a newly ported one (which is prone to errors),
or a compiler that is as universal as gcc, you will find problems in
keeping up with these 'special' considerations for specific targets -
then there is no way but using ASM routines...

Of course you are right when it comes down to special timing and/or
sequence considerations for the I/O devices of controllers, such as special
hardware registers interfering with pipelining and caching.
E.g. watchdog registers...
The compiler cannot know about these, so if it generates code that
does not meet the requirements of these registers, you will either have
to turn off optimization or provide an ASM function to do the job.

As for the example "volatile int a; a = (a * 5) + (a * 3) + a;":
I forgot to provide a reason why the compiler should read a only once
for the above operation...
The reason is that we could, for example, be reading/writing a status
register - if a is read multiple times and we are checking flags then we
will end up in trouble, as it changes after the first read...
Best practice is again to code what we want, so we take a copy before
processing it...
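A sketch of that "take a copy first" advice, with a purely illustrative register address and flag bits:

#define STATUS_REG (*(volatile unsigned int *) 0x4809C130)  /* address reused from the HSMMC STAT example; flag bits are made up */
#define FLAG_CC    (1u << 0)
#define FLAG_ERR   (1u << 15)

void check_status(void)
{
    unsigned int snapshot = STATUS_REG;   /* read the volatile register exactly once */

    if (snapshot & FLAG_CC) {
        /* handle command complete */
    }
    if (snapshot & FLAG_ERR) {
        /* handle errors; both tests see the same snapshot */
    }
}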

To my knowledge the 'C' definition is not very clear on how to deal with
volatiles inside a C instruction. Because of this, there are different
interpretations, resulting in non-portable expressions if the target
and/or source variable is a hardware register...

Rick, thanks for the discussion; there are not too many people to share
thoughts on this with, and I still feel I don't have everything in the right
order on these issues :-).

regards,
  AdT

Rick Altherr wrote:

Hi Toom,

I totally agree that the construction "volatile int a; a = (a * 5) + (a *
3) + a;" is one of the hard ones, which to my knowledge also isn't 100%
clear how to handle from the C standard.

The original problem, however, was of a slightly different flavor. Namely:
{
  volatile int a;
  volatile int b;

  a=1;
  b=2;
}

As far as I know the C standard, the compiler isn't allowed to change the
semantic assignment order of a and b, since they are both declared volatile.
I hope we all agree on this?

The next question is whether the compiler should take care of special internal
CPU details like pipelines/branch prediction/etc., which under certain
circumstances could/will cause the code to misbehave. In my opinion it
should, but again I think this is something which is very unclear from the
specification, and which might differ from compiler to compiler...

Best regards
  Søren

-----Original Message-----
On behalf of Adrianes den Toom

Rick Altherr wrote:

Hi Rick,

A volatile pointer by definition is a memory address containing a pointer
that may be changed from outside the current program flow (e.g. another
task or an ISR routine);
this is not the same as a pointer to a volatile memory location (or to any
other type).

Right, all of the register macros should be defined as volatile pointers.

uhm, you probably mean pointers to volatile memory locations…
because as I wrote: this is not the same.

think of:

volatile int a;         // a is volatile
volatile int *p_a;      // p_a (the pointer) is _not volatile_, but the memory location it points to is
typedef int * tp_int;   // defining a pointer-to-int type
volatile tp_int vp_int; // a volatile pointer to a non-volatile int

so the macros should be defined as
*(volatile <type> *) <address>   // giving you a non-volatile pointer to a volatile memory location


this is almost the same as with const (also a modifier)

const int a;        // a const declaration
const int * pc_int; // a pointer to a const int - _not_ a const pointer to an int...

regards,
AdT

Hi Søren,

the compiler has to take care of pipelining and all special considerations as far as they are associated with the CPU core and transparent memory (RAM).
But we must not mix up the terms CPU and controller…
The OMAP is not only a CPU, because it has special hardware registers for I/O (like UART, watchdog etc.).
It is not the responsibility of the compiler to take pipelining and caching for these hardware-related I/O registers into account when it comes down to setting the bits of a register, or of a set of associated registers, in a special sequence (especially when optimization is on…).

For example, if you take a look at a CPU with a coprocessor that runs floating point, given the statements:

float f_a = 4.711;
float f_b, f_c;
f_a /= 1.99999; // expected to take some machine cycles
f_b = f_a;
f_c = 0.815;

the compiler will have to ensure the result of "f_a /= 1.99999;" is available before assigning it to f_b. So it may have to add NOPs in order to fulfill this requirement (optimization off). But with optimization on it might schedule the initialization of f_c (as long as it does not compromise f_a or f_b) before copying the result to f_b, instead of filling in NOPs.
As you can see, with optimization on there is a change in the sequential order of the last two statements, and the C compiler is not only allowed but expected to do so for the sake of runtime improvement.
If the above code is inside a function and the compiler is really good at optimization, you may even find that on entry to the function, when the variables are allocated on the stack (or in working registers of the CPU), f_c gets initialized directly after f_a… (with optimization on).
However, declaring f_c volatile should prevent the compiler from doing so, even if optimization is on. But I doubt that all compilers will respect this requirement.

I doubt there is a clear statement in the C definition on how to handle the optimization of instruction sequences if all variables (not being connected) are declared volatile. The compiler cannot know whether the change of sequence will compromise the operation on HW I/O control registers intended by the user, because this is way out of its scope.
In my opinion a good optimization must lead to f_c getting set before f_b if the calculation of the division takes too long - so I'd expect this to happen.

As for my example with the DSP 56k compiler:
With optimization on there were hardly any lines with NOP instructions to be found. In the worst case the optimizer would even move two follow-up statements in front of the first. With optimization on, there was almost no chance to understand what's going on. Analysing the resulting ASM code was always good for a severe headache after only a few minutes :-).

Søren Steen Christensen wrote:

Hi Toom,

Thanks for a fruitful discussion – as you wrote earlier, not too many people know much about these issues in real detail :-)

I doubt there is a clear statement in the C definition on how to handle the optimization of instruction sequences if all variables (not being connected) are declared volatile. The compiler cannot know whether the change of sequence will compromise the operation on HW I/O control registers intended by the user, because this is way out of its scope.

This is indeed my main argument why the compiler needs to respect the order of the statements given by the user. I'm 99.999% confident that the compiler isn't allowed to change the order of (non-connected) volatile reads/writes, since this would make reliable HW peripheral accesses impossible from the C language…

With respect to your DSP56k headache I agree as well, although I have to admit that I haven't touched such a DSP for ages; I still clearly remember the completely incomprehensible ASM you get in case you switch on all optimizations :-)

Best regards

Søren

Hi Søren,

I'm 99.999% confident that the compiler isn't allowed to change the order of (non-connected) volatile reads/writes

to be honest, I doubt that this is covered by the C language definition. I have seen almost every flavour of interpretation in the resulting compiled code. Also, keeping the compiler from optimizing is not really an intention of 'C' :-).

Better use a pragma or a module setting to keep the code from being optimized, or use ASM functions to wrap the desired functionality when dealing with external I/O and hardware registers.

— running completely off-topic now —

This is the reason why there are only a few people who write drivers and a lot of application programmers…
The complexity of understanding increases from applications - multithreading - multiprocessing & multithreading - driver development - and last but not least compilers…
The problem with driver development is the HW-related obstacles, as well as the fact that you will never be done with learning - suddenly the behaviour of a 'standard' SPI interface changes completely due to some really beneficial (but probably barely or badly documented) extensions of the hardware block, and you will again feel like it's the first time you ever do this job ;-).
The last time I had this was with the LPC2114 (Philips ARM), which took me about 4 days for a merely stupid SPI driver. I got shattered by some 'special' considerations for HW interfacing due to the multi-master implementation of the SPI block…

Of course the application guys have to fight with increased processor performance and the resulting demands of new GUI concepts, but this is not likely to render your previously gained knowledge unusable, as you quite often experience with HW drivers.

Compiler design to me is one of the most challenging issues, especially if we are talking about a compiler platform like gcc.
I'm happy I don't have to rack my brain over this…
Once I created an assembler for the 65xx CPU, written in assembler, bootstrapped by assembling it with a Basic program…
Assemblers were quite easy at that time :-), but it took me several weeks to complete. In the end it assembled itself in much less than a minute, compared to more than an hour with the Basic program.

I haven’t touched such a DSP for ages
hi grandpa :-)

best regards,
AdT

Søren Steen Christensen wrote:

yajin,

yajin wrote:

Hi all,

Thanks to all of you for your kind, detailed explanations and suggestions.

I wonder how the x-loader code on the Beagle Board Google Code site can execute
correctly on the real hardware!

Writing 0xFFFFFFFF to OMAP_HSMMC_STAT after the OMAP_HSMMC_CMD register
clears all the status bits in OMAP_HSMMC_STAT, including the command
complete bit (bit 0). x-loader should then loop forever checking the
command complete bit (bit 0) of OMAP_HSMMC_STAT to see whether the last
command has completed...

PS: I removed -Os from OPTFLAGS and gcc now generates stupid but correct
code.

Would you like to test the recent x-loader and U-Boot from Steve's git?

http://elinux.org/BeagleBoard#X-Loader

http://elinux.org/BeagleBoard#U-Boot

Thanks

Dirk