Custom board issues

Hey guys,
I've got a custom OMAP3503 board with 256 MBytes of LPDDR memory
(single die, x32) that I've got working with x-loader and u-boot.
However, when I load the Linux kernel, it randomly produces the
following errors:

1) It stops midway through the "Uncompressing Linux ...." stage, at a
different, random point on each attempt
2) If it ever does go through, it sometimes displays "invalid
compressed format (err = 2)" or "crc error", and says "System halted"
3) If I disable caching in the decompressor (in
arch/arm/boot/compressed/head.S, line 233 "bl cache_on"), it at least
completes the Uncompressing stage fine and proceeds to print
"....... done, booting the kernel.", at which point it does not boot
(most likely since I disabled the caching)
4) If I reset the board (not power-cycle) after the board hang above,
and do a crc32 check on the kernel image in DDR memory (this is
possible since DDR memory contents do not get changed after a
soft-reset), it matches the CRC32 that I have calculated manually on
the kernel image. This shows that the compressed kernel image (from
which the kernel is uncompressed) is still intact.

I've run my own DDR tester using the ROM bootloader, and done
address/bus stuck-at testing as well as complete pseudo-random data
testing on the full 256 MByte memory. They all check out fine. The only
difference I have from the BeagleBoard is that I'm using the CUS 0.65mm
packaged version of the OMAP3503 as well as a non-POP LPDDR memory with
a single die (not dual die as on the BeagleBoard) - hence, I had to
modify the DDR MCFG register for 14-bit RAS width and 256 MByte CS
space to get this to work properly.

If you guys have any ideas, please do help as I'm at my wits' end!

Thank you kindly,

Jerry

Jerry Johns wrote:

3) If I disable caching in the decompressor (in
arch/arm/boot/compressed/head.S, line 233 "bl cache_on"), it at least
completes the Uncompressing stage fine and proceeds to print
"....... done, booting the kernel.", at which point it does not boot
(most likely since I disabled the caching)

no, kernel should boot uncached, it will just be slow(er).

tried with a lower clock speed? not as a final goal to run at low speed
but for a start....

no, kernel should boot uncached, it will just be slow(er).

The kernel will definitely boot uncached - I just meant that the
uncompressor's head.S enables caching, uncompresses, then disables
caching before booting the uncompressed kernel. If I disable this
cache enable, then it seems to get to the next logical step.

tried with a lower clock speed? not as a final goal to run at low speed
but for a start....

Lower DDR speed? Or ARM speed? I've lowered DDR to 132 MHz, still the
same issues. Should I try lowering the ARM clock? Don't I need to scale
the core voltage down as well?

I've tried lowering the DDR clock to 132 MHz, to no avail.
I've also verified the complete MMU page tables before and after the
errors (using CRC), as well as vast swaths of memory.
I've also checked the DDR timings and the DDR registers.

All of this still produces random failures, including "crc failure",
"incomplete literal tree", "invalid compressed format (err=1)", and
random hangs

Any help would be appreciated, as I'm quite desperate at the moment!

Jerry

Schematic?

Gerald

I have the same issues with an ARM9 processor and am still trying to find the solution. I'm sure that the CPU and the RAM are fine, so the only thing left is the PCB. How many boards do you have assembled? What if you try the same on another board?

2010/4/9 Gerald Coley <gerald@beagleboard.org>

  I don't think this helps, but I've had various boot problems related
to the cache - the one I tracked down a fair way was rather different
in that when you turned the MMU on it came up with a memory mapping
subtly different from what was in the processor's view of the
memory containing the page tables - Linux then faulted in odd ways
as parts of the kernel got aliased by other parts of the kernel.

  I infer that there is some way in which you can boot the CPU on
OMAP that makes things go wrong, but the ARM documentation here is
vague and I had no time so I gave up, and in fact my problem eventually
went away for no very good reason.

  My next move in your position would probably be to hack the uncompress
code to checksum each block as processed and see if where it goes
wrong is suspicious (i.e. on a page boundary). It's possible your
bootloader/MLO has done something silly like enabling a 1:1 memory
mapping that isn't, or that your uncompress-to address is outside
memory (or at the top of memory but just tragically not quite low
enough).

  I'm guessing you know this, but *((unsigned long)TXREG) = '<char'>'
is your friend (where TXREG is the location of the relevant UART's
TX register - I can dig out my old debug macros if you don't have
some of your own)

R
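
A rough sketch of the "checksum each block" idea above, in case it helps. It computes one CRC-32 per fixed-size block of a buffer, so the list from a failing run can be compared with a list from a known-good board, and it reports the first block that differs. The function names and the block size are made up for illustration; nothing here comes from the kernel decompressor itself, and it would have to be hooked in by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320): slow but table-free,
 * which suits early boot code. */
static uint32_t crc32_buf(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Fill out[] with one CRC per block; returns the number of blocks written.
 * The last block may be shorter than 'block' bytes. */
static size_t checksum_blocks(const uint8_t *buf, size_t len, size_t block,
                              uint32_t *out, size_t max_out)
{
    size_t n = 0;
    assert(block > 0);
    for (size_t off = 0; off < len && n < max_out; off += block, n++) {
        size_t chunk = (len - off < block) ? len - off : block;
        out[n] = crc32_buf(buf + off, chunk);
    }
    return n;
}

/* Index of the first block whose CRC differs, or -1 if all n match. */
static long first_mismatch(const uint32_t *good, const uint32_t *bad, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (good[i] != bad[i])
            return (long)i;
    return -1;
}
```

If the first bad block moves around between runs, that points away from a fixed software bug and towards something marginal in hardware.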

Richard Watts wrote:
[snip]

I'm guessing you know this, but *((unsigned long)TXREG) = '<char'>'

  Err, I mean *((unsigned long *)TXREG) = 'char' , obviously..

R
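
For anyone without their own debug macros, a minimal sketch of the corrected one-liner above wrapped in two helpers. The register pointer is passed in as a parameter because the UART address is board-specific and is not given anywhere in this thread. Note the assumptions: it does not wait for the transmitter to be ready (a real UART normally needs a status poll before each write), it uses a 32-bit store as in the original one-liner, and the assert is only there for host-side testing.

```c
#include <assert.h>
#include <stdint.h>

/* Write one character to a memory-mapped UART transmit register.
 * txreg is a parameter so the helper can point at real hardware or at
 * an ordinary variable when testing on a host. No TX-ready polling. */
static inline void dbg_putc(volatile uint32_t *txreg, char c)
{
    assert(txreg != 0);
    *txreg = (uint32_t)(unsigned char)c;
}

/* Print a 32-bit value as 8 hex digits, e.g. an address or a checksum. */
static inline void dbg_puthex(volatile uint32_t *txreg, uint32_t v)
{
    static const char digits[] = "0123456789abcdef";
    for (int shift = 28; shift >= 0; shift -= 4)
        dbg_putc(txreg, digits[(v >> shift) & 0xFu]);
}
```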

Gerald:
Unfortunately, I can't release the schematic, but I can say that the
design is quite similar to the BeagleBoard, with the following
exceptions:
- OMAP3503 instead of OMAP3530, with CUS 0.65mm packaging
- TPS65930 instead of TWL4030 (also known as TPS65950)
- single-die, x32 256 MByte DDR memory from Micron
- EMAC + NAND

Maxim:
I've tested this on 4 different PCBs, all with the same results

Richard:
Do you happen to know how you fixed it? If you have any information
regarding the specifics of how the MMU was mis-configured, that would
be helpful. Currently, neither u-boot nor the MLO sets up the MMU - in
fact, one of the cleanup operations before the Linux image is booted
is to clear the cache - and even in the uncompressor start-up code, I
can see it clearing the cache, inserting memory barriers and then
setting up page tables, etc.

The same kernel is working absolutely fine on the Overo Gumstix and
BeagleBoard

If you are using external DDR, it is nothing like the Beagle. This is a BIG difference between the boards.

Gerald

  Frankly, if you can run an MLO-a-like which fires up the serial port
and tests memory - and I am guessing this is how you check your
memory - I kinda doubt that memory faults are your problem, though
it's worth redoing every so often: DDR can hold charge (with a
few bit errors) for about a week at room temperature so it's
easy to screw your write timings (specifically precharge-write
delays) and not notice.

  Messing up your page selection address bits is a possibility,
as would be chip addressing issues if you've address-muxed two
chips (does anyone even do this any more?). IME, DRAM can be
quite tricksy with regard to magically working for a while
when it really shouldn't.

Richard:
Do you happen to know how you fixed it? If you have any information
regarding the specifics of how the MMU was mis-configured, that would
be helpful. Currently, neither u-boot nor the MLO set up the MMU - in
fact, one of the cleanup operations before the Linux image is booted
is to clear cache - even in the uncompressor start-up code, i can see
it clearing cache, inserting memory barriers and then setting up page
tables, etc.

  I'm afraid I didn't. I put quite a bit of debug into head-common.S
(my bug having turned up after uncompress). What seemed to be happening
was that the page tables got set up to map 0x80000000 in 1Mbyte chunks
(I think) and though the page table memory looked correct when I
accessed it using *((volatile unsigned long *)ADDR) , page 1 read and
wrote the same physical memory as page 0.

  The first time the kernel tried to call a function > 1Mbyte, it
would refer to memory below 1Mbyte in the image and chaos commenced.

  As far as I can tell, Linux was doing the right thing. Some external
environmental factor was messing things up, but I have no idea what
it was or why I never suffered from it again. I have seen similar issues
occur on i.MX27-based boards, but whether that was the same thing,
a related bug, or I'm just accident-prone, I don't know.

  Eventually, in exasperation, I pulled the latest OMAP kernel from
git and it just worked. I sighed deeply and got on with my project..

  As I say, I reckon your way forward now is probably painstaking
serial debugging through the decompressor to find out where it's going
wrong.

R
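
The aliasing described above (page 1 reading and writing the same physical memory as page 0) can be probed with a very small test. This is only a sketch under assumptions: the function name and marker pattern are invented, it overwrites the first word of every chunk, and with the data cache enabled the result may not reflect what is really in DRAM. On the target, stride_words would be 1 MByte / 4 for 1 MByte chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a distinct marker to the first word of each chunk, then read
 * them all back. Returns the index of the first chunk whose marker was
 * clobbered (a sign that two chunks share physical memory), or -1 if
 * every marker survived. Destructive: one word per chunk is overwritten. */
static long find_alias(volatile uint32_t *base, size_t stride_words,
                       size_t chunks)
{
    assert(base != 0);
    for (size_t i = 0; i < chunks; i++)
        base[i * stride_words] = 0xA5000000u | (uint32_t)i;
    for (size_t i = 0; i < chunks; i++)
        if (base[i * stride_words] != (0xA5000000u | (uint32_t)i))
            return (long)i;
    return -1;
}
```

Running it once through the plain physical addresses and once after the MMU is on would show whether the mapping, rather than the DRAM, is the thing that aliases.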

Hey guys,
I managed to solve the problem! Thank you all for helping me narrow
the scope of the issue - it turns out the problem was the DDR
controller (SDRC) and its associated timings. I had originally
validated the SDRC timing parameters (SDRC_ACTIM_CTRLA,
SDRC_ACTIM_CTRLB and SDRC_RFR_CTRL registers) by re-computing the
values based on the Micron datasheet - however, this did not solve the
problem. Continued testing of the software kept pointing to the
conclusion that software wasn't at fault and that it was a hardware
issue. Since the only real difference in hardware was the change in
DDR, I kept working away at that problem. Since the correct timings
did not help, I assumed the worst and started to inflate the timings.
It turns out the parameter TWTR (in ACTIM_CTRLB) has to be increased
from its current (and nominally correct) value of 1 clock cycle to 2!
That single change fixed all the problems! The datasheet from Micron
says that a minimum of 1 cycle is all that is needed, but either the
datasheet is incorrect or the part we have is slower than expected.

Either way, problem solved and I'm now a happy camper :)

Jerry
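
For reference, a small sketch of the two steps described above: converting a datasheet timing to whole clock cycles, and patching one field of a timing register value. The helpers are generic and hypothetical. The TWTR field position used here (2 bits at bit 16 of SDRC_ACTIM_CTRLB) is an assumption that is not confirmed anywhere in this thread, so check it against the OMAP35x TRM before using it.

```c
#include <assert.h>
#include <stdint.h>

/* Round a timing given in picoseconds up to whole clock cycles:
 * cycles = ceil(t_ps * clk_khz / 1e9). */
static uint32_t ps_to_cycles(uint32_t t_ps, uint32_t clk_khz)
{
    uint64_t num = (uint64_t)t_ps * clk_khz;
    return (uint32_t)((num + 999999999ull) / 1000000000ull);
}

/* Return 'reg' with the bit field [shift + width - 1 : shift] set to 'val'. */
static uint32_t set_field(uint32_t reg, unsigned shift, unsigned width,
                          uint32_t val)
{
    assert(width > 0 && width < 32 && shift + width <= 32);
    uint32_t max = (1u << width) - 1u;
    assert(val <= max);
    return (reg & ~(max << shift)) | (val << shift);
}

/* ASSUMPTION: TWTR field location in SDRC_ACTIM_CTRLB; verify in the TRM. */
#define TWTR_SHIFT 16u
#define TWTR_WIDTH 2u

/* Bump TWTR by 'margin' cycles on top of the datasheet minimum. */
static uint32_t actim_ctrlb_with_twtr(uint32_t ctrlb, uint32_t twtr_min,
                                      uint32_t margin)
{
    return set_field(ctrlb, TWTR_SHIFT, TWTR_WIDTH, twtr_min + margin);
}
```

With these, the fix in this thread amounts to actim_ctrlb_with_twtr(old_value, 1, 1), i.e. the datasheet minimum of 1 cycle plus 1 cycle of board margin.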

Glad to hear it! BTW, the datasheet is correct. However, you changed the parameters of the design by putting the part on a PCB. In essence, you compensated for that added parameter by tweaking the values. That is why all of those settings are there: to let you tweak the values for your specific design. That is the nice thing about POP: these extra variables are removed from the design.

Gerald

Hello All,

I am very new to the BeagleBoard platform.
I am using the demo Linux kernel image for my application on the BeagleBoard.
Now I am trying to lower the DDR clock setting to 133 MHz.
Can anybody help me do this?

Thanks in Advance.
Regards,
Surya.