Random SIGILLs in arbitrary programs

Hello,

  I'm afraid I have a mysterious problem, and I was wondering if anyone
could shed any light on it.

  I am trying to run quite a bit of code on a BeagleBoard-xM, Rev C, and I
get random segfaults and/or SIGILLs.

  These turn up, occasionally, in every program I attempt to run - BusyBox,
Python, my C++ code - everywhere. One just happened in init.

  They seem mostly to be associated with ELF thunks; the faulting code
typically looks something like:

  ldr r3, [r12, #<something>]
  mov r0, r3
  bx r0

  If I inspect these crashes in gdb:

  - r12 looks plausible.
  - [r12, #<something>] is a legal address.
  - r3 contains a legal address (the value loaded from there looks right).
  - r0 is total garbage.
  - I have attempted to jump to the garbage, resulting in the fault.
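  For reference, the thunk is in effect an indirect tail call through a table entry. The rough C analogue below is only a sketch: `table`, `first_target` and `second_target` are hypothetical stand-ins for whatever r12 points at, and a real compiler would not emit exactly this sequence (it is free to branch through r3 directly). What it illustrates is that the `mov` is the only instruction that writes r0, so a correct r3 alongside a garbage r0 suggests r0 was changed by something outside this instruction stream, or that the `mov` itself misbehaved.

```c
#include <stddef.h>

/* Hypothetical branch targets, standing in for the real functions. */
typedef int (*target_fn)(void);

static int first_target(void)  { return 1; }
static int second_target(void) { return 2; }

/* Hypothetical table playing the role of the memory at [r12, #<something>]. */
static const target_fn table[] = { first_target, second_target };

int call_via_table(size_t index)
{
    target_fn r3 = table[index]; /* ldr r3, [r12, #<something>] */
    target_fn r0 = r3;           /* mov r0, r3 : the only write to r0 */
    return r0();                 /* bx r0 : jump to whatever r0 holds */
}
```

On healthy hardware `call_via_table(0)` returns 1 and `call_via_table(1)` returns 2; there is no path by which the branch target can differ from the loaded value.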

  It looks as though something, somewhere (the kernel's context-switch
code?) is corrupting my registers. The alternative is a RAM problem
causing occasional reads to go bad, but I don't really buy that theory:
r0 is just a copy of r3, so a bad read should have left r3 bad too.
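  The RAM theory can also be tested directly rather than argued about, for example with a user-space pattern tester such as memtester. A minimal hand-rolled sketch follows, assuming nothing beyond standard C: `pattern_test` and `expected_word` are hypothetical names, and the multiplicative constant is just a convenient way to give every word a different value. The buffer needs to be much larger than the L2 cache, otherwise the read-back pass mostly hits in cache rather than DRAM, and it only exercises memory the kernel hands to this one process.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Value expected in word i for a given seed; the odd multiplier spreads
 * consecutive indices across all 32 bits. */
static uint32_t expected_word(size_t i, uint32_t seed)
{
    return ((uint32_t)i * 2654435761u) ^ seed;
}

/* Hypothetical helper: writes an index-derived pattern into a fresh buffer
 * of `words` 32-bit words, reads it back, and returns the number of
 * mismatches, or -1 if the allocation fails. */
long pattern_test(size_t words, uint32_t seed)
{
    /* volatile stops the compiler from eliding the read-back pass. */
    volatile uint32_t *buf = malloc(words * sizeof *buf);
    if (buf == NULL)
        return -1;

    /* Write pass. */
    for (size_t i = 0; i < words; i++)
        buf[i] = expected_word(i, seed);

    /* Read pass: count words that no longer hold their expected value. */
    long errors = 0;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != expected_word(i, seed))
            errors++;

    free((void *)buf);
    return errors;
}
```

Run it repeatedly with different seeds; a non-zero count would make the RAM theory much more credible, while a long clean run would point back towards the kernel.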

  Bolstering my "the kernel is screwing up" theory, strace segfaults
moderately reliably.
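  The kernel theory can be probed in a similar spirit: run a deterministic computation that keeps several values live, yield to the scheduler frequently, and compare the result with an identical run that never yields. The sketch below assumes POSIX `sched_yield()`; `mix` and `context_switch_check` are hypothetical names, and whether the values sit in registers or on the stack across each yield is up to the compiler, so this is a coarse test rather than a targeted one. Running several copies at once, under load, should make preemption much more likely.

```c
#include <sched.h>
#include <stdint.h>

/* Deterministic mixing loop; a, b, c and d are live across each
 * sched_yield() call (in registers or on the stack, as the compiler
 * chooses), so a bad restore could change the final result. */
static uint32_t mix(uint32_t iterations, uint32_t yield_every)
{
    uint32_t a = 1, b = 2, c = 3, d = 4;
    for (uint32_t i = 0; i < iterations; i++) {
        a += b ^ i;
        b = ((b << 7) | (b >> 25)) + c;   /* rotate left by 7, then add */
        c ^= d + a;
        d = d * 2654435761u + i;
        if (yield_every != 0 && i % yield_every == 0)
            sched_yield();                /* invite a context switch */
    }
    return a ^ b ^ c ^ d;
}

/* Hypothetical check: returns 1 if a run that yields every `yield_every`
 * iterations matches a reference run that never yields, 0 otherwise. */
int context_switch_check(uint32_t iterations, uint32_t yield_every)
{
    uint32_t reference = mix(iterations, 0);
    uint32_t stressed = mix(iterations, yield_every);
    return reference == stressed;
}
```

On a healthy system the two runs always agree; an occasional 0 from a long-running loop of checks would support the register-corruption theory.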

  I am not out of memory.

  I am using Linux 3.12-rc1 with Tony Lindgren's patches, an up-to-date
U-Boot, X-Loader 1.51, gcc 4.8.1 and binutils 2.22 (from yesterday's
crosstool). I am ARM-only - no Thumb code at all.

  I'm mounting my root filesystem over NFS.

  My kernel arguments are:

console=ttyO2,115200n8 mpurate=1000 mem=256M rootwait rw root=/dev/nfs ip=10.30.1.8:10.30.1.1:10.30.1.1:255.255.255.0:eth0 nfsroot=10.30.1.1:/export/elb2 earlyprintk=1 earlycon=ttyO2,115200n81 loglevel=8 vram=12M omapfb.vram=0:4M omapfb.vram=1:4M omapfb.vram=2:4M omapfb.mode=dvi:640x480MR-16@60 omapdss.def_disp=dvi nohlt

  .. and you might say "stop using such an adventurous setup". So I tried:

  - The root filesystem on the uSD card.
  - Linux 3.2.0, which was stable on another project (with an Overo).
  - U-Boot from that project.
  - X-Loader, which hasn't changed since that project.
  - The arm-2011.03 gcc from CodeSourcery.

  .. and got the same result.

  I have also tried several different power supplies (this project's own
local regulator, a bench supply and a wall wart), with no effect.

  You might also say "you have a bad batch of BeagleBoards", so I tried
one from a previous project, where it had seemed fine. That one
exhibited the same fault - though it may well have been doing so in the
previous project without our noticing. Either way, it's not this batch
that's the problem: "old" boards do it too.

  So, can anyone tell me what I've missed? I'm running out of straws
to clutch at ..

Richard.