MEMS Accelerometer code missing data

All,

I’m working on a Beaglebone Black project using a MEMS accelerometer chip, the ADXL312.

The accelerometer has a pin which goes active each time a new data sample is available to be read.
It runs at 3200 Hz (about 1/3 of a ms per sample).

The pin is connected to a GPIO input and I’ve written code to monitor this pin and grab the data each time.
The sampling normally runs for several seconds continuously.

The problem is, every once in a while Linux gets busy and apparently interrupts my code and I miss samples.
(an overrun bit lets me know this is happening)

I’m looking for ideas on how to improve this so data samples are not missed.

Any ideas?

Kirk

Yes, I have a few suggestions. The first may not work, but they are all worth attempting.

  • Upgrade to an RT kernel.
  • Use /dev/mem + mmap()
  • Bit bang via a PRU
  • Or write a kernel driver that does all this from kernelspace, but writes data to a file userspace has access to.

Or use a GPIO pin that generates an interrupt: mask interrupts, do your
data retrieval, then re-enable interrupts.

You are asking for this kind of issue when polling.

Interrupts can’t be used in the traditional way from userspace. Traditional interrupt handling is kernelspace only. However . . . there are tricks for getting around this, but performance will suffer. Which leaves us back at square one.
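For reference, the usual userspace trick is the legacy sysfs GPIO interface: set the pin's edge file, then poll() its value file for POLLPRI. A minimal Python sketch, assuming the pin is already exported and its edge is set to "rising". The names gpio_value_path and wait_for_edge are made up here for illustration. The wake-up still goes through the scheduler, so this saves CPU compared with a busy loop but does not remove the latency.

```python
import os
import select


def gpio_value_path(gpio_number):
    """Path of the value file for an exported GPIO (legacy sysfs interface)."""
    return "/sys/class/gpio/gpio%d/value" % gpio_number


def wait_for_edge(fd, timeout_s):
    """Sleep until the kernel reports an edge on fd or timeout_s elapses.

    Returns True on an edge, False on timeout.
    """
    poller = select.poll()
    # sysfs GPIO edges are reported as "priority" data, not normal readable data.
    poller.register(fd, select.POLLPRI | select.POLLERR)
    events = poller.poll(int(timeout_s * 1000))
    if not events:
        return False
    # Rewind and read so the next poll() blocks until the next edge.
    os.lseek(fd, 0, os.SEEK_SET)
    os.read(fd, 8)
    return True


# Typical use on the board (60 is a placeholder GPIO number):
#   fd = os.open(gpio_value_path(60), os.O_RDONLY)
#   os.read(fd, 8)                    # initial read clears any pending event
#   if wait_for_edge(fd, 0.001):
#       pass                          # do the SPI transfer here
```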

Kirk,

What bus are you using (SPI or I2C)? Both are "slow" buses which can sleep on
you. In addition, are you using the FIFO?

Scheduling your reads and using a fast enough SPI setting along with the FIFO
should let you acquire at 3.2 kHz without drops.
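To put a number on how much the FIFO buys you, here is a rough calculation in Python. It assumes the 32-entry FIFO depth I recall from the ADXL312 datasheet (please check your copy), and fifo_slack_ms is just an illustrative name.

```python
SAMPLE_RATE_HZ = 3200   # output data rate from the original post
FIFO_DEPTH = 32         # assumed FIFO depth; verify against the datasheet


def fifo_slack_ms(samples_queued, fifo_depth=FIFO_DEPTH, rate_hz=SAMPLE_RATE_HZ):
    """Milliseconds the host can stay away before the FIFO overflows."""
    free_slots = fifo_depth - samples_queued
    return 1000.0 * free_slots / rate_hz


print(fifo_slack_ms(0))    # slack starting from an empty FIFO
print(fifo_slack_ms(16))   # slack if reads are triggered at a half-full watermark
```

With an empty FIFO that is about 10 ms of slack, compared with roughly 0.3 ms when every sample has to be read before the next one arrives.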

It sounds to me like he's using a 1-Wire device. Reading from SPI or I2C should not present that sort of problem. Granted, at 1/3 ms intervals, userspace code is going to see occasional latency even on an RT kernel. This is to be expected.

Thanks for the suggestions. I don’t understand /dev/mem + mmap(). How does this work?

Writing a kernel driver sounds like a great idea but is probably very difficult.

One thing I should clarify. The loop that is polling and grabbing the data keeps up with no problem.
It is SPI. The problem is when Linux gets busy with something it just “goes away” for multiple milliseconds or more and data is lost.

If I want to experiment with the PRU, which debian image should I use?

Thanks,

Kirk

I’m no expert, but as far as I can tell there are Linux interrupts handling various things that cannot be disabled. Hence the problem.
Kirk

It is SPI and it keeps up with data collection with no problem. The problem is when Linux gets busy with something else and interrupts the data collection.
Going as fast as I can with SPI helps because sometimes the Linux “interrupts” are short and it has enough time. But when a multiple millisecond interruption happens the data is missed.

Kirk
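One way to confirm that the process itself is being stalled, independent of the SPI code, is a small jitter probe. This is only a sketch and assumes Python 3 (for time.perf_counter). measure_stalls is an illustrative name, not part of PyBBIO. It spins reading the clock and logs every gap longer than a threshold.

```python
import time


def measure_stalls(duration_s, threshold_s):
    """Spin for duration_s and record gaps between clock reads above threshold_s.

    Returns a list of (time_since_start_s, gap_s) tuples.
    """
    stalls = []
    start = last = time.perf_counter()
    while last - start < duration_s:
        now = time.perf_counter()
        gap = now - last
        if gap > threshold_s:
            # The process was not running for roughly `gap` seconds.
            stalls.append((last - start, gap))
        last = now
    return stalls


# Example: run for 5 seconds and report anything longer than one sample period.
#   for when, gap in measure_stalls(5.0, 1.0 / 3200):
#       print("stall of %.3f ms at t=%.3f s" % (gap * 1000.0, when))
```

If this reports multi-millisecond gaps while the accelerometer code is not even running, the stalls come from scheduling rather than from the SPI transfer.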

One thing I should clarify. The loop that is polling and grabbing the data keeps up with no problem.
It is SPI. The problem is when Linux gets busy with something it just “goes away” for multiple milliseconds or more and data is lost.

Very unlikely. First of all, the SPI is an actual hardware module. Second of all, SPI is the fastest of the simple peripheral buses on any system.

The problem you’re likely experiencing is in how Linux communicates with the SPI module. Or, you’re doing something silly in your code, like trying to print the output to the screen on every read.

Anyway, if you use the PRUs, all those problems will go away.

I have a scope on the SPI signals and there is nothing I can see that is a problem. The SPI hardware module is doing its job and clocking all the bits.
It’s recognizing the data ready, doing the SPI transfer, saving the data with time to spare. (99% of the time anyway)

To clarify, it’s not overrunning a bit in the SPI word.
The problem is that the data ready bit comes at a steady 3200Hz and sometimes it just doesn’t get back in time for the next data transfer.

As far as I can tell no fault of the code I’m running.
The code is focused only on the tight loop that monitors the data ready bit and then uses the SPI to transfer the data.
I assume there must be things going on in Linux like WiFi communications and Ethernet communications that interrupt my code.

I’m using a somewhat older image.
What image should I use to get started working with the PRU?

Kirk

I have a scope on the SPI signals and there is nothing I can see that is a
problem. The SPI hardware module is doing its job and clocking all the
bits.
It's recognizing the data ready, doing the SPI transfer, saving the data
with time to spare. (99% of the time anyway)

I can tell you that the SPI modules, as I recall, are only set to run at
16 MHz? Something much lower than what these modules max out at. I seem to
recall someone saying that these modules can easily run at 50 MHz, but I
cannot think of the source where I read that.

To clarify, it's not overrunning a bit in the SPI word.
The problem is that the data ready bit comes at a steady 3200Hz and
sometimes it just doesn't get back in time for the next data transfer.
As far as I can tell no fault of the code I'm running.

Show us the code. You can't have anything that we (who are programmers)
have not seen already. This way we can examine the code and tell you if we
see anything that could be a potential problem.

The code is focused only on the tight loop that monitors the data ready
bit and then uses the SPI to transfer the data.
I assume there must be things going on in Linux like WiFi communications
and Ethernet communications that interrupt my code.

I'm using a somewhat older image.
What image should I use to get started working with the PRU?

Kirk

It sounds like you might be running a 3.8.x kernel? You can start with
any image. However, I would suggest you start off with the newest images,
as changes have been made to make it simpler to use either remoteproc or
uio_pruss.

So what the PRU will net you is a tighter loop that runs outside of Linux.
This means that your code cannot be preempted by anything in Linux. Put
very basically, it'll be like running your code on a bare metal Cortex M3.
But this "Cortex M3" is special in that it does not have an instruction
pipeline, which means that instructions typically take only one cycle
(5 ns). There are multi-cycle instructions, but you can do your best to
stay away from those as much as possible, *if* you need to. Initially,
though, it seems like you'll have plenty of room while still being able to
maintain your needed timing.
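To give a feel for how much room that is, here is the arithmetic as a quick Python sketch. It assumes the 200 MHz PRU clock implied by the 5 ns cycle time and single-cycle instructions. The SPI clock in the last line is a made-up figure, since your actual SPI clock setting has not been mentioned.

```python
SAMPLE_RATE_HZ = 3200
PRU_CLOCK_HZ = 200000000   # 5 ns per cycle

# Time between data-ready pulses, in microseconds.
sample_period_us = 1e6 / SAMPLE_RATE_HZ

# Single-cycle PRU instructions available between two samples.
cycles_per_sample = PRU_CLOCK_HZ // SAMPLE_RATE_HZ


def spi_transfer_us(bits, spi_clock_hz):
    """Time on the wire for one SPI transfer, ignoring chip-select overhead."""
    return 1e6 * bits / spi_clock_hz


print(sample_period_us)               # 312.5
print(cycles_per_sample)              # 62500
print(spi_transfer_us(80, 1000000))   # 80.0 (five 16-bit words at a hypothetical 1 MHz)
```

So even a fairly slow SPI clock leaves most of the sample period free, and the PRU has tens of thousands of instructions to spend per sample.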

What I would recommend you try first is upgrading your kernel to an RT
kernel to see if that helps any. 4.x kernels by themselves already seem
more responsive to me, but it is entirely possible all you'll need is an
RT kernel. I've yet to test SPI on an RT kernel myself, but what I have
done is take 1 Mbit worth of CAN bus data in real time, parse it, and send
out around 1,000 separate fields per second via websockets. So . . .

$ sudo apt-get update
$ apt-cache search linux-image | grep '4\.4.*rt'

And then just pick one of the newer bone, or TI kernels. Then give your
code a whirl on that.
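One caveat: an RT kernel mostly helps a process that actually runs under a real-time scheduling policy. A minimal sketch of requesting SCHED_FIFO from Python, assuming Python 3.3 or newer on Linux and root (or CAP_SYS_NICE) privileges. request_realtime_priority is an illustrative name and the priority of 50 is an arbitrary choice.

```python
import os


def request_realtime_priority(priority=50):
    """Try to switch this process to SCHED_FIFO at the given priority.

    Returns True on success, False if the platform or permissions do not allow it.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False   # not available on this platform or Python version
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False   # typically EPERM when not running as root
    return True


if not request_realtime_priority():
    print("could not get real-time priority; running with the default policy")
```

Be careful with a busy-polling loop under SCHED_FIFO: it can starve everything with a lower priority on that core, so it is worth keeping the timeout in the polling loop.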

OK, sorry, multitasking here, so I kind of left out some important information.

The PRUs have direct access to any peripheral on the am335x processor. So the PRUs can twiddle an SPI module’s registers, read from its FIFO buffer, etc.

This is very similar to what /dev/mem + mmap() would net you. However, /dev/mem + mmap() would use the main processor to do this, from userspace Linux. Basically, once the registers’ memory is mapped, you’re going behind the kernel’s back and doing all this without the kernel’s knowledge.
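To answer the earlier "how does this work?" question, here is a minimal Python sketch of the /dev/mem + mmap() approach for reading a GPIO pin. The base address and DATAIN offset are the AM335x GPIO1 values as I remember them from the TRM, so please check them against your copy. The function names are made up for illustration, and the program must run as root.

```python
import mmap
import os
import struct

GPIO1_BASE = 0x4804C000   # AM335x GPIO1 bank (assumed; check the TRM)
GPIO_SIZE = 0x1000        # size of one GPIO register block
GPIO_DATAIN = 0x138       # offset of the input data register


def open_gpio_bank(base=GPIO1_BASE, size=GPIO_SIZE):
    """Map one GPIO register block into this process. Needs root."""
    fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
    try:
        return mmap.mmap(fd, size, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
    finally:
        os.close(fd)   # the mapping stays valid after the descriptor is closed


def read_reg32(mem, offset):
    """Read a little-endian 32-bit register from the mapped block."""
    return struct.unpack_from("<L", mem, offset)[0]


def pin_is_high(mem, bit):
    """True if the given bit of GPIO_DATAIN is set."""
    return bool((read_reg32(mem, GPIO_DATAIN) >> bit) & 1)


# On the board (28 is a placeholder bit number within the bank):
#   gpio1 = open_gpio_bank()
#   while not pin_is_high(gpio1, 28):
#       pass
```

This removes the overhead of going through the kernel for every read, but the loop is still an ordinary userspace process and can still be scheduled out.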

This is what I’m running.
Linux beaglebone 3.8.13-bone70 #1 SMP Fri Jan 23 02:15:42 UTC 2015 armv7l

It’s a long story but this system has been growing in both hardware and software for some time and we haven’t wanted to rock the boat by upgrading. Maybe now’s the time.
I was wondering what won’t work on the rt kernel?

Even if the SPI and code are SUPER fast it won’t solve the problem if something else takes over for more than 1/3 of a millisecond.
I’m not using the FIFO on the ADXL312 because it creates extra noise on the signals. (it’s a design problem in the chip).

The code is written in Python and it uses the PyBBIO library to handle the SPI communication and GPIO.
Here’s the snippet of code that is doing the tight loop grabbing the accelerometer data:

*********************** begin snippet **************************************
try:
    while sample_count < sensor_channel.samples_per_burst + extra_samps_for_smoothing:

        # wait for interrupt bit to go hi before grabbing the sample
        while bbio.digitalRead(sensor_channel.accel_interrupt_gpio) == 0:
            loop_count += 1
            if loop_count > max_loop_count:
                break

        # read the SPI data
        spi_read_data = bbio.SPI0.transfer(
            0, [((ADXL312_REG_INT_SOURCE_R | 0xC0) << 8), 0, 0, 0, 0])

        enc_unpack = struct.unpack("<L", mem[EQEP2_POSITION:EQEP2_POSITION + 4])[0]

        # INT_SOURCE   spi_read_data[0] hi byte   store in int_source
        # nothing      spi_read_data[0] lo byte
        # DATA_FORMAT  spi_read_data[1] hi byte
        # DATAX0       spi_read_data[1] lo byte   store in xlo
        # DATAX1       spi_read_data[2] hi byte   store in xhi
        # DATAY0       spi_read_data[2] lo byte   store in ylo
        # DATAY1       spi_read_data[3] hi byte   store in yhi
        # DATAZ0       spi_read_data[3] lo byte   store in zlo
        # DATAZ1       spi_read_data[4] hi byte   store in zhi
        # FIFO_CTL     spi_read_data[4] lo byte   ignore

        # it timed out, break out of the outer loop
        if loop_count > max_loop_count:
            break

        # grab the bytes and put them in the right place
        x_accel = (spi_read_data[2] & 0xFF00) | (spi_read_data[1] & 0x00FF)
        y_accel = (spi_read_data[3] & 0xFF00) | (spi_read_data[2] & 0x00FF)
        z_accel = (spi_read_data[4] & 0xFF00) | (spi_read_data[3] & 0x00FF)

        # convert to ± 32767
        if x_accel >= 32768:
            x_accel -= 65536
        if y_accel >= 32768:
            y_accel -= 65536
        if z_accel >= 32768:
            z_accel -= 65536

        BUFFA.append(x_accel)
        BUFFA.append(y_accel)
        BUFFA.append(z_accel)
        BUFFP.append(enc_unpack)

        # count the number of times data is missed
        if spi_read_data[0] & 0x0001:
            overrun_count += 1

        sample_count += 1
except KeyboardInterrupt:
    pass

*********************** end snippet **************************************
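As a side note, the byte shuffling in the snippet can be pulled out into a small function so it can be checked away from the hardware. This is only a sketch that mirrors the masks used above. decode_sample and to_signed16 are illustrative names, and words stands for the five 16-bit values returned by the SPI transfer.

```python
def to_signed16(value):
    """Interpret a 16-bit unsigned value as two's complement."""
    return value - 0x10000 if value & 0x8000 else value


def decode_sample(words):
    """Return (x, y, z, overrun) from the five 16-bit SPI words."""
    # Each axis straddles two words: low byte of one, high byte of the next.
    x = (words[2] & 0xFF00) | (words[1] & 0x00FF)
    y = (words[3] & 0xFF00) | (words[2] & 0x00FF)
    z = (words[4] & 0xFF00) | (words[3] & 0x00FF)
    overrun = bool(words[0] & 0x0001)   # same overrun test as the snippet
    return to_signed16(x), to_signed16(y), to_signed16(z), overrun


# Made-up input words, chosen to exercise the sign handling:
print(decode_sample([0x0001, 0x00FF, 0xFF34, 0x1200, 0x8000]))
# (-1, 4660, -32768, True)
```

Keeping the decode separate also makes it easy to move it out of the acquisition loop: the loop can store the raw words and the conversion can run after the burst is finished.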

OK, right off I can tell you that your try block is bad. Well, not the actual contents of the block, but try / catch blocks should never be used in the middle of performant code. So if that try statement is executed more than once, rather than only once when the program first starts, you need to refactor it out, or move it so that the try block is only run once.

Additionally, your code is Python. Python is an interpreted language, and as such you will incur performance penalties because of that. This goes for any interpreted language. Not only are scripting languages slower than natively compiled languages, they also use more CPU, which is most definitely where part, if not all, of your problem lies.

The good news is that this code would port to C really well / easily, and C loves good performant bit manipulation :)

Here’s a decent benchmark that should give you a rough idea of performance differences. http://benchmarksgame.alioth.debian.org/u32q/compare.php?lang=python3&lang2=gcc

So, you can see it really depends on what you’re doing. You could also use a “Python compiler” to compile your Python into a native binary, but I do not know much about that, so I could not tell you which to use. I’ve also “heard” of something called ‘Cython’, so maybe that is something to look into.

Many die-hard Python “fans” swear by the language, and I can certainly understand wanting to use a high level language that has first class strings, and all the rest that comes with the language. However . . . personally, I do not have issues working with strings in C, or if I did, I’d use C++ . . .