Could the PRU read these signals?

Hello.

I am quite new to the PRU, so sorry in advance if I make a mistake or an incorrect assumption.

I need to read an LVDS signal consisting of 4 data channels plus CLK and frame sync, which looks like this:

As you can see, the signals change on every edge (both falling and rising).

Of course, we would place an LVDS differential receiver in front of the BeagleBone PRU pins (so the signals will be exactly like the black + traces in the figure). Our intention is to write the data to memory afterwards; the ARM would then probably build UDP packets to send it over Ethernet.

I would like to know whether there is any way to receive these signals with the PRU and process them correctly.

Would it be a Direct Connection Mode configuration?

If not, would it be possible to poll the CLK and FRAME signals quickly and, on every change, sample the other 4 channels simultaneously?

Thank you all.

You don’t indicate how fast your clock is. Everything in the PRU is polled, so you would have to construct a loop that looks at each pin and determines whether the clock has changed state. The PRU operates at 200 MHz and has simple instructions, so you would have to calculate how many instructions you have per clock period to determine whether it is even possible. Then, of course, you have to put the data somewhere. Typically one PRU is used to poll the I/O lines and assemble the data into chunks that are passed to the second PRU, which can put them in ARM DDR memory, send them out, etc. If you get the voltages on the PRU input pins right and the data rate is within what the PRU can handle, then this can be done.
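To make the "how many instructions per clock" calculation concrete, here is a rough Python sketch. It assumes one PRU instruction per 5 ns cycle (true for simple register operations, optimistic for memory accesses) and that a sample is needed on both clock edges; the clock rates in the loop are only illustrative values.

```python
PRU_CLOCK_HZ = 200_000_000  # PRU core clock, as stated in the post


def cycles_per_sample(signal_clock_hz, both_edges=True):
    """PRU cycles available between two consecutive sampling points.

    Assumption: one sample per clock edge when both_edges is True,
    otherwise one sample per full clock period.
    """
    samples_per_second = signal_clock_hz * (2 if both_edges else 1)
    return PRU_CLOCK_HZ / samples_per_second


# Illustrative clock rates, not measured values.
for mhz in (5, 10, 20, 50):
    budget = cycles_per_sample(mhz * 1_000_000)
    print(f"{mhz:>2} MHz clock: {budget:5.1f} PRU cycles per edge")
```

Whatever loop is written has to fit its worst-case path into that budget, including the store that moves the sample somewhere.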

First of all, let me thank you for your answer.

Our signal will be in the tens of MHz (from 5 MHz to 50 MHz max), depending on configuration.

The polling method is what I expected. However, the manual (spruh73q) states there are three methods (Direct Input, Parallel Mode and 28-bit Shift Register), which confuses me. In your proposal, is Direct Input used? What is the max speed in this mode, 100 MHz?
What is the difference between using Parallel Mode and polling? Is it the automatic clocking?

Sorry for my lack of deeper knowledge (yet). Thank you again.

Hi!

Thank you TJF, this is quite helpful.
Could you point me to a code example for a similar solution? I understand the lines would be read in parallel from the registers, right?

pru_GPI.PNG

We were thinking about delaying one of the clocks by half a cycle and using an AND gate (so the output is effectively a clock with twice the frequency), and reading the data using the parallel method. Is there any benefit in doing that (besides the max 50 MHz frequency)? Is it easier to handle?

Best regards

How many PRU instructions will be required to...

Sample the clock

Determine clock state changed (eg: compare to previous value and branch
back to top if same)

Read data pin(s)

Write the data to memory

Return to top

... oh, and if you need to first look for the frame synch, that adds
another loop around the above with its sample/test

  At best, that looks to be 5 to 7 (frame synch version) instructions. A
five instruction loop with 200MHz processor results in 40MHz "baud rate".
If you need more instructions -- increment a memory pointer, say, that will
reduce the effective rate. If the worst case cycle (and you WILL have to do
that worst-case evaluation!) consumes 10 instructions, your effective rate
will only be 20MHz.

  That is based upon the PRU assembler instruction set -- if using a
C-level source, you may need to have the compiler dump an assembly listing
so you can study the instructions needed by the loops...
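  The arithmetic above can be replayed with a few lines of Python. This is only a sketch: it assumes every instruction in the loop takes one 5 ns cycle, so real loops with memory stores can only be slower, and the DDR figure assumes one sample per clock edge as described in the original post.

```python
PRU_CLOCK_HZ = 200_000_000  # PRU core clock


def max_sample_rate_hz(worst_case_instructions):
    """Highest sample rate a polling loop of the given length can sustain."""
    return PRU_CLOCK_HZ / worst_case_instructions


def max_ddr_clock_hz(worst_case_instructions):
    """Highest signal clock if a sample is needed on both clock edges."""
    return max_sample_rate_hz(worst_case_instructions) / 2


# Loop lengths taken from the estimate above (5, 7 and 10 instructions).
for n in (5, 7, 10):
    print(f"{n:>2} instructions: "
          f"{max_sample_rate_hz(n) / 1e6:5.1f} Msamples/s, "
          f"DDR clock up to {max_ddr_clock_hz(n) / 1e6:5.1f} MHz")
```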

There’re 17 input lines in each R31 (bit [0:16]). It’d be best if you use a set of eight (bit [0:7] or bit [8:15]), because the data to write would be only one byte.

Anyhow, the following ASM code is for word data (bit [0:15] range), written to DRam:

```
#define CLKB 5           // clock bit number in r31 to poll

  LDI  r0, 0             // r0 = write offset into DRam
HIGH:
  QBBC HIGH, r31, CLKB   // wait for the clk bit to go high
  SBBO r31, r0, 0, 2     // save data (2 bytes, bits [0:15])
  ADD  r0, r0, 2         // increment write offset
  QB?? OUT, ??           // termination (condition left open)
LOW:
  QBBS LOW, r31, CLKB    // wait for the clk bit to go low
  SBBO r31, r0, 0, 2     // save data (2 bytes, bits [0:15])
  ADD  r0, r0, 2         // increment write offset
  QB?? HIGH, ??          // reverse termination (condition left open)
OUT:

// Note:
// To reach a higher frequency, the SBBO + ADD instructions can be
// replaced by MVIW, buffering the data in the register file,
// but this is limited to 30*2=60 sets of data.
```

The main loop contains two similar sub loops, one starting after the clk line gets high, the other starting after the clk line gets low.
If the state of the clk line starts undefined, you’ve to add an initial QBB? before the main loop, in order to start at the right sub-loop.
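For readers who want to check the logic off-target, here is a small Python model of the two sub-loops. It is a behavioural sketch only: `r31_samples` is a hypothetical list of successive R31 reads, and the model captures the same read that revealed the edge, whereas the assembly reads R31 again in the SBBO one instruction later.

```python
CLKB = 5  # clock bit number, as in the listing above


def capture_on_both_edges(r31_samples, clk_bit=CLKB):
    """Return the input words seen right after each clock transition."""
    captured = []
    if not r31_samples:
        return captured
    # The first read only establishes the initial clock level.
    previous = (r31_samples[0] >> clk_bit) & 1
    for word in r31_samples[1:]:
        level = (word >> clk_bit) & 1
        if level != previous:               # rising or falling edge
            captured.append(word & 0xFFFF)  # keep bits [0:15], like the 2-byte store
            previous = level
    return captured


# Hypothetical reads: bit 5 is the clock, bits 0 and 1 carry data.
reads = [0x00, 0x00, 0x21, 0x21, 0x02, 0x02, 0x23]
print([hex(w) for w in capture_on_both_edges(reads)])
```

Taking the initial level from the first read plays the role of the extra QBB? in front of the main loop.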

As far as I understand your signal diagram, you’re dealing with redundant data. Only three lines provide information.

Find example code in the libpruio documentation. Example pruss_toggle defines an output line and loads firmware to toggle that line. You can adapt that code for your input lines.

What are you actually trying to do? That’s a 50Mbit sustained transfer rate if bit time is 20ns. 100Mbit if 10ns. And 4 channels. That’s 200-400Mbps. You’re moving a TON of data even at slower clock rates–there is no resource on the BBB that can keep that up very long.

This seems quite questionable on the BBB … even more so if your timing diagram is valid (which I suspect it is not).

First, given your diagram, the sampling point would have to be recovered. Note that the Clock, Data, and Frame are all coincident. Normally the clock falls in the middle of the data window. That’s not easy to recover without a PLL. I doubt the BBB has enough resolution to hit the data stable point on the data with any reliability at 50MHz.

Second, are you actually running at 50MHz? Are you considering every crossing as the frequency (high-to-low on a single line), or are you considering an individual signal's rise-to-rise as the frequency? Is that data changing every 20ns or every 10ns? If it’s 10ns, it’s probably not possible.
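The two readings of "50MHz" can be put into numbers with a short Python sketch. The only assumptions are the ones already named in this post: 4 data lanes, and one bit per lane either per clock period or per clock edge.

```python
def bit_time_ns(clock_hz, both_edges):
    """Time one bit stays on a lane, in nanoseconds."""
    bits_per_second = clock_hz * (2 if both_edges else 1)
    return 1e9 / bits_per_second


def aggregate_mbps(clock_hz, both_edges, lanes=4):
    """Total payload rate over all data lanes, in Mbit/s."""
    return clock_hz * (2 if both_edges else 1) * lanes / 1e6


for both_edges in (False, True):
    label = "both edges" if both_edges else "one edge"
    print(f"50 MHz, {label}: "
          f"{bit_time_ns(50_000_000, both_edges):.0f} ns per bit, "
          f"{aggregate_mbps(50_000_000, both_edges):.0f} Mbit/s total")
```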

I would STRONGLY recommend that you use an FPGA. Even an incredibly cheap FPGA would deserialize that with ease, and then you could put it into a form in which you might be able to abuse something like the RMII from the Ethernet peripheral to transfer it out.

But it’s still a LOT of data. You need something that can handle gigabit speeds if you keep this up even for a couple milliseconds.

Your best bet would be to use that FPGA to also decimate the data as well so that you’re working with a reasonable amount of data.

Thank you TJF for that detailed answer and thank you too Andrew for your insights.

We are trying to extract radar data from a radar board. The goal is not to have it working at 50 MHz (which is the system's max rate), but to have that possibility if needed. We will probably work at 10 to 20 MHz, and the data on the BBB is only sent via Ethernet to another device on the same network. We only need to extract the data from the 4 data lines (channels) whenever CLK changes; the Frame line is only used to synchronize at the beginning, since the format doesn’t change.

We have already done this setup with DSPs and MCUs with integrated Ethernet with no issues. However, for my new design I can only use LVDS, so a fast device (like the PRU) is needed. The BeagleBone would give me much more flexibility than an FPGA because, as you rightly said, I could decimate and/or implement a simple radar data viewer on the ARM core.


I too was skeptical of FPGAs until recently. I think you owe it to yourself to check out Zynq and Altera parts, which combine FPGAs with ARM cores. The PL (the FPGA part of these chips) can align the clock edges and write the data into the memory of the ARM processor. You might find it is possible to do the decimation in the PL, saving the ARM for other things. You could potentially even generate UDP packets in the PL, if all you want to do is move the data someplace else. The infamous FPGA learning curve is quite real, but not insurmountable. There are lots of tutorials available, both from the vendors and from other sources.

Thank you for your point of view, Steve. The thing is, I prefer to have a community behind the platform, as with the BeagleBone, and a smaller form factor. Besides, my goal is to build a radar board that people could use in their projects. If the PRU is enough, that would be fantastic; a little AI from the BeagleBone AI could help a lot in developing new killer applications.
Don’t you think?

Anyway, I will have the Zynq Z-turn board in mind.

You can certainly persist, but I’m going to point out the existence of chips like the AWR1843–“Single-chip 76-GHz to 81-GHz automotive radar sensor integrating DSP, MCU and radar accelerator”:
https://www.ti.com/product/AWR1843

This is about $30, and does all the RF-y things while sending your ADC data straight to a DSP and Cortex R4F with extra Radar-y things to accelerate analysis. This allows you to focus on analyzing the results instead of the guts of “implementing a radar”.

Anyways, good luck. Sounds like an interesting project.

Thank you Andrew. I have been a radar engineer for more than 15 years myself.
We have designs more interesting than TI's AWR series, which is quite interesting for some applications but not so much for others. The costs are a little higher than those $30, especially if you have to develop your own PCB antenna.

My idea is to create a simpler, better device able to work with the beaglebone (or the beaglewire), allowing users to create their own radar applications in a very fast way (including the UI).

You can use the second PRU to de-serialize the data, while the first PRU is fetching them. Meanwhile the ARM CPU can provide the data transfer over the network (or PRU IEP module?), or it can do simple evaluations like pre-selection of relevant data sections, or GUI output …
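As an illustration of the de-serializing step, here is a Python model of what the second PRU would do with the words fetched by the first one. The lane-to-bit mapping (lane n on bit n), the MSB-first bit order and the word length are all hypothetical, since the thread does not specify the radar board's format.

```python
def deserialize(captured_words, lanes=4, bits_per_word=16):
    """Split captured parallel samples into one list of words per lane.

    Assumptions (not from the thread): lane n sits on bit n of every
    captured sample, bits arrive MSB first, and the first sample is
    already aligned to a frame boundary.
    """
    streams = [[] for _ in range(lanes)]
    # Ignore a trailing, incomplete frame.
    usable = len(captured_words) - len(captured_words) % bits_per_word
    for start in range(0, usable, bits_per_word):
        frame = captured_words[start:start + bits_per_word]
        for lane in range(lanes):
            value = 0
            for sample in frame:
                value = (value << 1) | ((sample >> lane) & 1)
            streams[lane].append(value)
    return streams


# Four samples with 4-bit words, just to keep the example readable.
print(deserialize([0b0001, 0b0011, 0b0000, 0b0101], bits_per_word=4))
```

On the PRU the same shifting and masking would be done in registers; the model is only meant to pin down the data layout before writing the firmware.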

Thank you TJF, that was exactly my idea from the beginning, with that exact architecture.
If you think it is feasible, then I will go for it.

Best regards

Hi,

I'm reading LTC2500-32 ADCs with the PRU. The LTC2500 delivers 32 bits every microsecond as a 320 nsec burst with a 100 MHz bit rate. I receive the burst with a shift register in a Xilinx CoolRunner-II CPLD and read the shift register bytewise with PRU2. I think I could read 3 ADCs with the current timing (minimum requirement for me), maybe 4 with some optimization. The PRU writes the data into the 12 KB shared RAM, organized as a ping-pong buffer. The ARM continuously reads one half of the buffer while the other half is written by the PRU. That also solves the problem that it is hard to allocate REALLY big buffers in the virtual address space of Linux and fix their location somewhere for the PRU, in addition to the unpredictable duration of a memory cycle when competing with the ARM for access.
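The ping-pong idea can be sketched in Python as follows. This is a host-side model, not the actual PRU/ARM code: the class name, the half size and the hand-over flag are made up for illustration, and there is no overrun detection.

```python
class PingPongBuffer:
    """Two equal halves: the producer fills one while the consumer reads the other."""

    def __init__(self, half_size):
        self.half_size = half_size
        self.halves = [[0] * half_size, [0] * half_size]
        self.write_half = 0     # half currently being filled by the producer
        self.write_index = 0
        self.ready_half = None  # completed half waiting for the consumer

    def write(self, sample):
        """Producer side (the PRU). Returns True when a half has just been filled."""
        self.halves[self.write_half][self.write_index] = sample
        self.write_index += 1
        if self.write_index == self.half_size:
            self.ready_half = self.write_half
            self.write_half ^= 1   # switch to the other half
            self.write_index = 0
            return True
        return False

    def read_ready(self):
        """Consumer side (the ARM). Returns a copy of the completed half, or None."""
        if self.ready_half is None:
            return None
        data = list(self.halves[self.ready_half])
        self.ready_half = None
        return data


buf = PingPongBuffer(half_size=3)
for sample in range(6):
    if buf.write(sample):
        print("half ready:", buf.read_ready())
```

The scheme only works if the consumer always finishes a half before the producer wraps around to it again.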

I have currently paused that software development to first fix the analog part.

Cheers, Gerhard

That is really interesting Gerhard!!

Let me ask you, why did you decide to use a CPLD instead of any other device?

Actually, this signal configuration is very common in ADC + SerDes devices (ADCs with a DDR bit clock). For example, you can take a look at these application notes:

http://www.ti.com/lit/an/sbaa205/sbaa205.pdf

https://www.xilinx.com/support/documentation/application_notes/xapp866.pdf

My idea is, if possible, to do just that with PRUs as a more powerful, lower cost, better integrated setup.

The CPLD is the ideal thing to collect a few leftover logic bits and for experimenting. I have used the CoolRunners for a very long time. They would be considered "mature" by now. In this $2.50 device, you get 64 flip-flops and enough combinatorial logic for

- generating the 1 MHz sample clock from the 100 MHz Xosc, 30 nsec wide pulses just like the ADC loves it

- a state machine to read out the ADC in a burst. The ADC wants to be left alone in 2/3 of each cycle to avoid coupling dirt during conversion. In the last 1/3 you have to hurry to get your data bits,

- a 32 bit shift register to collect the SPI data

- a 4*8 bit mux and interface to the BBB with 2 byte select lines set by the BBB

- data_available from ADC / ready to BBB handling and allowing the PRU to bit bang the ADC for setting up filters, decimation rate etc.
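The "hurry in the last 1/3" remark is easy to check with the numbers given in these two posts (1 MHz sample clock, 32 bits per sample, 100 MHz bit rate). A minimal Python sketch, using nothing beyond those figures:

```python
SAMPLE_PERIOD_NS = 1000      # 1 MHz sample clock
BIT_RATE_HZ = 100_000_000    # 100 MHz readout bit rate
BITS_PER_SAMPLE = 32

# Time needed to shift out one sample.
burst_ns = BITS_PER_SAMPLE * 1e9 / BIT_RATE_HZ

# The ADC wants to be left alone for 2/3 of the period,
# so only the last third is available for the readout.
readout_window_ns = SAMPLE_PERIOD_NS / 3

margin_ns = readout_window_ns - burst_ns

print(f"burst {burst_ns:.0f} ns, window {readout_window_ns:.1f} ns, "
      f"margin {margin_ns:.1f} ns")
```

The burst fits into the window, but only with a margin of roughly a dozen nanoseconds, which is why the readout state machine lives in the CPLD rather than in software.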

The 3 channels would not fit together; one might finally use a Spartan or whatever. But for first tests, it is fine. The CPLD is a stamp-sized mini-board with a core voltage regulator; it remembers its programming and can be programmed in the usual way via JTAG.

The other stamp is the LTC2500-32 ADC, with low-noise LT3042 regulators for analog & digital VCC, a negative regulator, an LT6655 reference and the analog ADC driver.

That's all the hardware. With the BBB and its software it would be a complete Fourier analyzer with cross correlation and things. BTW, I could compile FFTW, the fastest FFT in the West, just so. To talk with the analyzer, you just open port 5005 on 192.168.178.33 and dump GPIB-style commands. Just like my Agilent 89441A. :)

Methinks that the BBB can have an SRAM-like 16 bit bus interface. That would be very interesting for FPGA device registers, FIFOs, DMA buffers and such. Reasonably wide single-cycle accesses, but I'm not sure what has to be given up for it, let alone how to positively get rid of these competing features, and how to switch on and place that memory window. I think it would cost a 16 bit CMOS transceiver and an address latch, but it might ease a lot of things.

cheers,

Gerhard

Auswahl_004www.jpg

On BBB one can configure a 17 bit unidirectional interface on PRU-0 (perhaps also bidirectional by run-time pinmuxing). The SD card slot has to be given up for it. Find details in section PRU fast GPIO 16 bit.

Regards