Fastest way to transfer data from external device to DDR3

Fishayyy · March 17, 2022, 12:12am

Sorry in advance if this question doesn’t make sense or is poorly worded. I’m fairly new to embedded Linux and working with Beaglebone in general, but I started my first professional job as an embedded software engineer at a small company and my background from school was strictly in software engineering.

I’m working on a project in which a separate I/O board sends ADC counts to a Beaglebone Black. Currently the main Linux core receives the ADC counts over an SPI connection, converts them into voltage and current values, and plots them on an LCD screen to give our system a basic oscilloscope function. This process bogs down the main processor, so I’ve been looking into ways to make it more efficient. My current idea is to have the board that sends the ADC counts handle all the conversions and simply pass the converted values using UART instead of SPI.

My question is: what would be the most efficient way to get these values from the other board into DDR3 RAM so that the Beaglebone can focus on simply reading the data from RAM and plotting it on the screen? Should I be trying to use DMA to accomplish this? Or should the data be sent to the PRU, with the PRU placing the values into DDR3 memory? I believe we need to transfer about 60 kB of data for one frame to be drawn. Hopefully this makes sense, but I can try to clarify further if it doesn’t.
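
For a sense of scale, here is a rough sketch of how long one 60 kB frame would take on the wire. It assumes 60 kB means 60000 bytes, 8N1 UART framing (10 bits per byte), and ignores gaps between bytes and any protocol overhead, so the results are lower bounds. The function name and the baud rates used below are illustrative assumptions, not measured values from this project.

```c
#include <assert.h>
#include <stdint.h>

/* Bits on the wire per payload byte for 8N1 UART framing:
 * 1 start bit + 8 data bits + 1 stop bit. */
#define UART_8N1_BITS_PER_BYTE 10u
/* SPI shifts 8 bits per byte with no framing bits (idealised). */
#define SPI_BITS_PER_BYTE 8u

/* Time in milliseconds to move frame_bytes over a serial link running at
 * bit_rate bits per second. Ignores inter-byte gaps and protocol overhead,
 * so the result is a lower bound. */
static double frame_time_ms(uint32_t frame_bytes, uint32_t bit_rate,
                            uint32_t bits_per_byte)
{
    assert(bit_rate > 0);
    return 1000.0 * (double)frame_bytes * (double)bits_per_byte
           / (double)bit_rate;
}
```

Under those assumptions, `frame_time_ms(60000, 115200, UART_8N1_BITS_PER_BYTE)` is roughly 5208 ms, `frame_time_ms(60000, 3000000, UART_8N1_BITS_PER_BYTE)` is 200 ms, and `frame_time_ms(60000, 10000000, SPI_BITS_PER_BYTE)` is 48 ms. The link speed caps the frame rate regardless of how the data then gets into DDR3.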

jcdammeyer · March 17, 2022, 2:25am

I’ve moved data rapidly from a PIC32 gathering CAN bus messages into a Pi via SPI at 10Mbps. In my case the data needed to be time stamped and queued until the Pi booted and that took up to 20 seconds. Once the Pi was up and the clock was set via GPS, it started pulling in the data from the PIC32. Each second of data was a linked list and each message had a relative time stamp from 0. As the data was brought into the Pi these relative time stamps were adjusted to real TOD before another task stored them into a file.

The approach to doing this is to create queues or what are sometimes called ping pong buffers. Data from SPI is acquired via DMA into Buffer #1. While that’s happening the acquired data in Buffer #2 is processed to make it ready for screen display.

You can do that using the canvas principle, where you don’t write to the screen directly but instead to an image that is later put on the screen. Once Buffer #2 is processed, a signal is passed on to the next task or thread, which takes care of displaying the information. This isolates the A/D side from the transfer, the transfer from the processing, and the processing from the display. Processing Buffer #2 must take less time than filling Buffer #1.

If not, you’ll need more buffers, but ultimately you need to be able to empty them faster than you fill them.

Once Buffer #1 is full, the code dealing with filling it is pointed to Buffer #2 and Buffer #1 is marked as the one to post process. And so on.
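
The ping pong scheme described above can be sketched roughly as follows. This is a minimal single-threaded illustration, not a DMA driver: the names (`pingpong_t`, `pp_push`, `pp_acquire`, `pp_release`) and the buffer size are made up for the example, and in a real system the swap would typically happen in a DMA-completion handler with proper synchronisation between producer and consumer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PP_SAMPLES 4096u   /* samples per buffer: an assumed size */

typedef struct {
    uint16_t buf[2][PP_SAMPLES]; /* the two ping pong buffers */
    size_t   fill_pos;           /* next free slot in the buffer being filled */
    int      fill_idx;           /* 0 or 1: buffer the producer writes to */
    int      ready_idx;          /* full buffer awaiting processing, -1 if none */
    unsigned overruns;           /* swaps that hit a still-unprocessed buffer */
} pingpong_t;

static void pp_init(pingpong_t *pp)
{
    assert(pp != NULL);
    pp->fill_pos = 0;
    pp->fill_idx = 0;
    pp->ready_idx = -1;
    pp->overruns = 0;
}

/* Producer side: store one sample. Returns 1 when this sample completed a
 * buffer and the roles of the two buffers were swapped, otherwise 0. */
static int pp_push(pingpong_t *pp, uint16_t sample)
{
    pp->buf[pp->fill_idx][pp->fill_pos++] = sample;
    if (pp->fill_pos < PP_SAMPLES)
        return 0;
    if (pp->ready_idx >= 0)
        pp->overruns++;          /* consumer has not released the other buffer */
    pp->ready_idx = pp->fill_idx; /* mark the full buffer for post-processing */
    pp->fill_idx ^= 1;            /* point the producer at the other buffer */
    pp->fill_pos = 0;
    return 1;
}

/* Consumer side: get the full buffer, or NULL if none is waiting. */
static const uint16_t *pp_acquire(const pingpong_t *pp)
{
    return pp->ready_idx < 0 ? NULL : pp->buf[pp->ready_idx];
}

/* Consumer side: signal that the acquired buffer has been processed. */
static void pp_release(pingpong_t *pp)
{
    pp->ready_idx = -1;
}
```

A nonzero `overruns` count is the symptom of the consumer being too slow: buffers are being filled faster than they are emptied.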

The point of doing it this way is that you don’t even need the hardware to test large parts of the software. You can substitute the SPI+DMA code that fills the buffer with something that creates a fake buffer holding recognizable waveform data. The rest of the code then just gets the signal that the buffer is ready and swaps it (hence ping pong). Now you can do all the post processing and display.
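
As an illustration of such a stand-in, the sketch below fills a buffer with a triangle wave, which is easy to recognise on the display. It assumes 12-bit ADC counts stored as `uint16_t`; the function name and the choice of waveform are hypothetical, and any pattern you can verify by eye would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADC_MAX_COUNT 4095u  /* assumes a 12-bit ADC */

/* Fill buf with n samples of a triangle wave that ramps from 0 up to
 * ADC_MAX_COUNT and back down to 0 every `period` samples.
 * `period` must be even and at least 2. */
static void fake_fill_triangle(uint16_t *buf, size_t n, size_t period)
{
    assert(buf != NULL && period >= 2 && period % 2 == 0);
    size_t half = period / 2;
    for (size_t i = 0; i < n; i++) {
        size_t phase = i % period;
        /* ramp runs 0..half on the way up, then back towards 0 */
        size_t ramp = (phase <= half) ? phase : period - phase;
        buf[i] = (uint16_t)(ramp * ADC_MAX_COUNT / half);
    }
}
```

Calling this in place of the real SPI+DMA fill routine lets the processing and display code run unchanged on a desk with no I/O board attached.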

How long that takes (tickle an output bit and measure the time) will give you an idea of whether your A/D to SPI is fast enough. With all that working, if you have the hardware and the ability to program a module, you can now simulate the next step, which is to create the artificial data on the external board and transfer it to the BBB via that SPI+DMA link. If the data ends up looking the way it’s supposed to on the screen, then you can finally start using real data.
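
If no scope is at hand for the output-bit method, a software-only approximation is to time the processing step with the POSIX monotonic clock and compare it with the time it takes to fill one buffer. The sketch below assumes a Linux userspace program; `time_call_ns`, `keeps_up`, and `demo_work` are hypothetical names, and `demo_work` is only a placeholder for the real post-processing and drawing code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Run process(ctx) once and return how long it took in nanoseconds. */
static int64_t time_call_ns(void (*process)(void *), void *ctx)
{
    int64_t start = now_ns();
    process(ctx);
    return now_ns() - start;
}

/* Nonzero when processing one buffer finishes before the next one fills. */
static int keeps_up(int64_t process_ns, int64_t fill_ns)
{
    return process_ns < fill_ns;
}

/* Placeholder workload standing in for the real processing step. */
static void demo_work(void *ctx)
{
    volatile uint32_t *acc = ctx;
    for (uint32_t i = 0; i < 100000u; i++)
        *acc += i;
}
```

A single measurement on a loaded Linux system can be noisy, so it is worth repeating the call and looking at the worst case rather than the average.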

John

Fishayyy · March 17, 2022, 3:01am

I think for our purposes we want to try to use UART, since we want to send data directly to the PRU so that we can offload as much work from the ARM processor as possible. The more I look around, the more it seems like we should be trying to use the L3 interconnect to pass the data to a buffer in the DDR3 RAM. From there I’m not sure if the ARM processor should read the data out of the DDR3 directly or if we should be trying to move data from DDR3 to the ARM’s internal 64KB with the EDMA. We don’t really need the timestamps like you did, but the main goal for us is to offload as much work from the ARM processor as possible and to reduce the latency of data transfers between memory regions as much as possible.

I did find some good info here: Fastes way to copy large amounts of data from pru to arm