Remote Proc and examples to fill PRU data memory, store/read data in PRU shared memory

Paul_McManus · September 13, 2016, 2:22pm

Hi,
I am investigating the BeagleBone Black PRUs at the moment for data acquisition and/or transmission. For context, I’ll explain what I want to do. I want to use a 16-bit ADC/DAC at 100 kSps. It is a half-duplex system, so I can use both PRUs to receive, then load new firmware and transmit. The start of transmission must be accurate to 1 us. Hopefully that explains what I am trying to do.

I am using the PRU Software package 4.0.2. My kernel is Linux Beaglebone 4.4.9-ti-r25. I have the CCSv6 environment set up and I can build and run examples on the PRU. I have also built a user space example (PRU Lab 6 user space), and it is working fine sending strings to and from both PRUs using RPMsg. Great so far…

I would like to extend the user space example to allow me to fill PRU memory (basically sine wave sample data to use as a carrier for modulation), or to write any samples I want to transmit directly from user space. Any user space examples that use pruss_intc.ko to send/receive interrupts to and from the PRUs would also be good.
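A minimal sketch of generating such a sine table on the ARM side, ready to be copied into PRU memory. The 5 kHz carrier frequency and the full-scale amplitude are illustrative assumptions, not values from this post; only the 16-bit sample width and the 100 kSps rate come from the description above.

```python
import math
import struct


def sine_table(num_samples, amplitude=32767):
    """Return one full sine period as signed 16-bit integer samples."""
    return [
        int(round(amplitude * math.sin(2.0 * math.pi * n / num_samples)))
        for n in range(num_samples)
    ]


def pack_samples(samples):
    """Pack samples as little-endian int16 (ARM and PRU are both little-endian)."""
    return struct.pack("<%dh" % len(samples), *samples)


# Hypothetical carrier: 5 kHz sampled at 100 kSps gives 20 samples per period.
SAMPLE_RATE_HZ = 100_000
CARRIER_HZ = 5_000  # assumption for illustration only

table = sine_table(SAMPLE_RATE_HZ // CARRIER_HZ)
payload = pack_samples(table)  # bytes to be written into PRU memory
```

Storing a single period and letting the PRU loop over it keeps the table small; a longer table is only needed when the carrier period is not a whole number of samples.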

The data I want to load will fill most of the data memory in the PRU, and I don’t want to have to load it piece by piece through RPMsg, which looks to be limited to a maximum of 512 bytes per transfer. I am hoping to trigger commands through RPMsg for the PRU to read/write data to shared or ARM DDR memory. There are plenty of examples on the PRU side that read/write shared or DDR memory, so I should be able to plod through the examples to create my code for the PRU.

However, I can’t see any example on the ARM side. Is it possible for a user space program to read/write shared memory? Or to allocate a section of DDR memory for each PRU to write to that nothing else will touch?

Also, once the ARM is through with writing to shared or DDR memory, the PRU generates an ARM host interrupt (EVOUT1 to EVOUT7). Does pruss_intc.ko map these interrupts to user space, or allow a callback to be added?

I know that is a lot of questions. Apologies if they are basic, I am new to this.

Regards,
Paul

Christopher_Hopwood · September 13, 2016, 5:09pm

Hi Paul,

You may wish to see some code I wrote for a quadcopter project in college. One of the pieces of code used the PRU and shared DDR memory to transfer images from a camera to the ARM.

https://github.com/Rose-Hulman-ROBO4xx/1314-BeagleBone-Quadcopter/tree/master_rev2/code/ControlAlgorithm/quadcopter_apps/camera

This may be out of date information however. It has been a while since I used PRUs. But hopefully studying my code will be enough to get you started!

Thanks,
Chris

Christopher_Hopwood · September 13, 2016, 5:13pm

In addition, you will need to do the following to allocate some DDR memory for the PRU:
modprobe uio_pruss extram_pool_sz=0x160000
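Once the module is loaded with an external RAM pool, user space needs the physical address and size of that pool. A minimal sketch of reading them from sysfs follows. It assumes the pool is exposed as `map1` of `uio0`, which is where the prussdrv library looks for it; check the `maps` directory on your own system before relying on that. The helper name `read_uio_map` is hypothetical.

```python
import os


def read_uio_map(map_index, uio_root="/sys/class/uio/uio0"):
    """Return (physical_address, size) of one UIO memory map.

    sysfs reports both values as hexadecimal strings such as "0x9e6c0000".
    """
    map_dir = os.path.join(uio_root, "maps", "map%d" % map_index)
    with open(os.path.join(map_dir, "addr")) as f:
        addr = int(f.read().strip(), 16)
    with open(os.path.join(map_dir, "size")) as f:
        size = int(f.read().strip(), 16)
    return addr, size
```

On the board this would be called as `read_uio_map(1)`; the returned address is what the PRU firmware needs to be told so that it writes into the reserved pool.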
Keep in mind that since the PRU and ARM will now be sharing the bus, it may slow down the system. Also I don’t believe DDR memory access is guaranteed to be deterministic like the other PRU commands.

Christopher_Hopwood · September 13, 2016, 5:23pm

I found this site by googling “accessing PRU ddr shared memory”:

http://credentiality2.blogspot.com/2015/09/beaglebone-pru-ddr-memory-access.html

It seems pretty straightforward.

Paul_McManus · September 13, 2016, 5:26pm

Hi,
Thanks Chris, it looks good if I use the prussdrv/uio_pruss setup. It looks like TI is pushing remoteproc now instead of uio_pruss; there seems to be an argument going on about whether uio_pruss should be improved or replaced with Remote Proc/RPMsg. The kernel I have uses the RPMsg/Remote Proc kernel objects.

… However, if I am not having any joy with this I will try to rebuild the kernel to use prussdrv and use your example as a guide. I’ll struggle on for a day or so; if not, I’ll move to prussdrv. It looks as if you are doing what I need.

Thanks again for replying.
Cheers,
Paul

Greg1 · September 13, 2016, 10:27pm

Hi Paul-

I’m using the RPMsg character device to transfer 16 bit data PRU to ARM at a rate of 8ksps. This is a low rate compared to your requirement, however, I am not seeing significant ARM processor loading at this rate. I don’t know the practical upper limit of RPMsg as deployed in the PRU examples, but perhaps 100ksps is not out of the question.
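A minimal sketch of pulling 16-bit samples out of an RPMsg character device from user space. The device name `/dev/rpmsg_pru30` and the 496-byte read size are assumptions based on TI's PRU lab examples, and the little-endian unsigned sample format is assumed as well; none of these are details of the setup described in this post.

```python
import struct


def unpack_samples(message):
    """Interpret a message payload as little-endian unsigned 16-bit samples.

    A trailing odd byte, if any, is ignored.
    """
    count = len(message) // 2
    return list(struct.unpack("<%dH" % count, message[: count * 2]))


def read_samples(device_path="/dev/rpmsg_pru30", max_bytes=496):
    """Read one message from the character device and decode it."""
    with open(device_path, "rb", buffering=0) as dev:
        return unpack_samples(dev.read(max_bytes))
```

At 248 samples per message, 8 ksps needs roughly 32 messages per second and 100 ksps roughly 400, which gives a feel for how the message rate scales with the sample rate.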

My impression is that the kernel programmers have a very good tool box for efficient handling of data, and I assume the RPMsg took advantage of these tools.

My present scheme is to use one character device as a data stream, and another for PRU control functions. I’m not deep into it yet, so I can’t comment on the practicality of this scheme with the remoteproc/RPMsg framework.

Regards,
Greg

William_Hermans · September 14, 2016, 5:35am

From userland, using /dev/mem + mmap(), it is possible to get ~3MB/second worth of samples from the AM335x processor's on-die ADC. Granted, many of these samples are redundant, but I wrote a C application to do this, just to see how much such an application could handle.

Theoretically, the PRUs should be capable of much more, since the code to read from an ADC would not be loading the AM335x processor. I do not recall how much the application I experimented with was loading down the AM335x processor under Linux, but I want to say it was pretty much maxed out.

William_Hermans · September 14, 2016, 5:45am

Perhaps this will also help ?

http://processors.wiki.ti.com/index.php/AM335x_PRU_Read_Latencies

William_Hermans · September 14, 2016, 5:56am

Also, the on-die ADC is capable of 200 ksps, so it should easily handle the sample rate you need. As for your 1 us latency expectation . . . a single PRU instruction cycle is 5 ns. So we’re talking what? 200 total cycles to play with? It could be tight getting a sample from the ADC into memory, depending.
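The cycle budget follows directly from the 5 ns instruction cycle (a 200 MHz PRU clock). A small sketch of the arithmetic, using the 1 us window mentioned here and the 100 kSps sample rate from the original question:

```python
PRU_CLOCK_HZ = 200_000_000  # one instruction cycle every 5 ns


def cycles_per_period(rate_hz, clock_hz=PRU_CLOCK_HZ):
    """Number of PRU instruction cycles available per period at rate_hz."""
    return clock_hz // rate_hz


budget_1us = cycles_per_period(1_000_000)  # 1 us window -> 200 cycles
budget_100k = cycles_per_period(100_000)   # 100 kSps    -> 2000 cycles
```

This counts single-cycle instructions only; accesses that leave the PRU subsystem take longer, so the usable budget is smaller than these figures.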

Paul_McManus · September 14, 2016, 9:12am

Hi,
Thanks to everyone who has got in touch. I have a few ideas now. I’ll give /dev/mem and mmap() a go, since I know the absolute addresses of the memories I want to read from and write to. Thanks, William.
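A minimal sketch of the /dev/mem + mmap() approach. The addresses are assumptions taken from the AM335x memory map (PRU-ICSS at 0x4A300000, 12 KB shared RAM at offset 0x10000) and should be checked against the Technical Reference Manual; the helper names are hypothetical. Opening /dev/mem requires root.

```python
import mmap
import os
import struct

PRUSS_BASE = 0x4A300000      # assumed AM335x PRU-ICSS base address
SHARED_RAM_OFFSET = 0x10000  # assumed offset of the 12 KB shared data RAM
SHARED_RAM_SIZE = 12 * 1024


def map_region(path, phys_addr, length):
    """Map `length` bytes starting at `phys_addr` (must be page aligned)."""
    fd = os.open(path, os.O_RDWR | getattr(os, "O_SYNC", 0))
    try:
        return mmap.mmap(fd, length, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE, offset=phys_addr)
    finally:
        os.close(fd)  # mmap keeps its own duplicate of the descriptor


def write_u16(region, offset, values):
    """Store little-endian 16-bit values at a byte offset in the mapping."""
    struct.pack_into("<%dH" % len(values), region, offset, *values)


def read_u16(region, offset, count):
    """Load `count` little-endian 16-bit values from a byte offset."""
    return list(struct.unpack_from("<%dH" % count, region, offset))
```

On the board the shared RAM would be mapped with `map_region("/dev/mem", PRUSS_BASE + SHARED_RAM_OFFSET, SHARED_RAM_SIZE)`. Nothing here synchronises with the PRU, so the firmware and the ARM side still need an agreed handshake (a flag word or an interrupt) before either reads what the other wrote.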

Greg, yes, I am worried about how fast I can get RPMsg to spit out the ADC data, or to pass in samples to transmit. It looks like a lot of overhead. Here is more information on what I am doing, or at least trying to do.

If I timestamp the incoming data using the IEP timer at 1 MHz, then yes, I will have 200 instruction cycles (theoretically) to play with between samples. I want a free-running counter (to use as my data timestamp) stored in shared memory, accessible to the ARM and the other PRU. I also have another precision signal, a 1 pulse per minute, to track drift in the internal clock. This will come in as a rising edge on a GPIO. I will timestamp it when it is seen, and this will allow me to measure samples in between my 1 sec precision clock.
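A small sketch of how the drift could be estimated on the ARM side from two timestamps of consecutive reference pulses. The 32-bit counter width is an assumption, and `nominal_ticks` is simply the timestamp rate multiplied by the reference pulse interval; both function names are hypothetical.

```python
def ticks_elapsed(start, end, counter_bits=32):
    """Ticks between two readings of a free-running counter.

    Handles a single wraparound of the counter between the readings.
    """
    return (end - start) % (1 << counter_bits)


def drift_ppm(measured_ticks, nominal_ticks):
    """Clock drift in parts per million relative to the reference pulse.

    Positive values mean the local timer runs fast.
    """
    return (measured_ticks - nominal_ticks) / nominal_ticks * 1e6
```

For example, counting 1,000,010 ticks of a 1 MHz timestamp clock across a one-second reference interval corresponds to a drift of about 10 ppm.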

Every 10 polls will give me the sample clock (really the SPI rate from the chip). I will feed this out via GPO to an ADC chip and use it as the SPI clock to clock the data in. I will then use the C28 pointer in the PRU to write the data, together with its receive timestamp, into DDR or shared memory (I am not sure yet which is best for me), and trigger a host interrupt via R31 to tell the ARM there is new data to process. RPMsg will be used for configuration, turning the read off/on and halting the PRU.

Chris, yes, looking at what I want to do, uio_pruss (prussdrv) seems to give me that flexibility; however, it looks like it is being dropped in favour of RPMsg.

Also I am looking at building, then modifying the PRU Cape Demo example. I have some linker errors at the moment I am trying to fix, I have probably missed some detail in the build instructions below.

http://processors.wiki.ti.com/index.php/PRU_Cape_Getting_Started_Guide
http://processors.wiki.ti.com/index.php/PRU_Cape:_Building_Demos

They don’t seem to use prussdrv (I don’t have anything against prussdrv, it just seems that TI is moving away from it to RemoteProc/RPMsg). The PRU Cape example seems to be mostly dereferenced pointers to PRU registers and memory, so I was also going to give that a try.

I need to work on something else for a few days. However, I’ll keep the topic going and record my progress (or otherwise) in case anyone else finds it useful in the future.

Best Regards,
Paul

Paul_McManus · September 14, 2016, 10:12am

Hi,
Thanks to everyone who replied.

William, yes, I would use the PRU IEP timer on one of the PRUs to generate a 100 kHz timer for timestamping data. That gives me 2000 cycles, theoretically, to play with. If the main loop takes longer I could always poll multiple times for the interrupt. On every 100 kHz interrupt I will generate a sample SPI transfer clock via GPO to go to an ADC, and increment a counter in shared memory. When I clock the data in I will read the 100 kHz timer value and store the data, the counter and the current 100 kHz timer value in DDR memory. I also have a one-second precision timer input; for it I will store the counter value and the current 100 kHz timer value in shared memory using the C28 pointer on the PRU. The one-second precision timer will be used to track drift in the clock over time. The timer/sample can run for months. I will look at the /dev/mem mmap option.
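A sketch of how the ARM side might decode the records described above (data, counter, current timer value). The layout is entirely hypothetical: a 16-bit sample, two padding bytes to keep the 32-bit fields aligned, a 32-bit interrupt counter and a 32-bit timer value, all little-endian. The PRU firmware would have to write exactly the same layout.

```python
import struct

# Hypothetical record: u16 sample, 2 pad bytes, u32 counter, u32 timer value.
RECORD = struct.Struct("<HxxII")


def pack_record(sample, counter, timer):
    """Build one record as the PRU firmware is assumed to write it."""
    return RECORD.pack(sample, counter, timer)


def unpack_records(buffer):
    """Decode every complete record in `buffer` into (sample, counter, timer)."""
    usable = len(buffer) - len(buffer) % RECORD.size
    return [RECORD.unpack_from(buffer, off)
            for off in range(0, usable, RECORD.size)]
```

With this layout each record is 12 bytes, so 100,000 records per second amounts to 1.2 MB of DDR per second; dropping the per-sample timer value, or logging it only once per block, would cut that considerably.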

I am also looking at the PRU Cape example. I am struggling to build it because of a linker issue, but I’ll plod on. It doesn’t seem to use prussdrv. It has memory-mapped registers that it uses to control the PRU configuration. It uses the StarterWare libraries, which are also old.

Chris, yes, the uio_pruss/prussdrv examples seem to give me better control than Remote Proc/RPMsg. I am thinking of trying to rebuild the kernel with the uio_pruss kernel objects instead, then using examples like yours. It seems there are lots of them. I just can’t see many examples using RemoteProc/RPMsg.

Greg, yes, I am concerned about RPMsg being fast enough to transfer the data. I think getting the PRU to spit the data into DDR memory through the C28 pointer would be faster, with a user space program then using mmap() on /dev/mem to read the data. I want to generate an R31 host interrupt to tell the ARM that data is in DDR memory, but I can’t see how to receive this interrupt in user space. For the older uio_pruss/prussdrv there are plenty of examples; I can’t find any Remote Proc/RPMsg examples.
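For the uio_pruss route, the generic Linux UIO mechanism delivers interrupts to user space through a blocking read: reading 4 bytes from the UIO device returns the running interrupt count. A minimal sketch follows. It assumes that /dev/uio0 corresponds to the first PRU host event under uio_pruss, which should be verified on the target; the function names are hypothetical.

```python
import os
import struct


def open_event_device(path="/dev/uio0"):
    """Open the UIO device once and keep it open between waits."""
    return os.open(path, os.O_RDONLY)


def wait_for_event(fd):
    """Block until the device signals an interrupt; return the event count.

    UIO reports the total number of interrupts as a native 32-bit integer.
    """
    data = os.read(fd, 4)
    if len(data) != 4:
        raise OSError("short read from UIO device")
    return struct.unpack("=i", data)[0]
```

After each wake-up the system event still has to be cleared in the PRU interrupt controller before the next one can be seen; the prussdrv library does this with `prussdrv_pru_clear_event()`. This sketch does not apply to the remoteproc/RPMsg drivers.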

I would probably prefer going to uio_pruss, but I am concerned that it will not be supported, as the newer images for the BeagleBone Black use RPMsg. Confused…

I need to work on something else for a few days, but I will track my progress, or lack of it, here in case it is useful for other people in the future.
Best Regards,
Paul

DTJF · September 14, 2016, 3:26pm

Hi Paul!

Christopher_Hopwood · September 14, 2016, 5:49pm

I second TJF. It may end up being a case like this: https://xkcd.com/927/

See if RPMsg fits your needs; if it doesn’t, try using pruss…

William_Hermans · September 15, 2016, 12:41am

Hi Paul,

I would advise against writing any “production” code using /dev/mem + mmap(), for the simple reason that it introduces latency that is not predictable enough for your purposes. With that said, it would be OK for “test code” where timing is not too important.

I would also agree with TJF, and the information from the link he gave you will work. I’ve tested it personally. So . . . here is an example someone wrote back in . . .2014 looks like: https://groups.google.com/forum/#!msg/beagleboard/0a4tszlq2y0/SQ-Vwyr9A_AJ

Youngtae Jo has all the code and steps he took to make everything work. I have not personally reviewed or used the code, but it should be at least a good read to get an idea of what needs doing.

Paul_McManus · September 16, 2016, 10:26am

Hi,
Thanks for the link. I updated the kernel to the latest, 4.4.20-ti-r44, followed the instructions, and the uio kernel objects are loaded. I am just working through the examples now. It will be easier to use uio for my application.
Cheers,
Paul

David_Edwards · January 15, 2018, 5:44pm

Paul,

Any updates on how this project turned out? I’m looking at doing something similar:

Use PRU0 as a SPI master to capture external 16-bit ADC data at 500kS/s
Store around 30MB of samples in DDR
Change PRU0 firmware to be a SPI slave to transfer these samples to an external host with a SPI clock rate in the neighborhood of 40MHz (~2.5MB/s)

Thanks,

Dave