TI's RPMsg Examples Significantly Changed

jkridner · June 16, 2016, 7:10pm

Nice request.

I’d suggest putting things into the am335x_pru_package on GitHub, but I know there are some issues in bringing back code into TI. I’d just suggest updating that same sort of package such that we can merge the deltas, but one place with a full experience.

Thoughts?

John_Syn · June 16, 2016, 7:42pm

Looking at am335x_pru_package, I see things like loaders which conflict with RemoteProc so I’m not sure that is such a good idea. Are you proposing to modify the TI examples to work with UIO_PRUSS? That would be a horrible idea as I have already described the limitations of UIO_PRUSS.

TI already have a set of examples and a step by step process on how to build and use these examples, but I’m talking about augmenting the documentation to make it easier to understand/use. I’m also proposing to extend the existing examples with real world examples that can serve as a template for developers. I’m not sure what Jason Reeder’s schedule looks like, but either he can assist with the development, or just monitor the development of the examples in such a way that he can take ownership of the examples.

Given that RPMSG/RemoteProc is to replace UIO-PRUSS, my only goal here is to get the community behind RPMSG/RemoteProc so that we are all pulling in the same direction. At the moment, developers are having a difficult time understanding the framework and this in large part is why we are seeing resistance to change.

Finally, developers have complained about poor performance, and perhaps we need to investigate why they are seeing this. Perhaps there is something in the framework, or perhaps their implementation is wrong.

Regards,
John

tcmichals · June 16, 2016, 9:39pm

Also, I would suggest looking at open-amp. There is ongoing work to create an rpmsg framework for multiple platforms, e.g. i.MX6SX (Cortex-A9/M4) and Zynq/Xilinx:

able to create larger rpmsg sizes
baremetal library
etc…

Jason_Reeder · June 16, 2016, 9:40pm

John,

Have you seen our PRU-ICSS landing page: http://processors.wiki.ti.com/index.php/PRU-ICSS
and also the Remoteproc/rpmsg sub page on that wiki: http://processors.wiki.ti.com/index.php/PRU-ICSS_Remoteproc_and_RPMsg

If so, let me know which parts are unclear/insufficient and I can work to improve those. Of course, all of the work and documentation on that wiki will be geared toward the TI Processor SDK Linux distribution.

Keep in mind that the latest changes to the pru-software-support-package rpmsg examples are tightly coupled to the current work that Suman is doing on an upcoming 4.4 kernel from TI. So the latest examples are not going to work until the Linux drivers are updated to use interrupts instead of mailboxes as well, which is why I revved the major version of the package to v5.

I would love to see the pru-software-support-package and rpmsg pick up steam in the community. However, any work that would not benefit the TI Linux distribution directly will have to be done at home on my own time. I’m not opposed to that idea though as my Beaglebone Green Wireless just arrived in the mail this afternoon and I’ll be needing to get more familiar with the community distribution anyway.

Jason Reeder

jkridner · June 16, 2016, 9:42pm

The repository includes a number of documents, providing a bit of a one-stop-shop for PRU documentation. A migration guide from UIO_PRUSS to REMOTEPROC would seem reasonable to add. There’s also source to an assembler.

More responses below…

Looking at am335x_pru_package, I see things like loaders which conflict with RemoteProc so I’m not sure that is such a good idea. Are you proposing to modify the TI examples to work with UIO_PRUSS? That would be a horrible idea as I have already described the limitations of UIO_PRUSS.

No, I’m saying that the existing UIO_PRUSS examples should be made to continue to work on the latest kernels. This might require a way to transition control from kernel to userspace for the PRU control registers.

The inclusion of additional REMOTEPROC-only examples could illustrate why to migrate. It seems to me some of the PRU GSoC projects like BeagleLogic might also be suitable to integrate here, as we haven’t gotten them integrated into the standard images otherwise yet.

TI already have a set of examples and a step by step process on how to build and use these examples, but I’m talking about augmenting the documentation to make it easier to understand/use.

I agree that the documentation should be the focus. I thought the older round of documentation wasn’t so bad, but there is much more to document now with REMOTEPROC. I’m just saying that making the documentation comprehensive in one place would be ideal.

I’m also proposing to extend the existing examples with real world examples that can serve as a template for developers. I’m not sure what Jason Reeder’s schedule looks like, but either he can assist with the development, or just monitor the development of the examples in such a way that he can take ownership of the examples.

Monitoring the external development of extending these examples, including the GSoC work which is on-going, would be great. Not sure if he can commit to that.

I’ll just note again that that particular repository has external patches already and cannot be integrated back into a release by TI (at least not at any reasonable level of effort).

Given that RPMSG/RemoteProc is to replace UIO-PRUSS, my only goal here is to get the community behind RPMSG/RemoteProc so that we are all pulling in the same direction. At the moment, developers are having a difficult time understanding the framework and this in large part is why we are seeing resistance to change.

Agreed, which is why I want to meet them at the information source they are using today. You can tell by the reaction that it is an important resource and care should be extended to updating it. It seems to me the right place to try to prove the value of REMOTEPROC/RPMSG.

Finally, developers have complained about poor performance, and perhaps we need to investigate why they are seeing this. Perhaps there is something in the framework, or perhaps their implementation is wrong.

Yeah, I’d like to know more details about that as well. I have a near-term need to migrate some UIO_PRUSS code (https://github.com/StrawsonDesign/Robotics_Cape_Installer/tree/master/install_files/pru) to REMOTEPROC and the author has some of this same concern.

John_Syn · June 16, 2016, 10:32pm

Regards,
John

Yeah, this is my main starting point, but I hadn’t noticed the PID motor control demo and that looks like an excellent example of how to use RPMSG/RemoteProc.

I don’t recall seeing this doc before, which goes to the bigger point: the layout of the wiki is just horrible. Rather than forcing a developer to click on each and every link, it would be better to lay out the wiki in a way that reflects how a developer will learn each aspect of the framework. The layout of this document is probably a good place to start, with links for each block providing more details. For example, VRING: what is it, how do I use it, what are the limitations, etc. The section describing the resource table should include the layout and how it is used. Some developers want to use GCC and not TI’s proprietary C compiler, so what needs to change to use GCC?
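As a rough illustration of the resource table layout asked for above: in the Linux remoteproc headers the table starts with a small header (`ver`, `num`, two reserved words), followed by `num` byte offsets, and each entry begins with a 32-bit type. The Python sketch below only decodes that header and the entry types. The function name and the sample bytes are made up for the demonstration and do not come from any TI firmware.

```python
import struct

# Entry types as listed in include/linux/remoteproc.h (enum fw_resource_type).
RSC_TYPES = {0: "CARVEOUT", 1: "DEVMEM", 2: "TRACE", 3: "VDEV"}


def parse_resource_table(blob: bytes):
    """Return (version, [(offset, type_name), ...]) for a raw resource table."""
    # Header: u32 ver, u32 num, u32 reserved[2]; then u32 offset[num].
    ver, num, _reserved0, _reserved1 = struct.unpack_from("<4I", blob, 0)
    offsets = struct.unpack_from(f"<{num}I", blob, 16)
    entries = []
    for off in offsets:
        # Every entry starts with a u32 type field at its offset.
        (rsc_type,) = struct.unpack_from("<I", blob, off)
        entries.append((off, RSC_TYPES.get(rsc_type, f"UNKNOWN({rsc_type})")))
    return ver, entries


# Hypothetical table: version 1, one entry of type VDEV placed right after
# the 16-byte header and the single 4-byte offset slot (so at offset 20).
sample = struct.pack("<4I", 1, 1, 0, 0) + struct.pack("<I", 20) + struct.pack("<I", 3)
print(parse_resource_table(sample))
```

A real table carries more data after each type field (vring addresses, sizes, names); those fields are deliberately left out here.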

Another point, when I look at “lsmod”, I see pruss_remoteproc, virtio_rpmsg_bus and rpmsg_pru, not pruss_rproc and pruss as described in this document so an explanation of what changed and when is necessary.

For the most part, you have most of what you already need. It is just badly organized and the TI acronyms need to be explained. Also, I’ve seen several PRU presentations that I don’t see in this wiki. I’ll have to look for these links.

I hear from the UIO-PRUSS developers about how great their documentation is. Perhaps it isn’t the UIO-PRUSS doc that is good, but one of the libraries such as libpruio, which leads the developer through an incremental learning curve. The examples are explained line by line. Maybe one of the UIO-PRUSS developers can direct us to what they consider excellent documentation.

Yeah, Suman made us aware of this issue, but the changes to the examples look pretty straight forward.

My thinking is it shouldn’t matter if we are using TI’s distribution or Robert Nelson’s Debian distribution. The framework should be the same and the examples should work on both distributions.

I want to thank you for giving us your input and for all the hard work you have done on RPMSG/Remoteproc.

Regards,
John

Greg1 · June 16, 2016, 11:00pm

Hi Jason-

I’m confused and I hope you can clear up things a bit.

I’ve got the older version of the pru package which works with the mailbox.
I was working with this just last week, and it compiled and worked perfectly with the rpmsg device appearing in /dev.
This is the example (similar to lab 5) PRU_RPMsg_Echo_Interrupt0.

I just updated the kernel to beaglebone 4.4.12-ti-r31.

Unfortunately I did not record the former kernel in which everything ran OK.
Now it is still compiling OK, but the firmwares do not run, as seen in dmesg.

Anyway, the most interesting thing is the appearance of a new loadable module.
Here is a partial listing from lsmod:

pru_rproc 12632 0
pruss_intc 7223 1 pru_rproc
pruss 9408 0
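Since several generations of module names come up in this thread, a check like the one above can be scripted. The sketch below is only an illustration: it parses text in the `/proc/modules` format (which is what `lsmod` reads) and reports which of the names mentioned in this thread are present. The list of names is taken from the posts here, other kernels may use different ones, and the function names are hypothetical.

```python
# Module names quoted in this thread; this is not an exhaustive or
# authoritative list for any particular kernel.
PRU_MODULES = (
    "pruss_remoteproc", "pru_rproc", "pruss_rproc",
    "pruss_intc", "pruss", "virtio_rpmsg_bus", "rpmsg_pru",
)


def loaded_pru_modules(proc_modules_text: str):
    """Return the PRU-related module names found in /proc/modules text."""
    # The first whitespace-separated field of each line is the module name.
    loaded = {line.split()[0]
              for line in proc_modules_text.splitlines() if line.strip()}
    return [name for name in PRU_MODULES if name in loaded]


def read_proc_modules(path="/proc/modules"):
    """Read the module list, or return an empty string if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(loaded_pru_modules(read_proc_modules()))
```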

This is the first time I have seen pruss_intc.
Modinfo indicates that this is the work of Andrew F. Davis.
Is this at least a partial release of what you describe as
an “upcoming 4.4 kernel from TI”?

Regards,
Greg

Suman_Anna · June 16, 2016, 11:17pm

Hi Greg,

Yes, we have introduced pruss_intc as a new module on the 4.4 kernel, and this module now manages the PRUSS INTC. It provides the irqchip/irqdomain which will allow client users to use standard DT properties for listing the PRU system events as interrupts and use standard Linux APIs. There is still some more work to be done there (the ability to specify the system event to PRU channel to host interrupt mapping from DT, rather than having to provide that mapping data in the firmware resource table), so that the MPU-side clients can be cleanly separated and depend on Linux infrastructure alone.

I am not sure how much the kernel you are using has caught up to the changes I have been doing on my tree, but there are a few changes over the last week where we added and switched over to PRU system events instead of mailboxes for scalability purposes (mailboxes would work too provided you choose mailboxes in DT over interrupts). This is what Jason was referring to as v5.0.0.

Regards

Suman

John_Syn · June 16, 2016, 11:41pm

I think this is key. The layout should foster an incremental learning process, starting with a 10,000 foot overview and then encompassing the ability to drill down to learn the details with suitable examples.

From Jason Reeder’s response, it looks like we have his support. I think it is important that we try to maintain a single code base for both the TI distribution and Robert Nelson’s Debian releases. This way Jason Reeder won’t have to do this work at home. From a maintenance point of view I think this makes the most sense also.

I’m sure there are many BeagleBoard developers who are going to be very happy to hear this.

http://theembeddedkitchen.net/beaglelogic-goes-kernel-mode-with-pru-remoteprocweek-2-3/190

This shows that libprussdrv was too slow and the project had to change to remoteproc to improve throughput, which is contrary to what the UIO-PRUSS developers are saying.

Regards,
John

John_Syn · June 16, 2016, 11:45pm

From what Jason Reeder was saying, you need to update to pru-software-support-package V5.

https://git.ti.com/pru-software-support-package

Regards,
John

Greg1 · June 17, 2016, 1:06am

Hi Suman, that confirms what I suspected about this new module pruss_intc.
I am going to continue to experiment with the old and new PRU package and see if I can determine the problem.
I think I need to look at the device tree I am using and see if it has the required properties.

Regards,
Greg

William_Hermans · June 17, 2016, 1:07am

The constant uninformed assertion that everything is faster if handled by userspace reflects on the struggles we’ve had to communicate the value of working in the kernel process.

I have not seen anyone making that claim in any of these posts. Everyone knows that kernel space is faster, or if they do not, they should.

Now if you mean people like me claiming that uio_pruss is faster. Well, just because UIO drivers are partly userspace drivers, it does not mean these drivers do not have a kernel side driver too.

Now, where I’ve gotten my information about remoteproc being slower has been from posts on this group, as well as posts on the web discussing other hardware. Those did not mention uio_pruss per se (as that’s a Beagle-only driver, as far as I’m aware).

William_Hermans · June 17, 2016, 1:19am

@ Jason Reeder

I have seen much of your documentation on the TI wiki pages, as I spent a week or two, a bit at a time, attempting to get something working to test remoteproc. Here, one could probably very easily duplicate exactly what you’ve done, and get exactly what you’ve demonstrated working. The problem is that this does not teach anyone anything. So, if I for example wanted to write host code using GCC instead of CCS, there is not enough information for anyone to make this happen easily WITHOUT digging into the source code, or poring over what little information there is on the web about remoteproc. By the way, most of that outside information is irrelevant because those platforms do not have PRUs.

So, I stopped attempting to test remoteproc, because there is not enough good documentation on the subject. Exact step guides are useless if all you really need is to know exactly what needs doing for this to work. How does one write PRU config (hex) files? What is the purpose of these files, and what is a minimal example? Which drivers are needed? Where is the API documentation?

You all need to make this dead simple to set up, no matter where a developer is coming from. Otherwise you’re going to end up with a bunch of very experienced, pissed-off developers who do not even want to bother with remoteproc. This means that exact step guides for CCS only will not cut it.
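On the host side at least, no CCS should be needed: the thread mentions an rpmsg character device appearing in /dev once the firmware is running. The sketch below is a minimal illustration of talking to such a device from user space. It assumes the device is named `/dev/rpmsg_pru30` (that name is an assumption and depends on the firmware's channel settings) and that the firmware echoes back whatever it receives; the function name is hypothetical. The 496-byte limit is the 512-byte rpmsg buffer minus the 16-byte rpmsg header.

```python
import os

RPMSG_DEVICE = "/dev/rpmsg_pru30"  # assumption: depends on firmware/channel
MAX_PAYLOAD = 512 - 16             # rpmsg buffer size minus rpmsg header


def rpmsg_echo(message: bytes, device: str = RPMSG_DEVICE):
    """Send one message and return the reply, or None if the device is absent."""
    if len(message) > MAX_PAYLOAD:
        raise ValueError("message does not fit into one rpmsg buffer")
    try:
        fd = os.open(device, os.O_RDWR)
    except OSError:
        return None  # driver not loaded, firmware not running, or no permission
    try:
        os.write(fd, message)            # one write is one message to the PRU
        return os.read(fd, MAX_PAYLOAD)  # blocks until the PRU answers
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(rpmsg_echo(b"hello PRU"))
```

The same open/write/read sequence works from C compiled with plain GCC; Python is used here only to keep the sketch short.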

William_Hermans · June 17, 2016, 1:23am

Also, for the record, I was very easily able to get the uio_pruss examples working using gcc from a Debian Wheezy x86 command line. Dead simple.

RobertCNelson · June 17, 2016, 1:50am

As of today, (4.4.12-ti-r31) we are sync'ed up with:

http://git.ti.com/gitweb/?p=ti-linux-kernel/ti-linux-kernel.git;a=shortlog;h=refs/heads/ti-rt-linux-4.4.y

http://git.ti.com/gitweb/?p=ti-linux-kernel/ti-linux-kernel.git;a=commit;h=39452940d0e4710677b7e50638885cc27d2ce70c

Which includes the mailboxes -> PRU system events replacement:

http://git.ti.com/gitweb/?p=ti-linux-kernel/ti-linux-kernel.git;a=commit;h=4cc71c5a52a1a9631a11b22524b2455baa3956ed

Regards,

John_Syn · June 17, 2016, 3:07am

Hi William,

I think it would be helpful for Jason to see the pruss-uio docs you think are well written. I don’t want to provide Jason with a list of complaints, but rather a list of helpful suggestions that might guide him to a better solution.

Regards,
John

John_Syn · June 17, 2016, 3:25am

Hi Jason Reeder,

I think William forgot to include you in his response. As you can see, William is an experienced developer who had difficulty understanding the RPMSG/Remoteproc framework. A lot of this, I believe, is TI’s tendency to use terminology that it assumes the reader is familiar with, and in most cases we are not. I agree with William that canned examples are no substitute for a document that explains how something works. For example, what are VRings? Now I know that they are virtual ring buffers, but what are the interfaces, how do they work, what are their limitations, how responsive are they, what is the throughput, etc. Same for Virtio: how does it work, how are drivers abstracted, what is the messaging format, etc. At each level, an explanation is necessary and, as William indicated, a good API doc is also necessary.
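To make the "virtual ring buffer" idea a little more concrete, here is a conceptual model of a vring, loosely following the split-ring layout in the virtio specification: a descriptor table, an "available" ring written by the side that offers buffers, and a "used" ring written by the side that consumes them. The class and method names are hypothetical. A real vring lives in shared memory and also needs memory barriers and notifications (kicks), which are left out of this sketch.

```python
class VRing:
    """Toy model of a virtio split ring with `num` slots (power of two)."""

    def __init__(self, num: int):
        if num <= 0 or num & (num - 1):
            raise ValueError("ring size must be a power of two")
        self.num = num
        self.desc = [None] * num     # descriptor table: (addr, length)
        self.avail = [0] * num       # descriptor ids offered by the driver
        self.avail_idx = 0           # free-running 16-bit index
        self.used = [(0, 0)] * num   # (descriptor id, bytes written)
        self.used_idx = 0            # free-running 16-bit index
        self._last_avail = 0         # device's private read cursor
        self._last_used = 0          # driver's private read cursor

    def driver_offer(self, desc_id: int, addr: int, length: int):
        """Driver side: publish a buffer for the device to process."""
        self.desc[desc_id] = (addr, length)
        self.avail[self.avail_idx % self.num] = desc_id
        self.avail_idx = (self.avail_idx + 1) & 0xFFFF

    def device_take(self):
        """Device side: fetch the next offered descriptor id, or None."""
        if self._last_avail == self.avail_idx:
            return None
        desc_id = self.avail[self._last_avail % self.num]
        self._last_avail = (self._last_avail + 1) & 0xFFFF
        return desc_id

    def device_return(self, desc_id: int, written: int):
        """Device side: hand a processed buffer back to the driver."""
        self.used[self.used_idx % self.num] = (desc_id, written)
        self.used_idx = (self.used_idx + 1) & 0xFFFF

    def driver_reclaim(self):
        """Driver side: collect the next returned buffer, or None."""
        if self._last_used == self.used_idx:
            return None
        entry = self.used[self._last_used % self.num]
        self._last_used = (self._last_used + 1) & 0xFFFF
        return entry


ring = VRing(4)
ring.driver_offer(desc_id=2, addr=0x1000, length=512)
taken = ring.device_take()
ring.device_return(taken, written=16)
print(ring.driver_reclaim())
```

rpmsg uses two such rings per channel, one for each direction, which is also why the number and size of buffers directly determine the memory footprint.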

I’ve seen this requirement to use GCC in place of CCS several times so an explanation on how to use GCC would be helpful.

I have asked William to post links to good pruss_uio docs that might serve as a guide.

Regards,
John

DTJF · June 17, 2016, 2:17pm

@John Syne:
Correct, remoteproc has been stable for months. Stable in the sense that it isn’t usable. And that’s why it is, and should be, experimental. And experimental features shouldn’t pollute the mainstream images!

@Jason Reeder and Suman Anna:
Thanks for joining the discussion and for sharing your project. You defined big targets; unfortunately, you forgot about the basics. Following your current concept, prussdrv can never be replaced by your solution.

One reason is execution speed. It might be suitable for BeagleLogic, which uses minimal communication between ARM and PRUs before and after the measurement, in non-time-critical situations. In contrast, my project libpruio is designed to work in the main controller loop. Everything is time critical here. Therefore I use a messaging system similar to the one in RPMsg, but highly speed optimized. Just one example: in order to send a message from ARM to PRU, if I switched to RPMsg, I’d have to use the function pru_rpmsg_send() for that purpose. Just the preparation of that function call (five parameters on the stack) needs five times more CPU cycles than my solution. Additional CPU cycles are consumed in the kernel code and furthermore on the PRU side, before the message arrives. Not worth discussing.
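As background for the overhead discussion: every rpmsg message carries a 16-byte header in front of its payload (source address, destination address, a reserved word, payload length, flags), which is the layout of `struct rpmsg_hdr` in the Linux rpmsg code. The sketch below only shows that framing. It says nothing about cycle counts, and the helper names and the example addresses are made up.

```python
import struct

# Layout of struct rpmsg_hdr: u32 src, u32 dst, u32 reserved, u16 len, u16 flags.
RPMSG_HDR = struct.Struct("<IIIHH")
RPMSG_BUF_SIZE = 512  # size of one rpmsg buffer, header included


def pack_rpmsg(src: int, dst: int, payload: bytes) -> bytes:
    """Prefix a payload with an rpmsg header."""
    if len(payload) > RPMSG_BUF_SIZE - RPMSG_HDR.size:
        raise ValueError("payload does not fit into one rpmsg buffer")
    return RPMSG_HDR.pack(src, dst, 0, len(payload), 0) + payload


def unpack_rpmsg(frame: bytes):
    """Split a frame into (src, dst, payload)."""
    src, dst, _reserved, length, _flags = RPMSG_HDR.unpack_from(frame)
    start = RPMSG_HDR.size
    return src, dst, frame[start:start + length]


frame = pack_rpmsg(1024, 30, b"ping")  # addresses are arbitrary examples
print(len(frame), unpack_rpmsg(frame))
```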

A second point is the firmware load. Do you really want to force users to use CCS and the Processor SDK (m$ habits in an open source community?). The PRU (and the other subsystems you target, like DSP, …) are made for high speed tasks. My preferred language in that case is assembler, and I’m not alone in thinking that. Your solution needs a feature to load assembler-generated firmware, if you still aim to replace UIO_PRUSS at some point. And it’s out of the question to remove and reload the kernel driver for firmware updates. What if one PRU should keep running while the firmware on the other is updated? Furthermore, it needs a feature to reload firmware with user privileges!

The next point is the messaging system and its big memory consumption. What if an application doesn’t need it and wouldn’t use it? Currently I can’t find any feature for high speed data exchange between ARM and PRU via DRam or SRam memory.

In short, if you want to fulfill the expectations Jason Kridner or John Syne place on your project (replacing UIO_PRUSS), you have to redo your concept and start from scratch.

Your system has to be scalable, starting at a feature set (and resource consumption) similar to UIO_PRUSS (load-start-stop firmware, bidirectional access DRam-IRam-SRam-INTC). When firmware contains the .resource_table section, then load your additional drivers and do all your remoteproc magic. But otherwise just load and start the firmware, without consuming additional resources. Let the user choose his development tools. Do not consume any resources until you get firmware to load.
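The detection step proposed here (branch on whether the firmware carries a `.resource_table` section) can be sketched in user space. This is a minimal illustration that assumes the firmware is a little-endian ELF32 image. It is not how any existing driver is implemented, and the function names, including the small builder for a synthetic test image, are hypothetical.

```python
import struct

ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # 52-byte ELF32 file header
ELF32_SHDR = struct.Struct("<10I")               # 40-byte ELF32 section header


def elf32_section_names(image: bytes):
    """Return the section names of a little-endian ELF32 image."""
    if len(image) < ELF32_EHDR.size or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 1 or image[5] != 1:  # ELFCLASS32, ELFDATA2LSB
        raise ValueError("only little-endian ELF32 is handled in this sketch")
    fields = ELF32_EHDR.unpack_from(image, 0)
    e_shoff, e_shentsize = fields[6], fields[11]
    e_shnum, e_shstrndx = fields[12], fields[13]
    headers = [ELF32_SHDR.unpack_from(image, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    if not headers:
        return []
    # Fields 4 and 5 of a section header are its file offset and size.
    str_off, str_size = headers[e_shstrndx][4], headers[e_shstrndx][5]
    strtab = image[str_off:str_off + str_size]
    names = []
    for hdr in headers:
        end = strtab.index(b"\0", hdr[0])  # field 0 is the name offset
        names.append(strtab[hdr[0]:end].decode("ascii"))
    return names


def wants_remoteproc_extras(image: bytes) -> bool:
    """True if the firmware asks for rpmsg/vring setup via a resource table."""
    return ".resource_table" in elf32_section_names(image)


def build_demo_elf(section_name: bytes) -> bytes:
    """Build a tiny synthetic ELF32 with one named section (demo data only)."""
    strtab = b"\0" + section_name + b"\0.shstrtab\0"
    shoff = ELF32_EHDR.size + len(strtab)
    ehdr = ELF32_EHDR.pack(b"\x7fELF\x01\x01\x01", 2, 0, 1, 0, 0, shoff, 0,
                           ELF32_EHDR.size, 0, 0, ELF32_SHDR.size, 3, 2)
    null_hdr = ELF32_SHDR.pack(*([0] * 10))
    named = ELF32_SHDR.pack(1, 1, 0, 0, 0, 0, 0, 0, 1, 0)
    strhdr = ELF32_SHDR.pack(2 + len(section_name), 3, 0, 0,
                             ELF32_EHDR.size, len(strtab), 0, 0, 1, 0)
    return ehdr + strtab + null_hdr + named + strhdr


print(wants_remoteproc_extras(build_demo_elf(b".resource_table")))
print(wants_remoteproc_extras(build_demo_elf(b".text")))
```

Firmware produced by an assembler that emits raw binaries rather than ELF would need a different entry path; this sketch simply rejects such input.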

@Jason Kridner: Just to make it clear, if you continue supporting the replacement of UIO_PRUSS by remoteproc, then in the long term all high performance PRU projects like libpruio will die!

John_Syn · June 17, 2016, 5:43pm

@John Syne:
Correct, remoteproc has been stable for months. Stable in the sense that it isn’t usable. And that’s why it is, and should be, experimental. And experimental features shouldn’t pollute the mainstream images!

I don’t agree. The remoteproc framework has undergone changes, but these didn’t result in significant changes to the example code, so user code was easily updated when upgrading to a newer version of remoteproc. With your thinking, Devicetree shouldn’t be in mainline either.

@Jason Reeder and Suman Anna:
Thanks for joining the discussion and for sharing your project. You defined big targets; unfortunately, you forgot about the basics. Following your current concept, prussdrv can never be replaced by your solution.

One reason is execution speed. It might be suitable for BeagleLogic, which uses minimal communication between ARM and PRUs before and after the measurement, in non-time-critical situations. In contrast, my project libpruio is designed to work in the main controller loop. Everything is time critical here. Therefore I use a messaging system similar to the one in RPMsg, but highly speed optimized. Just one example: in order to send a message from ARM to PRU, if I switched to RPMsg, I’d have to use the function pru_rpmsg_send() for that purpose. Just the preparation of that function call (five parameters on the stack) needs five times more CPU cycles than my solution. Additional CPU cycles are consumed in the kernel code and furthermore on the PRU side, before the message arrives. Not worth discussing.

I believe your use case is a special one and may not be served by remoteproc. Communicating between PRU and ARM in the main control loop seems odd. Normally, the tight control loop runs on one PRU, dumping to shared memory and the other PRU handles the communications between PRU and ARM. Why doesn’t this work for you?

A second point is the firmware load. Do you really want to force users to use CCS and the Processor SDK (m$ habits in an open source community?). The PRU (and the other subsystems you target, like DSP, …) are made for high speed tasks. My preferred language in that case is assembler, and I’m not alone in thinking that. Your solution needs a feature to load assembler-generated firmware, if you still aim to replace UIO_PRUSS at some point. And it’s out of the question to remove and reload the kernel driver for firmware updates. What if one PRU should keep running while the firmware on the other is updated? Furthermore, it needs a feature to reload firmware with user privileges!

Here I agree with you. It should definitely be possible to add and remove firmware at runtime. Given that remoteproc can load, start and stop firmware, why not enable a feature for a user with the correct group permissions to load/reload new firmware at runtime? This might be done via a debugfs interface.

Also, your point about assembler support is quite valid and I would suggest that an assembler template/example be created for remoteproc. Alternatively, you can always start with a C version and then optimize the assembler output.

The next point is the messaging system and its big memory consumption. What if an application doesn’t need it and wouldn’t use it? Currently I can’t find any feature for high speed data exchange between ARM and PRU via DRam or SRam memory.

In short, if you want to fulfill the expectations Jason Kridner or John Syne place on your project (replacing UIO_PRUSS), you have to redo your concept and start from scratch.

Not true. I think the main framework is sound even if it does not suit your strange use case. Perhaps there is a way to augment the framework with a quick message transfer.

Just some thoughts from my side. In any case, thank you for your valuable input which I hope will guide Jason and Suman in their future development.

Regards,
John

Suman_Anna · June 17, 2016, 9:22pm

Hi TJF,

My responses inlined…

Regards

Suman