Ti's RPMsg Examples Significantly Changed

Hi TJF,

My responses are inline.

Regards
Suman

From: John Syne [mailto:john3909@gmail.com]
Sent: Friday, June 17, 2016 12:44 PM
To: beagleboard@googlegroups.com
Cc: Reeder, Jason; jkridner@beagleboard.org; Anna, Suman
Subject: Re: [beagleboard] Ti's RPMsg Examples Significantly Changed

@John Syne:
Correct, remoteproc has been stable for months. Stable in the sense that it isn't usable. And that's why it is, and should be, experimental. And experimental features shouldn't pollute the mainstream images!
I don’t agree. The remoteproc framework has undergone changes, but these didn’t result in significant changes to the example code, so user code was easily updated when upgrading to a newer version of remoteproc. By your reasoning, Devicetree shouldn’t be in mainline either.

[Suman: We did make some core improvements/changes between our 4.1 and 4.4 kernel trees (like pruss-intc), and we will continue improving things. The same firmware images should work between 4.1 and 4.4 kernels provided you have similar DTS entries. The only disruption should be in the last week, because of the switch from mailboxes to PRU events. If you modify the DTS to revert the patch that switches over, the old firmware should still work fine.]

@Jason Reeder and Suman Anna:
Thanks for joining that discussion and for sharing your project. You defined big targets; unfortunately you forgot about the basics. Following your current concept, prussdrv can never be replaced by your solution.

One reason is execution speed. It might be suitable for BeagleLogic, which uses minimal communication between ARM and PRUs before and after the measurement, in non-time-critical situations. In contrast, my project libpruio is designed to work in the main controller loop. Everything is time critical here. Therefore I use a messaging system similar to the one in RPMsg, but highly speed optimized.

[Suman: Where do you use this from, I assume userspace? Using what - Shared DRAM or regular PRU DRAM. ]

Just one example: to send a message from ARM to PRU after switching to RPMsg, I'd have to use the function pru_rpmsg_send(). Just preparing that function call (five parameters on the stack) needs five times more CPU cycles than my solution. Additional CPU cycles are consumed in the kernel code, and further ones on the PRU side, before the message arrives. Not worth discussing.
I believe your use case is a special one and may not be served by remoteproc. Communicating between PRU and ARM in the main control loop seems odd. Normally, the tight control loop runs on one PRU, dumping to shared memory and the other PRU handles the communications between PRU and ARM. Why doesn’t this work for you?

A second point is the firmware load. Do you really want to force users to use CCS and the Processor SDK (m$ habits in an open source community?). The PRU (and the other subsystems you target, like DSP, ...) are made for high speed tasks. My preferred language in that case is assembler, and I'm not alone in thinking that. Your solution needs a feature to load assembler-generated firmware if you still aim to replace UIO_PRUSS at some point.

[Suman: I am not sure why you think CCS is needed for remoteproc. In fact, I don’t use CCS myself, and have added the single_step in debugfs so that I don’t need to connect a JTAG. It still needs a couple more improvements (like the ability to set the PC) and then one can inspect PRU state using the command line alone. As for CCS, I am sure there will be another set of users/customers/beginners who would want to use an IDE to get up to speed with stand-alone development.]

I use CCS with JTAG and it works for me :wink:

And it's out of question to remove and reload the kernel driver for firmware updates. What if one PRU should run while updating the firmware on the other?

[Suman: You don’t need to unload and reload the driver, the individual PRUs can be rebooted using sysfs bind and unbind without affecting the other PRU.]

This I did not know. That is excellent.

Furthermore it needs a feature to reload firmware with user privileges!
Here I agree with you. It should definitely be possible to add and remove firmware at runtime. Given that remoteproc can load, start and stop firmware, why not enable a feature for a user with the correct group permissions to load/reload new firmware at runtime? This might be done via a debugfs interface.

[Suman: This is a feature missing even from the remoteproc core, and setting a new firmware name is definitely on my todo list and internal TI project need as well. The boot/shutdown etc is already added to remoteproc framework in latest upstream as a debugfs interface, but for production, it should probably be converted to a sysfs interface, depending on the individual platform remoteproc driver’s preference.]

That will be great.

Also, your point about assembler support is quite valid and I would suggest that an assembler template/example be created for remoteproc. Alternatively, you can always start with a C version and then optimize the assembler output.

[Suman: As for remoteproc driver, it really doesn’t matter what assembler/compiler is used for building a firmware image. Remoteproc core is standardized on ELF, and there is no reason to invent another format. As it is, the plain binary format is a problem for PRUSS since its address space is not unified (Address 0 for both IRAM and local DRAM). I don’t recall how prussdrv is managing this, or if they only deal with IRAM from the binary and leave the data sections for the application to copy over the interfaces to use Data RAMs. At the moment, all you would need is a small script to convert your binary into an ELF using objcopy to use with PRU remoteproc driver.]

I think what everyone was asking for is an example to show how this might be done. It seems like other than the ELF code, there is a resource table which is appended. How is this done? What is the format? Perhaps Jason can add something like this to his examples.
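
On the format question, a minimal sketch may help. It assumes the generic header layout that the Linux remoteproc core defines in include/linux/remoteproc.h (a 32-bit version, a 32-bit entry count, two reserved words, then one 32-bit offset per entry). The function name build_resource_table is hypothetical, and whether the PRU driver of a given kernel accepts a table with zero entries is not confirmed in this thread.

```python
import struct

# Header layout assumed from Linux's include/linux/remoteproc.h:
#   u32 ver; u32 num; u32 reserved[2]; u32 offset[num];
RSC_TABLE_VERSION = 1  # version 1 is what the remoteproc core expects


def build_resource_table(entries=()):
    """Pack a resource table from already-packed resource entries.

    Each entry is a bytes object that starts with its own 32-bit type field.
    """
    header_size = 16 + 4 * len(entries)
    offsets, body, cursor = [], b"", header_size
    for entry in entries:
        offsets.append(cursor)  # offsets are relative to the start of the table
        body += entry
        cursor += len(entry)
    header = struct.pack("<4I", RSC_TABLE_VERSION, len(entries), 0, 0)
    return header + struct.pack(f"<{len(offsets)}I", *offsets) + body
```

Writing the result of build_resource_table() to a file gives a 16-byte table with no entries, which could then be attached to the image as a .resource_table section.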

The next point is the messaging system and its big memory consumption. What if an application doesn't need it and wouldn't use it? Currently I can't find any feature for high speed data exchange between ARM and PRU via DRam or SRam memory.
In short, if you want to fulfill the expectations Jason Kridner or John Syne have of your project (replacing UIO_PRUSS), you have to redo your concept and start from scratch.
Not true. I think the main framework is sound even if it does not suit your strange use case. Perhaps there is a way to augment the framework with a quick message transfer.

[Suman: Right, the basic infrastructure is already there and you would need only to build upon it. For eg., at the moment, all you would need is a small kernel module, use pruss_request_mem_region to gain ownership of PRU memories, export it to userspace and use it however you want. The interrupts would have to go through kernel though. Maybe it is the same module that exports the above desired sysfs interfaces too.]

This is why this discussion is so helpful. Once again, I would ask that Jason add something like this to his examples.

Just some thoughts from my side. In any case, thank you for your valuable input which I hope will guide Jason and Suman in their future development.

[Suman: Thanks for sharing all your feedback and concerns. The userspace usecases are not lost on me, and we will continue to close the gaps.]

For some time I have been defending RPMSG/Remoteproc, but several PRUSS_UIO users complained that TI wasn’t supporting their framework, had no documentation and the samples sucked. I hope this discussion has put all that behind us. Clearly we can see that TI is actively working on this framework, and Jason Reeder is working on cleaning up the docs/examples. Hopefully, in the not too distant future, we should have a framework that will satisfy most of the needs expressed here.

I just like the idea that I can use the same framework for communication between ARMv7, DSP, CortexM4, PRU, etc.

Thank you again for all your hard work.

Regards,
John

Greg,

A quick test would be to revert the following patch (assuming you are using a BeagleBone variant) in your tree, if you continue to use your old firmware

https://github.com/beagleboard/linux/commit/620cab817e7a06065452d4c119c3066061ad5f03

regards

Suman

Hi Suman, thanks for your statements.

[Suman: Where do you use this from, I assume userspace? Using what - Shared DRAM or regular PRU DRAM.]

Yes, libpruio is a userspace library. I’m not familiar with your terminology: Shared DRAM = SRam (12k@0x10000) and PRU DRAM = DRam (2x8k@[0x0|0x2000]), right? I use the latter (similar to your solution).

[Suman: You don’t need to unload and reload the driver, the individual PRUs can be rebooted using sysfs bind and unbind without affecting the other PRU.]

Reloading the kernel module for firmware updates is what I read several times on this forum. I haven’t tested it yet. Sorry if I am confusing things here.

John asked already: Where can we find a tutorial / example on updating firmware the correct way? A further question: often I reload just a part of the IRam. I guess your concept doesn’t support that?

[Suman: As for remoteproc driver, it really doesn’t matter what assembler/compiler is used for building a firmware image. Remoteproc core is standardized on ELF, and there is no reason to invent another format. As it is, the plain binary format is a problem for PRUSS since its address space is not unified (Address 0 for both IRAM and local DRAM). I don’t recall how prussdrv is managing this, or if they only deal with IRAM from the binary and leave the data sections for the application to copy over the interfaces to use Data RAMs. At the moment, all you would need is a small script to convert your binary into an ELF using objcopy to use with PRU remoteproc driver.]

Seconding John: Where can we find a tutorial / example on how to load a firmware image created by the pasm assembler?

[Suman: Right, the basic infrastructure is already there and you would need only to build upon it. For eg., at the moment, all you would need is a small kernel module, use pruss_request_mem_region to gain ownership of PRU memories, export it to userspace and use it however you want. The interrupts would have to go through kernel though. Maybe it is the same module that exports the above desired sysfs interfaces too.]

Am I reading right here? My message is:

  • Reduce resource consumption on a small SoC.

And your advice is:

  • Add a further module to a bloated framework.

Are you joking?

For sure, it’s not only me who’d ask: Why should I spend time extending a resource-consuming framework that changes every now and then – why should I start to follow a moving target, just to end up with a feature that I already have, working reliably for years? And why should a hundred other developers do the same?

It has been said already. From the BBB point of view your project has nothing to offer. There are no MPU, DSP, … processors. It’s just about ARM and PRU. Your IPC is too slow. The complete topic is just about replacing a “hack” by a “real Linux driver”. And this isn’t worth spending four times more memory, additional boot time and additional development manpower, just to get what we have already.

When you aim to develop software for the mainstream, replacing a proven solution, then YOU have to care about existing standards and start from that point. This means redoing your concept.

Otherwise, if you want to stay with your existing infrastructure, that’s OK as well. But:

  • Make it optional. Place it in a separate package. Leave the mainstream!

Let the users choose whether they want to add and use your additional magic and whether they want to spend their resources on that. And do not block TI’s kernel development or the BBB PRUSS development with your experimental project.

Please note: the PRUSS are an important topic. They’re a major advantage the BBB has over its direct competitors, like RPi. When your project complicates and slows down the PRUSS development, this endangers the complete Beaglebone project.

TJF and I rarely see things in a similar light, but I have to agree with him completely concerning his last post. Granted, I can’t help but feel he is echoing some of my past concerns as well as adding his own.

One thing I am seeing here, though, is that for some, remoteproc may never replace uio_pruss. But I’m seeing potential in using remoteproc as a Linux IPC mechanism, similar to how named pipes, shared memory, and shared ports are used for the same purpose. Except, using the PRU in this manner would offer some level of hardware offload benefit.

Granted, the same could be done using uio_pruss.

Also, why do I get the feeling that UIO is looked down on by several developers here ? I see nothing wrong with it. Only that on the userspace side interrupts are a bit of a hack. But would anyone care to tell me what /proc/interrupts, and /proc/irq/* are for ? I’ll give you a hint. I already have a pretty good idea . . .

The revised PRU examples are working well now that I have created a new SD card with your latest and greatest fantastic scripts and guidance:

https://eewiki.net/display/linuxonarm/BeagleBone

Long term 4.4x

I was not successful with the kernel upgrade script to 4.4.12-ti-r31.
It’s like the firmwares are being ignored, even the PRU_Halt, which is almost nothing.
I tried both old and new PRU support packages. No difference.

Anyway, I like the fresh start with the new card, so I am not looking back.

Regards,
Greg

Hi TJF,

Hi Suman, thanks for your statements.

[Suman: Where do you use this from, I assume userspace? Using what - Shared
DRAM or regular PRU DRAM.]

Yes, libpruio is a userspace library. I'm not familiar with your
terminology: Shared DRAM = SRam (12k@0x10000) and PRU DRAM = DRam
(2x8k@[0x0|0x2000]), right? I use the latter (similar to your solution).

Yeah, same ones. I don't use SRAM for PRU Shared Data RAM, as I use it
to mean the regular on-chip memory.
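
For readers following the memory discussion, the layout just confirmed can be written down as a small userspace sketch. The offsets and sizes come from the thread; the PRUSS base address 0x4A300000 is an assumption that applies to the AM335x, and the helper name read_u32 is hypothetical. Mapping /dev/mem needs root and bypasses every driver, so this only illustrates the address arithmetic and is not a recommended access method.

```python
import mmap
import os
import struct

# Offsets inside the PRUSS window, as given in the thread.
DRAM0, DRAM1, SHARED_RAM = 0x0000, 0x2000, 0x10000
DRAM_SIZE, SHARED_SIZE = 8 * 1024, 12 * 1024
WINDOW = SHARED_RAM + SHARED_SIZE  # map everything up to the end of shared RAM

PRUSS_BASE = 0x4A300000  # assumption: AM335x PRU-ICSS base address


def read_u32(region, byte_offset=0, mem_path="/dev/mem", base=PRUSS_BASE):
    """Read one little-endian 32-bit word from a PRU data memory."""
    fd = os.open(mem_path, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, WINDOW, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ, offset=base) as window:
            return struct.unpack_from("<I", window, region + byte_offset)[0]
    finally:
        os.close(fd)
```

A call such as read_u32(SHARED_RAM, 4) would return the second word of the 12k shared RAM. The mem_path and base parameters exist so the same code can be pointed at a different device node, or at an ordinary file when trying it out off-target.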

[Suman: You don’t need to unload and reload the driver, the individual PRUs
can be rebooted using sysfs bind and unbind without affecting the other
PRU.]

Reloading the kernel module for firmware updates is what I read several
times on this forum. I haven't tested it yet. Sorry if I am confusing
things here.

Yeah, it has come to my notice that this was being suggested previously
on TI forum threads not only for PRUs but for DSPs or IPUs on other SoCs
as well. The bind/unbind does give specific control for a core.

John asked already: Where can we find a tutorial / example on updating
firmware the correct way? A further question: often I reload just a part of
the IRam. I guess your concept doesn't support that?

Why, it is supported. There are no distinctions made between IRAM,
DRAM0/1 or Shared DRAM. The ELF loader just goes by the program headers,
so if the new image only has a program header that loads into IRAM, then
it just loads that.
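
To make the "goes by the program headers" point concrete, here is a minimal sketch that lists the loadable segments of a 32-bit little-endian ELF image. It follows the standard ELF32 layout only; the function name loadable_segments is hypothetical, and reporting p_paddr as the load address is an assumption about what the loader uses. It is not the kernel's loader, just a way to inspect what an image asks to have loaded.

```python
import struct

PT_LOAD = 1                  # loadable segment
PF_X, PF_W, PF_R = 1, 2, 4   # segment permission flags


def loadable_segments(elf: bytes):
    """Return (paddr, filesz, is_executable) for each PT_LOAD segment."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF image")
    # ELF32 header: e_phoff at byte 28, e_phentsize at 42, e_phnum at 44.
    e_phoff, = struct.unpack_from("<I", elf, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 42)
    segments = []
    for i in range(e_phnum):
        (p_type, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_flags, p_align) = struct.unpack_from(
            "<8I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_paddr, p_filesz, bool(p_flags & PF_X)))
    return segments
```

An image that carries a single executable PT_LOAD segment would, by the description above, only touch IRAM when loaded.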

[Suman: As for remoteproc driver, it really doesn’t matter what
assembler/compiler is used for building a firmware image. Remoteproc core
is standardized on ELF, and there is no reason to invent another format. As
it is, the plain binary format is a problem for PRUSS since its address
space is not unified (Address 0 for both IRAM and local DRAM). I don’t
recall how prussdrv is managing this, or if they only deal with IRAM from
the binary and leave the data sections for the application to copy over the
interfaces to use Data RAMs. At the moment, all you would need is a small
script to convert your binary into an ELF using objcopy to use with PRU
remoteproc driver.]

Seconding John: Where can we find a tutorial / example on how to load a
firmware image created by the pasm assembler?

I don't know if pasm assembler is still a supported tool by TI. If so, I
will work with Jason Reeder to try to get something documented on the TI
wikis or in PRU Software Support documentation.

Anyway, here are some basic steps (thanks to a colleague) for converting a
bin file to an ELF image:

1. Convert your binary to a preliminary ELF:
objcopy -I binary -O elf32-little --rename-section .data=.text <input.bin> <output.tmp>
2. Mark .text as executable (PRU remoteproc makes the distinction between
IRAM and DRAM0 based on the executable flags):
objcopy -I elf32-little --set-section-flags .text=code,alloc,load <output.tmp>
3. Add a program header for loading the text:
ld -n --accept-unknown-input-arch <output.tmp> -T linker_pru0.txt --oformat=elf32-little -o <final-output.elf>

An example linker_pru0.txt is as follows,
SECTIONS
{
  .text 0x0 : { *(.text); }
  .resource_table 0x0 : { *(.resource_table); }
}

If you have resource table binary data, that can be added in step 1 as
well, e.g.

objcopy -I binary -O elf32-little --rename-section .data=.text --add-section .resource_table=<rtable.bin> <input.bin> <output.tmp>

Obviously, a lot depends on what other data/sections you have within
your firmware in terms of your linker file and resource table section
placement.
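
The three steps could be wrapped in a small script along these lines. This is only a sketch: it uses exactly the tools and options quoted above, passes a single -o to ld, and has not been run against pasm output. The function names and the default temporary file name are hypothetical.

```python
import subprocess


def bin_to_elf_commands(input_bin, output_elf, linker_script,
                        tmp="output.tmp", rtable_bin=None):
    """Build the three command lines from the steps above as argv lists."""
    step1 = ["objcopy", "-I", "binary", "-O", "elf32-little",
             "--rename-section", ".data=.text"]
    if rtable_bin is not None:
        # Optional: attach resource table data as its own section.
        step1 += ["--add-section", f".resource_table={rtable_bin}"]
    step1 += [input_bin, tmp]
    # Mark .text as code so it is treated as executable.
    step2 = ["objcopy", "-I", "elf32-little",
             "--set-section-flags", ".text=code,alloc,load", tmp]
    # Link with the script so the output gets a program header.
    step3 = ["ld", "-n", "--accept-unknown-input-arch", tmp,
             "-T", linker_script, "--oformat=elf32-little", "-o", output_elf]
    return [step1, step2, step3]


def bin_to_elf(*args, **kwargs):
    """Run the steps in order, stopping at the first one that fails."""
    for argv in bin_to_elf_commands(*args, **kwargs):
        subprocess.run(argv, check=True)
```

Keeping the command construction separate from the execution makes it easy to print the commands and check them by hand before anything touches the firmware files.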

Once I add the evt->channel->host interrupt mapping to DT for MPU, the
need for a resource table for MPU-related interrupt handling will mostly
go away. Until then, the interrupt mapping needs to come through
resource table, or if that's too painful, let the firmware code deal with
the mapping.

As for reloading a new non-ethernet firmware,
echo <pru-device> > /sys/bus/platform/drivers/pru-rproc/unbind
change the corresponding firmware in /lib/firmware/
echo <pru-device> > /sys/bus/platform/drivers/pru-rproc/bind
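
The same unbind / replace / bind sequence could be scripted roughly as follows. The sysfs path is the one quoted above; the device name and both firmware paths are left as parameters because they depend on the board and kernel, and the function name reload_firmware is hypothetical. Writing to these sysfs files normally requires root.

```python
import shutil
from pathlib import Path

DRIVER_DIR = Path("/sys/bus/platform/drivers/pru-rproc")  # path quoted above


def reload_firmware(pru_device, new_image, installed_image, driver_dir=DRIVER_DIR):
    """Unbind one PRU, swap its firmware file, and bind it again."""
    driver_dir = Path(driver_dir)
    # Detach the driver from this PRU only; the other PRU keeps running.
    (driver_dir / "unbind").write_text(pru_device + "\n")
    # Replace the image the driver will pick up (normally under /lib/firmware/).
    shutil.copyfile(new_image, installed_image)
    # Re-attach the driver, which loads the replaced image.
    (driver_dir / "bind").write_text(pru_device + "\n")
```

The driver_dir parameter is there so the sequence can be rehearsed against a scratch directory before it is pointed at the real sysfs tree.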

[Suman: Right, the basic infrastructure is already there and you would need
only to build upon it. For eg., at the moment, all you would need is a
small kernel module, use pruss_request_mem_region to gain ownership of PRU
memories, export it to userspace and use it however you want. The
interrupts would have to go through kernel though. Maybe it is the same
module that exports the above desired sysfs interfaces too.]

Am I reading right here? My message is:

   - Reduce resource consumption on a small SoC.

And your advice is:

   - Add a further module to a bloated framework.

It is about coming up with a framework that would satisfy both the
kernel and userspace needs as well. With libprussdrv, one simply cannot
use PRU from kernel-mode. Yes, there are gaps at the moment with the
current infrastructure, and I never said it is complete.

Going by your philosophy, you don't need Linux drivers at all. Mmap
everything and do everything in userspace, why bother with the kernel at
all? /dev/mem is your friend.

Are you joking?

For sure, it's not only me who'd ask: Why should I spend time extending a
resource-consuming framework that changes every now and then -- why should
I start to follow a moving target, just to end up with a feature that I
already have, working reliably for years? And why should a hundred other
developers do the same?

It has been said already. From the BBB point of view your project has
nothing to offer. There are no MPU, DSP, ... processors. It's just about
ARM and PRU.

The TI kernel is not just about BBB, there are 4 other SoC families and
multiple boards where PRU is now supported. And we do have userspace
needs as well for Industrial usecases, they will get addressed in the
upcoming months. You might very well find a thing or two w.r.t UIO in an
upcoming Processor SDK release.

Your IPC is too slow. The complete topic is just about
replacing a "hack" by a "real Linux driver". And this isn't worth spending
four times more memory, additional boot time and additional development
manpower, just to get what we have already.

When you aim to develop software for the mainstream, replacing a proven
solution, then YOU have to care about existing standards and start from
that point. This means redoing your concept.

And may I know how would you have done it if you were to redo it from
scratch supporting both kernel and userspace? You are only looking at it
from userspace angle, and as long as you stick to that, any kernel
framework will indeed look like bloat.

Otherwise, if you want to stay with your existing infrastructure, that's
OK as well. But:

   - Make it optional. Place it in a separate package. Leave the main
   stream!

Define mainstream. What's mainstream for you is downstream for me. And I
am not aware until recently that the TI kernel is getting merged into
BBB kernel. I don't even know the merge cycle that BBB kernel is
following. The kernel changes I add are geared towards internal TI
releases, and there's always a potential downside for merging the kernel
at random points.

Let the users choose whether they want to add and use your additional magic
and whether they want to spend their resources on that.

It is optional, it is not even enabled by default in
omap2plus_defconfig. If one doesn't want to spend time, one can always
go back to how they were using it with whatever patches they had before.

And do not block TI's kernel
development or the BBB PRUSS development with your experimental project.

Please note: the PRUSS are an important topic. They're a major advantage
the BBB has over its direct competitors, like RPi. When your project
complicates and slows down the PRUSS development, this endangers the
complete Beaglebone project.

I understand that it's frustrating that the new framework does not
address all your needs at the moment, and this is an active development
so I will be continuing to close the gaps during the year.

Please work with Jason Kridner for any further issues or concerns, and
he can work internally within TI to raise requirements or prioritize
features for bridging the gaps. You can always post queries on the TI
E2E forum for continued support.

regards
Suman

heh, it’s not about TI’s support. It’s about TI playing ball with the community instead of rocking the boat. Personally, I could say, when I add the next feature to whatever: go talk to my mom, because I don’t want to hear it. But that wouldn’t be very social, would it?

Yes, we can go back to an older config, but you people could also stop stepping all over the community’s kernels and then calling them your own. Right now, I have a stock image with 115M of memory used right after a fresh reboot. I have sound, and drivers for both sides of the PRU modules (remoteproc AND uio), and I didn’t even touch this image. Here, let me just show you this crap:

debian@beaglebone:~$ lsmod
Module                  Size  Used by
binfmt_misc             8862  1
usb_f_ecm               9336  1
g_ether                 4976  0
usb_f_rndis            22191  2 g_ether
u_ether                11898  3 usb_f_ecm,usb_f_rndis,g_ether
libcomposite           43717  3 usb_f_ecm,usb_f_rndis,g_ether
nfsd                  261377  2
spidev                  7523  0
omap_sham              21340  0
omap_aes_driver        19045  0
pwm_tiehrpwm            4706  0
tieqep                  8758  0
pwm_tiecap              3652  0
omap_rng                4423  0
rng_core                7703  1 omap_rng
c_can_platform          6602  0
c_can                   9577  1 c_can_platform
can_dev                11820  1 c_can
snd_soc_davinci_mcasp  17079  0
snd_soc_edma            1290  1 snd_soc_davinci_mcasp
snd_soc_omap            3058  1 snd_soc_davinci_mcasp
snd_soc_core          155549  3 snd_soc_davinci_mcasp,snd_soc_edma,snd_soc_omap
snd_pcm_dmaengine       5209  2 snd_soc_core,snd_soc_omap
snd_pcm                83341  4 snd_soc_davinci_mcasp,snd_soc_core,snd_soc_omap,snd_pcm_dmaengine
snd_timer              19788  1 snd_pcm
snd                    59495  3 snd_soc_core,snd_timer,snd_pcm
soundcore               7637  1 snd
spi_omap2_mcspi        11148  0
evdev                  10695  1
uio_pdrv_genirq         3539  0
uio                     8822  1 uio_pdrv_genirq
pru_rproc              12632  0
pruss_intc              7223  1 pru_rproc
pruss                   9408  0
Why do I need all this garbage ? I didn’t enable it, and I really do not want, or need it.

debian@beaglebone:~$ free
             total       used       free     shared    buffers     cached
Mem:        504000     115864     388136       4344      16948      34548
-/+ buffers/cache:      64368     439632
Swap:            0          0          0

This is just sickening.

Anyway, it gets old having to go over these images that used to be nice, clean, and tidy. I guess I’m just going to have to start building my own images again, and improve them myself. And not share the changes? Isn’t that how this community is supposed to work?

I think you forgot to take your meds this morning. I’m not even sure what this has to do with RPMsg. Looking at the list you provided, I’m not sure what you think is garbage. In any case, if there is something you don’t need, simply modify your config and remove the features you don’t want. Remember, the default kernel wasn’t designed for “you”, but targeted at the general developer community, and they all have different requirements. I would say this was a good place to start, and easily customized to meet my own requirements.

So, looking at the list, USB, RNDIS, Ethernet, NFS, SPI, PWM, CAN, ALSA, UIO, Crypto and PRU are all important, so what do you want removed? There is nothing left to be removed.

Regards,
John

It is about coming up with a framework that would satisfy both the
kernel and userspace needs as well. With libprussdrv, one simply cannot
use PRU from kernel-mode. Yes, there are gaps at the moment with the
current infrastructure, and I never said it is complete.

Going by your philosophy, you don't need Linux drivers at all. Mmap
everything and do everything in userspace, why bother with the kernel at
all? /dev/mem is your friend.

The problem with that approach is that there is no way to handle interrupts, security is problematic, and developers have to recreate almost everything from scratch. The calling mechanism (VRING) between PRU and ARM is quite efficient, but as TJF explained, the calling convention using multiple parameters does impact PRU performance. Perhaps look at ways to make the call work with fewer PRU cycles. I think TJF is confusing throughput with latency. The BeagleLogic project was able to achieve better throughput with RemoteProc, but TJF is able to achieve lower latency with PRUSS_UIO. From what TJF explained, the latency comes from the 5 parameters that must be pushed onto the stack for each call.

Are you joking?

For sure, it's not only me who'd ask: Why should I spend time extending a
resource-consuming framework that changes every now and then -- why should
I start to follow a moving target, just to end up with a feature that I
already have, working reliably for years? And why should a hundred other
developers do the same?

It has been said already. From the BBB point of view your project has
nothing to offer. There are no MPU, DSP, ... processors. It's just about
ARM and PRU.

The TI kernel is not just about BBB, there are 4 other SoC families and
multiple boards where PRU is now supported. And we do have userspace
needs as well for Industrial usecases, they will get addressed in the
upcoming months. You might very well find a thing or two w.r.t UIO in an
upcoming Processor SDK release.

This is what I like most about RemoteProc/RPMSG: you are using the same framework between ARM, PRU, DSP, etc.

Your IPC is too slow. The complete topic is just about
replacing a "hack" by a "real Linux driver". And this isn't worth spending
four times more memory, additional boot time and additional development
manpower, just to get what we have already.

When you aim to develop software for the mainstream, replacing a proven
solution, then YOU have to care about existing standards and start from
that point. This means redoing your concept.

And may I know how would you have done it if you were to redo it from
scratch supporting both kernel and userspace? You are only looking at it
from userspace angle, and as long as you stick to that, any kernel
framework will indeed look like bloat.

Yeah, so long as you don’t need security, interrupt handling, virtual peripherals (firmware defined devices), etc, why not use a hack to accomplish your task. I guess there is a reason why Linux doesn’t have a lot of userspace drivers.

Otherwise, when you want to stay with your existing infrastructure, it's
OK as well. But:

  - Make it optional. Place it in a separate package. Leave the main
  stream!

Define mainstream. What's mainstream for you is downstream for me. And I
was not aware until recently that the TI kernel is getting merged into
BBB kernel. I don't even know the merge cycle that BBB kernel is
following. The kernel changes I add are geared towards internal TI
releases, and there's always a potential downside for merging the kernel
at random points.

BBB has two kernels, the “bone” kernel which continues to use PRUSS_UIO and the “ti” kernel which tracks TI’s kernel development and supports RemoteProc/RPMSG. From Robert Nelson’s comments in this thread, he does pull in RemoteProc/RPMSG patches from git.ti.com.

Let the users choose if they want to add and use your additional magic and
if they want to spend their resources on that.

It is optional, it is not even enabled by default in
omap2plus_defconfig. If one doesn't want to spend time, one can always
go back to how they were using it with whatever patches they had before.

For those developers who don’t want RemoteProc/RPMSG, they use Robert Nelson's “bone” kernel.

And do not block TI's kernel
development nor the BBB PRUSS development with your experimental project.

Please note: the PRUSS are an important topic. They're a major advantage
the BBB has over its direct competitors, like RPi. When your project
complicates and slows down the PRUSS development, this endangers the
complete Beaglebone project.

I understand that it's frustrating that the new framework does not
address all your needs at the moment, and this is an active development
so I will be continuing to close the gaps during the year.

Please work with Jason Kridner for any further issues or concerns, and
he can work internally within TI to raise requirements or prioritize
features for bridging the gaps. You can always post queries on the TI
E2E forum for continued support.

Suman, you have done very good work. Please take the criticism as suggestions on how to make RemoteProc/RPMSG better. We understand that you have many other responsibilities at TI and we want to be respectful of your time. Thank you again for all your help.

Regards,
John

@John:

… so what do you want removed? There is nothing left to be removed.

I want removed:

pru_rproc 12632 0
pruss_intc 7223 1 pru_rproc
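
For readers checking their own image, the lsmod entries quoted above follow the usual column layout: module name, size, use count, and an optional comma-separated list of dependent modules. A minimal sketch that parses text in that layout and reports whether the two modules in question are present is shown here. The function and type names are hypothetical, and the sample text is only the two entries from this post plus a standard header line.

```python
from typing import Dict, List, NamedTuple


class ModuleEntry(NamedTuple):
    size: int
    use_count: int
    used_by: List[str]


def parse_lsmod(text: str) -> Dict[str, ModuleEntry]:
    """Parse lsmod-style text into a mapping of module name to entry."""
    modules: Dict[str, ModuleEntry] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = ModuleEntry(
            size=int(fields[1]),
            use_count=int(fields[2]),
            used_by=[u for u in users if u],
        )
    return modules


SAMPLE = """Module Size Used by
pru_rproc 12632 0
pruss_intc 7223 1 pru_rproc
"""

if __name__ == "__main__":
    loaded = parse_lsmod(SAMPLE)
    for name in ("pru_rproc", "pruss_intc"):
        print(name, "loaded" if name in loaded else "not loaded")
```

On a live system the same function could be fed the captured output of `lsmod` instead of the sample string.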

Do you remember, that’s the main topic in this discussion?

I think TJF is confusing throughput with latency. From the BeagleLogic project, they were able to achieve better throughput with RemoteProc, but TJF is able to achieve lower latency with PRUSS_UIO. From what TJF explained, the latency occurs because of 5 parameters that must be pushed onto the stack for each call.

I don’t think that I’m confusing anything here. If BeagleLogic could really achieve better throughput with remoteproc, they must have had a massive problem in their code before. I cannot imagine how any driver could have a positive influence on libpruio throughput.

And regarding latency, the five parameters are just the start. Then kernel code and PRU code add further latency. (And even the five parameters are far too many for my use case.) The remoteproc concept isn’t made for high performance and it doesn’t provide any “hack” to bypass it.

Yeah, so long as you don’t need security, interrupt handling, virtual peripherals (firmware defined devices), etc, why not use a hack to accomplish your task. I guess there is a reason why Linux doesn’t have a lot of userspace drivers.

I see lots of userspace drivers in the real world. E.g., for one-wire support I know of four. And the numbers will increase, unless kernel development gets closer to user needs.

I need security, a high-level target in my projects. That’s why I don’t like virtual peripherals. That’s why I don’t want to load my firmware from files. (I’ve seen projects where firmware files from userspace get linked to /lib/firmware and loaded. Or when a project runs from SD card, an attacker can easily put the card in a laptop and overwrite the firmware files in kernel space.)

So how does this project provide any additional security? When a user stops the libpruio firmware at the wrong point using “unbind”, this may damage the hardware.

Remember, the PRUSS are a safety risk per se. Old-fashioned thinking doesn’t match the current situation. In contrast to an arbitrary peripheral subsystem, the PRUSS can access all CPU memory. And the kernel cannot protect any memory against PRU access. That’s why PRU control from the command line will not add any security. Instead it’ll increase the risk.

This is what I like most about RemoteProc/RPMSG, in that you are using the same framework between ARM, PRU, DSP, etc.

If I were an RPi3 user, I’d love this too, since it’d help me to develop a PRU emulator, using 1, 2 or 3 of the cores for real-time tasks. Sure, such a project wouldn’t really help the BBB community and it wouldn’t be good for TI’s business. So I wonder why TI managers allow this remoteproc thingy to be developed in public.

Suman, you have done very good work. Please take the criticism as suggestions on how to make RemoteProc/RPMSG better. We understand that you have many other responsibilities at TI and we want to be respectful of your time. Thank you again for all your help.

Good work for whom? Competitors are happy. But high performance PRU projects don’t work any more with TI kernels since kernel 4.x (that’s nearly two years now). And further corporate development is blocked until we find a solution.

I also have many other responsibilities. And here, I try to build a bridge, and I try to slow down (or stop) the current process of kernel development drifting away from user needs.

BR

@Suman

Thank you for your answer. Most of your colleagues just stop answering and break the discussion when it comes to the point. I really appreciate your effort.

And thank you for the examples. They may help others to get started with your framework. At the moment, for me, there is still no reason why I should spend time in testing.

I don’t know if pasm assembler is still a supported tool by TI.

As far as I understand, PRU isn’t supported by TI at all yet. But pasm is the first tool available, and therefore a lot of projects are based on it. You shouldn’t ignore it.

/dev/mem is your friend.

I don’t like /dev/mem. PRUSS are my friends.

Define mainstream.

By mainstream I mean the system a user gets when he installs an image.

It is optional, it is not even enabled by default in omap2plus_defconfig. If one doesn’t want to spend time, one can always go back to how they were using it with whatever patches they had before.

Yes, downgrading is an option in any case of conflicts. It’s the last option. And I think it shouldn’t be the standard answer when kernel developers learn about user needs.

Your project is not optional. As you can see in William’s post, your driver gets installed by default in any TI image.

Please try to imagine: not every user compiles his own kernel. Instead, most users download and install a prepared image. And this is how it should be (@RCN: thanks for all your efforts).

And may I know how would you have done it if you were to redo it from scratch supporting both kernel and userspace? You are only looking at it from userspace angle, and as long as you stick to that, any kernel framework will indeed look like bloat.

User space is what kernel is made for. A kernel without userspace applications makes no sense. And when a kernel framework looks bloated from userspace angle, it may be bloated.

You may know how I’d have started your project. As I said, I’d first make the decision:

  1. do I want to make something completely new, or
  2. do I want to replace an existing solution?
In the first case, I would create an experimental project, far away from mainstream. In the latter case I’d care about existing code and solutions (and I wouldn’t go mainstream either, unless I got the basics working and documented with examples on how to migrate).

As far as I understand your project, you made this decision. But you didn’t leave the mainstream yet (and I don’t understand how you could enter).

The TI kernel is not just about BBB, there are 4 other SoC families and multiple boards where PRU is now supported. And we do have userspace needs as well for Industrial usecases, they will get addressed in the upcoming months. You might very well find a thing or two w.r.t UIO in an upcoming Processor SDK release.

I know that you work for other SoC families as well. I wonder why you don’t add your project to the Processor SDK as an insiders’ tip or special offer. By the way, libpruio is already used in lots of industrial and scientific projects today.

I understand that it’s frustrating that the new framework does not address all your needs at the moment, and this is an active development so I will be continuing to close the gaps during the year.

It isn’t frustrating that your framework doesn’t address my needs. I wouldn’t even care about that. It’s frustrating that such a framework is in the mainstream kernel (images). And it’s frustrating that this could get fixed very easily, but hasn’t been done yet.

Do whatever you think will be helpful for anyone at any time. But do it in a sandbox. Don’t block corporate development. Don’t steal users’ resources. Don’t add further risks to kernels for boards with PRUSS.

Please work with Jason Kridner for any further issues or concerns…

For Jason Kridner I can only repeat: do everything possible to get remoteproc out of mainstream. It blocks the PRUSS development in BB community and endangers the BB project.

BR

I will work to enable uio_pruss functionality, and I think that is what you want, not just getting remoteproc out.

@Jason Kridner

I do not think anyone is asking to remove remoteproc, and replace it with uio_pruss. What we’ve been asking, at least I have been asking is give us the option.

For example, all those modules listed in the output from my lsmod on a fresh install of the latest Jessie testing image:

debian@beaglebone:~/nfs$ uname -r
4.4.12-ti-r31
debian@beaglebone:~/nfs$ cat /etc/dogtag
BeagleBoard.org Debian Image 2016-06-19

Anyway, all those kernel modules should NOT be enabled by default, with the exception of perhaps the USB gadget drivers (and nfsd is something I personally loaded). Personally, I would prefer they were not loaded by default, but I can understand the need. Additionally, those kernel modules should be loaded via a device tree file, as is done for the original uio_pruss module.

So again. I do not care what you all at TI work on. It’s not my place to say one way or another. But please do not force us to endure your developers’ poor clean-up abilities. But I will say, looking at the mess that is the output of lsmod from a released Debian image: you all must have zero pride in your work . . . Harsh? Maybe overly harsh, yes, but think about the message you developers are putting out there with that mess.

Also as TJF mentioned. Feel free to create your own little sandbox, and then include the stuff you have finished into the released images. That, would make me happy, as well as many others, I’m sure. It’s either that, or I’m forced to clean up your mess. Which I am perfectly capable of doing. But that’s not the point.

Additionally, I am grateful for the work EVERYONE has done for the community. Even the people I’m bitching at right now. Take it as overly critical constructive complaints. If you want.

There, now everyone can clean their messy images . . .

https://github.com/wphermans/bbb-cleanup/blob/master/beaglebone-cleanup.md

I’ve included Ohad Ben-Cohen, who was one of the originators of the RemoteProc/RPMSG framework. Hopefully he will be able to provide some perspective on what he was thinking when he created this framework.

@John:

… so what do you want removed? There is nothing left to be removed.

I want removed:

pru_rproc 12632 0
pruss_intc 7223 1 pru_rproc

Why don’t you use the “bone” kernel, which has pruss_uio as default? The “ti” kernel has RemoteProc/RPMSG as default. I don’t understand your problem here.

Do you remember, that’s the main topic in this discussion?

I think TJF is confusing throughput with latency. From the BeagleLogic project, they were able to achieve better throughput with RemoteProc, but TJF is able to achieve lower latency with PRUSS_UIO. From what TJF explained, the latency occurs because of 5 parameters that must be pushed onto the stack for each call.

I don’t think that I’m confusing anything here. If BeagleLogic could really achieve better throughput with remoteproc, they must have had a massive problem in their code before. I cannot imagine how any driver could have a positive influence on libpruio throughput.

Look at the BeagleLogic development blog, where he explains the throughput problem with pruss_uio. When he changed to RemoteProc/RPMSG, the throughput increased dramatically.

And regarding latency, the five parameters are just the start. Then kernel code and PRU code add further latency. (And even the five parameters are far too many for my use case.) The remoteproc concept isn’t made for high performance and it doesn’t provide any “hack” to bypass it.

I continue to say that you are using RemoteProc/RPMSG incorrectly. You shouldn’t have a tight control loop between the PRU and ARM because this makes no sense. Linux is non-deterministic, so why would you want to compromise the PRU by making it dependent on the communications with Linux? Either use one PRU for the control loop and another for communicating with Linux, or use DMA to pass data between PRU and ARM.
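
The decoupling idea above (a control loop that never waits on the Linux side) is commonly built on a single-producer/single-consumer ring buffer. The sketch below is a language-neutral illustration in Python, not PRU or RPMSG code. The class name and its methods are hypothetical, and a real implementation would live in PRU shared memory with appropriate memory ordering. The key property shown is that the producer drops data when the buffer is full instead of blocking.

```python
from typing import Any, List, Optional


class SpscRing:
    """Fixed-size single-producer/single-consumer ring buffer (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        # One slot is kept empty so that "full" and "empty" can be told apart.
        self._buf: List[Any] = [None] * (capacity + 1)
        self._head = 0  # next slot to write; touched only by the producer
        self._tail = 0  # next slot to read; touched only by the consumer

    def try_push(self, item: Any) -> bool:
        """Producer side (the control loop): never blocks, drops when full."""
        nxt = (self._head + 1) % len(self._buf)
        if nxt == self._tail:
            return False
        self._buf[self._head] = item
        self._head = nxt
        return True

    def try_pop(self) -> Optional[Any]:
        """Consumer side (the communication path): returns None when empty.

        Items are assumed to be non-None so that None can signal "empty".
        """
        if self._tail == self._head:
            return None
        item = self._buf[self._tail]
        self._tail = (self._tail + 1) % len(self._buf)
        return item


if __name__ == "__main__":
    ring = SpscRing(capacity=2)
    print([ring.try_push(v) for v in (1, 2, 3)])
    print([ring.try_pop() for _ in range(3)])
```

Whether dropping samples is acceptable depends on the application; the point of the sketch is only that the timing of the producer does not depend on the consumer.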

Yeah, so long as you don’t need security, interrupt handling, virtual peripherals (firmware defined devices), etc, why not use a hack to accomplish your task. I guess there is a reason why Linux doesn’t have a lot of userspace drivers.

I see lots of userspace drivers in the real world. E.g., for one-wire support I know of four. And the numbers will increase, unless kernel development gets closer to user needs.

These are generally toys. The vast majority of drivers are Kernel based drivers.

I need security, a high-level target in my projects. That’s why I don’t like virtual peripherals. That’s why I don’t want to load my firmware from files. (I’ve seen projects where firmware files from userspace get linked to /lib/firmware and loaded. Or when a project runs from SD card, an attacker can easily put the card in a laptop and overwrite the firmware files in kernel space.)

Once you compromise physical security, you have no security, period, so this is a silly point to make. There are many Linux device drivers that rely on firmware and these are all done in a secure way. The purpose of a kernel driver is to validate the user parameters and prevent operations outside well defined limits. Userspace drivers have no such validation and can do whatever they please, hence no security.
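
The validation role described above can be illustrated with a small sketch. Real kernel drivers are written in C; this Python version only shows the pattern of checking a request against fixed limits before acting on it. The request type, the channel range and the rate limit are all hypothetical and not taken from any real PRU driver.

```python
from dataclasses import dataclass

# Hypothetical limits for an imaginary PRU-backed sampling device.
MAX_SAMPLE_RATE_HZ = 200_000
VALID_CHANNELS = range(0, 8)


@dataclass(frozen=True)
class SampleRequest:
    channel: int
    rate_hz: int


def validate(req: SampleRequest) -> SampleRequest:
    """Reject any request outside the well defined limits; return it otherwise."""
    if req.channel not in VALID_CHANNELS:
        raise ValueError(f"channel {req.channel} out of range")
    if not 0 < req.rate_hz <= MAX_SAMPLE_RATE_HZ:
        raise ValueError(f"rate {req.rate_hz} Hz out of range")
    return req


if __name__ == "__main__":
    print(validate(SampleRequest(channel=3, rate_hz=100_000)))
    try:
        validate(SampleRequest(channel=9, rate_hz=100_000))
    except ValueError as err:
        print("rejected:", err)
```

A userspace library can perform the same checks, but nothing forces another process with the same hardware access to go through them, which is the distinction being argued here.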

So how does this project provide any additional security? When a user stops the libpruio firmware at the wrong point using “unbind”, this may damage the hardware.

Remember, the PRUSS are a safety risk per se. Old-fashioned thinking doesn’t match the current situation. In contrast to an arbitrary peripheral subsystem, the PRUSS can access all CPU memory. And the kernel cannot protect any memory against PRU access. That’s why PRU control from the command line will not add any security. Instead it’ll increase the risk.

That is why you want a kernel-based driver to validate the firmware. Again, a userspace driver can place whatever code it wants on the PRU.

This is what I like most about RemoteProc/RPMSG, in that you are using the same framework between ARM, PRU, DSP, etc.

If I were an RPi3 user, I’d love this too, since it’d help me to develop a PRU emulator, using 1, 2 or 3 of the cores for real-time tasks. Sure, such a project wouldn’t really help the BBB community and it wouldn’t be good for TI’s business. So I wonder why TI managers allow this remoteproc thingy to be developed in public.

TI have several processors that have PRU, DSP, CortexM4 in addition to one or more ARM processors. Just look at the BeagleBoard-x15 for example.

Suman, you have done very good work. Please take the criticism as suggestions on how to make RemoteProc/RPMSG better. We understand that you have many other responsibilities at TI and we want to be respectful of your time. Thank you again for all your help.

Good work for whom? Competitors are happy. But high performance PRU projects don’t work any more with TI kernels since kernel 4.x (that’s nearly two years now). And further corporate development is blocked until we find a solution.

This framework works for me so I certainly don’t want it removed.

I also have many other responsibilities. And here, I try to build a bridge, and I try to slow down (or stop) the current process of kernel development drifting away from user needs.

I think it would be more productive to make suggestions on how to improve the RemoteProc/RPMSG. BTW, I’m sure you don’t have any problem with RemoteProc, because it is just loading and starting/stopping the firmware on the PRU. PRUSS_UIO has a similar firmware loader. So perhaps we should concentrate on Virtio, vring or RPMSG.

BR

@Suman

Thank you for your answer. Most of your colleagues just stop answering and break the discussion when it comes to the point. I really appreciate your effort.

And thank you for the examples. They may help others to get started with your framework. At the moment, for me, there is still no reason why I should spend time in testing.

I don’t know if pasm assembler is still a supported tool by TI.

As far as I understand, PRU isn’t supported by TI at all yet. But pasm is the first tool available, and therefore a lot of projects are based on it. You shouldn’t ignore it.

/dev/mem is your friend.

I don’t like /dev/mem. PRUSS are my friends.

Define mainstream.

By mainstream I mean the system a user gets when he installs an image.

It is optional, it is not even enabled by default in omap2plus_defconfig. If one doesn’t want to spend time, one can always go back to how they were using it with whatever patches they had before.

Yes, downgrading is an option in any case of conflicts. It’s the last option. And I think it shouldn’t be the standard answer when kernel developers learn about user needs.

Your project is not optional. As you can see in William’s post, your driver gets installed by default in any TI image.

Please try to imagine: not every user compiles his own kernel. Instead, most users download and install a prepared image. And this is how it should be (@RCN: thanks for all your efforts).

Once again, why not install the “bone” kernel, which has pruss_uio as default? Why do you insist on installing the “ti” kernel, which has RemoteProc/RPMSG as default, and then insist on removing RemoteProc/RPMSG? This makes no sense to me.

And may I know how would you have done it if you were to redo it from scratch supporting both kernel and userspace? You are only looking at it from userspace angle, and as long as you stick to that, any kernel framework will indeed look like bloat.

User space is what kernel is made for. A kernel without userspace applications makes no sense. And when a kernel framework looks bloated from userspace angle, it may be bloated.

You may know how I’d have started your project. As I said, I’d first make the decision:

  1. do I want to make something completely new, or
  2. do I want to replace an existing solution?
In the first case, I would create an experimental project, far away from mainstream. In the latter case I’d care about existing code and solutions (and I wouldn’t go mainstream either, unless I got the basics working and documented with examples on how to migrate).

As far as I understand your project, you made this decision. But you didn’t leave the mainstream yet (and I don’t understand how you could enter).

The reason it was added to mainstream was to encourage support by other vendors which has already started.

The TI kernel is not just about BBB, there are 4 other SoC families and multiple boards where PRU is now supported. And we do have userspace needs as well for Industrial usecases, they will get addressed in the upcoming months. You might very well find a thing or two w.r.t UIO in an upcoming Processor SDK release.

I know that you work for other SoC families as well. I wonder why you don’t add your project to the Processor SDK as an insiders’ tip or special offer. By the way, libpruio is already used in lots of industrial and scientific projects today.

Since pruss_uio is only supported on one platform, it shouldn’t be included in mainstream.

I understand that it’s frustrating that the new framework does not address all your needs at the moment, and this is an active development so I will be continuing to close the gaps during the year.

It isn’t frustrating that your framework doesn’t address my needs. I wouldn’t even care about that. It’s frustrating that such a framework is in the mainstream kernel (images). And it’s frustrating that this could get fixed very easily, but hasn’t been done yet.

Again, use the “bone” kernel. This is why Robert Nelson has the “bone” kernel.

Do whatever you think will be helpful for anyone at any time. But do it in a sandbox. Don’t block corporate development. Don’t steal users’ resources. Don’t add further risks to kernels for boards with PRUSS.

Please work with Jason Kridner for any further issues or concerns…

For Jason Kridner I can only repeat: do everything possible to get remoteproc out of mainstream. It blocks the PRUSS development in BB community and endangers the BB project.

I completely disagree with this request.

Regards,
John

Please don’t do that. Robert Nelson has the “bone” kernel for this purpose, which supports pruss_uio by default and does not have RemoteProc/RPMSG installed. The “ti” kernel however does the reverse, with RemoteProc/RPMSG installed by default and pruss_uio not installed. I believe the two frameworks conflict, so it is not possible to have both installed.

Regards,
John

Also, instead of saying “use the bone kernel instead”, do realize that the TI kernel has some useful features that are not enabled in the bone kernel config, at minimum.

So the point is, this “toy” is a community “toy”. Not just for you.

On the contrary, the TI kernel is for all TI processors and not just for the BBB. If you want features added to the “bone” kernel, why not request those features be added.

Regards,
John

It sure seems to me that if both can exist in the source tree and be selected at runtime with configuration (ideally via device tree, switchable later by loading and unloading modules), that would be the ideal solution. It sounds like John is saying this is technically not possible, but I feel like it should be (it is just code, after all).

Right now, I do not believe it is possible. But as you say, it should be. This is one of the points I was alluding to in my rants. I think that ideally, there should be only one kernel, with the option to load whatever a user wishes at the time. Of course, it would not make sense to load two PRU drivers at once. Unless somehow it could be made to run two different drivers on each of the PRU cores. That might be cool . . .

Anyway, right now I do not think that the two PRU drivers are compatible with one another as kernel config options. That in my mind should be a paramount consideration.