Is there a way to send an interrupt from userspace to the PRU-ICSS?

The mechanism for generating an interrupt from a PRU to the A8 (host) is well-documented. Is there a way to send an interrupt (one of the 64 system interrupt events documented in the PRU-ICSS literature) from userspace?

From reading the TI documentation, the only candidates seem to be the two “mailbox” interrupts. I recall reading something about a version of the remoteproc (or RPMsg, or virtio) drivers that used these mailboxes but ultimately abandoned them because they are not available on all platforms (that may be incorrect).

Setting a flag in PRU DRAM or shared RAM is clearly a method that will work. However, it appears that polling DRAM or shared RAM is a multi-clock task; if a PRU system interrupt can be generated, it can be polled in one clock by examining R31 bits 30/31 (if configured correctly). Is this possible?

No, there are no such things as userspace interrupts, period.

I get the feeling however that you're misunderstanding the purpose of an
interrupt. An interrupt is a way for hardware to let software know that
something has happened that may require attention. Either way, you would
probably be better off thinking in the context of setting a bitfield, or in
this case, a single bit.

I see what you are saying. But (from what I’ve read) it seems that many would also say that you can’t receive interrupts in userspace. I guess that’s technically true, but uio does provide a way, through a read() or poll() on the /dev/uioX device node, to determine if an interrupt was fired.

Also from what I’ve read (I’ve been doing a lot of reading…) I thought that remoteproc was using mailboxes to communicate from userspace to the PRU - and was wondering if this “reverse-direction” interrupt mechanism couldn’t also be supported by drivers (say, by writing to /dev/uio - analogous to select()/poll() on /dev/uio to detect an interrupt).

Granted, after going through all that machinery, it may well be faster to just set a bit in PRU memory and have PRU poll.

If this is just nonsense, even in theory, then I’d welcome an education as to why.

So, there is a driver (I do not know its name) that populates /proc/irq/, and if you cat /proc/interrupts, you’ll easily see which number correlates to what.

root@beaglebone:~# cat /proc/interrupts
           CPU0
 16:   39266767           INTC  68 Level  gp_timer
 20:     118529           INTC  12 Level  49000000.edma_ccint
 22:         97           INTC  14 Level  49000000.edma_ccerrint
 23:          0           INTC  96 Level  44e07000.gpio
 30:          0  44e07000.gpio   6 Edge   48060000.mmc cd
 56:          0           INTC  98 Level  4804c000.gpio
 89:          0           INTC  32 Level  481ac000.gpio
122:          0           INTC  62 Level  481ae000.gpio
155:         18           INTC  72 Level  44e09000.serial
156:       1431           INTC  70 Level  44e0b000.i2c
157:         33           INTC  30 Level  4819c000.i2c
158:         13           INTC  64 Level  mmc0
159:     587731           INTC  28 Level  mmc1
167:          0           INTC  75 Level  rtc0
168:          0           INTC  76 Level  rtc0
172:    2117580           INTC  41 Level  4a100000.ethernet
173:     218976           INTC  42 Level  4a100000.ethernet
181:          0           INTC 111 Level  48310000.rng
183:         41           INTC  18 Level  musb-hdrc.0.auto
184:          1           INTC  19 Level  musb-hdrc.1.auto
185:          0           INTC  17 Level  47400000.dma-controller
186:          0           INTC   7 Level  tps65217
187:          0           INTC  16 Level  TI-am335x-adc

The reason there are no userspace interrupts is that delivering them would require context switches from userspace to kernel space and back to userspace. Needless to say, this would create a system-wide performance problem.

Now, concerning setting a bit flag in memory somewhere: I ran into a similar problem that was all in userspace. I was decoding a custom CANBUS protocol, and all of the so-called “non-blocking” mechanisms available through the standard Linux libc API were too slow by far. I needed this mechanism because I had two separate processes communicating with each other in real time: one half decoding CANBUS messages at 1 Mbit/s, and the other half a web server displaying the decoded data in real time through web sockets.

Anyway, with traditional non-blocking methods I was able to switch between these two apps at around 5-8 messages a second. But when I switched to using mmap() + a structure in memory + a bit in this structure as a locking mechanism, my maximum message count shot up to around 1000 messages / second. Granted, actual new message values were only around 20 a second.

So basically, I had a structure where one member was a single bit (8 bits in storage, but I only used 1). This bit would be either 0 or 1 (of course), and its value indicated which process had access to the file. Access rotation was a real factor in this situation: the first half process would decode and write, then set the access bit to allow the second half access. The second half would read the value, then change the bit back to 0. As implied above, this turned out to be the fastest mechanism possible by a long shot.
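As a rough illustration of the access-bit rotation described above, here is a minimal sketch in C. It is not the original code: it swaps in an anonymous shared mapping plus fork() instead of POSIX IPC shared memory, and the names struct mailbox, mailbox_put(), mailbox_get() and mailbox_demo() are made up for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: one byte used as the access bit, plus the data. */
struct mailbox {
    _Atomic uint8_t turn;   /* 0: writer owns the slot, 1: reader owns it */
    uint32_t payload;
};

/* Writer side: wait for ownership, store the value, hand the slot over. */
static void mailbox_put(struct mailbox *m, uint32_t value)
{
    while (atomic_load_explicit(&m->turn, memory_order_acquire) != 0)
        usleep(100);
    m->payload = value;
    atomic_store_explicit(&m->turn, 1, memory_order_release);
}

/* Reader side: wait for ownership, read the value, set the bit back to 0. */
static uint32_t mailbox_get(struct mailbox *m)
{
    while (atomic_load_explicit(&m->turn, memory_order_acquire) != 1)
        usleep(100);
    uint32_t value = m->payload;
    atomic_store_explicit(&m->turn, 0, memory_order_release);
    return value;
}

/* Fork a writer that sends 1..count; the parent reads and sums the values.
 * Returns 0 if the mapping or the fork fails. */
uint32_t mailbox_demo(uint32_t count)
{
    struct mailbox *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return 0;
    assert(m != NULL);
    atomic_init(&m->turn, 0);
    m->payload = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(m, sizeof *m);
        return 0;
    }
    if (pid == 0) {                       /* child: the "decode" half */
        for (uint32_t i = 1; i <= count; i++)
            mailbox_put(m, i);
        _exit(0);
    }

    uint32_t sum = 0;                     /* parent: the "display" half */
    for (uint32_t i = 0; i < count; i++)
        sum += mailbox_get(m);
    waitpid(pid, NULL, 0);
    munmap(m, sizeof *m);
    return sum;
}
```

Calling mailbox_demo(10) passes the values 1 through 10 from the child to the parent one at a time. The acquire/release ordering on the access bit is what makes the payload write visible before the hand-over; a plain byte without that ordering may appear to work but is not guaranteed to.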

Using the PRUs you would not be able to use POSIX IPC shared memory as I did, but you can still use mmap() + /dev/mem on the userspace side of the application, which would allow you to communicate with the PRUs through a specified location in memory.
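A minimal sketch of the mmap() + /dev/mem approach might look like the following. The helper names (struct phys_window, phys_window_open(), phys_window_close()) are invented for illustration, and the PRU-ICSS base address and offsets are assumptions taken from my reading of the AM335x memory map, so check them against the TRM for your part. The device path is a parameter so the page-alignment logic can be tried against an ordinary file as well.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed AM335x addresses; verify against the TRM before relying on them. */
#define PRUSS_BASE           0x4A300000u
#define PRUSS_DRAM0_OFFSET   0x00000u   /* PRU0 8KB data RAM   */
#define PRUSS_SHARED_OFFSET  0x10000u   /* 12KB shared data RAM */

struct phys_window {
    void *map_base;           /* page-aligned start of the mapping    */
    size_t map_len;           /* length actually passed to mmap()     */
    volatile uint8_t *ptr;    /* points at the requested address      */
};

/* Map len bytes starting at offset phys of the file or device at path.
 * mmap() needs a page-aligned offset, so round down and remember the delta.
 * Returns 0 on success, -1 on failure. */
int phys_window_open(struct phys_window *w, const char *path,
                     off_t phys, size_t len)
{
    assert(w != NULL && path != NULL && len > 0);

    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = phys & ~((off_t)page - 1);
    size_t delta = (size_t)(phys - aligned);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *base = mmap(NULL, len + delta, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, aligned);
    close(fd);                /* the mapping stays valid after close() */
    if (base == MAP_FAILED)
        return -1;

    w->map_base = base;
    w->map_len = len + delta;
    w->ptr = (volatile uint8_t *)base + delta;
    return 0;
}

void phys_window_close(struct phys_window *w)
{
    assert(w != NULL);
    munmap(w->map_base, w->map_len);
}
```

On a board this would be used, as root, along the lines of phys_window_open(&w, "/dev/mem", PRUSS_BASE + PRUSS_DRAM0_OFFSET, 4) followed by w.ptr[0] = 1 to set a flag in the first byte of PRU0 data RAM. The PRU-ICSS must be powered and out of reset first, otherwise the access will likely fault.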

If this is not clear enough, I can elaborate further. In either case, I’d be interested in how you dealt with your problem here, and would really love to see some simplified example code.

On Tue, 7 Mar 2017 21:52:21 -0700, William Hermans
<yyrkoon@gmail.com> declaimed the following:

I get the feeling however that you're misunderstanding the purpose of an
interrupt. An interrupt is a way for hardware to let software know that
something has happened that may require attention. Either way, you would
probably be better off thinking in the context of setting a bitfield, or in
this case, a single bit.

  Depending upon one's background, there may be different levels of
things under "interrupt".

  Your focus appears to be on (external) hardware signalling for service.

  I have encountered systems with the concept of a software interrupt
being used by unprivileged code to signal a request that needs to be
handled in privileged code; or even if the system didn't support
privileged/unprivileged just to give an entry point to system functions
that wouldn't change with system updates (single interrupt with a passed
index into a function table). Many tend to use the name "trap" for a
software interrupt.

  Even TI-RTOS defines something called a SW interrupt, running at a
lower priority than HW interrupts (with a suggestion that the HW interrupt
should trigger a SW interrupt if more than a minimum amount of processing
is needed -- which, granted, is the opposite of a low priority task
requesting an operation by triggering a SW interrupt).

  Hardware interrupts are pretty much always asynchronous -- taking
control away from whatever process is running. Software interrupts used to
access system services are synchronous -- the running process is the one
transferring control; though they may also be asynchronous if they are part
of a priority scheme and triggering may come from code running at other
priority levels (immediately if triggered by lower priority code, deferred
if triggered by higher priority code, only to be activated when the higher
priority code exits)

For the purpose of this discussion with ags, I do not think the actual
definition of what an interrupt is, is quite so important, as much as how
to achieve an end goal. On a single threaded "system", I also do not think
asynchronous is really ever a factor. But I usually do tend to view
interrupts as prioritized, and preemptive.

Additionally, what I proposed, should not interfere with system interrupts
much, if at all. But should complete the task as fast as the system would
allow, and is blocking in nature.

One thing I did not mention however, is that even though my idea is
blocking in nature, you can give the processor back to the system by using
sleep() or usleep(), instead of continuously polling to the point that
you're keeping the processor so busy it has little time to do anything
else.

In my use case, I think I used usleep() with a value of 10,000, which
seemed responsive enough for my purpose, and only used 1-3% processor
time.
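The sleep-between-polls idea above could look roughly like this. It is only a sketch: wait_for_flag() is a hypothetical helper, and the flag pointer is assumed to point into memory that another process (or the PRU) writes.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Poll *flag until it becomes non-zero, sleeping interval_us microseconds
 * between checks so the CPU is handed back to the scheduler.
 * Returns the number of sleeps that were needed, or -1 if the flag was
 * still zero after max_polls checks. */
int wait_for_flag(const volatile uint8_t *flag, useconds_t interval_us,
                  int max_polls)
{
    assert(flag != NULL);
    for (int polls = 0; polls < max_polls; polls++) {
        if (*flag != 0)
            return polls;
        usleep(interval_us);
    }
    return -1;
}
```

With the value mentioned above, wait_for_flag(flag, 10000, 100) would check roughly every 10 ms and give up after about a second. The trade-off is latency: a change in the flag can go unnoticed for up to one sleep interval.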

According to this 2015 video from the Embedded Linux Conference, the PRU does not support asynchronous interrupts:

https://youtu.be/plCYsbmMbmY?t=22m6s

I think there is some sort of PRU interrupt queue, but it does not interrupt the PRU’s execution. Your PRU code must explicitly monitor the PRU interrupt queue to check for an interrupt.

Alternatively, I’ve used a method for ARM/PRU coordination that is similar to what William Hermans described: when the ARM CPU wants to trigger something on the PRU, it writes a 1 to the bottom byte of the PRU data RAM. The PRU continuously monitors this bottom byte to watch for a change.

-Justin

Correct - to preserve deterministic execution, the PRU cannot be asynchronously interrupted. Polling (of some form) is required.

Back to the OP, there is a way to register a (non-async) interrupt with the PRU. One can force a system interrupt (any one of the 64 that the PRUSS recognizes) by setting a bit in the Interrupt Status Register. From userspace it looks just like writing to the PRU DRAM since it’s just writing a value to mmap()'d physical address. The advantage over what’s been discussed here is that depending on how it’s set up, it could be faster than polling from DRAM. I will have to implement to provide actual measurements.
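As a hedged sketch of what forcing a system event from userspace might look like: the register offsets below (interrupt controller at 0x20000 inside the PRU-ICSS, raw/set status registers SRSR0/SRSR1 at 0x200/0x204) are assumptions based on my reading of the AM335x TRM, and pru_raise_event() is a made-up name. The base pointer is assumed to come from an mmap() of the PRU-ICSS region.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offsets; verify against the TRM before use. */
#define PRUSS_INTC_OFFSET   0x20000u  /* INTC inside the PRU-ICSS window   */
#define INTC_SRSR0_OFFSET   0x200u    /* events 0..31; SRSR1 follows at +4 */
#define PRU_NUM_SYSEVENTS   64u

/* Set the raw status bit for one of the 64 system events.
 * intc_base must point at the start of the mapped INTC register block.
 * The register is assumed to be write-1-to-set, so a plain store of the
 * single bit is used instead of a read-modify-write. */
void pru_raise_event(volatile uint32_t *intc_base, unsigned event)
{
    assert(intc_base != 0 && event < PRU_NUM_SYSEVENTS);
    volatile uint32_t *srsr =
        intc_base + INTC_SRSR0_OFFSET / 4u + event / 32u;
    *srsr = UINT32_C(1) << (event % 32u);
}
```

For the PRU to see this in R31 bit 30 or 31, the event also has to be enabled and routed through a channel to host interrupt 0 or 1 in the interrupt controller setup, and the PRU code has to clear the event after handling it. None of that configuration is shown here.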

So I remember asking Charles, some time ago, if it would be more efficient
to write to DRAM or to one of the shared memory areas for the PRU, from the
ARM side of things. I think in this one case, writing to one of the PRU's
memory areas might be more efficient. My reasoning here is that from
userspace, one would have to write out to memory through /dev/mem anyhow.
So why not write to a memory location where the PRU has single-cycle read
speeds? One *would* have to take extra care to make sure this memory
location is correct, but no more so than writing into DRAM.

Something to think about anyhow.

I’ve had a hard time getting any definitive responses to questions on the subject of memory access & latency. It is true that the PRU cores have faster access to DRAM that is part of the PRU-ICSS (through the 32-bit interconnect SCR) - though not single-cycle - than to system DDR. However, the ARM core accesses DDR through L3 fabric, but the PRU-ICSS through L4FAST, so I’m thinking that it can access DDR faster than PRU-ICSS memory.

I’ve also asked about differences in latency/throughput/contention comparing the PRU-ICSS 12KB shared RAM vs. the 8KB data RAM. No response. Since both 8KB data RAMs are accessible to both PRU cores, I’m not sure what the benefit of the 12KB shared RAM is (though I imagine there is one, I just can’t figure it out).

Lastly - and even more importantly - I totally agree that you have to be careful about accessing any memory correctly. I have posted several times asking about the am335x_pru_package examples (using UIO). In at least one (https://github.com/beagleboard/am335x_pru_package/blob/master/pru_sw/example_apps/PRU_PRUtoPRU_Interrupt/PRU_PRUtoPRU_Interrupt.c), there is hardcoded use of the first 8 bytes of physical memory at 0x8000_0000. I don’t see how that can be OK. It may be that I don’t know some secrets of Linux internals, but from a theoretical perspective, I just don’t know how one can make the assumption that any part of main memory is not in use by another process unless it is guaranteed by the kernel.

So here is what I meant. Of course, I have no personal hands-on experience,
but looking at things from 35k feet, I *know* writing directly to the PRU
shared memory from userspace through /dev/mem would be, performance-wise,
just as fast as writing to the 512M of system DDR. On the PRU side,
however, the PRUs would have single-cycle access to their own memory. So
the tricky part for me here would not be making sure we're writing to the
right memory location, but knowing it's possible to begin with, because I
have not attempted this personally. In fact, my hands-on experience with
the PRU is limited to just setting up a couple of examples, and proving to
myself it would work with a 4.x kernel.

So my only real "concern" is whether it really is possible to mmap() the
physical address for the PRU's shared memory, and whether that could be
done "safely". But I do know that if it is possible, it would be faster,
from the PRU side, than reading and writing to the system's 512M DDR,
because of the fabric latency. Not only that, from what I've read in the
past, accessing devices or memory through that fabric can add a little bit
of non-deterministic latency. So my thinking here is that "we'd" gain back
the little bit of determinism that we lost using DDR.

After that, I have no idea how important what I'm talking about is to you,
with your given project. Address 0x80000000, though, I seem to recall is
possibly related to the kernel, or perhaps the initrd. But another thing
that I do not pretend to know 100% about is how Linux virtual memory works.
When we say we're accessing "physical memory" through mmap(), we're
actually accessing the device modules, or external memory, through virtual
memory. It could very well be that the person who wrote the uio pru
examples knew this going in, and it's not by accident at all, but rather by
design. I'd have to look further into the gory details of everything
before I could make this determination.

Thinking on it for a little longer, I almost want to say that address 0x80000000 is actually the start of Linux’s virtual memory map. But I’m not 100% sure.

I’m doing my own research for a paying project, so can’t really dive into documentation for something else right now . . .

OK, according to some documentation I was able to find quickly, address 0x80000000 is the base address for the start of the DDR memory on the TI EVM board, which is very similar to the beaglebone in memory layout.

Here is another link that should explain it clearly enough. http://processors.wiki.ti.com/index.php/HOWTO_Change_the_Linux_Kernel_Start_Address#Modifying_memory.h

So I would say that it is not by accident that the base address of 0x80000000 works. In fact, if you think about it a little bit: read the opening paragraph labeled “purpose”, and replace “DSP” with “PRU”, for all intents and purposes of this discussion.

William,

Thank you so much for this information. It will really help for that thread I’m doing on BB. Just trying to get the P8/9 up on my little BBBW. It’s nice having a little insight into the internals of them…as much information as I can get, I’m happy about. Not quite finished reading all my emails, but give me time.

If I have any questions can I bounce them off you, bro? It takes me a little while to get things done but I’m starting your email. I do GREAT in burst mode, so feel free to continue communicating as you have done.

Woody.

@William Hermans like you I won’t be able to dig into the gory details of loading Linux. This is an interesting read (albeit high-level and prompting more questions). I think I can say a few things without understanding all the details:

It is correct (from detailed reading of the TI TRM) that 0x80000000 is the physical memory address of the L3 DDR.
If Linux is leaving any physical memory unmapped, unused - that’s a shame. Wasted precious resource.
The PRUSS UIO driver allocates memory and exposes the physical address in userspace. If this is not used, it is also a precious wasted resource.

Now comes the subjective stuff:

I’m going to presume that Linux isn’t stupid, and not count on it leaving permanently-allocated and undocumented physical memory addresses available for those that know the secret handshake.
I will use the memory allocated by the PRUSS UIO driver to communicate between userspace and the PRU-ICSS.

If someone from TI/BeagleBoard.org responds with clarification on where I’m incorrect, I’ll adjust my position. As of now, for over two years I’ve been asking this same question and gotten no definitive response. Anyone know who came up with the am335x_pru_package examples?

Thanks for your input and replies. Much appreciated.

Please understand that TI has nothing to do with BeagleBoard.org. Also, there is no BeagleBoard.org support staff. We are all users just like yourself and we volunteer our time to help others. If no one answers your questions, then perhaps your questions are not interesting or no one has the time to investigate answers that you need. To answer your questions, we would have to read the TRM and then do some experimentation to get the answer. Why should we do this work for you when you can do this for yourself?

Learn how to use the tools and help yourself. For example, clone the am335x_pru_package repo and then do a “git blame <file.c>” and it will give you the e-mail of the person who wrote each line of code for <file.c>. Pick up a good book on Git as this is a very powerful tool.

Regards,
John