Passing Mailbox messages between ARM and PRU

I’ve never done anything complex enough to interact with a UART within the PRU. However, if you have existing code intended for the PRU_ICSS, I would think it wouldn’t be too terribly difficult to migrate that to a PRU in the PRU_ICSSG. The main thing that would need to change is the memory address of the UART you are using. That’s just a matter of looking it up in the TRM. And just to give you some confidence, you might want to look up the addresses in the TRM of the AM335x first and verify that the values in your existing code match what the AM335x’s documentation says. Then change them to the addresses documented for the J721E. It “should” be just that simple. And if TI was consistent about the UART registers it exposes and the values that are to be written to them, then I don’t know why it wouldn’t be this simple. However, if there are interaction differences between the two platforms’ UARTs, then it will get a little more complicated. For instance, let’s say (and I’m making stuff up here) the AM335x wants you to write:
1 for 9600 baud
2 for 19200 baud
3 for 28800 baud

But the J721E’s UART expects the actual baud value to be written. In this case, you’d have to update the code to write what the J721E expects. But again, that shouldn’t be horrible to figure out.
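To make that concrete, here is a minimal sketch of how the platform differences could be kept in one place. Everything in it is hypothetical: the base addresses are placeholders to be replaced with the values from each TRM, the baud encodings are the made-up ones from the example above (not real register semantics), and the names `enum soc` and `uart_baud_value` are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical sketch: nothing here is taken from a TRM. */
enum soc { SOC_AM335X, SOC_J721E };

/* Placeholder base addresses: look up the real ones in each TRM. */
#define UART_BASE_AM335X 0x00000000u /* placeholder, see AM335x TRM */
#define UART_BASE_J721E  0x00000000u /* placeholder, see J721E TRM */

/*
 * Returns the value to write to the (made-up) baud setting for the
 * given part, or 0 if the baud rate is not supported by the scheme.
 */
uint32_t uart_baud_value(enum soc s, uint32_t baud)
{
    if (s == SOC_J721E)
        return baud;            /* made-up: part takes the raw baud value */

    switch (baud) {             /* made-up: part takes a small code */
    case 9600:  return 1;
    case 19200: return 2;
    case 28800: return 3;
    default:    return 0;       /* not in the made-up table */
    }
}
```

With the differences isolated like this, the rest of the PRU code would call `uart_baud_value()` and never care which part it is running on.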

All that said, I personally wouldn’t try to communicate anything more than the absolute simplest comms out a UART from a PRU. My experience with UARTs has been with highly structured languages that require a full-up stack and multiple threads to perform asynchronous transmit/receive functions. Typical UART interactions are FAR easier to do in an RTOS than in a PRU. But if all the PRU code is doing is outputting human-readable characters as a debug mechanism, then that’s probably appropriate.

@Chris_Grey, @BarryBeagle, et al.:

I was just reviewing my old R5F gpio toggle experiment post. In that post, Chris stated that he was working on the BBAI-64 PRUs but hadn’t yet gotten an IO to toggle. I wanted to check back in with you guys: Have you been able to use the BBAI64 PRU_ICSSG to control IO?

Thanks and Best regards,
Fred Eckert

I have not experimented with physical I/O yet. I haven’t done much since mid-January. I did most of this investigation over the Christmas holidays, and once “real” work (and life) kicked back in, I haven’t had a lot of time to dedicate to looking into this further. I still have the BBAI64 here on my desk. I just haven’t been able to break away to do much with it. Dealing with logical things was easy enough with just the board. But physical I/O requires that I connect things and get more involved. And while that isn’t impossible, I haven’t taken the time to do that work. I do have one of those prototyping capes with the tiny breadboard, 2 LEDs, and some buttons. I got that back when I was working with the BBB over a decade ago. I’m hoping it’ll plug in and let me do some of that I/O testing with simple things like the switch and LEDs it has. That just hasn’t happened. It’s still on the to-do list…but it may be Thanksgiving before I get back to it.

I’m also hoping there’s more maturity to the development environment to make my investigations a bit easier. I’m also hoping that some more support for sleep/quick-wake is developed. The product I’d hoped to use the BBAI64 as the heart of would REQUIRE that. When I investigated this, it seemed that, all the way up to TI, this had not really been explored or exercised enough to know how well it really works, and there is certainly no code support for it from TI or BB. It seems to be only a silicon possibility, not a supported feature. That alone was the biggest loss of wind from my sails for investigating anything on the BBAI64 further.

Do you really need that much horsepower for your product? TI is not doing a good job of making friends with us, and other options do exist, so I have been looking at other vendors. Not sure why BeagleBoard is so committed to TI products.

So far it makes me feel like we have been chasing our tails, doing way too many deep dives into docs that are lacking content between point A and point B. And so much of it is stale. Pretty sure if we were one of TI’s top ten customers our journey would not be so difficult.

Do I need that much HP? The answer is yes and no.

To do the base-functionality of what I need, probably not. And if I was willing to have companion software on the PC that did a lot of the configuration, datalogging, and general UI, then I could probably get away with just a BBB…maybe.

But the product I envision is one that also hosts its own WebUI that will allow a user to make changes to the configuration as well as show datalogging and trending information in realtime using nothing more than a browser on their PC. And to get the rich experience without a lot of lag requires a fair amount of server-side heft, even if the browser is doing a lot of UI lifting.

And the reason I say this is because of my current experience doing EXACTLY THAT on the AM335x. The product I work on professionally has real-time logic running alongside a web UI. And when the WebUI is exercised, not only is it a SLOW user experience, it takes clock cycles away from the normal operation code (duhh, it’s a single-core processor). Even with thread priority tinkering, it’s still not great.

So I really want something with more processing power in each core, to get a more usable user-experience. And I’d like multiple cores so I can do things like dedicate Linux processes to one processor and put all other system functionality and WebUI on the other.

The AI64 would be an excellent choice. I had to kill the desktop manager and run it as a headless server, and it is solid. One is in production here and it has not caused any trouble. It’s up against metal-box Dells and it’s holding its own, and from a wall transformer. It’s on NVMe, and that puts it in a class of its own, kinda like a crossover server. We only need the reliability for internal connections, and it is delivering that. It’s not even on the backup supply; it has been off from power outages and it recovered without intervention.

I was reading the docs on the AM64 and AM62 regarding the PRU and those two don’t play the same tune. Some functionality is on one variant and it does this and not that… It would be nice if they would explicitly indicate what it can and cannot do. Running down a rabbit hole then to discover it dead ends is not good.

Did get CCS up and tried the PRU demos out, and it loads those cores just fine with JTAG, when it wants to. It’s flaky but still usable. Now I tried to load the A53 from CCS and it does not debug or load the cores. They spend so much time putting CCS together, then the ball bounces. I made the wrong assumption when starting out: from all the hype it appeared to be a continuous solution, and that is not the case.

If you spent the time to integrate GDB into CCS it would be okay; that would be about the only way to do it with the Linux kernel.

Oh yeah, I have no complaints about the BBAI64’s performance. But with that performance comes a lot of heat, even when the processor is just sitting idle. And that’s a level of power consumption you cannot tolerate in an automotive environment when the engine is OFF. So a low-power mode is needed. And when the ignition is turned ON, a quick power-up from sleep is needed to return the processor to full power and speed, so it is ready to do the necessary control with minimal sleep->wake transition lag.

So as much as I like about the BBAI64, there are just as many things that are, essentially, shortcomings: the scattered and incomplete documentation on TI’s side, the lack of support and example code for features that TI claims are possible, and, as you said, the differences in peripheral capabilities. Just to give an example, look on the website’s home page. The BBAI64 is listed as having 4 PRUs, which is not incorrect, but is incomplete. The J721E/TDA4VM has 2 PRU_ICSSG subsystems, each of which has 2 PRUs, 2 RTUs, and 2 TX cores. Each of those cores is similar-but-different, and while I did find the documentation on what exactly those differences are, it’s not easy to come across…especially given the TDA4VM documentation doesn’t even mention the presence of those subsystems. (why TI, why?) And you have to just know to also look for J721E documentation (a processor that doesn’t actually exist as far as I can tell). And when you don’t know, you have to go glean that knowledge via forums like this.

So there’s a lot of good, bad, and ugly as it relates to the BBAI64, BB’s relationship with TI, and what TI has offered as support & guidance. And as good as the good is, the bad and ugly are where we, as developers live.

Yes, that is so very true. They do have a big TRM. We were going to use Khadas VIM boards, but Amlogic is way too secretive for us to use: no TRM, and they would not let me have developer access to the good stuff. TI, on the other hand, does provide MANY points of entry into the device. It must be working very well for others or they would not be where they are. If you have a large pool of researchers, developers, programmers, and hardware people, it would be much easier. When you have limited this and that, trying to find an employee who even knows what an ARM processor is becomes the other major hurdle.

From what I have read so far, the PRU is a home run in theory. All I am doing is getting tired of connecting the missing dots. This is actually way worse than in college. My prof would give us “black box” projects all the time. He would sketch an input waveform and an output waveform. We had to do the rest, and he also said everyone’s homework better be different. The point I am getting at is that there you can always use a scope and signal generator to characterize and experiment with the device. This stuff is purely invisible. Back in the old days we had an ICE (in-circuit emulator) that would plug into the SOCKET of the processor, and then we would work on it. This SoC stuff is for the birds.

I read this thread for the first time today… the topic is very specific, and not one that I am particularly interested in. HOWEVER if you are interested in gomer’s $.02 you might abandon your quest to decipher remoteproc.

If your project is at the prototype / poc stage (just my guess), you might consider revisiting the trusty BBB. It still has a lot going for it despite its age, like coherent documentation.

At the risk of being a Johnny one note, I’m going to suggest that you abandon remotemsg, and look into ring buffer(s) for your communication between ARM and PRU. At minimum, you could put up your poc without dealing with the poor documentation for more current BB products.

If you take a look at the app that I’ve made available, you’ll see that its ARM utilization is barely measurable even though the most complex processing is done there. It takes very few ARM cycles to do the ring buffer processing. While this clock app has no feedback from PRU to ARM, it would be trivial to implement (as a poc).

Ring buffers have lots of advantages over even the vaporware speak of remotemsg. They are always async, low overhead, and don’t rely on interrupts.
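For anyone who has not met one, a single-writer, single-reader ring buffer can be sketched in a few lines of C. This is only an illustration, not code from the clock app: the names `struct ring`, `rb_put`, `rb_get`, and `RB_CAPACITY` are made up, and a real ARM/PRU version would also have to place the structure in shared memory and deal with caching and memory ordering, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size; must be a power of two so the index mask works. */
#define RB_CAPACITY 8u

struct ring {
    volatile uint32_t head;      /* advanced only by the writer */
    volatile uint32_t tail;      /* advanced only by the reader */
    uint32_t slot[RB_CAPACITY];
};

/* Writer side: returns false (without blocking) if the ring is full. */
bool rb_put(struct ring *rb, uint32_t value)
{
    uint32_t head = rb->head;

    if (head - rb->tail == RB_CAPACITY)
        return false;
    rb->slot[head & (RB_CAPACITY - 1u)] = value;
    rb->head = head + 1u;        /* publish only after the slot is filled */
    return true;
}

/* Reader side: returns false (without blocking) if the ring is empty. */
bool rb_get(struct ring *rb, uint32_t *value)
{
    uint32_t tail = rb->tail;

    if (rb->head == tail)
        return false;
    *value = rb->slot[tail & (RB_CAPACITY - 1u)];
    rb->tail = tail + 1u;        /* release the slot back to the writer */
    return true;
}
```

Each index has exactly one writer, so neither side needs a lock or an interrupt: the writer polls for space, the reader polls for data, and both keep running as long as the ring is neither full nor empty.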

This gif is the best explanation of how they work; the text of the Wikipedia article, sadly, is meh.

@FredEckert has recently posted that he successfully implemented this application. This gives me confidence that I haven’t missed any critical code or instructions to get it running. Thanks again Fred.

For your timing needs, I’m going to suggest this approach that I use routinely in other projects:

  • set up a 1 MHz PWM signal into a PRU fast input
  • increment a register for each pulse of the PWM (accurate 1 µs timer)
  • segment your PRU code into instruction blocks of < 200 instructions (at a 200 MHz PRU clock these run in < 1 µs)
  • block (waste) the remainder of the µs waiting for the PWM pulse.

The math of this approach: a 32-bit register holds 2^32 (about 4.29 billion) counts, so at 1 MHz it rolls over after about 4,295 seconds, or roughly 71 minutes 35 seconds. After that you have to code for rollover.
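The rollover handling itself can be small. As a sketch, assuming the tick count lives in a 32-bit unsigned value that increments once per microsecond, elapsed time can be computed with plain unsigned subtraction, which stays correct across a single wrap. The function names here are made up for illustration, and the host-side C is only meant to show the arithmetic, not PRU code.

```c
#include <stdint.h>

#define TICKS_PER_SECOND 1000000u   /* 1 MHz PWM, one tick per microsecond */

/*
 * Microseconds between two readings of the free-running counter.
 * Unsigned subtraction is modulo 2^32, so the result is still right
 * when the counter wrapped once between 'start' and 'now'.
 */
uint32_t elapsed_us(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Whole seconds a 32-bit counter runs at 1 MHz before it wraps. */
uint32_t seconds_until_rollover(void)
{
    return (uint32_t)(((uint64_t)UINT32_MAX + 1u) / TICKS_PER_SECOND);
}
```

This only works for intervals shorter than one full wrap of the counter. Anything longer needs a second register (or a software count of wraps) to extend the range.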

good luck!

Might very well work well on bare metal, but with a Linux kernel there is just way too much going on. The part I like about the PRU is that, according to the docs, it should run totally autonomously from the main A53 cores running Linux. Have I gotten to that point yet? No. Playing with the “hello world” and testing JTAG has been the extent of my reach on the issue. It does seem like it is worth trying to get going, assuming the docs are being truthful.

For some reason the docs appear to be written from developers’ notes. Whoever is writing them does not understand what is going on, or they would be able to fully articulate it to others.

Your gif did not load. It seems like many years ago we did implement ring buffers using the old TTL / CMOS logic chip designs. That stuff was locked in place so everything was predictable. It has been many years, so please forgive me if this is not correct.

it loads for me … strange

main wikipedia page : Circular buffer - Wikipedia

Again, not a fan of this article, but the gif is excellent. The main advantage of a ring buffer is that, if sufficiently large, it allows BOTH processors to run simultaneously. The clock app demonstrates this in code, which is MUCH better than I can describe in English.


HA! The whole reason I poked this thread was to see if it was feasible to port gomer’s PRU clock over to the BBAI-64. I figured it would be a good learning experience. However, as a wise man once said, I am not up to the skill level to be the BBAI-64 PRU pioneer… In fact, I hope I never have to use the PRUs. Hope to stay with the R5F processor.

Since I managed to derail yet another thread, we might as well keep the derailment going… As part of my R5F research, I have been trying to find the time to study RemoteProc and RPMsg. Aren’t vrings ring buffers? I have gotten TI’s RPMsg ipc_echo_baremetal_test program to work from Linux to bare-metal R5F.

Gomer, why are you so against remoteproc?

If I keep all my realtime code in the R5F, it shouldn’t matter. However, I need to dump over more than 512 bytes of data in some sort of a double buffering mechanism. My very preliminary research indicates that DMA-BUF may be what I need… Any comments?

Gomer thinks not. Ring buffers do not use interrupts to ‘kick’ their fellow processors to alert them that they have a msg, the way vrings do. That kicking seems to gomer to be synchronous. Ring buffers are asynchronous, and if sized adequately, neither writer nor reader needs to concern itself with the status of the contra processor. Of course, gomer could be wrong about anything.

Gomer isn’t against remoteproc, although there was a period of annoyance porting from the earlier UIO code. To make the sample clock app turnkey, gomer uses remoteproc to load and start the PRU(s). Remoteproc does this fine.

Gomer doesn’t understand RPMsg and sometimes has thought RPMsg and written remoteproc or remotemsg. Sorry for the confusion gomer is.

Gomer likes the PRU asm coding and has not found any documentation to guide using RPMsg with asm.

R5F baremetal is a new term which gomer does not grok. BUT, since the raw url for this link includes the string ‘ccs’, gomer will ignore.

Realtime code and R5F and bare metal… gomer seems to have strayed into a realm that gomer has NO experience in. Better to remain silent and be thought a fool, than to open your mouth and remove all doubt.

CCS 12 has a demo and I have had that up and running. Have you looked into that one?

You’ve gotten a CCS based, TI provided, demo for the BBAI-64 running!? For the BBAI-64, I’ve only been able to use the makefile demos. Is there a particular device you selected in the wizard or did you have to manually load something? What OS is your host? I need to see this demo!

I tried using their Ubuntu 18 build instructions for CCS and Yocto. Both turned into a headache.
Now I am running Yocto on an Ubuntu 22.04 build server that is dedicated to TI builds. Did get it up on 20.04, then wiped it out and went up to 22.04. I am building images that boot BeaglePlay, AM62, and BBB. Also have CCS up on 20.04 and 22.04 desktops. Not sure if I kept notes on the CCS and Yocto setup. It will run, but I did have to hunt for some dependencies. I will check my cherrytree files and see if I have any notes. Started to do a write-up for 20.04 and then changed to 22.04, so that is only in a primitive note state at the moment.

Pretty sure if you start out with CCS on 22.04 there will only be trivial issues. Keep in mind they have that loaded with tons of different products, so some of that stuff might not build. I have only tried it on the AM62. I also have the WiFi EVM but have not gotten to that one yet.

I am doing it on the AM62 EVM board with an XDS110 JTAG probe. If you get the EVM board you don’t need the probe: the debugger is on the board, so it is just a USB cable.

I should also learn to read; I just noticed you are on the AI64… we are currently on the BeaglePlay / AM62.

I had the opposite experience. I had to revert FROM 22.04 to 18.04 after having trouble with getting CCS installed. When I went back to 18.04, all went well and I was able to get CCS along with the C-compiler for PRUs up and compiling. It took some doing, but I eventually got it working. I even wrote myself some makefiles that would build the code I had written and deploy it onto the BBAI64 for me. It’s not perfect code, but it worked for my purposes at the time. Although this was my own code, not demo code. It was absolutely imperative I get the C-compiler for PRUs working because I wasn’t about to try writing assembly or hinge a potential product on maintaining assembly…even as simple as the PRU assembly is, I ain’t about that assembly-life.

Glad you got it up and running. One thing that was an issue for me was the Firefox snap version. That was the reason some of the stuff on the resource page would not open.

Here is some PRU stuff I just found this morning and posted just in case you might not know about it.

Link to TI-git

Programmable Real-time Unit (PRU) Software Support Package v6.1.0


   The PRU Software Support Package (PSSP) is an add-on package that provides a framework
   and examples for developing software for the Programmable Real-time Unit
   sub-system and Industrial Communication Sub-System (PRU-ICSS) in the supported
   TI processors.  The PRU-ICSS achieves deterministic, real-time processing, direct
   access to I/Os and meets ultra-low-latency requirements.

   This software package contains example PRU firmware code as well as application
   loader code for the host OS. The examples demonstrate the PRU capabilities to
   interact with and control the system and its resources.

   See Release_Notes.txt for details about this specific PSSP version.

   For more details about the PRU, visit

   This package includes the following resources:

	---------	--------
	examples 	Basic PRU examples
	include 	PRU firmware header files
	labs 		Source code for step-by-step labs
	lib 		PRU library files and library source files
	pru_cape 	Demo software for the BeagleBone PRU Cape


   For more information about the PRU, visit:

	Support			-

Hey @foxsquirrel,

Thanks for refreshing my memory on this TI PRU support repo. I looked at this a while back but didn’t pursue it at the time. I am going to bookmark it for future reference. Way back when I was just getting started with the AI-64, I purchased a PRU-CAPE. As I got into looking at it, it looked like TI (or maybe BB) dropped devicetree support for the PRU-CAPE after a certain point. As I learned more about the AI-64 hardware, I found that even though the header layout is the same as the BBB’s, the pins are not 1:1. So, this made me even less interested in looking at the PRU-CAPE.

I recently purchased a SK-AM64B evaluation kit from TI. When I get some time, I am going to use this eval kit to dive into learning CCS and the TI MCU+ SDK. With my understanding at this point, I think this is as close as I am going to get to having a TI-guided learning experience that will apply to the AI-64. If anyone knows better, please give me some advice.

I am also going to use the cherrytree note system that you have mentioned several times. I need to keep better track of this stuff.


EDIT: I just clicked on the pru-software-support-package link and see that there are recent commits for AM62x, AM64x/AM65x!
