Is it possible to write PRU firmware for remoteproc completely in Assembler?

n.dammin · February 19, 2016, 6:34pm

Hi,

I’m using the AM335X Starter Kit from TI with an AM3359 SoC and I use the TI Processor SDK Linux version 02.00.01.07. I managed to get the remoteproc driver working, then I removed the display module and used the flat-flex connector to break out some GPIOs of PRU1 to hook up some LEDs. I wrote a blink-LED firmware in CCS v6 as described in the PRU Hands-on Lab and successfully booted PRU1 to make my LEDs blink.

Now I need to write some very fast code, so I have to write it in assembly language. I would like to use the AM335x PRU-ICSS Reference Guide to write my code and use the PASM assembler.

Is it possible to write pure assembler code, as with PASM, and make it work with the newer remoteproc? Or can I write assembly code in Code Composer Studio with the TI compiler? I haven’t figured out yet how to do that. Is there a tutorial somewhere?

I read that TI does not support the PRU very well, which is why I’m asking here.

Regards
Nico

John_Syn · February 19, 2016, 8:05pm

I recommend that you develop your code in C and then hand-optimize the assembler where required. This helps document your code and makes it more manageable. I have used CCSV6 for developing PRU apps and I think the support is pretty good. Make sure you use the scripts to configure the processor memory map and bring the PRU out of reset, or you will have all kinds of issues when debugging. I have used both the XDS200 and the Blackhawk USB560M JTAG emulators and they both work without issue.

Regards,
John

William_Hermans · February 19, 2016, 10:01pm

Is it possible to write pure assembler code like with the PASM and make the code work for the newer remoteproc? Or can I write assembly code in CodeComposer Studio with the TI compiler? I couldn’t figure out yet how I can do that. Is there a Tutorial somewhere?

So, just like any other language on Linux, I’m sure you could write firmware for remoteproc completely in assembly. But as far as I’m aware, there is no standalone assembler for the PRUs like the original PASM.

However, I think the more important question is: why on earth would you want to write code for remoteproc / rpmsg in assembler? The whole idea of remoteproc / rpmsg is to abstract away many of those low-level details, to make using multiple processors in this way much easier.
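To illustrate the ARM-side view of that abstraction: with TI’s rpmsg_pru driver, an RPMsg channel announced by the PRU firmware shows up as a character device, so plain POSIX file I/O is enough. This is a minimal sketch, assuming a device node such as /dev/rpmsg_pru30 (the name used in TI’s echo examples; the actual name depends on the channel your firmware announces). The helper send_command() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write one command string to an RPMsg character
 * device (for example /dev/rpmsg_pru30). Returns the number of bytes
 * written, or -1 on error. */
ssize_t send_command(const char *dev_path, const char *cmd)
{
    int fd = open(dev_path, O_WRONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    ssize_t n = write(fd, cmd, strlen(cmd));
    if (n < 0)
        perror("write");

    close(fd);
    return n;
}
```

For example, `send_command("/dev/rpmsg_pru30", "period 250")` would hand that string to the PRU firmware, which is then responsible for interpreting it.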

Charles_Steinkuehler · February 19, 2016, 11:16pm

The ARM side should be written in C, unless you have a _really_ good
reason not to.

For the PRU, you can code in C or ASM as desired. If you do write
assembly, you will probably want to use the C calling conventions so
you can call your assembly PRU code from C or perhaps a C shim for
remoteproc. Ultimately, it doesn't really matter what you code in as
long as you generate and process the remoteproc messages and/or
interrupts your application needs.

NOTE: The C calling conventions are in the TI compiler documentation
(spruhv7a), section 6.3 "Register Conventions" and 6.4 "Function
Structure and Calling Conventions", and there's a section on mixing
assembly and C: 6.6 "Interfacing C and C++ with Assembly Language".

http://www.ti.com/general/docs/litabsmultiplefilelist.tsp?literatureNumber=spruhv7a
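A minimal sketch of the mixed C/assembly approach, assuming a hypothetical function toggle_mask() and a hypothetical build flag USE_PRU_ASM. The PRU assembly in the comment reflects our reading of the register conventions in spruhv7a (arguments starting in R14, return value in R14, return address in R3.w2); verify it against sections 6.3 and 6.4 before relying on it. Without the flag, a C reference version is compiled instead, which gives you something to test the hand-written assembly against.

```c
#include <stdint.h>

/* Prototype seen by the C caller. With USE_PRU_ASM defined, it would be
 * satisfied by a clpru assembly file along these lines (unverified sketch):
 *
 *         .global toggle_mask
 * toggle_mask:
 *         XOR   R14, R14, R15   ; value arrives in R14, mask in R15
 *         JMP   R3.w2           ; result left in R14, return to caller
 */
uint32_t toggle_mask(uint32_t value, uint32_t mask);

#ifndef USE_PRU_ASM
/* C reference implementation with the same contract. */
uint32_t toggle_mask(uint32_t value, uint32_t mask)
{
    return value ^ mask;
}
#endif
```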

Greg1 · February 20, 2016, 4:11pm

The support from TI is quite extensive:

http://processors.wiki.ti.com/index.php/PRU-ICSS

Download the C compiler manual. There is a section which describes several ways to incorporate assembly code.
This looks like a very detailed manual, which combined with the examples in the pru support package should be very helpful.

I’m still coming up to speed on all of this, and it’s complicated because you have to think about what is going on with the C compiler, remoteproc, rpmsg, and all the details of what is going on with these sorts of kernel processes and the virtIO bus mechanism. There is too much going on for a Linux newbie; I’ve had to retreat and study some of the fundamentals before getting back to this (I hope!).

You need to be aware that PASM is no longer supported. The path forward is clpru, the C compiler, which works with the included assembler (asmpru?). There are some differences in the way assembly code is written for the newer assembler (there are notes on this in the command-line package download).

I was also able to get the examples going with the PRU cape using remoteproc and a version 4 kernel (Robert Nelson’s testing image). This massively simplified the process compared to what you see in the TI “Hands On Labs” tutorial. Pretty much everything with regard to remoteproc and the clpru compiler is ready to run. You don’t need cross-compilation or the IDE; it can all be done at the command line on the BBB. If you prefer to work at the command line, all the tools are there.

Please correct me if I’ve got this wrong, but I think it’s fair to say that TI has provided a wealth of information for the PRU, however, they expect further support to be coming from the community.

Here’s another really great contribution by TI:

http://processors.wiki.ti.com/index.php/PRU-ICSS_Remoteproc_and_RPMsg

Regards,
Greg

Dimitar_Dimitrov · February 20, 2016, 7:01pm

Nico,

There are two prerequisites for your PRU firmware to be loaded by the remoteproc driver:

1. PRU firmware image must be in ELF format. Only TI’s clpru and the unofficial GNU PRU toolchain support this.
2. PRU firmware must include a “.resource_table” ELF section containing a resource table.
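For the second prerequisite, here is a minimal sketch of an empty resource table in C, assuming the header layout of struct resource_table in the Linux kernel’s include/linux/remoteproc.h (version, entry count, two reserved words, then one offset per entry) and an ELF target. The struct and symbol names are our own; remoteproc finds the table by its section name. It uses GCC attribute syntax, as accepted by the GNU PRU toolchain; TI’s examples use clpru pragmas instead, as noted in the comment. Firmware that uses RPMsg needs additional vdev and vring entries, which this sketch leaves out.

```c
#include <stdint.h>

/* Header layout mirroring struct resource_table in the Linux kernel. */
struct fw_resource_table {
    uint32_t ver;         /* table format version, must be 1 */
    uint32_t num;         /* number of resource entries that follow */
    uint32_t reserved[2]; /* must be zero */
    uint32_t offset[1];   /* placeholder; unused when num == 0 */
};

/* Place the table in its own ".resource_table" ELF section and keep it
 * even though no code references it. With clpru, the equivalent is:
 *   #pragma DATA_SECTION(resource_table, ".resource_table")
 *   #pragma RETAIN(resource_table)                                  */
__attribute__((section(".resource_table"), used))
struct fw_resource_table resource_table = {
    .ver = 1,
    .num = 0,
    .reserved = { 0, 0 },
    .offset = { 0 },
};
```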

You cannot use PASM with remoteproc because PASM cannot output ELF.
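One way to see the difference is in the first bytes of the firmware file: an ELF image starts with the magic bytes 0x7f 'E' 'L' 'F', while a raw PASM .bin image is just instruction words. Below is a small sketch of such a check. looks_like_pru_elf() is a hypothetical helper, not the kernel’s own loader check, and it assumes a 32-bit little-endian ELF with the e_machine value EM_TI_PRU (144).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EM_TI_PRU_MACHINE 144u /* ELF e_machine value for TI PRU */

/* Returns 1 if buf looks like a 32-bit little-endian ELF file built for
 * the PRU, 0 otherwise. A raw PASM .bin image fails the magic check. */
int looks_like_pru_elf(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 20 || memcmp(buf, magic, sizeof magic) != 0)
        return 0;
    if (buf[4] != 1 /* ELFCLASS32 */ || buf[5] != 1 /* ELFDATA2LSB */)
        return 0;

    /* e_machine is a 16-bit little-endian field at offset 18. */
    uint16_t machine = (uint16_t)(buf[18] | (buf[19] << 8));
    return machine == EM_TI_PRU_MACHINE;
}
```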

Here is a GNU assembler example that can be loaded and executed by remoteproc: https://github.com/dinuxbg/pru-gcc-examples/blob/master/blinking-led/pru/main1.S . You should be able to write something similar for TI’s clpru assembler. But as others have pointed out, it is more sensible to start with C and optimize only the critical parts of your program in assembly.

Regards,
Dimitar

John_Syn · February 20, 2016, 7:23pm

Here’s another really great contribution by TI:

http://processors.wiki.ti.com/index.php/PRU-ICSS_Remoteproc_and_RPMsg

This is an excellent explanation of the workings of Remoteproc/RPMSG. Thanks for sharing.

Regards,
John

William_Hermans · February 20, 2016, 7:45pm

This is an excellent explanation of the workings of Remoteproc/RPMSG. Thanks for sharing.

Yeah, I’ve seen that, or something similar. It is pretty good, except there is still one problem. That explanation implies that it instructs us how to use the PRU hardware with rpmsg, and I suppose on some level it really does. But what it does not explain is how to interact with the rest of the on-chip hardware through this mechanism.

Sending text messages between ARM, and PRU processors is a good intro demonstration of the software, but it is not really the least bit useful in the real world.

Anyway, people like me who are very experienced with writing code will be put off using rpmsg etc. because of this. Is it really so much to ask for example code demonstrating how to interact with the on-die hardware, without having to download 1GB of pretty much useless library . . . ?

John_Syn · February 20, 2016, 8:01pm

Hi William,

So here is how I like to use this. The PRU is performing some function and I send commands to modify that function. An example would be controlling the position of a stepper motor: the ARM app sends a new position and the PRU takes care of stepping the motor to that new location. I think of the PRU as being good at doing low-latency stuff, and I use RPMSG/Remoteproc to send instructions and then get feedback on measurements from the PRU. The interface isn’t fast enough to do anything more than this. Simply flashing an LED by sending a command isn’t the best use of this technology; changing the flashing rate or the duty cycle is more appropriate. I hope I’m answering your question.
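A minimal sketch of the PRU side of that split, with hypothetical names: the ARM application only updates target (for example from an incoming RPMsg message), and the PRU’s main loop calls stepper_tick() once per step period and drives the STEP and DIR pins from the return value. Timing and pin handling are left out.

```c
#include <stdint.h>

/* Hypothetical stepper state shared between the message handler and
 * the PRU main loop. */
struct stepper {
    int32_t position; /* current position, in steps */
    int32_t target;   /* latest position requested by the ARM side */
};

/* Advance at most one step toward the target.
 * Returns +1 or -1 for the direction stepped, or 0 if already there. */
int stepper_tick(struct stepper *s)
{
    if (s->position < s->target) {
        s->position++;
        return 1;
    }
    if (s->position > s->target) {
        s->position--;
        return -1;
    }
    return 0;
}
```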

Regards,
John

William_Hermans · February 20, 2016, 8:29pm

I hope I’m answering your question.

No, not even close. I need an answer that gives an example, in code, of how to use on-die peripherals through the PRUs when using remoteproc / rpmsg. Past that, I do not want to download a couple of gigs of data for software I do not need, or even want.

What would be really good, would be a github example. Blinking an on board LED or toggling a GPIO would be the simplest, but anything demonstrating using the onboard peripherals. ADC, I2C, CAN, or even just GPIO - whichever. The ARM processor side code would not exactly be so important, except it would be a good example of how the two sides of software interact with one another.

John_Syn · February 20, 2016, 9:08pm

The PRU examples that I have pointed out several times do exactly what you are asking for. Also, several other posters have shown how to build these examples without CCSV6. After you build the PRU code, you have to place it in /lib/firmware so that Remoteproc can load it into the PRU, configure resources and start the PRU code.
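As a sketch of what such PRU firmware looks like in C: writing to R30 drives the PRU’s dedicated output pins directly. This assumes clpru’s `volatile register uint32_t __R30;` declaration and its `__delay_cycles()` intrinsic, as used in TI’s blink examples, plus a hypothetical BUILD_FOR_PRU macro so the same file also builds on a host for testing. Which R30 bit reaches which physical pin depends on your pin-mux configuration, and the firmware still needs a resource table section to be loaded by remoteproc.

```c
#include <stdint.h>

#define PRU_CLOCK_HZ 200000000u /* AM335x PRU core clock: 5 ns per cycle */

#ifdef BUILD_FOR_PRU
/* clpru maps this declaration onto the PRU's R30 output register. */
volatile register uint32_t __R30;
#define PRU_OUT __R30
#define DELAY_CYCLES(n) __delay_cycles(n)
#else
/* Host build: an ordinary variable stands in for R30; delays are no-ops. */
static volatile uint32_t host_r30;
#define PRU_OUT host_r30
#define DELAY_CYCLES(n) ((void)0)
#endif

/* Flip the output bits selected by mask. */
static inline void pru_toggle(uint32_t mask)
{
    PRU_OUT ^= mask;
}

/* Blink the selected outputs `times` times at roughly 1 Hz. */
void blink(uint32_t mask, unsigned times)
{
    for (unsigned i = 0; i < 2 * times; i++) {
        pru_toggle(mask);
        DELAY_CYCLES(PRU_CLOCK_HZ / 2); /* about 0.5 s on the PRU */
    }
}
```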

Regards,
John

William_Hermans · February 20, 2016, 11:59pm

We’ll just have to agree to disagree. I’m a very experienced programmer who has not had any problems setting up, writing, or using software for multiple other aspects of the hardware. Somehow, it must be my fault.

Przemek_Klosowski · February 21, 2016, 4:45am

William,

I must be missing something, because I see remoteproc as a
communication and management mechanism for code on CPUs other than the
main processor. The actual code that you are running on those
subsidiary processors does not depend on the mechanism you use for
talking to it (other than the parts that do the talking, of course).

In particular, running ADC, I2C or GPIO should be the same, regardless
whether you use remoteproc or not---what changes is how you tell this
code what to do.

Does it make sense to you?
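To make that separation concrete, here is a minimal sketch of a transport-independent command handler with a hypothetical binary message format: a 32-bit command code followed by a 32-bit value, in native byte order (both the ARM and the PRU are little-endian). The same apply_command() could be fed from RPMsg, from shared memory used with prussdrv, or from a unit test on a PC; only the code that receives the bytes changes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical command codes and settings for a blinking output. */
enum { CMD_SET_PERIOD = 1, CMD_SET_DUTY = 2 };

struct blink_settings {
    uint32_t period_ms;
    uint32_t duty_pct;
};

/* Apply one 8-byte message: 32-bit command code, then 32-bit value.
 * Returns 0 on success, or -1 for a malformed or out-of-range message,
 * in which case the settings are left unchanged. */
int apply_command(struct blink_settings *s, const void *msg, size_t len)
{
    uint32_t cmd, value;

    if (len != 2 * sizeof(uint32_t))
        return -1;
    memcpy(&cmd, msg, sizeof cmd);
    memcpy(&value, (const uint8_t *)msg + sizeof cmd, sizeof value);

    switch (cmd) {
    case CMD_SET_PERIOD:
        if (value == 0)
            return -1;
        s->period_ms = value;
        return 0;
    case CMD_SET_DUTY:
        if (value > 100)
            return -1;
        s->duty_pct = value;
        return 0;
    default:
        return -1;
    }
}
```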

William_Hermans · February 21, 2016, 5:39am

What it is supposed to do has always made sense to me. How exactly it is done is another story.

With uio_prussdrv, you have a driver module which sets various things up, loads the PRU binary, and then enables / runs the PRU(s). On the PRU side, the code runs, communicates with various peripherals as needed (usually one, if any), and then performs its function as specified in assembly. Sometimes it dumps data into DDR3 (as in the example), and sometimes not.

Anyway, the above is a fairly rough description, but how each part communicates with the other is abundantly clear in the code. Some have even attempted to describe what happens, inadequately if you ask me. No matter, though; the code is pretty clear.

With remoteproc, the Documentation/*txt documentation is very minimal and does not describe how it works very well. The code is fairly clear as to how the ARM and PRU sides communicate with one another (rpmsg). However, what is not clear is how the PRU code actually manipulates the physics on system hardware. Additionally, to confuse matters even more, the assembler has changed to a compiler (C, clpru), and there are something like “map” files for hardware configuration that do not seem to be very well documented. These are just some examples of things where it is not clear how, or why, they are even needed.

So here I am, attempting to learn a few things new to me. The documentation is very poor, and TI refuses to answer any questions related to PRUs on their e2e forums (“go to beagleboard.org google groups . . .”). I spent several days learning about everything PRU related, and immediately picked up the concept of uio_prussdrv. I’m still having a hard time with the TI C compiler on the PRU side of things, largely due to these mysterious configuration files. But no matter; the TI assembler is fairly straightforward, and the PRU instruction set is a minimal Cortex M3 set, and easy.

Anyway, for context on my competence level: not long ago I wrote a set of processes / applications to read from the CANBUS in real time, decode the CANBUS data, and shuffle this decoded data out over a websocket. This required me to learn several aspects of Linux systems programming from scratch, including POSIX shared memory files, socketCAN, and process spawning / management, since this was my first major Linux application. All of this, including reverse engineering parts of the high-level CANBUS protocol, took me around a month. The point here is that I have no problem picking up and understanding technologies, APIs, libraries, and such that I’ve previously had no experience with, so long as there is at least a little decent documentation on the subject, or I can talk to someone who understands the things that may be confusing to me.

Additionally, I’m not saying exactly that remoteproc can’t be made to work, because obviously it can. What I am saying is that since the concept is so poorly documented, is still in an experimental phase, and (as I now learn) is slower than the traditional prussdrv drivers / methods, it’s just not worth my time to even attempt to get it working.

I have spent some time on it anyway (roughly a week), because I’m the type that does not mind experimenting with new technology in software, but only new technology that is not too argumentative. My time is far too valuable to me to screw around with technology that honestly makes very little sense to me.

Also, for what it is worth: remoteproc / rpmsg in my own mind is far more useful in cases where a processor has multiple application / general-purpose cores, so that one core can run Linux while the others run bare metal, simultaneously. It is less useful in the case of the PRUs, since we already have a software layer that is well documented, works very well, and is quite honestly far superior to remoteproc / rpmsg in this case. If nothing else, in speed.

William_Hermans · February 21, 2016, 5:53am

I do expect that TI will improve the documentation on their implementation of remoteproc / rpmsg sometime in the future, though. In the case of the X15, there are not only 4 on-die PRUs, but also 4 IPUs (2 usable for general purpose) and two DSPs (on the dual-core A15 chip). I’ve no idea what TI has compiler / assembler-wise for these DSPs, but the IPUs, from what I understand, are fairly new (in the context of general purpose). So I’d assume this is where remoteproc / rpmsg will make the most sense: the on-die IPUs.

John_Syn · February 21, 2016, 6:20am

William,

What do you mean by “how the PRU code actually manipulates the physics on system hardware”?

This is standard PRU code that toggles PRU-dedicated IO and sets/clears register values of peripherals, in exactly the same way as the code that you run via prussdrv, which does the same but via UIO. I think you are just pulling my leg here. This is trivial stuff. What is complicated? I’m scratching my head and totally confused.
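For the “sets/clears register values of peripherals” part, here is a minimal sketch of driving an AM335x GPIO pin from PRU C code through the GPIO module’s SETDATAOUT and CLEARDATAOUT registers (offsets 0x194 and 0x190, GPIO1 at 0x4804C000, per the AM335x TRM). The helpers take the module base as a parameter so the logic can be exercised against a plain array on a host. On a real PRU this assumes the pin is already muxed and configured as an output on the Linux side, and that the PRU’s OCP master port has been enabled (TI’s examples clear the STANDBY_INIT bit in the PRU-ICSS CFG SYSCFG register for this).

```c
#include <stdint.h>

/* AM335x GPIO module addresses (see the AM335x Technical Reference Manual). */
#define GPIO1_BASE        0x4804C000u
#define GPIO_CLEARDATAOUT 0x190u
#define GPIO_SETDATAOUT   0x194u

/* Writing 1 to a bit of SETDATAOUT drives that pin high and writing 1 to
 * CLEARDATAOUT drives it low; bits written as 0 are unaffected, so no
 * read-modify-write of the output register is needed. */
static inline void gpio_set(volatile uint32_t *base, unsigned bit)
{
    base[GPIO_SETDATAOUT / 4] = 1u << bit;
}

static inline void gpio_clear(volatile uint32_t *base, unsigned bit)
{
    base[GPIO_CLEARDATAOUT / 4] = 1u << bit;
}
```

On the PRU, `gpio_set((volatile uint32_t *)GPIO1_BASE, 21);` would then drive GPIO1_21 high.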

Regards,
John

John_Syn · February 21, 2016, 6:20am

The IPU’s are CortexM4 processors.

Regards,
John

William_Hermans · February 21, 2016, 6:22am

You’re just now figuring that out?

William_Hermans · February 21, 2016, 6:30am

I think, more correctly said, they’re similar to a Cortex M4 that sits on an Lx host processor interconnect. So you cannot just use the arm-none-eabi GCC port to make them work . . .

John_Syn · February 21, 2016, 6:40am

Ah, so I just use CCSV6, which has all the scripts that take the CortexM4s out of reset and configure their memory map, so that I can write code and debug pretty quickly. Now, if you don’t use CCSV6, you have to do all that via the CortexA15s, and that is going to make development very difficult. I’ve been doing this on the OMAP5 for several years, which has many of the same features as the AM5728. I also use CCSV6 for the DSPs, which have the same issues. The TI DSP C compiler is highly optimized for the C66 DSP, which has many cores that operate in parallel. Also, the instrumentation provided by CCSV6 makes it possible to do very accurate measurements while running live code. This is especially important for multithreaded applications. BTW, I believe CCSV6 doesn’t need a license for code that is less than 16K.

Regards,
John