Bela Support for BBAI/Later TI Chips

This topic is for discussing everything related to making the Bela cape and software compatible with the BBAI and later TI chips.

Proposal link: BeagleBoard/GSoC/2021 Proposal/bela on bbai - eLinux.org
Mentor: @giuliomoro

As mentioned in my proposal,

I will need to change the initialization code in PRU.cpp, which currently relies on libprussdrv, and move it to rproc.

At that point I was unsure whether rproc provides a way to access the PRU’s RAM the way prussdrv_map_prumem() used to.
After some digging online, I came across remoteproc.txt, which describes the user API. After going through it, I looked up @Pratim_Ugale’s informative blog on RPMsg. In the references of the blog, I then found rpmsg.txt, which describes how to write rpmsg drivers. It says,

Rpmsg is a virtio-based messaging bus that allows kernel drivers to communicate
with remote processors available on the system. In turn, drivers could then
expose appropriate user space interfaces, if needed.

and it also describes its user API.

Upon further lookup, I stumbled across am335x-pru-and-c-c-compiler-memory-access which then led me to rpmsg-quick-start-guide-wiki where I found this zip file. It says,

RPMsg is a method of passing messages and data between the PRU cores and the ARM core running Linux in Texas Instruments’ Sitara devices (currently supported in AM335x, AM437x, AM572x, and K2G). RPMsg is enabled by a combination of remoteproc and the virtio framework.

For more information, I visited Foundational_Components_PRU-ICSS_PRU_ICSSG which covers the software aspects of getting started on the PRU-ICSS and the PRU_ICSSG.

I just wanted to ask the mentors whether I am headed in the right direction.
Once I get their approval, I will start a detailed study of RPMsg and of Foundational_Components_PRU-ICSS_PRU_ICSSG.
I am also not very clear on where exactly I should start. Should I start by going through the Bela core and understanding all the headers and source files, or should I start by studying how to write device tree overlays so that I can write one for the Bela cape? Or should I study what I have just described above in more depth and try to convert the PRU.cpp code to use rproc?

I think the easiest way to do this is to replicate the mmap approach that is used under the hood by libprussdrv. This is for a few reasons:

  • the code already exists (in the libprussdrv source). We could even keep using libprussdrv for memory sharing purposes, while loading the firmware via rproc.
  • the ARM/PRU interaction is based on shared memory and either PRU->ARM interrupts or ARM polling a flag in PRU RAM. Switching to a message-based protocol like rpmsg would require rethinking the architecture, and I don’t think there is scope for it in this project.
  • rpmsg from the PRU side will probably need to be programmed in C. The current PRU firmware codebase is entirely in assembly.
  • I would assume that using rpmsg from user space will go through some paths that are not real-time safe. Using it from a Xenomai real-time context would require at least checking the driver for such paths and probably rewriting it as an RTDM (Xenomai) driver. This could be a massive time sink that I don’t think belongs in this project.
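As a rough illustration of the shared-memory route described in the list above, here is a minimal sketch of mapping a physical address range into user space. It assumes /dev/mem as the backing device (libprussdrv itself maps memory through the uio_pruss device), and `PhysMapping`, `pageFloor`, `mapPhysical`, `unmapPhysical` and `kAssumedPruBase` are hypothetical names made up for this example. The base address is the AM335x PRU-ICSS one and should be checked against the TRM for the chip in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Assumed PRU-ICSS base address on the AM335x; verify against the TRM.
constexpr std::uint32_t kAssumedPruBase = 0x4A300000;

// Hypothetical holder for one mapping.
struct PhysMapping {
    void* mapStart = nullptr;  // page-aligned pointer returned by mmap()
    std::size_t mapLength = 0; // length to hand back to munmap()
    void* base = nullptr;      // pointer that corresponds to the requested address
};

// mmap() offsets must be page-aligned: round an address down to its page.
inline std::uint32_t pageFloor(std::uint32_t addr, std::uint32_t pageSize)
{
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0); // power of two
    return addr & ~(pageSize - 1);
}

// Map `size` bytes of physical memory starting at `physAddr`.
// Returns a mapping whose `base` is nullptr if /dev/mem cannot be opened
// (this usually needs root) or if mmap() fails.
inline PhysMapping mapPhysical(std::uint32_t physAddr, std::size_t size)
{
    PhysMapping m;
    const int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return m;
    const std::uint32_t pageSize = static_cast<std::uint32_t>(sysconf(_SC_PAGESIZE));
    const std::uint32_t aligned = pageFloor(physAddr, pageSize);
    const std::size_t length = size + (physAddr - aligned);
    void* p = mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                   static_cast<off_t>(aligned));
    close(fd); // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED)
        return m;
    m.mapStart = p;
    m.mapLength = length;
    m.base = static_cast<char*>(p) + (physAddr - aligned);
    return m;
}

// Release a mapping obtained from mapPhysical().
inline void unmapPhysical(PhysMapping& m)
{
    if (m.mapStart != nullptr)
        munmap(m.mapStart, m.mapLength);
    m = PhysMapping();
}
```

A caller would check `mapping.base != nullptr` before reading or writing through it, and access the memory through `volatile` pointers, since the PRU changes it behind the compiler’s back.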

The very first step could be to port the BB-AUDI-02 overlay. This is to verify that all the clocks and peripherals are set up appropriately and that the audio runs at the expected pitch. Besides being an achievement in itself, this step ensures we have a reference setup that can be used in troubleshooting later down the line. If helpful, you can use as a reference the commits I made on top of this. Once this is done, you should be able to run the programs in here.
Next step would be to use what you learned in creating the above overlay to port the Bela device tree overlay, which is largely based on the BB-AUDI-02 above.

At this point, you should check whether a Xenomai kernel that supports the device tree you created is available, or look into procuring one (Xenomai stable support for ARMv7 is now at 5.4 IIRC. Testing is already happening on more modern kernels). If this hits a roadblock, never mind, we can progress without Xenomai for the time being.

This would be a good time to move to using rproc instead of libprussdrv. This involves editing core/PRU.cpp in the main Bela repo to load the firmware and start the PRU via rproc instead of libprussdrv. As discussed above, for now you could leave libprussdrv in place for memory sharing. At the same time, you will need to modify the Makefile in the main Bela repo to implement the workflow described in your proposal (.p → pasm → clpru) to generate a PRU binary ready to be used via rproc. At this point, you will be building the Bela code with BELA_USE_DEFINE=BELA_USE_POLL, to temporarily avoid the dependency on the rtdm_pruss_irq driver, which may also need to be updated, but this can wait until later.
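To make the rproc step more concrete, here is a minimal sketch of loading a firmware and starting a remote processor through the `firmware` and `state` attributes that the kernel’s remoteproc framework exposes in sysfs. The function names are hypothetical, and the directory (for example /sys/class/remoteproc/remoteproc0) and the firmware file name are assumptions that depend on the board and kernel; this is not how core/PRU.cpp does it today.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Write a single value to a sysfs-style attribute file.
// Returns false if the file cannot be opened or the write fails.
inline bool writeAttribute(const std::string& path, const std::string& value)
{
    std::ofstream f(path);
    if (!f.is_open())
        return false;
    f << value;
    f.flush();
    return f.good();
}

// Load `firmwareName` (a file name that the kernel looks up in its firmware
// search path, typically /lib/firmware) and start the remote processor whose
// sysfs directory is `rprocDir`.
inline bool rprocLoadAndStart(const std::string& rprocDir, const std::string& firmwareName)
{
    assert(!rprocDir.empty());
    // The firmware can only be changed while the processor is stopped.
    // The result is ignored on purpose: "stop" is rejected if it is already offline.
    writeAttribute(rprocDir + "/state", "stop");
    if (!writeAttribute(rprocDir + "/firmware", firmwareName))
        return false;
    return writeAttribute(rprocDir + "/state", "start");
}
```

A call could look like `rprocLoadAndStart("/sys/class/remoteproc/remoteproc0", "bela-pru-fw")`, where both the index and the firmware name are placeholders. Taking the directory as a parameter also makes it possible to try the function against an ordinary directory on a laptop.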
Once that firmware loads correctly and starts, it will probably hit a wall trying to access the peripherals (as the hardcoded memory addresses for the peripherals are specific to the AM335x). So at that point you will have to modify the McASP, McSPI and GPIO address constants in the PRU code. If you have trouble once the addresses have been updated, you may have to do a side-by-side comparison of the register maps of those peripherals between the AM335x and AM572x TRMs to see if anything changed. Running the code with SPI and GPIO disabled could also help to troubleshoot things one at a time. The GPIO pins in include/digital_gpio_mapping.h and the mappings in pru/pru_rtaudio.p will also have to be amended to match the pins actually in use.

If everything goes fine till now, it’s time to look at rtdm_pruss_irq. Updating the INTC_PHYS_BASE address there may be enough.

Thank you for answering all the questions!
I will go through everything you just mentioned and get back.
Meanwhile, just for record, @giuliomoro and I had an E-mail discussion on a few other questions, and I am pasting his reply email below:


  1. Please can you provide some docs about what this repo does and where it has been implemented and how?
    GitHub - giuliomoro/beaglebone-ai-bela

IIRC, that code is based on the repo that came with the BBAI I got (it was a beta version). The two commits I made on it are basically backporting the BB-AUDI-02 overlay from bb.org-overlays into the device tree itself, setting the correct pins and setting up the McASP and I2C peripherals. The BB-AUDI-02 overlay can be used on the BeagleBone Black to use the audio codec of the Bela cape via ALSA. The work on this repo was therefore a proof of concept that the pins are working, without needing to have the PRU and Xenomai set up. Something was still not working, if I remember correctly: the MCLK clock generated by the McASP was 20MHz instead of 24MHz, but that shouldn’t be too complicated to fix. In this case I was editing the device tree directly (there was no support for overlays on BBAI back then). Note that to use the Bela codec via the Bela API, using PRU and Xenomai, the overlay to use is this one instead: bb.org-overlays/BB-BELA-00A1.dts at master · BelaPlatform/bb.org-overlays · GitHub.

Now, if you start working on this, you may just want to use these two commits as a guideline as to what the pinmux settings are and what the device configuration should be like. I’d expect you’d be creating an overlay (maybe through the cape compatibility layer), where the changes I made here

  1. We have planned on adapting the Bela PRU and ARM code and workflow to use the PRU via remoteproc instead of uio_pruss.
    So basically what I know is that we can write C code that then gets converted to asm and then runs on the PRUs. Should I attempt to do that in core/PRU.cpp? What we want is to make PRU.cpp run on the PRU, right? Using the PRU compiler, then the linker, and then installing the firmware onto the PRU?
    Is my understanding correct?

No. PRU.cpp runs on ARM. It loads the firmware to the PRU, starts it and communicates with it via shared memory. The firmware that runs on the PRU is pru/pru_rtaudio.p (or pru/pru_rtaudio_irq.p when pruUsesMcaspIrq in PRU.cpp is true). You wouldn’t need to write any C code for the PRU, at most modify the file(s) above to have different McASP, McSPI and GPIO addresses. The workflow I suggested allows using the existing PRU assembly and having it run via rproc.

  1. I am not sure why we should want to test the code first on vanilla Linux without Xenomai. If we are developing for Xenomai, then shouldn’t we make the most of its real-time capabilities?
    Won’t things be very different in vanilla Linux compared to Xenomai?

A Xenomai kernel (4.14 or 4.19?) has been built for the BBAI, but it has not been tested, to my knowledge, and it may be too old to support cape compatibility. We can probably ask Robert Nelson to build a more modern one, but I thought it may be worth exploring the vanilla kernel option first if any of these things gets in the way. The difference in performance would be such that a vanilla kernel will most likely not be able to deliver 100% hard real-time performance, but with larger blocksizes we can probably work around it for testing purposes.
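To put numbers on the blocksize remark: the time available to process one block is the number of frames in the block divided by the sample rate, so a larger block gives a non-real-time kernel more slack before a dropout. The sketch below only does that arithmetic; `blockDurationMs` is a hypothetical helper, and 44.1 kHz is used purely as an illustrative sample rate.

```cpp
#include <cassert>

// Time available to compute one block of audio, in milliseconds.
inline double blockDurationMs(unsigned int frames, double sampleRateHz)
{
    assert(sampleRateHz > 0.0);
    return 1000.0 * static_cast<double>(frames) / sampleRateHz;
}
```

With these assumed values, a 16-frame block at 44.1 kHz leaves roughly 0.36 ms per block, while a 128-frame block leaves roughly 2.9 ms.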

  1. Should I create a PR to this repo:
    Bela/Bela.h at master · BelaPlatform/Bela · GitHub
    Adding bbai into the enum so that it can be used by
    Bela/board_detect.cpp at 86af0dd741d9668cf0d4f13b4d5202ee7d86f607 · BelaPlatform/Bela · GitHub
    By Bela_detectUserHw() and Bela_checkHwCompatibility()…?
    Or is there some other place that I should create the pr?

I can give you access to a private fork which we will be releasing in a few weeks, where things are a bit more structured and easier to make additions to.

a. how exactly is this belaconfig generated?

Bela_detectHw() performs some heuristics (involving accessing EEPROM, I2C and SPI) to figure out what board and combination of capes we are currently running on. When it finds a valid combination of hardware, it caches it to /run/bela/belaconfig, so that upon further calls it does not poke at the peripherals anymore (until reboot, when /run/ is wiped).
~/.bela/belaconfig is generated manually by the user in case they want to override the result of the automated detection. For instance, when using a Bela cape and a CTAG cape at the same time, the user may want to only enable one or the other at runtime. This can be done by passing --board ACTIVEBOARD as a command-line option, or by setting BOARD=ACTIVEBOARD in ~/.bela/belaconfig.
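The override described above can be sketched as a small KEY=VALUE lookup. This is a hedged illustration only: `readSetting` and `chooseBoard` are hypothetical names, the precedence of the command-line option over the file is an assumption, and the real parsing in the Bela code may differ. An empty result stands for "no override, fall back to the cached or automatic detection".

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Return the value of the first `KEY=value` line for `key`,
// or an empty string if the stream has no such line.
inline std::string readSetting(std::istream& in, const std::string& key)
{
    assert(!key.empty());
    const std::string prefix = key + "=";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0)
            return line.substr(prefix.size());
    }
    return "";
}

// Pick the board override. Assumed precedence: the --board command-line value
// first, then BOARD= from the text of the user's belaconfig file.
inline std::string chooseBoard(const std::string& cliBoard, const std::string& userConfigText)
{
    if (!cliBoard.empty())
        return cliBoard;
    std::istringstream in(userConfigText);
    return readSetting(in, "BOARD");
}
```

For example, `chooseBoard("", "BOARD=CtagFace\n")` yields `CtagFace`, while passing a non-empty first argument ignores the file contents.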

b. On a BBAI, what name will it report (like it does for CtagBeast/Bela/Bela_mini)? Or does it not detect the BBAI at all, being made only for the cape, so that it will show Bela only, since we use the Bela cape with the BBAI?

To keep going with the current logic in there, we would need to have an entry for each combination of BBB/BBAI and Bela/CtagBeast/CtagBeastBela/CtagFace/CtagFaceBela. To preserve backwards compatibility, this effectively means adding these enums:

BelaAi
CtagBeastAi
CtagBeastBelaAi
CtagFaceAi
CtagFaceBelaAi

Yes, this is ugly, and for the scope of your project, only BelaAi should suffice. When we introduced this system, it was to distinguish between Bela and BelaMini and it was perfectly reasonable … now it is getting real weird. Combinatorial explosion! A better way would be, instead of having enums, to use the value as a set of flags. We can discuss further what the best way of doing it is.
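One possible shape for the flags idea, purely as a sketch: one bit per host board and one bit per cape, so that a combination is an OR of bits rather than a new enum entry. All names and bit positions below are hypothetical and are not the existing BelaHw values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: low byte for the host board, next byte for the capes.
enum BelaHwFlag : std::uint32_t {
    kHostBbb       = 1u << 0,
    kHostBbai      = 1u << 1,
    kCapeBela      = 1u << 8,
    kCapeCtagFace  = 1u << 9,
    kCapeCtagBeast = 1u << 10,
};

constexpr std::uint32_t kHostMask = 0x000000FFu;
constexpr std::uint32_t kCapeMask = 0x0000FF00u;
static_assert((kHostMask & kCapeMask) == 0, "host and cape bits must not overlap");

// Combine one host board with any set of capes.
inline std::uint32_t makeHw(std::uint32_t host, std::uint32_t capes)
{
    assert((host & ~kHostMask) == 0 && (capes & ~kCapeMask) == 0);
    return host | capes;
}

// True if every bit of `cape` is present in `hw`.
constexpr bool hasCape(std::uint32_t hw, std::uint32_t cape)
{
    return (hw & cape) == cape;
}

// True if the host part of `hw` is exactly the BBAI.
constexpr bool isOnBbai(std::uint32_t hw)
{
    return (hw & kHostMask) == kHostBbai;
}
```

With this layout, "Bela cape plus CTAG Face on a BBAI" would be `makeHw(kHostBbai, kCapeBela | kCapeCtagFace)`, and adding a new host board costs one bit instead of one enum entry per cape combination.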

Q1. As discussed in the proposal

I will need to change these MCASP addresses in pru/pru_rtaudio.p

So I have gone through the TRMs and came up with this:

Address      McASP# in BBAI
0x48460000   McASP1
0x48464000   McASP2

However, if I replace the older addresses with the ones for the BBAI, will that not break BBB compatibility? How can we tackle this issue?

Q2. from page 6033 AM57x TRM:

The device have integrated eight McASP modules with:
• McASP1 and McASP2 supporting up to 16 channels with independent TX/RX clock/sync domain
• McASP3 through McASP8 modules supporting up to 4 channels with independent TX/RX clock/sync domain

from pg 4655 AM335x TRM:

The device contains two instantiations of the McASP subsystem: McASP0 and McASP1

Q2. A. Just to be sure that I have read it right: McASP3 to McASP8 are exclusive to the BBAI and were absent in the earlier BBB, is that right?
Q2. B. Also, in the AM57x they have changed McASP0 and McASP1 to McASP1 and McASP2, is that right? So will we be making that change as
MCASP0_BASE 0x48038000 to MCASP1_BASE 0x48460000 (and likewise for MCASP1 changing to MCASP2),
or will we keep using the names MCASP0_BASE and MCASP1_BASE?

Q3. As discussed in our suggested workflow, we are going to process the bin through the disassembler and make it ready to be included inside an __asm__ directive.
However, the repo (GitHub - giuliomoro/prudebug at disassembler) mentions that

THIS PROGRAM HAS VERY LIMITED TESTING - USE AT YOUR OWN RISK.

So I was wondering whether there is any other approach that is more stable?


Q4.

the BBAI can be added to the BelaHw enum in include/Bela.h and used in Bela_detectUserHw() and Bela_checkHwCompatibility() in core/board_detect.cpp.

I have tried to complete this objective and have pushed it to our repo, can you please help me debug if it works or not?
As discussed on e-mail

A better way would be, instead of having enums, to use the value as a set of flags. We can discuss further what the best way of doing it is.

If the above changes work, and we are able to successfully detect BelaAi, then we can move on to discuss the flags approach…

you can use #ifdef to make them conditional at compile time. The Makefile could detect whether we are running on an AM335x or an AM572x and define a flag accordingly. This same flag can be used for conditional compilation of the libprussdrv vs rproc stuff in core/PRU.cpp. This could be the initial approach because it’s faster to get it done. Later in the project, we could move to a runtime flag if appropriate, by passing a flag to the PRU at runtime for whether we are on AM335x or AM572x and allowing it to select the relevant addresses at runtime. Each approach has its own merit, we can figure out which is best down the line. In the meantime, a conditional define in the code triggered by either automatic detection in the Makefile or a manual option when calling make is probably the best starting point.
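A minimal sketch of what such a compile-time switch could look like on the ARM side, assuming a hypothetical flag named `IS_AM572x` that the Makefile would pass as `-DIS_AM572x`. The two base addresses are the ones quoted earlier in this thread (McASP0 on the AM335x, McASP1 on the AM572x); the constant and function names are made up for the example, and the PRU assembly would need an equivalent conditional of its own.

```cpp
#include <cassert>
#include <cstdint>

// IS_AM572x is a hypothetical macro, defined by the build when targeting the BBAI.
#ifdef IS_AM572x
constexpr std::uint32_t kMcaspBase = 0x48460000; // McASP1 on the AM572x
constexpr bool kUseRproc = true;                 // load the firmware via rproc
#else
constexpr std::uint32_t kMcaspBase = 0x48038000; // McASP0 on the AM335x
constexpr bool kUseRproc = false;                // keep using libprussdrv
#endif

static_assert((kMcaspBase & 0xFFFu) == 0, "peripheral base is expected to be 4 KiB aligned");

// Absolute address of a 32-bit McASP register, given its offset from the base.
inline std::uint32_t mcaspRegister(std::uint32_t offset)
{
    assert(offset % 4 == 0); // registers are word-aligned
    return kMcaspBase + offset;
}
```

Built without the flag, the AM335x values are selected; building with `-DIS_AM572x` switches both the address and the loader choice in one place, which is also where a later runtime flag could take over.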

that seems about right.

I think we are using only MCASP1 on the BBAI, so you don’t really need MCASP2. Also, the define for MCASP1 on the AM335x is not needed, because we never use that. The MCSPI and GPIO addresses will also need to change between the two boards. We can think about how to best rationalise the defines.

if you find another stand-alone PRU disassembler, feel free to plug it in. This one has very limited testing in that I only tested it to ensure it produces assembly that is recognised as valid from clpru when disassembling the Bela PRU code. Personally, I have no reason to think it doesn’t work properly, so I’d wait until it proves unfit for the task before looking elsewhere.

I commented on it just now

great

Updates: This Project has been selected for GSoC 2021.
Will now use this thread for all further discussion and will maintain communication logs here as well.
Mentors: @giuliomoro, @rma (Robert Manzke), and @nerdboy (Stephen Arnold)

Today’s chat logs on Slack:

Dhruva gole Today at 10:30 PM

@giuliomoro I was going through the current Bela repo Makefile and couldn’t really understand much, but if I am not mistaken it will only work on Xenomai kernel builds and not the vanilla ones?
If that is so, then will I have to write a new Makefile from scratch for running on vanilla?

giuliomoro 1 hour ago

it’s largely true that it won’t build on a non-Xenomai machine. There are a couple of workarounds for that without rewriting everything from scratch and I guess that would be good exercise to get familiar with the build process and Makefile. Do you have a Linux machine available?

giuliomoro 1 hour ago

or will you be waiting for the BBAI to reach you? By the way, send me an email with your shipping details so I can get you a Bela cape. The BBAI will come from BB people, as I understand you already gave jkridner your address.

Dhruva gole 1 hour ago

yes I am currently running Linux Mint 20.1 x86_64 .

Dhruva gole 1 hour ago

How can I compile on my linux laptop?

giuliomoro 1 hour ago

is libmercury available on your system?

giuliomoro 1 hour ago

it is Xenomai’s soft-real-time user-space library.

giuliomoro 1 hour ago

and that may well just about get you to build the code on your laptop. Possibly it may even allow you to run it on BBAI without a Xenomai kernel.

Dhruva gole 1 hour ago

okay I will check that out

giuliomoro 1 hour ago

if you can’t find libmercury, you can build it from source Installing_Xenomai_3 · Wiki · xenomai / xenomai · GitLab

Dhruva gole 1 hour ago

okay, I have also stumbled across this: Installation steps for xenomai 3 on Ubuntu 16.04 - Stack Overflow
Do you think I should give it a try as well?

giuliomoro 1 hour ago

that involves installing a Xenomai-patched kernel , which I don’t recommend you do because you won’t be able to run the code on your computer anyhow (because it lacks a PRU!)

Dhruva gole 1 hour ago

hahaha yes true

giuliomoro 1 hour ago

so my understanding is that you’ll be building locally only until you get a beaglebone

Dhruva: YES

Dhruva gole 1 hour ago

so assuming the makefile will work, the additions to be made are according to our discussed workflow right? For starters I will try to make a flag for detecting cur_hardware

Dhruva gole 1 hour ago

cur_hw = 1   // BBB
cur_hw = 2   // BBAI
cur_hw = -1  // Not supported

Dhruva gole 1 hour ago

Until I get a BeagleBone, yes I will be building on my personal computer

giuliomoro 1 hour ago

where would that flag be? in the Makefile ?

giuliomoro 1 hour ago

[PS: I sent you a private message]

Dhruva gole 44 minutes ago

yes, in the makefile I was thinking?

Dhruva gole 42 minutes ago

as you had mentioned on the forum:
you can use `#ifdef` to make them conditional at compile time. The Makefile could detect whether we are running on an AM335x or an AM572x and define a flag accordingly

giuliomoro 40 minutes ago

Sure, that’s fine. Alternatively, we could make it a runtime flag, given how runtime detection will take place anyhow, but probably best to start with build-time detection, so it will also be easier to modify the PRU files.

giuliomoro 38 minutes ago

and anyhow, I understand that BB releases different images for AM5729 vs AM3358, so … build-time detection may just be the way forward.

Dhruva gole 36 minutes ago

Right