Minimal Cortex-R5 example on BBAI-64

Hi all,

After scouring TI’s documentation and failing to find a minimal working example for the Cortex-R5 cores that doesn’t depend on the Processor SDK or TI’s compilers, I have decided to whip one up myself.

(Extremely) Minimal Cortex-R5 remoteproc example for Beaglebone AI-64

This is a bare minimum program showing how to initialize the resource table and output into the remoteproc trace.
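
In case a sketch helps before opening the repo: the firmware carries a resource table in a dedicated .resource_table section, which the kernel’s remoteproc ELF loader locates, and a trace entry in that table points at a log buffer that Linux then exposes in debugfs as trace0. Below is a minimal sketch of that layout, following the Linux remoteproc fw_rsc_* definitions; the buffer size, names, and alignment are illustrative assumptions, not copied from the repository.

#include <stddef.h>
#include <stdint.h>

#define RSC_TRACE      2        /* resource type for a log/trace buffer */
#define TRACE_BUF_LEN  0x1000   /* illustrative size */

/* Buffer the firmware writes text into; Linux exposes it as
 * /sys/kernel/debug/remoteproc/remoteprocN/trace0 */
static char trace_buf[TRACE_BUF_LEN] __attribute__((aligned(4096)));

struct fw_rsc_trace {
    uint32_t type;              /* RSC_TRACE */
    uint32_t da;                /* device address of the trace buffer */
    uint32_t len;               /* buffer length in bytes */
    uint32_t reserved;
    uint8_t  name[32];
};

struct my_resource_table {
    uint32_t ver;               /* table format version, must be 1 */
    uint32_t num;               /* number of resource entries */
    uint32_t reserved[2];
    uint32_t offset[1];         /* byte offset of each entry from table start */
    struct fw_rsc_trace trace;
};

/* The remoteproc ELF loader finds the table by section name. */
__attribute__((section(".resource_table"), used))
struct my_resource_table resource_table = {
    .ver    = 1,
    .num    = 1,
    .offset = { offsetof(struct my_resource_table, trace) },
    .trace  = {
        .type = RSC_TRACE,
        .da   = (uint32_t)trace_buf,  /* assumes a 1:1 device/physical mapping */
        .len  = TRACE_BUF_LEN,
        .name = "trace:r5f",
    },
};

Writing NUL-terminated text into trace_buf is all the “output” there is; the kernel reads whatever is in the buffer when you cat the trace file.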

Any feedback is appreciated.

Regards,
Tony

Nice one Tony, I will have a look when I get some time…

Cheers

Andy

What a great example. Really appreciate you sharing!

An update: I was able to properly configure the MPU and cache to run code from external memory. As a proof of concept, I ported the Dhrystone benchmark to the Cortex-R5, running out of DDR memory. Performance is nearly identical to running from the TCMs, since the benchmark fits entirely within the R5’s L1 cache.

Code is here in a new branch: Cortex-R5 Dhrystone on DDR memory with MPU and cache for Beaglebone AI-64

Note that you will see many warnings from GCC about the dated Dhrystone code, which for this demonstration can be safely ignored.
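
For anyone curious about what “configure the MPU and cache” amounts to: program an MPU region that marks the DDR carveout as normal, cacheable memory, then set the MPU and cache enable bits in SCTLR. Here is a hedged sketch of that sequence using the CP15 encodings from the Cortex-R5 TRM; the region number, base address, size, and attribute values are made-up examples and would need to match the real carveout, and a complete init also needs regions for device memory and the TCMs plus cache invalidation before enabling.

#include <stdint.h>

/* Program one MPU region: region number (RGNR), base (DRBAR),
 * access/attributes (DRACR), then size + enable (DRSR) last. */
static inline void mpu_set_region(uint32_t index, uint32_t base,
                                  uint32_t access, uint32_t size_enable)
{
    __asm__ volatile("mcr p15, 0, %0, c6, c2, 0" :: "r"(index));       /* RGNR  */
    __asm__ volatile("mcr p15, 0, %0, c6, c1, 0" :: "r"(base));        /* DRBAR */
    __asm__ volatile("mcr p15, 0, %0, c6, c1, 4" :: "r"(access));      /* DRACR */
    __asm__ volatile("mcr p15, 0, %0, c6, c1, 2" :: "r"(size_enable)); /* DRSR  */
}

void mpu_and_cache_init(void)
{
    /* Hypothetical region 0: 2 MB of DDR at 0xA0000000.
     * DRACR: AP=0b011 (full access), TEX=0b001, C=1, B=1
     *        -> normal memory, write-back write-allocate.
     * DRSR:  size field = log2(2 MB) - 1 = 20, enable bit = 1. */
    mpu_set_region(0, 0xA0000000u,
                   (0x3u << 8) | (0x1u << 3) | (1u << 1) | 1u,
                   (20u << 1) | 1u);

    uint32_t sctlr;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));
    sctlr |= (1u << 0)    /* M: enable MPU */
           | (1u << 2)    /* C: enable data cache */
           | (1u << 12);  /* I: enable instruction cache */
    __asm__ volatile("dsb");
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" :: "r"(sctlr));
    __asm__ volatile("isb");
}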

Regards,
Tony

You are doing some great stuff here, very useful. Thanks :slight_smile:

Amazing work @Tony_Kao! Thanks for publishing this.

I don’t yet have a BB AI-64 of my own to play with – how is the latency between writing “start” to the “state” sysfs attribute and getting the “hello” message back? I imagine it’s under a second? (And although not directly related to your work, how long does the stock OS image take to boot to userspace?)

I infer the remoteproc devices exposed by the kernel are for the “MAIN”-domain cores. Are you aware of any existing exploration into the “MCU”-domain cores and implementing software images for them? As I read the TDA4VM reference manual I’m finding a lot of theoretical capabilities of the R5 cores, but successful implementations are rather sparse. It seems like the expected methodology is to load a boot image into the local RAM of the MCU-domain R5s and then communicate with it via internal SPI or I2C. Perhaps I’m just not looking hard enough in the upstream TI samples.

Also, are you or others aware of what the debugging capabilities on the R5 cores look like? I see the AI-64 maps some UART pins to external debug headers, as well as an MCU-specific header which has a selection of GPIOs. But I don’t see a JTAG interface for on-chip debugging of the R5 cores. Those pins don’t seem to appear in the AI-64 schematic. Are you aware of what’s going on there?

Thanks!

@kaelinl there are Tag Connect JTAG pads next to the SD Card socket.

Oh nice, thank you! I was overlooking that port expecting something specific to the MCU cores but evidently I should have read the reference manual more closely.

@Tony_Kao thanks for posting this example! Is there a recommended JTAG and debugger for working with the R5 cores? It looks like @Nishanth_Menon is running openocd and debugging with gdb on his fork.

Any suggestions are appreciated.

BTW, the first example runs fine for me, but the Dhrystone one is not printing anything in the trace. I need to debug!

The Dhrystone example worked for me, I haven’t tried the simpler one. I just compiled with the default settings and loaded the ELF firmware onto the R5 core. I got numbers almost exactly the same as the samples in the original GitHub repo. I might be able to help debug.

Re: debugging via direct memory access on-chip, I’m also interested in this. I came across @nmenon’s four patches here (https://review.openocd.org/c/openocd/+/7088) and have a local openocd tree in which I’ve applied them. I cooked up a config file for the j721e and tried to debug code running on the R5F “main” domain cores, but attaching to the second set (running the custom Dhrystone benchmark) failed, claiming they were offline, and attaching to the first set (running TI’s EdgeAI coprocessor firmware) locked up the whole SoC. I was kind of guessing at the appropriate values from the datasheet to use in the config and might have gotten it wrong. I haven’t looked into it more deeply than that. I am also only inferring that this “dmem” driver for emulated debug ports is supposed to work on the TDA4VM. I’d be happy to collect and post what I have if it’s helpful, perhaps in a new thread.

Docs: Jtag is functional (openocd) (#31) · Issues · BeagleBoard.org / BeagleBone AI-64 · GitLab → Are you able to see this? You should have a config file for this.
I am also trying to put some documentation together here: Welcome to an HowTO with OpenOCD on TI’s K3 SoCs | OpenOCD primer for TI’s K3 SoCs

Aah yes - I need to follow up to get it merged, but this should be way easier than the JTAG route. Though, it also needs the very latest TIFS firmware. My production AI-64 is on order; I’m waiting for it to arrive to run some experiments.

Awesome news! Looking forward to working with the R5 cores.

@kaelinl, if it is not too much trouble, what was your dogtag and kernel for the working Dhrystone? I am going to start another thread to discuss some R5 experimentation I would like to perform.

@FredEckert @kaelinl → Apparently the TIFS firmware update that allows firewall access is yet to be released - it is due in the 8.5 firmware release from TI. So you might not be able to use the openocd dmem path just yet for j721e… This could explain the various failures you might see as a consequence.

Hmmm OK, that’s unfortunate. Thanks for the information though!

Do you have any sense of when it’ll become available? And is this firmware something I can flash in-place myself? (…what is TIFS?)

Separately, if I end up going the JTAG+TagConnect route, is it indeed infeasible to connect the probe without removing the heatsink? Where’d you get that fan shown in the pictures?

I figure if that new firmware is going to take more than a couple weeks I’ll spend the few hundred dollars on the aggressively overpriced debug equipment and do it the hard way.

Sorry, what is DogTag?

I’m running the following (unmodified) kernel:

debian@BeagleBone:~$ uname -a
Linux BeagleBone 5.10.120-ti-arm64-r63 #1bullseye SMP Fri Sep 2 01:18:17 UTC 2022 aarch64 GNU/Linux

This is the bbai64-debian-11.4-xfce-edgeai-arm64-2022-09-02-10gb.img.xz image.

Oh, also, for reference, these are the openocd options I derived from @Nishanth_Menon’s patches and the J721e datasheet. I have no idea where the 0x1d500000 size came from in the original sample for am625 and am pretty sure it’s wrong.

adapter driver dmem

dmem base_address 0x4C40002000
dmem ap_address_offset 0x0ff
dmem max_aps 10

dmem emu_base_address 0x4C60000000 0x1d500000
dmem emu_ap_list 1
if { ![info exists SOC] } {
        set SOC j721e
}
source [find target/ti_k3.cfg]
adapter speed 2500

No worries on the dogtag, I figured it out. I had to tweak the makefile for my configuration.

Output:

Microseconds for one run through Dhrystone: 0.308095 
Dhrystones per Second:                      3245750.000000 

Running:

debian@BeagleBone:~$ cat /etc/dogtag
BeagleBoard.org Debian Bullseye Xfce Image 2022-08-26
debian@BeagleBone:~$ uname -a
Linux BeagleBone 5.10.120-ti-arm64-r64 #1bullseye SMP Tue Sep 27 18:52:35 UTC 2022 aarch64 GNU/Linux

Thanks to all for the information. As a new user, the main beaglebone-ai-64 repo wasn’t even on my radar.

Thanks. I just got my ai64 yesterday. I tried the steps below a bit…

Build steps:

sudo apt-get install libhidapi-dev libtool libusb-1.0-0-dev pkg-config 

git clone https://github.com/nmenon/openocd.git
cd openocd
git checkout dap_dmem
./bootstrap
./configure --enable-dmem
make -j4

Debug steps:

cd openocd/tcl
sudo ../src/openocd -f ./board/ti_j721e_swd_native.cfg
This is the Tcl snippet from the comment above.

$ sudo k3conf dump processor
|--------------------------------------------------------------------------------|
| VERSION INFO                                                                   |
|--------------------------------------------------------------------------------|
| K3CONF | (version 0.2-nogit built Wed Jul 27 21:21:50 UTC 2022)                |
| SoC    | J721E SR2.0                                                           |
| SYSFW  | ABI: 3.1 (firmware version 0x0008 '8.5.0-v08.04.07-9-g13fbe (Chill)') |
|--------------------------------------------------------------------------------|

|--------------------------------------------------------------------------------------|
| Device ID | Processor ID | Processor Name   | Processor State  | Processor Frequency |
|--------------------------------------------------------------------------------------|
|   202     |      32      | A72SS0_CORE0     | DEVICE_STATE_ON  | 2000000000          |
|   203     |      33      | A72SS0_CORE1     | DEVICE_STATE_ON  | 2000000000          |
|   142     |       3      | C66SS0_CORE0     | DEVICE_STATE_OFF | 1350000000          |
|   143     |       4      | C66SS1_CORE0     | DEVICE_STATE_OFF | 1350000000          |
|    15     |      48      | C71SS0           | DEVICE_STATE_OFF | 1000000000          |
|   250     |       1      | MCU_R5FSS0_CORE0 | DEVICE_STATE_ON  | 1000000000          |
|   251     |       2      | MCU_R5FSS0_CORE1 | DEVICE_STATE_ON  | 1000000000          |
|   245     |       6      | R5FSS0_CORE0     | DEVICE_STATE_OFF | 1000000000          |
|   246     |       7      | R5FSS0_CORE1     | DEVICE_STATE_OFF | 1000000000          |
|   247     |       8      | R5FSS1_CORE0     | DEVICE_STATE_ON  | 1000000000          |
|   248     |       9      | R5FSS1_CORE1     | DEVICE_STATE_OFF | 1000000000          |
|--------------------------------------------------------------------------------------|

So, Info : starting gdb server for j721e.cpu.main1_r5.0 on 3340
should have connected, but…

gdb-multiarch --eval-command="set architecture arm" --eval-command="target remote localhost:3340" test.elf

Error: target->coreid 0 powered down!

I need to dig into the addresses a bit more…

TIFS is TI Foundational Security (firmware that runs inside the security enclave)… I hear it is due to be released in a couple of weeks, but working at TI, I do have access to a prebuilt, so I am able to check things out… as in the log above, I did try it out a bit…

 debian@BeagleBone:~/openocd/openocd/tcl$ sudo devmem2 0x4C60000000 
/dev/mem opened.
Memory mapped at address 0xffff9d48f000.
Read at address  0x4C60000000 (0xffff9d48f000): 0x00001003
debian@BeagleBone:~/openocd/openocd/tcl$ sudo devmem2 0x4C40002000
/dev/mem opened.
Memory mapped at address 0xffff9df24000.
Read at address  0x4C40002000 (0xffff9df24000): 0x1BB6402F