PRU Mem Map - General Registers, Shared RAM, Data RAM

Hi All,

My next goal is to have a C program on the main processor read and write values in PRU memory. There are plenty of examples of this; however, they are not returning my register values. I am able to read/write somewhere and persist a value in PRU memory. Successive program runs that implement a counter seem to work, but I am guessing that my understanding of the start address for what I think I am writing/reading is wrong. I would call it half working. On the flip side, inside the PRU I have a simple assembly program that has a counter and various static values in other registers (R1, R2, R3, R4, R5, R6, R7, etc.) to signal to me that I am accessing the right registers from the C program. Note that the two counter loops are decoupled. The first one was just used for testing.

Here is my basic understanding - Focusing on PRU0:

Each PRU has 8K of ‘dataram’ - This is where I expect R1,R2,R3 ---- R31 to be stored. Is this true? I see many people changing the reference at 0x0000_0n00, n = c24_blk_index[3:0], do I need to set where the Rn’s lay down in memory?

Note that in one of my assembly registers, R2, I have a value that determines the speed of a blinking LED. For all intents and purposes, let’s pretend I want to change that value.

Moving on.

prussdrv_map_prumem(PRUSS0_PRU0_DATARAM, &sharedMem);
sharedMem_int = (unsigned int*) sharedMem;

Looking at the above C code - this appears universal in the examples. Get an array pointer to the allocated PRU memory PRUSS0_PRU0_DATARAM.

Since in my PRU assembly program I have Registers set everywhere to values I will recognize I would expect to see my values when calling:

result_0 = sharedMem_int[0];
result_1 = sharedMem_int[1];
result_2 = sharedMem_int[2];
result_3 = sharedMem_int[3];

etc…

Printing these out does not reflect the values I am setting and using in the PRU.

So now we must be entering a world of offsets. We must be reading some other address space.

I did see alternative examples in which an offset was used, like sharedMem_int[OFFSET +1]; but I am sure that had more to do with the shared memory area of 12K RAM.

Should the registers be available as I have described?
Do I need to do anything in the assembly to allow the C program into the general Rn registers?
Do general Rn registers need an offset?
Does the assembly need to set a pointer for where the general registers get written? When R1 is set for example does it look for a base address in the constants table?

Docs also state that the PRU 0 Data ram starts at 0x4a300000;

int registerStart;
registerStart = *(int*)0x4a300000;
printf("--> R0 = %d" + registerStart);

However I get a seg fault trying to print what is in R0 that way. That was more to just do a direct look see if possible and go around all the interfaces.

Here is my basic understanding - Focusing on PRU0:

Each PRU has 8K of 'dataram' - This is where I expect R1,R2,R3 ---- R31 to be
stored. Is this true? I see many people changing the reference at 0x0000_0n00,
n = c24_blk_index[3:0], do I need to set where the Rn's lay down in memory?

NO

The data ram is what it says...data ram. The registers are what they
say...registers. Registers are *NOT* data ram. If you want the
register values to appear in memory, you have to write them out using
the SBBO instruction.
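
To make the distinction concrete, here is a small host-side model in C. It is only a sketch for illustration, not PRU code and not part of any library: `regs` stands in for the register file, `dataram` for the 8K data RAM, and the made-up helpers `model_sbbo` and `dataram_word` mimic, respectively, what an SBBO does and what the ARM side sees when it reads a word through its mapped pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS     32
#define DATARAM_SIZE 8192              /* 8K of PRU0 data RAM, as in the post */

static uint32_t regs[NUM_REGS];        /* models R0..R31 (hypothetical) */
static uint8_t  dataram[DATARAM_SIZE]; /* models the data RAM (hypothetical) */

/*
 * Hypothetical model of "sbbo Rsrc, Rbase, offset, nbytes":
 * copy nbytes, starting at register src, to address regs[base_reg] + offset.
 * Returns 0 on success, -1 if the request falls outside the model.
 */
static int model_sbbo(unsigned src, unsigned base_reg,
                      uint32_t offset, uint32_t nbytes)
{
    if (src >= NUM_REGS || base_reg >= NUM_REGS)
        return -1;
    if (nbytes > (NUM_REGS - src) * 4u)
        return -1;

    uint64_t addr = (uint64_t)regs[base_reg] + offset;
    if (addr > DATARAM_SIZE || nbytes > DATARAM_SIZE - addr)
        return -1;

    memcpy(&dataram[addr], &regs[src], nbytes);
    return 0;
}

/* Read one 32-bit word, the way the ARM side would via sharedMem_int[index]. */
static uint32_t dataram_word(unsigned index)
{
    uint32_t v;
    assert(index < DATARAM_SIZE / 4);
    memcpy(&v, &dataram[index * 4u], sizeof v);
    return v;
}
```

In this model, setting `regs[1] = 0xffff` changes nothing in `dataram`; only after `model_sbbo(0, 0, 0, 48)` does `dataram_word(1)` return that value.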

Docs also state that the PRU 0 Data ram starts at 0x4a300000;

     int registerStart;
     registerStart = *(int*)0x4a300000;
     printf("--> R0 = %d" + registerStart);

However I get a seg fault trying to print what is in R0 that way. That was more
to just do a direct look see if possible and go around all the interfaces.

0x4a300000 is a physical address. You can use that if you are
directly accessing memory (via /dev/mem, bus-mastering DMA, or
something that doesn't use an MMU like the PRU core). If you try to
access a physical address from a standard application that has not
been mapped into your process memory space, the MMU will forbid access
and your program seg-faults.

To access the PRU memory in your application, use the address provided
to you by the prussdrv_map_prumem function.
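
For completeness, the /dev/mem route mentioned above could look roughly like the sketch below. The helper name `map_physical` is made up for this example; the address and size are the ones quoted in this thread. It assumes the program runs as root and that the kernel permits /dev/mem access to that range. For normal use, the pointer returned by prussdrv_map_prumem is the simpler choice.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define PRU0_DATARAM_PHYS 0x4a300000L  /* physical address from the docs */
#define PRU0_DATARAM_SIZE 8192         /* 8K of data RAM */

/*
 * Hypothetical helper: map len bytes of physical memory at phys through
 * the given device node (normally "/dev/mem"). phys must be page aligned.
 * Returns a pointer valid in this process, or NULL on failure.
 */
static volatile uint32_t *map_physical(const char *dev, off_t phys, size_t len)
{
    int fd = open(dev, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
    close(fd);  /* the mapping remains valid after the descriptor is closed */

    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/*
 * Usage sketch:
 *   volatile uint32_t *ram =
 *       map_physical("/dev/mem", PRU0_DATARAM_PHYS, PRU0_DATARAM_SIZE);
 *   if (ram != NULL)
 *       printf("word 0 = %u\n", (unsigned)ram[0]);
 */
```

The point is the same as in the reply: the process dereferences the address returned by mmap, never the raw physical address.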

Thx Charles, that was it. I was treating the registers as if they lived in dataram.

In the assembly loop I did: sbbo r0, r0, 0, 48

and like magic my C program dumped out the values I had stuffed into some of the registers.

see below

Hi, check my comments inline.


value R0 = 0
value R1 = 65535
value R2 = 8192
value R3 = 16
value R4 = 777
value R5 = 25
value R6 = -136853601
value R7 = 2146680819
value R8 = 1
value R9 = -45491713
value R10 = -89
value R11 = -1345356802


I do have a more basic question though about the value in R2 = 8192. My understanding is the general purpose registers are 32 bit.

In my assembly I set

r2 = 0x0BEBC200 // decimal 200,000,000 to reflect the core frequency.

however as you can see the R2 after the mem copy to dataram shows 8192. Why is it not reading 200,000,000 in R2 after the transfer?

Could you share your full source code?


Also, another question. Syntax wise the first r0 in the statement below ‘should’ have &r0 but I get unknown register error when compiling. If I leave out the & it works and the transfer does occur. Is this a nuance of the gcc-pru compiler vs a direct pasm compile?

sbbo r0, r0, 0 , 48

Yes, the & is not needed for pru-gcc. But for the sake of compatibility I’ll make it optional with the next release.

Yet another question: the second argument of r0 reflects the starting address point in dataram. I would have expected dataram as a free for all address space that I managed. Is the reference of an Rn type syntax simply a convenience for addressing in dataram and dataram has the notion of its own register mapping?

Dataram has no register mapping. It is simply memory. Consider the following example:
ldi r1, 101
ldi r2, 64
sbbo r1, r2, 0, 4
Converted to C syntax, it would look like:
unsigned int r1 = 101;

unsigned int *r2 = (void *)64;
r2[0] = r1;

Description of the program:

An LED toggles on and off from a set delay time in R2.

A separate C program loads the PRU program, starts the core and then prompts the user for a Time to do a delay. Upon the user entering a time, the c program writes that value to dataram and reads back the mapped memory from the PRU to show.

The PRU loop does an SBBO each time as well as an LBBO for just R2. My LBBO call, however, is not returning the proper value; I am likely using the wrong pointer value.

lbbo r2, r2, 0 ,4 // read 4 bytes from there and store it into r2

After I write from C into shared_int[2], I am not able to load that value from the PRU. Since I stored SBBO from reference point of r0 =0 then I would expect R2 to be starting on the 9th byte over. I tried that too - no go.

PRU program.

`

#include "memparams.hp"

#define CONST_PRUCFG 0xC4

#define CPU_HZ (200 * 1000 * 1000)
//lbco r3, CONST_PRUCFG, 4, 4

.text
.section .init0, "x"
.global __start
__start:
/* Initialize stack pointer. */
ldi sp, %lo(__stack_top)
ldi sp.w2, %hi_rlz(__stack_top)
r2 = 0x0BEBC200 // set r2 to a default of 200,000,000
jmp main

.text
.section .text
main:

// init
ldi r0, 0
ldi r1, 0xffff
ldi r3, 777
ldi r4, 777
ldi r8, 1000
ldi r5, 10000

main_loop:

// Load the value from PRU data memory into general register r2
//ldi r9, 9 // offset to the start of the third
lbbo r2, r2, 0 ,4
mov r6, r2 // to prove in the c program that data arrived and is correct when displayed R2 should equal R6- debug

sbbo r0, r0, 0 , 48 // copy all 12 registers to memory R0…R11 .

// the goal is for R2 to get set in a C program outside this assembly, thus changing the speed of the
// blinking LED - default is set to 1 second = 200,000,000 cycles in CPU delay.

// led on
mov r30, r1
ldi r14, %lo( r2/4 )
ldi r14.w2, %hi_rlz(r2/4)
call delay_n2_cycles

// led off
mov r30, r0
ldi r14, %lo(r2/4)
ldi r14.w2, %hi_rlz(r2/4 )
call delay_n2_cycles

jmp main_loop

delay_n2_cycles:
sub r14, r14, 1
qbne delay_n2_cycles, r14, 0
ret

my_resource_table:
.word 1, 0, 0, 0 /* struct resource_table base */
.word 0 /* uint32_t offset[1] */

`

C program

`

#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <err.h>
#include <sys/mman.h>
#include <libelf.h>

#include "prussdrv.h"
#include "pruss_intc_mapping.h"

#define AM33XX_PRUSS_IRAM_SIZE 8192
#define AM33XX_PRUSS_DRAM_SIZE 8192
#define PRU_NUM 0
#define ADDEND1 0x98765400u
#define ADDEND2 0x12345678u
#define ADDEND3 0x10210210u
#define LOOPS 30

#define DDR_BASEADDR 0x80000000
#define OFFSET_DDR 0x00001000
#define OFFSET_SHAREDRAM 0x00000000 //equivalent with 0x00002000

#define PRUSS0_SHARED_DATARAM 4

static int LOCAL_exampleInit ( );
static unsigned short LOCAL_examplePassed ( unsigned short pruNum, unsigned int millis );
static int mem_fd;
static void *ddrMem, *sharedMem;
static unsigned int *sharedMem_int;
static int counter ;

int main (int argc, char *argv[])
{
counter = 0 ;

tpruss_intc_initdata pruss_intc_initdata = PRUSS_INTC_INITDATA;
int ret;

printf("Initializing the PRUs...\n");
prussdrv_init();

/* Open PRU Interrupt */
ret = prussdrv_open(PRU_EVTOUT_0);
if (ret)
errx(EXIT_FAILURE, "prussdrv_open open failed\n");

/* Get the interrupt initialized */
prussdrv_pruintc_init(&pruss_intc_initdata);

printf("\tINFO: Initializing example. - Writing Data to Local CPU DDR Ram \r\n");
LOCAL_exampleInit(PRU_NUM);

printf("Starting ...\n");
prussdrv_pru_enable(0);
prussdrv_pru_enable(1);

unsigned int blinkySpeed = 1000;

while (counter < LOOPS){

printf("Please Enter a blinky speed in milliseconds:");
scanf("%u" , &blinkySpeed ); /* blinkySpeed is unsigned, so use %u */
LOCAL_examplePassed(PRU_NUM, blinkySpeed );
//usleep(5 * 1000 * 1000);
counter = counter + 1;
}

fflush(stdout);

/* Disable PRU and close memory mapping*/
prussdrv_pru_disable(PRU_NUM);
//munmap(ddrMem, 0x0FFFFFFF);
//close(mem_fd);
prussdrv_exit();

printf("Program done.\n");

return EXIT_SUCCESS;
}

static int LOCAL_exampleInit ( )
{
void *DDR_regaddr1, *DDR_regaddr2, *DDR_regaddr3;

/* open the device */
mem_fd = open("/dev/mem", O_RDWR);
if (mem_fd < 0) {
printf("Failed to open /dev/mem (%s)\n", strerror(errno));
return -1;
}

/* map the DDR memory */
ddrMem = mmap(0, 0x0FFFFFFF, PROT_WRITE | PROT_READ, MAP_SHARED, mem_fd, DDR_BASEADDR);
if (ddrMem == MAP_FAILED) { /* mmap reports failure with MAP_FAILED, not NULL */
printf("Failed to map the device (%s)\n", strerror(errno));
close(mem_fd);
return -1;
}

/* Store Addends in DDR memory location */
DDR_regaddr1 = ddrMem + OFFSET_DDR;
DDR_regaddr2 = ddrMem + OFFSET_DDR + 0x00000004;
DDR_regaddr3 = ddrMem + OFFSET_DDR + 0x00000008;

*(unsigned long *) DDR_regaddr1 = ADDEND1;
*(unsigned long *) DDR_regaddr2 = ADDEND2;
*(unsigned long *) DDR_regaddr3 = ADDEND3;

return(0);
}

static unsigned short LOCAL_examplePassed ( unsigned short pruNum, unsigned int millis )
{
unsigned int result_0, result_1, result_2, result_3,result_4,result_5,result_6,result_7,result_8,result_9,result_10,result_11;

/* Allocate PRU Dataram memory. */
prussdrv_map_prumem(PRUSS0_PRU0_DATARAM, &sharedMem);
sharedMem_int = (unsigned int *) sharedMem;

// set R2 which hold our delay valy for the blinky action in the pru
sharedMem_int[2] = (millis * 1000 * 200);

// read all the current data ram fields in
result_0 = sharedMem_int[ 0];
result_1 = sharedMem_int[ 1];
result_2 = sharedMem_int[ 2];
result_3 = sharedMem_int[ 3];
result_4 = sharedMem_int[ 4];
result_5 = sharedMem_int[ 5];
result_6 = sharedMem_int[ 6];
result_7 = sharedMem_int[ 7];
result_8 = sharedMem_int[ 8];
result_9 = sharedMem_int[ 9];
result_10 = sharedMem_int[ 10];
result_11 = sharedMem_int[ 11];

printf("-------------------------------------\n");
//printf("%p\n", (void *) &sharedMem_int[0]);
printf("value R0 = %d\n", result_0);
printf("value R1 = %d\n", result_1);
printf("value R2 = %d\n", result_2);
printf("value R3 = %d\n", result_3);
printf("value R4 = %d\n", result_4);
printf("value R5 = %d\n", result_5);
printf("value R6 = %d\n", result_6);
printf("value R7 = %d\n", result_7);
printf("value R8 = %d\n", result_8);
printf("value R9 = %d\n", result_9);
printf("value R10 = %d\n", result_10);
printf("value R11 = %d\n", result_11);
//return ((result_0 == ADDEND1) & (result_1 == ADDEND2) & (result_2 == ADDEND3)) ;

return 1;

}

`

Here is the pru program

First, 9 is not the proper offset for the third 32-bit value (that
would be 12, or 3 values * 4 bytes/value).

Second, you are reading *AND* writing the memory location you are
trying to monitor in your PRU code. That means a new value will
*ONLY* be picked up if the ARM side updates the value in between the
write and the read. You should structure your code so that for any
given memory location, only one side (ARM or PRU) writes the values.
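
One way to follow that advice is to give each direction its own memory location. The sketch below is a host-side model only; the struct layout and the names `pru_mailbox`, `arm_set_delay` and `pru_model_iteration` are assumptions made for this example, not anything defined by prussdrv or the PRU toolchain. The ARM side is the only writer of `delay_count`, and the (modelled) PRU side is the only writer of `reg_dump`.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DUMPED_REGS 12

/* Hypothetical layout at the start of PRU data RAM. */
struct pru_mailbox {
    uint32_t delay_count;               /* written by ARM only, read by PRU  */
    uint32_t reg_dump[NUM_DUMPED_REGS]; /* written by PRU only, read by ARM  */
};

/* ARM side: publish a new delay value. */
static void arm_set_delay(volatile struct pru_mailbox *mb, uint32_t count)
{
    mb->delay_count = count;
}

/*
 * Model of one pass through the PRU loop: load the command word into R2
 * (as an LBBO would), then dump R0..R11 to the separate dump area (as an
 * SBBO to a different offset would). The command word is never written here.
 */
static void pru_model_iteration(volatile struct pru_mailbox *mb,
                                uint32_t regs[NUM_DUMPED_REGS])
{
    regs[2] = mb->delay_count;
    for (int i = 0; i < NUM_DUMPED_REGS; i++)
        mb->reg_dump[i] = regs[i];
}
```

With this split, a value written by the ARM side is picked up on the very next PRU pass, because nothing on the PRU side overwrites it in between.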

OK, some progress… I entered 100 at the C prompt. The new LBBO code, lbbo r2, r0, 8, 4, worked. I know because after the lbbo I move the value into another register for comparison, an unadulterated register that is read back into C. Yay.

Please Enter a blinky speed in milliseconds:100

Yes, thank you, I already know the prompt cycle needs to run twice to pick up the write in the subsequent read cycle. No big deal there: I just enter the same value twice and I get the feedback.

The part I am focused on is why the value from the LBBO does not seem to be used in the delay call.

You may have missed my last post where the LBBO worked using 8 bytes of offset: R2 is the third 32-bit number in my reference (R0 is the first, R1 the second, R2 the third).

lbbo r2, r0, 8 ,4 works like a champ.
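
The offset arithmetic can be checked on the host with a few lines of C. This is only a sketch; the helper names `reg_byte_offset` and `read_word_at` are made up for illustration. Each dumped register occupies 4 bytes, so register index n sits at byte offset n * 4, and R2 (index 2) lands at byte offset 8, which is also sharedMem_int[2] on the ARM side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of the n-th 32-bit register in a dump that starts at R0. */
static size_t reg_byte_offset(unsigned reg_index)
{
    return (size_t)reg_index * sizeof(uint32_t);
}

/* Read a 32-bit word at a byte offset, like "lbbo rX, base, offset, 4". */
static uint32_t read_word_at(const uint8_t *base, size_t byte_offset)
{
    uint32_t v;
    memcpy(&v, base + byte_offset, sizeof v);
    return v;
}
```

The same arithmetic explains the 48 in the sbbo: 12 registers at 4 bytes each.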

After the lbbo I transfer the value I entered into R6 in the PRU and it comes back to me in the sbbo so I know it is working and getting to the general registers.

The question now is why R2 used as a delay value does not change the delay time when it is truly reaching the PRU.

Any comments on the initialization of R2 = 200,000,000 ? does that syntax lock it into a constant?

Ok I got it working, the part I changed is commented out.

Essentially I used my r6 debug register that had the correct one in it. Now I can dial in the blinky action! fun!

So for some reason, even though we were setting r2 from the lbbo, it just did not like this syntax.

ldi r14, %lo( r6/4 )
ldi r14.w2, %hi_rlz(r6/4)

It seemed to always load the old initial value. I still have to look up what %lo and %hi_rlz mean. I know they are used to load the high and low halves of a value, because ldi is limited to a maximum of 65535, but they were probably messing things up.

`
lbbo r2, r0, 8 ,4
mov r6, r2 // to prove in the c program that data arrived and is correct when displayed R2 should equal R6- debug

sbbo r0, r0, 0 , 48 // copy all 12 registers to memory R0…R11 .

// the goal is for R2 to get set in a C program outside this assembly, thus changing the speed of the
// blinking LED - default is set to 1 second = 200,000,000 cycles in CPU delay.

// led on
mov r30, r1
mov r14,r6
call delay_n2_cycles

// led off
mov r30, r0
mov r14, r6
call delay_n2_cycles

/*
// led on
mov r30, r1
ldi r14, %lo( r6/4 )
ldi r14.w2, %hi_rlz(r6/4)
call delay_n2_cycles

// led off
mov r30, r0
ldi r14, %lo(r6/4)
ldi r14.w2, %hi_rlz(r6/4 )
call delay_n2_cycles
*/
`
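
Since the revised loop passes the value straight to delay_n2_cycles, the number written from C is now a loop count rather than a cycle count. The sketch below shows one way the ARM side might compute it. It rests on assumptions: a 200 MHz PRU clock (the CPU_HZ value above), two cycles per loop pass (one each for sub and qbne), and call/ret overhead ignored. The helper name `ms_to_loop_count` is made up. It also does the multiplication in 64 bits, because millis * 1000 * 200 in 32-bit arithmetic wraps for delays above roughly 21 seconds.

```c
#include <assert.h>
#include <stdint.h>

#define PRU_HZ          200000000ULL  /* assumed PRU core clock */
#define CYCLES_PER_PASS 2ULL          /* assumed: sub + qbne, one cycle each */

/*
 * Convert a delay in milliseconds into a pass count for a
 * "sub r14, r14, 1 / qbne" style loop. The result is clamped to
 * [1, UINT32_MAX]: a count of 0 would wrap to 0xFFFFFFFF on the first sub.
 */
static uint32_t ms_to_loop_count(uint32_t ms)
{
    uint64_t cycles = (uint64_t)ms * (PRU_HZ / 1000ULL);
    uint64_t passes = cycles / CYCLES_PER_PASS;

    if (passes == 0)
        return 1;
    if (passes > UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t)passes;
}
```

Under these assumptions, a 1000 ms delay corresponds to 100,000,000 passes.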

Hi Neil,

The “r2 = 200000” syntax does not load a value in a register. It is for setting symbols - see https://sourceware.org/binutils/docs/as/Setting-Symbols.html#Setting-Symbols . You probably meant to load r2 with a constant integer:

`

ldi r2, %lo(200000)
ldi r2.w2, %hi_rlz(200000)

`

The %lo(X) operator returns the lower 16 bits of a 32-bit constant integer. The %hi_rlz(X) operator returns the upper 16 bits of a 32-bit integer, and marks the instruction for possible elimination if those upper bits are all zero. You may want to declare and use the following helper macro:

`
.macro ldi32 rx, expr
ldi \rx, %lo(\expr)
ldi \rx().w2, %hi_rlz(\expr)
.endm

; Use like this:
ldi32 r2, 200000

`
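
The split that %lo and %hi_rlz perform can be mirrored in plain C to see which halves end up in which instruction. This is a host-side illustration only; `lo16`, `hi16` and `load32_model` are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 16 bits of a 32-bit constant, as %lo(x) yields. */
static uint16_t lo16(uint32_t x)
{
    return (uint16_t)(x & 0xFFFFu);
}

/* Upper 16 bits of a 32-bit constant, as %hi_rlz(x) yields. */
static uint16_t hi16(uint32_t x)
{
    return (uint16_t)(x >> 16);
}

/* Model of the two-instruction load done by the ldi32 macro. */
static uint32_t load32_model(uint32_t x)
{
    /* ldi rX, %lo(x): the 16-bit immediate fills the low half, the rest is 0 */
    uint32_t r = lo16(x);

    /* ldi rX.w2, %hi_rlz(x): overwrite only the upper 16-bit word */
    r = (r & 0xFFFFu) | ((uint32_t)hi16(x) << 16);
    return r;
}
```

For 0x0BEBC200 (200,000,000) the halves are 0xC200 and 0x0BEB. For a value such as 777 the upper half is zero, which is the case where the second instruction can be dropped.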

Please note that “sp” (Stack Pointer) is an alias for “r2”, and “ra” (Return Address) is an alias for “r3”. Hence R3 is used whenever you use the “call” instruction to store the return PC address.

I also don’t think that “lbbo r2, r2, 0, 4” is correct. You are overwriting the address in R2 with the loaded value, and then using that value as the address in the next iteration. Equivalent C code:

`

uint32_t *r2;
r2 = (uint32_t *)(*r2);

`

Also, %lo(r6/4) is probably not what you meant. %lo expects a constant integer as an argument. If you want to copy a register, simply use mov:

mov r2,r6

Regards,
Dimitar

Thx, correct on many counts. I found it strange that there was no default way to load a 32-bit value, given that it is the default register size.

Love the macro! I am going to adopt it :slight_smile:

I’ll likely now dig into the next level up, which is implementing the same thing in pru-gcc C code instead of assembly. Going even higher level, do you know if there are core bindings to the pruMemMap in Python? I know PRU Speak works a bit, but I am not sure whether it does the memory mapping parts.

Neil

Unfortunately, I have not dealt with Python for the PRU.