PocketBeagle 2 - Problems loading code to the PRU

Hi guys,
Since the BBB (Blue and PB1) and the BBAI, we have been using the following method to load code to the PRU:
open /dev/mem, stop the PRU by setting CONTROL to 0, load the hex code into the IRAM, and start the PRU using CONTROL.
I’m trying the same method on the PB2, but with no success.
I’m stopping the PRU, but when loading the hex into the IRAM I always receive a SIGBUS.
Using PRUDEBUG I see the following.
This is the hex code:

const uint32_t PRUcode[] = {
0x240000c0,
0x24010080,
0x0504e0e2,
0x2eff818e,
0x230007c3,
0x240001ee,
0x23000ac3,
0x10c3c380,
0x1f07fefe,
0x20800000,
0x23000cc3,
0x21000b00,
0x10000000,
0x20c30000};

Running the program, only part of the code is loaded into the IRAM of the PRU:
PRU0> DI 0x00000 128

Absolute addr = 0x34000, offset = 0x00000, Len = 128
[0x00000] c0 00 00 24 80 00 01 24-e2 e0 04 05 8e 81 ff 2e 
[0x00010] c3 07 00 23 ee 01 00 24-c3 0a 00 23 80 c3 c3 10 
[0x00020] 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
[0x00030] 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
[0x00040] 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
[0x00050] 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
[0x00060] 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 
[0x00070] 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 

If I load a larger program, more of it gets loaded, but still not all of it.
The code is simple:

printf("IRAM\n");
uint32_t *iram = (uint32_t*)mmap(0, 0x4000, PROT_READ|PROT_WRITE, MAP_SHARED, mem_fd, RCOUT_PRUSS_IRAM_BASE);
printf("CONTROL\n");
uint32_t *ctrl = (uint32_t*)mmap(0, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, mem_fd, RCOUT_PRUSS_CTRL_BASE);
*ctrl = 0;
memcpy(iram, PRUcode, sizeof(PRUcode)); // This is where it crashes

Any help will be appreciated.

What’s the sizeof(PRUcode)? Can you over-allocate so that it’s a multiple of 64 or 128 bytes?
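For illustration, a minimal sketch of what "a multiple of 64 or 128 bytes" would mean for the buffer size. The helper name round_up_bytes is hypothetical and not from any library; it only does the arithmetic and says nothing about whether padding alone avoids the fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a byte count up to the next multiple of
 * `block`. `block` must be a power of two (e.g. 64 or 128). */
static size_t round_up_bytes(size_t bytes, size_t block)
{
    assert(block != 0 && (block & (block - 1)) == 0);
    return (bytes + block - 1) & ~(block - 1);
}
```

The 14-word array posted above is 14 * 4 = 56 bytes, so round_up_bytes(56, 64) gives 64, meaning two extra padding words would be needed.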

Copying directly to the mmapped PRU memory is tricky, particularly when using a highly optimized routine like memcpy. If it starts using some of the fancy NEON optimizations, it will fail with a SIGBUS, particularly if the copy isn’t a full NEON register of 64/128 bytes.

I’d suggest using your own utility, but even then you have to be careful as GCC will happily optimize various things into NEON which is just as bad. You can see the code we use in FPP at:

We use this for the PRU local RAM and the PRU shared RAM, but most likely it would apply to the IRAM as well. (We use remoteproc to load the programs onto the PRUs and start/stop them.)
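As a rough idea of what such a hand-written copy utility could look like (this is a sketch, not the FPP code, and the name pru_copy_words is made up for illustration): writing through a volatile pointer should make the compiler emit one plain 32-bit store per word instead of fusing the loop into memcpy or vector stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the FPP implementation): copy 32-bit words one
 * at a time. The volatile destination requires each store to be performed
 * as written, so the loop is not replaced by memcpy or vectorized. */
static void pru_copy_words(volatile uint32_t *dst, const uint32_t *src,
                           size_t words)
{
    /* The mapped destination is expected to be 4-byte aligned. */
    assert(((uintptr_t)dst & 3u) == 0);
    for (size_t i = 0; i < words; ++i) {
        dst[i] = src[i];
    }
}
```

With the mapping from the first post, the call would be something like pru_copy_words(iram, PRUcode, sizeof(PRUcode) / sizeof(uint32_t)), assuming iram is the pointer returned by mmap.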

Hi @Daniel_Kulp
thanks for your help.
The problem really is the memcpy. I made an ugly test, loading the code using a simple for loop:
const size_t pru_firmware_words = sizeof(PRUcode) / sizeof(uint32_t);

for (size_t i = 0; i < pru_firmware_words; ++i) {
    iram[i] = PRUcode[i];
}

and the code was loaded and working. I still have work to do adjusting the PRU code to the new registers/devices, and after that I will look for a fancier, more optimized way to load the code. The size of the code is 780 words.

thanks for your help again :slight_smile:

Just be careful with that, because if you compile with -O2 it will likely get optimized into the equivalent memcpy code and start faulting again. I’d stick a #pragma GCC optimize("O0") ahead of it to make sure the optimizations are turned off.
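A minimal sketch of how that pragma could be scoped to just the loader, assuming GCC. The function name load_iram_unoptimized is hypothetical; push_options/pop_options keep the O0 setting from leaking into the rest of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma GCC push_options
#pragma GCC optimize("O0")
/* Hypothetical loader: compiled at O0 under GCC so the loop stays a plain
 * word-by-word copy and is not turned into a memcpy call. */
static void load_iram_unoptimized(uint32_t *iram, const uint32_t *code,
                                  size_t words)
{
    assert(iram != NULL && code != NULL);
    for (size_t i = 0; i < words; ++i) {
        iram[i] = code[i];
    }
}
#pragma GCC pop_options
```

These pragmas are GCC-specific, so other compilers may ignore them. Declaring the destination pointer volatile is another way to keep the stores as written, independent of the optimization level.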

Hi @Daniel_Kulp
Yes, with optimization enabled the loading crashes. I had to disable it first and then use the for loop method to load the code.