Question about parallel processing on the PRUs and signaling between them

Two questions, really:

1) I have an assembly program running on PRU0 that receives data in a tight loop, and at some point I want to signal PRU1 to do further processing on the bytes received. Is it possible to have PRU0 write into a register (say r2) on PRU1 using an SBBO instruction, and wake up PRU1 from a WBS instruction? The memory map seems to suggest so, but the docs seem to discourage using WBS with anything other than r31.

2) I'm seeing conflicting values for the size of program and data RAM on the PRUs. How much of each do the PRUs on the BeagleBone Black actually have?

There are many ways you can communicate between the PRUs, but if you
want to use a WBS instruction, you need to get one PRU to alter a bit in
r31 of the other PRU. The only way I know to do this is to use the
event/interrupt mechanisms. See Chapter 6, "Interrupt Controller", in the
PRU Reference Guide.

Each PRU has 8 KB of program memory and 8 KB of data memory. There is an
additional 12 KB of data memory shared between both PRUs.

There's a nice picture of the PRU subsystem in the PRU Reference Guide
(Figure 2, page 15).

The easiest way to communicate between PRU-0 and PRU-1 is to use the data RAM (DRAM, 8 KB for each PRU, as Charles said). For example, when PRU-0 receives data and stores it at address 0x0100, PRU-1 can access it at address 0x2100. Likewise, when PRU-1 writes to address 0x0100, PRU-0 can read that data at 0x2100. You can use a few bytes for handshaking and the rest of the DRAM to exchange the data.

An alternative (and slower) way is to use a memory block allocated in host memory. The kernel driver allocates an external memory block for the PRUSS (512 kB by default; this can be customized up to 8 MB).

If you actually need to communicate data, the fastest way is using the
scratchpad registers, where you can send up to 248 bytes in a single
clock. It's also possible to directly send data from one PRU to the
other using this method (execute the XOUT instruction on one PRU and the
XIN on the other), but your execution timing between the two PRUs has to
be within 1024 clocks or the instruction will time out. This is also a
good way to exactly synchronize code on the two PRUs, if you ever need
to do that.

Thanks, all. The scratchpad idea looks like the best fit for me: I'm trying to parse a fast-clocked input signal, so every cycle counts.

Cheers,
  Simon