I suggest instead that you have the PRUs write to PRU memory (28KB total) and have the ARM read that memory. This sidesteps the need for the PRUs to know where the allocated memory lives in the Linux address space.
A ring buffer is the ideal way to do this. An example can be found here: Turnkey PRU deskclock application for BBB … It achieves high throughput because both processors are active at the same time, one reading (the ARM) and the other writing. The stress test video shows a transfer rate of ~1.2 MB/s (60 MB in 50 seconds).
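To illustrate why a ring buffer lets both sides run at once, here is a minimal single-producer/single-consumer sketch in C. It is not the code from the deskclock project. The struct layout, `RB_SIZE`, and the names `struct ring`, `rb_put` and `rb_get` are hypothetical; the assumption is that the struct sits at an offset in PRU shared RAM that both programs agree on. Only the producer (PRU) writes `head` and only the consumer (ARM) writes `tail`, so no lock is needed. `__sync_synchronize()` is a GCC/Clang builtin used here as a memory barrier; the PRU toolchain may need a different mechanism, so treat that part as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer size; a power of two so the free-running
 * counters can be wrapped into an index with a mask. */
#define RB_SIZE 256u
static_assert((RB_SIZE & (RB_SIZE - 1u)) == 0, "RB_SIZE must be a power of two");

/* Assumed to live at an agreed offset in PRU shared RAM.
 * head: written only by the producer (PRU).
 * tail: written only by the consumer (ARM). */
struct ring {
    volatile uint32_t head;          /* total bytes ever written */
    volatile uint32_t tail;          /* total bytes ever read */
    volatile uint8_t  data[RB_SIZE];
};

/* Producer side (would run on the PRU).
 * Returns 1 if the byte was stored, 0 if the buffer is full. */
static int rb_put(struct ring *rb, uint8_t byte)
{
    uint32_t head = rb->head;
    if (head - rb->tail == RB_SIZE)  /* unsigned subtraction handles counter wraparound */
        return 0;
    rb->data[head & (RB_SIZE - 1u)] = byte;
    __sync_synchronize();            /* make the data visible before advancing head */
    rb->head = head + 1u;
    return 1;
}

/* Consumer side (would run on the ARM).
 * Returns 1 if a byte was read into *out, 0 if the buffer is empty. */
static int rb_get(struct ring *rb, uint8_t *out)
{
    uint32_t tail = rb->tail;
    if (rb->head == tail)
        return 0;
    __sync_synchronize();            /* observe head before reading the slot */
    *out = rb->data[tail & (RB_SIZE - 1u)];
    __sync_synchronize();            /* finish with the slot before releasing it */
    rb->tail = tail + 1u;
    return 1;
}
```

Using free-running 32-bit counters rather than wrapped indices means "full" (`head - tail == RB_SIZE`) and "empty" (`head == tail`) are distinguishable without sacrificing a slot. On the ARM, the consumer would simply poll `rb_get` in a loop while the PRU keeps calling `rb_put`.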
Note that it moves data in the reverse direction from what you need, and it uses remoteproc. The Linux program must run as root to access /dev/mem.
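For the ARM side, a rough sketch of mapping PRU memory through /dev/mem with `open` and `mmap` follows. The physical base `0x4A300000` and the offsets below are my assumptions from the AM335x PRU-ICSS memory map (data RAM0 at 0x0000, data RAM1 at 0x2000, 12KB shared RAM at 0x10000); please verify them against the AM335x Technical Reference Manual before relying on them. The helper name `map_phys` is hypothetical. It takes a file path so it can also be exercised on an ordinary file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed AM335x values; verify against the TRM. */
#define PRU_ICSS_BASE  0x4A300000u  /* physical base of the PRU-ICSS */
#define PRU_SHARED_OFF 0x00010000u  /* offset of the 12KB shared RAM */
#define PRU_MAP_LEN    0x00013000u  /* covers DRAM0, DRAM1 and shared RAM */

/* Map len bytes of path starting at the page-aligned offset off.
 * For the PRUs this would be called as
 *   map_phys("/dev/mem", PRU_ICSS_BASE, PRU_MAP_LEN)
 * which requires root. Returns NULL on failure. */
static volatile uint8_t *map_phys(const char *path, off_t off, size_t len)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open");
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return (volatile uint8_t *)p;
}
```

If those assumed addresses are right, the shared RAM would then start at `base + PRU_SHARED_OFF` in the returned mapping, and the PRU firmware would write to the same RAM through its own local addresses.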
Good luck!
gomer