Yes, a ring buffer is fine!
But instead of using three blocks of 28k PRUSS memory, I recommend using a single contiguous block of kernel memory allocated by the uio_pruss driver (i.e. 256k). The PRU writes directly to that memory (over the L3 bus), and the ARM can fetch the data from there. You can find source code in the examples rb_file and rb_oszi. (In case of parallel access, the ARM read blocks the PRU write.)
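To illustrate the ARM side, here is a minimal sketch of a ring buffer reader in C. It is not taken from rb_file or rb_oszi; the names ring_reader and ring_read, the 16-bit sample type, and the way the write position is passed in are all assumptions made for illustration. In a real setup buf would point at the contiguous block mapped into user space, and write_idx would come from wherever the PRU reports its current position.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state (not part of any library API). */
typedef struct {
    volatile uint16_t *buf; /* start of the shared sample block */
    size_t capacity;        /* number of samples the block holds */
    size_t read_idx;        /* next sample the ARM side will read */
} ring_reader;

/* Copy up to `max` samples written so far into `out`.
 * `write_idx` is the producer's next write position (0..capacity-1).
 * Returns the number of samples copied; wraps around at the end
 * of the block. */
static size_t ring_read(ring_reader *r, size_t write_idx,
                        uint16_t *out, size_t max)
{
    size_t n = 0;
    while (r->read_idx != write_idx && n < max) {
        out[n++] = r->buf[r->read_idx];
        r->read_idx = (r->read_idx + 1) % r->capacity;
    }
    return n;
}
```

The reader only ever changes its own read_idx, so the writer never has to wait for a lock; the ARM just has to drain the buffer before the PRU laps it.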
No root access is needed; everything runs from user space.
Regards