remoteproc write to PRU over rpmsg device blocks even when set non-blocking

I was getting some strange bugs from some remoteproc stuff I was doing on a BBB, and eventually I tracked it down to overrunning the rpmsg system, which can block for several seconds on a write.

Okay, fine. No big deal. This is what poll() was made for: flip “/dev/rpmsg_pru30” to O_NONBLOCK, set up POLLOUT, wait for a write event, write the data, and check for errors.

Except that my overrun writes to “/dev/rpmsg_pru30” still block for several seconds (very bad) and then terminate with an Error 512 (huh?).

I can handle the error, but the big problem is the blocking. That absolutely should not be allowed to happen.

What’s going on? And where do I file a bug about this?


uname -a

Linux beaglebone 4.19.94-ti-r42 #1buster SMP PREEMPT Tue Mar 31 19:38:29 UTC 2020 armv7l GNU/Linux

It appears that the problem is in rpmsg_pru.c.

rpmsg_pru_read has the following code:

if (kfifo_is_empty(&prudev->msg_fifo) &&
    (filp->f_flags & O_NONBLOCK))
        return -EAGAIN;


rpmsg_pru_write presumably needs a similar piece of code with kfifo_is_full(), or it needs to look for O_NONBLOCK and then use rpmsg_trysend instead of rpmsg_send.
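Something along these lines for the rpmsg_trysend variant, loosely modeled on the read-side check above. This is a sketch, not a tested patch; the field names around the send call are guesses at the driver's internals:

```
/* Hypothetical sketch for rpmsg_pru_write; untested. */
static ssize_t rpmsg_pru_write(struct file *filp, const char __user *buf,
                               size_t count, loff_t *ppos)
{
        struct rpmsg_pru_dev *prudev = filp->private_data;
        ...
        if (filp->f_flags & O_NONBLOCK)
                /* Fail immediately if no buffer is free, instead of
                 * sleeping inside rpmsg_send(). */
                ret = rpmsg_trysend(prudev->rpdev->ept, msg, count);
        else
                ret = rpmsg_send(prudev->rpdev->ept, msg, count);
        ...
}
```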

Unfortunately, I’ve got nowhere near the Linux kernel programming chops to debate the implications of that.

Presumably, I need to file a bug somewhere?


Nobody knows where I should file this bug?

Which repo has the code that is causing problems?

I took a quick look at and it seems to be structured a fair bit differently. If the same issue had been there, I’d recommend posting to

Switching over to the kernel, I see the function you mention:

The driver isn’t upstream yet:

The post to a public list seems to be here:

The development tree seems to be here:

The code seems the same in the latest development branch:

Er, I guess that is an example of doing it right and the issue is here?

Since it isn’t upstream, I’d think an e2e post might be OK, but it might be more productive to reply to the latest post on linux-omap:

Copy Jason Reeder, Anthony F. Davis and Suman Anna. Not sure why it has been so long between revision posts.

Personally, I don’t see any harm in modifying the _write code with a fifo check on O_NONBLOCK.

If it is for support from a TI SDK, please post a query to E2E.

Meanwhile, can someone clarify exactly what the issue is? The kfifo is used only on the receive path because of the asynchronous callbacks. The Tx path is synchronous: the copy is attempted directly on the vring buffers, and you have a limited number of vring buffers (dictated by firmware). If all of them are busy (implying the PRU has either stopped processing or is overwhelmed), then you get a failure.


Hi Suman

Here is the original thread so you have background info and time to respond if Andrew has more to add: !msg/beagleboard/6Ch7Do4Hm7k/CAcSRi1pBQAJ


Hi, folks,

The issue is that requests cause the rpmsg channels to the PRU to fill. That by itself is fine: the PRU in this case is servicing slow requests, and a full rpmsg queue should exert backpressure.

The problem is that the rpmsg system HANGS for several seconds before timing out and then throws a fairly bizarre error. Quoting my original message:

Except that my overrun writes to “/dev/rpmsg_pru30” still block for several seconds (very bad) and then terminate with an Error 512 (huh?).

This is not good behavior from all manner of perspectives:

  1. Why does the write time out at all when not O_NONBLOCK? That’s certainly not expected behavior. There is no reason why the PRU might not take a couple of seconds to service a request. If that’s a problem, you either set a timeout manually (usually only valid for socket file descriptors) or you put the file descriptor into non-blocking mode. (It appears that this is the fault of the rpmsg driver, which times out after 15 seconds and then returns ERESTARTSYS.)

  2. Why does the write hang at all when in O_NONBLOCK? That’s also not expected behavior. If the queue is full, an attempt to write to it should return IMMEDIATELY with something like ENOMEM/EAGAIN. (This appears to be the fault of the rpmsg_pru driver).

The file I was looking at is here:

A few solutions seem to present themselves:

  1. Use rpmsg_trysend when O_NONBLOCK is set (see rpmsg_eptdev_write_iter in rpmsg_char.c line 243 for an example)

  2. Check the queue for space and return immediately with ENOMEM. (Saves the call to rpmsg_trysend and all its indirections).

  3. Do both. (It’s possible that trysend covers cases other than a full kfifo, but the kfifo check may be a useful optimization and catch 99%+ of all the cases quickly.)


Urk, sorry I didn’t quite get the implications of this statement:

The kfifo is used only on the receive path because of the asynchronous callbacks. The
Tx-path is synchronous, the copy is attempted directly on the vring buffers

That means the kfifo doesn’t exist on the send path, so the only available solution appears to be calling rpmsg_trysend when in O_NONBLOCK mode.

That will hit the full vring buffers and should bounce back immediately with ENOMEM.


You could increase the vring buffers or check for full and retry depending on how critical the timing is.

Sure. Right now, I just keep track of how many messages are in flight and I don’t allow it to queue too many.

That’s useful once you know what the bug is. Fortunately, I hit this bug before I had two threads (one receiving USB and one receiving ethernet) which would have made hunting it down quite painful. So, at least now I know that I must have a single thread acting as a gatekeeper on top of the rpmsg system.

If, however, you try to use a library on top of this bug that actually expects the O_NONBLOCK behavior to work, you will have a long debugging chain.

What originally tripped all of this was that I tried to use Rust and Tokio, which failed mysteriously. After far too much fruitless debugging, I switched down to Rust and mio, which also failed weirdly.

So, I switched down to C, poll, and O_NONBLOCK, which then gave the incorrect blocking behavior and the ERESTARTSYS. After that, I could actually pinpoint the incorrect behavior as belonging to rpmsg_pru and as being due to a full queue with incorrect blocking semantics.

Getting to that point, however, was neither pleasant nor straightforward.

So, we’re still back at the original question of “Where do I file this bug so that it gets tracked?”

I see some recent work on rpmsg bugs at, so I’ll file a bug there. But, is there somewhere else I should file it?


Bumping this. Again.

I’d like to NOT have to keep supporting the fix for this on the user side in the 5.X series when this really needs to get fixed on the kernel side. I’ve filed the bug reports. They’re just sitting.

In reality, the rpmsg system doesn’t really have the hooks to support the fix from the user side, as I can’t query the sizes and depths of the buffers. This needs to get fixed in the PRU rpmsg kernel subsystem.