Low-latency I/O RISC-V CPU core in FPGA fabric

Several BeagleBone boards include ultra-low-latency 32-bit RISC CPUs called Programmable Real-time Units (PRUs). They use a custom architecture with register-mapped I/O wired directly to pins. We’d like to implement a new CPU in FPGA fabric on BeagleV-Fire that uses the RISC-V instruction set and offers a similar feature.

PRUs are designed to provide software-defined peripherals as part of the Programmable Real-time Unit and Industrial Communication Subsystem (PRU-ICSS) and are capable of implementing things like 25 pulse-width modulators, 4 soft UARTs, stepper motor drivers, and much more. Having these controllers integrated is really handy because it avoids adding another device just to control or interface to a new peripheral. The real power comes when you need high bandwidth between the main CPU and these controllers, such as in LEDscape. See more examples in the PRU Cookbook.

It would be great to have a RISC-V-based PRU running on the FPGA fabric of BeagleV-Fire. Existing cores like VexRiscv, NeoRV32, and SweRV could be explored for this application.

Goal: RISC-V-based CPU on BeagleV-Fire FPGA fabric
Hardware Skills: Verilog, Verification, FPGA
Software Skills: RISC-V ISA, assembly, Linux
Possible Mentors: @jkridner, @vauban
Expected size of project: 175 hours
Rating: Medium
Upstream Repository: TBD
References: TBD


I’m interested… will be watching this thread.

Highly interested, watching it.

I would encourage the prospective student to consider implementing this unofficial RISC-V extension: GitHub - jnk0le/XTightlyCoupledIO: custom riscv extension to provide directly accessed peripheral registers



Very nice find, especially the research on the various approaches to similar problems, but I found it hard to extract a solid recommendation from it. Further, I think adding new instructions will make the project harder to implement by modifying an existing CPU.

That said, I’m in favor of this recommendation and suggest we engage the specification developer to understand the status of implementations as well as ratification. I just have some doubts on a few fronts.

With all the complexity of the multitude of instructions added, is this really something still suitable for a GSoC contributor?

Jason, I get your point. All of it might be too much effort for GSoC.

Implementing the new instructions could be a nice follow-up project for GSoC 2025 🙂 If you have a functioning core from GSoC 2024, I think it would be feasible to add the new instructions in one summer of coding.



The easiest solution would be alternative #1 (the most similar to the original PRUs).

Most of those instructions are just mirroring standard ones, microarchitecturally identical to alt#1 (RR - ALU - WB)

Regarding “status of implementations as well as ratification”:

Instructions still have temporary encodings and everything can change, though there is semantic versioning so HW can claim compliance against specific versions.

I’m definitely not attempting freeze/ratification solely by myself.


Awesome to have you here.

I’m not able to follow your very terse explanation. I’ve done some ASIC design, including state machines and extremely minimal ALUs, a very long time ago, but I wouldn’t know what “RR” or “WB” mean in this context. From your proposal, it felt like you’d rolled all the alternatives into a single proposal toward the end, but perhaps I just didn’t get how it was organized.

If the instructions mirror standard ones, why add new instructions and not simply utilize some specific register bits with the existing instructions? Sorry for being slow on this. I get that some registers might have some standard use, but this would be a bit of an application-specific processor in my view and thus grabbing bits out of R30/R31 and wiring them to the external world wouldn’t be too strange.

What am I missing here?

Well, I’d love to see a GSoC-scoped idea pan out where a student can make a proposal that will be able to implement and demonstrate some of your architectural ideas. I think your study of the various architectures alone demonstrates tremendous value in your work and I’d like to see BeagleBoard be involved in helping to advance it.


It’s basically the same concept as r30/r31 on PRUs, where the entire pipeline is reused but certain registers are rewired to somewhere else in the register-file read and writeback stages. (tio. instructions employ 2 additional bits for banking.)
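
To make the register rewiring concrete, here is a minimal behavioral sketch in Python (not RTL). It assumes the PRU-style convention where writes to r30 drive output pins and reads of r31 sample input pins. The class and helper names are made up for illustration and do not come from any existing core or from the XTightlyCoupledIO specification. Banking is left out, and writes to r31 are simply dropped, which is a simplification.

```python
# Behavioral model only. Register numbers follow the PRU r30/r31 convention;
# PinMappedRegFile and addi() are hypothetical names used for illustration.
OUT_REG = 30          # writes to this register drive the output pins
IN_REG = 31           # reads of this register sample the input pins
MASK32 = 0xFFFFFFFF


class PinMappedRegFile:
    def __init__(self):
        self.regs = [0] * 32
        self.out_pins = 0   # value currently driven on the output pins
        self.in_pins = 0    # value currently present on the input pins

    def read(self, idx):
        if idx == 0:
            return 0                      # x0 is hard-wired to zero in RISC-V
        if idx == IN_REG:
            return self.in_pins & MASK32  # read stage taps the pins, not storage
        return self.regs[idx]

    def write(self, idx, value):
        value &= MASK32
        if idx == 0 or idx == IN_REG:
            return                        # dropped in this sketch (assumption)
        if idx == OUT_REG:
            self.out_pins = value         # writeback stage also drives the pins
        self.regs[idx] = value


def addi(rf, rd, rs1, imm):
    """Ordinary ADDI: the ALU path is unchanged, only the regfile differs."""
    rf.write(rd, rf.read(rs1) + imm)


rf = PinMappedRegFile()
rf.in_pins = 0b0101
addi(rf, 5, IN_REG, 0)     # x5 <- input pins
addi(rf, OUT_REG, 5, 1)    # output pins <- x5 + 1
print(bin(rf.out_pins))    # 0b110
```

The point of the sketch is that a standard instruction moves data between pins and registers without any load or store. Decode and ALU stay untouched; only the register-file read and writeback paths change.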

Regarding “If the instructions mirror standard ones, why add new instructions and not simply utilize some specific register bits with the existing instructions?”:

It is possible, but the (OP, OP-IMM) encoding space is cramped and not all instructions would fit (e.g. tio.addi). There are also some new instructions, like the single-bit branching already present on PRUs.

“r30/r31” (up to a total of 16 regs reserved by RVE) should be enough for an “FPGA PRU”, but it’s not enough for a full-blown microcontroller/DSP.
If we consider only the GPIO/timers/ADCs/DACs (limited to the “critical” regs) present on a typical stm32g4/f28xxx, those alone should already take around 2 banks.
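
As a rough illustration of why the single-bit branching mentioned above is worth its own instruction: base RV32I needs two instructions to branch on one bit of a register. The Python sketch below models both variants and checks that they agree. The function names are hypothetical, the single-instruction form is modeled on the PRU’s QBBS (quick branch if bit set), and no tio. encoding is implied.

```python
MASK32 = 0xFFFFFFFF


def bit_branch_taken(value, bit):
    """Single-instruction form: branch is taken when the selected bit is 1."""
    return ((value >> bit) & 1) == 1


def base_isa_branch_taken(value, bit):
    """Two-instruction equivalent using only base RV32I operations."""
    if bit <= 10:
        # andi t0, rs, (1 << bit) ; bnez t0, target
        # andi takes a 12-bit signed immediate, so a one-bit mask
        # only fits for bits 0..10.
        t0 = value & (1 << bit)
        return t0 != 0
    # slli t0, rs, (31 - bit) ; bltz t0, target
    # Shift the tested bit into the sign position, then branch on the sign.
    t0 = (value << (31 - bit)) & MASK32
    return t0 >= 0x80000000


# Both forms should agree for every bit position.
for bit in range(32):
    for value in (0, 1 << bit, MASK32, MASK32 ^ (1 << bit)):
        assert bit_branch_taken(value, bit) == base_isa_branch_taken(value, bit)
```

In a tight I/O polling loop, saving one instruction and one temporary register per pin test directly shortens the reaction time.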

Hi folks, this project is quite intriguing.

Looking at the block diagram for the PRU, it seems it would be a subsystem residing in the FPGA fabric. However, I have a question: what level of contribution would be expected from a GSoC participant for this project? Would it involve writing or utilizing a RISC-V core to develop the complete PRU subsystem, or would the task entail creating a portion of the subsystem while leaving the remainder to be developed by subsequent contributors? I believe that developing the full subsystem might be too extensive for a single GSoC contributor to undertake.


I’d expect the project to start by getting an existing core running, then make incremental improvements. How much improvement to aim for needs to be scoped interactively between contributors and mentors.


Interesting idea to develop a RISC-V core in FPGA fabric. So, would it be like a soft IP, similar to MicroBlaze in Xilinx FPGAs?

Yes, that is what we mean by “in the fabric” in this case: built from Verilog RTL.

Got it! Thanks. I am interested in this project. Will watch this space for more.

For those interested, there is already a soft-core RISC-V processor IP for AMD adaptive SoCs and FPGAs: the AMD MicroBlaze™ V Processor (xilinx.com).


I want to join the project. What should I do?


I am Atharva Kashalkar, a student at VJTI Mumbai, India. I am really interested in this project and want to contribute to it.
I have already implemented a RISC-V IM core on a DE0-Nano FPGA that can run basic C programs compiled into binary instructions.
Given my prior experience, could you assign me a relevant task that would help with this project?
@jkridner @Vauban

Hello, this is Pranav Kapoor from IIITD, India. I have previously worked with FPGAs and RISC ISA assembly code. I am very interested in this project and would love to contribute.

Hello Everyone!

I am Mahela Ekanayake, a 3rd-year undergraduate from the Department of Computer Engineering, Faculty of Engineering, University of Peradeniya, Sri Lanka.

I have previously worked with Verilog, SystemVerilog, FPGAs, assembly language, and CPU and computer architecture.

I am new to the RISC-V architecture, but I have previously worked with a simple architecture close to MIPS.

I am eager to contribute to this work and get to know each other!

For everyone posting introductions, please try to dive a bit deeper into the topic and into how to engage the BeagleBoard community and GSoC mentors. Check out the site editing section on https://gsoc.beagleboard.org and browse the projects on https://openbeagle.org. I believe some folks have already started building some projects that include RISC-V CPU cores.

Specifically, check out the template setup at BeagleV-Fire / Gateware · GitLab and the documentation in progress under https://docs.beagleboard.io. Try to fork the gateware and start making some custom Verilog additions to it.

Then, get a proposal started based on what you learn about the complexity of the task.