Hey all
Follow project on:
https://hackaday.io/project/5837-pruss-support-for-newer-kernels
pruss_remoteproc module code modifications:
Prevent pruss_remoteproc module from loading during boot. ( Since 3.14 doesn’t allow device trees to be loaded after boot, probing the driver when the device is instantiated should be avoided. )
Prevent firmware image from loading during probe time.
Expose sysfs file to write f/w name from userspace.
Set up PRU code generation environment ( SDK, PRU-addon pkg, CCS ) and compile firmware. ( TI pruss_remoteproc driver requires resource table with code )
Modify code to load firmware and boot PRU when required by user.
Explore mailbox framework for upcoming work. Mentor discussion revealed that the PRU lacks the code space to implement this solution. Will work towards developing a custom lightweight IPC model inspired by existing frameworks.
Week 2
Developments:
- Exposed appropriate sysfs attributes for data length, offset, memory type and data buffer to userspace.
- prussdrv-like functions for reading and writing to DRAM0, DRAM1 and SHR-RAM work now, allowing a maximum of 4kB of data to be written in one call.
- Tested writing 2D integer arrays to PRU mem and subsequent read. Example code on github.
- Dynamically boot/shutdown PRU from userspace at any time using driver-device bind/unbind function. Allows for rmmod pruss_remoteproc without --force option.
- Able to provide custom sysevt->channel and channel->host mapping to 3.14 pruss_remoteproc driver.
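A minimal sketch of the kind of range check such read/write routines need, in userspace C. The names (`pru_mem`, `pru_xfer_ok`, `MAX_XFER_BYTES`) are hypothetical and not taken from the project's library; the region sizes assume the AM335x PRU-ICSS (8 kB per data RAM, 12 kB shared RAM), and the 4 kB limit is the per-call maximum mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory selector; not the project's actual API. */
enum pru_mem { PRU_DRAM0, PRU_DRAM1, PRU_SHRAM };

/* Per-call limit from the report: at most 4 kB per read/write. */
#define MAX_XFER_BYTES 4096u

/* Region sizes, assuming the AM335x PRU-ICSS memory map. */
size_t pru_mem_size(enum pru_mem mem)
{
    switch (mem) {
    case PRU_DRAM0: return 8 * 1024;
    case PRU_DRAM1: return 8 * 1024;
    case PRU_SHRAM: return 12 * 1024;
    }
    return 0; /* unknown region: nothing is in range */
}

/* Returns 1 if [offset, offset + len) lies inside the region and
 * len respects the per-call limit, 0 otherwise. */
int pru_xfer_ok(enum pru_mem mem, size_t offset, size_t len)
{
    size_t size = pru_mem_size(mem);

    if (len == 0 || len > MAX_XFER_BYTES)
        return 0;
    if (offset > size)
        return 0;
    return len <= size - offset; /* written this way to avoid overflow */
}
```

A caller would run this check before touching the sysfs attributes, so an out-of-range request fails in userspace instead of reaching the driver.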
Issues faced:
- Abandoned Python for the userspace library. Realized Python is not feasible for writing code which interacts with hardware that has extremely limited space. C saves the day.
- Had to decide whether to send data via sysfs or send a userspace pointer (to the data buffer) down to the driver. The sources I read advised against sending userspace pointers down to the kernel. Hence the former was chosen.
- A persistent segmentation fault which took significant time to debug. It was due to dereferencing a null pointer.
- Writing to sysfs was triggering both show/store functions. Issue resolved through mentor discussion. [ use open(), write() etc. instead of fopen(), fwrite(). ]
Next Week:
- Allow INTC mapping and configuration from userspace.
- Ability to dynamically allocate larger circular buffers in DDR mem.
- Extend userspace library to allow DDR read/write.
Code on github
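The sysfs fix noted under this week's issues (unbuffered open()/write() instead of fopen()/fwrite()) can be sketched as below. This is only an illustration under assumptions: `attr_write` and `attr_read` are hypothetical helper names, not the library's functions, and the attribute path is whatever the driver exposes.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a string to a sysfs-style attribute with a single write() call.
 * Returns 0 on success, -1 on any failure. */
int attr_write(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    size_t len = strlen(value);
    int fd = open(path, O_WRONLY);   /* write-only: no read is issued */
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, value, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Read an attribute into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on failure. */
ssize_t attr_read(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL && cap > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, cap - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

Opening the attribute write-only and issuing one write() keeps each store a single, explicit system call, which makes it easier to see which driver callback a given userspace action triggers.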
Was this code meant to be managed separate from the Linux tree? That doesn’t quite make sense to me for the kernel patches. Is there a strategy I should read somewhere?
Nouveau does this too. Since it's set up by default as a module, it's
really easy to build this quickly and test small changes on target.
When it's working, it should be easy to convert into a patch set with
individual changes...
Regards,
Week 3
Developments:
Issues:
Next Week:
More detailed report will be up on hackaday with links to very useful resources and critical things learnt along the way.
Week 4
Issues faced:
Next week:
Week 5
Developments:
1. Fixed a major bug in the driver which had gone unnoticed. Word offset given to read/write routines was buggy.
2. Improved user library. Previously PRU cores booted when the remoteproc driver was probed. Now the user has pruss_boot("fw_path", PRU0/PRU1) and pruss_shutdown(PRU0/PRU1) routines to independently handle power to each core.
3. Able to register virtual device with vrings using remoteproc.
4. Better example application in which ARM writes 2 values to PRU memory → interrupts PRU1 → PRU1 adds the 2 values and writes the result to a new location → PRU1 sends an interrupt to ARM → ARM validates the result.
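The report does not say what the word-offset bug was, so the sketch below only shows the general conversion from a 32-bit word offset to a byte address, together with a host-only mock of the add example. Everything here is an assumption for illustration: the memory is a plain array standing in for one 8 kB PRU data RAM, `mock_pru_add` stands in for the firmware, and the interrupts are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOCK_DRAM_BYTES 8192u  /* stand-in for one 8 kB PRU data RAM */
#define WORD_BYTES      4u     /* offsets are counted in 32-bit words */

/* Copy nwords words into mock memory at a word offset.
 * Returns 0 on success, -1 if the range falls outside the region. */
int mock_write_words(uint8_t *mem, uint32_t word_off,
                     const uint32_t *src, uint32_t nwords)
{
    uint32_t total_words = MOCK_DRAM_BYTES / WORD_BYTES;

    if (word_off > total_words || nwords > total_words - word_off)
        return -1;
    /* byte address = word offset * 4 */
    memcpy(mem + (size_t)word_off * WORD_BYTES, src,
           (size_t)nwords * WORD_BYTES);
    return 0;
}

/* Copy nwords words out of mock memory starting at a word offset. */
int mock_read_words(const uint8_t *mem, uint32_t word_off,
                    uint32_t *dst, uint32_t nwords)
{
    uint32_t total_words = MOCK_DRAM_BYTES / WORD_BYTES;

    if (word_off > total_words || nwords > total_words - word_off)
        return -1;
    memcpy(dst, mem + (size_t)word_off * WORD_BYTES,
           (size_t)nwords * WORD_BYTES);
    return 0;
}

/* Stand-in for the PRU firmware: add words 0 and 1, store at word 2. */
void mock_pru_add(uint8_t *mem)
{
    uint32_t in[2], sum;

    mock_read_words(mem, 0, in, 2);
    sum = in[0] + in[1];
    mock_write_words(mem, 2, &sum, 1);
}
```

On the ARM side the flow is then: write two values at word offset 0, trigger the PRU (here a direct call to `mock_pru_add`), read word 2 back and compare it with the expected sum.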
Issues faced:
Next week:
Week 6
Developments:
Issues faced:
Next Week:
Get back to vrings.
Week 7
Developments:
Issues:
Next Week
1. Try to finish vrings and begin writing example code.
Week 8
Developments:
Issues:
The single most troubling issue was finding the location of the resource table within PRU DRAM.
Solution: the .resource_table section specified in the linker command (am33xx.cmd) file is written to a fixed address.
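Once the table's location is known, its header can be walked as in the sketch below. The layout (version, entry count, two reserved words, then an array of byte offsets to entries that each start with a 32-bit type) follows the Linux remoteproc `struct resource_table` and `struct fw_rsc_hdr`. The function name `rsc_count_type` is hypothetical, the fixed address itself is not shown because the report does not give it, and the code assumes a little-endian host reading a copy of the section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Resource entry types as defined by Linux remoteproc. */
enum { RSC_CARVEOUT = 0, RSC_DEVMEM = 1, RSC_TRACE = 2, RSC_VDEV = 3 };

/* Header: ver, num, reserved[2], each a 32-bit word. */
#define RSC_TABLE_HDR_BYTES 16u

/* Alignment-safe 32-bit read (host byte order, assumed little-endian). */
static uint32_t rd32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Count the entries of a given type in a resource table image.
 * Returns -1 if the table is truncated or malformed. */
int rsc_count_type(const uint8_t *tbl, size_t len, uint32_t type)
{
    if (len < RSC_TABLE_HDR_BYTES)
        return -1;

    uint32_t ver = rd32(tbl);
    uint32_t num = rd32(tbl + 4);

    if (ver != 1)                                  /* only version 1 exists */
        return -1;
    if (num > (len - RSC_TABLE_HDR_BYTES) / 4)     /* offset array must fit */
        return -1;

    int count = 0;
    for (uint32_t i = 0; i < num; i++) {
        uint32_t off = rd32(tbl + RSC_TABLE_HDR_BYTES + 4 * (size_t)i);
        if (off > len - 4)                         /* entry header must fit */
            return -1;
        if (rd32(tbl + off) == type)
            count++;
    }
    return count;
}
```

Counting `RSC_VDEV` entries this way is a quick sanity check that the firmware image actually publishes the vdev the driver is expected to register.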
Next Week:
Expose vring to userland as char device and test performance.
Week 9
Developments:
I am running slightly behind schedule. Instead of exposing vrings to userspace, I worked on
allowing custom callbacks for virtqueues, which are executed after the kick. These callbacks
are necessary to consume pending messages. Pushed code to GitHub.
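The callback-after-kick idea can be illustrated with a small userspace mock. This is not the driver's code: `mock_vq` and its helpers are hypothetical, and the only thing borrowed from the kernel is the shape of the callback, which receives the queue it was registered on.

```c
#include <assert.h>
#include <stddef.h>

struct mock_vq;
typedef void (*mock_vq_cb)(struct mock_vq *vq);

/* Hypothetical stand-in for a virtqueue. */
struct mock_vq {
    int pending;          /* messages queued but not yet consumed */
    int consumed;         /* messages handled so far */
    mock_vq_cb callback;  /* user-supplied handler, may be NULL */
};

/* Register the handler that should run after each kick. */
void mock_vq_set_callback(struct mock_vq *vq, mock_vq_cb cb)
{
    assert(vq != NULL);
    vq->callback = cb;
}

/* Producer side: queue one message. */
void mock_vq_add(struct mock_vq *vq)
{
    vq->pending++;
}

/* Kick: notify the consumer, which runs the callback if one is set. */
void mock_vq_kick(struct mock_vq *vq)
{
    if (vq->callback)
        vq->callback(vq);
}

/* Example callback: consume every pending message. */
void drain_all(struct mock_vq *vq)
{
    vq->consumed += vq->pending;
    vq->pending = 0;
}
```

Without a registered callback a kick leaves the messages pending, which is the situation the custom callbacks are meant to avoid.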
Issues:
Lately I have been able to spend a little less time than I should have. Need to catch up.
Next Week:
Finish previous week's objectives as fast as possible and get back to working on 4.1 patches.
Week 10
Developments:
Earlier I was able to write to all 512 (or fewer) buffers from the PRU, but they were not being added back to the vring (as
free buffers) after the data was consumed by the host processor. This last roadblock (hopefully) has been solved, resulting in
successful transfers from PRU to ARM, even continuous streaming of data using the vring.
Issues:
One major optimization remains, i.e. I get different transfer rates depending on how frequently one kicks (i.e. interrupts)
the ARM. Kicking the ARM after writing each 512-byte buffer stalls the ARM because of too many
interrupts within too short a time. Kicking only after all 512 buffers have been filled by the PRU forces the PRU to wait for the ARM to consume
buffers before it can start using them again. So, in order to attain maximum throughput, an optimum number of filled buffers after which
the ARM is kicked needs to be found. The user also has the freedom to decide this value (in the PRU firmware) depending upon their requirements.
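The trade-off can be expressed as a small counter that decides when to kick. This is a sketch under assumptions, not the firmware's code: `kick_batcher` and `kick_batcher_on_fill` are hypothetical names, and the batch size is the tunable value discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical producer-side state for batching kicks. */
struct kick_batcher {
    uint32_t batch;   /* buffers to fill before kicking the ARM */
    uint32_t filled;  /* buffers filled since the last kick */
    uint32_t kicks;   /* total kicks issued so far */
};

/* Call once after each buffer is filled.
 * Returns 1 if the ARM should be kicked now, 0 otherwise. */
int kick_batcher_on_fill(struct kick_batcher *kb)
{
    assert(kb->batch > 0);

    if (++kb->filled < kb->batch)
        return 0;
    kb->filled = 0;
    kb->kicks++;
    return 1;
}
```

With 512 buffers, a batch of 1 gives 512 interrupts per pass over the ring, a batch of 512 gives a single interrupt but leaves the PRU waiting, and a batch of 16 gives 32 interrupts; the best value sits somewhere between the two extremes and has to be measured.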
Next Week:
Patches for 4.1; expose a misc char device to stream data to the user.
Week 11
Developments:
Issue:
Blocking read on the misc device is still facing synchronization issues. This involves correcting the flow of pending buffer interrupts.
Next Week:
Wrapping things up, example code and documentation.
Week 12
Developments:
Solved major synchronization issues related to streaming data from PRU to ARM, which had been bugging me for a long time. copy_to_user can sleep
and so cannot be part of an irq handler. Hunyue’s suggestion to move to threaded irq handlers solved this issue. Now able to stream data to ARM using vrings without
buffers being overwritten.
Used kfifo to create a buffer for incoming data, i.e. the fast vrings dump data to a kfifo, which passes it on to the user through kfifo_to_user. The kfifo is dynamically allocated
at twice the size of the vring specified by the user in the resource table.
Code commenting, library docs and a blog post giving insights into the vring implementation (not yet complete).
The post can be found on the project wiki and hackaday page.
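The buffering scheme (a FIFO sized at twice the vring, filled by the fast side and drained by the reader) can be sketched in userspace as below. This is an analogue for illustration only, not the kernel kfifo API: `byte_fifo` and its functions are hypothetical, and a real driver would use kfifo and kfifo_to_user as described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the driver's kfifo. */
struct byte_fifo {
    uint8_t *data;
    size_t size;   /* capacity in bytes */
    size_t tail;   /* index of the oldest byte */
    size_t count;  /* bytes currently stored */
};

/* Allocate a FIFO twice the size of the vring, as in the report. */
int byte_fifo_alloc(struct byte_fifo *f, size_t vring_bytes)
{
    if (vring_bytes == 0)
        return -1;
    f->size = 2 * vring_bytes;
    f->tail = 0;
    f->count = 0;
    f->data = malloc(f->size);
    return f->data ? 0 : -1;
}

/* Store as many bytes as fit; returns the number stored. */
size_t byte_fifo_in(struct byte_fifo *f, const uint8_t *src, size_t len)
{
    size_t space = f->size - f->count;

    if (len > space)
        len = space;
    for (size_t i = 0; i < len; i++)
        f->data[(f->tail + f->count + i) % f->size] = src[i];
    f->count += len;
    return len;
}

/* Remove up to len bytes in arrival order; returns the number copied. */
size_t byte_fifo_out(struct byte_fifo *f, uint8_t *dst, size_t len)
{
    if (len > f->count)
        len = f->count;
    for (size_t i = 0; i < len; i++)
        dst[i] = f->data[(f->tail + i) % f->size];
    f->tail = (f->tail + len) % f->size;
    f->count -= len;
    return len;
}

void byte_fifo_free(struct byte_fifo *f)
{
    free(f->data);
    f->data = NULL;
}
```

Sizing the FIFO at twice the vring gives the reader a full ring's worth of slack before incoming data has nowhere to go; here `byte_fifo_in` simply stores fewer bytes when the FIFO is full.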
Next Week:
Complete the remaining tasks of code commenting and the blog post.
Final Report
This project aimed at making it easy for developers to write applications for the PRU using newer Linux kernels. I believe the project is very near completion
and is currently able to provide all the functionality it proposed. The course was altered a few times during the project execution phase, since better-suited alternatives
to what was initially proposed were found. This was made possible with the help of mentor discussion and the insights I gained as I worked on the project.
Accomplishments:
The initial phases involved understanding the workings of TI’s pruss_remoteproc driver, which was significantly different from the pru_rproc driver in use up to the 3.8 kernel.
This also included working on some limitations of the new driver and improving it by borrowing functionality provided by the pru_rproc driver.
In order to provide elementary data exchange between PRU and ARM, suitable routines were added to the driver. To lower the barrier for the end user, a userspace library was written
which makes use of sysfs attributes exposed by the driver. The demo app in the GitHub repo, along with the library API, can help users get started.
Configuring the PRU interrupt controller, followed by sending and waiting for interrupts to/from the PRU, is possible using the user library and resource table.
The driver registers the vdev and vrings published in the resource table. The vring library for the PRU is used to set up vrings in firmware.
Synchronizing kicks from the PRU and the subsequent transfer of buffers to the user was the most challenging part. Implemented control flow on vrings
so that no buffers are dropped and continuous data streaming from the PRU is achieved. Makes use of a threaded irq handler.
I would like to thank Google and Beagleboard.org for giving me a wonderful opportunity to learn an immeasurable amount during this summer. Thanks to
my mentors Pantelis Antoniou, Hunyue Yau and Abhishek Kumar for making this project possible. I would also like to thank Alexander Hiam, Deepak Karki
and Jason Kridner for their support at critical junctures.
Will definitely keep contributing to this as well as other open source projects.
Shubhangi Gupta
Forgot to post the link to the blog post. The article tries to explain how vrings work in general and how they are implemented in the PRU context.
Understanding vrings for pru
Thanks