I wanted to follow up on my earlier post Interest in GSOC 2021 - #10 by lpillsbury
about TensorFlow Lite on BBAI. It seems both that there is general interest in the user community and that this would help get around some of the other challenges @Jakub_Duchniewicz discovered when looking into YOLO on BBAI.
I’m envisioning a project that is partly getting TensorFlow Lite working smoothly on the board and partly building documentation and an example zoo. @jkridner mentioned that @RobertCNelson has more info/code that needs to be put together. I’m wondering if anyone can share more about what code already exists and whether this project scope would be feasible? The BBAI has ARM Cortex-M4 processors that are supposed to work with TensorFlow Lite, though I see it also has ARM® Cortex®-A15 RISC CPUs. Any pointers on how these work together?
@cwicks you asked me about this in the chat yesterday. Chat seems to be down now, but I’d love your feedback and any suggestions for additional potential mentors.
@Jakub_Duchniewicz and @yoder I’d also appreciate any suggestions you have if you have time.
@lpillsbury I was leaving you some comments in IRC today (along with another mentor) but apparently the matrix/IRC bridge isn’t working so well. Mainly we were asking if you can add some detail to the milestones, plus a few questions about experience:
<lorforlinux[m]> How do you think the Linux Kernel will communicate with the ARM Cortex M4 processor of BBAI?
<lorforlinux[m]> meaning the way we will transmit and receive data and commands between processors!
<lorforlinux[m]> Any specific MCU board with ARM Cortex M4 core you know that can run TensorFlow Lite models?
<lorforlinux[m]> Which Arduino board are you talking about?
and:
<lorforlinux[m]> Please try to elaborate bit more on the Milestones you have set for the project.
<lorforlinux[m]> `Milestone #2, Working version of TensorFlow Lite on a BeagleBone AI `
<lorforlinux[m]> Doesn't tell us how you are going to achieve that :(
I haven’t actually touched a BBAI, so I don’t really know the M4 answer yet either (and I don’t think the Udoo version of the answer is helpful in this case).
(otherwise it looks pretty good, so don’t stop now, keep it coming!)
@jkridner thanks for the TI Sitara link; it was very helpful and helped me refine my proposal. @lorforlinux and @Stephen_Arnold thank you so much for these comments; some of them I already know the answer to and will add to the proposal, and some I need to go track down. One very important outstanding issue: since I proposed this project myself, I don’t have a specific mentor in mind. @jkridner suggested that @RobertCNelson is the right person to talk to, and I emailed him yesterday but haven’t heard back yet. I wouldn’t want to assume he has time to work on this with me without discussing it first… what should I put in the Mentors field in the meantime? I can note that I’ve gotten feedback from a variety of sources and would continue to do so. Are either of you interested, or do you have suggestions for additional people to ping? Thanks, Leah
Hi @lpillsbury, sorry, on weekends i’m usually in the brewery working on my 2nd hobby, so i’m usually behind on emails by Monday… I’m not really an expert in TensorFlow, mostly an expert in taking apart TI’s SDK and getting the stuff we need built on Debian. I will help as much as i can.
Yeah, @lorforlinux is Deepak. I saw his name in the pictures on your post, so I thought those comments were from him. Anyway, glad to know who is who, and nice to meet you all.
Those were his comments I pasted (mainly I was chiming in on the milestone comment). And in this case, Arduino => Cortex-M4, which is available on the BBAI, except you get the bonus of running on 2 Cortex-M4s in parallel.
I don’t know how to answer his first question: how do you think the Linux kernel will communicate with the ARM Cortex-M4 processor of BBAI? This is where I need to learn more/ask for guidance, because there are several processors in the AM5729 (AM5729 data sheet, product information and support | TI.com) and I don’t know how work is delegated between them, or whether this is something that needs to be explicitly specified in the build.
TensorFlow Lite documentation says: TensorFlow Lite for Microcontrollers is written in C++ 11 and requires a 32-bit platform. It has been tested extensively with many processors based on the Arm Cortex-M Series architecture, and has been ported to other architectures including ESP32. The framework is available as an Arduino library. It can also generate projects for development environments such as Mbed. It is open source and can be included in any C++ 11 project.
Any thoughts on this or should I just state it as an outstanding question for pre-project investigation?
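For concreteness, here’s roughly what that C++ 11 entry point looks like based on those docs (my untested sketch; `g_model`, the arena size, and the single float in/out are all placeholders, and the headers/API may shift a bit between TF versions):

```cpp
// Minimal TFLite Micro inference sketch (untested on BBAI).
// g_model stands in for a model flatbuffer compiled to a C array.
#include <cstdint>

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model[];  // placeholder model data

constexpr int kArenaSize = 10 * 1024;  // working memory for tensors (a guess)
static uint8_t tensor_arena[kArenaSize];

float RunOnce(float x) {
  static tflite::MicroErrorReporter error_reporter;
  const tflite::Model* model = tflite::GetModel(g_model);
  static tflite::AllOpsResolver resolver;  // registers all builtin ops
  static tflite::MicroInterpreter interpreter(
      model, resolver, tensor_arena, kArenaSize, &error_reporter);

  interpreter.AllocateTensors();            // carve tensors out of the arena
  interpreter.input(0)->data.f[0] = x;      // feed one float in
  interpreter.Invoke();                     // run the model
  return interpreter.output(0)->data.f[0];  // read one float out
}
```

The encouraging part is that nothing in there assumes an OS, which is presumably why it ports to bare-metal Cortex-M cores.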
Hmm, I think we should probably both look for that nugget, but I don’t think your proposal necessarily requires a detailed answer right now. Since the Arduino IDE can load that tiny board, we can assume for now that it works with one of the standard flashing tools under the hood (of the IDE). Likewise the TI SDK would need to document at least the usage of their flash/load method. On the (armv7) udoo boards with cortex-M processors they are connected directly via UART.
This is all I see so far in SDK section 3.15, so it looks like all the coprocessors can load at boot just like the PRU firmware on a BBB:
3.15.1.4.4.1. Firmware
OpenCL firmware includes pre-canned DSP TIDL Lib (with hard-coded kernels) and EVE TIDL Lib following Custom Accelerator model. OpenCL firmware is downloaded to DSP and M4/EVE immediately after Linux boot:
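If those cores go through the standard Linux remoteproc framework (my assumption here, based on the PRU comparison above), then swapping in our own M4 firmware from userspace would look roughly like this; the remoteproc index and firmware filename are placeholders:

```cpp
// Stop a remote core, point it at new firmware, and restart it via the
// remoteproc sysfs interface (run as root). Which remoteprocN maps to
// which core should be visible in /sys/class/remoteproc/*/name on the board.
#include <fstream>
#include <string>

static void WriteSysfs(const std::string& path, const std::string& value) {
  std::ofstream f(path);
  f << value;  // the kernel acts on the value as it is written
}

int main() {
  const std::string rp = "/sys/class/remoteproc/remoteproc0/";  // placeholder
  WriteSysfs(rp + "state", "stop");                // halt the core
  WriteSysfs(rp + "firmware", "my-tflm-ipu.elf");  // file under /lib/firmware
  WriteSysfs(rp + "state", "start");               // reload and boot it
  return 0;
}
```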
Also, a question specifically for this forum that we are also discussing in Slack with @nerdboy, @jkridner, @jduchniewicz and others; posting here to involve different people:
I am still trying to deploy a TensorFlow Lite model on the BBAI’s IPUs. Yesterday @nerdboy and I realized that we should be able to set this up as long as we can get debug messages out of the IPUs. Given that they don’t have their own stdin/stdout or UART setup, does anyone have experience/ideas with doing this? Maybe we can get debug output through rproc?
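One idea (just an untested sketch): if the IPU firmware declares a remoteproc trace buffer in its resource table, the way TI’s stock firmwares appear to, we could point TFLite Micro’s DebugLog() at that buffer and then read it from Linux over debugfs, something like:

```cpp
// Dump an IPU's remoteproc trace buffer from Linux (debugfs must be
// mounted). The remoteproc index and trace name are assumptions; check
// /sys/kernel/debug/remoteproc/ on the actual board.
#include <fstream>
#include <iostream>

int main() {
  std::ifstream trace("/sys/kernel/debug/remoteproc/remoteproc0/trace0");
  if (!trace) {
    std::cerr << "no trace buffer; is the IPU firmware loaded?\n";
    return 1;
  }
  std::cout << trace.rdbuf();  // whatever log text the firmware has written
  return 0;
}
```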
here is what we have from TFLite Micro as a starting point:
I was, and still am, interested in whether TF.lite would work on the AI. I have come across Bazel issues before on the BBB while trying to build TF.lite.
…
Now that there are Docker-based TF.lite builds, I was thinking this may currently work. Are you all still working on making this work?
Seth
P.S. I also saw their cross-compilation documentation for NEON and armhf, and the porting guide. Anyway, if you guys and gals are still around, please do try to reply.
Just an update: I found this in TI’s git repo online.