Weekly Progress Report: Machine Learning on Bela

This thread will serve as a place to follow along with the Running Machine Learning Models on Bela GSoC 2022 project.

Intro video: Ezra Pierce - Deep Learning for Bela - Workshop on Embedded AI for NIME - NIME 2022 - YouTube


Summary of Weeks 1-3

This project aims to develop performance analysis tooling for Bela, which will eventually include profiling and benchmarking. For the first few weeks the focus has been on setting up a developer workflow for translating high-level models (PyTorch, TFLite) to the Bela platform. The initial investigations have involved the Intermediate Representation Execution Environment (IREE) from Google (GitHub - iree-org/iree). The IREE project uses MLIR (https://mlir.llvm.org/) to build an ML compiler and runtime environment for heterogeneous hardware. This tool was chosen for evaluation due to its flexible nature (it can support both the BBB and newer architectures like the BBAI64 and its accelerators) and its support for various high-level frontends. It also supports benchmarking using Google Benchmark and profiling using the Tracy profiler. The IREE runtime will be compared to other approaches attempted so far on the Bela, namely the DeepLearningForBela project (GitHub - rodrigodzf/DeepLearningForBela).

The first few weeks were used to set up a toolchain for building ML models on Bela using libtorch, TFLite, and IREE, with the goal of comparing the different options.

  • Set up the Bela cross-compilation docker container
  • Compiled libtorch & TFLite for Bela
  • Recreated a simple benchmark of an MLP model using GitHub - rodrigodzf/DeepLearningForBela
  • Started investigating ML compilers and runtimes
  • Background reading to get up to speed on MLIR/LLVM and compiler design
  • Background reading on modern build systems (CMake)
  • Built the IREE host compiler, to be used in the docker cross-compilation container
  • Presented the project at the NIME Embedded AI workshop (see video link above)
  • Cross-compiled the IREE runtime components; this required writing a custom CMake toolchain and configuring the IREE target HAL backends to work with the BBB/Bela
  • Combined approaches from the IREE example project setups for Cortex-M and Android to find options that work for the BBB/Bela architecture
  • Background research into IREE architecture, compiler options, and build options
  • Set up a working repo; the structure is still WIP and will be updated with an IREE example soon: GitHub - ezrapierce000/bela-ai-toolbench

Current task: running iree-benchmark-module with a test MLP model on Bela. Planning to decide next Wednesday the 13th whether or not to pursue IREE support.


Week 4 updates

Current task: writing an IREEFrontend class to be used for benchmarking in the DeepLearningForBela project.

This task will be done by Wednesday, July 13th to allow benchmarks to be compared between IREE and the other approaches.

Week 5 updates

  • Successfully built iree-benchmark-module for Bela and benchmarked a simple MLP model, which measured 26.2 ms over 1000 iterations. For comparison, the lowest latency seen so far using DeepLearningForBela for the same model was 20.59 ms using TFLite & XNNPACK, while libtorch with mobile optimizations enabled was 59.27 ms. These results look promising, so IREE will be pursued further by trying out different model architectures this week.

  • Started a new git repo for an IREE-on-Bela docker container, which will include cross-compilation tools, precompiled IREE tools, and a build system for cross-compiling IREE projects: GitHub - ezrapierce000/bela-iree-container

TODO this week, in order of priority:

  • Update the README with IREE-specific instructions: how to compile, benchmark, and profile models
  • Bela IREE runtime/backend proof of concept
  • Investigate the Tracy profiler further: try sampling instead of instrumented profiling, plus any Bela-specific configuration
  • Try out and document PyTorch import to IREE
  • Set up a GitHub Actions/CI pipeline to DockerHub for the docker image
  • Investigate the new microkernel implementations (Lower VMVX linalg ops to microkernels. by stellaraccident · Pull Request #9662 · iree-org/iree · GitHub) and compare benchmarks
  • Improve issue management of the project on GitHub
  • Set up the BBAI64

Week 6 Report

Completed

  • Documented docker container setup and IREE benchmarking instructions in the README
  • Set up IREE builds in the docker image; all IREE device runtime components and sample projects can now be cross-compiled for Bela. Building host projects is hitting a CMake error, so prebuilt releases from the IREE GitHub are pulled in instead for now
  • Added some Python package installs to the docker image for importing models
  • Added installs for the Tracy profiler dependencies, plus flags for building the instrumented runtime and samples. Models running on Bela can now be profiled in iree-benchmark-module with instrumented Tracy, though running the instrumented binary is causing the Bela to crash intermittently; will try sampling profiling this week
  • Included a simple script in the docker container to verify the image can run IREE benchmarks
  • Started to write an IREE runtime, for now just building the demo from the IREE repo and compiling it against the Bela library. The docker image needs updating this week to make sure it includes or builds libbelafull.so. The IREE runtime documentation was helpful for understanding how the runtime is designed

Blockers

  • Need to investigate why simple PyTorch models fail to import via torch-mlir; will need some background reading on the Torch dialect
  • Bela SD card issues; switched to using a Bela Mini for now

TODO

  • Finish the basic runtime and create a full demo of benchmarking, profiling, and running an IREE model
  • Reorganize how the docker image is built so that it can be added to GitHub Actions
  • Add exporting of vmfb files from the model zoo and document how to use them
  • Clean up the CMake configuration so that it follows best practices

Week 7 Report

Took a couple of days off, so not many updates this week.

Completed

  • Continued work on the IREE runtime; I don't yet understand the runtime's scoping, so I'm only able to run inferences in the setup() function, not in render()
  • Built torch-mlir from source, to be added to the docker container

Blockers

  • Unable to access IREE HAL buffer views from render(); need to better understand how the runtime scopes variables and manages memory. A sketch of one possible fix is below.
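
For reference, here is a minimal sketch of keeping the IREE runtime state at file scope so that the session created in setup() is still alive when render() runs. This is only an assumption of how the pieces fit together, modelled on the IREE runtime demo; function names and signatures vary between IREE releases, and the module path, driver name, and entry function name here are all hypothetical.

```cpp
// Sketch only: file-scope IREE objects outlive setup() and stay
// visible from render(), unlike stack variables local to setup().
#include <Bela.h>
#include <iree/runtime/api.h>

static iree_runtime_instance_t* gInstance = nullptr;
static iree_hal_device_t* gDevice = nullptr;
static iree_runtime_session_t* gSession = nullptr;

bool setup(BelaContext* context, void* userData)
{
	iree_runtime_instance_options_t opts;
	iree_runtime_instance_options_initialize(&opts);
	iree_runtime_instance_options_use_all_available_drivers(&opts);
	if(!iree_status_is_ok(iree_runtime_instance_create(
			&opts, iree_allocator_system(), &gInstance)))
		return false;
	// Driver name depends on how the runtime was built (assumption).
	if(!iree_status_is_ok(iree_runtime_instance_try_create_default_device(
			gInstance, iree_make_cstring_view("local-task"), &gDevice)))
		return false;
	iree_runtime_session_options_t sessionOpts;
	iree_runtime_session_options_initialize(&sessionOpts);
	if(!iree_status_is_ok(iree_runtime_session_create_with_device(
			gInstance, &sessionOpts, gDevice,
			iree_runtime_instance_host_allocator(gInstance), &gSession)))
		return false;
	// Hypothetical path to a compiled model.
	return iree_status_is_ok(
			iree_runtime_session_append_bytecode_module_from_file(
					gSession, "model.vmfb"));
}

void render(BelaContext* context, void* userData)
{
	// gSession is file-scoped, so it is still valid here. A synchronous
	// invocation in render() is not real-time safe; a real project would
	// hand this off to a lower-priority auxiliary task.
	iree_runtime_call_t call;
	if(!iree_status_is_ok(iree_runtime_call_initialize_by_name(
			gSession, iree_make_cstring_view("module.forward"), &call)))
		return;
	// ... create and push input buffer views here ...
	iree_runtime_call_invoke(&call, /*flags=*/0);
	iree_runtime_call_deinitialize(&call);
}

void cleanup(BelaContext* context, void* userData)
{
	iree_runtime_session_release(gSession);
	iree_hal_device_release(gDevice);
	iree_runtime_instance_release(gInstance);
}
```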

TODO

  • Finish the basic runtime and create a full demo of benchmarking, profiling, and running an IREE model
  • Reorganize the container image so that it can be built without a Bela connected, and so that the model zoo is passed through from the host instead of being copied over at build time
  • Add exporting of vmfb files from the model zoo and document how to use them
  • Clean up the CMake configuration so that it follows best practices

Week 8 Report

Completed

  • Completed a prototype of an IREE runtime for Bela; IREE ML models can now run in real time (at a reduced sample rate). See the sketch after this list.
  • Added VMFB exporting to the embedded-model-zoo repo; will be adding a PR soon
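
To illustrate the reduced-rate approach, here is a rough sketch (not the project's actual code) of buffering audio into fixed-size hops and deferring inference to a lower-priority Bela auxiliary task, so that render() never blocks on the model. kHopSize and the task priority are illustrative values, and runInference() stands in for a call into the IREE session from the earlier sketch.

```cpp
// Sketch: run inference once per kHopSize samples on an auxiliary task,
// effectively running the model at a reduced (control) rate.
#include <Bela.h>
#include <vector>

constexpr unsigned int kHopSize = 1024; // hypothetical model input length
static std::vector<float> gInputBuffer(kHopSize);
static unsigned int gWritePos = 0;
static AuxiliaryTask gInferenceTask;

void runInference(void*)
{
	// Invoke the IREE session on gInputBuffer here. This runs below
	// audio priority, so a slow model adds latency rather than
	// causing audio dropouts.
}

bool setup(BelaContext* context, void* userData)
{
	gInferenceTask = Bela_createAuxiliaryTask(runInference, 50, "iree-inference");
	return gInferenceTask != nullptr;
}

void render(BelaContext* context, void* userData)
{
	for(unsigned int n = 0; n < context->audioFrames; ++n) {
		gInputBuffer[gWritePos++] = audioRead(context, n, 0);
		if(gWritePos == kHopSize) {
			gWritePos = 0;
			Bela_scheduleAuxiliaryTask(gInferenceTask);
		}
	}
}

void cleanup(BelaContext* context, void* userData) {}
```

A real implementation would double-buffer gInputBuffer so the auxiliary task never reads a hop while render() is overwriting it.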

Blockers

  • Importing most models from the model zoo fails in Torch-MLIR

TODO

  • Take performance measurements with the Tracy profiler
  • Debug importing failures from Torch-MLIR
  • Document use of the runtime so that others can try out IREE models on Bela

Week 9 Report

Completed

  • Cross-compiled IREE binaries with tracing enabled to allow for profiling
  • Started setting up Tracy tools in container for host capture
  • Reading about how to add profiler support directly into the runtime, instead of relying on the iree-benchmark-module utility. A sketch of what this could look like is below.
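
As a sketch of direct instrumentation, assuming the project is built with the Tracy client (TRACY_ENABLE defined at compile time and Tracy's TracyClient.cpp compiled in), the zone and frame macros from Tracy's public header can be dropped straight into a Bela render callback; the zone name is arbitrary.

```cpp
// Sketch: instrument the Bela render callback with Tracy zones.
// Build with -DTRACY_ENABLE and compile Tracy's TracyClient.cpp
// into the project so the macros emit events.
#include <Bela.h>
#include "Tracy.hpp"

bool setup(BelaContext* context, void* userData)
{
	return true;
}

void render(BelaContext* context, void* userData)
{
	ZoneScopedN("render"); // records this block as a named zone
	// ... inference / audio processing to be profiled ...
	FrameMark; // marks one audio block as a Tracy frame
}

void cleanup(BelaContext* context, void* userData) {}
```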

Example of Tracy being used on Bela:
[Screenshot: Tracy profiler capture from a Bela session]

Blockers

  • None

TODO

  • Begin creating a public dashboard with benchmarking results & profiling traces of the models that have run on Bela
  • Try profiling in sampling mode
  • Document use of Tracy on Bela with a demo

Week 10 Report

Completed

  • Updated IREE to the latest release and updated the docker image accordingly

  • Took benchmarks of various models on Bela:

| Model | Export to VMFB | Bela IREE Benchmark |
| --- | --- | --- |
| basic_mlp | Yes | 24.8 ms |
| resnet_1d | No (IREE compile fails) | N/A |
| simple_conv_1d | Yes | 2549 ms |
| simple_rnn | No (can't export to TF or TOSA) | N/A |
| single_mm | Yes | 19.7 ms |
| siren_mlp | Yes | 802 ms |
| transformer_block | Yes | Segmentation fault |
| variational_encoder | No (can't export to TF or TOSA) | N/A |

  • Began writing an automated process to update the above table without any manual steps
  • Installed linux-perf on Bela

Blockers

  • The Tracy profiler is unstable on Bela, crashing intermittently; will need to debug
  • The transformer block segfaults on Bela; will reduce it to the simplest reproduction of the bug and file an issue with IREE

TODO

  • Finish an automated process for the model benchmarking dashboard for Bela, including TFLite
  • Try out linux-perf as an alternative to Tracy
  • File issue with IREE regarding transformer block segfault

Update

Back from vacation this week. The deadline for this project has been extended to September 26th. The following three weeks will be focused on documentation, organization, and demos of the different IREE & ML tools set up during this project, with the final deliverable allowing people to benchmark and record profiles of their ML workloads on Bela/BBB. I will also be updating a dashboard of current model benchmarks on the BBB.

Week 13 Report

Completed

  • Consolidated the scripts for compiling and benchmarking models into the docker image
  • Added perf visualization tools to the docker image; fixed the script to save /proc/kallsyms after each benchmark
  • Cleaned up the docker image build and VSCode setup; the working directory is now mounted instead of copied
  • Updated benchmarks:

| Model | Export to VMFB | Bela IREE Benchmark |
| --- | --- | --- |
| basic_mlp_1024 | Yes | 24.8 ms |
| resnet_1d_1024 | Yes | N/A |
| simple_conv_1d_1024 | Yes | 2549 ms |
| simple_rnn | No (can't export to TF or TOSA) | N/A |
| single_mm_1024 | Yes | 19.7 ms |
| siren_mlp_1024 | Yes | 802 ms |
| transformer_block | Yes | Segmentation fault |
| variational_encoder | No (can't export to TF or TOSA) | N/A |
| single_mm_256 | Yes | 0.593 ms |
| simple_conv_1d_256 | Yes | 1176 ms |
| siren_mlp_256 | Yes | 200 ms |

Blockers

  • perf crashes when enabling --call-graph

Todo

  • Focus on profile visualization and on which profiling data is most pertinent
  • Work on a demo example
  • Debug the model architectures that fail