Weekly Progress Report: Machine Learning on Bela

This thread will serve as a place to follow along with the Running Machine Learning Models on Bela GSoC 2022 project.

Intro video: Ezra Pierce - Deep Learning for Bela - Workshop on Embedded AI for NIME - NIME 2022 - YouTube


Summary of Weeks 1-3

This project aims to develop performance analysis tooling for Bela, which will eventually include profiling and benchmarking. For the first few weeks the focus has been on setting up a developer workflow for translating high-level models (PyTorch, TFLite) to the Bela platform. The initial investigations have involved the Intermediate Representation Execution Environment (IREE) from Google (GitHub - iree-org/iree). The IREE project uses MLIR (https://mlir.llvm.org/) to build an ML compiler and runtime environment for heterogeneous hardware. This tool was chosen for evaluation due to its flexible nature (it can support both the BBB and newer architectures like the BBAI64 and its accelerators) and its support for various high-level frontends. It also supports benchmarking using Google Benchmark and profiling using the Tracy profiler. The IREE runtime will be compared to other approaches attempted so far on the Bela, namely the DeepLearningForBela project (GitHub - rodrigodzf/DeepLearningForBela).

The first few weeks were used to set up a toolchain for building ML models on Bela using libtorch, TFLite, and IREE, with the goal of comparing the different options.

  • Set up the Bela cross-compilation docker container
  • Compiled libtorch & TFLite for Bela
  • Recreated a simple benchmark of an MLP model using GitHub - rodrigodzf/DeepLearningForBela
  • Started investigating ML compilers and runtimes
  • Background reading to get up to speed on MLIR/LLVM and compiler design
  • Background reading on modern build systems (CMake)
  • Built the IREE host compiler, to be used in the docker cross-compilation container
  • Presented the project at the NIME Embedded AI workshop (see video link above)
  • Cross-compiled the IREE runtime components; this required writing a custom CMake toolchain and configuring the IREE target HAL backends to work with the BBB/Bela
  • Combined approaches from the IREE example project setups for Cortex-M and Android to find options that work for the BBB/Bela architecture
  • Background research into IREE architecture, compiler options, and build options
  • Set up a working repo; the structure is still WIP and will be updated with an IREE example soon: GitHub - ezrapierce000/bela-ai-toolbench

Current task: running iree-benchmark-module with a test MLP model on Bela. Planning to decide next Wednesday the 13th whether or not to pursue IREE support.


Week 4 updates

Current task: writing an IREEFrontend class to be used for benchmarking in the DeepLearningForBela project.

This task will be done by Wednesday, July 13th to allow benchmarks to be compared between IREE and the other approaches.

Week 5 updates

  • Successfully built iree-benchmark-module for Bela and benchmarked a simple MLP model, which measured 26.2 ms over 1000 iterations. For comparison, the lowest latency seen so far using DeepLearningForBela for the same model was 20.59 ms using TFLite & XNNPACK, while libtorch with mobile optimizations enabled was 59.27 ms. These results look promising, so IREE will be pursued further by trying out different model architectures this week.

  • Started a new git repo for an IREE-on-Bela docker container, which will include cross-compilation tools, precompiled IREE tools, and a build system for cross-compiling IREE projects: GitHub - ezrapierce000/bela-iree-container

TODO this week, in order of priority:

  • Update the README with IREE-specific instructions: how to compile, benchmark, and profile models
  • Bela IREE runtime/backend proof of concept
  • Investigate the Tracy profiler further: try sampling instead of instrumented profiling, plus any Bela-specific configuration
  • Try out and document PyTorch import to IREE
  • Set up a GitHub Actions/CI pipeline to DockerHub for the docker image
  • Investigate the new microkernel implementations (Lower VMVX linalg ops to microkernels. by stellaraccident · Pull Request #9662 · iree-org/iree · GitHub) and compare benchmarks
  • Improve issue management of the project on GitHub
  • Set up the BBAI64

Week 6 Report

Completed

  • Documented docker container setup and IREE benchmarking instructions in the README
  • Set up IREE builds in the docker image; all IREE device runtime components and sample projects can now be cross-compiled for Bela. Building host projects is hitting a CMake error, so prebuilt releases from the IREE GitHub are pulled in instead for now
  • Added some Python package installs to the docker image for importing models
  • Added installs for the Tracy profiler dependencies, plus flags for building the instrumented runtime and samples. Models running on Bela can now be profiled in iree-benchmark-module with instrumented Tracy, though running the instrumented binary is causing the Bela to crash intermittently; will try sampling profiling this week
  • Included a simple script in the docker container to verify the image can run IREE benchmarks
  • Started to write an IREE runtime, for now just building the demo from the IREE repo and compiling it against the Bela library. The docker image needs updating this week to make sure it includes or builds libbelafull.so. The IREE runtime documentation was helpful for understanding how the runtime is designed

Blockers

  • Need to investigate why simple PyTorch models fail to import via torch-mlir; will need some background reading on the Torch dialect
  • Bela SD card issues; switched to using a Bela Mini for now

TODO

  • Finish the basic runtime and create a full demo of benchmarking, profiling, and running an IREE model
  • Reorganize how the docker image is built so that it can be added to GitHub Actions
  • Add exporting of vmfb files from the model zoo and document how to use them
  • Clean up the CMake configuration so that it follows best practices

Week 7 Report

Took a couple of days off, so not many updates this week.

Completed

  • Continued work on the IREE runtime; I don't yet understand the runtime's scoping, so I'm only able to run inferences in the setup() function, not in render()
  • Built torch-mlir from source, to be added to the docker container

Blockers

  • Unable to access IREE HAL buffer views from render(); need to better understand how the runtime scopes variables and manages memory. A sketch of one possible fix is below.
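
For reference, here is a minimal sketch of keeping the IREE runtime state at file scope so that the session created in setup() is still alive when render() runs. This is only an assumption of how the pieces fit together, modelled on the IREE runtime demo; function names and signatures vary between IREE releases, and the module path, driver name, and entry function name here are all hypothetical.

```cpp
// Sketch only: file-scope IREE objects outlive setup() and stay
// visible from render(), unlike stack variables local to setup().
#include <Bela.h>
#include <iree/runtime/api.h>

static iree_runtime_instance_t* gInstance = nullptr;
static iree_hal_device_t* gDevice = nullptr;
static iree_runtime_session_t* gSession = nullptr;

bool setup(BelaContext* context, void* userData)
{
	iree_runtime_instance_options_t opts;
	iree_runtime_instance_options_initialize(&opts);
	iree_runtime_instance_options_use_all_available_drivers(&opts);
	if(!iree_status_is_ok(iree_runtime_instance_create(
			&opts, iree_allocator_system(), &gInstance)))
		return false;
	// Driver name depends on how the runtime was built (assumption).
	if(!iree_status_is_ok(iree_runtime_instance_try_create_default_device(
			gInstance, iree_make_cstring_view("local-task"), &gDevice)))
		return false;
	iree_runtime_session_options_t sessionOpts;
	iree_runtime_session_options_initialize(&sessionOpts);
	if(!iree_status_is_ok(iree_runtime_session_create_with_device(
			gInstance, &sessionOpts, gDevice,
			iree_runtime_instance_host_allocator(gInstance), &gSession)))
		return false;
	// Hypothetical path to a compiled model.
	return iree_status_is_ok(
			iree_runtime_session_append_bytecode_module_from_file(
					gSession, "model.vmfb"));
}

void render(BelaContext* context, void* userData)
{
	// gSession is file-scoped, so it is still valid here. A synchronous
	// invocation in render() is not real-time safe; a real project would
	// hand this off to a lower-priority auxiliary task.
	iree_runtime_call_t call;
	if(!iree_status_is_ok(iree_runtime_call_initialize_by_name(
			gSession, iree_make_cstring_view("module.forward"), &call)))
		return;
	// ... create and push input buffer views here ...
	iree_runtime_call_invoke(&call, /*flags=*/0);
	iree_runtime_call_deinitialize(&call);
}

void cleanup(BelaContext* context, void* userData)
{
	iree_runtime_session_release(gSession);
	iree_hal_device_release(gDevice);
	iree_runtime_instance_release(gInstance);
}
```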

TODO

  • Finish the basic runtime and create a full demo of benchmarking, profiling, and running an IREE model
  • Reorganize the container image so that it can be built without a Bela connected, and so that the model zoo is passed through from the host instead of being copied over at build time
  • Add exporting of vmfb files from the model zoo and document how to use them
  • Clean up the CMake configuration so that it follows best practices

Week 8 Report

Completed

  • Completed a prototype of an IREE runtime for Bela; IREE ML models can now run in real time (at a reduced sample rate). See the sketch after this list.
  • Added VMFB exporting to the embedded-model-zoo repo; will be adding a PR soon
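
To illustrate the reduced-rate approach, here is a rough sketch (not the project's actual code) of buffering audio into fixed-size hops and deferring inference to a lower-priority Bela auxiliary task, so that render() never blocks on the model. kHopSize and the task priority are illustrative values, and runInference() stands in for a call into the IREE session from the earlier sketch.

```cpp
// Sketch: run inference once per kHopSize samples on an auxiliary task,
// effectively running the model at a reduced (control) rate.
#include <Bela.h>
#include <vector>

constexpr unsigned int kHopSize = 1024; // hypothetical model input length
static std::vector<float> gInputBuffer(kHopSize);
static unsigned int gWritePos = 0;
static AuxiliaryTask gInferenceTask;

void runInference(void*)
{
	// Invoke the IREE session on gInputBuffer here. This runs below
	// audio priority, so a slow model adds latency rather than
	// causing audio dropouts.
}

bool setup(BelaContext* context, void* userData)
{
	gInferenceTask = Bela_createAuxiliaryTask(runInference, 50, "iree-inference");
	return gInferenceTask != nullptr;
}

void render(BelaContext* context, void* userData)
{
	for(unsigned int n = 0; n < context->audioFrames; ++n) {
		gInputBuffer[gWritePos++] = audioRead(context, n, 0);
		if(gWritePos == kHopSize) {
			gWritePos = 0;
			Bela_scheduleAuxiliaryTask(gInferenceTask);
		}
	}
}

void cleanup(BelaContext* context, void* userData) {}
```

A real implementation would double-buffer gInputBuffer so the auxiliary task never reads a hop while render() is overwriting it.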

Blockers

  • Importing most models from the model zoo fails in Torch-MLIR

TODO

  • Take performance measurements with the Tracy profiler
  • Debug importing failures from Torch-MLIR
  • Document use of the runtime so that others can try out IREE models on Bela

Week 9 Report

Completed

  • Cross-compiled IREE binaries with tracing enabled to allow for profiling
  • Started setting up Tracy tools in container for host capture
  • Reading about how to add profiler support directly into the runtime, instead of relying on the iree-benchmark-module utility. A sketch of what this could look like is below.
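
As a sketch of direct instrumentation, assuming the project is built with the Tracy client (TRACY_ENABLE defined at compile time and Tracy's TracyClient.cpp compiled in), the zone and frame macros from Tracy's public header can be dropped straight into a Bela render callback; the zone name is arbitrary.

```cpp
// Sketch: instrument the Bela render callback with Tracy zones.
// Build with -DTRACY_ENABLE and compile Tracy's TracyClient.cpp
// into the project so the macros emit events.
#include <Bela.h>
#include "Tracy.hpp"

bool setup(BelaContext* context, void* userData)
{
	return true;
}

void render(BelaContext* context, void* userData)
{
	ZoneScopedN("render"); // records this block as a named zone
	// ... inference / audio processing to be profiled ...
	FrameMark; // marks one audio block as a Tracy frame
}

void cleanup(BelaContext* context, void* userData) {}
```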

Example of Tracy being used on Bela:
[Screenshot: Tracy profiler capture from a Bela session]

Blockers

  • None

TODO

  • Begin creating a public dashboard with benchmarking results & profiling traces of the models that have run on Bela
  • Try profiling in sampling mode
  • Document use of Tracy on Bela with a demo

Week 10 Report

Completed

  • Updated IREE to the latest release and updated the docker image accordingly

  • Took benchmarks of various models on Bela:

| Model | Export to VMFB | Bela IREE Benchmark |
| --- | --- | --- |
| basic_mlp | Yes | 24.8 ms |
| resnet_1d | No (IREE compile fails) | N/A |
| simple_conv_1d | Yes | 2549 ms |
| simple_rnn | No (can't export to TF or TOSA) | N/A |
| single_mm | Yes | 19.7 ms |
| siren_mlp | Yes | 802 ms |
| transformer_block | Yes | Segmentation fault |
| variational_encoder | No (can't export to TF or TOSA) | N/A |

  • Began writing an automated process to update the above table without any manual steps
  • Installed linux-perf on Bela

Blockers

  • The Tracy profiler is unstable on Bela, crashing intermittently; will need to debug
  • The transformer block segfaults on Bela; will reduce it to the simplest reproduction of the bug and file an issue with IREE

TODO

  • Finish an automated process for the model benchmarking dashboard for Bela, including TFLite
  • Try out linux-perf as an alternative to Tracy
  • File issue with IREE regarding transformer block segfault

Update

Back from vacation this week. The deadline for this project has been extended to September 26th. The following three weeks will be focused on documentation, organization, and demos of the different IREE & ML tools set up during this project, with the final deliverable allowing people to benchmark and record profiles of their ML workloads on Bela/BBB. I will also be updating a dashboard of current model benchmarks on the BBB.

Week 13 Report

Completed

  • Consolidated the scripts for compiling and benchmarking models into the docker image
  • Added perf visualization tools to the docker image; fixed the script to save /proc/kallsyms after each benchmark
  • Cleaned up the docker image build and VSCode setup; the working directory is now mounted instead of copied
  • Updated benchmarks:

| Model | Export to VMFB | Bela IREE Benchmark |
| --- | --- | --- |
| basic_mlp_1024 | Yes | 24.8 ms |
| resnet_1d_1024 | Yes | N/A |
| simple_conv_1d_1024 | Yes | 2549 ms |
| simple_rnn | No (can't export to TF or TOSA) | N/A |
| single_mm_1024 | Yes | 19.7 ms |
| siren_mlp_1024 | Yes | 802 ms |
| transformer_block | Yes | Segmentation fault |
| variational_encoder | No (can't export to TF or TOSA) | N/A |
| single_mm_256 | Yes | 0.593 ms |
| simple_conv_1d_256 | Yes | 1176 ms |
| siren_mlp_256 | Yes | 200 ms |

Blockers

  • perf crashes when enabling --call-graph

Todo

  • Focus on profile visualization and on which profiling data is most pertinent
  • Work on a demo example
  • Debug the model architectures that fail