Weekly Progress Report: Machine Learning on Bela

This thread will serve as a place to follow along with the Running Machine Learning Models on Bela GSoC 2022 project.

Intro video: Ezra Pierce - Deep Learning for Bela - Workshop on Embedded AI for NIME - NIME 2022 - YouTube

Summary of Weeks 1-3

This project aims to develop performance analysis tooling for Bela, which will eventually include profiling and benchmarking. For the first few weeks the focus has been on setting up a developer workflow for translating high-level models (PyTorch, TFLite) to the Bela platform. The initial investigations have centered on IREE, the Intermediate Representation Execution Environment from Google (GitHub - iree-org/iree). The IREE project uses MLIR (https://mlir.llvm.org/) to build an ML compiler and runtime environment for heterogeneous hardware. This tool was chosen for evaluation because of its flexibility (it can support both the BBB and newer architectures such as the BBAI64 and its accelerators) and its support for various high-level frontends. It also supports benchmarking with Google Benchmark and profiling with the Tracy profiler. The IREE runtime will be compared to the other approaches attempted on Bela so far, namely the DeepLearningForBela project (GitHub - rodrigodzf/DeepLearningForBela).

The first few weeks were used to set up a toolchain for building ML models on Bela using libtorch, TFLite and IREE, with the goal of comparing the different options.

  • Set up Bela cross compilation docker container
  • Compiled libtorch & TFLite for Bela
  • Recreated simple benchmark of an MLP model using GitHub - rodrigodzf/DeepLearningForBela
  • Started investigation of ML compilers and runtimes
  • Background reading getting up to speed on MLIR/LLVM and compiler design
  • Background reading on modern build systems, CMake
  • Built IREE host compiler, to be used in Docker cross compilation container
  • Presented project at NIME Embedded AI workshop (see video link above)
  • Cross-compiled IREE runtime components; this required writing a custom CMake toolchain and setting up a configuration of IREE target HAL backends that works with BBB/Bela
  • Combined approaches from IREE examples of project setup for Cortex-M and Android to find options that work for the BBB/Bela architecture
  • Background research into IREE architecture, compiler options, build options
  • Set up working repo (structure still WIP, will be updated with an IREE example soon): GitHub - ezrapierce000/bela-ai-toolbench: bela-ai-toolbench: a collection of ai tools for the bela platform

Current task being worked on: Running ‘iree-benchmark-module’ with test MLP model on Bela. Planning on making a decision next Wednesday the 13th on whether or not to pursue IREE support.

Week 4 updates

Current task: Writing IREEFrontend class to be used for benchmarking in DeepLearningForBela project.

This task will be done by Wednesday July 13th so that benchmarks can be compared between IREE and the other approaches.

Week 5 updates

  • Successfully built iree-benchmark-module for Bela and benchmarked a simple MLP model, resulting in a measurement of 26.2 ms over 1000 iterations. For comparison, the lowest latency seen so far with DeepLearningForBela for the same model was 20.59 ms using TFLite & XNNPACK, while libtorch with mobile optimizations on measured 59.27 ms. These results look promising, so IREE will be pursued further by trying out different model architectures this week.

  • Started new git repo for an IREE on Bela docker container which will include cross-compilation tools, precompiled IREE tools and build system for cross-compiling IREE projects: GitHub - ezrapierce000/bela-iree-container: Dockerized cross-compilation and Machine Learning Tools for Bela
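
To make the comparison between the three Week 5 figures easier to read, here is a small sketch that expresses each latency relative to the fastest one. The numbers are the ones quoted above; the dictionary labels and the helper name relative_to_fastest are only illustrative, and the sketch assumes all three figures were measured the same way on the same model.

```python
# Latency figures quoted in the Week 5 update, in milliseconds.
# The labels are illustrative, not names used by any of the tools.
latencies_ms = {
    "IREE (iree-benchmark-module)": 26.2,
    "TFLite + XNNPACK": 20.59,
    "libtorch (mobile optimizations)": 59.27,
}


def relative_to_fastest(results):
    """Return each latency divided by the lowest latency in the set."""
    fastest = min(results.values())
    return {name: value / fastest for name, value in results.items()}


ratios = relative_to_fastest(latencies_ms)

# Print the runtimes from fastest to slowest with their relative cost.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {latencies_ms[name]:.2f} ms ({ratio:.2f}x the fastest)")
```

On these figures IREE comes out at roughly 1.27 times the TFLite & XNNPACK latency, and libtorch at roughly 2.88 times.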

TODO this week, in order of priority:

  • Update README with IREE-specific instructions: how to compile, benchmark and profile models
  • Bela IREE runtime/backend proof of concept
  • Investigate Tracy profiler further: try sampling instead of instrumented profiling, and look into any Bela-specific configuration
  • Try out and document PyTorch import to IREE
  • GitHub Actions/CI pipeline to DockerHub for docker image
  • Investigate new microkernel implementations (iree-org/iree Pull Request #9662, "Lower VMVX linalg ops to microkernels" by stellaraccident) and compare benchmarks
  • Improve issue management of project on GitHub
  • Set up BBAI64

Week 6 Report

Completed

  • Documented setup of docker container and benchmarking instructions for IREE in README
  • Set up IREE builds in the docker image: can cross-compile all IREE device runtime components and sample projects for Bela. Building host projects runs into a CMake error, so for now prebuilt images are pulled in from the IREE GitHub instead
  • Added some Python package installs for importing models in docker
  • Added installs for Tracy profiler dependencies, flags for building instrumented runtime and samples. Able to profile models running on Bela in the iree-benchmark-module with instrumented Tracy. Running the instrumented binary is causing the Bela to crash intermittently, will try sampling profiling this week.
  • Included simple script in docker container to verify the docker image is able to run IREE benchmarks
  • Started to write an IREE runtime, for now just starting with building the demo from the IREE repo and compiling it with the Bela library. Need to update the docker image this week to make sure it includes or builds libbelafull.so. I’ve found these docs helpful for understanding how the runtime is designed.

Blockers

  • Need to investigate why simple PyTorch models are failing to be imported via torch-mlir, will need to do some background reading on the Torch dialect
  • Bela SD card issues, switched to using Bela mini for now

TODO

  • Finish basic runtime and create full demo of benchmarking, profiling and running an IREE model
  • Reorganize how the docker image is built so that it can be added to GitHub Actions
  • Add exporting of vmfb files from model zoo and document how to use them
  • Clean up CMake configuration so that it follows best practices

Week 7 Report

Took a couple days off so not many updates this week.

Completed

  • Continued work on IREE runtime; don’t understand the scoping of the runtime yet, so only able to run inferences in the setup() function, not in render()
  • Built torch-mlir from source, to be added to docker container

Blockers

  • Unable to access IREE HAL buffer views from render(), need to better understand how the runtime scopes variables and manages memory

TODO

  • Finish basic runtime and create full demo of benchmarking, profiling and running an IREE model
  • Reorganize container image so that it can be built without a Bela connected and so that it passes through model zoo from host instead of copying over when building
  • Add exporting of vmfb files from model zoo and document how to use them
  • Clean up CMake configuration so that it follows best practices

Week 8 Report

Completed

  • Completed a prototype of an IREE runtime for Bela, now able to run IREE ML models in real-time (at a reduced sample rate).
  • Added VMFB exporting to embedded-model-zoo repo, will be adding a PR soon
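
As a rough illustration of why running a model in real time can force a lower sample rate: each audio block has to be processed within the time it takes to play it back, which is the block size divided by the sample rate. The sketch below computes that budget and checks an inference time against it. All numbers in the usage lines (a 1.0 ms inference, 16-frame blocks, 44100 Hz and 22050 Hz) are hypothetical values chosen for the example, not measurements from this project, and reading "reduced sample rate" as the audio sample rate is itself an assumption.

```python
def block_budget_ms(block_size_frames, sample_rate_hz):
    """Time available to process one audio block, in milliseconds."""
    return 1000.0 * block_size_frames / sample_rate_hz


def fits_in_budget(inference_ms, block_size_frames, sample_rate_hz):
    """True if one inference completes within a single block's budget."""
    return inference_ms <= block_budget_ms(block_size_frames, sample_rate_hz)


def min_block_size(inference_ms, sample_rate_hz):
    """Smallest power-of-two block size whose budget covers one inference."""
    size = 1
    while block_budget_ms(size, sample_rate_hz) < inference_ms:
        size *= 2
    return size


# Hypothetical example: a 1.0 ms inference with 16-frame blocks.
hypothetical_inference_ms = 1.0
for rate in (44100, 22050):
    budget = block_budget_ms(16, rate)
    ok = fits_in_budget(hypothetical_inference_ms, 16, rate)
    needed = min_block_size(hypothetical_inference_ms, rate)
    print(f"{rate} Hz: budget {budget:.3f} ms, fits={ok}, min block size {needed}")
```

Halving the sample rate doubles the budget per block, which is the trade-off this sketch is meant to show; it ignores everything else running on the board.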

Blockers

  • Importing of most models from model zoo is failing in Torch-MLIR

TODO

  • Taking performance measurements with the Tracy profiler
  • Debugging importing failures from Torch-MLIR
  • Document use of the runtime so that others can try out IREE models on Bela