TIDL/EdgeAI benchmarks on the AI-64

Ok! I’ve gotten the benchmarks to run… well, kind of.

First, here’s my setup, modified from the original post. No promises that it’s perfect, but it works well enough for me.

The virtualenv ends up being a bit of a pain; it might be easier to get rid of it and install everything globally.

Prerequisites

64GB was not enough for my root file system plus all the benchmarking datasets and models, so I had to use a 128GB SD card. A USB drive or M.2 SSD would probably also work.

I started with a fresh install of bbai64-debian-11.5-xfce-edgeai-arm64-2022-10-01-10gb.img.xz on an SD card, from: Debian 11.x (Bullseye) - Monthly Snapshots (ARM64).
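
For reference, one way to write that image to a card from another Linux machine (a sketch; /dev/sdX is a placeholder, so double-check the device with lsblk first):

# Stream the compressed image straight onto the card (this wipes the card!)
xzcat bbai64-debian-11.5-xfce-edgeai-arm64-2022-10-01-10gb.img.xz | sudo dd of=/dev/sdX bs=4M status=progress conv=fsync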

System prep

Expand rootfs (if using SD card)

NOTE: Robert reports below that this is an inappropriate way to expand the SD card rootfs, and you should instead allow it to happen naturally through a reboot. I have not yet verified that this works.

The default rootfs is 10GB for this image. To expand it to use your full SD card:

wget https://raw.githubusercontent.com/RobertCNelson/boot-scripts/master/tools/grow_partition.sh
chmod +x grow_partition.sh
sudo ./grow_partition.sh
sudo reboot
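# After the reboot, confirm the rootfs now spans the whole card
df -h /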

Disable web servers

If you aren’t using them, you can disable these:

sudo systemctl disable --now bb-code-server
sudo systemctl disable --now nodered

TODO: Disable DLR “phone home”
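
I haven’t done this yet. Per the neo-ai-dlr docs, writing a ccm_config.json next to the installed dlr package should turn the call-home counter off; this is an untested sketch, and the expected location of the file may vary between DLR versions:

# Untested: locate the installed dlr package and write the opt-out config there
DLR_DIR=$(python3 -c "import dlr, os; print(os.path.dirname(dlr.__file__))")
echo '{"enable_phone_home": false}' | sudo tee "$DLR_DIR/ccm_config.json"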

Running the EdgeAI benchmark suite

The EdgeAI image already ships some of these files under /opt, but not all of them, and it isn’t clear what the preinstalled copies are for.

# Specific commits are my best guess at matching the TIDL 8.2 that seems to ship with this image
git clone https://github.com/TexasInstruments/edgeai-benchmark.git && git -C edgeai-benchmark/ checkout 07fa801596d44ebd97c1e3b2a963faf374677808
git clone https://github.com/TexasInstruments/edgeai-modelzoo.git && git -C edgeai-modelzoo/ checkout 20ef897df41198201a88e6250901934466303b57

cd edgeai-benchmark

sudo apt update
sudo apt install python3-venv
pip install -U setuptools

# Note: the benchmarks need the global (pre-existing) install of onnxruntime. We'd like to use --system-site-packages here, but it breaks the "pycocotools" package and I haven't figured out how to work around that. So we hack it later.
python -m venv benchmarkenv
source ./benchmarkenv/bin/activate

# Required for onnx
sudo apt install protobuf-compiler
pip install --upgrade onnx

# Required for my "hacks" patch below
pip install requests

pip install -r requirements_evm.txt

# Manual step: open "benchmarkenv/pyvenv.cfg" in your editor of choice. Set "include-system-site-packages = true".
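# (Or do it non-interactively; this sed assumes the stock pyvenv.cfg wording)
sed -i 's/include-system-site-packages = false/include-system-site-packages = true/' benchmarkenv/pyvenv.cfg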
deactivate

# Manual step: apply the "edgeai-benchmark-hacks.patch" included below

# Root required for raw access to /dev/mem
sudo su
source ./benchmarkenv/bin/activate
./run_benchmarks_evm.sh

edgeai-benchmark-hacks.patch (1.9 KB)

I found the following issues in the EdgeAI setup which I had to manually correct using the patch above:

  • The NYU depth dataset’s host was down for a while. Now that it’s back, I get non-obvious HDF5 errors when loading that dataset, so I just disabled it.
  • The ONNX model downloads from TI’s servers fail with 403 errors, for reasons that are unclear: requests.get/curl/etc. work, but urllib.request.urlopen gets the error. It seems to be either a failure to handle their 302 redirects or a block on some User-Agent signature. So I replaced urlopen with (mostly) equivalent requests.get calls, which work fine; a sketch follows this list.
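
For reference, the core of that workaround looks roughly like this (a sketch, not the patch verbatim; the URL is a placeholder, not a real model URL):

# requests.get follows the 302s and sends its own User-Agent, which is enough
# to avoid the 403 that urllib.request.urlopen hits on the same URL
python3 - <<'EOF'
import requests

url = "https://software-dl.ti.com/path/to/model.onnx"  # placeholder URL
r = requests.get(url, allow_redirects=True)  # urllib.request.urlopen(url) 403s here
r.raise_for_status()
with open("model.onnx", "wb") as f:
    f.write(r.content)
EOF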

Results

I have succeeded in running the first ~40 benchmarks. The results are these:

report_20221009-024150.csv (9.3 KB)

A few failed due to miscellaneous packages I’m missing (it seems TI’s requirements file skipped some that are needed). More frustratingly, my board has a habit of resetting at inopportune times. I was originally using a generic Chinese 3A barrel power supply, on which it couldn’t get through the first one or two benchmarks; I swapped that for a 3A USB-C supply, which got as far as what I’ve posted here but died soon after I left the board unattended for the day. I’ve ordered a Raspberry Pi branded USB-C supply to see if that improves reliability, though I’m not sure I’ll bother re-running the suite, since I don’t care about the full set of benchmarks.

I was running the stock image mentioned above with its default device tree, i.e. Linux only had ~2GB of RAM to work with. I used the default heatsink assembly but have a small fan pointing at it, so the CPU stayed very cool, around 33 degrees C according to the SoC’s internal sensors. I’m not sure there’s any thermal throttling or shutdown logic enabled anyway, but I figure it’s worth mentioning.
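
If you want to watch the temperatures yourself, the SoC’s sensors show up as standard sysfs thermal zones (values in millidegrees C), at least on this image:

# Print every thermal zone's temperature; divide by 1000 for degrees C
cat /sys/class/thermal/thermal_zone*/temp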

The only official results of these benchmarks on the TDA4VM I’ve found are in this whitepaper: https://www.ti.com/lit/an/spracz2/spracz2.pdf. In any case, my numbers look to be about what TI reports. They claim 162 FPS for a ResNet-50 at 224x224 resolution, and 385 FPS for SSD MobileNet V1 at the same resolution. My benchmark run didn’t survive long enough to reach that particular MobileNet, but converting the reported infer_time_core_ms for ResNet-50 into a frame rate (FPS = 1000 / infer_time_core_ms) gives 154 FPS. That’s within 10 FPS of their number, so it does seem I’ve managed to reproduce it.

Note that this is still using their own models and test harness, which is surely the most optimistic configuration one can use. I’ll try next with my own ONNX model and see how that performs in a more realistic app.
