TIDL/EdgeAI benchmarks on the AI-64

I’m attempting to run TI’s official EdgeAI benchmarks on my AI-64 to replicate their numbers as well as provide a POC before moving on to my own models.

The benchmarks are here: https://github.com/TexasInstruments/edgeai-benchmark

I don’t have any real background knowledge of how this is supposed to be done, so I’d appreciate any guidance. I’m just guessing how this is supposed to work from their (IMO rather poor) documentation. But I’ll walk through what I’ve tried and why it isn’t working.

My belief is that the “edgeai” version of the OS images includes a copy of TI’s builds of their libraries, and that this ought to enable me to run EdgeAI apps on it.

I started with a fresh install of the following image, on an SD card. bbai64-debian-11.4-xfce-edgeai-arm64-2022-09-02-10gb.img.xz from: Debian 11.x (Bullseye) - Monthly Snapshots (ARM64).

Here’s my process so far. (Written as notes to self, so any "if"s are de facto true here.)

System setup and prep

Setup

Expand rootfs (if using SD card)

The default rootfs is 10GB for this image. To expand it to use your full SD card:

wget https://raw.githubusercontent.com/RobertCNelson/boot-scripts/master/tools/grow_partition.sh
chmod +x grow_partition.sh
sudo ./grow_partition.sh
sudo reboot

Disable web servers

If unused, you can disable these:

sudo systemctl disable --now bb-code-server
sudo systemctl disable --now nodered

Running EdgeAI benchmark suite

/opt has some of these files already in the EdgeAI image, but not all of them and it isn’t clear what they’re for.

# Specific commits are my best guess to match TIDL 8.2 that seems to come with this image
git clone https://github.com/TexasInstruments/edgeai-benchmark.git && git -C edgeai-benchmark/ checkout 3342c09c56006f847a4907e2e930991bc2af4a21
git clone https://github.com/TexasInstruments/edgeai-modelzoo.git && git -C edgeai-modelzoo/ checkout 20ef897df41198201a88e6250901934466303b57

sudo apt update
sudo apt-get install python3-venv
# Note: Retroactively added --system-site-packages because it needs the global install of onnxruntime.
python3 -m venv --system-site-packages benchmarkenv
source ./benchmarkenv/bin/activate

# Required for onnx
sudo apt install protobuf-compiler
pip install --upgrade onnx

pip install -r requirements_evm.txt

# See notes below before continuing

./run_benchmarks_evm.sh

I found the following issues in the EdgeAI setup which I had to manually correct:

  • The NYU depth dataset’s normal host seems to be unavailable. I had to disable that dataset.
  • Downloads of the ONNX models from TI’s servers failed with 403 errors, for reasons that are unclear. requests.get/curl/etc. work, but urllib.request.urlopen gets an error. I manually downloaded the files into the right places.
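For the 403 workaround, a download helper along these lines is what I ended up with. This is a sketch, not the exact code from my patch — the function name and the streaming chunk size are my own choices, and the explicit User-Agent is based on my guess that TI’s CDN rejects urllib’s default one:

```python
import requests

def download_file(url: str, dest_path: str, chunk_size: int = 1 << 16) -> None:
    """Fetch a model file with requests, which follows TI's 302 redirects
    and (unlike urllib's default) sends a browser-like User-Agent."""
    resp = requests.get(url, stream=True, timeout=60,
                        headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    with open(dest_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size):
            f.write(chunk)
```

Dropping something like this in place of the urllib.request.urlopen call was enough to get the model downloads going.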

My current problem

<other output snipped>
libtidl_onnxrt_EP loaded 0xfaf27d0
Final number of subgraphs created are : 1, - Offloaded Nodes - 60, Total Nodes - 60
APP: Init ... !!!
APP_LOG: ERROR: Unable to open /dev/mem !!!
APP_LOG: ERROR: Unable to map memory @ 0xa90000 of size 512 bytes !!!
APP_LOG: ERROR: Unable to mmap gtc (0xa90000 of 512 bytes) !!!
APP: ERROR: Global timer init failed !!!
APP_LOG: ERROR: Unable to open /dev/mem !!!
APP_LOG: ERROR: Unable to map memory @ 0xb2000000 of size 262144 bytes !!!
APP: ERROR: Log writer init failed !!!
APP: Init ... Done !!!
./run_benchmarks_evm.sh: line 52:  1182 Segmentation fault      python3 ./scripts/benchmark_modelzoo.py ${settings_file} "$@"
-------------------------------------------------------------------

Initial investigation

strings tells me that these errors come from /usr/lib/libtivision_apps.so. This binary seems to be closed-source – is that right?

Running under gdb, I am seeing the segfault at:

#0  __new_sem_wait (sem=0x0) at sem_wait.c:39
#1  0x0000ffffa6df67e8 in tivxPlatformSystemLock () from /usr/lib/libtivision_apps.so.8.2.0
#2  0x0000ffffa6dd43b4 in tivxObjDescAlloc () from /usr/lib/libtivision_apps.so.8.2.0
#3  0x0000ffffa6de3000 in vxCreateContext () from /usr/lib/libtivision_apps.so.8.2.0
#4  0x0000ffffa917a71c in TIDLRT_create () from /usr/lib/libvx_tidl_rt.so
#5  0x0000ffffe5a2a788 in TIDL_subgraphRtCreate () from /usr/lib/libtidl_onnxrt_EP.so
#6  0x0000ffffe5a29018 in TIDL_createStateInferFunc () from /usr/lib/libtidl_onnxrt_EP.so
#7  0x0000ffffa930a184 in ?? () from /usr/lib/python3.9/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.so
<cut to omit CPython function call internals>

I found docs for the top two (non-std) call frames.

If GDB is correctly reporting the arg to sem_wait as 0x0, then I guess that’d be the problem. I’m not 100% sure I trust its stack analysis and haven’t dug into it manually.

I don’t have a great theory for what’s going on here so far. I imagine the mmap failures reported via stdout cascade into an attempt to wait on a null semaphore, but it could also be that those mmap errors have perfectly functional fallbacks – I don’t know.

I also imagine this wouldn’t be specific to their benchmarks, but I haven’t yet looked for a means of smoke testing TIDL with less “stuff” in the way.

I also found a post which references the same error, although it doesn’t seem to give a specific misconfiguration to look for other than “incorrect memory mappings”: TDA4VM: APP_LOG: ERROR: Unable to map memory @ 0xb2000000 of size 262144 bytes !!! - Processors forum - Processors - TI E2E support forums. Perhaps someone from BB.org knows what the Beagle device tree looks like vs. what TI’s does.

dmesg:
ai64-tidl-initial-testing-dmesg.txt (48.3 KB)

Questions

Have others gotten TIDL/EdgeAI running on the AI-64? Is there something obvious I’m doing wrong, e.g. is a Debian 10 image or a different build of their libraries expected to work better?

I imagine that there’s either a device tree memory mapping in the Beagle OS distribution that doesn’t match what TI expects, or I’ve made a versioning error. Can anyone point me in a good direction re: resolving these errors? If that binary is indeed closed-source, it seems it’ll be a big pain to reverse-engineer the problem.

When I next have some time to look at this I’ll probably see if those mmap-failed addresses have particular significance in TI’s default device trees (e.g. are peripheral MMIO of interest); maybe that’ll uncover a lead. Although if what’s actually failing is whatever it’s trying to do with /dev/mem, maybe that’s a waste of time.

Are you still booting without the shared memory? If so, that is probably why the mmaps are failing.

I think you can find the source for the vision apps in the TI PROCESSOR-SDK-RTOS-J721E package.

I also believe you can only build the vision apps on a Linux host.
I haven’t built them so I don’t know how easy it is or if there are any issues.

Run as… root…

I’ve had this error with both device tree variants. Currently it’s with the shared memory disabled but the error here has been seen on both.

Hmmm, I can’t remember if I’ve tried that. I’ll give it a shot when I’m home from work. I did try overriding the file system permissions of /dev/mem ephemerally (to allow non-root r/w, super secure) and that didn’t have an effect.

My vague recollection is that when I ran it as root it failed in an entirely unrelated way due to Python module path and as a result I opted to try the above instead.

TI’s SDK and all demos assume root for /dev/mem access; I tried to make it work as the debian user…

Yeah, TI has a git fork of those libraries, we have a pre-packaged version pre-installed… running pip install as a normal user gets in the way…

Regards,


Ok, progress!

When running as root, I get:

libtidl_onnxrt_EP loaded 0x3cf5fff0
Final number of subgraphs created are : 1, - Offloaded Nodes - 60, Total Nodes - 60
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=5) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
  3747.474942 s: GTC Frequency = 200 MHz
APP: Init ... Done !!!
  3747.481848 s:  VX_ZONE_INIT:Enabled
  3747.481886 s:  VX_ZONE_ERROR:Enabled
  3747.481893 s:  VX_ZONE_WARNING:Enabled
  3747.482749 s:  VX_ZONE_INIT:[tivxInitLocal:130] Initialization Done !!!
  3747.482892 s:  VX_ZONE_INIT:[tivxHostInitLocal:86] Initialization Done for HOST !!!
  3747.509410 s:  VX_ZONE_ERROR:[ownContextSendCmd:815] Command ack message returned failure cmd_status: -1
  3747.509442 s:  VX_ZONE_ERROR:[ownContextSendCmd:851] tivxEventWait() failed.
  3747.509463 s:  VX_ZONE_ERROR:[ownNodeKernelInit:538] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  3747.509473 s:  VX_ZONE_ERROR:[ownNodeKernelInit:539] Please be sure the target callbacks have been registered for this core
  3747.509483 s:  VX_ZONE_ERROR:[ownNodeKernelInit:540] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  3747.509497 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl ... failed !!!
  3747.509513 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
  3747.509522 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
TIDL_RT_OVX: ERROR: Verifying TIDL graph ... Failed !!!
TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
infer 1/116: cl-6060_onnxrt_imagenet1k_edgeai-tv_mobilenet_v|          |     0% 0/10000| [< ]  3747.589662 s:  VX_ZONE_ERROR:[ownContextSendCmd:815] Command ack message returned failure cmd_status: -1
  3747.589695 s:  VX_ZONE_ERROR:[ownContextSendCmd:851] tivxEventWait() failed.
  3747.589711 s:  VX_ZONE_ERROR:[ownNodeKernelInit:538] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  3747.589720 s:  VX_ZONE_ERROR:[ownNodeKernelInit:539] Please be sure the target callbacks have been registered for this core
  3747.589727 s:  VX_ZONE_ERROR:[ownNodeKernelInit:540] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  3747.589741 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl ... failed !!!
  3747.589756 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
  3747.589766 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
  3747.589873 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:820] graph is not in a state required to be scheduled
  3747.589885 s:  VX_ZONE_ERROR:[vxProcessGraph:755] schedule graph failed
  3747.589891 s:  VX_ZONE_ERROR:[vxProcessGraph:760] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
  3747.617219 s:  VX_ZONE_ERROR:[ownContextSendCmd:815] Command ack message returned failure cmd_status: -1
  3747.617252 s:  VX_ZONE_ERROR:[ownContextSendCmd:851] tivxEventWait() failed.
  3747.617267 s:  VX_ZONE_ERROR:[ownNodeKernelInit:538] Target kernel, TIVX_CMD_NODE_CREATE failed for node TIDLNode
  3747.617277 s:  VX_ZONE_ERROR:[ownNodeKernelInit:539] Please be sure the target callbacks have been registered for this core
  3747.617290 s:  VX_ZONE_ERROR:[ownNodeKernelInit:540] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
  3747.617301 s:  VX_ZONE_ERROR:[ownGraphNodeKernelInit:583] kernel init for node 0, kernel com.ti.tidl ... failed !!!
  3747.617314 s:  VX_ZONE_ERROR:[vxVerifyGraph:2055] Node kernel init failed
  3747.617324 s:  VX_ZONE_ERROR:[vxVerifyGraph:2109] Graph verify failed
  3747.617430 s:  VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:820] graph is not in a state required to be scheduled
  3747.617441 s:  VX_ZONE_ERROR:[vxProcessGraph:755] schedule graph failed
  3747.617447 s:  VX_ZONE_ERROR:[vxProcessGraph:760] wait graph failed
ERROR: Running TIDL graph ... Failed !!!
<...repeats ad nauseam...>

No additional messages are printed in dmesg beyond the initial boot (14 seconds in).

This is running the “with shared memory” (i.e., stock) device tree.

I’m working on debugging, but conveniently, TI has taken their forums offline for five days for maintenance. And I’m getting minimal hits in internet archive/Google cache/etc. So I might be SOL.

For reference, current dmesg
ai64-tidl-better-testing-dmesg.txt (52.2 KB)

Digging into this a bit, my initial guess was that it might be trying to communicate with one of the DSP cores and failing. This excerpt from the kernel log suggests the important ones are up and functioning:

[    9.845636]  remoteproc12#vdev0buffer: assigned reserved memory node vision-apps-c66-dma-memory@a9000000
[    9.845736]  remoteproc12#vdev0buffer: registered virtio0 (type 7)
[    9.845740] remoteproc remoteproc12: remote processor 4d80800000.dsp is now up
[    9.872252] remoteproc remoteproc13: powering up 4d81800000.dsp
[    9.872271] remoteproc remoteproc13: Booting fw image vision_apps_eaik/vx_app_rtos_linux_c6x_2.out, size 1461012
[    9.874445] k3-dsp-rproc 4d81800000.dsp: booting DSP core using boot addr = 0xa9d9a000
[    9.885563]  remoteproc13#vdev0buffer: assigned reserved memory node vision-apps-c66-dma-memory@a8000000
[    9.885667]  remoteproc13#vdev0buffer: registered virtio1 (type 7)
[    9.885671] remoteproc remoteproc13: remote processor 4d81800000.dsp is now up
[    9.941591] omap_rng 4e10000.rng: Random Number Generator ver. 241b34c
[   10.191006] remoteproc remoteproc14: powering up 64800000.dsp
[   10.191020] remoteproc remoteproc14: Booting fw image vision_apps_eaik/vx_app_rtos_linux_c7x_1.out, size 13242432
[   10.191075] remoteproc remoteproc14: unsupported resource 65538
[   10.195258] k3-dsp-rproc 64800000.dsp: booting DSP core using boot addr = 0xaa200000
[   10.209698]  remoteproc14#vdev0buffer: assigned reserved memory node vision-apps-c71-dma-memory@aa000000
[   10.209812]  remoteproc14#vdev0buffer: registered virtio2 (type 7)
[   10.209817] remoteproc remoteproc14: remote processor 64800000.dsp is now up

The only issue I see here is the “remoteproc remoteproc14: unsupported resource 65538”.

I was hoping there’d be an environment variable or similar for enabling more verbose logging, but it seems it requires in-process API calls: TIOVX User Guide: Debug Tools for TIOVX

Logs from the remote cores:

remote_proc_logs.txt (31.0 KB)

(captured via /opt/vision_apps/vx_app_arm_remote_log.out tool)

This appears to be the key:

[C7x_1 ]    278.608964 s:  VX_ZONE_ERROR:[tivxKernelTIDLCreate:644] Network version - 0x20220823, Expected version - 0x20211201

So it looks like the TIDL firmware on this image expects an older network format (0x20211201) than the one my models were exported with (0x20220823). I thought I had checked out an appropriate version. I’ll look into this.
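Those version constants appear to simply be dates spelled out in the hex digits — that’s my reading, not a documented TI format, but it makes the mismatch easy to eyeball:

```python
def tidl_version_date(version: int) -> str:
    """Render a TIDL network-version constant as the YYYYMMDD date its
    hex digits appear to encode. (My interpretation of the constants,
    not a documented TI format.)"""
    s = f"{version:08x}"
    return f"{s[0:4]}-{s[4:6]}-{s[6:8]}"

print(tidl_version_date(0x20220823))  # network from the model zoo: 2022-08-23
print(tidl_version_date(0x20211201))  # version the firmware expects: 2021-12-01
```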

Ok! I’ve gotten the benchmarks to run… well, kind of.

Firstly, here’s my setup, modified from the original post. No promises on it being perfect but it works well enough for me.

The virtualenv ends up being a bit of a pain; it might be easier to get rid of it and install things globally.

Prerequisites

I had to use a 128GB SD card. A USB drive or M.2 SSD would probably also work. 64GB for my root file system plus all the benchmarking datasets and models was not enough.

I started with a fresh install of the following image, on an SD card. bbai64-debian-11.5-xfce-edgeai-arm64-2022-10-01-10gb.img.xz from: Debian 11.x (Bullseye) - Monthly Snapshots (ARM64).

System prep

Expand rootfs (if using SD card)

NOTE: Robert reports below that this is an inappropriate way to expand the SD card rootfs, and you should instead allow it to happen naturally through a reboot. I have not yet verified that this works.

The default rootfs is 10GB for this image. To expand it to use your full SD card:

wget https://raw.githubusercontent.com/RobertCNelson/boot-scripts/master/tools/grow_partition.sh
chmod +x grow_partition.sh
sudo ./grow_partition.sh
sudo reboot

Disable web servers

If unused, you can disable these:

sudo systemctl disable --now bb-code-server
sudo systemctl disable --now nodered

TODO: Disable DLR “phone home”

Running EdgeAI benchmark suite

/opt has some of these files already in the EdgeAI image, but not all of them and it isn’t clear what they’re for.

# Specific commits are my best guess to match TIDL 8.2 that seems to come with this image
git clone https://github.com/TexasInstruments/edgeai-benchmark.git && git -C edgeai-benchmark/ checkout 07fa801596d44ebd97c1e3b2a963faf374677808
git clone https://github.com/TexasInstruments/edgeai-modelzoo.git && git -C edgeai-modelzoo/ checkout 20ef897df41198201a88e6250901934466303b57

cd edgeai-benchmark

sudo apt update
sudo apt-get install python3-venv
pip install -U setuptools

# Note: benchmarks need global (pre-existing) install of onnxruntime. We'd like to use --system-site-packages here, but it breaks the "pycocotools" package. I haven't figured out how to work around it. So we hack it later.
python3 -m venv benchmarkenv
source ./benchmarkenv/bin/activate

# Required for onnx
sudo apt install protobuf-compiler
pip install --upgrade onnx

# Required for my "hacks" patch below
pip install requests

pip install -r requirements_evm.txt

# Manual step: open "benchmarkenv/pyvenv.cfg" in your editor of choice. Set "include-system-site-packages = true".
deactivate

# Manual step: apply the "edgeai-benchmark-hacks.patch" included below

# Root required for raw access to /dev/mem
sudo su
source ./benchmarkenv/bin/activate
./run_benchmarks_evm.sh
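The manual pyvenv.cfg step above can be scripted if you’re setting this up repeatedly. This is a sketch assuming the venv path from the steps above and the default `include-system-site-packages = false` line that `python -m venv` writes:

```shell
# Automates the manual pyvenv.cfg edit above. "benchmarkenv" is the venv
# created earlier; `python -m venv` writes the key as "false" by default.
VENV=benchmarkenv
if [ -f "$VENV/pyvenv.cfg" ]; then
  sed -i 's/^include-system-site-packages = false$/include-system-site-packages = true/' "$VENV/pyvenv.cfg"
fi
```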

edgeai-benchmark-hacks.patch (1.9 KB)

I found the following issues in the EdgeAI setup which I had to manually correct using the patch above:

  • The NYU depth dataset’s host was down for a while. Now that it’s back, I got non-obvious hdf5 errors when loading that dataset. So I just disabled it.
  • Downloads of the ONNX models from TI’s servers fail with 403 errors, for reasons that are unclear. requests.get/curl/etc. work, but urllib.request.urlopen gets an error. It seems to be either a failure to handle their 302 redirects or a block on some user-agent signature. So I replaced it with (mostly) equivalent requests.get calls, which work fine.

Results

I have succeeded in running the first ~40 benchmarks. The results are these:

report_20221009-024150.csv (9.3 KB)

A few failed due to miscellaneous packages that I am missing (it seems TI’s requirements.txt skipped some required packages). More frustratingly, my board has a habit of resetting at inopportune times. I was originally using a generic 3A barrel power supply, on which it couldn’t get through the first one or two benchmarks; I swapped it out for a 3A USB-C supply and it got as far as I’ve posted here. It died at some point soon after I left for the day. I’ve ordered a Raspberry Pi branded USB-C supply to see if that improves reliability. I’m not sure I’ll bother to re-run the suite since I don’t care about the full set of results.

I was running with the stock image mentioned above and its default device tree, i.e. Linux only had ~2GB of RAM to work with. I used the default heatsink assembly but have a small fan pointing at it so the CPU has stayed very cool, around 33 degrees C according to the SoC’s internal sensors. I am not sure there’s any thermal throttling or shutdown logic enabled anyway, but I figure it’s worth mentioning.

The only official results of these benchmarks on the TDA4VM I’ve found are in this whitepaper: https://www.ti.com/lit/an/spracz2/spracz2.pdf. My numbers look to be about what TI reports. They claim 162 FPS for a ResNet-50 at 224x224 resolution, and 385 FPS for SSD MobileNet V1 at the same resolution. My benchmark run didn’t survive long enough to reach that particular MobileNet, but taking the reciprocal of the reported infer_time_core_ms gives 154 FPS for ResNet-50. That’s within 10 FPS of TI’s number, so it does seem I’ve managed to reproduce it.
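The FPS-from-latency arithmetic is just the reciprocal of the per-frame core time. The ~6.5 ms figure below is illustrative, back-derived from my 154 FPS result; the exact per-model value is in the attached CSV:

```python
def fps_from_infer_time(infer_time_core_ms: float) -> float:
    """Throughput implied by a single-frame core inference time."""
    return 1000.0 / infer_time_core_ms

# ~6.5 ms/frame is roughly what my run reported for ResNet-50
# (illustrative value; the exact figure is in the attached report CSV).
print(round(fps_from_infer_time(6.5)))  # 154 FPS, vs. TI's claimed 162
```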

Note that this is still using their own models and test harness, which is surely the most optimistic configuration one can use. I’ll try next with my own ONNX model and see how that performs in a more realistic app.


This is already built into the image, via these 2 systemd service files…

grow_partition.service 
resize_filesystem.service

They are pre-enabled for first bootup and require a system reboot; the old “grow_partition.sh” doesn’t know about the bbai64’s partition layout…

Regards,


Initial signs of life on running inference with TIDL. This software is a complete mess – if you look at it funny, it prints uninitialized memory contents to the terminal or segfaults when you return from main. It also segfaults any time it fails to open a file, dislikes the format of a file, or just wants to for the entertainment value. Their training code also crashes when running under Docker for unknown reasons.

Repo with minimal compilation+inference code: https://github.com/WasabiFan/tidl-custom-model-demo (WIP)

I haven’t actually demonstrated that the quantized model is performing appropriately, but at least it seems to run and output values. It takes around 7 ms, as measured from a Python caller, to run inference on a ResNet-50 quantized to 8-bit precision.
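For reference, the ~7 ms figure was measured with a median-of-N wall-clock loop around the session call, roughly like the sketch below. The helper itself is generic; the commented onnxruntime usage underneath is illustrative (model path and input name are placeholders, and the "TIDLExecutionProvider" string comes from TI’s onnxruntime fork, so treat it as an assumption):

```python
import statistics
import time

def time_inference(run_once, warmup: int = 5, iters: int = 50) -> float:
    """Median wall-clock latency of run_once() in milliseconds, after a
    few warmup calls (the first TIDL invocations are much slower)."""
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)

# On the AI-64, wrapped around an onnxruntime session (names illustrative):
#   sess = onnxruntime.InferenceSession(
#       "resnet50_int8.onnx",
#       providers=["TIDLExecutionProvider", "CPUExecutionProvider"])
#   ms = time_inference(lambda: sess.run(None, {"input": batch}))
```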


Thanks for the tip. On my first install I had rebooted a few times and the rootfs was still small, but perhaps that was an anomaly/transient error. I didn’t wait the second time I flashed a card before I did the above. I’ll keep this in mind the next time I set up a card.


@RobertCNelson Is producing an image which includes TIDL 8.4 still being looked into? Just wondering if it’s going to appear at some point and what the blockers are.

Talking with TIDL team at TI, they’d like us to wait till 8.5… 8.4 was mostly a cleanup, where they took things we learned from porting their 8.2 yocto stack to debian…

Regards,
