I’ve been experimenting with training models for TIDL. As far as I can tell, TI has no end-to-end documentation for this process. The various sample repos are omnibus “do-it-all” scripts, which aren’t practical as a personal reference.
I have put together a collection of scripts and associated notes which is minimal and self-contained. It goes through the process of training a YOLOv5 detection model, compiling it with the TIDL tools, and running it on a BB AI-64.
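For context, the compile step boils down to creating an onnxruntime session with TI’s compilation provider and pushing calibration images through it. Here’s a rough sketch; the file paths are placeholders, real invocations set more options (quantization, calibration parameters) than I’ve shown, and the provider/option names follow TI’s edgeai-tidl-tools examples as best I recall:

```python
import os
import numpy as np
import onnxruntime as ort

# Requires TI's fork of onnxruntime (bundled with tidl_tools), not stock ORT.
compile_options = {
    "tidl_tools_path": os.environ["TIDL_TOOLS_PATH"],  # extracted tidl_tools directory
    "artifacts_folder": "artifacts",                   # output directory for compiled artifacts
}
sess = ort.InferenceSession(
    "yolov5s6_640.onnx",  # placeholder: your exported model
    providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
    provider_options=[compile_options, {}],
)

# Running representative inputs through the session drives calibration and
# writes the compiled artifacts into artifacts_folder.
input_name = sess.get_inputs()[0].name
for _ in range(8):  # a handful of calibration frames; use real images in practice
    frame = np.random.rand(1, 3, 640, 640).astype(np.float32)
    sess.run(None, {input_name: frame})
```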
I’d be happy to discuss my learnings or provide help where I can. Feel free to open issues or comment here with questions or suggestions. Everything I’ve written is based on experimentation, so if anything is wrong or could be improved please do let me know.
I’ve been running the smallest models, i.e. yolov5s6. I’m timing from Python (run inference in a loop 2000 times, then divide the wall-clock time taken by 2000). At 640x640 I was getting an average of 15ms, and at 384x384 it’s around 7.5ms. I haven’t tried C++ to see closer-to-native performance. And as I note in the README, the number of detections affects the time taken due to postprocessing; I’ve seen a swing of 1ms or so from this.
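For concreteness, the timing loop is essentially the following (model path and input shape are placeholders, and I’m showing the plain CPU provider; on the board the session is created with the TIDL execution provider instead):

```python
import time
import numpy as np
import onnxruntime as ort

# Placeholder model and shape; on the BB AI-64, swap in TIDLExecutionProvider
# (pointed at the compiled artifacts) to measure accelerated inference.
sess = ort.InferenceSession("yolov5s6_384.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
frame = np.random.rand(1, 3, 384, 384).astype(np.float32)

for _ in range(10):  # warm-up runs, excluded from the measurement
    sess.run(None, {input_name: frame})

n = 2000
start = time.perf_counter()
for _ in range(n):
    sess.run(None, {input_name: frame})
print(f"average per-inference time: {(time.perf_counter() - start) / n * 1000:.2f} ms")
```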
PSA: I’ve updated the samples to include a C++ inference app. It mirrors the Python implementation, printing the time per forward pass and drawing boxes on images. I used onnxruntime, the same API as is available in Python.
There is also a “tidlrt” available in C++, which may be faster, but I haven’t tried it. Since it’s TI’s proprietary API, it’s probably harder to use; at a glance, it doesn’t look pleasant.
Ah, yeah, I forget the details on why the example files don’t include this key, but the solution is to add it to your dataset YAML file where it specifies augmentations:
Is this from the training script, export script, or compilation script? And are you able to run the model on a board/compile it/etc. despite these errors (e.g. does it actually output the files it’s supposed to)?
I don’t recall seeing this error before, but I know the export script prints a handful of similar ones when it can’t find modules that implement extra (unnecessary) features, so it skips them. I might just be forgetting that CoreML is among them. Certainly, I don’t think CoreML is involved in this pipeline. So if the model still works, I’m confident it’s fine; if not, there’s probably some minor tweak needed to avoid using that module at all.
Thank you very much for putting this together; I have been struggling to run object detection on the BB AI-64.
I have modified your code a little bit to run pre-compiled models from TIDL.
I am currently able to run ONR-OD-8040-ssd-lite-regNetX-200mf-fpn-bgr-mmdet-coco-320x320 from TIDL model_zoo.
However, if I try to run any other model, for example ONR-OD-8030-ssd-lite-mobv2-fpn-mmdet-coco-512x512, ONR-OD-8050-ssd-lite-regNetX-800mf-fpn-bgr-mmdet-coco-512x512, or ONR-OD-8230-yolox-m-lite-mmdet-coco-640x640, the BB AI-64 crashes and my terminal hangs without giving any error or warning.
Basically, I am not able to run any model with an input size larger than 320x320. I guess there is some memory problem that I am not able to figure out.
This was also the case when I was running edge_ai_app from TIDL. But your code at least made it possible to run the 320x320 model.
Note: I am using the latest BB AI-64 image from here
Please help me, as I have been struggling with this for a long time.
EDIT: here is the output in the hung terminal before the crash:
That’s unfortunate. I have been able to run 640x640 YOLOv5 models, but haven’t otherwise experimented with larger ones or different architectures.
It could be a memory capacity issue, but it’s hard to say. My experience has been similar in that the software is buggy and sometimes crashes, but usually it’s not reproducible.
I see you have GStreamer and other additions in the script. Perhaps try removing as much as possible aside from the inference itself to isolate the problem, and confirm that the errors printed aren’t an issue.
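As a starting point for isolating it, I’d strip the script down to something like this (no GStreamer, no drawing; the paths are placeholders, and the “artifacts_folder” option name is from memory of TI’s examples, so double-check it):

```python
import numpy as np
import onnxruntime as ort

# Minimal repro: just the accelerated forward pass in a loop, nothing else.
runtime_options = {"artifacts_folder": "/path/to/model-artifacts"}  # compiled TIDL artifacts
sess = ort.InferenceSession(
    "/path/to/model.onnx",
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[runtime_options, {}],
)

inp = sess.get_inputs()[0]
# Substitute 1 for any symbolic dimensions; some models may expect uint8 input instead.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

for i in range(100):
    sess.run(None, {inp.name: dummy})
    print(f"iteration {i} OK")  # shows how far it gets before any hang
```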
Can you post a new thread so that beagleboard.org folks see it and can provide assistance? Also, it would probably be helpful to check the serial console to see if there are any kernel panic messages, but I understand most people don’t have the hardware handy.
Have you set both TIDL_TOOLS_PATH and LD_LIBRARY_PATH? I expect you need both. You can append them to your .bashrc as shown (so they run in all new terminals) or just run the two export commands directly in the active terminal for the current session, like you showed for the first one.
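For reference, the two exports look something like this, with the path adjusted to wherever you extracted the tools (the location here is just an example):

```bash
# Adjust the path to wherever you extracted tidl_tools; both variables should point at it.
export TIDL_TOOLS_PATH="$HOME/tidl_tools"
export LD_LIBRARY_PATH="$TIDL_TOOLS_PATH:$LD_LIBRARY_PATH"
```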
If that doesn’t work, a screenshot including both export commands would be helpful.
Alternatively, in the last few weeks I figured out how to get Docker working for model compilation. I haven’t updated the main branch since I’m hoping to do that along with upgrading TI’s tools version, but the instructions in this unpublished version should hopefully work on the existing main branch: GitHub - WasabiFan/tidl-yolov5-custom-model-demo at edgeai-8-6
Docker will avoid the need to fiddle with the tools on your host machine.
It sounds like the value set for TIDL_TOOLS_PATH isn’t a correct path to the extracted tidl_tools directory. Did you modify the value I showed to match where you extracted it on your machine?
Interesting. Yeah, that would be an issue. Can you download the tar archive in a browser instead? Paste the URL into the browser rather than using wget, to see if there’s some other access error. The file size looks correct, so I suspect it’s fine, but it’s worth a shot.