Weekly Progress Report Thread: Enhanced Media Experience with AI-Powered Commercial Detection and Replacement

This thread will contain my weekly progress report for the project Enhanced Media Experience with AI-Powered Commercial Detection and Replacement.

Links to:

Also, I have set up a Blog Website to document my progress, share my research, discuss the challenges I encounter and how I overcome them, and provide ‘how to’ guides. I’ll be posting biweekly blogs.


Since the GSoC coding period started on May 27th, I am considering this a Pre-GSoC update. I’ve set up my blog website, completed the intro video presentation, and had it reviewed by my mentors. Additionally, I created code repositories on GitHub and GitLab. Due to my ongoing exams, I haven’t made much progress on the coding part. I will catch up on this after my exams conclude next week.

For the upcoming week, I plan to make and upload the intro video, start with commercial data acquisition, and set up my BeagleBone AI-64. Although I am unsure how much time I can dedicate in the upcoming week due to my exams, I will strive to accomplish as much as possible.

Week 0 Updates:

  • Intro video done
  • Flashed the BeagleBone AI-64
  • Went through the documentation of the YouTube-8M dataset.

Blockers:

  • Getting errors when running the demo ML models included on the BeagleBone AI-64.

Upcoming week goals:

  • Getting the dataset ready.
  • Solving the error and running demo models.

Week 1 Updates:

  • Currently collecting datasets.
  • Extracted audio-visual features from approximately 1,400 commercial videos in the YouTube-8M dataset.
  • Resolved last week’s issue; I will use a monitor instead of VNC Viewer to run the demo ML models.
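
For illustration, here is a rough sketch of how frame-level audio-visual features can be parsed from a YouTube-8M TFRecord shard. The field names ("rgb", "audio", "labels") follow the public YouTube-8M frame-level schema; the function name and shape comments are my own, not from the project code.

```python
import tensorflow as tf

def parse_sequence_example(serialized):
    """Parse one frame-level YouTube-8M record into (rgb, audio, labels)."""
    context, sequences = tf.io.parse_single_sequence_example(
        serialized,
        context_features={"labels": tf.io.VarLenFeature(tf.int64)},
        sequence_features={
            "rgb": tf.io.FixedLenSequenceFeature([], tf.string),
            "audio": tf.io.FixedLenSequenceFeature([], tf.string),
        },
    )
    # Frame features are stored as quantized uint8 bytes, one string per frame.
    rgb = tf.io.decode_raw(sequences["rgb"], tf.uint8)      # (frames, 1024)
    audio = tf.io.decode_raw(sequences["audio"], tf.uint8)  # (frames, 128)
    return rgb, audio, tf.sparse.to_dense(context["labels"])
```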

Blockers:

  • Downloading the dataset is time-consuming due to its size (1.50 TB). I’m downloading one-tenth of the dataset at a time to manage this.

Next Week Goals:

  • Complete dataset collection (both Commercial and Non-Commercial).
  • Begin preparing the model pipeline.

Links to commit:

Minutes of Meeting (17-06-2024)

Attendees:

Key Points:

  • I gave updates on Dataset Collection and Feature Extraction.
  • The necessary cables are currently out for delivery.
  • Alternative Dataset Collection Method: @lorforlinux suggested an alternative method for dataset collection using a set-top box, in case the YouTube-8M dataset provides insufficient accuracy for real-time media processing.
  • He also proposed a video pipeline:
    • Input from the set-top box to the laptop via an HDMI-to-USB cable.
    • Processing on the BeagleBone AI-64.
    • Display output on the laptop using a mini-DP-to-HDMI cable.
  • Model Training: He recommended using Google Colab for model training so that all mentors could get involved.
  • Discussion on Google Colab’s paid services:
    • I need to create a Google Colab notebook and share it with @lorforlinux.
    • He will check if he can use his Google Colab paid services on that notebook and make it available for all.
    • If not, I will purchase the paid services myself and be reimbursed.

Week 2 Updates:

  • Dataset collection done
  • Started dataset preprocessing (merged audio-visual features).
  • Week 0-1 Blog out - Link
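
The feature-merging step amounts to concatenating each frame’s visual and audio vectors. A minimal sketch, assuming YouTube-8M’s 1024-d RGB and 128-d audio features (variable names are illustrative):

```python
import numpy as np

def merge_features(rgb: np.ndarray, audio: np.ndarray) -> np.ndarray:
    """rgb: (frames, 1024), audio: (frames, 128) -> (frames, 1152)."""
    assert rgb.shape[0] == audio.shape[0], "frame counts must match"
    # Stack along the feature axis so each frame gets one combined vector.
    return np.concatenate([rgb, audio], axis=1)
```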

Blockers:

While I was downloading the dataset in chunks and continuously appending to the same file, the Jupyter notebook kernel restarted while processing the second-to-last chunk, corrupting the file. As a result, I had to re-download the entire set of commercial features.
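
In hindsight, a chunk-per-file pattern avoids this failure mode: each chunk is written atomically to its own file, so a kernel restart only loses the in-flight chunk. A sketch (paths and helper names are mine, not from the project code):

```python
import os
import numpy as np

def save_chunk(features: np.ndarray, idx: int, out_dir: str = "chunks") -> None:
    """Write one downloaded chunk atomically to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    tmp_path = os.path.join(out_dir, f"chunk_{idx}.tmp")
    with open(tmp_path, "wb") as f:
        np.save(f, features)
    # Atomic rename: a half-written file never shadows a complete one.
    os.replace(tmp_path, os.path.join(out_dir, f"chunk_{idx}.npy"))

def merge_chunks(n_chunks: int, out_dir: str = "chunks") -> np.ndarray:
    """Concatenate all chunks once every download has finished."""
    return np.concatenate(
        [np.load(os.path.join(out_dir, f"chunk_{i}.npy")) for i in range(n_chunks)],
        axis=0,
    )
```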

Next Week Goals:

  • Complete dataset preprocessing.
  • Begin writing the model code, starting with a simple model (SVM), as I already have audio-visual features.
  • Implement a CI/CD pipeline to ensure the code is reproducible.
  • Connect the BeagleBone AI-64 to the monitor and run demo models on it.
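
The planned SVM baseline could look roughly like this: mean-pool each video’s frame features into a single vector, then fit a binary commercial / non-commercial classifier with scikit-learn. All data here is random, just to show the shapes and pipeline; nothing below is the project’s actual code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1152))   # one mean-pooled feature vector per video
y = rng.integers(0, 2, size=200)   # 1 = commercial, 0 = non-commercial

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```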

@Aryan_Nanda

  1. Set-top box HDMI output to the input of an HDMI-to-USB converter connected to one of the USB ports of the BeagleBone AI-64 or a laptop, to record the training dataset.
  2. Upload the dataset to your Google Colab Pro account storage and train your model.
  3. When you run inference on the BeagleBone AI-64, connect the set-top box or media player HDMI to the input of the HDMI-to-USB converter, and that to your BeagleBone AI-64 for inferencing. Use an active mini-DP-to-HDMI cable from your BeagleBone AI-64 to a monitor to see the output generated by your model.

For the 1st step, you can also use an HDMI-to-CSI converter with the BeagleBone AI-64 to record the dataset. For the 3rd step, use the same HDMI-to-CSI setup to get the video feed, then connect the active mini-DP-to-HDMI to your HDMI-to-USB and that to your laptop, so that you don’t need an additional monitor.


The issue I’ve found with that is capturing the audio. I’ve got one of those converters, and setting up audio capture looks a bit painful. Audio capture will be easier using the USB drivers. With a SuperSpeed HDMI-to-USB converter, I think there will be enough bandwidth.

In that case, I can ship another HDMI to USB to @Aryan_Nanda so that he can use his laptop to monitor the output.

Minutes of Meeting (24-06-2024)

Attendees:

Key Points:

  • Provided updates on the model: completed dataset preprocessing and merged audio-visual features. Currently working on generating uniform features and converting them to np.array() format for model input.
  • Purchased a Google Colab Pro account and received reimbursement.
  • @jkridner explained the quantization process and shared this reference link.
  • Discussed the next steps for BeagleBone AI-64, starting with this guide.
  • He also shared this webinar link on BeagleBone AI-64.
  • Spent 30 minutes attempting to resolve the CI pipeline failure issue mentioned on Discord.

(All links pasted here for future reference)

Week 3 Updates:

  • Completed dataset preprocessing, including padding and trimming to four different video sequence lengths.
  • Generated uniform features and converted them into ndarray.
  • Preprocessed the model by standardizing the data, performing a train-test split, and adding labels.
  • Shared the model training notebook with mentors.
  • Booted BeagleBone AI-64 on the monitor and ran demo ML models.
  • Upgraded to Google Colab Pro.
  • Added GitLab CI pipeline.
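
The padding/trimming and train-test-split steps above can be sketched as follows. The 300-frame sequence length is an assumption for illustration (the project tried four different lengths), and the helper names are mine:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def pad_or_trim(video: np.ndarray, seq_len: int) -> np.ndarray:
    """(frames, feat_dim) -> (seq_len, feat_dim), zero-padded at the end."""
    if len(video) >= seq_len:
        return video[:seq_len]
    pad = np.zeros((seq_len - len(video), video.shape[1]), video.dtype)
    return np.concatenate([video, pad], axis=0)

def make_dataset(videos, labels, seq_len=300):
    """Stack variable-length videos into one ndarray and split train/test."""
    X = np.stack([pad_or_trim(v, seq_len) for v in videos])
    y = np.asarray(labels)
    return train_test_split(X, y, test_size=0.2, random_state=42)
```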

Blockers:

  • The CI pipeline is currently failing and throwing errors. I plan to resolve it by the weekend.

Next Week’s Goals:

  • Develop the model pipeline, starting with SVM and then moving to LSTMs.
  • Train the model.
  • Evaluate the model on test data and share the results with mentors.
  • Go through the links provided in the last meeting and get more familiar with the BeagleBone AI-64.
  • Resolve the CI pipeline issue.
  • Week 2-3 Blog.

Important Commits:

Minutes of Meeting (01-07-2024)

Attendees:

Key Points:

  • I will attempt to SSH into the SRA lab GPU, an Nvidia GeForce RTX 4090 machine with 64 GB of RAM. If unsuccessful, I will purchase more compute units for my Google Colab Pro account.
  • With one model ready, the mentors recommended starting inference on the BeagleBone AI-64. Therefore, I will begin the inferencing process in Week 5 (one week ahead of the proposed timeline! :slight_smile: )
  • We walked through the dataset preprocessing and model pipeline, and discussed the results of the confusion matrix.

Week 4 Updates:

  • Implemented unidirectional LSTMs, bidirectional LSTMs, CNNs, and an LSTM+CNN model.
  • Trained different models and evaluated them.
  • Best accuracy on test data is given by bidirectional LSTMs (97.1%).
  • Confusion Matrix of this model:
  • Bought 500 more compute units in Google Colab.
  • Created a Dockerfile and built an image that provides conda functionality along with CUDA drivers, and pushed it here. This image will be used as the base image in the CI pipeline.
  • Also resolved last week’s CI pipeline failure by adding a few steps before running the main script.
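
For reference, a bidirectional-LSTM classifier along the lines described above might look like this in Keras. Layer sizes, dropout, and the input shape are assumptions; only the overall architecture follows the report:

```python
import tensorflow as tf

def build_bilstm(seq_len: int = 300, feat_dim: int = 1152) -> tf.keras.Model:
    """Binary commercial / non-commercial classifier over frame sequences."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, feat_dim)),
        # Reads the sequence in both directions before classifying.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(commercial)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```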

Blockers:
TensorFlow is unable to use the GPU in the CI pipeline because cuDNN is missing. I will add cuDNN to the Docker image under a new tag.

Next Week’s Goals:

  • Generate model artifacts by referring to forum threads and Edge AI TIDL Tools.
  • Import the trained model to the BeagleBone AI-64 by referring to the Import Custom Models guide.
  • Run inferencing on the BeagleBone AI-64.
  • Get into the details of Edge AI TIDL Tools (TI Model Zoo, quantization process, etc.).
  • Train the transformers model.

Important Commits

Week 2-3 Blog out - Dataset Preprocessing
(This was missed in the weekly updates)

Minutes of Meeting (08-07-2024)

Attendees:

Key Points:

  • Hardware requirement: a 5 m long Ethernet cable.
  • After training the different models, the best test accuracy was achieved by the bidirectional LSTM model (97%), but TIDL doesn’t support compilation of LSTMs, so I will instead start inferencing on the BeagleBone AI-64 with the CNN model, which gave an accuracy of 94%.
    Confusion matrix of the CNN model:
  • I will start with the compilation process, in which the DNN gets imported into TI’s internal exchange format. This stage also performs quantization of the DNN, calls TI’s DNN graph compiler, and prepares the execution plan of the DNN on the target SoC.

    Source of above image
  • Before this, I will first convert the model into .tflite format.

Week 5 Updates:

  • Resolved the TensorFlow-not-detecting-CUDA error in the CI pipeline by installing TensorFlow with GPU dependencies (cuDNN).
  • Went through the TI Deep Learning Product User Guide. Also learned about different quantization methods: post-training quantization and quantization-aware training.
  • Trained a Transformer model (it gave an accuracy of 94%); I wanted to check whether it gives better accuracy than the other models.
  • Exported the trained CNN model in .keras format.
  • Converted the Keras model into .tflite and ran inference on the .tflite model with test data.
  • Started compilation process.
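
The Keras-to-.tflite conversion and inference check described above can be sketched like this, with a tiny stand-in model in place of the real CNN (shapes and layer sizes are illustrative):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in for the trained CNN.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(60, 128)),
    tf.keras.layers.Conv1D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Convert the Keras model to a TFLite flatbuffer.
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Run a single test sample through the TFLite interpreter.
interp = tf.lite.Interpreter(model_content=tflite_bytes)
interp.allocate_tensors()
inp = interp.get_input_details()[0]
out = interp.get_output_details()[0]
interp.set_tensor(inp["index"], np.zeros((1, 60, 128), np.float32))
interp.invoke()
pred = interp.get_tensor(out["index"])  # shape (1, 1), a probability
```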

Blockers:
None.

Next Week Goals:

The compilation process involves:

  1. Parsing the model to generate a model in a uniform exchange format.
  2. Optimization, which involves multiple graph-level transformations and optimizations.
  3. Quantization of the floating-point model to fixed point.
  4. Graph compilation to plan the execution order of the layers.
  5. Graph visualization to create a visualization of the graph.

After compilation, several artifacts are generated; using these artifacts, the DNN can be run optimally on TI SoCs.
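
Step 3 has a rough analogue in TFLite post-training quantization, where a representative dataset lets the converter calibrate fixed-point ranges, similar in spirit to the calibration TIDL performs during compilation. A sketch with a stand-in model and random calibration data (none of this is TIDL’s own API):

```python
import numpy as np
import tensorflow as tf

# Stand-in float model to be quantized.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(60, 128)),
    tf.keras.layers.Conv1D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def representative_data():
    # A handful of calibration samples; real code would use training data.
    for _ in range(10):
        yield [np.random.rand(1, 60, 128).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
quantized_bytes = converter.convert()  # fixed-point TFLite flatbuffer
```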

Week 6 Updates:

Blockers:

Next Week Goals:

  • Completing the compilation process.
  • Running inference with the compiled model on the BeagleBone AI-64.