Enhanced Media Experience with AI-Powered Commercial Detection and Replacement

Yes, sure!
The BeagleBone AI-64 features C66x Digital Signal Processor (DSP) cores, which are well suited to accelerating deep-learning operations such as convolutions and matrix multiplications. Since our video classification model relies heavily on these operations, we’ll leverage the DSP cores to improve performance.
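
As a rough illustration, here is a minimal sketch of how inference could be offloaded through TI’s edgeai-tidl-tools stack, which exposes the SoC’s accelerators as an ONNX Runtime execution provider. The model file name, artifacts folder, and option keys below are assumptions; the exact options vary between tool releases, so treat this as a shape of the solution rather than working configuration:

```python
# Sketch: offloading ONNX inference to the TDA4VM's accelerators via
# TI's TIDL execution provider for ONNX Runtime. Assumes TI's
# edgeai-tidl-tools are installed and the model was already compiled
# into TIDL artifacts; option names differ between releases.
import numpy as np
import onnxruntime as ort

providers = ["TIDLExecutionProvider", "CPUExecutionProvider"]
provider_options = [
    {"artifacts_folder": "./tidl_artifacts"},  # pre-compiled TIDL artifacts
    {},                                        # CPU fallback needs no options
]

session = ort.InferenceSession(
    "commercial_classifier.onnx",              # hypothetical model file
    providers=providers,
    provider_options=provider_options,
)

# Dummy NCHW input just to exercise the session.
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: frame})
print(outputs[0].shape)
```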

2 Likes

Hello @jkridner, @lorforlinux, and the BeagleBoard community!
I’ve made some improvements to my proposal.
Could you kindly review it and share your thoughts?

PDF version: click here
Rendered version: click here

Your presentation is excellent; very good work on that.

@jkridner This person would be a good candidate for creating the docs that connect BB products to the end user.

2 Likes

Update → I have started learning GStreamer in a structured way from its documentation. Last week I began with the GObject basics, and I am now working through the basic GStreamer tutorials.

I will try to complete them by Sunday, and after that I’ll move on to the Plugin Writer’s Guide.
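
To give a flavour of what the basic tutorials cover, here is a minimal sketch (assuming PyGObject and GStreamer 1.x are installed) that builds and runs a test pipeline the same way the tutorials do:

```python
# Minimal sketch of the kind of pipeline the GStreamer basic tutorials
# cover: play a test video source until error or end-of-stream.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# gst_parse_launch builds a pipeline from a textual description,
# exactly as the basic tutorials demonstrate.
pipeline = Gst.parse_launch("videotestsrc ! videoconvert ! autovideosink")
pipeline.set_state(Gst.State.PLAYING)

# Block until an error or EOS message arrives, then clean up.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```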

2 Likes

Hello @jkridner and @lorforlinux,
Recently, I was exploring ResearchGate and came across the research paper “Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection”.
The authors use pre-trained CNN models such as MobileNetV2 to detect commercials at the shot level: audio features are extracted as Mel-spectrogram representations, and visual features are taken from video frames. However, their model and code are not open source.

Fusing audio and video features and then classifying the result was my first thought when I came across the commercial-detection problem, but I couldn’t find any reference for it, so I didn’t pursue it further.
That lack of references is why the methods in my proposal focus mostly on video features and pay little attention to audio. Now that I have a reference, I can try the fused approach (a rough sketch follows below); according to their paper, it gave better results than video features alone.
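
For my own notes, this is roughly how I picture the fused features, based only on my reading of the paper; this is not their code, and the libraries, shapes, and pooling choices are my assumptions:

```python
# Rough sketch of fused audio-visual features: MobileNetV2 embeddings
# for a frame, a pooled mel-spectrogram for the matching audio window,
# concatenated for a binary commercial/program classifier.
import librosa
import numpy as np
import tensorflow as tf

# Visual branch: pre-trained MobileNetV2 as a frozen feature extractor.
visual_net = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))

def visual_features(frame):            # frame: HxWx3 uint8 array
    x = tf.image.resize(frame, (224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    return visual_net(tf.expand_dims(x, 0))[0]          # (1280,)

def audio_features(wave, sr=16000):    # wave: mono samples for the shot
    mel = librosa.feature.melspectrogram(y=wave, sr=sr, n_mels=64)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    return mel_db.mean(axis=1)                          # (64,) pooled

def fused_features(frame, wave):
    return np.concatenate([visual_features(frame).numpy(),
                           audio_features(wave)])       # (1344,)
```

A small dense classifier on top of the concatenated vector would then give the shot-level commercial/program decision.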

I’ve reached out to the paper’s authors for access to their code and trained model. If they don’t respond promptly, I can implement the approach myself. Their model is trained specifically on Indonesian-language channels; I will try to build an English version of it.

What are your views on this model and the approach they have used?

Should I update my proposal on the site with the addition of this approach as well?

Thank you @jkridner and @lorforlinux and other GSOC mentors for believing in me. This summer is going to be fun!

4 Likes

It sounds promising.

I wouldn’t get overly married to any particular model, but rather focus on creating the infrastructure for training, validating, and utilizing a model.

Is it clear what I mean by that?

Training

  • Sources of training data
  • Methodologies for annotating the training data
  • Annotation of the provenance and legal obligations of using any particular training data (see the manifest sketch after this list)
  • Infrastructure for housing the data and feeding it to the training software
  • Execution infrastructure for the training itself
  • Collection of personally curated data based on representative usage (i.e., how do I collect data on the videos that I watch?)
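
One way to keep the provenance point concrete: a manifest entry per clip that carries the label, source, and licensing terms alongside the media, so the legal obligations travel with the data. This is only an illustration, and every field name here is made up:

```python
# Illustrative sketch of a per-clip training-data manifest entry that
# records label, source, annotator, and licensing alongside the media.
import json

manifest_entry = {
    "clip_id": "clip_000123",                # hypothetical identifiers
    "source_url": "https://example.org/recording",
    "label": "commercial",                   # or "program"
    "annotator": "aryan",
    "annotation_method": "manual-shot-level",
    "license": "CC-BY-4.0",
    "redistribution_allowed": True,
    "start_sec": 312.0,
    "end_sec": 342.5,
}

with open("manifest.jsonl", "a") as f:
    f.write(json.dumps(manifest_entry) + "\n")
```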

Validation

  • How do we automate the process of observing the model performance? (one possible sketch below)
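
For instance, something along these lines could run on a held-out labeled set after every training run; `model.predict` is a hypothetical interface standing in for whatever the model exposes:

```python
# Sketch: automated validation pass that logs precision/recall/F1 on a
# labeled hold-out set. Labels: 1 = commercial, 0 = program.
from sklearn.metrics import precision_recall_fscore_support

def evaluate(model, clips, labels):
    preds = [model.predict(c) for c in clips]    # hypothetical API
    p, r, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary")
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    return f1
```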

Utilization

  • How do we easily execute the trained model in a practical environment?
  • Can we introduce reinforcement training?
  • Can we capture “faults” to introduce new training data sets? (see the sketch below)
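
On the last point, capturing a fault could be as simple as logging the segment whenever the user’s behaviour contradicts the model (e.g., they un-mute during a flagged commercial), so it can be re-annotated and folded into the next training set. A sketch, with all names hypothetical:

```python
# Sketch: append a "fault" record whenever the user overrides the
# model, producing candidates for re-annotation and retraining.
import json, time

def record_fault(segment_path, predicted, corrected,
                 log_path="faults.jsonl"):
    entry = {
        "segment": segment_path,    # path to the saved clip
        "predicted": predicted,     # model's label
        "corrected": corrected,     # label implied by the user's action
        "timestamp": time.time(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```
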
1 Like

I will post the weekly progress report here.

Links to:

@Aryan_Nanda Congrats on GSOC24!

I saw Jason suggested possibly using a USB HDMI capture dongle.

Is this your planned input device? If so, do you know what device you will acquire? I may want to get the same input device so I can follow the project.

Best regards,
FredE

1 Like

Thanks @FredEckert!

Yes, this is the planned input device. As @KumarAbhishek mentioned in Discord, I will get a USB-HDMI dongle.
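
Once the dongle arrives, I expect the capture side to look roughly like this with GStreamer’s v4l2src. The device node and caps below are guesses until I can test on real hardware; `v4l2-ctl --list-devices` will report the actual ones:

```python
# Sketch: reading a USB HDMI capture dongle via v4l2src and showing it
# on screen. Device node and resolution depend on the dongle.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=1280,height=720 ! "
    "videoconvert ! autovideosink")
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```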

I’m glad to hear that you’re following the project! :blush:

Link to Blog Website

1 Like

@mentors Please review my presentation slides at this Canva link and provide your feedback.

@lorforlinux, @jkridner, @KumarAbhishek, do you have any feedback on my presentation slides, or should I go ahead and start making the intro video?

@Aryan_Nanda I have added some comments.

Thank you for your input, @lorforlinux! I’ve made the necessary updates to the slides. Now, I’ll move on to creating the introduction video.

In general, I’d say you could use less text on some of the slides that are very text-heavy and just hit key words. You can speak the full detail and also provide notes in the video comments.

Also, I’d remind people that this is meant to live on after you have completed your GSoC task. I hope that what you create can be customized by anyone to suit their needs and can be generally improved through community contributions.

1 Like

Alright!! Thanks for the feedback.