Enhanced Media Experience with AI-Powered Commercial Detection and Replacement

Yes, sure!
The BeagleBone AI-64 features C66x Digital Signal Processor (DSP) cores, which are well suited to accelerating deep-learning operations such as convolutions and matrix multiplications. Since our video classification model relies heavily on these operations, we’ll leverage the DSP cores to improve performance.
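
As a rough illustration, here is a minimal sketch of how inference could be offloaded through TI’s edgeai-tidl-tools stack, which exposes the SoC’s accelerators as an ONNX Runtime execution provider. The model file name, artifacts folder, and option keys below are assumptions; the exact options vary between tool releases, so treat this as a shape of the solution rather than working configuration:

```python
# Sketch: offloading ONNX inference to the TDA4VM's accelerators via
# TI's TIDL execution provider for ONNX Runtime. Assumes TI's
# edgeai-tidl-tools are installed and the model was already compiled
# into TIDL artifacts; option names differ between releases.
import numpy as np
import onnxruntime as ort

providers = ["TIDLExecutionProvider", "CPUExecutionProvider"]
provider_options = [
    {"artifacts_folder": "./tidl_artifacts"},  # pre-compiled TIDL artifacts
    {},                                        # CPU fallback needs no options
]

session = ort.InferenceSession(
    "commercial_classifier.onnx",              # hypothetical model file
    providers=providers,
    provider_options=provider_options,
)

# Dummy NCHW input just to exercise the session.
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: frame})
print(outputs[0].shape)
```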

2 Likes

Hello @jkridner, @lorforlinux, and the BeagleBoard community!
I’ve made some improvements to my proposal.
Could you kindly review it and share your thoughts?

PDF version: click here
Rendered version: click here

Your presentation is excellent; very good work on that.

@jkridner This person would be a good candidate for creating the docs that connect BB products to the end user.

2 Likes

Update → I have started learning GStreamer in a structured way from its documentation. Last week I began with the GObject basics, and I am now working through the basic GStreamer tutorials.

I will try to complete them by Sunday, and after that I’ll move on to the Plugin Writer’s Guide.
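
To give a flavour of what the basic tutorials cover, here is a minimal sketch (assuming PyGObject and GStreamer 1.x are installed) that builds and runs a test pipeline the same way the tutorials do:

```python
# Minimal sketch of the kind of pipeline the GStreamer basic tutorials
# cover: play a test video source until error or end-of-stream.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# gst_parse_launch builds a pipeline from a textual description,
# exactly as the basic tutorials demonstrate.
pipeline = Gst.parse_launch("videotestsrc ! videoconvert ! autovideosink")
pipeline.set_state(Gst.State.PLAYING)

# Block until an error or EOS message arrives, then clean up.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```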

2 Likes

Hello @jkridner and @lorforlinux,
Recently, I was exploring ResearchGate and came across the research paper “Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection”.
The authors use pre-trained CNN models such as MobileNetV2 to detect commercials at the shot level: audio features are extracted as Mel-spectrogram representations, and visual features are taken from video frames. However, their model and code are not open source.

Fusing audio and video features and then classifying the result was my first thought when I came across the commercial-detection problem, but I couldn’t find any reference for it, so I didn’t pursue it further.
That lack of references is why the methods in my proposal focus mostly on video features and pay little attention to audio. Now that I have a reference, I can try the fused approach (a rough sketch follows below); according to their paper, it gave better results than video features alone.
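
For my own notes, this is roughly how I picture the fused features, based only on my reading of the paper; this is not their code, and the libraries, shapes, and pooling choices are my assumptions:

```python
# Rough sketch of fused audio-visual features: MobileNetV2 embeddings
# for a frame, a pooled mel-spectrogram for the matching audio window,
# concatenated for a binary commercial/program classifier.
import librosa
import numpy as np
import tensorflow as tf

# Visual branch: pre-trained MobileNetV2 as a frozen feature extractor.
visual_net = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))

def visual_features(frame):            # frame: HxWx3 uint8 array
    x = tf.image.resize(frame, (224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    return visual_net(tf.expand_dims(x, 0))[0]          # (1280,)

def audio_features(wave, sr=16000):    # wave: mono samples for the shot
    mel = librosa.feature.melspectrogram(y=wave, sr=sr, n_mels=64)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    return mel_db.mean(axis=1)                          # (64,) pooled

def fused_features(frame, wave):
    return np.concatenate([visual_features(frame).numpy(),
                           audio_features(wave)])       # (1344,)
```

A small dense classifier on top of the concatenated vector would then give the shot-level commercial/program decision.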

I’ve reached out to the paper’s authors for access to their code and trained model. If they don’t respond promptly, I can implement the approach myself. Their model is trained specifically on Indonesian-language channels; I will try to build an English version of it.

What are your views on this model and the approach they have used?

Should I update my proposal on the site with the addition of this approach as well?

Thank you @jkridner and @lorforlinux and other GSOC mentors for believing in me. This summer is going to be fun!

4 Likes

It sounds promising.

I wouldn’t get overly married to any particular model, but rather focus on creating the infrastructure for training, validating, and utilizing a model.

Is it clear what I mean by that?

Training

  • Sources of training data
  • Methodologies for annotating the training data
  • Annotation of the provenance and legal obligations of using any particular training data (see the manifest sketch after this list)
  • Infrastructure for housing the data and feeding it to the training software
  • Execution infrastructure for the training itself
  • Collection of personally curated data based on representative usage (i.e., how do I collect data on the videos that I watch?)
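
One way to keep the provenance point concrete: a manifest entry per clip that carries the label, source, and licensing terms alongside the media, so the legal obligations travel with the data. This is only an illustration, and every field name here is made up:

```python
# Illustrative sketch of a per-clip training-data manifest entry that
# records label, source, annotator, and licensing alongside the media.
import json

manifest_entry = {
    "clip_id": "clip_000123",                # hypothetical identifiers
    "source_url": "https://example.org/recording",
    "label": "commercial",                   # or "program"
    "annotator": "aryan",
    "annotation_method": "manual-shot-level",
    "license": "CC-BY-4.0",
    "redistribution_allowed": True,
    "start_sec": 312.0,
    "end_sec": 342.5,
}

with open("manifest.jsonl", "a") as f:
    f.write(json.dumps(manifest_entry) + "\n")
```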

Validation

  • How do we automate the process of observing the model performance? (one possible sketch below)
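
For instance, something along these lines could run on a held-out labeled set after every training run; `model.predict` is a hypothetical interface standing in for whatever the model exposes:

```python
# Sketch: automated validation pass that logs precision/recall/F1 on a
# labeled hold-out set. Labels: 1 = commercial, 0 = program.
from sklearn.metrics import precision_recall_fscore_support

def evaluate(model, clips, labels):
    preds = [model.predict(c) for c in clips]    # hypothetical API
    p, r, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary")
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    return f1
```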

Utilization

  • How do we easily execute the trained model in a practical environment?
  • Can we introduce reinforcement training?
  • Can we capture “faults” to introduce new training data sets? (see the sketch below)
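
On the last point, capturing a fault could be as simple as logging the segment whenever the user’s behaviour contradicts the model (e.g., they un-mute during a flagged commercial), so it can be re-annotated and folded into the next training set. A sketch, with all names hypothetical:

```python
# Sketch: append a "fault" record whenever the user overrides the
# model, producing candidates for re-annotation and retraining.
import json, time

def record_fault(segment_path, predicted, corrected,
                 log_path="faults.jsonl"):
    entry = {
        "segment": segment_path,    # path to the saved clip
        "predicted": predicted,     # model's label
        "corrected": corrected,     # label implied by the user's action
        "timestamp": time.time(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```
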
1 Like

I will post the weekly progress report here.

Links to:

@Aryan_Nanda Congrats on GSOC24!

I saw Jason suggested possibly using a USB HDMI capture dongle.

Is this your planned input device? If so, do you know what device you will acquire? I may want to get the same input device so I can follow the project.

Best regards,
FredE

1 Like

Thanks @FredEckert!

Yes, this is the planned input device. As @KumarAbhishek mentioned in Discord, I will get a USB-HDMI dongle.
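
Once the dongle arrives, I expect the capture side to look roughly like this with GStreamer’s v4l2src. The device node and caps below are guesses until I can test on real hardware; `v4l2-ctl --list-devices` will report the actual ones:

```python
# Sketch: reading a USB HDMI capture dongle via v4l2src and showing it
# on screen. Device node and resolution depend on the dongle.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=1280,height=720 ! "
    "videoconvert ! autovideosink")
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```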

I’m glad to hear that you’re following the project! :blush:

Link to Blog Website

1 Like

@mentors Please review my presentation slides at this Canva link and provide your feedback.

@lorforlinux, @jkridner, @KumarAbhishek, do you have any feedback on my presentation slides, or should I go ahead and start making the intro video?

@Aryan_Nanda I have added some comments.

Thank you for your input, @lorforlinux! I’ve made the necessary updates to the slides. Now, I’ll move on to creating the introduction video.

In general, I’d say you could use less text on some of the slides that are very text-heavy and just hit key words. You can speak the full detail and also provide notes in the video comments.

Also, I’d remind people that this is meant to live on after you have completed your GSoC task. I hope that what you create can be customized by anyone to suit their needs and can be generally improved through community contributions.

1 Like

Alright!! Thanks for the feedback.