Embedded differentiable logic gate networks for real-time interactive and creative applications

Goal:

Develop an embedded machine learning system on BeagleBoard that leverages Differentiable Logic (DiffLogic) for real-time interactive music creation and environment sensing. The system will enable on-device learning, fine-tuning, and efficient processing for applications in new interfaces for musical expression (see http://nime.org).

Hardware Skills:

Audio and sensor I/O with Bela (http://bela.io)

Software Skills:

Machine learning, deep learning, BeagleBone PRU programming

Possible Mentors:

@jarm, Chris Kiefer (see bios and links below)

Expected size of project:

350 hours

Rating:

medium

Upstream Repository:

Early experiments using DiffLogic on Bela: https://github.com/jarmitage/DiffLogicBela (differentiable logic gate networks on the Bela platform)

See also examples of compiled models deployed on the RP2040: https://github.com/lnfiniteMonkeys/uSEQ

References:

Project Overview

This project explores the potential of embedded AI for NIME, specifically using Differentiable Logic (DiffLogic), by creating a system that can perform tasks like machine listening, sensor processing, sound and gesture classification, and generative AI. The focus will be on leveraging the BeagleBoard’s hardware, in particular its PRU, for fast, efficient FFT and MFCC computation in real-time interactive applications.
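At the core of DiffLogic is a continuous relaxation of Boolean gates: during training, each node is a softmax-weighted mixture of candidate two-input gates, and after training the winning gate is hardened into pure bitwise logic, which is what makes the deployed networks so cheap to run on embedded hardware. A minimal, hedged sketch in plain Python (this shows only a subset of the 16 two-input gates, and the names and structure here are illustrative, not the actual difflogic library API):

```python
# Sketch of the continuous logic-gate relaxations used in differentiable
# logic gate networks (after Petersen et al.). With a, b in [0, 1], each
# relaxation agrees with the Boolean gate at the corners {0, 1}.
import math

GATES = {
    "FALSE": lambda a, b: 0.0,
    "AND":   lambda a, b: a * b,
    "OR":    lambda a, b: a + b - a * b,
    "XOR":   lambda a, b: a + b - 2 * a * b,
    "NAND":  lambda a, b: 1 - a * b,
    "TRUE":  lambda a, b: 1.0,
}

def softmax(ws):
    exps = [math.exp(w) for w in ws]
    s = sum(exps)
    return [e / s for e in exps]

def soft_gate(a, b, weights):
    """One DiffLogic node: a differentiable mixture over candidate gates.

    `weights` are learnable logits, one per gate; training pushes the
    softmax towards a single gate, which is then 'hardened' for inference.
    """
    probs = softmax(weights)
    outs = [g(a, b) for g in GATES.values()]
    return sum(p * o for p, o in zip(probs, outs))
```

For example, with logits strongly favouring "AND", `soft_gate(1.0, 1.0, w)` is approximately 1.0 while `soft_gate(1.0, 0.0, w)` is approximately 0.0, and the output stays differentiable with respect to the logits.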

Objectives

  1. Develop a framework for embedding DiffLogic models on BeagleBoard for real-time audio, sensing and other creative AI tasks.
  2. Create examples of machine listening, using the PRU for fast FFT and MFCC processing.
  3. Implement sensor processing, sound and gesture classification algorithms using embedded AI.
  4. Explore other possible applications and make interactive demos to share with the community (see Usage Scenarios).
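For objective 2, the kernel a PRU implementation would ultimately replace is an ordinary radix-2 FFT. As a hedged reference (the real PRU code would be fixed-point C or assembly; this Python version only pins down the expected numerics):

```python
# Reference radix-2 Cooley-Tukey FFT, sketched in Python as a spec for a
# possible PRU implementation. Not PRU code itself: it defines the result
# a fixed-point kernel should approximate.
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # DFT of even-indexed samples
    odd = fft(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out
```

Downstream, the magnitude spectrum `[abs(c) for c in fft(frame)]` would feed the mel filterbank and DCT stages of an MFCC pipeline.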

Possible Usage Scenarios

  • Learning or fine-tuning on-device
  • Pre-processing for training on the Beagle’s PRU / the RP2040’s PIO
  • Machine listening, Sound classification, Gesture classification
  • Sound-gesture mapping via interactive machine learning (see Wekinator)
  • Generative AI: a logic-based VAE (raising the question of how to represent continuous numbers)
  • Integer randomness: parameterised integer distributions
  • Computer vision: QR code hacking
  • Neural cellular automata (see “Growing Neural Cellular Automata”, https://distill.pub/2020/growing-ca/)

Expected Outcomes

Potential Challenges

Community and Educational Benefits

  • This project will contribute to the NIME and embedded AI communities by providing a novel approach to integrating advanced AI models into interactive music and environmental sensing applications, expanding the possibilities for artists, musicians, and developers.

Mentorship and Collaboration

We welcome additional mentors from the community who are interested in this project!

Mentor backgrounds:

  • Dr Jack Armitage (jack@lhi.is), Intelligent Instruments Lab, University of Iceland: I am a postdoctoral research fellow at the Intelligent Instruments Lab. I have a doctorate in Media and Arts Technologies from Queen Mary University of London, where I studied in Prof. Andrew McPherson’s Augmented Instruments Lab. During my PhD I was a Visiting Scholar at Georgia Tech under Prof. Jason Freeman. Before then, I was a Research Engineer at ROLI after graduating with a BSc in Music, Multimedia & Electronics from the University of Leeds. My research interests include embodied interaction, craft practice and design cognition. I also produce, perform and live code music as Lil Data, as part of the PC Music record label. Website: https://jackarmitage.com, Code: jarmitage (Jack Armitage) · GitHub.

  • Dr Chris Kiefer (C.Kiefer@sussex.ac.uk), EMUTE Lab, University of Sussex: I am a musician and musical instrument designer, specialising in musician-computer interaction, physical computing, and machine learning. Currently my research focuses on feedback (or multistable) musicianship, machine learning and livecoding. I co-ran the AHRC Feedback Musicianship Network, and I was recently awarded an AHRC Fellowship grant, Musically Embodied Machine Learning. I’m also involved in projects on robot music theatre and ecoacoustics. My main approach to research is to learn from building and performing with new musical instruments, both through my own experience, and through engaging in participatory design projects with others. Recent instruments I have developed include The Nalima, a membrane-based feedback instrument, and the Feedback Cello, a hacked cello based on the halldorophone. I’m also very interested in analogue/hybrid sound synthesis, and have been developing a eurorack module for livecoding, uSEQ. Underpinning my research is a focus on complex and dynamical systems, and signal processing with machine learning. Website: http://luuma.net, Code: chriskiefer (Chris Kiefer) · GitHub.

Please let us know any questions you have about this project. We’ve had some exciting results already doing sound classification with Bela and drum sequence prediction on the RP2040, and we’re looking forward to exploring these networks further.

If y’all have some time, I would definitely appreciate any feedback on my proposal based on these ideas. Thank you for your time!

Thanks for sharing this proposal and your enthusiasm for this idea. It’s interesting to see the inspirational projects behind it.

To strengthen the proposal, consider diving deeper into the technical specifics of your implementation strategy. A number of possible techniques to explore are named, but it’s not clear to me how you imagine these all fitting together, and I’m already left wondering if it sounds too complex for a summer project! Demonstrating a clear understanding of the techniques involved, or if direct experience is limited, outlining a plan to acquire the necessary skills would be beneficial.

Regarding the idea itself, while the vocal fingerprinting concept is intriguing in some ways, the proposal doesn’t really get into exploring novel interactions that would appeal to the NIME community. There are a number of projects in the space that explore vocal-based interaction, that could be worth reviewing for inspiration. I would recommend reviewing the work of Dan Stowell and Courtney Reed to get started.

Thanks for the actionable feedback on the proposal and for gathering some resources for me to review! It helps a ton with how to better target the proposal and proceed from here. Will update the thread with a more detailed/specific proposal after working on it for the next few days.

Thanks again for the feedback! I’ve updated my project proposal quite a bit after reviewing some more literature and trying to come up with a reasonable amount of work for the timeline.

Would love to hear your feedback if you get a chance. Thanks for all the help with this!

Hi All, I’ve created an initial draft of a proposal based around this project idea.

The proposal is centred around using DiffLogic for acoustic sensing with Bela – where tactile interaction can be sensed by detecting changes in an object’s resonances.
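The detection step behind that idea can be sketched roughly as follows: compare the dominant spectral peak of the object's excitation response before and after a touch, and report the shift. This is a hedged, simplified illustration only (function names are mine; a real system would use windowing, track multiple resonances, and feed features to a classifier such as a DiffLogic network rather than a single-peak heuristic):

```python
# Toy resonance-shift detector: locate the dominant DFT peak of two
# recordings and report the frequency shift in Hz.
import cmath
import math

def peak_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest DFT magnitude bin below Nyquist."""
    n = len(signal)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2):  # skip the DC bin
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mag = abs(acc)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n

def resonance_shift(before, after, sample_rate):
    """Shift of the dominant resonance between two recordings, in Hz."""
    return (peak_frequency(after, sample_rate)
            - peak_frequency(before, sample_rate))
```

With a synthetic 100 Hz tone "before" and a 120 Hz tone "after", `resonance_shift` reports roughly a 20 Hz shift; in the proposed system, such shifts would correspond to changes in grip or touch on the sensed object.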

Please let me know if you have any feedback or additional things to consider. Thanks very much!

This is a rad idea and I love the full pipeline build for this! Would be 10/10 for the community!

I’m interested in:

An example project consisting of a digital musical instrument design that maps changes in the user’s grip and hand positions (using acoustic sensing) to timbral characteristics of a synthesis algorithm

Do you have any references/examples of this, just to help me understand it a bit more? Are we talking neural audio model style, or something else?

No. 3 here is where you can finish the merge request.

Rad project idea and sounds like you have awesome experience/support for this type of work! Best of luck and would love to read over some more references for that example project if you have time! Thanks!

This is an excellent proposal, Matt!

The technical approach is sound so I don’t have much to add there. What I would encourage you to consider more perhaps is the artistic side of the project. While of course this project would mostly involve technical development, there is a risk (as with any NIME project) of the technical capabilities misaligning with creative needs and interests, and this only being discovered after much of the development period has passed. And as I’m sure you are aware from Andrew’s current work, engineering decisions propagate through to cultural outcomes in unexpected ways, with unintentional secondary impacts. For example, you mention the sine sweep approach - would this be audibly distracting for an end-user? Or if you’re only considering ultrasonic sweeps - what are the technical and creative implications?

There are (at least) two ways you could mitigate this effect:

  1. For yourself, spend a little more time thinking through the design of a NIME demo project that would make use of DiffLogic and note down what the potentially problematic technical aspects might be, along with the interesting creative constraints. To make this as concrete as possible, formulate a creative context around the demo (even if fictitious), which could be a short improvised performance (this is what I did with the creative briefs in my NIME paper that you reference).
  2. Consider ways of introducing creative constraints earlier and throughout the project. You could start building aspects of the demo project earlier on without using DiffLogic, a simulation of sorts. You could also collaborate with a user/artist/musician through discussions and design sketching.

Great work! This proposal now reads as being much more focused and realistic, although still with some very interesting technical challenges to address. So far, @luuma and I have only explored few-shot classification on one-dimensional signals, so I would be extremely curious about how DiffLogic performs in the scenario you describe.

Similarly to Matt, I don’t have that much to add, since your proposal is also now looking comprehensive and well-referenced. With the time remaining, you could start prototyping simplified proof-of-concept versions of your ideas in code using DiffLogic, to get a clearer sense of the technical difficulties that lie ahead.

Hello! Here’s another proposal for the pile. I realize that it’s getting close to the deadline, so I’ll appreciate whatever feedback you can offer. Thanks!