BeagleMind

Hey @Aryan_Nanda ,

First off, I want to say how incredibly honored I am by your request for me to mentor you for the Google Summer of Code AI project. It truly means a lot that you would think of me for such an important role.

However, I have to respectfully decline. The reality is that the Blue Torpedo landed a direct hit on my company, and we’ve been fighting to recover for the last four years. I was supposed to retire a few years back, but all that BS turned into a nightmare that we’re still digging out of. Fortunately, things are finally looking up—our customer base is more optimistic, and we’re seeing an uptick in activity.

Right now, I have to stay fully focused on delivering our own products. It’s a classic case of “making hay while the sun is shining.” We have a narrow window of opportunity, and I can’t afford any distractions at this stage.

I truly appreciate the offer, and I have no doubt that you’ll find an amazing mentor who can guide you through this project. Wishing you the absolute best with it, and I’ll still be cheering for your success from the sidelines!

Stay in touch,
Fred

1 Like

Hello everyone,

I’m highly interested in contributing to this project. My background includes embedded systems and design, C programming, Python, and ML/DL. I’ve worked on projects integrating AI with hardware on the STM32-L476RG and was a finalist at the Edge AI Innovation Challenge 2024, where I worked on AI/ML integration with edge devices. Additionally, I have strong experience in backend web development (Node.js, MongoDB, Express.js), building and integrating APIs, authentication handling, rate limiting, and cloud computing (AWS).

From my understanding, this project focuses on fine-tuning an existing LLM (like Llama 3, Mixtral, or Gemma) with BeagleBoard-specific data to create an AI-powered assistant. I believe Llama 3.1 could be a great fit due to its efficiency, flexibility, and strong contextual understanding.

I’d love to contribute by:

  • Fine-tuning Llama 3.1 using LoRA/QLoRA for better efficiency and domain-specific responses (a rough sketch of this setup follows the list).
  • Deploying the model on Hugging Face and optimizing it for local inference on BeagleBone AI-64.
  • Building a full-stack application with a Gradio-based web interface for interaction.
  • Integrating backend features like authentication, API rate limiting, and database storage for user interactions.
  • Leveraging cloud infrastructure (AWS) to optimize inference and ensure scalability.
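
To make the fine-tuning bullet more concrete, here is a minimal QLoRA sketch assuming the Hugging Face transformers/peft/bitsandbytes stack; the checkpoint name and hyperparameters are placeholders rather than final choices.

```python
# Minimal QLoRA setup: 4-bit quantized base model + LoRA adapters (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-3.1-8B-Instruct"  # assumed checkpoint, not a project decision

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRA keeps the base weights in 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,   # placeholder hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()            # only the small LoRA adapters are trained
```

Only the adapter weights would be trained on the BeagleBoard-specific data, which keeps the compute and memory footprint small.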

I’m excited to contribute and explore how we can enhance this project further. Looking forward to feedback and guidance!

Best regards,
Sahil Jaiswal

3 Likes

BeagleMind

We’ve successfully built BeagleMind, and it’s up and running! It’s a great solution for chatting with the BeagleBoard documentation using Qwen2.5-Instruct and RAG (Retrieval-Augmented Generation). Here’s how I set it up and why I think it’s a good solution:

Key Features

1. Ease of Use

  • The commands are super intuitive:
    • :h for help
    • :q to quit
    • :cl to clear the screen
    • And more!
  • It’s a breeze to navigate and interact with.

2. Flexibility

  • You can tweak the configuration (e.g., :conf =) or run it on CPU with --cpu-only.
  • It adapts to your needs and updates with the docs, unlike fine-tuning the model weights itself.

3. Performance

  • The first run takes a moment to embed the docs, but after that, it’s fast.
  • Want more context? Just adjust the k value in inference.py’s retrieve(query, k=2) (a sketch of such a retrieval step follows the feature list).

4. Community Fit

  • Tailored for the BeagleBoard community: whether you’re curious about BeagleBone or anything else in docs.beagleboard.io, it’s got you covered.
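
For anyone curious what the retrieval step looks like, here is a minimal sketch using sentence-transformers and FAISS (the same dependencies mentioned later in this thread); the names, example strings, and embedding model are illustrative and simplified compared to the actual inference.py.

```python
# Illustrative RAG retrieval step: embed doc chunks, index them, return the top-k matches.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "BeagleBone AI-64 boots from a microSD card or eMMC.",
    "Use an imaging tool to flash the latest image to the card.",
]  # in practice: chunks of docs.beagleboard.io

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # assumed embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vecs.shape[1])             # inner product == cosine on normalized vectors
index.add(doc_vecs)

def retrieve(query, k=2):
    """Return the k most similar doc chunks; k controls how much context the LLM sees."""
    q_vec = embedder.encode([query], normalize_embeddings=True)
    _, ids = index.search(q_vec, k)
    return [docs[i] for i in ids[0]]

print(retrieve("How do I flash a BeagleBone image?"))
```

The retrieved chunks are then prepended to the prompt that goes to Qwen2.5-Instruct, which is why embedding the docs on the first run takes a moment.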

Tips

  • Be patient on the first run—it embeds the docs, but it’s worth the wait.
  • Check out the full command list (:h) to make the most of it.

Conclusion

This setup is elegant, effective, and exactly what we needed. Thanks for the push to get this built—it’s a helpful tool for anyone diving into BeagleBoard projects!

I can work on getting this hosted later!

GitHub Repository

Check out the project on GitHub: vovw/beagle-mind

4 Likes

This looks cool! Awesome job!

3 Likes

Hello @vovw,
First of all, I apologize for the late reply. I was caught up with a few task deadlines and a tight exam schedule. Secondly, you’ve done a great job. Building a working prototype during the contributors’ period demonstrates some real effort.

My feedback

I found it challenging to set up initially because there are no instructions on how to download the model. Additionally, there are some missing dependencies, such as faiss-cpu and sentence-transformers. It took some extra time to sort all this out. This can be corrected, so it’s not a big problem.

Moving on to the test results:

I tried testing the model on both Mac and Linux. On Mac, it just refused to give any result. The time taken to produce a single token was huge, and I’m not sure why. On Linux, things were decent, but the time to first token, latency, and throughput all need to be refined to a good level. Currently, a single prompt is taking 2-5 minutes to produce a result for me (16GB RAM, CPU only).
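
For reproducibility, here is one rough way to measure time-to-first-token and throughput with the transformers streaming API; the small Qwen2.5 checkpoint is used purely for illustration and is not the model BeagleMind runs.

```python
# Rough latency/throughput measurement using a token streamer (illustrative checkpoint).
import time
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"    # assumed small checkpoint for a quick CPU test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("How do I flash an image to a BeagleBone?", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

generation = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128),
)
start = time.perf_counter()
generation.start()

first_token_at = None
pieces = 0
for _ in streamer:                          # yields decoded text as tokens arrive
    if first_token_at is None:
        first_token_at = time.perf_counter() - start
    pieces += 1
total = time.perf_counter() - start

print(f"time to first token: {first_token_at:.2f} s")
print(f"approx. throughput: {pieces / total:.2f} chunks/s over {total:.1f} s")
```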

I tested the same prompt from @leoh but got the result below. Maybe you can explain why this happened.


Then, I tested a few more prompts:


This looks decent. I want to do more tests, but the latency is quite high.

What needs to be done:

  • Obviously, running Qwen2.5-Instruct directly on BeagleBoards is not possible, so you would need to host it in the cloud so that community members can use it as easily as possible and provide their feedback.

  • We would like to include more documentation in the model, so this is definitely not the final product.

  • We would also like to explore if fine-tuning combined with RAG is a better option than RAG alone in order to produce more accurate and contextually relevant responses.

  • If you want to work on all of this now, you are very welcome to. However, please note that the final decision on who will take up this project in GSoC will be made according to the official timeline mentioned here. Your commitment to the project is well noted.

1 Like

Hello @sahil7741,

You can start by creating a clear, well-defined proposal that outlines your detailed approach and explains why you are choosing this particular approach over other possible alternatives. Also, describe how you plan to involve the BeagleBoard community members for feedback and testing, along with a detailed timeline.

1 Like

Hey @Aryan_Nanda,
I have made this architecture for the RAG project. Please check it out and let me know if this is how you’re planning to roll out the project.

This architecture looks nice and shows you have a basic understanding of the project.
Detailing how you will perform each step will be the key here.

1 Like

Hi @Aryan_Nanda ,
I am Ashish Upadhyay, a second-year undergraduate student at IIT Kanpur. I have made a basic implementation pipeline for this project. @Aryan_Nanda, could you please tell me whether I am thinking in the right direction?
Also, could you please suggest what I should work on?

(1) Dataset Preparation:

  • Data Sources: Collect data from BeagleBoard forums, official docs, GitHub repos, Discord discussions, etc.
  • Scraping: Use BeautifulSoup/Scrapy to extract text data.
  • Cleaning: Remove HTML tags, code blocks, and redundant info.
  • Structuring: Convert data into Q&A pairs, tutorials, and troubleshooting dialogues.
  • Chunking: Split long documents into manageable, contextual segments.
  • Tokenization: Use the target LLM’s tokenizer to prepare the dataset.
  • Hugging Face Dataset: Convert into a datasets.Dataset object for training.
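
A minimal sketch of the chunking and dataset-building steps above, assuming the Hugging Face datasets library; the page content and chunk sizes are placeholders.

```python
# Turn cleaned pages into overlapping chunks and wrap them in a datasets.Dataset (illustrative).
from datasets import Dataset

cleaned_pages = [
    {"source": "docs.beagleboard.io/boards/beaglebone/ai-64", "text": "..."},
]  # would come from the scraping + cleaning steps

def chunk(text, size=512, overlap=64):
    """Split a long document into overlapping character chunks that fit the context window."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

records = [
    {"source": page["source"], "chunk": c}
    for page in cleaned_pages
    for c in chunk(page["text"])
]

dataset = Dataset.from_list(records)
dataset.save_to_disk("beagleboard_chunks")   # ready for tokenization / training
```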

(2) Model Fine-tuning:

  • Model Choice: LLaMA 3
  • Domain Adaptation: Train on cleaned BeagleBoard documentation & community content.
  • Instruction Tuning: Format data into instruction-following Q&A style for chat-like responses.
  • PEFT Techniques: Use LoRA or QLoRA to reduce compute cost.
  • Evaluation Metrics: BLEU, ROUGE, METEOR, Exact Match, Latency
  • Frameworks: Transformers, PEFT, Accelerate, Datasets, Evaluate
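
As an example of the evaluation step, the text-similarity metrics can be computed with the evaluate library; the prediction/reference strings below are made up for illustration.

```python
# Scoring generated answers against references with BLEU, ROUGE, and exact match (illustrative).
import evaluate

predictions = ["Flash the image to a microSD card and hold the BOOT button while powering on."]
references = ["Write the image to a microSD card, then hold the BOOT button while applying power."]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
exact_match = evaluate.load("exact_match")

print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))
print(exact_match.compute(predictions=predictions, references=references))
```

Latency would be measured separately at inference time rather than with these metrics.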

(3) Model Deployment and Interaction:

  • Model Hosting: Deploy on Hugging Face Inference Endpoints for scalable inference.
  • Web Interface: Build a Gradio app with query input, response display, and optional chat history.
  • Cost Tracking: Estimate inference costs based on model size and usage patterns.
  • Performance Testing: Evaluate latency & throughput on the Inference Endpoint.
  • Local Inference: Test a quantized model (4-bit or int8).
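
A minimal sketch of the web-interface step, assuming Gradio and a placeholder answer() function that would call the hosted or local model; none of this is final.

```python
# Skeleton Gradio chat UI; answer() is a stand-in for the real model call (illustrative).
import gradio as gr

def answer(question, history):
    # placeholder: call the Inference Endpoint or a local quantized model here
    return f"(model reply to: {question})"

demo = gr.ChatInterface(
    fn=answer,
    title="BeagleMind",
    description="Ask questions about BeagleBoard documentation.",
)

if __name__ == "__main__":
    demo.launch()   # could later be deployed to Hugging Face Spaces for community testing
```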

Hey @Aryan_Nanda,

I’m done with the proposal draft. Please review it and let me know if any changes are required.