BeagleMind

Hey @Aryan_Nanda ,

First off, I want to say how incredibly honored I am by your request for me to mentor you for the Google Summer of Code AI project. It truly means a lot that you would think of me for such an important role.

However, I have to respectfully decline. The reality is that the Blue Torpedo landed a direct hit on my company, and we’ve been fighting to recover for the last four years. I was supposed to retire a few years back, but all that BS turned into a nightmare that we’re still digging out of. Fortunately, things are finally looking up—our customer base is more optimistic, and we’re seeing an uptick in activity.

Right now, I have to stay fully focused on delivering our own products. It’s a classic case of “making hay while the sun is shining.” We have a narrow window of opportunity, and I can’t afford any distractions at this stage.

I truly appreciate the offer, and I have no doubt that you’ll find an amazing mentor who can guide you through this project. Wishing you the absolute best with it, and I’ll still be cheering for your success from the sidelines!

Stay in touch,
Fred

1 Like

Hello everyone,

I’m highly interested in contributing to this project. My background includes embedded systems and design, C programming, Python, and ML/DL. I’ve worked on projects integrating AI with hardware on the STM32-L476RG and was a finalist at the Edge AI Innovation Challenge 2024, where I worked on AI/ML integration with edge devices. Additionally, I have strong experience in backend web development (Node.js, MongoDB, Express.js), building and integrating APIs, authentication handling, rate limiting, and cloud computing (AWS).

From my understanding, this project focuses on fine-tuning an existing LLM (like Llama 3, Mixtral, or Gemma) with BeagleBoard-specific data to create an AI-powered assistant. I believe Llama 3.1 could be a great fit due to its efficiency, flexibility, and strong contextual understanding.

I’d love to contribute by:

  • Fine-tuning Llama 3.1 using LoRA/QLoRA for better efficiency and domain-specific responses (a rough sketch of this setup follows the list).
  • Deploying the model on Hugging Face and optimizing it for local inference on BeagleBone AI-64.
  • Building a full-stack application with a Gradio-based web interface for interaction.
  • Integrating backend features like authentication, API rate limiting, and database storage for user interactions.
  • Leveraging cloud infrastructure (AWS) to optimize inference and ensure scalability.
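
To make the fine-tuning bullet more concrete, here is a minimal QLoRA sketch assuming the Hugging Face transformers/peft/bitsandbytes stack; the checkpoint name and hyperparameters are placeholders rather than final choices.

```python
# Minimal QLoRA setup: 4-bit quantized base model + LoRA adapters (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-3.1-8B-Instruct"  # assumed checkpoint, not a project decision

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRA keeps the base weights in 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,   # placeholder hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()            # only the small LoRA adapters are trained
```

Only the adapter weights would be trained on the BeagleBoard-specific data, which keeps the compute and memory footprint small.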

I’m excited to contribute and explore how we can enhance this project further. Looking forward to feedback and guidance!

Best regards,
Sahil Jaiswal

3 Likes

BeagleMind

We’ve successfully built BeagleMind, and it’s up and running! It’s a great solution for chatting with the BeagleBoard documentation using Qwen2.5-Instruct and RAG (Retrieval-Augmented Generation). Here’s how I set it up and why I think it’s a good solution:

Key Features

1. Ease of Use

  • The commands are super intuitive:
    • :h for help
    • :q to quit
    • :cl to clear the screen
    • And more!
  • It’s a breeze to navigate and interact with.

2. Flexibility

  • You can tweak the configuration (e.g., :conf =) or run it on CPU with --cpu-only.
  • It adapts to your needs and updates with the docs, unlike fine-tuning the model weights itself.

3. Performance

  • The first run takes a moment to embed the docs, but after that, it’s fast.
  • Want more context? Just adjust the k value in inference.py’s retrieve(query, k=2) (a sketch of such a retrieval step follows the feature list).

4. Community Fit

  • Tailored for the BeagleBoard community: whether you’re curious about BeagleBone or anything else in docs.beagleboard.io, it’s got you covered.
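
For anyone curious what the retrieval step looks like, here is a minimal sketch using sentence-transformers and FAISS (the same dependencies mentioned later in this thread); the names, example strings, and embedding model are illustrative and simplified compared to the actual inference.py.

```python
# Illustrative RAG retrieval step: embed doc chunks, index them, return the top-k matches.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "BeagleBone AI-64 boots from a microSD card or eMMC.",
    "Use an imaging tool to flash the latest image to the card.",
]  # in practice: chunks of docs.beagleboard.io

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # assumed embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vecs.shape[1])             # inner product == cosine on normalized vectors
index.add(doc_vecs)

def retrieve(query, k=2):
    """Return the k most similar doc chunks; k controls how much context the LLM sees."""
    q_vec = embedder.encode([query], normalize_embeddings=True)
    _, ids = index.search(q_vec, k)
    return [docs[i] for i in ids[0]]

print(retrieve("How do I flash a BeagleBone image?"))
```

The retrieved chunks are then prepended to the prompt that goes to Qwen2.5-Instruct, which is why embedding the docs on the first run takes a moment.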

Tips

  • Be patient on the first run—it embeds the docs, but it’s worth the wait.
  • Check out the full command list (:h) to make the most of it.

Conclusion

This setup is elegant, effective, and exactly what we needed. Thanks for the push to get this built—it’s a helpful tool for anyone diving into BeagleBoard projects!

I can work on getting this hosted later!

GitHub Repository

Check out the project on GitHub: vovw/beagle-mind

4 Likes

This looks cool! Awesome job!

3 Likes

Hello @vovw,
First of all, I apologize for the late reply. I was caught up with a few task deadlines and a tight exam schedule. Secondly, you’ve done a great job. Building a working prototype during the contributors’ period demonstrates some real effort.

My feedback

I found it challenging to set up initially because there are no instructions on how to download the model. Additionally, there are some missing dependencies, such as faiss-cpu and sentence-transformers. It took some extra time to sort all this out. This can be corrected, so it’s not a big problem.

Moving on to the test results:

I tried testing the model on both Mac and Linux. On Mac, it just refused to give any result. The time taken to produce a single token was huge, and I’m not sure why. On Linux, things were decent, but the time to first token, latency, and throughput all need to be refined to a good level. Currently, a single prompt is taking 2-5 minutes to produce a result for me (16GB RAM, CPU only).
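
For reproducibility, here is one rough way to measure time-to-first-token and throughput with the transformers streaming API; the small Qwen2.5 checkpoint is used purely for illustration and is not the model BeagleMind runs.

```python
# Rough latency/throughput measurement using a token streamer (illustrative checkpoint).
import time
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"    # assumed small checkpoint for a quick CPU test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("How do I flash an image to a BeagleBone?", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

generation = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128),
)
start = time.perf_counter()
generation.start()

first_token_at = None
pieces = 0
for _ in streamer:                          # yields decoded text as tokens arrive
    if first_token_at is None:
        first_token_at = time.perf_counter() - start
    pieces += 1
total = time.perf_counter() - start

print(f"time to first token: {first_token_at:.2f} s")
print(f"approx. throughput: {pieces / total:.2f} chunks/s over {total:.1f} s")
```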

I tested the same prompt from @leoh but got the result below. Maybe you can explain why this happened.


Then, I tested a few more prompts:


This looks decent. I want to do more tests, but the latency is quite high.

What needs to be done:

  • Obviously, running Qwen2.5-Instruct directly on BeagleBoards is not possible, so you would need to host it in the cloud so that community members can use it as easily as possible and provide their feedback.

  • We would like to include more documentation in the model, so this is definitely not the final product.

  • We would also like to explore if fine-tuning combined with RAG is a better option than RAG alone in order to produce more accurate and contextually relevant responses.

  • If you want to work on all of this now, you are very welcome to. However, please note that the final decision on who will take up this project in GSoC will be made according to the official timeline mentioned here. Your commitment to the project is well noted.

1 Like

Hello @sahil7741,

You can start by creating a clear, well-defined proposal that outlines your detailed approach and explains why you are choosing this particular approach over other possible alternatives. Also, describe how you plan to involve the BeagleBoard community members for feedback and testing, along with a detailed timeline.

1 Like

Hey @Aryan_Nanda,
I have made this architecture for the RAG project. Please check it out and let me know if this is how you’re planning to roll out the project.

This architecture looks nice and shows you have a basic understanding of the project.
Detailing how you will perform each step will be the key here.

1 Like

Hi @Aryan_Nanda ,
I am Ashish Upadhyay, a second-year undergraduate student at IIT Kanpur. I have made a basic implementation pipeline for this project. @Aryan_Nanda, could you please tell me whether I am thinking in the right direction?
Also, could you please suggest what I should work on?

(1) Dataset Preparation:

  • Data Sources: Collect data from BeagleBoard forums, official docs, GitHub repos, Discord discussions, etc.
  • Scraping: Use BeautifulSoup/Scrapy to extract text data.
  • Cleaning: Remove HTML tags, code blocks, and redundant info.
  • Structuring: Convert data into Q&A pairs, tutorials, and troubleshooting dialogues.
  • Chunking: Split long documents into manageable, contextual segments.
  • Tokenization: Use the target LLM’s tokenizer to prepare the dataset.
  • Hugging Face Dataset: Convert into a datasets.Dataset object for training.
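
A minimal sketch of the chunking and dataset-building steps above, assuming the Hugging Face datasets library; the page content and chunk sizes are placeholders.

```python
# Turn cleaned pages into overlapping chunks and wrap them in a datasets.Dataset (illustrative).
from datasets import Dataset

cleaned_pages = [
    {"source": "docs.beagleboard.io/boards/beaglebone/ai-64", "text": "..."},
]  # would come from the scraping + cleaning steps

def chunk(text, size=512, overlap=64):
    """Split a long document into overlapping character chunks that fit the context window."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

records = [
    {"source": page["source"], "chunk": c}
    for page in cleaned_pages
    for c in chunk(page["text"])
]

dataset = Dataset.from_list(records)
dataset.save_to_disk("beagleboard_chunks")   # ready for tokenization / training
```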

(2) Model Fine-tuning:

  • Model Choice: LLaMA 3
  • Domain Adaptation: Train on cleaned BeagleBoard documentation & community content.
  • Instruction Tuning: Format data into instruction-following Q&A style for chat-like responses.
  • PEFT Techniques: Use LoRA or QLoRA to reduce compute cost.
  • Evaluation Metrics: BLEU, ROUGE, METEOR, Exact Match, Latency
  • Frameworks: Transformers, PEFT, Accelerate, Datasets, Evaluate
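
As an example of the evaluation step, the text-similarity metrics can be computed with the evaluate library; the prediction/reference strings below are made up for illustration.

```python
# Scoring generated answers against references with BLEU, ROUGE, and exact match (illustrative).
import evaluate

predictions = ["Flash the image to a microSD card and hold the BOOT button while powering on."]
references = ["Write the image to a microSD card, then hold the BOOT button while applying power."]

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
exact_match = evaluate.load("exact_match")

print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))
print(exact_match.compute(predictions=predictions, references=references))
```

Latency would be measured separately at inference time rather than with these metrics.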

(3) Model Deployment and Interaction:

  • Model Hosting: Deploy on Hugging Face Inference Endpoints for scalable inference.
  • Web Interface: Build a Gradio app with query input, response display, and optional chat history.
  • Cost Tracking: Estimate inference costs based on model size and usage patterns.
  • Performance Testing: Evaluate latency & throughput on the Inference Endpoint.
  • Local Inference: Test a quantized model (4-bit or int8).
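
A minimal sketch of the web-interface step, assuming Gradio and a placeholder answer() function that would call the hosted or local model; none of this is final.

```python
# Skeleton Gradio chat UI; answer() is a stand-in for the real model call (illustrative).
import gradio as gr

def answer(question, history):
    # placeholder: call the Inference Endpoint or a local quantized model here
    return f"(model reply to: {question})"

demo = gr.ChatInterface(
    fn=answer,
    title="BeagleMind",
    description="Ask questions about BeagleBoard documentation.",
)

if __name__ == "__main__":
    demo.launch()   # could later be deployed to Hugging Face Spaces for community testing
```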

Hey @Aryan_Nanda,

I’m done with the proposal draft. Please review it and let me know if any changes are required.