BeagleMind, Conversational AI Assistant for BeagleBoard documentation

Hi, I will be posting my updates for my GSoC project, BeagleMind, in this thread, weekly or possibly more often.

Thank you for this opportunity!

I am currently working on refining and concretizing my diagrams, and researching better ways to scrape and collect data, in particular so that my dataset captures each document’s metadata. I am also reviewing benchmarks to choose the best LLM, vector store, and fine-tuning techniques for our case. Finally, I will start preparing the introductory video and a proof of concept for the chatbot.

See you soon!


Hi everyone,

I hope you’re all doing well. Here’s a brief update on my progress for Week 0 of the BeagleMind project:

Model Evaluation:

I’ve assessed several language models for our project’s needs (model selection is still in progress), and these showed the most potential so far in terms of size and performance:

  • Mistral-Small-3.1-24B-Instruct-2503-GGUF
  • Phi-4 (14B)
  • Qwen2.5-Coder-7B
  • Qwen3 30B-A3B

Fine-Tuning Environment:

I’m exploring Google Colab Pro/Pro+ for initial fine-tuning, leveraging powerful GPUs like the A100. Additionally, I’m considering using Unsloth to enable fast and memory-efficient fine-tuning on consumer hardware. For deployment, dedicated virtual machines (e.g., Hugging Face Inference Endpoints) will be reserved to ensure consistent uptime and performance.
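
For a rough idea of what the Unsloth setup could look like, here is a minimal sketch; the model name, sequence length, and LoRA hyperparameters are placeholders rather than final choices:

```python
# Hypothetical Unsloth fine-tuning setup: 4-bit loading plus LoRA adapters
# so training fits on a single consumer GPU. All values are placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-7B",  # one of the candidate models above
    max_seq_length=4096,
    load_in_4bit=True,  # quantized weights keep VRAM usage low
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0.0,
)
```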

Data Collection Pipeline:

I’ve initiated data collection from the BeagleBoard GitHub Organization, focusing on:

  • Cloning repositories and extracting text from code, markdown files, and PDFs.
  • Gathering information from community blogs, datasheets, hardware files, and forums.

The collected data will include metadata from these documents to enrich our dataset and also enhance response quality.
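
To make the cloning and extraction step concrete, here is a simplified sketch of the repository side of the pipeline; the function, destination path, and metadata fields are illustrative, not the exact implementation:

```python
# Illustrative collection step: shallow-clone a repository and gather its
# markdown files together with basic metadata for later chunking.
import subprocess
from pathlib import Path

def collect_markdown(repo_url: str, dest: str = "repos") -> list[dict]:
    name = repo_url.rstrip("/").split("/")[-1]
    target = Path(dest) / name
    if not target.exists():
        subprocess.run(
            ["git", "clone", "--depth", "1", repo_url, str(target)],
            check=True,
        )
    docs = []
    for md_file in target.rglob("*.md"):
        docs.append({
            "text": md_file.read_text(encoding="utf-8", errors="ignore"),
            "source_repo": repo_url,  # metadata stored alongside the text
            "file_path": str(md_file.relative_to(target)),
        })
    return docs
```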

For more detailed information, please refer to the full document: Week 0 Updates

I’m looking forward to any feedback or suggestions you might have.

I’m updating the doc daily, so stay tuned for more updates!

Best regards,


Hey Fayez,

Thanks for sharing the update.
Looking forward to the weekly updates on the project!
Good work so far, keep it up :100:

Regards,
Deepak Khatri


Hello everyone, sorry for the delayed update!

This week, I focused on preparing the demo and the presentation for the introductory video. Kumar suggested a really cool feature for the CLI tool: code generation based on the documentation. I decided to explore it as a proof of concept in the video and tested it by generating a simple blinker using both shell commands and Python scripts.

I’m currently waiting for the video review, and once it’s approved, I’ll post it in the forum.

In the meantime, I’ve completed the Milestone 0 Updates document; you can check it out here:
:page_facing_up: Milestone 0 Updates

I’ve moved the diagrams task to the next milestone. I’ll also be creating a new document titled Milestone 1 Updates, which I’ll begin updating shortly.

You can check out the CLI repo here: https://github.com/fayezzouari/beaglemind-cli

Best regards,


@FAYEZ_ZOUARI
If you maintain a single document for all milestones, with the latest updates listed first and tracked by week/day, it would be better than having separate documents for each milestone.


Alright, I will take that into account!

Hello everyone,

This is the official introductory video for my project:
https://youtu.be/pC97HKFRKUI

Hope you enjoy it ^^

Best regards,


Hi, I’ll be sharing the notes from yesterday’s meeting with @Aryan_Nanda @KumarAbhishek.

Discussion Points:

  • Benchmark various LLMs to evaluate their performance in a RAG setup.
  • Test the system with both relevant and irrelevant questions to assess reliability and robustness.
  • Explore LLM distillation methods to create a lightweight model suitable for deployment on BeagleBoard hardware.
  • Define evaluation criteria for the system, including relevance, accuracy, and latency.
  • Specify a format for training data and a strategy for storing metadata to improve retrieval efficiency.
  • Investigate hosted vector database solutions, specifically Milvus, for embedding storage and retrieval (a minimal sketch follows this list).
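
As a first idea of what the Milvus integration might look like with pymilvus (the collection name, vector dimension, and the embed() helper are all hypothetical):

```python
# Hypothetical Milvus usage via pymilvus's MilvusClient. embed() stands in
# for whatever embedding model we end up choosing.
from pymilvus import MilvusClient

def embed(text: str) -> list[float]:
    return [0.0] * 384  # placeholder for a real embedding model

client = MilvusClient(uri="http://localhost:19530")  # or a hosted endpoint
client.create_collection(collection_name="beaglemind_docs", dimension=384)

# Store one chunk with its metadata as dynamic fields.
client.insert("beaglemind_docs", data=[{
    "id": 0,
    "vector": embed("chunk text goes here"),
    "text": "chunk text goes here",
    "source": "beagleboard docs",
}])

# Retrieve the top-5 nearest chunks for a query.
hits = client.search(
    "beaglemind_docs",
    data=[embed("How do I flash an image?")],
    limit=5,
    output_fields=["text", "source"],
)
```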

Updated Objectives:

  • Benchmark the current RAG system using a wide variety of questions.
  • Assess the reliability of responses based on documentation relevance.
  • If the system proves unreliable, consider fine-tuning the base model with domain-specific data.
  • If results are satisfactory, proceed with MVP development and prepare for deployment.
  • Create a repository on OpenBeagle.
  • Investigate the possibility of running LLM inference locally on a BeagleY-AI.

@FAYEZ_ZOUARI, Please post your week 2 updates here when you get time.

Week 2 Updates

This week, I focused on completing a fully functional RAG system. The chatbot is now able to retrieve relevant information based on user queries. A reranking model is used to select the top 10 most relevant chunks (I’m considering reducing this to 5 and lowering the context window for better performance). Metadata is displayed correctly (links and images), and I’ve also started working on a repository-specific chatbot for beagley-ai.
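
For context, the reranking step looks roughly like the sketch below; the cross-encoder model name is just a common open-source choice, not necessarily the one BeagleMind will keep:

```python
# Rough sketch of reranking: a cross-encoder scores each (query, chunk)
# pair and we keep the top-k. top_k=10 matches the current setup.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 10) -> list[str]:
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda p: p[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```

Dropping top_k to 5 would roughly halve the prompt size, which is the performance trade-off mentioned above.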

I created a GitHub repository and linked it to a GitLab repository with a simple GitHub Action, so both get updated at the same time. Since OpenBeagle seems to be down, I’ll be updating these repositories instead.

I’ll update the document shortly with screenshots from the chatbot responses. In the meantime, I’d like to include your suggested prompts inside this document: BeagleMind Prompts.

Blockers

  • The reranking model’s results aren’t always accurate, so I’ve been experimenting with different open source models.
  • Setting up the metadata display was a bit challenging, especially formatting the LLM output using Markdown and raw GitHub links.
  • For the repository-specific chatbot (I’ll show in the updates document what it looks like), responses still aren’t very accurate because the repository is mostly images with little text for the chatbot to draw on, so I’m figuring out other ways to fix this issue.

I’ll keep you updated.

Next Steps

My goal is to finalize the reranking model and ensure the chatbot is fully functional, with improved metadata display. I’ll also be investigating why the beagley-ai-specific chatbot isn’t performing as expected. Finally, I plan to begin integrating a knowledge graph into the architecture.
I’ll refresh the updates document shortly so you can see the results.


Hello Fayez,
Please follow @jkridner’s guidelines mentioned on Discord for this.

Week 3 Updates

Accomplishments

  • Finalized and optimized the core RAG (Retrieval-Augmented Generation) system.
  • Enhanced the chunking strategy to improve retrieval accuracy (a sketch of the idea follows this list).
  • Improved metadata handling to support the display of links or images as references.
  • Conducted testing to ensure system stability and performance.
  • Started evaluating the Qwen3:1.7B model for query response accuracy.
  • Created a thread to collect a variety of prompts for broader chatbot testing.
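
As a sketch of the chunking idea mentioned above (the window and overlap sizes are illustrative, not the tuned values):

```python
# Illustrative fixed-size chunking with overlap, so sentences that span a
# boundary still appear whole in at least one chunk.
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(piece)
    return chunks
```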

Blockers

  • Uncertainty about whether the BeagleBoard edge AI framework supports running the Qwen3:1.7B LLM architecture.

Next Week Objectives

  • Save the vectorstore collection online
  • Dockerize the project
  • Start developing the CLI and web app interfaces for the chatbot.
  • Discuss implementation details and system integration with mentors.
  • Explore fallback options if needed.

Additional Notes

  • Current work is available in this repository: BeagleMind-RAG-PoC
  • Plan to create the official OpenBeagle repository right after the Saturday meeting, once the MVP scope is further clarified.

Meeting Notes

In today’s meeting, we went through the following points:

  • Checked the RAG performance, and it seemed to be working efficiently.
  • I should start working on the CLI tool.
  • I need to prepare a full report about the costs of hosting the web app.
  • The chatbot demo should be hosted on Hugging Face Spaces for public testing.

Hello everyone, here’s my progress for this week.

Week 4 Update

Progress:

  • This week, I switched the data sources from the repository files to the actual documentation hosted on docs.beagle.cc.
  • I also integrated forum threads into the vector store to enhance the chatbot’s ability to assist with hardware troubleshooting.
  • After achieving promising results, I deployed the chatbot on Hugging Face Spaces. I’d love for you to try it out here: BeagleMind Chatbot
  • Please feel free to share feedback or critiques on this forum thread.

CLI Work:

  • I started working on a CLI tool, drawing inspiration from the Gemini CLI and Claude Code CLIs, especially their innovative approaches to developer tools.
  • Initial work focused on enabling file editing and code generation using tool calling. The edit_file tool is functional but still requires refinement and testing (see the sketch after this list).
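
To illustrate the tool-calling setup, here is a hedged sketch using the OpenAI SDK; the edit_file schema below is illustrative, not BeagleMind’s exact definition:

```python
# Hypothetical tool-calling sketch with the OpenAI SDK. The schema shape
# is the standard one; the edit_file field names are illustrative.
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "edit_file",
        "description": "Apply a text edit to a file in the workspace.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "old_text": {"type": "string"},
                "new_text": {"type": "string"},
            },
            "required": ["path", "old_text", "new_text"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Fix the LED pin in blink.py"}],
    tools=tools,
)
# If the model decides to edit, response.choices[0].message.tool_calls
# carries the arguments that the CLI then executes locally.
```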

Blockers:

  • Scraping content proved to be time-consuming, particularly because I aimed to extract specific data in a structured format suitable for LLM understanding and easy chunking.
  • The edit_file tool lacks reliability and needs further work to improve accuracy and stability; on the bright side, it is currently showing good potential.

Next Steps:

  • Continue building out CLI capabilities, with a focus on adapting useful and innovative features from existing tools like the Gemini CLI.
  • Improve the robustness of the file editing tool and expand CLI functionality to better serve developers.

Meeting Notes

Date & Time: Saturday, June 5th, 5:30 PM CET
Attendees: Fayez Zouari, Aryan Nanda, Kumar Abhishek
Purpose: Progress update and discussion on CLI development

Discussion Points

  • Weekly Progress:

    • Discussed the idea derived from the Gemini CLI.

    • Implemented and tested core tools, including:

      • read_file
      • write_file
      • edit_file
      • run_shell_command
    • Initial CLI functionality is in place.

  • Demo Recap:

    • A brief demonstration of the CLI was presented.

    • Tools functioned as expected; however, the results were not consistently accurate.

    • Identified that the performance issues were primarily due to model compatibility:

      • OpenAI SDK works best with OpenAI models due to optimized tool-calling support.
      • Groq and Ollama models demonstrated significantly lower performance in this context.
  • Objectives for Next Week:

    • Test the CLI on beagley-ai.
    • Improve the chat command to be more interactive and user-friendly.
    • Try to enhance the tool calling and report progress as soon as possible.

Week 5 Updates

Progress

  • Improved the user experience by implementing an interactive chat interface within the BeagleMind CLI. Users can now engage in a continuous conversation without needing to re-run a command for each prompt (a minimal sketch of the loop follows this list).
  • Conducted multiple rounds of testing, with a focus on improving the performance and accuracy of tool calling. While the results are not fully consistent yet, they show improvement.
  • Made general refinements across the CLI to streamline functionality and enhance usability.
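
The continuous-conversation behaviour boils down to a loop like the one below, where ask_llm() is a placeholder for the actual backend call:

```python
# Minimal sketch of the interactive chat loop: the message history stays
# in memory so every turn has the context of the previous ones.
def ask_llm(history: list[dict]) -> str:
    return "(placeholder response)"  # stands in for the real LLM call

def chat_loop():
    history = []
    while True:
        prompt = input("beaglemind> ").strip()
        if prompt.lower() in ("exit", "quit"):
            break
        history.append({"role": "user", "content": prompt})
        reply = ask_llm(history)
        history.append({"role": "assistant", "content": reply})
        print(reply)
```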

Blockers

  • Faced challenges while setting up the edit_lines tool. The integration process turned out to be more complex than anticipated, causing some delays. Debugging is still in progress to get it fully operational.

Objectives

  • Spend the rest of the weekend setting up the BeagleY-AI and testing the CLI in that environment.
  • Experiment with the Qwen3 1.7B model to evaluate its performance within the CLI workflow.

Weekly Update

Hello, Here’s a quick summary of my progress for this week:

  • I focused on testing the CLI on the BeagleY-AI device. Online inference using Groq Cloud worked really well. However, I still need to look into local inference.
  • I separated the CLI from the retrieval component, as the retrieval dependencies are quite heavy and take significant time and storage to install (a sketch of the API split follows this list).
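
The split roughly follows the shape below; this is a FastAPI sketch under my own assumptions, and the endpoint name and retrieve() helper are illustrative:

```python
# Sketch of the retrieval component as a standalone service, so the CLI
# itself stays lightweight. retrieve() is a placeholder for the real logic.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

def retrieve(question: str, top_k: int) -> list[str]:
    return []  # placeholder: query the vector store, then rerank

class Query(BaseModel):
    question: str
    top_k: int = 5

@app.post("/retrieve")
def retrieve_endpoint(query: Query):
    chunks = retrieve(query.question, query.top_k)  # heavy deps live here
    return {"chunks": chunks}
```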

Blockers:

  • The main blocker was around self-hosted models. I installed Ollama on the BeagleY-AI and pulled three models to test (running on CPU), but none provided satisfactory results so far. When I tried a 1.5B-parameter LLM (DeepSeek-R1) with a 128K-token context window, the inference time was so high that the Ollama server timed out; I then tried a 355M-parameter model (SmolLM) with a smaller context window, and it also failed because the context was too small. I could perhaps reduce the input size and the size of the retrieved context and see what happens; a sketch of that idea follows.
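
One way to test the reduced-context idea with the ollama Python client would be something like this; the model tag and num_ctx value are guesses to experiment with, not tuned settings:

```python
# Hypothetical experiment: cap the context window via num_ctx and feed a
# trimmed prompt, instead of relying on the model's full 128K window.
import ollama

retrieved_context = "...top chunks from the retriever..."  # placeholder
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "Answer using: " + retrieved_context[:1500]}],
    options={"num_ctx": 2048},  # far below the model's 128K maximum
)
print(response["message"]["content"])
```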

Plans for next week:

  • Properly dockerize the API.
  • Create QA pairs to evaluate model performance and help in selecting the best-fit model.
  • Quantize the tested models and run separate evaluations.
  • Explore the possibility of using hardware accelerators on the BeagleY-AI for running the LLMs.

Meeting Notes

The meeting was held on July 16th at 5:15 PM CET on Google Meet.
Attendees: Fayez Zouari, Aryan Nanda, Jason Kridner and Kumar Abhishek.

The meeting agenda covered the following key points:

1. Objective Overview:

  • Quick introduction to the purpose of the meeting
  • Recap of previous meeting highlights

2. Project Progress Review:

  • Discuss updates and what’s new since the last meeting
  • Present the new approach:
    • Separation of retrieval into a standalone API
    • Integration plan with the web app

3. Blockers & Challenges:

  • Main blocker:
    • Finding an LLM that supports tool calling, offers a large context window, and remains lightweight in parameters
  • Open discussion for suggestions and potential solutions

4. Next Steps & Action Items:

  • Define what needs to be done for the next few days

5. Open Floor:

  • Time for any additional points, suggestions, or questions

Open discussion:

In the open discussion, everyone suggested some very good ideas:

  • Create QA pairs to evaluate model performance and help in selecting the best-fit model.
  • Quantize the tested models and run separate evaluations.
  • Explore the possibility of using hardware accelerators on the BeagleY-AI for running the LLMs.

Weekly Update

Progress:

  • Created QA pairs to evaluate our system using BLEU, ROUGE, and METEOR metrics (a scoring sketch follows this list).
  • Attempted to run the model on CPU but encountered persistent issues. I also tried converting it to an ONNX Runtime format, but the model’s download size exceeded my internet capacity; it should take a few more days.
  • Began Dockerizing the API; it’s nearly complete.
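
The scoring itself can be done with the Hugging Face evaluate library, roughly as below; the QA pair shown is a made-up example, not one of the real pairs:

```python
# Sketch of scoring generated answers against references with BLEU,
# ROUGE, and METEOR. The prediction/reference pair is illustrative.
import evaluate

bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

preds = ["Flash the image to a microSD card and boot the board."]
refs = ["Write the image to a microSD card, then boot the BeagleY-AI."]

print(bleu.compute(predictions=preds, references=[[r] for r in refs]))
print(rouge.compute(predictions=preds, references=refs))
print(meteor.compute(predictions=preds, references=refs))
```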

Blockers:

  • Spent significant time trying to figure out how to utilize BeagleY-AI’s hardware accelerators to run the LLM, but haven’t succeeded yet.

Next Steps:

  • Finalize the Docker setup for the API.
  • Test inference using Unsloth models as a potential alternative.

Questions:

  1. Will I have access to a virtual machine to host the API? I’ll be focusing on preparing an image for others to run on their own hardware.
  2. Could you please review the QA pairs I’ve generated? I’d like to know whether they’re good to proceed with or if any improvements are needed before running evaluations.

Minutes of Meeting [29/07/2025]

During today’s meeting, Aryan and I reviewed the codebase and outlined the next set of tasks:

  1. Model Optimization & Deployment
  • Explore quantized models from Unsloth.
  • Attempt to convert open-source LLMs to ONNX Runtime models.
  2. Benchmarking Preparation
  • Add up to 100 QA pairs to support benchmarking of the RAG pipeline.
  3. Documentation
  • Draft a user onboarding guide for BeagleMind.
  • Create a contributor guide for new developers joining the project.
  4. Technical Investigation
  • Document the rationale for running the Ollama server on CPU instead of hardware accelerators.
  • This is tied to TIDL version compatibility issues on the BeagleBone AI-64 and BeagleY-AI.
  • Research and suggest potential solutions to address this hardware limitation.
  5. Architecture Visualization
  • Create a diagram that simplifies and clarifies the overall architecture of the project to make it easier to understand and communicate.

Let me know if I missed anything or if you’d like to add more context to any point.
