Hi, I will be posting my updates for my GSoC project, BeagleMind, in this thread, roughly once a week (possibly more often).
Thank you for this opportunity!
I am currently working on enhancing and refining my diagrams, researching better ways to scrape and collect data (so that the dataset also captures each document's metadata), and reviewing benchmarks to choose the best LLM, vector store, and fine-tuning techniques for our use case. Finally, I will start preparing the introductory video and a proof of concept for the chatbot.
See you soon!
Hi everyone,
I hope you’re all doing well. Here’s a brief update on my progress for Week 0 of the BeagleMind project:
Model Evaluation:
I’ve assessed several language models for our project’s needs (model selection is still in progress); these showed the most potential so far in terms of size and performance:
- Mistral-Small-3.1-24B-Instruct-2503-GGUF
- Phi-4 (14B)
- Qwen2.5-Coder-7B
- Qwen3-30B-A3B
Fine-Tuning Environment:
I’m exploring Google Colab Pro/Pro+ for initial fine-tuning, leveraging powerful GPUs like the A100. Additionally, I’m considering using Unsloth to enable fast and memory-efficient fine-tuning on consumer hardware. For deployment, dedicated virtual machines (e.g., Hugging Face Inference Endpoints) will be reserved to ensure consistent uptime and performance.
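As a rough sketch of what the Unsloth route could look like (the model checkpoint, dataset file, and hyperparameters below are placeholders, not final choices, and exact TRL arguments vary by version):

```python
# Minimal sketch of memory-efficient LoRA fine-tuning with Unsloth + TRL.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit quantized base model (fits on a single Colab GPU).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-7B-bnb-4bit",  # placeholder checkpoint
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Hypothetical training file produced by the data collection pipeline.
dataset = load_dataset("json", data_files="beagleboard_docs.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```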
Data Collection Pipeline:
I’ve initiated data collection from the BeagleBoard GitHub Organization, focusing on:
- Cloning repositories and extracting text from code, markdown files, and PDFs.
- Gathering information from community blogs, datasheets, hardware files, and forums.
The collected data will include metadata from these documents to enrich the dataset and improve response quality.
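To illustrate the repo-scraping step, here is a minimal sketch under some assumptions (the clone path, kept file extensions, and output schema are placeholders; PDF extraction would be a separate step with a dedicated parser):

```python
# Walk a cloned repository, keep text from code/markdown files,
# and attach simple metadata to each record.
import json
from pathlib import Path

REPO_DIR = Path("cloned_repos/docs.beagleboard.io")  # hypothetical clone location
KEEP_EXTENSIONS = {".md", ".rst", ".py", ".c", ".dts"}

records = []
for path in REPO_DIR.rglob("*"):
    if not path.is_file() or path.suffix.lower() not in KEEP_EXTENSIONS:
        continue
    text = path.read_text(encoding="utf-8", errors="ignore")
    records.append({
        "text": text,
        "metadata": {
            "repo": REPO_DIR.name,
            "path": str(path.relative_to(REPO_DIR)),
            "file_type": path.suffix.lstrip("."),
        },
    })

# One JSON record per line, ready for chunking/embedding later.
with open("beagleboard_corpus.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```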
For more detailed information, please refer to the full document: Week 0 Updates
I’m looking forward to any feedback or suggestions you might have.
I’m updating the doc daily, so stay tuned for more updates!
Best regards,
Hey Fayez,
Thanks for sharing the update.
Looking forward to the weekly updates on the project!
Good work so far, keep it up!
Regards,
Deepak Khatri
Hello everyone, sorry for the delayed update!
This week, I focused on preparing the demo and the presentation for the introductory video. Kumar suggested a really cool feature for the CLI tool: code generation based on the documentation. I decided to explore it as a proof of concept in the video and tested it by generating a simple LED blinker using both shell commands and Python scripts.
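For reference, this is roughly the shape of Python blinker the demo targets. It is an illustrative sketch using the libgpiod v1 Python bindings; the GPIO chip name and line offset are placeholders and depend on the board and pin used.

```python
# Blink an LED via libgpiod (v1 Python bindings).
import time
import gpiod

CHIP_NAME = "gpiochip0"   # placeholder GPIO chip
LINE_OFFSET = 18          # placeholder line driving the LED

chip = gpiod.Chip(CHIP_NAME)
line = chip.get_line(LINE_OFFSET)
line.request(consumer="beaglemind-blink", type=gpiod.LINE_REQ_DIR_OUT)

try:
    for _ in range(10):      # blink 10 times
        line.set_value(1)
        time.sleep(0.5)
        line.set_value(0)
        time.sleep(0.5)
finally:
    line.release()
```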
I’m currently waiting for the video review, and once it’s approved, I’ll post it in the forum.
In the meantime, I’ve completed the Milestone 0 Updates document; you can check it out here:
Milestone 0 Updates
I’ve moved the diagrams task to the next milestone. I’ll also be creating a new document titled Milestone 1 Updates, which I’ll begin updating shortly.
You can check out the CLI repo here: https://github.com/fayezzouari/beaglemind-cli
Best regards,
@FAYEZ_ZOUARI
If you maintain a single document for all milestones, with the latest updates listed first and tracked by week/day, it would be better than having separate documents for each milestone.
Alright, will take that into account!
Hello everyone,
This is the official introductory video for my project:
https://youtu.be/pC97HKFRKUI
Hope you enjoy it ^^
Best regards,
Hi, I’ll be sharing the notes from yesterday’s meeting with @Aryan_Nanda and @KumarAbhishek.
Discussion Points:
- Benchmark various LLMs to evaluate their performance in a RAG setup.
- Test the system with both relevant and irrelevant questions to assess reliability and robustness.
- Explore LLM distillation methods to create a lightweight model suitable for deployment on BeagleBoard hardware.
- Define evaluation criteria for the system, including relevance, accuracy, and latency.
- Specify a format for training data and a strategy for storing metadata to improve retrieval efficiency.
- Investigate hosted vector database solutions, specifically Milvus, for embedding storage and retrieval (see the sketch right after these discussion points).
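To make the Milvus idea concrete, here is a minimal sketch assuming pymilvus with the MilvusClient interface and a local Milvus Lite file; the collection name, embedding dimension, and metadata fields are placeholders for illustration:

```python
# Store document embeddings plus metadata in Milvus and run a similarity search.
from pymilvus import MilvusClient

client = MilvusClient("beaglemind_milvus.db")  # hypothetical local Milvus Lite file

if not client.has_collection("beagleboard_docs"):
    client.create_collection(
        collection_name="beagleboard_docs",
        dimension=768,  # must match the embedding model's output size
    )

# Each record carries the embedding plus the metadata collected during scraping.
client.insert(
    collection_name="beagleboard_docs",
    data=[{
        "id": 0,
        "vector": [0.0] * 768,          # placeholder embedding
        "text": "Example documentation chunk",
        "repo": "docs.beagleboard.io",  # metadata used to filter/attribute answers
        "path": "boards/beagley-ai/index.rst",
    }],
)

# Retrieval: nearest-neighbour search over the stored embeddings.
hits = client.search(
    collection_name="beagleboard_docs",
    data=[[0.0] * 768],                 # placeholder query embedding
    limit=3,
    output_fields=["text", "repo", "path"],
)
```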
Updated Objectives:
- Benchmark the current RAG system using a wide variety of questions (a rough sketch of the benchmarking loop is at the end of this post).
- Assess the reliability of responses based on documentation relevance.
- If the system proves unreliable, consider fine-tuning the base model with domain-specific data.
- If results are satisfactory, proceed with MVP development and prepare for deployment.
- Create a repository on OpenBeagle.
- Investigate the possibility of running LLM inference locally on a BeagleY-AI.
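To make the benchmarking objective concrete, here is a rough sketch of the loop I have in mind; the answer_question() function and the question lists are placeholders for the real RAG pipeline:

```python
# Send a mix of relevant and irrelevant questions through the RAG pipeline
# and record latency plus how much context was retrieved for each answer.
import time

relevant_questions = ["How do I flash an image to a BeagleBone Black?"]   # examples only
irrelevant_questions = ["What is the capital of France?"]                 # examples only

def answer_question(question):
    """Placeholder for the real pipeline (retrieve from the vector store, then generate)."""
    return "stub answer", []  # (answer text, list of retrieved documents)

results = []
for question in relevant_questions + irrelevant_questions:
    start = time.perf_counter()
    answer, retrieved_docs = answer_question(question)
    latency = time.perf_counter() - start
    results.append({
        "question": question,
        "latency_s": round(latency, 2),
        "num_docs_retrieved": len(retrieved_docs),
        "answer": answer,
    })
```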