BeagleMind, Conversational AI Assistant for BeagleBoard Documentation

Well written, just one update:

Document why we can’t use TIDL to accelerate transformer layers on BeagleBone AI-64 and BeagleY-AI, and outline the potential next steps.


Weekly Update

Here is my update for this week:

Progress

  • Finalized the docker-compose setup to host the API, enabling users to deploy the retrieval system locally with minimal dependencies.
  • Worked on reducing the size of an LLM to produce a lightweight version with a reasonable context window, though the results were not yet successful.
  • Continued exploration of options to optimize model deployment for limited-resource environments using quantized models from Unsloth.

Blockers

  • Inference latency remains a significant issue when running LLMs on CPU, often leading to timeouts.
  • Still in the process of correctly converting LLMs to smaller formats; the current approaches haven’t produced functional models yet.

Next Steps

  • Explore quantized models from Unsloth.
  • Attempt conversion of open-source LLMs to ONNX runtime models.
  • Add up to 100 QA pairs for benchmarking the RAG pipeline.
  • Draft user onboarding and contributor guides.
  • Document the rationale for using CPU inference with Ollama, given the TIDL compatibility issues on BeagleBone AI-64 and BeagleY-AI, and explain why TIDL can’t be used to accelerate transformer layers on these boards.
  • Research alternative solutions to the current hardware constraints.
  • Create a clear architectural diagram of the system for easier communication and understanding.
  • Prepare deployment plans for the API.

Note
I will be unavailable for next week’s update as I will be traveling to China tomorrow to lead the Tunisian delegation participating in the International Olympiad in Artificial Intelligence (IOAI).


Hi, this is a small update on the deployment phase. Here’s how I’m planning to deploy each component of the project:


Hi, in this message I would like to share the preferred specifications for the deployment machine.

1. Overview

This message outlines the deployment specifications for a virtual machine (VM) that will host a FastAPI-based API endpoint with a Milvus vector database. The system will use Docker Compose to manage four containers:

  1. Milvus – Vector database for storing and querying embeddings.
  2. etcd – Distributed key-value store for metadata management.
  3. MinIO – Object storage for Milvus data persistence.
  4. FastAPI – API server handling client requests and interacting with Milvus.

The VM must be configured to ensure smooth operation, scalability, and sufficient storage for vector embeddings (up to 30 GB).
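For reference, here is a minimal sketch of what such a compose file could look like. Image tags, volume paths, credentials, and the API image are placeholders and assumptions, not the project’s actual file:

```yaml
# Hedged sketch of the four-container stack described above (not the project's
# actual docker-compose.yml). Image versions and credentials are placeholders.
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    command: etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://etcd:2379 --data-dir /etcd
    ports:
      - "2379:2379"
    volumes:
      - ./volumes/etcd:/etcd

  minio:
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    command: minio server /minio_data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin        # placeholder credentials
      MINIO_ROOT_PASSWORD: minioadmin
    ports:
      - "9001:9001"
    volumes:
      - ./volumes/minio:/minio_data

  milvus:
    image: milvusdb/milvus:v2.4.4
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    ports:
      - "19530:19530"
    volumes:
      - ./volumes/milvus:/var/lib/milvus
    depends_on:
      - etcd
      - minio

  api:
    build: .                             # FastAPI app in this repository (assumed)
    ports:
      - "8000:8000"
    environment:
      MILVUS_HOST: milvus
      MILVUS_PORT: "19530"
    depends_on:
      - milvus
```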


2. Virtual Machine Specifications

2.1. General Requirements

  • Operating System: Ubuntu 22.04 LTS (Recommended for stability and Docker support).
  • Docker & Docker Compose: Pre-installed to manage containers.
  • Networking:
    • Public IP with ports 8000 (FastAPI), 9001 (MinIO), 2379 (ETCD) and 19530 (Milvus) exposed.
    • Firewall configured to allow inbound traffic on necessary ports.

2.2. Hardware Requirements

| Resource | Minimum Spec | Recommended Spec | Notes |
|----------|--------------|------------------|-------|
| CPU | 2 cores | 4 cores | Milvus benefits from multiple cores for query processing. |
| RAM | 8 GB | 16 GB | Milvus and etcd are memory-intensive. |
| Storage | 50 GB SSD | 100 GB SSD | 30 GB for data + buffer for logs, MinIO, and OS. |
| Swap | 4 GB | 8 GB | Helps with memory spikes. |

2.3. Storage Breakdown

  • Milvus Data: ~20 GB (Vector embeddings, indexes).
  • MinIO: ~10 GB (Stores persisted Milvus segments).
  • etcd: ~5 GB (Metadata storage).
  • FastAPI & System: ~5 GB (Logs, Docker images, OS).

Total Storage: 50 GB minimum, 100 GB recommended for scalability.

Note: I think we can go with the minimum specs, since the data fetched from the repositories doesn’t take up much space and requires only a few gigabytes of storage.


3. Security Recommendations

  • Firewall Rules: Restrict access to API and Milvus ports.
  • HTTPS: Add a reverse proxy (Nginx + Let’s Encrypt) for FastAPI.

4. Conclusion

This setup ensures a scalable and efficient deployment of a FastAPI endpoint with Milvus for vector search. Adjust resources as data grows.

Next Steps:

  • Setup reverse proxy.
  • Set up automated backups.
  • Monitor resource usage (htop, docker stats, or even Kubernetes for higher-level monitoring).

Let me know if you need adjustments! ^^


Nice!


TIDL Support on BeagleY-AI and BeagleBone AI-64

Both boards are based on TI’s Jacinto 7 SoCs (BeagleY-AI: AM67A (J722S); BeagleBone AI‑64: TDA4VM (J721E)). In TI’s Processor-SDK Edge AI, both processors support the TI Deep Learning (TIDL) library and tools (github.com). For example, a user reported that on the AM67A (BeagleY-AI) board using Processor-SDK Linux v10.00.00.08, the included TIDL version is 10.00.04.00 (e2e.ti.com). (The TDA4VM board similarly runs SDK 10.x, which includes a comparable TIDL release.) In short, TIDL 10.x is the supported toolchain on both boards, enabling the TI DL runtime to leverage the C7x DSP and MMA accelerators when properly enabled (github.com, e2e.ti.com).

Transformer-Layer Support in TIDL

TI’s TIDL library has limited support for transformer operations, primarily oriented toward vision models. As TI’s documentation explains, starting in SDK 9.1 TIDL added support for basic Vision Transformer (ViT) models and layers: it supports multi-head attention, LayerNorm, SoftMax and similar blocks used in image-classification transformers (e2e.ti.com). Release notes for newer TIDL versions (v10.1.4) explicitly highlight improvements for “transformer-based architectures”: e.g. fixed-point acceleration of LayerNorm, SoftMax, Concat and Add, and enhanced accuracy for vision transformer backbones like Swin, DeiT and LeViT (github.com). However, this support is aimed at vision transformers (image tasks). TIDL does not provide a general-purpose transformer accelerator for large language models; it does not, for example, implement all the attention/embedding/gather operations typical of an LLM. In practice, complex LLM architectures like Qwen2.5‑Coder (a large code-generation transformer) contain operators beyond TIDL’s current supported set. TI’s own forums note that only “basic transformer (classification) ViT” and “partial support of SwinT” are supported (e2e.ti.com), implying that a full LLM is outside the optimized use case.

Documentation and Community Notes on Compatibility

TI’s official notes and the BeagleBoard community confirm these limitations. The TI forums explicitly state that TIDL’s transformer support is restricted to certain vision tasks (e2e.ti.com), and urge use of TIDL’s model-import tools with specific workarounds (e.g. ONNX opset downgrades) for ViT/DeiT networks. Moreover, BeagleBoard users have reported practical issues enabling TIDL on these boards. For instance, BeagleY-AI’s default device tree disables the C7x DSP (necessary for TIDL acceleration) in the stock image (forum.beagleboard.org). Enabling the DSP requires custom overlays or modified firmware, workarounds that are nontrivial. As one community post summarized: “You will have to create a device tree overlay because the C7x is disabled in the k3-j722s base device tree. I don’t see any of that in the beagley‑ai base tree, yet.” (forum.beagleboard.org). By mid-2025 no straightforward solution was reported. In short, there are no official TI/BeagleBoard statements claiming full transformer support on these boards; all evidence suggests that (a) TIDL is only partly transformer-compatible and (b) on BeagleY‑AI/AI‑64 the DSP accelerator isn’t enabled by default, so TIDL cannot transparently speed up a general LLM.

Implications: Why CPU Inference Is Used on Ollama

Because the TIDL toolchain on these boards cannot fully accelerate a Qwen2.5‑Coder model, the only reliable execution path is CPU-only. Ollama (the LLM runtime) falls back to running the model on the ARM cores. This is the most compatible option because it does not rely on any specialized hardware or unsupported operators. In contrast, attempting to offload Qwen2.5‑Coder to TIDL would hit unsupported layers or fail to run entirely. Running on CPU avoids all these compatibility issues, at the expense of throughput. In summary: TIDL on BeagleY-AI and AI‑64 is limited to vision-optimized models and often isn’t fully enabled on the boards, so CPU inference remains the practical choice for Ollama deployments (e2e.ti.com, forum.beagleboard.org).
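For context, this is roughly what CPU-only inference through Ollama looks like from the client side; a minimal Python sketch against Ollama’s local HTTP API, with the model tag and prompt as placeholders:

```python
# Minimal sketch: querying a locally running Ollama server (CPU-only on the
# ARM cores) through its HTTP API. Model tag and prompt are placeholders.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

payload = {
    "model": "qwen2.5-coder:1.5b",   # assumed quantized model pulled beforehand
    "prompt": "How do I enable a device tree overlay on BeagleY-AI?",
    "stream": False,                 # return one JSON object instead of a stream
}

# Generous timeout: CPU inference on these boards can take minutes.
response = requests.post(OLLAMA_URL, json=payload, timeout=600)
response.raise_for_status()
print(response.json()["response"])
```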

Sources: TI Processor-SDK documentation and forums (e2e.ti.com); Texas Instruments edgeai-tidl-tools release notes (github.com); BeagleBoard community forums (forum.beagleboard.org); see the citations above.


Weekly Progress Update

Completed this week:

  • Added up to 100 QA pairs for benchmarking the RAG pipeline.
  • Drafted user onboarding and contributor guides here.
  • Documented the rationale for using CPU inference on Ollama, highlighting TIDL compatibility issues on BeagleBone AI-64 and BeagleY-AI, and explaining why transformer layers cannot currently be accelerated on these boards.
  • Created a clear architectural diagram of the system to improve communication and understanding.
  • Prepared deployment plans for the API.

Current challenges:

  • Exploring quantized models from Unsloth.
  • Attempting conversion of open-source LLMs to ONNX Runtime models (see the export sketch after this list).
  • Researching alternative solutions to overcome the current hardware constraints.
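As a reference for the ONNX attempt mentioned above, here is a sketch of one possible export path using Hugging Face Optimum; the model id is a placeholder and this is not necessarily the exact route the project will take:

```python
# Sketch: exporting a small open-source causal LM to ONNX and running it with
# ONNX Runtime via Hugging Face Optimum. The model id is a placeholder; the
# project's actual conversion path may differ.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-0.5B-Instruct"  # placeholder model id

# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Write a hello-world in Python.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Optionally persist the exported model for reuse on the board.
model.save_pretrained("qwen2.5-coder-onnx")
```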

Plan for the next two days:

  • Focus on resolving the model compatibility and performance issue on the hardware as the top priority.

Weekly Progress Update

Focus this week:

  • Worked on establishing a decent inference pipeline on BeagleY-AI.
  • Explored ONNX Runtime (ORT) and Unsloth open-source quantized models.
  • Completed the setup of the Docker Compose YAML to run the Beaglemind Platform along with the API and vector store here.
  • Added an on-startup action for the API server so it automatically stores data in the vector store whenever the container is brought up (if the data isn’t already present); a sketch follows this list.
  • Published beaglemind-cli on PyPI here.
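As a rough illustration of that on-startup action (collection name, connection settings, and the ingest helper are assumptions, not the actual code):

```python
# Sketch of the on-startup behaviour: when the API container comes up, ingest
# documents into Milvus only if the collection is not already present.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from pymilvus import connections, utility

COLLECTION_NAME = "beaglemind_docs"  # assumed collection name


def ingest_documents(collection: str) -> None:
    """Placeholder: chunk, embed, and insert the documentation into Milvus."""
    ...


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Connect to the Milvus service defined in docker-compose.
    connections.connect(host="milvus", port="19530")
    if not utility.has_collection(COLLECTION_NAME):
        ingest_documents(COLLECTION_NAME)
    yield


app = FastAPI(lifespan=lifespan)
```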

Blockers:

  • Still working on inference with BeagleY-AI.
  • Ollama quantized models are timing out, and setting up/testing other open-source models has been challenging. Most of this week was spent experimenting with different models on the hardware.

Next Steps:

  • Continue working on stable inference for BeagleY-AI (optimistic it’ll be resolved in the next few days).
  • Finalize the full platform setup and identify potential additional features.
  • Run retrieval evaluations using QA pairs.

Quick Update:
I’ve been working on getting inference running with Ollama. After experimenting and eventually setting a higher timeout limit, I finally got it working! Beaglemind can now generate responses locally through Ollama inference.


Minutes of Meeting

Date: 23/08/2025
Attendees: Fayez Zouari, Aryan Nanda

Discussion Points

  • Weekly Progress: Reviewed the updates posted over the past couple of days.
  • Chatbot Enhancements: Discussed the need to improve chatbot performance and apply final touches before deployment.
  • Beaglemind Wizard: Introduced the new feature designed to facilitate onboarding of new contributors.

Action Items (Until Monday)

  1. Platform Development: Prioritize and finalize development of the platform to prepare for final deployment.
  2. RAG Evaluation: Run evaluations on the prepared QA pairs.
  3. Final Report Draft: Begin drafting the final report for the GSoC project, as submission will open soon.
  4. Demo Preparation: Prepare a demo showcasing both the CLI and the platform.

Progress this week

  • Completed the first draft of my final report (will update by end of day). Thanks to Aryan for the feedback!

  • Implemented login functionality + conversation history storage for each user.

  • Nearly finished with the evaluations; I’ll share them by end of day and include results in the updated report.

  • Used RAGAS (Retrieval-Augmented Generation Assessment Suite), a framework for evaluating Retrieval-Augmented Generation (RAG) systems.

  • Extracted and analyzed the following metrics (a sketch of the evaluation call follows this list):

  1. Faithfulness → measures whether generated answers are consistent with the retrieved documents.

  2. Relevance → checks if the retrieved documents are relevant to the user’s query.

  3. Recall → evaluates the system’s ability to retrieve all the necessary information for a complete answer.

  4. Precision → assesses how much of the retrieved content is actually useful/needed for the answer.

  5. Correctness → evaluates whether the final generated response accurately answers the query.
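For reference, this is roughly what the evaluation call looks like. Exact import paths and metric names vary across RAGAS versions, the sample row is made up, and evaluate() needs an LLM/embedding backend configured (e.g. an API key) to score the metrics:

```python
# Sketch of a RAGAS evaluation over the QA pairs. Metric names and dataset
# column names follow one RAGAS release and may differ in other versions.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Single made-up row for illustration; the real dataset holds ~100 QA pairs.
eval_data = Dataset.from_dict({
    "question": ["How do I flash an image to BeagleY-AI?"],
    "answer": ["Use bb-imager or balenaEtcher to write the official image ..."],
    "contexts": [["BeagleY-AI images can be flashed with bb-imager ..."]],
    "ground_truth": ["Flash the official image with bb-imager or balenaEtcher."],
})

result = evaluate(
    eval_data,
    metrics=[faithfulness, answer_relevancy, context_recall,
             context_precision, answer_correctness],
)
print(result)
```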

Blockers

During evaluation, I concluded that ROUGE / BLEU / METEOR are not effective for RAG systems.

These metrics were originally designed for natural language generation evaluation (e.g., machine translation or summarization), where the output is expected to closely match reference text.

In RAG, however, correctness doesn’t always mean lexical overlap: the system can generate accurate answers in different wordings, which these metrics penalize heavily.

As a result, the scores were misleadingly poor.

To address this, I added BERTScore (reporting the BERT-F1 score), which measures semantic similarity rather than exact word overlap. This provided much more realistic and encouraging scores, aligning better with the actual system quality.
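A small sketch of that check using the bert-score package (the candidate/reference strings are made-up examples):

```python
# Sketch: semantic-similarity scoring with the bert-score package.
from bert_score import score

candidates = ["Flash the official image to an SD card with bb-imager."]
references = ["Use bb-imager to write the official image onto an SD card."]

# Returns precision, recall, and F1 tensors; the F1 is what was reported.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")
```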

Next goals

  • Record and deliver a YouTube presentation summarizing the conclusions of my GSoC project.
  • Discuss with the mentors which version of the platform we should deploy.
  • Finalize the report ASAP.

Hi, here’s the link to my final report: report
And here’s the link to my final demo video: video

Updates!!

Hello mentors and BeagleBoard community,

I’m sharing an update on BeagleMind, our AI assistant for BeagleBoard documentation. Our initial work with a RAG (Retrieval-Augmented Generation) system provided a solid foundation, but its performance in evaluations has been average. To achieve a higher standard of accuracy and reliability, I’ve begun exploring a more powerful approach: Supervised Fine-Tuning (SFT).

This method involves specializing a base language model by training it on a custom dataset tailored to our specific domain. My current focus is on building the pipeline for this, which involves three core steps:

1. Creating a High-Quality Dataset:
The first step is to generate a robust set of Question-Answer pairs directly from the BeagleBoard documentation. This dataset will teach the model our specific terminology, concepts, and use-cases.

2. Structuring the Training Data:
The QA pairs will be formatted using Chat Markup Language (ChatML). This involves defining a clear prompt template that includes a system role, tool definitions, and the expected conversation flow, ensuring the model learns to operate within a structured assistant framework.
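As a rough illustration of the ChatML structure (the system prompt and QA content are placeholders; the real template will also carry tool definitions):

```python
# Illustration of the ChatML token format used by Qwen-style chat models.
# System prompt and QA content are placeholders, not the project's template.
def to_chatml(question: str, answer: str) -> str:
    system = "You are BeagleMind, an assistant for BeagleBoard documentation."
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n{answer}<|im_end|>\n"
    )


print(to_chatml(
    "How do I enable the C7x DSP on BeagleY-AI?",
    "You need a device tree overlay, since the DSP is disabled by default ...",
))
```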

3. Executing the Fine-Tune:
The final step is to use this carefully prepared dataset to fine-tune the model, aligning its responses closely with the needs of our community.

Current Progress & A Key Resource:

I am actively working on the first step of dataset creation. I’ve discovered a highly relevant research paper and framework that directly addresses this challenge:

“EasyDataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents”
(Link: https://arxiv.org/abs/2507.04009)

This tool is designed to automate the generation of QA pairs by processing our documentation in chunks, with configurable overlap to ensure context is preserved. I am currently experimenting with it to produce our initial training dataset.
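This is not EasyDataset’s actual API, just a small illustration of the chunking-with-overlap idea it describes:

```python
# Illustration only (not EasyDataset's API): split a documentation page into
# overlapping windows so QA generation keeps context across chunk boundaries.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


# Stand-in for a real documentation page.
doc = " ".join(["BeagleY-AI documentation text."] * 400)
for i, chunk in enumerate(chunk_text(doc)):
    # Each chunk would be fed to the QA-generation prompt.
    print(i, len(chunk))
```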

Next Steps & Open Call:

My immediate goal is to generate a high-quality dataset using EasyDataset and proceed with the initial fine-tuning experiments. I will share the outcomes and learnings as they become available.

This is an exploratory phase, and community insight is invaluable. I am very open to suggestions, feedback, or contributions. If you have experience with model fine-tuning, data synthesis, or thoughts on what constitutes an effective QA pair for our documentation, please reach out.

Thank you for your ongoing support and guidance.
