BeagleMind

A Conversational AI System for BeagleBoard

Goal: Fine-tune an existing open-source model (such as Llama 3, Mixtral, or Gemma) on domain-specific BeagleBoard data.
Software Skills: Python, scraping techniques, fine-tuning LLMs (techniques like LoRA and QLoRA), Gradio, Hugging Face Inference Endpoints, NLTK/spaCy, Git
Hardware Skills: Ability to test applications on the BeagleBone AI-64/BeagleY-AI and optimize performance using quantization techniques.
Possible Mentors: @Aryan_Nanda
Expected size of project: ~175 hours
Project Difficulty: Intermediate to Advanced
Upstream Repository: TBD
References:

Deliverables:

  • A fine-tuned LLM trained on BeagleBoard data.
  • A detailed report on the usability and scope of the model, including:
    • Quantitative metrics (e.g., benchmarks like BLEU, response time).
    • Qualitative metrics (e.g., feedback from BeagleBoard community experts).
    • The cost of managing the model when deployed on Hugging Face.
  • A web interface (Gradio app) for interacting with the model deployed on Hugging Face.
  • Detailed setup instructions.
  • Testing whether the model runs locally using quantization techniques (advanced).
  • A full-stack application that includes authentication and a database for saving history (advanced).

Technical Approach

  • Dataset Preparation: Gather data from diverse sources (forums, docs, Discord, etc.), extract text, clean it (remove noise, handle code), structure it (QA pairs, contextual data), chunk large documents, tokenize using the LLM’s tokenizer, and create a Dataset object with the Hugging Face datasets library (sketched below).

  • Model Fine-tuning: Fine-tune a pre-trained LLM (e.g., Llama 3/Gemma 2) using domain adaptation (on contextual data) and instruction tuning (on QA and instruction datasets). Employ PEFT (LoRA/QLoRA) for efficiency. Evaluate using metrics like BLEU, ROUGE, METEOR, and exact match. (See the first sketch after this list.)

  • Model Interaction: Deploy the fine-tuned model on Hugging Face Inference Endpoints. Develop a Gradio web interface for user interaction, sending queries to the hosted model and displaying responses. (See the second sketch below.)
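
Under stated assumptions, the first sketch combines the dataset and fine-tuning steps above: it loads hypothetical QA pairs from a JSONL file, tokenizes them, and attaches LoRA adapters with PEFT. The model name, file name, and hyperparameters are placeholders, not recommendations.

```python
# A minimal sketch, assuming QA pairs were already scraped into a JSONL file
# ("beagleboard_qa.jsonl" with "question"/"answer" keys is hypothetical).
# Model name and hyperparameters are illustrative only.
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = get_peft_model(model, LoraConfig(
    task_type=TaskType.CAUSAL_LM, r=16, lora_alpha=32,
    lora_dropout=0.05, target_modules=["q_proj", "v_proj"],
))

dataset = load_dataset("json", data_files="beagleboard_qa.jsonl", split="train")

def tokenize(example):
    # Format one QA pair as plain text and label it for causal-LM training.
    text = f"Question: {example['question']}\nAnswer: {example['answer']}"
    tokens = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    tokens["labels"] = [
        tok if tok != tokenizer.pad_token_id else -100  # ignore padding in loss
        for tok in tokens["input_ids"]
    ]
    return tokens

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="beaglemind-lora",
                           per_device_train_batch_size=2,
                           num_train_epochs=3, learning_rate=2e-4),
    train_dataset=dataset,
).train()
model.save_pretrained("beaglemind-lora")  # stores only the adapter weights
```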

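The second sketch covers the interaction step: a minimal Gradio front end that forwards questions to a Hugging Face Inference Endpoint. The endpoint URL and token below are placeholders.

```python
# A minimal sketch, assuming the fine-tuned model is served by a Hugging Face
# Inference Endpoint. The endpoint URL and token below are placeholders.
import gradio as gr
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",  # hypothetical
    token="hf_...",  # read from an environment variable in practice
)

def answer(message, history):
    # Forward the user's question to the hosted model and return its reply.
    return client.text_generation(message, max_new_tokens=256)

gr.ChatInterface(answer, title="BeagleMind").launch()
```

gr.ChatInterface keeps the whole UI in one file, which matches the minimal-UI scope discussed later in this thread.
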
Impact

BeagleBoard currently lacks an AI-powered assistant to help users troubleshoot errors. This project aims to address that need while also streamlining the onboarding process for new contributors, enabling them to get started more quickly.


The basic plan for getting this idea working would be:

  1. Scrape the BeagleBoard docs and all related documents into an LLM-friendly format.
  2. Tokenize the data properly.
  3. Choose a fine-tuning strategy, e.g., LoRA, DoRA, etc.
  4. Host the fine-tuned weights on some platform. I would prefer Hugging Face, but we have Kaggle Models too.
  5. Then we can pull the hosted weights onto the board and try to run inference on the BeagleBoard (a rough sketch follows below).
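
As a rough sketch of step 5, assuming the fine-tuned weights were pushed to the Hugging Face Hub (the repo ID below is a placeholder), pulling and running them could look like this:

```python
# A rough sketch, assuming the fine-tuned weights live in a (hypothetical)
# Hub repo. On a BeagleBone AI-64, a small or quantized model is likely needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/beaglemind-finetuned"  # placeholder repo ID
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Question: How do I enable a PWM pin on the BeagleBone AI-64?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```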

Are you sure the AI-64 has enough RAM and GPU to make this happen?


Running the model locally won’t be efficient. I tried running Llama 3B and Gemma 2 using Ollama, but the RAM was not enough. A smaller model with 375M parameters worked, but the results were pretty bad. So, deploying the model would be best.


What does domain-specific BeagleBoard data refer to?

If you ask popular GPTs about BeagleBoard, they will have some knowledge about it for sure, but not to the extent that a BeagleBoard expert would. Domain-specific BeagleBoard data refers to information that would give a GPT knowledge equivalent to that of a BeagleBoard expert. This would help users solve errors easily and understand concepts quickly, without referring to the documentation or asking basic questions in the forums.

This data would include BeagleBoard documentation. Feeding this data into the model would provide it with a contextual understanding of BeagleBoards and how they work. Additionally, QA pairs could be created using forum threads where messages are marked as solutions, such as [1] and [2]. This would allow the model to answer effectively.
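
As an illustration only (the field names and output file are assumptions, not a fixed schema), each solved thread could be flattened into one JSONL record:

```python
# Illustrative shape of one training example built from a solved forum thread.
# Field names and the output file are assumptions, not a fixed schema.
import json

example = {
    "question": "<the thread's opening question>",
    "answer": "<the post marked as the solution>",
    "source": "<URL of the forum thread>",  # keep provenance for review
}
with open("beagleboard_qa.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```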

The data could also include Discord threads and forum messages from reliable members.

This project is very data-driven: the better the data we have, the better the final outcome will be.


Thank you


I am using the Django framework and API access. The 49%'r tripped the red flags so I stopped that project. Just waiting for another big AI startup to roll out and we will test their services.

That might be a good project for you; it is something that can be completed within your time allotment and will benefit others. I can see a BB-specific GPT being hosted on a server running Django that queries an AI provider. The only thing is you would need someone to manage it periodically and pay the bill…


The primary goal of this project would be to develop a minimal UI that allows us to test the model and evaluate its usability and scope. Initially, the focus will be on verifying the model’s performance against specific benchmarks. If the model meets these benchmarks, we can then consider expanding the project into a full-stack application. This expanded version could include features like authentication, rate limits, user access tiers, and history saving. However, for the scope of this project, I believe it would be more effective to limit our efforts to a minimal UI using Gradio.

This version gives a clearer flow from the project’s goal to potential future steps for setting up a full-stack application.

What do you think about this?

@Aryan_Nanda, what criteria are we going to use to decide when to transition from the minimal UI to a full-stack application, and how will we measure the model’s usability and scope during this phase?

The idea sounds fun to work on! I think we could make it even more efficient by using a smaller language model like Llama 1B/3B or Gemma 2B combined with RAG over the BeagleBoard docs. This would be faster to fine-tune, easier to deploy, and more resource-efficient.

We could also set up an automated pipeline that:

  1. Ingests new doc changes
  2. Updates the vector store
  3. Periodically fine-tunes the base model on new data

This way the assistant stays up-to-date with minimal manual intervention (a sketch of the vector-store update follows below). I’d love to contribute to building this.
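
For the “updates the vector store” step, a minimal sketch using sentence-transformers and FAISS as example choices (the model name, chunk text, and index path are assumptions):

```python
# A minimal sketch of "update the vector store", with sentence-transformers
# and FAISS as example choices; names and paths below are assumptions.
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly

def update_index(new_chunks, index_path="beagle_docs.faiss"):
    """Embed newly changed doc chunks and append them to the FAISS index."""
    vectors = embedder.encode(new_chunks, normalize_embeddings=True)
    try:
        index = faiss.read_index(index_path)
    except RuntimeError:  # first run: no index file yet
        index = faiss.IndexFlatIP(vectors.shape[1])  # cosine via inner product
    index.add(vectors)
    faiss.write_index(index, index_path)
    # In practice, also persist the chunk texts/IDs alongside the index.

update_index(["<chunked text from a changed documentation page>"])
```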

Hello @Aryan_Nanda,
I am interested in this project. I have past experience in fine-tuning LLMs. How should I start this project?
How can I assure you that I am the right person for the job and can complete this project successfully?

I suppose smaller models like Llama 3.2 1B and 3B should run without any issue.

More than likely it would be a total waste of your time; however, it would be nice if you could prove me wrong on that.

Can you please give me an idea of where we can deploy the model? These boards might not be able to run the models effectively.

I have updated the references and deliverables. Please check them out as well.

This should answer your question @Karn

Great ideas, @vovw! Think over the overall workflow of the project in that case.

Go through references.

It didn’t work for me; you can try quantization techniques and then try again.
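
To make “quantization techniques” concrete, one possible sketch loads a model in 4-bit with bitsandbytes (the model name is illustrative):

```python
# One possible sketch: load a model in 4-bit with bitsandbytes, cutting RAM
# use roughly 4x versus fp16. The model name is illustrative only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    ),
)
```

If bitsandbytes isn’t supported on the board’s ARM CPU, converting the model to GGUF and running it through llama.cpp (which Ollama wraps) is the more common route.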

Please read this @prithvitambewagh

Sure, let’s see if we get good proposals for this project. It looks to me that people are interested.

AI is in; the problem is they are going after boards that can run CUDA on a GPU and have plenty of RAM. If you had some really cool plans for a project that would run on the TI stuff and perform well enough to garner interest, that would be a plus.

I strongly feel it should be at a basic level, with every step presented fully along with the “why” behind it. Hidden layers and nodes are not intuitive concepts, so it takes some preceding work to introduce that material. Much of this hinges on exactly how much the TI hardware can handle while still producing results that are not lame.

Creating a learning experience for everyone would be a win-win for your own personal development and the development of others in this space. Pretty sure you have discovered the industry paradigm regarding the SoC world; breaking that and bringing out what is being hidden and obscured would be great.

Something that can run locally, without eating up everything, is a BERT model fine-tuned on the forum data. Unlike generative AI tools, this approach does not create responses but instead excels at intelligently retrieving relevant question-answer pairs for a given topic or query. It would be like a context-aware search tool, a rather intelligent one, something like a fine-tuned Google Search (it’s worth noting that Google employs BERT in its own search algorithm to enhance query understanding). That’s probably the best we can do with minimal hardware requirements.
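
A minimal sketch of that retrieval idea, with sentence-transformers standing in for a fine-tuned BERT (the model name and toy QA pairs are placeholders):

```python
# A minimal sketch of retrieval over forum QA pairs, with sentence-transformers
# standing in for a fine-tuned BERT. Model name and QA pairs are placeholders.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

qa_pairs = [  # in practice, loaded from scraped forum threads
    ("How do I flash an image to the eMMC?", "<post marked as the solution>"),
    ("Which pins support PWM?", "<post marked as the solution>"),
]
corpus = embedder.encode([q for q, _ in qa_pairs], convert_to_tensor=True)

# Encode the user's query and return the stored answer of the closest question.
query = embedder.encode("steps to flash the eMMC?", convert_to_tensor=True)
best = util.semantic_search(query, corpus, top_k=1)[0][0]
print(qa_pairs[best["corpus_id"]][1])
```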

Alternatively, we could use a rule-based chatbot (decision tree), which doesn’t make sense in this context due to the large size of the docs (it doesn’t make sense even with small docs, due to the missing semantics).
