Offline SmartSpeaker with BeagleBone. My approach, comments and suggestions?

Hi everyone!

I was instantly interested in the project idea "Offline SmartSpeaker with BeagleBone"; coincidentally, I am in the process of implementing a very similar idea on a Raspberry Pi. I have made significant progress so far: I have already configured an external audio card with alsamixer and made a few changes in the kernel, and I am working on CMUsphinx now. I plan to switch to a Beagle board. This would be an extensive project, and I am planning to aim for the following:

  • I would use CMUsphinx for voice recognition and the Espeak voice synthesizer for the smart speaker's feedback and talkback functionality.
  • All of this will work offline.
  • The smart speaker will include features such as trigger words and the ability to understand and carry out tasks like controlling music, alarms, calendars, etc.
  • I would also like to create a home automation API that lets users easily attach their own controllable electrical appliances, each with different operation states, to the BeagleBone and then control them by voice.
  • Yes, it would support all Beagle A8 platforms.
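
To make the home automation idea a bit more concrete, here is a minimal sketch of what such an API could look like. Everything in it is an assumption for illustration: the names `Appliance` and `ApplianceRegistry`, the simple word matching, and the optional `on_change` callback (which is where real pin or relay control would go) are hypothetical and not part of any existing BeagleBone library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Appliance:
    """A user-attached appliance with a fixed set of operation states (hypothetical)."""
    name: str
    states: Tuple[str, ...]
    # Hypothetical hook: called with the new state, e.g. to drive a relay.
    on_change: Optional[Callable[[str], None]] = None
    state: str = ""

    def __post_init__(self):
        # Start in the first listed state unless one was given explicitly.
        if not self.state:
            self.state = self.states[0]

    def set_state(self, new_state: str) -> None:
        if new_state not in self.states:
            raise ValueError(f"{self.name} has no state {new_state!r}")
        self.state = new_state
        if self.on_change is not None:
            self.on_change(new_state)


class ApplianceRegistry:
    """Maps recognized text to a state change on a registered appliance."""

    def __init__(self):
        self._appliances: Dict[str, Appliance] = {}

    def register(self, appliance: Appliance) -> None:
        self._appliances[appliance.name] = appliance

    def handle_command(self, text: str) -> Optional[str]:
        # Naive matching: look for an appliance name and one of its states
        # among the words of the recognized sentence.
        words = text.lower().split()
        for name, appliance in self._appliances.items():
            if name in words:
                for candidate in appliance.states:
                    if candidate in words:
                        appliance.set_state(candidate)
                        return f"{name} is now {candidate}"
        return None  # nothing in the sentence matched a known appliance
```

A user would register something like `Appliance("lamp", ("off", "on"))` and pass the recognizer's text output to `handle_command`; the returned string could then be spoken back through the synthesizer.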

I would like the mentors and the community to comment on the relevance of my approach and give suggestions.

Thanks

Apoorv Gupta

IRC-apoorvtintin

There are a few more things to consider for this:
- Have you looked into how well CMUsphinx works on a processor like the one
in the Beagle family (Cortex-A8)?
- Have you considered pocketsphinx?
- Does the combination of CMUsphinx/Espeak leave enough free CPU to do other
things?
- How do you plan to implement trigger words in an efficient/useable fashion?

The goal here is not necessarily to have a "product" but to have a framework
showing that an open-ecosystem smart speaker is possible.

Thanks for the comments, Hunyue Yau. I would be glad to answer them:

  1. pocketsphinx is the choice, since it is best suited for embedded applications.
  2. Based on my experience with pocketsphinx on similar embedded systems so far, I think that for an offline speaker with some limitations, pocketsphinx would work just fine on the A8 with the RAM available on the BeagleBone.
  3. According to figures I've come across, pocketsphinx uses much less CPU while idle listening (approx. 10%, single core with multithreading enabled) than when it is actually processing input audio (approx. 60-70%), which happens in bursts. Since voice/audio will be processed only at specific moments, yes, the A8 can be used for other tasks as well.
  4. pocketsphinx already has some support for keywords. It is always listening for voice; when it detects audio, it looks it up in the keyword list, dictionary, and language model that I will provide. This is not very accurate unless tuned properly, which will be one of my biggest tasks. Once the keyword is successfully detected, it will invoke the complete voice recognition and feedback part.
  5. Yes, I have exactly the same thing in mind. The goal will be to create a framework showing that an open-ecosystem smart speaker is possible, and to make it robust and flexible enough that other hobbyists and the Beagle community can build on it and customize it for various voice-controlled applications. That would also require clear documentation and readable code.
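
The two-stage flow from point 4 (cheap keyword spotting first, full recognition only after the trigger word) can be sketched as a small state machine. This is only an outline under stated assumptions: `spot_keyword` and `recognize_command` are hypothetical callables standing in for the actual pocketsphinx keyword search and full decoder, and none of the names below are pocketsphinx APIs.

```python
from enum import Enum


class Mode(Enum):
    WAITING_FOR_KEYWORD = 1
    LISTENING_FOR_COMMAND = 2


class TwoStageListener:
    """Runs the expensive recognizer only after the trigger word is heard."""

    def __init__(self, spot_keyword, recognize_command, on_command):
        # Hypothetical callables:
        #   spot_keyword(chunk) -> bool         cheap, runs on every chunk
        #   recognize_command(chunk) -> str or None   expensive, runs in bursts
        #   on_command(text)                    acts on the recognized text
        self.spot_keyword = spot_keyword
        self.recognize_command = recognize_command
        self.on_command = on_command
        self.mode = Mode.WAITING_FOR_KEYWORD

    def feed(self, audio_chunk):
        if self.mode is Mode.WAITING_FOR_KEYWORD:
            # Idle state: only the lightweight keyword check is performed.
            if self.spot_keyword(audio_chunk):
                self.mode = Mode.LISTENING_FOR_COMMAND
            return
        # Triggered state: hand the audio to the full recognizer.
        text = self.recognize_command(audio_chunk)
        if text is not None:
            self.on_command(text)
            self.mode = Mode.WAITING_FOR_KEYWORD  # go back to idle listening
```

Keeping the full recognizer out of the idle path is what lets the high CPU load appear only in bursts, leaving the A8 free for other work the rest of the time.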