Week 1 Progress Report

Hi, this is my progress report for week 1.

Work done:

  1. Worked on a script that tracks the accuracy statistics of pocketsphinx against the characteristics of the audio input, such as its sampling rate and chunk size. This should in turn help in decoding speech to text more accurately later on.

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell/tree/master/Speech_Processing/accuracy_check

  2. Made a script that records audio from the microphone and splits it at intervals of silence. As soon as an audio file is created, it pauses the recording process, performs the required tasks on the audio input, and then resumes recording. This code is used in several different forms in many parts of the project, as and when required.

  3. Worked on using the script mentioned in item 2 to launch other processes, especially the games. It launches the games successfully as long as the keywords are recognized well.

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell/tree/master/Game/Game_launcher

  4. Had to revamp the previously created Spell It! game. The new version takes inputs smoothly and operates on them.

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell/tree/master/Game/Games/Spell_It!

  5. Did the same for the Hangman game.

https://github.com/AnirbanBanik1998/Modern_Speak_and_Spell/tree/master/Game/Games/Hangman

  6. Working on scraping more words to enhance the wordlist.
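
To illustrate the idea of splitting a recording at intervals of silence, here is a minimal sketch. It is not the code from the repository: the function name `split_on_silence`, the `threshold` and `min_silence` parameters, and the use of a plain list of amplitude values instead of live microphone chunks are all assumptions made for the example.

```python
def split_on_silence(samples, threshold=500, min_silence=3):
    """Split a sequence of amplitude values into voiced segments.

    A segment is closed once `min_silence` consecutive samples are
    quieter than `threshold`. Shorter pauses stay inside the segment.
    All names and default values here are hypothetical.
    """
    segments = []   # finished voiced segments
    current = []    # segment being built
    pending = []    # quiet samples seen since the last loud one

    for s in samples:
        if abs(s) >= threshold:
            # Loud sample: a short pause inside a word is kept.
            if current:
                current.extend(pending)
            pending = []
            current.append(s)
        else:
            pending.append(s)
            # Enough silence after speech: close the segment.
            if current and len(pending) >= min_silence:
                segments.append(current)
                current = []

    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    audio = [0, 0, 900, 800, 0, 700, 0, 0, 0, 0, 600, 650, 0]
    for i, segment in enumerate(split_on_silence(audio)):
        print(i, segment)
```

In a real recorder each closed segment would be written to a file and handed to the decoder before recording resumes, which matches the pause-then-resume flow described above.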

Issues faced:

  1. Had problems running the recording script in parallel with the games, as I had tried previously. Solved the issue by pausing the games in between to run the recording script as and when required.

  2. Set up most of the hardware required to deploy the project, but I am having some problems setting up the headset: the PocketBeagle won’t recognize it.

Work to be done next week:

  1. Solving the hardware problems and checking the progress on the board.

  2. Working on the third game, which will be done soon.

  3. Have to start the 4th game (crossword) from scratch.

  4. Some recording scripts have been copied and used in different forms in different directories. I want to reduce the project size by reusing the same code in those places.

Regards,
Anirban

Hi Anirban,
As per our discussion on IRC, I have used AWS Polly to create sound samples for you to test your work with. There are around 1500 samples in total (~40 voices saying 36 words).

  • The data can be found attached. I didn’t actually play all the files in this .zip, but I made sure that they were non-null files and checked a few at random.
  • The script I used can be found here. You need the AWS CLI installed and provisioned with access to the Polly service. I understand that this might not be possible for you to set up, as it’s behind a paywall. If you need anything more generated, please let me know.
  • I had second thoughts about using this data, as I felt that it might not be wise to use “computer generated” voice samples to test the hit rate of a voice recognition piece. But the data that you have provided here is straightforward: simple words and letters of the alphabet, something I am fairly sure Amazon used a person to generate. To the best of my knowledge, it’s only the sentences that Amazon stitches together using AI. (Other mentors would love to hear your thoughts on this.)
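
As a rough illustration of how the hit rate could be tallied per voice once the samples have been decoded, here is a small sketch. The function name `hit_rate_by_voice`, the tuple layout and the voice labels are assumptions for the example; it does not call pocketsphinx and expects the recognized text to be supplied by whatever decoder is being tested.

```python
from collections import defaultdict


def hit_rate_by_voice(results):
    """Return the fraction of correctly recognized words per voice.

    `results` is an iterable of (voice, expected_word, recognized_word)
    tuples. This layout is hypothetical; adapt it to the real test output.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for voice, expected, recognized in results:
        totals[voice] += 1
        # Compare case-insensitively and ignore surrounding whitespace.
        if expected.strip().lower() == recognized.strip().lower():
            hits[voice] += 1
    return {voice: hits[voice] / totals[voice] for voice in totals}


if __name__ == "__main__":
    # Placeholder data, not real measurements.
    sample_results = [
        ("voice_a", "a", "a"),
        ("voice_a", "bee", "b"),
        ("voice_b", "cat", "CAT "),
    ]
    for voice, rate in hit_rate_by_voice(sample_results).items():
        print(voice, rate)
```

Comparing the per-voice rates against a handful of human recordings of the same words would give some evidence on whether the generated samples are a fair test.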

Regards
Anuj

samples.zip (5.46 MB)