2022-10-19 Speech to text transcription

Early morning, I hear the electric blinds being raised by the neighbours in our building…

A long time ago, I tried to generate transcriptions of my Halberds & Helmets Podcast using Pocketsphinx. I wasn’t very happy about it.

Halberds & Helmets Podcast

Oh wow. This would require *a lot* of editing! – 2019-10-15 Speech to text using Pocketsphinx!

2019-10-15 Speech to text using Pocketsphinx!

Recently, I tried Whisper.

Whisper

It wasn’t as easy to install as I had hoped because my distribution came with an older version of numpy. This is how I installed a newer one:

pip3 install numpy==1.21.6

When I ran it, it was extremely slow, but the results were pretty good! I used it to transcribe 2018-12-19 Episode 02. Most of the editing was due to how I speak, like enthusiastically beginning sentences with “and! and, …”

2018-12-19 Episode 02

Today I got a recommendation for nerd-dictation, which I don’t need, but which uses vosk-api.

nerd-dictation

vosk-api

Again, I had to force-install a newer version of something, this time pip itself:

pip3 install pip==22.3

When I ran it, I realised it is *much* faster than Whisper. Sadly, the quality is not very good. No capitalisation, no punctuation, and a lot of mistakes.

For comparison…

Pocketsphinx:

ffmpeg -i 03-halberds-and-helmets.mp3 -acodec pcm_s16le \
    -ac 1 -ar 16000 03-halberds-and-helmets.wav
pocketsphinx_continuous -infile 03-halberds-and-helmets.wav \
    -hmm /usr/share/pocketsphinx/model/en-us/en-us \
    -lm /usr/share/pocketsphinx/model/en-us/en-us.lm.bin \
    -dict /usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict \
    > 03-halberds-and-helmets-pocketsphinx.txt

so does our legs and that is a third part of the office and out of pocket cost yea and for me to carry off race has cast the net means there is no free combining off your species and your jaw so there’s no more from magic users there’s just more so it is no fees else is just the elves and else can fight and cost spells so they’re always basically multi class fighters imagine use know how things are just a just coughing spare pair small vehicles and they fight like fighters and of course as more people and if i could find is that that bit of extra for this and that and they can to know that about stone works and they can sneak out in the wilderness and nice find simple to understand

This is so bad that I wonder whether different parameters or dictionaries would improve it.

Vosk API:

vosk-transcriber -i 03-halberds-and-helmets.mp3 \
                 -o 03-halberds-and-helmets.txt

hello this is alex and dad does it does a third part of their hobbits and helmets podcast yea i’m firmly into camp of race as class that means there’s no free combining of your species and your job so there’s no and magic users does just dwarfs as no thief elves just just elves and else can fight and cast spells so they’re always basically multi class fighters and magic users the how flings or just a just half links dare dare small people and they fight like fighters and that was us more people and they fight like fighters the have little bit of extra of this and that and taken the know a bit about stone works and that can sneak bout in the wilderness and and that’s fine and simple to understand simple time to explain at the table and it works for me

Whisper:

whisper 03-halberds-and-helmets.mp3

Hello this is Alex and this is the third part of the Hobbits and Helmets podcast. Yay! I’m firmly in the camp of race as class and that means there is no free combining of your species and your job. So there’s no dwarven magic uses, there’s just dwarves, there’s no thief elves, there’s just elves and elves can fight and cast spells so they’re always basically multi-class fighters and magic users. The halflings are just halflings, they’re very small people and they fight like fighters and the dwarves are small people and they fight like fighters, they have a little bit of extra of this and that and they know a bit about stoneworks and they can sneak about in the wilderness and that’s fine and it’s simple to understand, simple to explain at the table and it works for me.

It just takes so loooong.

#Podcast #Speech to Text

Comments

I just checked, and they released a new Pocketsphinx 5.0.0 just a few days ago. Perhaps I should give it another go?

PocketSphinx 5.0.0 is released! – CMUSphinx

CMUSphinx

– Alex 2022-10-19 06:50 UTC

---

How slow is Whisper? On my laptop it took about 209 min of user time at about 140% CPU, or about 145 min of elapsed time, to transcribe 16 min 30 s of podcast: nearly 9 min of transcription per minute of podcast.
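A quick way to double-check that arithmetic, as a minimal sketch: `ratio` is a hypothetical helper I made up for this, not part of Whisper or any other tool, and it only divides the rounded numbers quoted above.

```shell
# ratio A B: print A / B with two decimals.
# Hypothetical helper for illustration; LC_ALL=C keeps the decimal point a dot.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Elapsed minutes per minute of podcast (145 min for 16.5 min of audio).
ratio 145 16.5

# User time over elapsed time, i.e. the average CPU share as a fraction.
ratio 209 145
```

The first call gives the "nearly 9" figure and the second should land close to the "about 140% CPU" figure, so the three numbers are at least consistent with each other.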

– Alex 2022-10-19 09:41 UTC

---

@ajroach42 asked me about my hardware and said they were getting “transcriptions on 20 minutes of audio in 5-10 minutes.”

@ajroach42

I’m just using my regular laptop. Model name as reported by lscpu is “Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz.”

Apparently Whisper really benefits from a graphics processing unit (GPU).
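To put the two sets of numbers side by side, here is a rough sketch. `per_minute` is a hypothetical helper name, and the inputs are just the rounded figures from these comments: 145 min for 16.5 min of audio on my laptop, 5 to 10 min for 20 min of audio as quoted.

```shell
# per_minute PROCESSING_MIN AUDIO_MIN: processing minutes per minute of audio.
# Hypothetical helper for illustration; LC_ALL=C keeps the decimal point a dot.
per_minute() {
    LC_ALL=C awk -v p="$1" -v a="$2" 'BEGIN { printf "%.2f\n", p / a }'
}

mine=$(per_minute 145 16.5)  # my laptop, CPU only
fast=$(per_minute 5 20)      # best case quoted above
slow=$(per_minute 10 20)     # worst case quoted above

echo "mine: $mine, quoted: $fast to $slow"
```

Going by these figures, the quoted setup would be very roughly 18 to 35 times faster than my laptop.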

– Alex 2022-10-19 13:59 UTC