💾 Archived View for gemini.patatas.ca › posts › text-to-speech.gmi captured on 2024-12-17 at 09:45:12. Gemini links have been rewritten to link to archived content

2024-10-23

The failure of 'better' text-to-speech

The Android document reader app I've been using has a text-to-speech function that I like, but the $20 price to enable listening with the screen off is too steep for me - so I went looking for others that I could use while walking or doing chores.

There's a frustrating phenomenon with these apps though, and it corresponds to how 'advanced' the app claims to be: the more the reader tries to guess at the flow & cadence of a passage, the harder the passage is to understand.

The reader I like only does a little bit of this guessing - downward pitch at the end of statements, upward at the end of a question, slight pauses on commas, etc. - but doesn't try to do much more. It's great, because it's relatively easy to approach the listening as if you were reading the words - you can mentally create the proper cadence for yourself on top of the neutral reading.

But having the reader get it actively, confidently wrong, rather than just not get it right, makes this mental process (at least for me!) nearly impossible, because you have to somehow *un-hear* the wrong cadence before you can even start doing the parsing yourself.

It's like putting touch-screens in cars. Maybe at first glance it looks like the future, and then you quickly realize it's a massive step backward. Why do they do it? I'll never understand.

(if anyone has an app recommendation, btw... ;) )