💾 Archived View for bbs.geminispace.org › u › requiem › 18668 captured on 2024-08-19 at 01:02:07. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Podcast transcripts are now being pushed very "aggressively" at Apple Podcasts, for example. They are SUCH a good thing! But I do agree, discoverability is an issue with audio content.
I do think this is a very good use for LLMs. Transcripts are fairly accurate; they are not "generated" but rather "translated", and I can think of no real copyright issues when AI is used for such cases: the output will not infringe on anyone's copyright the way it can when generating text or images.
Of course all the usual caveats should apply: use only locally hosted models (Whisper!), do not feed data back into training sets (unless it is all your own and you really want to do so), no scraping, etc.
But otherwise, there is a big need for audio content. It's accessible: really good for those with learning disabilities or visual impairments. And ultimately we are an aural species; there is something different about how we process information we hear compared to what we read.
I consume a lot of my content via audio these days, mostly because I am stuck in the car quite a lot lately. I often use screen readers to read text content, and they are just not as good as "proper" audio content. I can make do with them, but I prefer an audio book to a screen reader reading out the same content.
Jul 19 · 4 weeks ago
🐙 norayr [OP] · Jul 19 at 10:49:
first of all, let me apologize for not putting enough effort into formulating my thoughts, and for expressing them in a way i myself dislike. i tried to explain what the problem is and failed.
also, don't get me wrong: i don't only listen to podcasts, i host one myself with a friend. just yesterday we had a 2-hour-long live session, discussing different tech and art issues.
also, it coincided with flipperzero publishing on /s/music, and i am so grateful that the same tracks were also posted via gemini! but it also coincided with me and my friend finishing our live session and me realizing, again, that i have no strength to make the transcript.
i have already listened to part of it and am extremely interested in listening to the rest.
i also realize that not only does text have advantages over audio, but the audio form has some advantages over text as well. not only because we can listen to it while driving or doing some work, but also because the format of the recording may mean, for example, that two people are talking and discussing things. that's not a monologue but a conversation; not written text but spoken text, and it conveys important information in intonation as well.
another disclaimer: it was also wrong of me to put this as if i were telling people to do or not do something. people have lots of reasons to do whatever they enjoy doing, feel is necessary, or find meaningful.
so, i tried to express the problem. i learnt lots of things from the internet, and i found lots of necessary information by searching.
search today is not something only a corporation can implement. i use https://s.cybernuk.es, which i found out about from the openbsd zine, and it seems that even i may have the resources to host a searxng instance.
however, i cannot train an ai. that's my pain here, and that's also why i posted this to /s/permacomputing. training requires huge amounts of data and storage, and perhaps years of computation even if i get some gpu (i have zero suitable gpus).
and i think relying on ai is problematic on many levels; i could expand on that in another post. it is also not perma, i believe.
so i have no resources to make a transcript of our podcast. even if there were an ai i could run on my machine (i do run a couple of models occasionally), no model is trained to transcribe armenian, the language we do the podcast in.
so i feel that many audio files contain information i would desperately want to know but would be unable to find if i didn't know about the podcast. and where would i learn about it? maybe today i know, but i can imagine the isolated kid i once was, who didn't know where or how to even start searching for answers, or what the questions even were.
so we need podcasts. i learned lots of things from podcasts, and i used to listen to them every day. i just have no idea how to solve the problem while staying perma. i am not sure metadata alone is enough.
now we are encouraged to write good alt descriptions for images. that's very good, i think. i was doing that years ago but later gave up. now i see there is a consensus about it, and i am glad; i already added an alt description to a picture i recently posted via markdown.
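for anyone unfamiliar: in markdown, the alt description goes inside the square brackets of the image syntax (the filename and description below are just made-up examples):

```markdown
![two hosts at a table recording a podcast episode](studio-photo.jpg)
```

readers with images enabled see the picture; everyone else, including screen reader users and text-only clients, gets the description instead.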
and, once again, my apologies.
post text, not audio. publishing audio is convenient, but how do you find it on the internet? we even agree that images should have alt descriptions. otherwise we have to rely on ai (which is not lowtech) to find us the audio or video files that contain the information we search for. p. s. this also relates to 'voice messages' in chats. it is easy to send one, but it is not possible later to find the information in the chat log. again, ai may help, but do we want it to help? also, while it is easy...