Archived view of bbs.geminispace.org › s › permacomputing › 18663, captured on 2024-08-19 at 00:53:55.

-=-=-=-=-=-=-

post text, not audio

publishing audio is convenient, but how to find it on the internet?

we even agree that images should have alt descriptions.

otherwise we have to rely on ai (which is not lowtech) to find us the audio or video files that contain the information we are searching for.

p.s. this also relates to 'voice messages' in chats. it is easy to send one, but it is not possible to find the information in the chat log later. again, ai may help, but do we want it to help?

also, while it is easy to send a voice message, it is not always easy to listen to it. in some environments, during meetings, or in noisy places it is not possible or very hard to listen to those voice messages.

Posted in: s/permacomputing

🐙 norayr

Jul 18 · 4 weeks ago

5 Comments ↓

😎 flipperzero · Jul 18 at 22:33:

I don't understand your topic. Is this referring to the argument on whether or not audio can be given a description or metadata aside from what's available in the waveform? There are many formats which support metadata, like mp3 and ogg, and as a matter of fact the metadata is even appended into the encoding of the file. It's a matter of having a server which is able to parse that information, and regardless, you can also link to an audio file and personally type that description out as well. Images actually also have metadata that, if you wanted to, I'm sure could be parsed by a server if it implemented a way to read through and print what's encoded. I don't believe audio shouldn't be posted.
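As a rough illustration of metadata being appended to the file itself, here is a minimal sketch that reads an ID3v1 tag, the fixed 128-byte block some mp3 files carry at the very end. The function name `read_id3v1` is made up for this example, and the sketch covers ID3v1 only; ID3v2 tags and Ogg/Vorbis comments are stored differently and are not handled here.

```python
def read_id3v1(data: bytes):
    """Return the ID3v1 fields of an mp3 as a dict, or None if there is no tag.

    ID3v1 is a fixed 128-byte block at the end of the file:
    "TAG" + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1)
    """
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are padded with NUL bytes or spaces; cut at the first NUL.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
    }


def read_id3v1_file(path: str):
    """Convenience wrapper (hypothetical): read the tag from a file on disk."""
    with open(path, "rb") as f:
        return read_id3v1(f.read())
```

A server or crawler could call something like `read_id3v1_file("episode.mp3")` and put the returned title and comment into its search index. The comment field is only 30 bytes, so this kind of tag can carry a title, not a transcript.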

🚀 agh · Jul 18 at 23:27:

I think norayr is referring to the fact that audio files are not archivable like text messages, so you are unable to search or quickly scan audio messages the way you can with text messages. AI could be used to make a transcript, I suppose.

😎 flipperzero · Jul 18 at 23:47:

@agh that's what I'm saying though, that's not necessarily true, as audio files are able to have metadata appended to their encoding. If you wanted your engine to parse through that, it's a matter of scanning through the appended data in the encoded file and fetching it back as a result. Anyway, it's also not true because there's the filename to take into account, as well as the description given on the published page itself, typed out as text and followed by a link to the content in question. If the file isn't picked up, the page with the description and a link to the file will be. I don't find that to be a sound, rational argument for keeping media away from smallnet; I'd say that hinders it.
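The "page with a description and a link" idea can be sketched without any AI. The example below is only an illustration under stated assumptions: the pages are gemtext, where a link line starts with `=>` followed by a URL and an optional label, the audio file is recognized by its extension, and the names `audio_links`, `search_pages`, and `AUDIO_EXTENSIONS` are invented for this sketch.

```python
AUDIO_EXTENSIONS = (".mp3", ".ogg", ".opus", ".flac")


def audio_links(gemtext: str):
    """Yield (url, label) for each gemtext link line that points at an audio file."""
    for line in gemtext.splitlines():
        if not line.startswith("=>"):
            continue
        parts = line[2:].strip().split(maxsplit=1)
        if not parts:
            continue
        url = parts[0]
        label = parts[1] if len(parts) > 1 else ""
        if url.lower().endswith(AUDIO_EXTENSIONS):
            yield url, label


def search_pages(pages: dict, query: str):
    """Return audio URLs linked from pages whose text contains the query.

    `pages` maps a page URL to its gemtext source; matching is a plain
    case-insensitive substring test on the whole page.
    """
    needle = query.lower()
    hits = []
    for text in pages.values():
        if needle in text.lower():
            hits.extend(url for url, _ in audio_links(text))
    return hits


pages = {
    "gemini://example.org/ep1.gmi": (
        "# Episode 1\n"
        "We talk about Oberon compilers and small search engines.\n"
        "=> ep1.ogg listen to the episode\n"
        "=> about.gmi about the podcast\n"
    ),
}
print(search_pages(pages, "oberon"))
```

This only finds what the author took the time to type into the description, which is the limit norayr is pointing at: the search is cheap, but it is only as good as the text written around the link.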

💀 requiem · Jul 19 at 06:27:

Podcast transcripts are now being pushed very "aggressively" at Apple Podcasts, for example. They are SUCH a good thing! But I do agree, discoverability is an issue with audio content.

I do think this is a very good use for LLMs. Transcripts are fairly accurate, they are not "generated" but rather "translated", and there are no copyright issues I can really think of when AI is used for such cases: the output will not infringe on anyone's copyright the way it can when AI is used to generate text or images.

Of course all the usual caveats should apply, e.g. use only locally hosted LLMs (Whisper!), do not feed data back into training sets (unless it is all your own and you really want to do so), no scraping, etc.

But otherwise, there is a big need for audio content. It's accessible: really good for those with learning disabilities or visual impairments. And ultimately we are a more auricular species; there is something different about how we process information we hear compared to what we read.

I consume a lot of my content via audio these days, mostly because I am stuck in the car quite a lot lately. I often use screen readers to read text content, and they are just not as accurate as "proper" audio content. I can make do with them, but I do prefer an audio book to a screen reader reading out the same content.

🐙 norayr [OP] · Jul 19 at 10:49:

first of all, let me apologize for not putting enough effort into formulating my thoughts, and for expressing them in a way i myself dislike. i tried to explain what the problem is and failed.

also, don't get me wrong. i don't only listen to podcasts, i host one myself with my friend. and just yesterday we had a 2-hour-long live session, discussing different tech and art issues.

also, it coincided with flipperzero publishing on /s/music, and i am so grateful for the same tracks also being posted via gemini! but it also coincided with me and my friend finishing our live session and me realizing again that i have no strength to make the transcript.

i already listened to a part and i am extremely interested in listening to the rest.

i also realize that not only does text have advantages over audio, but audio has some advantages over text as well. not only because we can listen to it while, let's say, driving or doing some work, but also because of the format: the recording may, for example, mean that two people are talking and discussing things. that's not a monologue, that's speech. that's not written text, that's spoken text, and it conveys important information through intonation etc. as well.

another disclaimer: what was also wrong on my side was expressing myself as if i were telling people to do or not to do something. people have lots of reasons to do whatever they enjoy doing, feel is necessary, or find meaningful, etc.

so i tried to express the problem. i learnt lots of things from the internet, and i found lots of necessary information by searching.

search today is not something only corporations can implement. i use https://s.cybernuk.es, which i found out about from the openbsd zine, and it seems that even i may have the resources to host a searxng instance.

however, i cannot train ai. that's my pain here, and that's also why i posted this to /s/permacomputing. training requires huge amounts of data, storage, and perhaps years of computation even if i get some gpu (i have zero suitable gpus).

and i think relying on ai is problematic on many levels; i can explain that in another post. also, i believe it is not perma.

so i have no resources to make a transcript of our podcast. even if there was an ai i could run on my machine (i do run a couple of models occasionally), no model is trained to make a transcript of armenian, the language we do the podcast in.

so i feel that many audio files contain information i would desperately want to know but would be unable to find if i didn't already know about the podcast. and where would i learn about it from? maybe today i know, but i can imagine the isolated kid i once was, who didn't know where or how to even start searching for answers to the questions, or what the questions were.

so we need podcasts. i learned lots of things from podcasts and i had the habit of listening to podcasts every day. i just have no idea how to solve the problem while being perma. i am not sure metadata alone is enough.

now we are encouraged to write good alt descriptions for images. that's very good, i think. i was doing that years ago, but later gave up. now i see that there is a consensus about it, and i am glad. i already wrote an alt description for a picture i recently posted via markdown.

and again, i am very sorry.