💾 Archived View for bbs.geminispace.org › u › norayr › 18672 captured on 2024-12-17 at 15:41:58. Gemini links have been rewritten to link to archived content


-=-=-=-=-=-=-

Comment by 🐙 norayr

Re: "post text, not audio"

In: s/permacomputing

first of all, let me apologize for not putting enough effort into formulating my thoughts, and for expressing them in a way i myself dislike. i tried to explain what the problem is and failed.

also, don't get me wrong. i don't only listen to podcasts, i host one myself with my friend, and just yesterday we had a two-hour-long live session discussing various tech and art issues.

also, it coincided with flipperzero publishing on /s/music, and i am so grateful the same tracks were also posted via gemini! but it also coincided with me and my friend finishing our live session and me realizing, again, that i have no strength to make the transcript.

i have already listened to a part of it and i am extremely interested in listening to the rest.

i also realize that not only does text have advantages over audio, but the audio form has some advantages over text as well. not only because we can listen to it while, say, driving or doing some work, but also because the format, the recording, may mean that two people are talking and discussing things: that is not a monologue, not a written text, it is spoken text, and it conveys important information in intonation and so on as well.

another disclaimer: what was also wrong on my side was expressing it in a way that tells people to do or not to do something. people have lots of reasons to do whatever they enjoy doing, feel is necessary, or find meaningful.

so i tried to express the problem. i learnt lots of things from the internet. and i found lots of necessary information by searching.

search today is not something only a corporation can implement. i use https://s.cybernuk.es, which i found out about from the openbsd zine, and it seems that even i may have the resources to host a searxng instance.

however, i cannot train ai. that's my pain here, and that's also why i posted this to /s/permacomputing. training requires huge amounts of data, storage, and perhaps years of computation even if i get some gpu (i have zero suitable gpus).

and i think relying on ai is problematic on many levels; i can expand on that in another post. also, it is not perma, i believe.

so i have no resources to make a transcript of our podcast. even if there were an ai i could run on my machine (i do run a couple of models occasionally), no model is trained to transcribe armenian, which is the language we do the podcast in.

so i feel that many audio files contain information i would desperately want to know but would be unable to find if i didn't already know about the podcast. and where would i learn about it from? maybe today i know, but i can imagine the isolated kid i once was, who didn't know where or how to even start searching for answers to his questions, or what the questions even were.

so we need podcasts. i learned lots of things from podcasts, and i had the habit of listening to podcasts every day. i just have no idea how to solve the problem while staying perma. not sure metadata alone is enough.

now we are encouraged to write good alt descriptions for images. that's very good, i think. i was doing that years ago but later gave up. now i see that there is a consensus about it, and i am glad; i already wrote an alt description for a picture i recently posted via markdown.

and excuse me once again.

🐙 norayr [OP]

Jul 19 · 5 months ago

4 Later Comments ↓

🌲 Half_Elf_Monk · Nov 14 at 22:34:

I appreciate the thinking about this topic, and the comments posted here so far. I also like listening to audio, which fits well into the workflow (or lack thereof) of a given day. The capacity to generate accurate transcriptions is helpful.

turboscribe.ai blew my mind when I found it. It was a way to generate a reasonably accurate text transcription of the audio, and to do it relatively easily. I haven't found a way to do this locally, but @requiem mentioned something about Whisper, so maybe that's the answer I need.

I don't know why podcasting services couldn't do this automatically for all the podcasts they host/serve. Or why someone couldn't train an AI (using their GPUs) and then distribute the model for others to use on their less-intensive machines. Maybe I'm not understanding the complexity.

I'd be interested to hear @norayr's thoughts on why that isn't a permacomputing solution. I guess it's not sustainable in a very-long-run sort of way, but if we train the voice models now, wouldn't we be able to use them down the road reasonably well? If all of society goes down in a CME or war or something, I have way more important stuff to do than worrying about whether listeners get a transcript of my podcast.

In any case, if anyone is running a local AI to generate good transcripts, please report in with your experience. That sounds very very useful.
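For reference, here is a minimal sketch of what a local Whisper run might look like, assuming the openai-whisper Python package. The file name episode.mp3, the "small" model size, and the "hy" language code for Armenian are placeholders, and I haven't verified how well it handles Armenian in practice:

```
# sketch: local transcription with the openai-whisper package
# install: pip install -U openai-whisper   (ffmpeg must also be on the system)
import whisper

# "small" is a guess at a size that fits a modest machine; larger models are more accurate
model = whisper.load_model("small")

# the language can be forced, e.g. "hy" for Armenian; quality for it is an open question
result = model.transcribe("episode.mp3", language="hy")

# the result contains the full text plus timestamped segments
print(result["text"])
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}")
```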

🐙 norayr [OP] · Nov 16 at 00:17:

well, i think i explained that i wasn't right and asked to be excused; i had a certain feeling at that point in time which made me write that.

but to sum up

🐙 norayr [OP] · Nov 16 at 00:39:

🌲 Half_Elf_Monk · Nov 16 at 21:42:

Ah, thanks @norayr, I think I understand you now. I also would rather the models were freed... from corporate and state dependence. I'm hopeful that such things will exist in the future. Cheers...

Original Post

🌒 s/permacomputing

post text, not audio - publishing audio is convenient, but how do we find it on the internet? we even agree that images should have alt descriptions. otherwise we have to rely on ai (which is not lowtech) to find us the audio or video files that have the information we search for. p. s. that also relates to 'voice messages' in chats. it is easy to message, but it is not possible later to find the information in the chat log. again, ai may help, but do we want it to help? also, while it is easy...

💬 norayr · 9 comments · Jul 18 · 5 months ago