💾 Archived View for gemlog.blue › users › jiaming › 1630615397.gmi captured on 2023-06-16 at 17:30:52. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
- oh oops, i wrote some stuff in chromium about my daily activities, learning fluid mechanics is super doable with video + textbook, MEC2404 test1, stupid discord failing during meeting when im supposed to be the "leader" etc
- Wow ELAN... what a world, linguistics is
- Lots of Jakarta Indonesian, Jakarta Indonesian Child Language, Jakarta Indonesian Child-Directed Speech as first results. LOTS. Why only study children's speech? And why need SO MUCH??? Is it for developing education?
- but wow The Language Archive is full of recordings
- why only one second clips instead of matching text with audio?? cant be a file formatting problem... and what's with the weird spelling i wonder, maybe... OHHH it's Betawi Malay, so they tried capturing its sounds more authentically, i see
- OHH. NO WONDER. lol the children felt so uncomfortable with the speaker
- why tf do people still write stuff in C# :'( Just complaining cuz mono runtime cant run libgen desktop
- linguistic annotation intersects with computer science's natural language processing
- Moisl (2009) makes the following critical reminder:
Data is ontologically different from the world. The world is as it is; data is
an interpretation of it for the purpose of scientific study. The weather is
not the meteorologist’s data – measurements of such things as air
temperature are. A text corpus is not the linguist’s data – measurements of
such things as average sentence length are. (p. 876)
- Investigating Obsolescence by Nancy Currier Dorian might be of interest, but nah not right now
- You know that's what im talking about. Language annotation should be a fundamental skill cultures/societies should have, with jobs and stuff
- I WANT THIS BOOK :'( Best Practices for Spoken Corpora in Linguistic Research https://cambridgescholars.com/product/978-1-4438-6033-8
- Btw i have a feeling the indonesian hokkien youtuber deleted his channel :( THIS is why you archive stuff haih, nvm maybe temporary private only
- btw diving into different worlds is so fun, like what workflows and work cultures do u have? What software (ELAN, FoLiA)? What ideals (like language preservation)?
- Mystery of the decade... What, no one has published anything on the difference between ELAN and FoLiA? Immediately I can think of like.. fracturing/splitting (cant rmb the term, but like the risc-v one) of the xml formats and incompatibilities/lack of thinking about long-term preservation etc etc. But tbh i dont think it's too big a deal, as long as good documentation survives. TBH ELAN's docs aren't even up to my expectations, but probably unis have really good stuff locked away.
- Btw my browsing tab behaviour is super interesting, like i think i try to keep at most 10-12 open, so i notice i regularly close the less important ones. Somehow, not long after, i'll find that a tab i thought was really important has become one of the not-so-important ones
- *btw i hate the .eaf elan xml format... yucksss all caps? and wtf is the time stamp organization?*
- WTF encoding in flac is so FAST HAHAHAH compared to video omggg.... and half the size and probably with error correction. does wav do crc internally?
- default ffmpeg conversions: HOLY SHIT 1.1GB (.wav) -> 725MB (.flac) -> 20.7MB (.opus) -> 359MB (.flac) just cuz of background noise???? WTF
- hmm can compression be considered noise remover??? guess not? idk but wow
- dang, if only i could get my hands on the 2000-2002 colloquial jakartan spoken corpus for sneddon's book...
- Arbil for metadata organization?? .cmdi interestinggggg
https://archive.mpi.nl/forums/t/arbil-information-manuals-download/1045
- ish dumb ulcer-ish thing near my throat
- I REALLY REALLY REALLY WANT this book: "Best Practices for Spoken Corpora in Linguistic Research"
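On the .eaf time stamp organization mentioned above: as far as the format goes, timestamps are not stored on the annotations themselves. They sit in a TIME_ORDER block of TIME_SLOT entries (millisecond values), and each ALIGNABLE_ANNOTATION only points at two slot ids. A rough Python sketch of resolving that indirection follows. The sample XML is made up and heavily stripped down (a real .eaf also carries a header, linguistic types and more), and read_aligned_annotations is a hypothetical helper name, not part of ELAN.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal EAF-like document for illustration only.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1250"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1250"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="2900"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_aligned_annotations(eaf_text):
    """Return (tier_id, start_ms, end_ms, text) for each time-aligned annotation."""
    root = ET.fromstring(eaf_text)

    # Step 1: build a lookup from slot id to milliseconds.
    # Slots without a TIME_VALUE (unaligned) are left out of the lookup.
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        if value is not None:
            slots[slot.get("TIME_SLOT_ID")] = int(value)

    # Step 2: resolve each annotation's two slot references through the lookup.
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # simplification: skip annotations with unresolved times
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows


if __name__ == "__main__":
    for tier_id, start, end, text in read_aligned_annotations(SAMPLE_EAF):
        print(f"{tier_id}\t{start}\t{end}\t{text}")
```

The sketch only handles time-aligned annotations and skips anything whose slots have no TIME_VALUE. Annotations that refer to other annotations instead of time slots would need an extra lookup step that is not shown here.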