Archived View for gemlog.blue › users › jiaming › 1630615397.gmi captured on 2023-06-16 at 17:30:52.

- oh oops, i wrote some stuff in chromium about my daily activities: learning fluid mechanics is super doable with video + textbook, MEC2404 test 1, stupid discord failing during a meeting when i'm supposed to be the "leader", etc

- Wow ELAN... what a world, linguistics is

- Lots of Jakarta Indonesian, Jakarta Indonesian Child Language, Jakarta Indonesian Child-Directed Speech as first results. LOTS. Why only study children's speech? And why need SO MUCH??? Developing education, is it?

- but wow The Language Archive is full of recordings

- why only one-second clips instead of matching text with audio?? can't be a file formatting problem... and what's with the weird spelling i wonder, maybe... OHHH it's Betawi Malay, so they tried capturing its sounds more authentically, i see

https://www.eva.mpg.de/de/linguistics/past-research-resources/jakarta-field-station/acquisition-of-jakarta-indonesian/

- OHH. NO WONDER. lol the children felt so uncomfortable with the speaker

- why tf do people still write stuff in C# :'( Just complaining cuz mono runtime cant run libgen desktop

- linguistic annotation intersects with computer science's natural language processing

- Moisl (2009) makes the following critical reminder:

Data is ontologically different from the world. The world is as it is; data is an interpretation of it for the purpose of scientific study. The weather is not the meteorologist’s data – measurements of such things as air temperature are. A text corpus is not the linguist’s data – measurements of such things as average sentence length are. (p. 876)

- Investigating Obsolescence by Nancy Currier Dorian might be of interest, but nah not right now

- You know that's what im talking about. Language annotation should be a fundamental skill cultures/societies should have, with jobs and stuff

- I WANT THIS BOOK :'( Best Practices for Spoken Corpora in Linguistic Research https://cambridgescholars.com/product/978-1-4438-6033-8

- Btw i have a feeling the indonesian hokkien youtuber deleted his channel :( THIS is why you archive stuff haih, nvm maybe temporary private only

- btw diving into different worlds is so fun, like what workflows and work cultures do u have? What software (ELAN, FoLiA)? What ideals (like language preservation)?

- Mystery of the decade... What, no one has published anything on the difference between ELAN and FoLiA? Immediately I can think of like.. fracturing/splitting (cant rmb the term, but like the risc-v one) of the xml formats, and incompatibilities/lack of thinking about long-term preservation etc etc. But tbh i dont think it's too big a deal, as long as good documentation survives. TBH elan's docs aren't even up to my expectations, but probably unis have really good stuff locked away.

- Btw my browsing tab behaviour is super interesting, like i think i try to keep at most 10-12 open, so i notice i regularly close the less important ones. Somehow, not long after, a tab i thought was really important ends up being one of the not so important ones

- *btw i hate the .eaf elan xml format... yucksss all caps? and wtf is the time stamp organization?*
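
my rough understanding of the time stamp organization, as a sketch: the times live in one shared TIME_ORDER list of TIME_SLOT entries (milliseconds), and each ALIGNABLE_ANNOTATION only points at two slot ids. The sample XML below is hand-written and trimmed down for illustration (real files have more attributes and a header), and `read_annotations` is just a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the shape of an .eaf file (not real corpus data).
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_annotations(eaf_text):
    """Return (tier_id, start_ms, end_ms, text) for every time-aligned annotation."""
    root = ET.fromstring(eaf_text)

    # Step 1: build a lookup from slot id to time in milliseconds.
    # TIME_VALUE can be missing on unaligned slots, so keep None in that case.
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    # Step 2: resolve each annotation's two slot references through the lookup.
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows


if __name__ == "__main__":
    for row in read_annotations(SAMPLE_EAF):
        print(row)
```

so the indirection is probably there so neighbouring annotations can share a boundary (a1 ends and a2 starts at the same ts2), still ugly to read by eye though.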

- WTF encoding in flac is so FAST HAHAHAH compared to video omggg.... and half the size and with error detection (checksums, not actual correction). does wav do crc internally?
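
a quick way to poke at the wav question, as a sketch: write a tiny PCM wav with Python's standard `wave` module and list the RIFF chunks in it. As far as i know a plain wav is just a header plus `fmt ` and `data` chunks with no checksum field, while flac frames carry CRCs and the stream header stores an MD5 of the decoded audio. `make_wav_bytes` and `list_chunks` are made-up helper names, and this only looks at the minimal file that `wave` itself writes.

```python
import io
import struct
import wave


def make_wav_bytes(n_frames=100):
    """Write a tiny mono 16-bit silent WAV into memory and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def list_chunks(data):
    """Return [(chunk_id, chunk_size), ...] for the chunks inside a RIFF/WAVE file."""
    riff, _total_size, form = struct.unpack("<4sI4s", data[:12])
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii"), size))
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return chunks


if __name__ == "__main__":
    print(list_chunks(make_wav_bytes()))
```

only format info and raw samples show up, nothing that looks like a crc, so a flipped bit in a wav would go unnoticed unless you keep a separate hash of the file.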

- default ffmpeg conversions: HOLY SHIT 1.1GB (.wav) -> 725MB (.flac) -> 20.7MB (.opus) -> 359MB (.flac) just cuz of background noise???? WTF
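
to put numbers on that chain, a small sketch: the sizes are the ones above (treating 1.1GB as roughly 1100MB, which is my assumption), the file names are hypothetical, and "default" here just means `ffmpeg -i input output` with the codec picked from the file extension.

```python
def default_convert_cmd(src, dst):
    """Build a plain 'ffmpeg -i src dst' command; codec is chosen from dst's extension."""
    return ["ffmpeg", "-i", src, dst]


# Hypothetical file names for the wav -> flac -> opus -> flac chain.
CHAIN = [
    default_convert_cmd("recording.wav", "recording.flac"),
    default_convert_cmd("recording.flac", "recording.opus"),
    default_convert_cmd("recording.opus", "recording_from_opus.flac"),
]

# Sizes reported in the post, in MB (1.1GB taken as about 1100MB).
SIZES_MB = [
    ("wav", 1100.0),
    ("flac", 725.0),
    ("opus", 20.7),
    ("flac after opus", 359.0),
]


def ratios_vs_first(sizes):
    """Express every size as a fraction of the first (original) file's size."""
    base = sizes[0][1]
    return [(name, round(mb / base, 3)) for name, mb in sizes]


if __name__ == "__main__":
    for cmd in CHAIN:
        print(" ".join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to actually run it
    for name, ratio in ratios_vs_first(SIZES_MB):
        print(f"{name}: {ratio:.1%} of the original")
```

so the second flac is about a third of the wav and half of the first flac, even though flac itself is lossless both times. the difference has to come from whatever the opus step threw away.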

- hmm can compression be considered noise remover??? guess not? idk but wow

- dang if only i could get my hands on the 2000-2002 colloquial jakartan spoken corpus for sneddon's book...

- Arbil for metadata organization?? .cmdi interestinggggg

https://archive.mpi.nl/forums/t/arbil-information-manuals-download/1045

- ish dumb ulcer-ish thing near my throat

- I REALLY REALLY REALLY WANT this book: "Best Practices for Spoken Corpora in Linguistic Research"