💾 Archived View for midnight.pub › posts › 494 captured on 2022-03-01 at 16:59:38. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Re: "Just Show Me the Text" by m150
> If anyone knows of a proxy I could give a web URL to and receive a simple .txt version back of the article, please let me know! Otherwise, I might be tempted to create one. Maybe a gopher service?
I don't know about a proxy, but I wonder how far @m150 could get with the following command:
$ lynx -dump -nolist "${URL}" > "${FILENAME}.txt"
If a site depends on JavaScript to render its text, this won't work, since Lynx doesn't run JavaScript; but if the text is already in the page and just buried under entirely too much JS, this might be enough to extract it. You'll still want to massage the output with sed, though.
That's what I did when retrieving and cleaning the Limyaael Rants.
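As a rough sketch of that dump-then-massage workflow, assuming a POSIX shell with `lynx` and `sed` on the PATH: the helper names `fetch_text` and `tidy_dump` are hypothetical, and the two cleanup rules (strip trailing whitespace, squeeze runs of blank lines into one) are only examples of the kind of massaging meant here, not the recipe used for the Limyaael Rants.

```shell
#!/bin/sh
# Sketch: dump a page with Lynx, then tidy the text with sed.
# fetch_text and tidy_dump are hypothetical helper names; the cleanup
# rules below are illustrative, not a fixed recipe.

# Strip trailing whitespace (so whitespace-only lines become empty),
# then squeeze each run of empty lines down to a single empty line.
tidy_dump() {
  sed 's/[[:space:]]*$//' | sed '/^$/N;/\n$/D'
}

# Usage: fetch_text URL OUTFILE
fetch_text() {
  # Quote the URL: query strings often contain '&', which the shell
  # would otherwise treat as "run this in the background".
  lynx -dump -nolist "$1" | tidy_dump > "$2"
}
```

For example, `fetch_text 'https://example.com/some-article' article.txt`. Keeping the cleanup in its own function means it can be tried on saved dumps without network access, and further sed rules can be added to its pipeline.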
Lynx works OK, and mine defaults to UTF-8. I use a sed filter I built to convert extended characters (anything outside 7-bit ASCII) into US-ASCII equivalents. Here is my filter so far:
https://every.sdf.org/.webshare/TXT.txt
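The linked filter isn't reproduced here, but a minimal sketch of the same idea, assuming UTF-8 input, might map a handful of common typographic characters to ASCII stand-ins. The `to_ascii` name and the character list are assumptions for illustration, not the contents of TXT.txt. The characters are built from octal UTF-8 byte escapes with `printf`, so the script file itself stays plain ASCII.

```shell
#!/bin/sh
# Hypothetical "typographic characters -> US-ASCII" sed filter.
# NOT the linked TXT.txt filter; the character list is illustrative.
# Each character is built from its UTF-8 bytes (octal) via printf.

lsq=$(printf '\342\200\230')    # U+2018 left single quotation mark
rsq=$(printf '\342\200\231')    # U+2019 right single quotation mark
ldq=$(printf '\342\200\234')    # U+201C left double quotation mark
rdq=$(printf '\342\200\235')    # U+201D right double quotation mark
ndash=$(printf '\342\200\223')  # U+2013 en dash
mdash=$(printf '\342\200\224')  # U+2014 em dash
hellip=$(printf '\342\200\246') # U+2026 horizontal ellipsis
nbsp=$(printf '\302\240')       # U+00A0 no-break space

# Read stdin, write stdout with each character above replaced.
# Plain string patterns (no bracket expressions) keep this safe
# whether sed runs in a UTF-8 or a byte-oriented C locale.
to_ascii() {
  sed -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
      -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" \
      -e "s/$ndash/-/g" -e "s/$mdash/--/g" \
      -e "s/$hellip/.../g" -e "s/$nbsp/ /g"
}
```

Because it is a plain stdin-to-stdout filter, it can sit at the end of a Lynx pipeline, e.g. `lynx -dump -nolist "$URL" | to_ascii > out.txt`, and more `-e` rules can be appended as new characters turn up.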
Thanks, starbreaker! That's actually a very elegant approach. I'm always impressed by the wonders of piping commands. Someone else mentioned:
Which I still haven't tested.