💾 Archived View for midnight.pub › posts › 494 captured on 2024-08-25 at 03:50:23. Gemini links have been rewritten to link to archived content

Midnight Pub

Just the Text, Huh?

~starbreaker

Re: "Just Show Me the Text" by m15o

If anyone knows of a proxy I could give a web URL to and receive a simple .txt version back of the article, please let me know! Otherwise, I might be tempted to create one. Maybe a gopher service?

I don't know about a proxy, but I wonder how far @m15o could get with the following command:

$ lynx -dump -nolist "${URL}" > "${FILENAME}.txt"

If a site builds its content with JS, this won't work, because lynx doesn't run scripts. But if the text is in the HTML and just buried under entirely too much JS, this might be enough to extract it. You'll still want to massage the output using sed, though.
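For what it's worth, here is a minimal sketch of what "dump, then massage with sed" could look like. The function names fetch_text and clean_dump are made up for illustration, and the cleanup rules (strip trailing whitespace, squeeze repeated blank lines) are only a guess at a sensible starting point, not the cleanup actually used on any particular site.

```shell
#!/bin/sh
# Hypothetical helpers; neither name comes from the original post.

# Render a page as plain text on stdout.
# -dump writes the formatted page to stdout; -nolist omits the link list.
fetch_text() {
    lynx -dump -nolist "$1"
}

# Light cleanup of a text dump read from stdin (assumed rules):
#   1. strip trailing whitespace from every line
#   2. squeeze runs of blank lines down to a single blank line
clean_dump() {
    sed -e 's/[[:space:]]*$//' | sed -e '/^$/N;/\n$/D'
}
```

Something like fetch_text "https://example.com/article" | clean_dump > article.txt would then chain the two steps (example.com is a placeholder). Site-specific junk such as navigation menus would still need its own sed rules.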

That's what I did when retrieving and cleaning the Limyaael Rants.

Replies

~every wrote:

Lynx works OK, and mine defaults to UTF-8. I use a sed filter I built to convert extended, non-ASCII characters to US-ASCII equivalents. Here is my filter so far:

https://every.sdf.org/.webshare/TXT.txt
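The linked filter isn't reproduced here, so the following is only a rough sketch of the general idea: a handful of sed substitutions that swap common typographic characters for US-ASCII stand-ins. The function name to_ascii and the choice of characters are assumptions for illustration and are not taken from ~every's file.

```shell
#!/bin/sh
# Hypothetical filter; the rules below are illustrative, not ~every's.
# Each character gets its own literal substitution (no bracket
# expressions), so matching does not depend on a UTF-8 locale.
to_ascii() {
    sed -e "s/‘/'/g" \
        -e "s/’/'/g" \
        -e 's/“/"/g' \
        -e 's/”/"/g' \
        -e 's/–/-/g' \
        -e 's/…/.../g'
}
```

It reads stdin and writes stdout, so it can sit at the end of a lynx pipeline. Any character without a rule passes through unchanged, which is why a real filter tends to grow over time.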

~m15o wrote (thread):

Thanks starbreaker! That's actually a very elegant way to do it. Always impressed to see the wonders of piping commands. Someone else mentioned:

textify.it

I still haven't tested it, though.