💾 Archived View for zaibatsu.circumlunar.space › ~solderpunk › phlog › vf1-updates-and-tips.txt captured on 2020-09-24 at 01:50:10.
-=-=-=-=-=-=-
VF-1 updates and tips
---------------------

I'm very happy to have had both the time and motivation to get quite
a bit of good progress made on my gopher client VF-1 recently. This
post will mostly be an update on some of the new functionality.

But first, some usage tips! I noticed recently that tfurrows has been
keeping a list of tips[1] on his circumlunar gopherhole (where he
explores the bookmarking functionality more deeply than I ever
have!), and thought I might contribute a few of my own.

First of all, an embarrassing revelation - when I recently wrote my
allegedly definitive guide[2] to viewing "long stuff", I forgot about
one option for handling long menus! There are so many options even I
forget them. That said, this one is not one of my favourites; it was
an early hack to help people who were struggling in earlier days
before "less" worked on menus. When you use "ls" to list the current
menu selectors, you can give it the "-r" option (i.e. "ls -r") to
view the listing in the reverse of the usual order. Thus, items at
the top which would normally go flying off the top of your screen
appear at the bottom where you can always see them. The obvious
downside, of course, is that stuff is backward. I don't really
recommend this approach, but thought I'd mention it for completeness.

On to something a little more useful! You are probably already aware
that VF-1 lets you set any external command you like as a handler for
different kinds of content. By default, the handler for "text/plain"
is just good old "cat", which does nothing other than spit the text
onto your screen. If it overflows, you can run the "less" command to
look at it in your favourite pager. An alternative to this is to use
less as your text/plain handler, but feed it a few more options. For
the past few days I have been using "less -FXR %s" as my default
plain text handler.
The -F option tells less to immediately quit if the file is short
enough that it fits entirely on one screen, and the -X option tells
it not to clear the screen after exiting (as is the default behaviour
of more). What this does is basically turn less into an automatic
"cat if short, less if long" viewer. The -R is just there so that
ANSI colour codes don't get mangled (more on that later). This means
stuff never flies off the top of your screen, and you never have to
manually run less to read the top of something. This results in a
pretty seamless experience and I think I'll stick with it.

Okay, time for new features. Starting with something very minor, the
"text/plain" handler is now used for both item types 0 and 1, whereas
previously it only worked for type 0. This change was inspired by
Tomasino who, when learning about handlers, immediately set his to
lolcat - something I'd never heard of. I encourage you to check it
out, even if only briefly for, well, the lols. Basically you can pipe
text through it and it uses ANSI colour codes to render that text
into a GLORIOUS RAINBOW. We're talking hundreds of colours, each
character slightly different from its neighbours. Tomasino was
disappointed that this worked on content but not on menus (which in
his case means his entire phlog), so the handler is now applied to
menus too and you can enjoy ubiquitous rainbows in gopherspace.

Tomasino was *also* disappointed that the colours disappeared when he
used the "less" command, because until now that command ignored the
text/plain handler and just fed the content straight to less. Now,
the "less" command runs your text/plain handler and pipes the output
of that to less (or rather, less -R, to preserve colours), so you can
get colours even when you are lessing!

On to more fundamental changes. Tomasino has once again spurred me to
make some improvements, this time through his recent championing of
better support for the "+" item type, which is used to specify
redundant servers, i.e. gopher servers which host a mirror of the
content at the current server. The RFC is pretty vague about exactly
how this is supposed to work. Most modern clients take a very minimal
approach to supporting this, and just list the mirror items like they
would any other link but do something minor to indicate "hey, this is
a mirror". I think the intent was probably for clients to do a bit
more with this. The RFC has various comments in it which make it
pretty clear (to me at least) that the target environment for gopher
was under-resourced university departments setting up servers on
whatever old and under-powered hardware they had lying around, and
spreading information over as many servers as possible to reduce
load. Early gopher servers were probably expected to fail regularly.

So VF-1 tries to handle + items in such a way as to reduce the pain
of failing servers. After seeing that content at server A is mirrored
at server B, if an attempt to fetch something from server A later
during the same VF-1 session results in any kind of network error,
VF-1 will automatically try to fetch the content from server B
instead. The usefulness of this in 2019 is arguably limited - for one
thing, modern gopher servers are probably extremely powerful and
extremely under-loaded compared to early servers, and for another
there is no caching of redundant servers between sessions, so if the
"main" server you attempt to visit is already down, you have no way
to learn what the backups are. It's not perfect, but it's better than
nothing, and I'm proud that VF-1 actually makes an effort to *do*
something with this information.

Speaking of being proud, the other significant changes are related to
text decoding, and I suspect VF-1 might now be the best gopher client
in town for people who regularly visit content encoded in a variety
of non-UTF-8 forms.
Tomasino had nothing to do with this change, which was instead
prompted by the latest user at circumlunar.space, tengu[3], who had
some initial problems serving Russian text from his gopherhole there,
whether using UTF-8 or older Cyrillic encodings like KOI8-R or
CP1251. With some digging, it turned out that this was mostly the
fault of Gophernicus, but VF-1 could stand some improvement too.

In the earliest versions of VF-1, I assumed that all text coming over
the wire would be either ASCII or UTF-8 (which decode identically)
and left it at that. This worked fine for about a week until someone
on BBOARD reported that VF-1 died when trying to read some news
article over at floodgap's feeds. It turned out that the article
contained a name with an accented character in it, which was encoded
in ISO-8859-1. So, I did a bit of research and learned that the 3
most commonly used encodings on the web are, in order, UTF-8,
ISO-8859-1 and CP1251. So, I updated VF-1 to try these, in order,
moving down the list each time one failed.

If you know anything about text encoding you'll recognise how naive
this was. Any text which is valid CP1251 is also valid ISO-8859-1, so
an attempt to decode as ISO-8859-1 will never fail. It may result in
gibberish, but it won't throw an exception, and so CP1251 text will
never be decoded properly.

So, now VF-1 attempts to decode everything as UTF-8 first and, if
that fails, tries a single fallback encoding. That fallback defaults
to ISO-8859-1, but it is under direct and easy user control using the
"set" command, so you can do "set encoding cp1251" to change the
fallback. If you regularly deal with just one non-UTF-8 encoding, you
can of course stick this in your ~/.vf1rc file to make it permanent.

But wait, there's more. There is a very nice Python library called
chardet which attempts to automatically detect text encodings making
use of language statistics.
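
Before getting into chardet, the fallback logic described above can
be sketched in a few lines of Python. This is only an illustration,
not code lifted from VF-1: the function names naive_decode and
decode_with_fallback are invented here, and passing errors="replace"
to the fallback decode is an assumption made so the sketch never
raises.

```python
def naive_decode(raw: bytes) -> str:
    """The old idea: walk a list of encodings until one doesn't raise."""
    for encoding in ("utf-8", "iso-8859-1", "cp1251"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("no encoding worked")


def decode_with_fallback(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """UTF-8 first, then exactly one fallback chosen by the user."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" is an assumption of this sketch, so that
        # bytes undefined in the fallback encoding don't raise.
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    russian = "Привет".encode("cp1251")
    # ISO-8859-1 accepts every byte, so cp1251 is never reached
    # and the result is gibberish rather than an exception.
    print(naive_decode(russian))
    # With the fallback set by the user, the text comes out right.
    print(decode_with_fallback(russian, fallback="cp1251"))
```

The point of the single fallback is that the choice between
ISO-8859-1 and CP1251 cannot be made by waiting for an exception, so
it has to come from somewhere else: either the user or, as described
next, a statistical guess.
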
You can decode CP1251 as if it were ISO-8859-1 no problem, but you'll
end up with gibberish text whose n-gram distribution won't match any
natural language. Chardet uses this fact to guess encodings and with
a little practice it seems to work quite well.

Now, I am very proud of the fact that VF-1 has no dependencies
outside the Python standard library and that all of the code is in
one single file. All of this makes it extremely easy to install, even
in weird environments where modern tools like pip are not available.
I don't ever want to change this, so VF-1 does *not* depend on
chardet. But, if you install it yourself, VF-1 will recognise that
it's there and adopt the alternative strategy of autodetecting the
encoding if UTF-8 fails, and will drop back to the user-specified
encoding only if chardet fails to identify an encoding with
confidence above 0.5.

With chardet installed, I was able to use VF-1 to cruise around some
Russian gopher sites tengu linked me to, and whether I encountered
UTF-8, KOI8-R or CP1251 encoding, it all Just Worked, which was
tremendously satisfying. VF-1+chardet seems bullet-proofly
international, which is fantastic.

As an aside, I was amused to note that the chardet FAQ[4] has the
following entry:

> Yippie! Screw the standards, I'll just auto-detect everything!
> Don't do that. Virtually every format and protocol contains
> a method for specifying character encoding.

The FAQ goes on to talk about HTTP, HTML, XML, etc. Out here on the
plain text frontier, of course, there ain't any such thing (well,
maybe the /caps.txt hack does something about this, hmm...), so I
don't feel bad at all about auto-detecting everything. It's pretty
much the only choice we have. For the record, this is *not* something
that I think it is worth extending gopher to work around.
There is a much simpler and nicer solution, which is simply to use
UTF-8 for absolutely all new content in gopherspace, so that there is
no *need* to explicitly specify the character encoding.

That's all that's new, aside from some tiny tidy-ups and fixes. There
are a few other small things I'd like to tackle, but it's starting to
feel pretty complete for me.

[1] gopher://circumlunar.space:70/1/~tfurrows/tips/
[2] gopher://circumlunar.space:70/0/~solderpunk/phlog/looking-at-long-stuff-with-vf1.txt
[3] gopher://circumlunar.space:70/1/~tengu/
[4] https://chardet.readthedocs.io/en/latest/faq.html