There has been a lot of talk on the Gemini mailing list lately about applications, and about dealing with the ugly realities of things like spam and CSRF vulnerabilities.
To some extent, all this app stuff has taken me kind of by surprise. Not the fact that they're possible or that people want them; I expected that, and the client certificate stuff in Gemini was partially motivated by wanting a better way to do server-side applications. But I guess I was not prepared for how badly people would want apps, how many of them, how quickly, and what kind they would be - that serving static content out of the filesystem would be seen as an unfortunate limitation. I guess years of living in Gopherspace have narrowed my vision a bit.
Regardless, Gemini apps are going to happen, and I *welcome* them happening. But, as always, I'm really mindful of us not blindly following in the footsteps of the web. I worry that already people are racing to implement exactly the kind of apps the web has in exactly the same way, recognising exactly the same kinds of problems, and wondering whether we need to use exactly the same solutions.
We can, and should, dare to be different!
After giving it some thought, I've come up with a vision of how Gemini applications could work which leverages the core strengths and unique properties of the protocol and which I think/hope can sidestep not only the CSRF risk but also the complicated management of multiple client certificates from within a single client and the corresponding risk of accidentally using the same certificate across multiple sites. I outline this vision in the rest of the post.
Personally, I think the most compelling use case for applications in Gemini is stuff you host yourself (or which is hosted for you by a pubnix or other friendly and helpful group) for your own use, i.e. not for quick and easy interaction with random strangers, which is frankly a bit of web culture that I have extremely little interest in widely replicating in Geminispace. Think about TODO list apps, or scheduling apps, or a microblogging app like the one I sketched recently (which does include a public-facing component, so I'm not necessarily talking strictly about "hermit apps" here):
Microblogging via Gemini: a sketch
These "by me, for me" apps can have their access secured by pre-authorising a client certificate via its fingerprint, which easily locks access down to only you, so there is no worry about spam, which is perhaps the biggest problem which comes from opening stuff up to the public.
We still have the CSRF problem, though, even in this otherwise secure context. It's possible that some random jerk will put a link on a Gemini page they control which points to, say:
gemini://localhost/popular-application/delete-everything?confirm=yes
with a link text of "Cute cat pictures!" in the hopes that you'll follow it and your day will be ruined because your client was, either at that point in time or always, configured to use a particular certificate for requests to localhost without prompting you, in order to make your secure local apps more convenient to use.
We could solve this by having our apps inject random nonces into URLs, like the web, or we could solve it by only using clients which force us to go through a clunky confirmation procedure every single time a certificate is activated. But instead, why don't we just not develop clients and workflows where this is a risk? The threat of CSRF arises when you use a client which is (or automatically will be) authorised to do something consequential at site A to also consume content originating from site B. What if we just never, ever did that?
A basic Gemini client which only handles text/* responses, which has no bookmarking, no history, no GUS-integration or other niceties, is a very small thing. You can write it in 100 or 200 lines of code in any modern language and it will work just fine. Believe me, I've done it. You can run several of them at once and your computer is unlikely to complain about the strain. We should make good use of these properties!
Imagine a client like the above which takes only the following command line arguments: a compulsory Gemini URL and compulsory paths to a client certificate and matching private key. The specified cert is always and automatically used for any and all requests the client sends. When the client starts up, the supplied URL is fetched: if the response has an error status, or returns something other than text/gemini, the application immediately quits. Otherwise, the client lets the user read the response and follow any links it includes, *provided* those links are to the same domain as the initially supplied URL, and at the same path or a "lower" path. Cross-domain links and links which move "up" the path beyond the entry point are either not even displayed, or are displayed in scary red letters and cannot be followed. There is no way for the user to specify a URL of their own. You are basically "bound" to your entry point.
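As a sketch of how the "bound to your entry point" rule might be enforced (the function name and the simple path-prefix check are just my illustration, not a spec):

```python
import urllib.parse

# Teach urllib that gemini:// URLs behave like http:// for joining.
urllib.parse.uses_relative.append("gemini")
urllib.parse.uses_netloc.append("gemini")

def link_allowed(entry_url, link_url):
    # Resolve relative links against the entry point first.
    target = urllib.parse.urlparse(urllib.parse.urljoin(entry_url, link_url))
    entry = urllib.parse.urlparse(entry_url)
    if target.scheme != "gemini":
        return False                    # no cross-protocol links
    if target.netloc != entry.netloc:
        return False                    # no cross-domain links
    # Same path or "lower" only; a real client would also normalise "..".
    return target.path.startswith(entry.path)

# With an entry point of gemini://localhost/todo-app/ :
#   link_allowed("gemini://localhost/todo-app/", "items/3")            -> True
#   link_allowed("gemini://localhost/todo-app/", "/other-app/")        -> False
#   link_allowed("gemini://localhost/todo-app/", "gemini://evil.com/") -> False
```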
What this provides is a nice little, dare I say it, "containerised" Gemini application experience. You are easily and reliably identified to one service and one service only, and no external content can control what you do with that identity, and your identity can't accidentally leak out to anywhere external.
Meanwhile, your "everyday" Gemini client, which will let you go anywhere you like and follow whichever links you like, does not know the paths to any of your client certificates. Maybe it doesn't even support client certificates at all! If you roam the dangerous wild internet frontiers with this client, and you accidentally follow a malicious link to one of your apps running on localhost, as long as that app requires an approved client certificate to do anything of consequence, no damage can be done.
This approach might sound like a usability nightmare, but it needn't be. You don't have to manually type:
gemtainer gemini://localhost/todo-app --cert ~/foo.pem --key ~/bar.pem
every time you want to check your TODO list from within your "safety client". Instead you set up a shell alias `todo` for the above and forget about the details. You could have a dozen different aliases for a dozen different applications, all at different hosts, all using different certs, all quickly and easily accessible with a short, memorable command. This actually makes the apps even easier to use than bookmarking them individually within a single client: just open a new terminal and use tab completion.
Remember, these little single-purpose clients are small, light things. Heck, if they only support text/gemini then, because that format by design can be rendered accurately line-by-line, these clients can immediately discard each line of content after rendering it, so they never need to have the entire response in memory at once. Write one of these in a compiled language and use a minimalist TLS library like BearSSL or WolfSSL and the result will be *extremely* lean in terms of CPU and memory requirements, basically "free" by modern standards. There is no reason not to treat them like run-of-the-mill command line apps and just spawn as many as you need, close them down as soon as you aren't using them, and start them up again later.
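To give a feel for just how small the network side is, here is a rough Python sketch of fetching a URL with a fixed client certificate and rendering the response one line at a time, discarding each line as it goes. It glosses over TOFU pinning of the server certificate and other real-world details, and the names are my own invention:

```python
import socket, ssl, sys

def fetch_and_render(url, host, port, certfile, keyfile):
    context = ssl.create_default_context()
    # Gemini servers typically use self-signed certs (TOFU); a real client
    # would pin the server's key rather than skip verification entirely.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    context.load_cert_chain(certfile, keyfile)   # always present our identity
    with socket.create_connection((host, port)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall((url + "\r\n").encode("utf-8"))
            reader = tls.makefile("r", encoding="utf-8")
            header = reader.readline().strip()
            if not header.startswith("20 text/"):
                sys.exit("refusing to render: " + header)
            for line in reader:           # one line in memory at a time
                print(line.rstrip("\n"))  # render it, then forget it
```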
You can set up the same shell aliases on your home computer and your work computer and move between them seamlessly, because all the state is preserved server-side. If you just have the one machine, you can suspend it and resume it later on a different network and it doesn't matter. This is far less hassle than keeping a "real" command line application running in a screen or tmux session on a remote machine that you ssh into from everywhere to achieve a similar effect.
I really think this paradigm is tremendously nifty. It turns a nice, small, DIY-able, inspectable, auditable, trustable application into a totally general and reusable vehicle for secure, lightweight, text-based remote applications that can't really do anything to hurt you. This beats the socks off what the modern web-app experience offers. Yes, you can build tiny little remote command line apps which use HTTP to talk to an API somewhere, but every one of them is its own separate app, wedded to its own specific API and not useful for anything else. The Gemini-based solution sketched above lets you use *one* client program for *all* apps, which is clearly far superior, and does so without any risk of the apps being able to interact with one another. And client certificates are much more elegant than passing around ugly random tokens in URLs or headers.
Some of you may recognise this approach as being similar to the idea of using separate browser "profiles" for different kinds of web surfing. Indeed, this is ultimately the same basic idea, it just turns out to be far more scalable and far more usable in the case of Gemini because the clients and the notion of a "profile" (basically just a client cert) are both *so* much slimmer that you can have and use as many of them as you want.
Some people may still be thinking that this looks like an ugly complication, even if you can wrap these containers up in nice convenient shell scripts (and I'm sure some kind of GUI management solution could be whipped up for people who want one): "What, now I need *two* kinds of client to use Gemini? Gimme a break!". But I think it's a small surface complication which yields large simplifications deeper down. The containerised identities approach using ultra-slim clients creates two clearly separate ecological niches for clients: reading static textual content like gemlogs, technical documentation, fiction, news reports, weather forecasts, etc. on the one hand and making use of individual, certificate-secured dynamic applications on the other. This allows for client authors to target one niche only and therefore write simpler clients.
Clients for reading may simply opt out of dealing with client certificates entirely, letting their developers focus on user interface issues which make sense in that niche, like bookmarking, generating tables of contents, subscribing to feeds, etc. Similarly, none of that stuff makes any sense at all for clients intended to be locked into an individual application, so authors of those clients can forget about those things and focus instead on adding nice tools for managing client certificates, being able to handle text/gemini line by line, and other stuff that makes sense in this context. Maybe they could be configured so that instead of responding to status code 10 by reading a single line of text from the user, terminated with an enter, they launch an editor, so people can write longer posts more carefully.
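Purely as a sketch, with invented names, responding to a status 10 prompt via an editor might look like this:

```python
import os, subprocess, tempfile, urllib.parse

def answer_input_prompt(url):
    # Compose the reply in $EDITOR rather than on a single input() line.
    fd, path = tempfile.mkstemp(suffix=".gmi")
    os.close(fd)
    subprocess.run([os.environ.get("EDITOR", "vi"), path])
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    os.unlink(path)
    # Re-request the same URL with the composed text as the query string.
    # (Remember a Gemini request is capped at 1024 bytes, so this suits
    # modest posts, not essays.)
    return url.split("?", 1)[0] + "?" + urllib.parse.quote(text)
```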
In short, different tools for different jobs, but with a common underlying protocol and markup language. I already know of one Gemini client developer who started their project thinking they were signing up for one thing (reading static textual content) and then got burned out fielding bug reports and feature requests for people who were really excited about doing something else (applications). That sucks! A philosophy of different tools for different jobs can save people from this. And of course, *users* who are only interested in reading gemlogs can use simpler, faster, safer clients without any confusing certificate features, and can completely ignore what those weirdo application developers are doing elsewhere. Or vice versa. Or they can partake of both Gemini experiences. I really hope that this approach can ease some of the tensions that are building between people with different visions for what Geminispace should be, and make the lives of developers easier at the same time.
I've written all the above in terms of personal apps you might run just for yourself on localhost or your own VPS, because that's the easiest case to describe and think about, and because I'm a private and grumpy recluse by nature who likes to self-host things - but this isn't a fundamental limit of the basic approach. You can do this in a social way, as long as it is on a scale small enough that maintaining a list of authorised certificates is viable. But first, let's digress a bit. Recently Tomasino posted to the mailing list about "streaming" text/gemini content:
Tomasino's "gemini streaming" post
Basically, Tomasino raised the possibility of a Gemini server holding the connection with a client open for a long time and only occasionally feeding the client content, with the client taking full advantage of the possibility of rendering text/gemini line-by-line, as it comes, rather than buffering up a whole file and rendering at once when the connection closes. I think this is a really exciting idea, and I love that it leverages one of text/gemini's core strengths. In the #gemini IRC channel, xq mentioned the possibility of secure remote `tail -f`ing of server logs in this way, which is surely just scratching the surface of what could be done with this.
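The server side of this needs nothing exotic. Assuming a hypothetical server that hands a handler the TLS socket for an accepted request, xq's log-tailing idea might be sketched like so:

```python
import time

def stream_log(tls, logpath):
    # Send the header once, then never close the connection; each new log
    # line goes out as a line of text/gemini for the client to render.
    tls.sendall(b"20 text/gemini\r\n")
    with open(logpath, encoding="utf-8") as log:
        log.seek(0, 2)                 # start at the end, like tail -f
        while True:
            line = log.readline()
            if line:
                tls.sendall(line.encode("utf-8"))
            else:
                time.sleep(1)          # wait for more output to appear
```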
Now, let's put the streaming idea together with the idea of a "slightly social" version of these certificate-secured "for me, by me" apps that I'm so big on. There is no reason that pubnixes couldn't run Gemini-based chatrooms for their users in this manner! Users ssh into the pubnix and run some command to register their certificate fingerprint on the list of authorised fingerprints, which results in that certificate being tied to their pubnix username. They then spawn two of these Gemini microclients I've described which support the text/gemini streaming idea. They point one at an endpoint which streams the contents of the chatroom, one line of text/gemini per post (perhaps with #, ##, ### headings for hourly timestamps or whatever). They point the other client at a different endpoint which just serves an endless chain of status code 10 responses to get input from the user, allowing people to post to the chatroom. Put those two clients in side-by-side or one-above-the-other windows in a tiling window manager and boom, you've got yourself a little members-only chat system. Heck, you could require client certs only for the second endpoint which people can submit content to, and have the first endpoint which streams posts be accessible to anybody, allowing for "public meetings".
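The posting endpoint for such a chatroom is almost embarrassingly simple. A hypothetical sketch, where the username has already been looked up from the pre-registered certificate fingerprint:

```python
from urllib.parse import unquote

def handle_post_endpoint(query, username, chatlog_path):
    # With no query: ask for input. With a query: record the post, then
    # ask again, giving the poster an endless chain of prompts.
    if query:
        with open(chatlog_path, "a", encoding="utf-8") as log:
            log.write("<" + username + "> " + unquote(query) + "\n")
    return "10 Say something:\r\n"
```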
That's pretty darn cool!
So, where are we so far?
Gemini is a totally viable application platform for environments where it's feasible for administrator/moderator figures to manually administer a list of pre-approved certificate fingerprints which are required for requests to have any kind of side effect. This works for pubnixes and hackerspaces, for groups of friends, for families, for social clubs, for non-profit organisations, for anarchist collectives, and so on. The requirement of a pre-approved certificate completely removes the risk of spam. The use of "containerised identities" via separate client instances completely removes the risk of CSRF. The use of "streamed Gemini" can allow for interesting and useful kinds of applications which people may not have thought of in the context of Gemini. I hope all this convinces people who for some reason think that static textual content served from a filesystem is boring and who are restlessly looking for ambitious programming projects that Gemini already has more than enough power to build interesting, useful and secure applications without us having to add anything more.
But I know what you're thinking...
You're thinking "I don't want to write programs for little pre-existing happy groups! I want to write an awesome Gemini app I can share with the entire world, so everybody can see how incredibly smart I am!". Well, maybe you can even do that. In all of the above, there are two basic moving parts. The requirement of a certificate for requests with consequences, and containering that certificate away from anything other than the desired app, solve the problem of CSRF. Having the certificates be pre-approved solves the problem of spam and other nuisance bots. You can remove the pre-approval requirement and let people in with any old certificate if and only if there's nothing they can do in the app which has any negative consequences for anybody else.
What kind of app meets those requirements? Single player games, for starters. Have you seen Bitreich.org's Gopher adventure game?
It's a Gopher rendition of a classic point-and-click adventure game. If I remember rightly, it uses just your IP address to establish a "session", so it's not terribly robust or stable in the long term. With client certificates to help the server keep state, the Gemini equivalent of these could get quite rich! People can accumulate an inventory of items, earn experience points, level up, and so on. But if each player is basically in their own little bubble world, there is nothing that a malicious bot can do to spoil things for anybody else, so the lack of pre-approval of the certificate poses no problem.
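As a sketch of what "client certificates to help the server keep state" might mean in practice, a game server could key its save files on the certificate fingerprint rather than the IP address (the fields and storage scheme here are invented for illustration):

```python
import hashlib, json, os

def load_player(client_cert_der, savedir="saves"):
    # One save file per certificate fingerprint: a stable "session" that
    # survives IP changes and works from any machine holding the cert.
    fingerprint = hashlib.sha256(client_cert_der).hexdigest()
    path = os.path.join(savedir, fingerprint + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"room": "start", "inventory": [], "xp": 0}

def save_player(client_cert_der, state, savedir="saves"):
    fingerprint = hashlib.sha256(client_cert_der).hexdigest()
    os.makedirs(savedir, exist_ok=True)
    with open(os.path.join(savedir, fingerprint + ".json"), "w") as f:
        json.dump(state, f)
```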
There is even the possibility of multi-player games of a sort: our beloved Astrobotany game *almost* meets these criteria. There is the message board, though, which I suspect could easily be spammed by bots spinning up a new certificate for each post and flooding the place with offensive content. Removing the message board would, I think, make it safe - assuming the containerised client approach is used to address the CSRF issue. Otherwise, malicious links could cause you to e.g. rename your plant to something offensive. Removing the message board doesn't entirely remove the social component of the game - the ability to water your neighbour's plants could be left in. If a bot wanders in and starts randomly watering people's plants, well, that's probably not going to upset anybody.
But I know what you were *really* thinking earlier...
None of the above does much to help people who are eager to "combine the two separate worlds" of static documents and interactive server-side apps by adding commenting and likes and guestbooks and other kinds of quick and easy, wide-open social features to otherwise static textual content like gemlogs. Pre-approved client certificate fingerprints just don't work for that kind of thing. But allowing posts without them leaves you open to spam, and when it comes to leaving comments, as opposed to watering simulated space plants, spam can have real negative consequences. Heck, even allowing comments from genuine human users can have real negative consequences.
Now, personally - speaking as an individual user of Gemini and not as leader of the project - I could scarcely care less about these features. I have lived without that stuff for the past three years in Gopherspace, and not only have I not missed it, I (and many others) have reached the conclusion that the space is actually improved by its absence. I thoroughly enjoy responding to posts with posts of my own, and/or by interacting with the author via email, the Fediverse, whatever. I like to eat my Weeties, as Shufei would put it:
Shufei on the virtues of "traditional correspondence"
But I know not everybody feels that way. People who want public comments and likes and things in Geminispace are perfectly welcome to try to come up with ways to make them work smoothly in uncontained, reading-oriented clients without spam and without CSRF attacks or other problems. It's a difficult problem to solve, and Gemini very deliberately provides only simple tools with which to solve it. Good luck! Maybe it can be made to work. Even if it can, I strongly suspect that it will be a tedious uphill battle and many people will quickly decide that writing and using software which has to jump through hoops to solve these problems does not "spark joy" for them and will embrace an alternative experience. I suspect many people who will be driven to Geminispace by disillusionment with the web will be perfectly happy to leave those parts of it behind. But I could very easily be wrong. I'm not prescient, and nobody in Geminispace has to do what I say. For my part, though, I am far more excited about starting to build stuff along the other lines described above.
Fair enough. Here, roughly, is my vision for the future of Geminispace, taking apps into consideration: