💾 Archived View for gemini.hitchhiker-linux.org › gemlog › re_valid_urls.gmi captured on 2024-03-21 at 15:30:12. Gemini links have been rewritten to link to archived content

Re Valid Urls

2023-01-13

So the idea that a browser should accept a url such as http://example.com//somefile.html actually sits all right with me. To begin with, the path part of a url is basically a unix path with a protocol and host in front of it, and in a shell you can type all of these interchangeably:

ls /some/path
ls /some//path
ls /some/./path
# And even this one
ls /some/../some/path
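
As a rough illustration of the same equivalence outside the shell, here is a minimal Python sketch that uses the standard library's posixpath.normpath. It only manipulates the strings, so it assumes /some is not a symlink (with symlinks, the `..` form can end up somewhere else on a real filesystem). The variable names are just for illustration.

```python
import posixpath

# The four spellings from the shell example above.
spellings = [
    "/some/path",
    "/some//path",
    "/some/./path",
    "/some/../some/path",
]

# normpath works purely on the string: it collapses repeated slashes,
# drops "." segments and resolves ".." without touching the filesystem.
normalized = {posixpath.normpath(p) for p in spellings}

print(normalized)  # {'/some/path'}
```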

But that's far from the only reason I'm not picky about this sort of thing. There are a few other places where you really should be picky, though, and I'll get to that in a minute.

A Gemtext example

Consider the following Gemtext.

> This is a quote.
> It was so long that someone decided
> to split it up into multiple lines

When I was building Eva, the way I interpreted the spec was that, because Gemtext is line based, those are three separate quotes, not one quote split into multiple lines. Then I noticed how often people were using quotes like the above in the wild. I realized two things. First, it was going to be a losing battle, and second, the spec is actually really ambiguous on this particular thing. So while it kind of bugs me, I decided that for Eva consecutive quote lines would render in the same quote bubble.
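
To make the lenient interpretation concrete, here is a small Python sketch of grouping consecutive quote lines into one block. This is not Eva's actual code (Eva is written in Rust); the function name group_quotes and the ("quote", text) tuples are made up for the example, and it ignores preformatted blocks and every other line type for brevity.

```python
from itertools import groupby


def group_quotes(lines):
    """Merge runs of consecutive '>' lines into single quote blocks."""
    blocks = []
    for is_quote, run in groupby(lines, key=lambda line: line.startswith(">")):
        if is_quote:
            # Strip the '>' marker and surrounding whitespace from each line,
            # then join the run so it renders as one quote.
            text = " ".join(line[1:].strip() for line in run)
            blocks.append(("quote", text))
        else:
            # Every non-quote line stays its own block, as in line-based Gemtext.
            blocks.extend(("text", line) for line in run)
    return blocks


sample = [
    "Consider the following Gemtext.",
    "> This is a quote.",
    "> It was so long that someone decided",
    "> to split it up into multiple lines",
]
blocks = group_quotes(sample)
for kind, text in blocks:
    print(kind, "|", text)
```

Under the strict reading, the three `>` lines would come out as three separate quote blocks. Here they come out as one.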

And then there's SSL

Also when building Eva, I originally used someone else's library to handle making Gemini requests. But when I got things up and running and did some navigating around, I discovered that the browser would crash when requesting certain pages. It turns out that the library was using Rustls, and Rustls was being incredibly strict about rejecting certificates if they were at all non-conforming. To add insult to injury, the crate author made this scenario an unrecoverable error by calling panic!. I consider that really bad practice, particularly in a library, but that's beside the point. The larger issue is that a few of the sites that were causing crashes were what I would consider quite important, such as the most well known search engine in Gemini space (at that time). So I decided, well, let's see if OpenSSL accepts those certificates. And it does. So I switched to using OpenSSL (actually the native-tls crate, but since it uses OpenSSL on Linux and most other Unix-like systems it's the same backend in practice).

But sometimes things aren't this clear cut

Consider the url http://example.com/somedirectory as a counterexample. In this case somedirectory is, in actuality, a directory on the server, but the client has requested it without the trailing slash. Here the server, if it is well behaved, should redirect the client to the same url but with the trailing slash. Why? Because of relative urls. If the page has a link pointing to `somefile`, then without the trailing slash to tell the client that the current path on the server is a directory, the client will request http://example.com/somefile instead of http://example.com/somedirectory/somefile. This is an important distinction.
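
The difference is easy to reproduce with any standards-following url resolver. Here is a minimal Python sketch using urllib.parse.urljoin from the standard library; the variable names are just for illustration.

```python
from urllib.parse import urljoin

link = "somefile"  # the relative link found on the page

# Without the trailing slash, "somedirectory" is treated as a file name
# and gets replaced by the relative link.
without_slash = urljoin("http://example.com/somedirectory", link)

# With the trailing slash, the link is resolved inside the directory.
with_slash = urljoin("http://example.com/somedirectory/", link)

print(without_slash)  # http://example.com/somefile
print(with_slash)     # http://example.com/somedirectory/somefile
```

The client can't know which one the server meant, which is why the redirect has to come from the server side.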

Anyway, while a path with a double slash in it definitely looks more disturbing than a path missing its trailing slash, the actual situation is that the missing trailing slash is ambiguous and the double slash is not. It's like learning English. No real point to this post after all, just had some thoughts after reading on here.


All content for this site is licensed as CC BY-SA.
