💾 Archived View for rawtext.club › ~sloum › geminilist › 000454.gmi captured on 2020-10-31 at 01:35:34. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
An outsider's view of the `gemini://` protocol

Sean Conner sean at conman.org
Fri Feb 28 23:42:01 GMT 2020
- - - - - - - - - - - - - - - - - - -

It was thus said that the Great Ciprian Dorin Craciun once stated:
> On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote:
>
> >   Why is a numeric status code so bad?  Yes, the rest of the protocol is
> > English centric (MIME types; left-to-right, UTF-8).  It just seems that
> > using words (regardless of language) is just complexity for its own sake.
>
> Why did people use `/etc/hosts` files before DNS was invented?  Why do
> we have `/etc/services`?  Why do we have `O_READ`?  Why do we have
> `chmod +x`?

  True, but parsing the status code character by character is only one way of doing it.  Another way is to just convert it to a number and do that comparison.  When doing HTTP-related things [1], I do have named constants like HTTP_OKAY and HTTP_NOTFOUND.

> Because numbers are hard to remember, and say nothing to a person that
> doesn't know the spec by heart.  (For example although I do a lot of
> HTTP related work with regard to routing and such, I never
> remember which of the 4-5 HTTP redirect codes says "temporary redirect
> but keep the same method" as opposed to "temporary redirect but switch
> to `GET`".)

  But you have that anyway.  I have HTTP_MOVETEMP (hmmm, why isn't it HTTP_REDIRECT_TEMPORARY?  I have to think on that ... ) but even then, I have to know that it causes clients to switch to GET, and if I don't want that, I have to use HTTP_MOVETEMP_M (hmm ... I almost typed HTTP_MOVETMP_M ... something else to think about).  So even with symbolic names there are issues.

  Perhaps it's me, but I don't mind looking things up if I don't recall them.  I've been programming in C for 30 years now.  I *still* have to look up the details of strftime() every single time I use it, but I recall that rand() returns a number between 0 and RAND_MAX (inclusive), yet I use strftime() way more often than I do rand().

>
> > > As minor issues:
> > >
> > > * why `CRLF`?  it's easier (both in terms of availability of functions
> > > and efficiency) to split lines by a single character `\n` than by a
> > > string;
> >
> >   That was discussed earlier on the list:
> >
> >         https://lists.orbitalfox.eu/archives/gemini/2019/000116.html
>
> OK, reading that email the answer seems to be "because other protocols
> have it"...  And even you admit that in your own code you also handle
> just `LF`.
> 
> So then why bother?  Why not simplify the protocol?

  True, but there's the 800-pound gorilla to consider---Windows.  On Windows, a call like:

	fgets(buffer,sizeof(buffer),stdin);

will read the next line into the buffer and, when the stream is in text mode (the default for stdin), automatically convert CRLF into just LF.  That's because Windows uses CRLF to mark end of lines.  It got that from MS-DOS, which got that from CP/M, which got that from RT-11, which got that from (I suspect) a literal interpretation of the ASCII spec from the mid-60s [2].  The RFCs written in the 70s describing the early work of the Internet also used a literal interpretation of ASCII.

  So there are a lot of protocols defined for the Internet that use CRLF.  Could a switch be made to just LF?  Sure.  It's also about as likely as the Internet byte order being switched from big-endian to little-endian.

>
> >   Okay, we use NaCL.  Now what?  What's needed to secure the communication
> > channel?  A key exchange.  Again, rule 1---never implement crypto.
>
> Given that one has the public key of the server (more on that later),
> one could use the following on client / server sides:
> 
>     https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes

  There's this wonderful talk by John Carmack:

	https://www.youtube.com/watch?v=dSCBCk4xVa0

which talks about ideas, and how what might seem like a good idea isn't once it comes to an actual implementation.

  The linked page just talks about an API for signing and encrypting data.  It says nothing about negotiating the cipher, key size, or anything remotely like a protocol.  If you feel this strongly about it, I would ask that you *do it!*  Implement a client and server that use these alternative cryptosystems, and then we'll have something to talk about.

  When solderpunk first designed Gemini, I didn't agree with all his decisions (especially the status codes), but I was interested.  I also wanted to play around with TLS since I had finished writing a Lua interface for libtls.  So I wrote my own server, with what I felt the status codes should be.  The thing was---*there was a working implementation* that was used to argue certain points.  And through that, we got the compromise of the current status codes.

  You can argue for an idea.  But an idea *and an implementation* is stronger than just the idea.  I think that's why my Gemini server is so featureful---I went ahead and implemented my ideas to help argue for or against ideas, or even just to present *something* to talk about (when I have no opinion one way or the other).

> My take on this:  given a set of clear requirements for the
> `gemini://` protocol (which I've seen there are) one can come up with
> better solutions than TLS, ones that better fit the use-case.

  So do it.  One of the goals for Gemini is ease of implementation (of both the server and the client), so this will go a long way toward showing how easy it is to implement your ideas.

> (Again, just to be clear, I'm not saying "let's invent our own crypto",
> but instead "let's look at other tested alternatives".  As a
> side-note, NaCL, on which `libsodium` is based, was created by `Daniel
> J. Bernstein`...)

  Yes, I am aware of that.  I even installed djb's version of NaCL and played around with it.  It's nice, but a protocol it is not.

>
> >   One problem with that---incentives.  What's my incentive to make all this
> > information more easily machine readable?  On the web, you do that, and what
> > happens?  Google comes along, munches on all that sweet machine readable
> > data and serves it up directly to users, meaning the user just has to go to
> > Google for the information, not your server.  Given those incentives, I have
> > no reason to make my data easily machine readable when it means less
> > traffic.
>
> The incentive is a clear one:  for the end-user.  Given that we can
> standardize on such an "index", then we can create better
> "user-agents" that are more useful to our actual users.  (And I'm not
> even touching on the persons that have various disabilities that
> hamper their interaction with computers.)

  Okay, how does that incentivise me?

  It's easy enough to add machine-readable annotations to HTML.  Heck, there are plenty of semantic tags in HTML to help with machine readability.  Yet why don't more people hand-code HTML?  And why is Markdown, which, I will add, has no defined way of adding metadata except by including HTML, so popular?

> For example say I'm exposing a API documentation via `gemini://`.  How
> do I handle the "all functions index page"?  Do I create a large
> `text/gemini` file, or a large HTML file?  How does the user interact
> with that?  With search?  Wouldn't he be better served by a searchable
> interface which filters the options as he types, like `dmenu` / `rofi`
> / `fzf` (or the countless other clones) do?  (Currently each
> programming language from Rust to Scheme tries to do something similar
> with JavaScript and the result is horrible...)

  PHP (which I don't like personally) has incredible documentation, but the PHP developers put a lot of work into creating the system to enable that.  It's not just "make machine readable documentation" and poof---it's done.

  I would say that's mostly tooling, not an emergent property of HTML.

> Or, to take another approach, why do people use Google to search
> things?  Because our web pages are so poor when it comes to
> structuring information, that more often than not, when I want to find
> something on a site I just Google: `site:example.com the topic i'm
> interested in`.

  Web search engines were not initially designed to find stuff on a given site; they were designed to find sites you didn't even know existed, period.  The web quickly grew from "here's a list of all known web sites" to "there's no way for a single person to know what's out there."  Since then Google has grown to be a better index of sites than the sites themselves (although I think Google isn't quite as good as it used to be).

  Creating and maintaining a web site structure isn't easy, and it's all too easy to make a mistake that is hard to rectify.  I speak from experience, since my website [3] is now 22 years old [4], and I have a bunch of redirects to rectify past organizational mistakes (and redirects were another aspect I had to argue to add to Gemini, by the way---the implementation helped).

> I'm not advocating for RDF (it was quite convoluted) or semantic web,
> or GraphQL, etc.  I'm just advocating something better than the Gopher
> map.

  Okay, create a format and post it.  That's the best way to get this started.

>
> >   As a user, that's great!  As a web site operator, not so much.
>
> OK...  Now here is something I don't understand:  aren't you building
> Gemini sites for "users"?  You are building it for "operators"?

  I'm building it primarily for me.  Much like my website (and gopher site [5]), it's mostly for my own benefit---if others like it, cool!  But it's not solely for others.

> Because if the operator is what you optimize for, then why not just
> SSH into the operator's server where he provides you with his
> "favourite" BBS clone.

  Those do exist, but that's not something I want to do.

>
> >   Hey, go ahead and implement that.  I'd like to see that ...
>
> There is already FreeNet and IPFS that implement content-based
> addressing.  I just wanted something in between that is still
> "location" driven, but is "content identity" aware.

  Again, what's stopping you from just doing it?  Waiting for consensus?  Have you read the thread on text formatting?  It's literally half the messages to this list.  I do have to wonder how far along Gemini would be if I had not just gone ahead and implemented a server.

  -spc (In my opinion, working code trumps ideas ... )

[1]	Like my blog engine, written in C:

	https://github.com/spc476/mod_blog

[2]	A close reading of the actual ASCII standard reveals two control codes, CR and LF.  CR is defined as "returning the carriage head back to the start of a line" and LF is defined as "advancing to the next line, without changing the position of the carriage."  So a literal reading of the spec says if you want to advance to the start of the next line, you send both a CR and LF.  There is no control code defined by ASCII that means "return the carriage to the start of the line and advance to the next line."  There *is* such a control character, NEL, but that's defined by the ISO, not ANSI (and it happens to be either character 133 or <ESC>E).

	Over time, some systems have adopted one or the other to mean "return carriage to start of line and advance to next line."  Most 8-bit systems I've experienced used CR for that.  Unix picked LF.  A few (mostly DEC-influenced, like CP/M) used both.

	The RFCs written in the 70s (when the Internet was first being developed) used a more literal interpretation of the ASCII standard and required both CR and LF to mark the end of the line.

	There is also a similar issue with backspace.  ASCII defines BS as "move the carriage to the previous character position; if at the start of the line, don't do anything."  DEL is defined as "ignore this character."  Neither one means "move back one space and erase the character".  BS was intended to be used to create characters not defined by ASCII, like ä, by issuing the sequence

		a<BS>"

	Over time, different systems have implemented "move back one space and erase the character" using either BS or DEL.

[3]	http://www.conman.org/

[4]	At the current domain.  It's a bit older than that, but it was under a different domain I didn't control, which is why my personal pages are under:

		http://www.conman.org/people/spc/

	and not the top level.  That move was painful enough as it was.

[5]	gopher://gopher.conman.org/