Hello all!

[Disclaimer: I'm not an active `gopher://` user, although long ago I
did implement my own Gopher server in Erlang and another one in Go;
however I do keep an eye on the Gopher mailing list, mostly because
I'm nostalgic for a "simpler" web...]

Today I've stumbled upon the `gemini://` protocol specification
(v0.10) and FAQ, and after reading them both, I thought that perhaps
an "outsider's" point of view could be useful.

First of all, I get that `gemini://` wants to "sit" in between
`gopher://` and `http://`; however, from what I can tell, it resembles
HTTP/0.9 (https://www.w3.org/Protocols/HTTP/AsImplemented.html) more
closely; i.e. it adds only virtual hosts and a response MIME type on
top of HTTP/0.9 or Gopher (plus TLS, but that's transport related).

Although I do agree that the HTTP/1.1 semantics (a large part of which
is nowadays carried over into HTTP/2 and HTTP/3) have become extremely
complex (from chunked encoding, to caching, to server-side push via
`Link` headers, etc.), there are some features that I think are
useful, especially given some of the stated goals of `gemini://` (like
for example slow links, etc.):
It was thus said that the Great Ciprian Dorin Craciun once stated:
> Hello all!

  Hello.

[ snip ]

> * caching -- given that most content is going to be static, caching
> should be quite useful; however it doesn't seem to have been a
> concern in the spec, the FAQ or the mailing list archive; I'm not
> advocating for the whole set of HTTP caching headers, but perhaps for
> a simple SHA of the body so that clients can just skip downloading it
> (although this would imply a more elaborate protocol, having a
> "headers" and a separate "body" phase);

  I don't think solderpunk (creator of this protocol) expects Gemini to be
a replacement for HTTP---for him, it's more of a way to cut down on the
bloat that has become the web.  In fact, everything in Gemini could be done
with HTTP.

  With that said, I have made oblique references to adding something (a
timestamp) to cut down on unneeded requests.  It hasn't been taken up.

> * `Content-Length` -- I've seen this mentioned in the FAQ or the
> mailing lists; I think the days of "unreliable" protocols have
> passed; (i.e. we should better make sure that the intended document
> was properly delivered, in its entirety and unaltered;)

  I did bring this up early in the design, but it was rejected outright.
It has since been brought up again due to one Gemini site serving very
large files.  There has been some talk, but nothing has yet come from it.

> * status codes -- although both Gemini and HTTP use numeric status
> codes, I do believe that these are an artifact of ancient times, and
> we could just replace them with proper symbols (perhaps hierarchical
> in nature like `redirect:temporary` or `failure:temporary:slow-down`);

  I disagree.  Using "proper symbols" is overall harder to deal with.
First, it tends to be English-centric.  I mean, we could go with:

	defectum:tempus:tardius

or how about

	teip:sealadach:níos-moille

  First off, the code has to be parsed, and while this is easy in
languages like Python or Perl, you run into issues with Rust, C++ or Go
(not to mention the complete mess that is C).  A number is easy to parse,
easy to check, and its meaning can be translated into another language.
The Gemini status codes (as well as HTTP and other three-digit status
codes) don't even have to be converted into a number---you can easily do a
two-level check:

	if (status[0] == '2')
	  /* happy path */
	else if (status[0] == '3')
	  /* redirection path */
	else if (status[0] == '4')
	  /* temporary failure */
	else if (status[0] == '5')
	  /* permanent failure */
	else if (status[0] == '6')
	{
	  /* authorization needed */
	  if (status[1] == '1')
	    /* client cert required */
	  else if (status[1] == '3')
	    /* rejected! */
	}

  There was a long, drawn-out discussion between solderpunk and me about
status codes.  The compromise was the two-digit codes currently in use.

> * keep-alive -- although in Gopher and Gemini the served documents
> seem to be self-contained, and usually connections will be idle while
> the user is pondering what to read, in the case of crawlers, having
> to re-establish a new connection each time (especially a TLS one)
> would eat a lot of resources and incur significant delays; (not to
> mention that repeated TCP connection establishment to the same port
> or target IP might be misinterpreted as an attack by various security
> appliances or cloud providers;)

  I would think that would be a plus for this crowd, as it's less likely
for Gemini to be quickly exploited.

> Now on the transport side, somewhat related to the previous point, I
> think TLS transient certificates are overkill... If one wants to
> implement "sessions", one could introduce

  This is the fault of both myself and solderpunk.  When I implemented the
first Gemini server (yes, even before solderpunk, who created the
protocol) I included support for client certificates as a means of
authentication of the client.
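The two-level check above covers the happy path; a minimal sketch of a hardened version, where the length and range guards (rejecting a bare `6`, an empty status, or an out-of-range `99`) are my additions rather than anything from the quoted snippet:

```c
#include <ctype.h>

/* Return the status family digit ('1'..'6') for a Gemini status code,
   or 0 if the status is malformed: too short, non-numeric, or outside
   the defined families.  This avoids the fall-through cases ("", "6",
   "99") that a bare status[0] check does not cover. */
static int status_family(const char *status)
{
  if (status == 0)
    return 0;
  if (!isdigit((unsigned char)status[0]) ||
      !isdigit((unsigned char)status[1]))
    return 0;
  if (status[0] < '1' || status[0] > '6')
    return 0;
  return status[0];
}
```

A client would then branch on the returned family exactly as in the snippet above, treating 0 as a malformed response.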
  My intent (besides playing around with that technology) was to have
fine-grained control over server requests without the user needing a
password, and to that end, I have two areas on my Gemini server that
require client certificates:

	gemini://gemini.conman.org/private/

		This area will accept *any* client certificate, making
		it easy for clients to test that they do, in fact,
		serve up a client certificate.

	gemini://gemini.conman.org/conman-labs-private/

		This area requires certificates signed by my local
		certificate authority (i.e. *I* give you the cert to
		use).

  This was my actual intent.  It wasn't my intent to introduce a "cookie"
like feature.  solderpunk interpreted this as a "cookie" like feature and
called it "transient certificates".  I still view this feature as "client
certificates" myself.  I personally think the use of "transient
certificates" is confusing.

> On a second thought, why TLS? Why not something based on NaCL /
> `libsodium` constructs, or even the "Noise Protocol"
> (http://www.noiseprotocol.org/)?

  1) Never, *NEVER* implement crypto yourself.

  2) OpenSSL exists and has support in most (if not all) popular
     languages.

  3) I never even heard of the Noise Protocol.

> For example I've tried to build the
> Asuka Rust-based client and it pulled ~104 dependencies and took a few
> minutes to compile, this doesn't seem too lightweight...

  So wait?  You tried to use something other than OpenSSL and it had too
many dependencies and took too long to compile?  Or did you mean to say
that the existing Rust-based client using OpenSSL had too many
dependencies?  I think you mean the latter, but it could be read as the
former.

> Why not just re-use PGP to sign / encrypt requests and replies?
> With regard to PGP,

  There are issues with using PGP:

https://latacora.micro.blog/2019/07/16/the-pgp-problem.html

> given that Gopher communities tend to be quite small,
> and composed of mostly "techie" people, this goes hand-in-hand with
> the "web-of-trust" that is enabled by PGP and can provide something
> that TLS can't at this moment: actual "attribution" of servers to
> human beings and trust delegation; for example for a server one could
> generate a pair of keys and other people could sign those keys as a
> way to denote their "trust" in that server (and thus the hosted
> content). Why not take this a step further and allow each document
> served to be signed, thus extending this "attribution" not only to
> the servers, but to the actual contents. This way a server could
> provide a mirror / cached version of a certain document, while still
> proving it is the original one.

  The hardest problem with crypto is key management.  If anything, key
management with PGP seems more problematic than with OpenSSL and the CA
infrastructure (as bad as the CA infrastructure is).

> Now getting back to the `gemini://` protocol, another odd thing I
> found is the "query" feature. Gemini explicitly supports only `GET`
> requests, and the `text/gemini` format doesn't support forms, yet it
> still tries to implement a "single input-box form"... Granted it's a
> nice hack, but it's not "elegant"... (Again, like in the case of
> sessions, it seems more like an afterthought, even though this is the
> way Gopher does it...)
>
> Perhaps a simple "form" solution would be better? Perhaps completely
> eliminating these "queries" for the time being? Or perhaps
> introducing a new form of URL, like for example:
> `gemini-query:?url=gemini://server/path&prompt=Please+enter+something`
> which can be served either in-line (as was possible in Gopher) and /
> or served as a redirect (thus eliminating another status code family).

  Forms lead to applications.  Applications lead to client side
scripting.  Client side scripting leads to the web ...

  Of course there's pressure to expand the protocol.  solderpunk is
trying his hardest to keep that from happening and turning Gemini into
another web clone.

> Regarding the `text/gemini` format -- and taking into account various
> emails in the archive about reflowing, etc -- makes me wonder if it
> is actually needed. Why can't CommonMark be adopted as the HTML
> equivalent, and a more up-to-date Gopher map variant as an
> alternative for menus? There are already countless safe CommonMark
> parsers out there (for example in Rust there is one implemented by
> Google) and the format is well understood and accepted by a large
> community (especially the static site generators community).

  It can.  RFC-7763 defines the media type text/markdown and RFC-7764
defines known variations that can be specified.  This could be done right
now without any changes to Gemini.  Go for it.

> Regarding an up-to-date Gopher map alternative, I think this is an
> important piece of the Gopher ecosystem that is missing from today's
> world: a machine-parsable standard format of indexing documents. I
> very fondly remember "directory" sites of yesteryear (like DMOZ or
> the countless other clones) that strove to categorize the internet
> not by "machine learning" but by human curation.

  Could you provide an example of what you mean by this?  I'm not sure
why a map alternative is needed.

> * and perhaps add support for content-based addressing (as opposed to
> server-based addressing) (i.e. persistent URL's);

  There already exist such protocols---I'm not sure what a new one based
around Gemini would buy.

> (Perhaps the closest to this ideal would be a Wikipedia style web...)

  We already have that---the Wikipedia.

  -spc
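Concretely, since a Gemini response header already carries a full MIME type, a server could advertise CommonMark today with nothing more than the following hypothetical exchange (`C:`/`S:` mark client and server lines; the host and path are made up):

```
C: gemini://example.org/notes.md<CR><LF>
S: 20 text/markdown; variant=CommonMark<CR><LF>
S: # Notes
S:
S: Some *CommonMark* text ...
```

The `variant` parameter comes from RFC 7763, and RFC 7764 registers `CommonMark` among the known variants.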
On Fri, Feb 28, 2020 at 4:44 AM Sean Conner <sean at conman.org> wrote:
> I disagree.  Using "proper symbols" is overall harder to deal with.
> First, it tends to be English-centric.  I mean, we could go with:
>
>	defectum:tempus:tardius
>
> or how about
>
>	teip:sealadach:níos-moille

The protocol is already English-centric, for example the MIME types
(which are IANA standards), it uses left-to-right writing, it uses
UTF-8 which is optimized for Latin-based alphabets, etc.; so if we
want to be politically correct, we could use Latin or Esperanto.

> First off, the code has to be parsed, and while this is easy in
> languages like Python or Perl, you run into issues with Rust, C++ or
> Go (not to mention the complete mess that is C).  A number is easy to
> parse, easy to check, and its meaning can be translated into another
> language.  The Gemini status codes (as well as HTTP and other
> three-digit status codes) don't even have to be converted into a
> number---you can easily do a two-level check:
>
>	if (status[0] == '2')
>	  /* happy path */
>	else if (status[0] == '3')
>	  /* redirection path */
>	else if (status[0] == '4')
>	  /* temporary failure */
>	else if (status[0] == '5')
>	  /* permanent failure */
>	else if (status[0] == '6')
>	{
>	  /* authorization needed */
>	  if (status[1] == '1')
>	    /* client cert required */
>	  else if (status[1] == '3')
>	    /* rejected! */
>	}

OK, although I understand why things are harder in C, you present
above only the "easy part".  Please take into account the
line-reading, the splitting into code and meta (and the protocol does
say one or multiple whitespaces in between), and checking the `CRLF`
at the end.  Now assuming you've done all that, even the code above
has a couple of bugs:
It was thus said that the Great Ciprian Dorin Craciun once stated:
> On Fri, Feb 28, 2020 at 4:44 AM Sean Conner <sean at conman.org> wrote:
> > I disagree.  Using "proper symbols" is overall harder to deal with.
> > First, it tends to be English-centric.  I mean, we could go with:
> >
> >	defectum:tempus:tardius
> >
> > or how about
> >
> >	teip:sealadach:níos-moille
>
> The protocol is already English-centric, for example the MIME types
> (which are IANA standards), it uses left-to-right writing, it uses
> UTF-8 which is optimized for Latin-based alphabets, etc.; so if we
> want to be politically correct, we could use Latin or Esperanto.

  Why is a numeric status code so bad?  Yes, the rest of the protocol is
English-centric (MIME types; left-to-right, UTF-8).  It just seems that
using words (regardless of language) is complexity for its own sake.

> OK, although I understand why things are harder in C, you present
> above only the "easy part". Please take into account the
> line-reading, splitting into code and meta (and the protocol does say
> one or multiple whitespaces in between), checking the `CRLF` at the
> end. Now assuming you've done all that, even the code above has a
> couple of bugs:
> * what if the server sends `99`? (it is not covered);
> * what if the server sends just `6`? (it is not covered, although
> given that perhaps `status` is `\0` terminated it won't be a large
> problem, but still it would fall through;)
> * what if the server just sends an empty status code? (is it checked
> by the parser?)

  Oh, thanks for the client test suggestions.  I'll need to add those to
my client torture test (for a client, I would expect it to just reject
the response and indicate a server error to the user).

> As minor issues:
> * why `CRLF`?
> it's easier (both in terms of availability of functions
> and efficiency) to split lines by a single character `\n` than by a
> string;

  That was discussed earlier on the list:

gemini://gemi.dev/gemini-mailing-list/messages/000116.gmi

> * why allow "one-or-more whitespaces" especially in protocol related
> parts? why not mandate a strict syntax?

  solderpunk will have to answer that one.

> > > On a second thought, why TLS? Why not something based on NaCL /
> > > `libsodium` constructs, or even the "Noise Protocol"
> > > (http://www.noiseprotocol.org/)?
> >
> > 1) Never, *NEVER* implement crypto yourself.
>
> I was never proposing to implement crypto ourselves. `libsodium` /
> NaCL provides very useful high-level constructs, tailored for
> specific use-cases (like for example message encryption and signing),
> that are proven to be safe, and exports them with a very simple API
> that can be easily understood and used.

  TLS was chosen because the COMMUNICATIONS LINK is encrypted, not just
the payload.  All Eve (the eavesdropper) can see is what IP address you
are connecting to, not what content you are reading, nor (depending upon
the TLS version) what virtual server you might be talking to.

> > 2) OpenSSL exists and has support in most (if not all) popular
> > languages.
>
> Don't know what to say... I find the OpenSSL documentation terrible,
> and it's hard to use... In fact, given the complexity of TLS, I would
> say any wrapper, reimplementation, or alternative is as bad. For
> example I played with Go's TLS library and even though it's
> manageable it requires lots of attention to get things right.

  Yes, it is horrible.  And people make do.  I know for myself I'm using
libtls, which is part of LibreSSL (a fork of OpenSSL) and which makes
using TLS trivial.  I was able, with just the header file tls.h and the
man pages, to wrap libtls for Lua [1], which I use for my Gemini server
GLV-1.12556 [2].  I just wish libtls was more widely available.
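On the `CRLF` point quoted above: accepting the mandated `CRLF` while tolerating a bare `LF` costs only a few lines; a sketch of such a line "chomp" (my own illustration, not code from any of the servers mentioned):

```c
#include <stddef.h>
#include <string.h>

/* Strip a line terminator in place: accepts CRLF (as the spec
   mandates) but also tolerates a bare LF, as some implementations
   already do in practice.  Returns 1 if a terminator was found and
   removed, 0 otherwise. */
static int chomp_line(char *line)
{
  size_t len = strlen(line);
  if (len >= 1 && line[len - 1] == '\n')
  {
    line[--len] = '\0';
    if (len >= 1 && line[len - 1] == '\r')
      line[len - 1] = '\0';
    return 1;
  }
  return 0;
}
```

A strict client could instead treat the bare-`LF` case as a protocol error; the only change would be returning a distinct value from the first branch when no `\r` was found.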
> > > Why not just re-use PGP to sign / encrypt requests and replies?
> > > With regard to PGP,
> >
> > There are issues with using PGP:
> >
> > https://latacora.micro.blog/2019/07/16/the-pgp-problem.html
>
> There are issues with any technology, TLS included.
>
> However I would say it's easier to integrate GnuPG (even through
> subprocesses) in order to encrypt / decrypt payloads (especially
> given how low in count they are for Gemini's ecosystem) than
> implementing TLS. Moreover it offers out-of-the-box the whole client
> side certificate management, which adding to a TLS-based client would
> be much more involved, more on this below...

  As I have mentioned, that only protects the payload, not the
communications channel.

> > The hardest problem with crypto is key management. If anything, key
> > management with PGP seems more problematic than with OpenSSL and
> > the CA infrastructure (as bad as the CA infrastructure is).
>
> One of the `gemini://` specifications explicitly states that the
> server certificate authentication model is similar to SSH's first
> use: accept and cache afterward. However say you'll go with the
> actual CA model, now you need to juggle Let's Encrypt (every 3
> months) (or add support for ACME in your server), then juggle PEM
> files, etc. Regardless, either way one will have to implement all
> this certificate management from scratch.

  Or self-signed certificates.

  Okay, we use NaCL.  Now what?  What's needed to secure the
communication channel?  A key exchange.  Again, rule 1---never implement
crypto.

> > Forms lead to applications. Applications lead to client side
> > scripting. Client side scripting leads to the web ...
> >
> > Of course there's pressure to expand the protocol. solderpunk is
> > trying his hardest to keep that from happening and turning Gemini
> > into another web clone.
>
> But you are already implementing "applications" on top of Gemini (and
> Gopher) through CGI...

  Yes ... but there are only two Gemini servers that support CGI,
GLV-1.12556 [2] and Jetforce [3] (two out of five Gemini server
programs).  I implemented CGI in GLV-1.12556 just because I could (and,
I think, to prove a point).  I technically don't need CGI support for
the server I run, since it's just as easy for me to implement custom
handlers [4].

> > > Regarding an up-to-date Gopher map alternative, I think this is
> > > an important piece of the Gopher ecosystem that is missing from
> > > today's world: a machine-parsable standard format of indexing
> > > documents. I very fondly remember "directory" sites of yesteryear
> > > (like DMOZ or the countless other clones) that strove to
> > > categorize the internet not by "machine learning" but by human
> > > curation.
> >
> > Could you provide an example of what you mean by this? I'm not sure
> > why a map alternative is needed.
>
> One problem with today's web is that the actual "web structure" is
> embedded in unstructured documents as links. What I liked about
> Gopher maps is that it gave a machine-readable, but still
> user-friendly, way to map and categorize the "web contents".

  One problem with that---incentives.  What's my incentive to make all
this information more easily machine readable?  On the web, you do that,
and what happens?  Google comes along, munches on all that sweet machine
readable data and serves it up directly to users, meaning the user just
has to go to Google for the information, not your server.  Given those
incentives, I have no reason to make my data easily machine readable
when it means less traffic.

  I recall the large push for RDF (Resource Description Framework) back
around 2004 or so ... embed machine parsable relations and metadata and
it would be oh so wonderful.  Some people even bothered to do all that
work.  And for what?  It was a pain to maintain, the tooling was poor,
and Google would just suck it up and serve it to users directly, no
reason for anyone to actually visit your site.
  As a user, that's great!  As a web site operator, not so much.

> > > * and perhaps add support for content-based addressing (as
> > > opposed to server-based addressing) (i.e. persistent URL's);
> >
> > There already exist such protocols---I'm not sure what a new one
> > based around Gemini would buy.
>
> I agree that `gemini://` is first and foremost a "transfer" protocol.
> However one can include a document's identity as a first class
> citizen of the protocol.
>
> For example say each document is identified by its SHA; then when
> replying with a document also send that SHA in the form of a
> permanent URL like say
> `gemini-object:?sha={SHA}&location=gemini://first-server/...&location=gemini://second-server/...`;
> then a client (that perhaps has bookmarked that particular version of
> that document) could send that URL to a server (of his choosing via
> configuration, to the first one specified in `location`, etc.) and if
> that server has that document just reply with that, else use
> `location`, else return 404.

  Hey, go ahead and implement that.  I'd like to see that ...

  -spc (I got my feet wet in Gemini by implementing the first server ... )

[1]	https://github.com/spc476/lua-conmanorg/blob/master/src/tls.c
[2]	https://github.com/spc476/GLV-1.12556
[3]	https://github.com/michael-lazar/jetforce
[4]	gopher://gopher.conman.org/1Gopher:Ext:GLV-1/handlers/
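For what it's worth, resolving such a URL mostly comes down to pulling `sha` and the `location` fallbacks out of the query string; a minimal sketch in C (the `gemini-object:` scheme and its parameter names are Ciprian's proposal, not part of any spec):

```c
#include <stdio.h>
#include <string.h>

/* Extract the value of the first occurrence of a query parameter from
   a URL of the hypothetical gemini-object: form sketched above.
   Copies the value into out (NUL-terminated, truncated to outlen-1)
   and returns its length, or 0 if the key is absent or empty. */
static size_t query_param(const char *url, const char *key,
                          char *out, size_t outlen)
{
  char pattern[64];
  snprintf(pattern, sizeof pattern, "%s=", key);

  const char *p = strchr(url, '?');
  while (p != NULL)
  {
    p++;                          /* skip the '?' or '&' */
    if (strncmp(p, pattern, strlen(pattern)) == 0)
    {
      p += strlen(pattern);
      const char *end = strchr(p, '&');
      size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);
      if (len >= outlen)
        len = outlen - 1;
      memcpy(out, p, len);
      out[len] = '\0';
      return len;
    }
    p = strchr(p, '&');
  }
  return 0;
}
```

Iterating over the repeated `location` keys would need a small extension (resuming the scan after the previous match), but the shape of the problem is the same.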
It was thus said that the Great Ciprian Dorin Craciun once stated:
>
> For example say each document is identified by its SHA; then when
> replying with a document also send that SHA in the form of a
> permanent URL like say
> `gemini-object:?sha={SHA}&location=gemini://first-server/...&location=gemini://second-server/...`;
> then a client (that perhaps has bookmarked that particular version of
> that document) could send that URL to a server (of his choosing via
> configuration, to the first one specified in `location`, etc.) and if
> that server has that document just reply with that, else use
> `location`, else return 404.

  Actually, shouldn't the server return "defectum:permanens:non_inveni"?

  -spc (Forgot to ask that in my previous email ... )
On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote:
> Why is a numeric status code so bad? Yes, the rest of the protocol is
> English-centric (MIME types; left-to-right, UTF-8). It just seems
> that using words (regardless of language) is complexity for its own
> sake.

Why did people use `/etc/hosts` files before DNS was invented? Why do
we have `/etc/services`? Why do we have `O_READ`? Why do we have
`chmod +x`? Because numbers are hard to remember, and say nothing to a
person that doesn't know the spec by heart. (For example, although I
do a lot of HTTP related work with regard to routing and such, I can
never remember which of the 4-5 HTTP redirect codes means "temporary
redirect but keep the same method" as opposed to "temporary redirect
but switch to `GET`".)

> > As minor issues:
> > * why `CRLF`? it's easier (both in terms of availability of
> > functions and efficiency) to split lines by a single character
> > `\n` than by a string;
>
> That was discussed earlier on the list:
>
> gemini://gemi.dev/gemini-mailing-list/messages/000116.gmi

OK, reading that email the answer seems to be "because other protocols
have it"... And even you admit that in your own code you also handle
just `LF`. So then why bother? Why not simplify the protocol?

> > > > On a second thought, why TLS? Why not something based on NaCL /
> > > > `libsodium` constructs, or even the "Noise Protocol"
> > > > (http://www.noiseprotocol.org/)?
> > >
> > > 1) Never, *NEVER* implement crypto yourself.
> >
> > I was never proposing to implement crypto ourselves. `libsodium` /
> > NaCL provides very useful high-level constructs, tailored for
> > specific use-cases (like for example message encryption and
> > signing), that are proven to be safe, and exports them with a very
> > simple API that can be easily understood and used.
>
> TLS was chosen because the COMMUNICATIONS LINK is encrypted, not just
> the payload. All Eve (the eavesdropper) can see is what IP address
> you are connecting to, not what content you are reading, nor
> (depending upon the TLS version) what virtual server you might be
> talking to.

Although I do agree that encryption at the "transport" level to hide
the entire traffic is a good idea, if you take into account that
`gemini://` requires one request and one reply per TCP connection
(thus per TLS connection), there is no actual "communications link"
here. Basically you are using TLS to encrypt only one payload.

Moreover, because there is exactly one request / one reply, one can
just look at the traffic pattern and deduce what the user is doing,
just by analyzing the length of the stream (in both directions) and
the time the server takes to respond (which says static or dynamically
generated). (Granted, TLS records are padded; however even so, having
the size as a multiple of some fixed value still gives an insight into
what was requested.)

For example, say someone lives in a country where certain books
(perhaps about cryptography) are forbidden; now imagine there is a
library out there that serves these books through `gemini://`; now
imagine the country wants to see what books are read by its own
citizens; all it has to do is record each session and deduce a
response size range, then crawl that library and see which books fit
into that range. Therefore I would say (I'm no cryptographer) TLS
doesn't help at all here; neither does PGP, nor `libsodium` / NaCL...

Another related topic regarding TLS that just struck me: given that
`gemini://` supports out-of-the-box virtual hosts, do you couple that
with TLS SNI? If not, TLS is basically just "obfuscation" rather than
actual end-to-end encryption.

Why I say that: because the spec says one should use SSH-style "do you
trust this server" questions and keep that certificate in mind. But
how about when the certificate expires, or is revoked? (SSH server
public keys never expire...)
How does the user know that the certificate was rightfully replaced,
or whether he is the victim of an MITM attack?

> > > > Why not just re-use PGP to sign / encrypt requests and replies?
> > > > With regard to PGP,
> > >
> > > There are issues with using PGP:
> > >
> > > https://latacora.micro.blog/2019/07/16/the-pgp-problem.html
> >
> > There are issues with any technology, TLS included.
> >
> > However I would say it's easier to integrate GnuPG (even through
> > subprocesses) in order to encrypt / decrypt payloads (especially
> > given how low in count they are for Gemini's ecosystem) than
> > implementing TLS. Moreover it offers out-of-the-box the whole
> > client side certificate management, which adding to a TLS-based
> > client would be much more involved, more on this below...
>
> As I have mentioned, that only protects the payload, not the
> communications channel.

But as said, you don't have an actual communications channel, because
you use TLS for a single request / reply payload pair... :)

> > > The hardest problem with crypto is key management. If anything,
> > > key management with PGP seems more problematic than with OpenSSL
> > > and the CA infrastructure (as bad as the CA infrastructure is).
> >
> > One of the `gemini://` specifications explicitly states that the
> > server certificate authentication model is similar to SSH's first
> > use: accept and cache afterward. However say you'll go with the
> > actual CA model, now you need to juggle Let's Encrypt (every 3
> > months) (or add support for ACME in your server), then juggle PEM
> > files, etc. Regardless, either way one will have to implement all
> > this certificate management from scratch.
>
> Or self-signed certificates.
>
> Okay, we use NaCL. Now what? What's needed to secure the
> communication channel? A key exchange. Again, rule 1---never
> implement crypto.
Given that one has the public key of the server (more on that later),
one could use the following on the client / server sides:

https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes

```
The crypto_box_seal() function encrypts a message m of length mlen for
a recipient whose public key is pk. It puts the ciphertext whose
length is crypto_box_SEALBYTES + mlen into c.

The function creates a new key pair for each message, and attaches the
public key to the ciphertext. The secret key is overwritten and is not
accessible after this function returns.

The crypto_box_seal_open() function decrypts the ciphertext c whose
length is clen, using the key pair (pk, sk), and puts the decrypted
message into m (clen - crypto_box_SEALBYTES bytes).
```

How does one get the public key of the server? One could change the
protocol so that the server speaks first and sends its own public key.

My take on this: given a set of clear requirements for the `gemini://`
protocol (which, as I've seen, there are), one can come up with better
solutions than TLS, ones that better fit the use-case. (Again, just to
be clear, I'm not saying "let's invent our own crypto", but instead
"let's look at other, tested alternatives". As a side-note, NaCL, on
which `libsodium` is based, was created by Daniel J. Bernstein...)

> > > > Regarding an up-to-date Gopher map alternative, I think this is
> > > > an important piece of the Gopher ecosystem that is missing from
> > > > today's world: a machine-parsable standard format of indexing
> > > > documents. I very fondly remember "directory" sites of
> > > > yesteryear (like DMOZ or the countless other clones) that
> > > > strove to categorize the internet not by "machine learning" but
> > > > by human curation.
> > >
> > > Could you provide an example of what you mean by this? I'm not
> > > sure why a map alternative is needed.
> >
> > One problem with today's web is that the actual "web structure" is
> > embedded in unstructured documents as links.
> > What I liked about
> > Gopher maps is that it gave a machine-readable, but still
> > user-friendly, way to map and categorize the "web contents".
>
> One problem with that---incentives. What's my incentive to make all
> this information more easily machine readable? On the web, you do
> that, and what happens? Google comes along, munches on all that sweet
> machine readable data and serves it up directly to users, meaning the
> user just has to go to Google for the information, not your server.
> Given those incentives, I have no reason to make my data easily
> machine readable when it means less traffic.

The incentive is a clear one: the end-user. Given that we can
standardize on such an "index", we can create better "user-agents"
that are more useful to our actual users. (And I'm not even touching
on the persons with various disabilities that hamper their interaction
with computers.)

For example, say I'm exposing API documentation via `gemini://`. How
do I handle the "all functions index page"? Do I create a large
`text/gemini` file, or a large HTML file? How does the user interact
with that? With search? Wouldn't he be better served by a searchable
interface which filters the options as he types, like `dmenu` / `rofi`
/ `fzf` (or the countless other clones) do? (Currently each
programming language from Rust to Scheme tries to do something similar
with JavaScript, and the result is horrible...)

Or, to take another approach, why do people use Google to search for
things? Because our web pages are so poor when it comes to structuring
information that, more often than not, when I want to find something
on a site I just Google: `site:example.com the topic i'm interested
in`.

> I recall the large push for RDF (Resource Description Framework) back
> around 2004 or so ... embed machine parsable relations and metadata
> and it would be oh so wonderful. Some people even bothered to do all
> that work. And for what? It was a pain to maintain, the tooling was
> poor, and Google would just suck it up and serve it to users
> directly, no reason for anyone to actually visit your site.

I'm not advocating for RDF (it was quite convoluted) or the semantic
web, or GraphQL, etc. I'm just advocating something better than the
Gopher map.

> As a user, that's great! As a web site operator, not so much.

OK... Now here is something I don't understand: aren't you building
Gemini sites for "users"? You are building them for "operators"?
Because if the operator is what you optimize for, then why not just
SSH into the operator's server, where he provides you with his
"favourite" BBS clone.

> > > > * and perhaps add support for content-based addressing (as
> > > > opposed to server-based addressing) (i.e. persistent URL's);
> > >
> > > There already exist such protocols---I'm not sure what a new one
> > > based around Gemini would buy.
> >
> > I agree that `gemini://` is first and foremost a "transfer"
> > protocol. However one can include a document's identity as a first
> > class citizen of the protocol.
> >
> > For example say each document is identified by its SHA; then when
> > replying with a document also send that SHA in the form of a
> > permanent URL like say
> > `gemini-object:?sha={SHA}&location=gemini://first-server/...&location=gemini://second-server/...`;
> > then a client (that perhaps has bookmarked that particular version
> > of that document) could send that URL to a server (of his choosing
> > via configuration, to the first one specified in `location`, etc.)
> > and if that server has that document just reply with that, else use
> > `location`, else return 404.
>
> Hey, go ahead and implement that. I'd like to see that ...

There are already FreeNet and IPFS that implement content-based
addressing. I just wanted something in between, something that is
still "location" driven but is "content identity" aware.

Ciprian.
On Fri, Feb 28, 2020 at 01:16:30AM +0200, Ciprian Dorin Craciun wrote: > Hello all! > > Today I've stumbled upon the `gemini://` protocol specification > (v0.10) and FAQ, and after reading them both, I thought that perhaps > an "outsiders" point of view could be useful. Howdy! Thanks very much for taking the time to provide this outside perspective. I've done my best to take your comments in the constructive fashion you intended them. I'm going to reply relatively briefly to some major points below - please don't take brevity as me being dismissive, it's more to do with my available time! > * caching -- given that most content is going to be static, caching > should be quite useful; however it doesn't seem to have been present > as a concern neither in the spec, FAQ or the mailing list archive; > I'm not advocating for the whole HTTP caching headers, but perhaps for > a simple SHA of the body so that clients can just skip downloading it > (although this would imply a more elaborate protocol, having a > "headers" and separate "body" phase); Not just a more elaborate protocol (although that does count, by itself, against caching, as implementation simplicity is a driving goal of the protocol), but a more extensible protocol. I've fought since day one against anything that acts to divide the response header into parts, equivalent to the multiple header lines of HTTP. Extensibility, for all its benefits, is the eventual death of simplicity. Caching is not a bad thing, but it pays off the most for large content. Leaving caching out actively encourages content producers to make their content as small as possible. I like that. 
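To make the trade-off being discussed concrete, the SHA-validation scheme would only need a tiny client-side table. A sketch, assuming a hypothetical "body hash" field in the response header (no such field exists in the spec):

```python
import hashlib

class BodyCache:
    """Client-side cache for the SHA-validation idea: if a (hypothetical)
    response header advertised the body's hash, the client could skip
    re-downloading documents it has already seen."""

    def __init__(self):
        self._store = {}  # url -> (sha256 hex digest, body)

    def remember(self, url: str, body: bytes) -> None:
        self._store[url] = (hashlib.sha256(body).hexdigest(), body)

    def fresh(self, url: str, advertised_sha: str):
        # Return the cached body on a hash match, else None (re-fetch).
        cached = self._store.get(url)
        if cached and cached[0] == advertised_sha:
            return cached[1]
        return None
```

The cost solderpunk points to is visible even here: the response header would need a second field, which is exactly the kind of extensibility the design avoids.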
> * compression -- needless to say that `text/*` MIME types compress > very well, thus saving both bandwidth and caching storage; (granted > one can use compression on the TLS side, although I think that one was > dropped due to security issues?); As above, compression is not a bad thing, but for small content the benefit is not proportionate to the implementation effort. Gopherspace is an existence proof that worthwhile textual content can be served uncompressed and still be orders of magnitude smaller than the average website which *does* use compression. You're right about TLS compression having security problems. > * `Content-Length` -- I've seen this mentioned in the FAQ or the > mailing lists; I think the days of "unreliable" protocols has passed; > (i.e. we should better make sure that the intended document was > properly delivered, in its entirety and unaltered;) This is definitely the biggest existing pain point in Gemini so far, I think. I might write about this in another email. I still think for various reasons we can live without this, but I won't swear that if the right solution is proposed I won't consider it. Someone did mention earlier on the list that TLS has a way to explicitly signal a clean shut down of a connection, which would provide "in its entirety". > * status codes -- although both Gemini and HTTP use numeric status > codes, I do believe that these are an artifact of ancient times, and > we could just replace them with proper symbols (perhaps hierarchical > in nature like `redirect:temporary` or `failure:temporary:slow-down`; This seems to me like extra bytes with very little benefit? The status codes are supposed to be machine-readable, so what's wrong with numbers? 
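For illustration, the numeric codes really are trivially machine-readable: Gemini's two-digit codes group by first digit, so a client can dispatch on a single character and keep any human-friendly names purely internal. A sketch:

```python
# First-digit dispatch over Gemini's two-digit status codes (spec v0.10).
# The symbolic names exist only inside the client, never on the wire.
CATEGORY = {
    "1": "input",
    "2": "success",
    "3": "redirect",
    "4": "temporary-failure",
    "5": "permanent-failure",
    "6": "client-certificate-required",
}

def categorize(header: str) -> str:
    # A response header looks like "20 text/gemini" -- status, space, META.
    status, _, meta = header.partition(" ")
    return CATEGORY.get(status[:1], "invalid")
```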
> * keep-alive -- although in Gopher and Gemini the served documents > seem to be self-contained, and usually connections will be idle while > the user is pondering what to read, in case of crawlers having to > re-establish each time a new connection (especially a TLS one) would > eat a lot of resources and incur significant delays; (not to mention > that repeated TCP connection establishment to the same port or target > IP might be misinterpreted as an attack by various security appliances > or cloud providers;) The overhead of setting up a new TLS connection each time is a shame. TLS 1.3 introduces new functionality to reuse previously negotiated sessions, which is currently not widely supported in a lot of libraries, but I hope that this will become easier in the future and ease some of the pain on this point. > Now on the transport side, somewhat related to the previous point, I > think TLS transient certificates are overkill... If one wants to > implement "sessions", one could introduce > "client-side-generated-cookies" which are functionally equivalent to > these transient certificates. Instead of creating a transient > certificate, the client generates a unique token and sends that to the > server instead. The server has no more control over the value of that > cookie than it does for the transient certificate. > > Moreover the way sessions are signaled between the server and client, > piggy-backed on top of status codes, seems rather an afterthought than > part of an orthogonal design. Perhaps these sessions should be "moved" > to a higher level (i.e. after transport and before the actual > transaction, just like in the case of the OSI stack). This is all true, but once client certificate support was already in the protocol for reasons unrelated to sessions, since it was *possible* to implement sessions using client certificates instead of adding some new part to the protocol, I chose to do it.
This is part of the "maximise power to weight" principle that has guided Gemini's design. Once you are paying the weight penalty for some part of the protocol, you should extract as much power from it as you can by using it to solve any problem you can. This will lead to somewhat clunky solutions to problems cobbled together from two or three existing parts, even when there is an obvious neater solution that could be achieved with one non-existing part, but I'm okay with that. > Also these transient certificates are sold as "privacy enablers" or > "tracking preventing" which is far from the truth. The server (based > on IP, ASN or other information) can easily map various transient > certificates as "possibly" belonging to the same person. Thus just by > allowing these one opens up the possibility of tracking (even if only > for a given session). Moreover, securely generating these transient > certificates does require some CPU power. But servers can do that with raw requests anyway, right? The CPU power point is well taken, believe me. I have considered having the spec (or maybe this belongs in our Best Practices document) encourage implementers to support and to prefer the computationally lighter ciphers in TLS (e.g. the ChaCha stream cipher). > On a second thought, why TLS? Why not something based on NaCL / > `libsodium` constructs, or even the "Noise Protocol" > (http://www.noiseprotocol.org/)? Mostly because TLS library support is much more widespread than anything else. > For example I've tried to build the > Asuka Rust-based client and it pulled ~104 dependencies and took a few > minutes to compile, this doesn't seem too lightweight... A slightly off-topic rant: That's not Asuka's fault, it's not TLS's fault and it's not Gemini's fault, that's Rust's fault. Every single Rust program I have ever tried to build has had over 100 dependencies.
Every single one has had at least one dependency with a minimum required version (of either the library, or Rust itself) which was released only yesterday. The Rust toolchain and community seem to support and even actively encourage this unsustainable approach to development. It strikes me (as an outsider!) as a total mess. > Why not just re-use PGP to sign / encrypt requests and replies? With > regard to PGP, given that Gopher communities tend to be quite small, > and composed of mostly "techie" people, this goes hand-in-hand with > the "web-of-trust" I would prefer not to do anything like explicitly designing Gemini to cater to a small and tight-knit group of techies. I know it's that now, and maybe that's all it will ever be, but I would like to give it a decent chance of being more. There is an `application/pgp-encrypted` MIME type that Gemini can serve content with, and people can write clients to handle this, so Gemininaut cypherpunks can do this if they want to! > Now getting back to the `gemini://` protocol, another odd thing I > found is the "query" feature. Gemini explicitly supports only `GET` > requests, and the `text/gemini` format doesn't support forms, yet it > still tries to implement a "single input-box form"... Granted it's a > nice hack, but it's not "elegant"... (Again, like in the case of > sessions, it seems more as an afterthought, even though this is the > way Gopher does it...) > Perhaps a simple "form" solution would be better? Perhaps completely > eliminating for the time these "queries"? Or perhaps introducing a > new form of URL's like for example: > `gemini-query:?url=gemini://server/path&prompt=Please+enter+something` > which can be served either in-line (as was possible in Gopher) and / > or served as a redirect (thus eliminating another status code family).
I did, back during the long, drawn-out contemplation of whether to use one, two or three digit status codes, consider having the META content for query status be a string in some kind of small DSL for defining a form, but decided against it. You can simulate the effect using a sequence of "single input forms" tied together with a client certificate session. This is, IMHO, "elegant" in its own way - a FORTHy kind of elegance where you build complicated things up by combining a small set of sharp primitives in creative ways. > Regarding the `text/gemini` format -- and taking into account various > emails in the archive about reflowing, etc -- makes me wonder if it is > actually needed. Why can't CommonMark be adopted as the HTML > equivalent, and a more up-to-date Gopher map variant as an alternative > for menus? There are already countless safe CommonMark parsers > out there (for example in Rust there is one implemented by Google) and > the format is well understood and accepted by a large community > (especially the static site generators community). Sorry, I'm still too busy recovering from the trauma of our text/gemini discussion around Markdown to respond to this now. :) > All in all I find the `gemini://` project quite interesting, and I'll > keep a close eye on it. Please do! And please continue to share your thoughts with us here. I hope it doesn't seem too much like I've not taken some of your points seriously enough and have just stubbornly stuck to previous decisions. I really do see challenging questions regarding our design decisions as valuable things, and tried to consider your questions seriously - and I'll continue to do so in coming days. Cheers, Solderpunk
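As a concrete picture of the "single input-box form" under discussion: the server answers a bare request with an input status and a prompt, and the client repeats the request with the query string attached. A toy server-side handler (the function name and prompt text are mine):

```python
def handle(url: str) -> str:
    # Toy sketch of Gemini's INPUT flow: a URL without a query gets a
    # status 10 response whose META is the prompt; the client then
    # re-requests with "?user-input" appended, and gets a normal 20.
    if "?" not in url:
        return "10 Enter a search term\r\n"
    query = url.split("?", 1)[1]
    return "20 text/gemini\r\n# Results for: " + query
```

Chaining several such prompts, tied together by a client-certificate session, gives the multi-step "form" described in this email.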
> The CPU power point is well taken, believe me. I have considered having > the spec (or maybe this belongs in our Best Practices document) > encourage implementers to support and to prefer the computationally > lighter ciphers in TLS (e.g. the ChaCha stream cipher). This would be awesome. This would be really nice for people like me who dream of one day implementing all the protocols for Gemini from scratch. TLS 1.3's ChaCha20 & Poly1305 are much easier to implement than other cipher suites (yes, yes, "don't write your own crypto," but my goal here is novelty, not security of my specific client). > There is an `application/pgp-encrypted` MIME type that Gemini can serve > content with, and people can write clients to handle this, so > Gemininaut cypherpunks can do this if they want to! Please no. PGP is a bit of a mess already. It's tough to install/maintain (because it has a daemon), and it's really easy to mess up. I think using something like NaCl could be much more difficult to mess up than automated PGP. --- Thanks again, everyone, for the thoughtful discussion. While I disagree on this topic, I'm very optimistic about and excited by the future of Gemini as a whole. Cheers! Aaron Janse
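To back up the "easier to implement" claim: the Poly1305 one-time authenticator used by TLS 1.3's ChaCha20-Poly1305 suite fits in a dozen lines of Python, following RFC 8439 (a study toy in exactly the "novelty, not security" spirit above; emphatically not production crypto):

```python
def poly1305_mac(msg: bytes, key: bytes) -> bytes:
    # RFC 8439 one-time authenticator: r is clamped, the accumulator
    # runs modulo the prime 2^130 - 5, and s is added at the end.
    r = int.from_bytes(key[:16], "little") & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF
    s = int.from_bytes(key[16:32], "little")
    p = (1 << 130) - 5
    acc = 0
    for i in range(0, len(msg), 16):
        # Each (up to) 16-byte block gets a 0x01 byte appended on top.
        block = int.from_bytes(msg[i:i + 16] + b"\x01", "little")
        acc = (acc + block) * r % p
    # Serialize the low 128 bits of acc + s, little-endian.
    return ((acc + s) & ((1 << 128) - 1)).to_bytes(16, "little")
```

The ChaCha20 half is comparably short; contrast that with implementing, say, AES-GCM or an RSA key exchange from scratch.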
It was thus said that the Great Ciprian Dorin Craciun once stated: > On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote: > > Why is a numeric status code so bad? Yes, the rest of the protocol is > > English centric (MIME types; left-to-right, UTF-8). It just seems that > > using words (regardless of language) is just complexity for its own sake. > > > Why did people use `/etc/hosts` files before DNS was invented? Why do > we have `/etc/services`? Why do we have `O_READ`? Why do we have > `chmod +x`? True, but parsing the status code character by character is only one way of doing it. Another way is to just convert it to a number and compare that. When doing HTTP related things [1], I do have named constants like HTTP_OKAY and HTTP_NOTFOUND. > Because numbers are hard to remember, and say nothing to a person that > doesn't know the spec by heart. (For example although I do a lot of > HTTP related work with regard to routing and such, I never remember > which of the 4-5 HTTP redirect codes says "temporary redirect > but keep the same method" as opposed to "temporary redirect but switch > to `GET`".) But you have that anyway. I have HTTP_MOVETEMP (hmmm, why isn't it HTTP_REDIRECT_TEMPORARY? I have to think on that ... ) but even then, I have to know that causes clients to switch to GET and if I don't want that, I have to use HTTP_MOVETEMP_M (hmm ... I almost typed HTTP_MOVETMP_M ... something else to think about). So even with symbolic names there are issues. Perhaps it's me, but I don't mind looking up things if I don't recall them. I've been programming in C for 30 years now. I *still* have to look up the details to strftime() every single time I use it, but I recall that rand() returns a number between 0 and RAND_MAX (inclusive), yet I use strftime() way more often than I do rand(). > > > As minor issues: > > > * why `CRLF`?
it's easier (both in terms of availability of functions > > > and efficiency) to split lines by a single character `\n` than by a > > > string; > > > > That was discussed earlier on the list: > > > > gemini://gemi.dev/gemini-mailing-list/messages/000116.gmi > > OK, reading that email the answer seems to be "because other protocols > have it"... And even you admit that in your own code you also handle > just `LF`. > > So then why bother? Why not simplify the protocol? True, but there's the 800-pound gorilla to consider---Windows. On Windows, a call like: fgets(buffer,sizeof(buffer),stdin); will read the next line into the buffer, and automatically convert CRLF into just LF. That's because Windows uses CRLF to mark end of lines. It got that from MS-DOS, which got that from CP/M, which got that from RT-11, which got that from (I suspect) a literal interpretation of the ASCII spec from the mid-60s [2]. The RFCs written in the 70s describing the early work of the Internet also used a literal interpretation of ASCII. So there's a lot of protocols defined for the Internet that use CRLF. Could a switch be made to just LF? Sure. It's also about as likely as the Internet byte order being switched from big-endian to little-endian. > > Okay, we use NaCL. Now what? What's needed to secure the communication > > channel? A key exchange. Again, rule 1---never implement crypto. > > > Given that one has the public key of the server (more on that later), > one could use the following on client / server sides: > > https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes There's this wonderful talk by John Carmack: https://www.youtube.com/watch?v=dSCBCk4xVa0 which talks about ideas, and how what might seem a good idea isn't when it comes to an actual implementation. The linked page just talks about an API for signing and encrypting data. It says nothing about negotiating the cipher, key size, or anything remotely like a protocol.
I would ask that if you feel this strongly about it, *do it!* Implement a client and server that uses these alternative crypto systems and then we'll have something to talk about. When solderpunk first designed Gemini, I didn't agree with all his decisions (especially the status codes), but I was interested. I also wanted to play around with TLS since I had finished writing a Lua interface for libtls. So I wrote my own server, with what I felt the status codes should be. The thing was---*there was a working implementation* that was used to argue certain points. And through that, we got the compromise of the current status codes. You can argue for an idea. But an idea *and an implementation* is stronger than just the idea. I think that's why my Gemini server is so featureful---I went ahead and implemented my ideas to help argue for/against ideas, or even to just present *something* to talk about (when I have no opinion one way or the other). > My take on this: given a set of clear requirements for the > `gemini://` protocol (which I've seen there are) one can come up with > better solutions than TLS, ones that better fit the use-case. So do it. One of the goals for Gemini is ease of implementation (of both the server and the client), so this will go a long way to showing how easy it is to implement your ideas. > (Again, just to be clear, I'm not saying "lets invent our own crypto", > but instead "let's look at other tested" alternatives. As a > side-note, NaCL, on which `libsodium` is based, was created by `Daniel > J. Bernstein`...) Yes, I am aware of that. I even installed djb's version of NaCL and played around with it. It's nice, but a protocol it is not. > > One problem with that---incentives. What's my incentive to make all this > > information more easily machine readable? On the web, you do that, and what > > happens?
Google comes along, munches on all that sweet machine readable > > data and serves it up directly to users, meaning the user just has to go to > > Google for the information, not your server. Given those incentives, I have > > no reason to make my data easily machine readable when it means less > > traffic. > > The incentive is a clear one: for the end-user. Given that we can > standardize on such an "index", then we can create better > "user-agents" that are more useful to our actual users. (And I'm not > even touching on the persons that have various disabilities that > hamper their interaction with computers.) Okay, how does that incentivise me? It's easy enough to add machine readable annotations to HTML. Heck, there are plenty of semantic tags in HTML to help with machine readability. Yet why don't more people hand-code HTML? Why is Markdown, which, I will add, has no defined way of adding metadata except by including HTML, so popular? > For example say I'm exposing a API documentation via `gemini://`. How > do I handle the "all functions index page"? Do I create a large > `text/gemini` file, or a large HTML file? How does the user interact > with that? With search? Wouldn't he be better served by a searchable > interface which filters the options as he types, like `dmenu` / `rofi` > / `fzf` (or the countless other clones) do? (Currently each > programming language from Rust to Scheme tries to do something similar > with JavaScript and the result is horrible...) PHP (which I don't like personally) has incredible documentation, but the PHP developers put a lot of work into creating the system to enable that. It's not just "make machine readable documentation" and poof---it's done. I would say that's mostly tooling, not an emergent property of HTML. > Or, to take another approach, why do people use Google to search > things? 
Because our web pages are so poor when it comes to > structuring information that, more often than not, when I want to find > something on a site I just Google: `site:example.com the topic i'm > interested in`. Web search engines were not initially designed to find stuff on a given site, it was to find sites you didn't even know existed, period. The web quickly grew from "here's a list of all known web sites" to "there's no way for a single person to know what's out there." Since then Google has grown to be a better index of sites than sites themselves (although I think Google isn't quite as good as it used to be). Creating and maintaining a web site structure isn't easy, and it's all too easy to make a mistake that is hard to rectify, and I speak from experience since my website [3] is now 22 years old [4], and I have a bunch of redirects to rectify past organizational mistakes (and redirects were another aspect I had to argue to add to Gemini, by the way---the implementation helped). > I'm not advocating for RDF (it was quite convoluted) or semantic web, > or GraphQL, etc. I'm just advocating something better than the Gopher > map. Okay, create a format and post it. That's the best way to get this started. > > As a user, that's great! As a web site operator, not so much. > > OK... Now here is something I don't understand: aren't you building > Gemini sites for "users"? You are building it for "operators"? I'm building it primarily for me. Much like my website (and gophersite [5]) is mostly for my own benefit---if others like it, cool! But it's not solely for others. > Because if the operator is what you optimize for, then why not just > SSH into the operator's server where he provides you with his > "favourite" BBS clone. Those do exist, but that's not something I want to do. > > Hey, go ahead and implement that. I'd like to see that ... > > There is already FreeNet and IPFS that implement content-based > addressing.
I just wanted something in between that is still > "location" driven, but is "content identity" aware. Again, what's stopping you from just doing it? Waiting for consensus? Have you read the thread on text formatting? It's literally half the messages to this list. I do have to wonder how far along Gemini would be if I had not just gone ahead and implemented a server. -spc (In my opinion, working code trumps ideas ... ) [1] Like my blog engine, written in C: https://github.com/spc476/mod_blog [2] A close reading of the actual ASCII standard reveals two control codes, CR and LF. CR is defined as "returning the carriage head back to the start of a line" and LF is defined as "advancing to the next line, without changing the position of the carriage." So a literal reading of the spec says if you want to advance to the start of the next line, you send both a CR and LF. There is no control code defined by ASCII that means "return the carriage to the start of the line and advance to the next line." There *is* such a control character, NEL, but that's defined by the ISO, not ANSI (and it happens to be either character 133 or <ESC>E). Over time, some systems have adopted one or the other to mean "return carriage to start of line and advance to next line." Most 8-bit systems I've experienced used CR for that. Unix picked LF. A few (mostly DEC influenced, like CP/M) used both. The RFCs written in the 70s (when the Internet was first being developed) used a more literal interpretation of the ASCII standard and required both CRLF to mark the end of the line. There is also a similar issue with backspace. ASCII defines BS as "move the carriage to the previous character position; if at the start of the line, don't do anything." DEL is defined as "ignore this character." Neither one means "move back one space and erase the character". BS was intended to be used to create characters not defined by ASCII, like ä by issuing the sequence a<BS>" Over time, different systems have implemented the "move back one space and erase the character" by using either BS or DEL. [3] http://www.conman.org/ [4] At the current domain. It's a bit older than that, but it was under a different domain I didn't control, which is why my personal pages are under: http://www.conman.org/people/spc/ and not the top level. That move was painful enough as it was. [5] gopher://gopher.conman.org/
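The practical consequence of this CRLF history, for anyone writing a Gemini client or server, is the usual robustness rule: emit CRLF, accept either. A tolerant request-line reader is two lines (a sketch, not taken from any particular implementation):

```python
def read_request_line(raw: bytes) -> str:
    # Take everything before the first LF, then strip a trailing CR if
    # present -- this accepts CRLF (per spec) as well as a bare LF.
    line = raw.split(b"\n", 1)[0]
    return line.rstrip(b"\r").decode("utf-8")
```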
On Sat, Feb 29, 2020 at 1:42 AM Sean Conner <sean at conman.org> wrote: > Perhaps it's me, but I don't mind looking up things if I don't recall > them. I've been programming in C for 30 years now. I *still* have to look > up the details to strftime() every single time I use it, but I recall that > rand() returns a number between 0 and MAX_RAND (inclusive), yet I use > strftime() way more often than I do rand(). When one is developing code then yes, looking up things in the documentation is OK. However when one is reading code, looking in the documentation breaks your focus. > > OK, reading that email the answer seems to be "because other protocols > > have it"... And even you admit that in your own code you also handle > > just `LF`. [...] > > True, but there's the 800-pound gorilla to consider---Windows. On > Windows, a call like: > [...] > > So there's a lot of protocols defined for the Internet that use CRLF. > Could a switch be made to just LF? Sure. It's also about as likely as the > Internet byte order being switched from big-endian to little-endian. OK, I'll drop the CRLF thing, but I find it odd that the only argument to this is "because systems and protocols designed many years ago did this (i.e. CRLF)", and to that you add "but anyway, all these systems just ignore all that and behave internally like it wasn't so (i.e. convert CRLF into LF)"... As a minor note, I've seen C mentioned a lot of times, but please take into account that many projects aren't developed in C anymore, but instead in Python / Ruby / Go / Rust / other "newer" languages, that are much easier to develop in than C. Case in point, out of the 3 clients for Gemini, one is in Go, one in Rust and the other in Python... > > > Okay, we use NaCL. Now what? What's needed to secure the communication > > > channel? A key exchange. Again, rule 1---never implement crypto. 
> > > > > > Given that one has the public key of the server (more on that later), > > one could use the following on client / server sides: > > > > https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes > > The linked page just talks about an API for signing and encrypting data. > It says nothing about negotiating the cipher, key size, or anything remotely > like a protocol. (I have a hunch that you are not "acquainted" with NaCL / `libsodium`; the short story is this: the designers of NaCL (again including Daniel J. Bernstein) wanted to design and implement a secure, simple to use, high level cryptographic library, that makes all the choices for its users, so that ciphers, key sizes, padding, nonces, etc., aren't required to be handled by the user, and thus no mistakes would be made on this front.) In fact that link does say at the end under the section `Algorithm details` what happens behind the scenes:

~~~~
Sealed boxes leverage the crypto_box construction (X25519, XSalsa20-Poly1305).

The format of a sealed box is:

ephemeral_pk || box(m, recipient_pk, ephemeral_sk,
                    nonce=blake2b(ephemeral_pk || recipient_pk))
~~~~

> I would ask that if you feel this strongly about it, *do > it!* Implement a client and server that uses these alternative crypto > systems and then we'll have something to talk about. What is the chance you'll change your mind about TLS? 0.01%? Are you actually considering comparing TLS with another proposal without bias towards "legacy `gemini://` implementations currently using TLS"? > You can argue for an idea. But an idea *and an implementation* is > stronger than just the idea. I think that's why my Gemini server is so > featureful---I went ahead and implemented my ideas to help argue for/against > ideas, or even to just present *something* to talk about (when I have no > opinion one way or the other). Perhaps I'll throw a proof-of-concept in Python or Go.
(Although as said above, I think it won't change anything, as there is already a lot of "investment" in TLS...) > > > One problem with that---incentives. What's my incentive to make all this > > > information more easily machine readable? On the web, you do that, and what > > > happens? Google comes along, munches on all that sweet machine readable > > > data and serves it up directly to users, meaning the user just has to go to > > > Google for the information, not your server. Given those incentives, I have > > > no reason to make my data easily machine readable when it means less > > > traffic. > > > > The incentive is a clear one: for the end-user. Given that we can > > standardize on such an "index", then we can create better > > "user-agents" that are more useful to our actual users. (And I'm not > > even touching on the persons that have various disabilities that > > hamper their interaction with computers.) > > Okay, how does that incentivise me? I don't know what incentivizes one to publish content; some just want to push their ideas on the internet, others might want to help others through tutorials or documentation, others hope that by sharing they advertise themselves, and so on... However all of the above reasons (perhaps except the first one) do need to care about their users. > It's easy enough to add machine readable annotations to HTML. Heck, there > are plenty of semantic tags in HTML to help with machine readability. Yet > why don't more people hand-code HTML? Why is Markdown, which, I will add, > has no defined way of adding metadata except by including HTML, so popular? I don't know where the HTML micro-formats popped out in this discussion, as I advocated against this approach. :) > > I'm not advocating for RDF (it was quite convoluted) or semantic web, > > or GraphQL, etc. I'm just advocating something better than the Gopher > > map. > > Okay, create a format and post it. That's the best way to get this > started.
OK, I'll try to take a stab at that. (Although like in the case of TLS, I think there is already too much "investment" in the current way things are done.) > > > Hey, go ahead and implement that. I'd like to see that ... > > > > There is already FreeNet and IPFS that implement content-based > > addressing. I just wanted something in between that is still > > "location" driven, but is "content identity" aware. > > Again, what's stopping you from just doing it? Waiting for consensus? Yes, a little bit of consensus won't hurt anybody... Else we end up with TLS transient client certificates that act like cookies and which require about 2 or 3 separate status codes to signal their management... :) Ciprian.
It was thus said that the Great Ciprian Dorin Craciun once stated: > On Sat, Feb 29, 2020 at 1:42 AM Sean Conner <sean at conman.org> wrote: > > True, but there's the 800-pound gorilla to consider---Windows. On > > Windows, a call like: > > [...] > > > > So there's a lot of protocols defined for the Internet that use CRLF. > > Could a switch be made to just LF? Sure. It's also about as likely as the > > Internet byte order being switched from big-endian to little-endian. > > > OK, I'll drop the CRLF thing, but I find it odd that the only argument > to this is "because systems and protocols designed many years ago did > this (i.e. CRLF)", and to that you add "but anyway, all these systems > just ignore all that and behave internally like it wasn't so (i.e. > convert CRLF into LF)"... I have support to check for both CRLF and LF because I do quite a bit of work with existing Internet protocols (which define the use of CRLF) and do extensive testing with Unix (which defines only using LF) and it makes my life easier to support both [1]. Besides, I think you are underestimating the extent of Windows development out there, and I think (I can't prove) that it's easier for a programmer under Unix to add the '\r' than it would be for a Windows programmer to force Windows *not* to add the '\r'. > > The linked page just talks about an API for signing and encrypting data. > > It says nothing about negotiating the cipher, key size, or anything remotely > > like a protocol. > > (I have a hunch that you are not "acquainted" with NaCL / `libsodium`; No, I'm aware of both NaCL (which as I stated before, I have installed on my home system) and libsodium (which I haven't installed, having NaCL already installed). > the short story is this: the designers of NaCL (again including > Daniel J.
Bernstein) wanted to design and implement a secure, simple > to use, high level cryptographic library, that makes all the choices > for its users, so that ciphers, key sizes, padding, nonces, etc., > aren't required to be handled by the user, and thus no mistakes would > be made on this front.) Yes, and I just found the Lua module I wrote for NaCL (*not* libsodium) back in 2013 when I was last playing around with it. > In fact that link does say at the end under the section `Algorithm > details` what happens behind the scenes: > > ~~~~ > Sealed boxes leverage the crypto_box construction (X25519, XSalsa20-Poly1305). > > The format of a sealed box is: > > ephemeral_pk || box(m, recipient_pk, ephemeral_sk, > nonce=blake2b(ephemeral_pk || recipient_pk)) > ~~~~ I was going by what I recalled of NaCL, written by the highly esteemed Dr. Daniel J. Bernstein, of having to make *some* choices in what underlying function to use for encryption and being a bit concerned that the entirety of NaCL had to be included in the Lua module due to linking issues [4]. > > I would ask that if you feel this strongly about it, *do > > it!* Implement a client and server that uses these alternative crypto > > systems and then we'll have something to talk about. > > What is the chance you'll change your mind about TLS? 0.01%? Right now? Sounds about right. If you provide some "proof-of-concept" that can be looked at? It goes up. > Are you > actually considering to compare TLS vs another proposal without bias > towards "legacy `gemini://` implementations currently using TLS"? What I'm considering is "Hey! We should implement my great idea! And by 'we' I mean, someone else!" vibe I get when arguments like this pop up [5]. > > You can argue for an idea. But an idea *and an implementation* is > > stronger than just the idea. 
I think that's why my Gemini server is so > > featureful---I went ahead and implemented my ideas to help argue for/against > > ideas, or even to just present *something* to talk about (when I have no > > opinion one way or the other). > Perhaps I'll throw a proof-of-concept in Python or Go. (Although as > said above, I think it won't change anything, as there is already a > lot of "investment" in TLS...) So let me show you how much investment it took me to use TLS for my Gemini server:

    local tls = require "org.conman.nfl.tls"

    local function main(ios) -- main routine to handle a request
      local request = ios:read("*l") -- mimics the Lua file IO API
      -- rest of code
    end

    local okay,err = tls.listen(CONF.network.addr,CONF.network.port,main,function(conf)
      conf:verify_client_optional()
      conf:insecure_no_verify_cert()
      return conf:cert_file(CONF.certificate.cert)
         and conf:key_file (CONF.certificate.key)
         and conf:protocols("all")
    end)

That's it. [6] Granted, I think I had an easier time of it than some others because of the library I picked (libtls, which makes using TLS very easy). If the other non-TLS options are this easy then you might have a case. As solderpunk said, there are many, many, libraries and modules available for the most popular languages for TLS. And "ease of implementation" was one of the goals of Gemini. If these alternatives to TLS are just as easy to use, then a proof-of-concept should show that, right? And for an indication of how easy it is for me to use TLS, a hypothetical TCP-only version of Gemini would look very similar:

    local tcp = require "org.conman.nfl.tcp"

    local function main(ios)
      local request = ios:read("*l")
      -- rest of code
    end

    local okay,err = tcp.listen(CONF.network.addr,CONF.network.port,main)

No other changes (except to remove the code to check for user certificates) would be required. That's how easy it should be. > > It's easy enough to add machine readable annotations to HTML.
Heck, there > > are plenty of semantic tags in HTML to help with machine readability. Yet > > why don't more people hand-code HTML? Why is Markdown, which, I will add, > > has no defined way of adding metadata except by including HTML, so popular? > > I don't know where the HTML micro-formats popped out in this > discussion, as I advocated against this approach. :) Machine readable formats, or at least, machine readable bits. > > > I'm not advocating for RDF (it was quite convoluted) or semantic web, > > > or GraphQL, etc. I'm just advocating something better than the Gopher > > > map. > > > > Okay, create a format and post it. That's the best way to get this > > started. > > OK, I'll try to take a stab at that. (Although like in the case of > TLS, I think there is already too much "investment" in the current way > things are done.) Dude, have you *read* the thread about text formatting? Literally half the messages to this list have been about that, and we're *still* talking about it. > > Again, what's stopping you from just doing it? Waiting for consensus? > > Yes, a little bit of consensus won't hurt anybody... Else we end up > with TLS transient client certificates that act like cookies and which > require about 2 or 3 separate status codes to signal their > management... :) Touché. -spc [1] Okay, I have code that parses SIP messages [2]. As defined by many (many, many) RFCs, the transport over IP requires handling of CRLF. But to test the parser, it's easier to support just LF, since the testing I do is all under Unix [3]. I also have code that deals with email messages, again, which are defined with CRLF, but on Unix, usually end with just LF. [2] At work. At home, I don't have to deal with the horrors of SIP. [3] No Windows at all at home, or at work. [4] I can go over this in detail if you wish, but I'd rather not as it gets rather deep rather quickly. [5] It happens quite regularly on the Lua mailing list I'm on.
So much so that I outright ignore several people on that list. [6] Okay, it took a bit to write the Lua module around libtls (from LibreSSL), and some work to adapt it to my socket framework, but now that that is done, other people can leverage that work.
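[For comparison, here is roughly the same "how much investment" exercise using Python's standard library, since a Python proof-of-concept is floated in this exchange. This is an illustrative sketch, not anyone's actual server: the file paths are placeholders, and the commented-out socket wrapping is left as an outline.]

```python
import ssl

def make_context(cert_file=None, key_file=None):
    # Roughly the Python equivalent of the libtls configuration quoted
    # above: present our certificate, and accept client certificates
    # without verifying them against a CA (the trust-on-first-use
    # model the thread keeps coming back to).
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if cert_file and key_file:  # placeholder paths, e.g. "cert.pem"
        ctx.load_cert_chain(cert_file, key_file)
    ctx.verify_mode = ssl.CERT_NONE
    return ctx

# Wrapping a listening socket is then a single call:
#   tls_sock = make_context("cert.pem", "key.pem").wrap_socket(
#       plain_sock, server_side=True)
```

The point being argued stands either way: if an alternative to TLS can be set up in comparably few lines, a proof-of-concept would be the way to show it.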
On 2/28/2020 2:04 AM, Ciprian Dorin Craciun wrote: > On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote: >> Why is a numeric status code so bad? Yes, the rest of the protocol is >> English centric (MIME types; left-to-right, UTF-8). It just seems that >> using words (regardless of language) is just complexity for its own sake. > > > Why did people use `/etc/hosts` files before DNS was invented? Why do > we have `/etc/services`? Why do we have `O_READ`? Why do we have > `chmod +x`? > > Because numbers are hard to remember, and say nothing to a person that > doesn't know the spec by heart. (For example although I do a lot of > HTTP related work with regard to routing and such, I always don't > remember which of the 4-5 HTTP redirect codes says "temporary redirect > but keep the same method" as "opposed to temporary redirect but switch > to `GET`".) > Well, section 1.3.2 of the Gemini spec-spec says two digit codes, but single (first digit) is all that is required. So, a 2, a 20, and a 21 are all success and there's no ambiguity as to anything occurring at the first digit level, it's just more gravy with the second digit. I fail to see the value in what appears to me to be a whole lot of work to implement what you suggest, especially considering that most servers will invariably choose to implement their own custom handlers for status/error codes, much like one does in Apache, so the server operators themselves get to choose what content to deliver as a result of a 404. So there would be added framework for human readable, non-numeric status codes (I would rather read the numerical codes in my logfiles), and then as Gemini matures and stabilizes, devs will build frameworks so the server operators can and will develop custom pages for the status codes anyway. This seems, at best, somewhat redundant to me (ultimately).
A 5 (or 50) might not provide as complete a picture as one would like, yet it's optional to serve the full two-digit code, and still unambiguous with respect to what's going on at the baseline - a permanent failure. A 51, though, perhaps the most common user-facing state where errors are encountered, will certainly eventually be accommodated by some clever little remark intended to amuse the user who just asked for something that isn't there. This reinforces my suggestion that the server operators are going to want the devs to enable them to deliver cute little messages during such fashion faux pas. That's just kinda what I was pondering while reading the exchange. -- Bradley D. Thornton Manager Network Services http://NorthTech.US TEL: +1.310.421.8268
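[The "first digit is all a client needs" argument is easy to show in code. A minimal sketch, assuming the v0.10 status classes; the class descriptions are paraphrased, and `classify` is a hypothetical helper, not anything from an actual client:]

```python
# First-digit status classes per the Gemini spec (v0.10).
STATUS_CLASSES = {
    "1": "input expected",
    "2": "success",
    "3": "redirect",
    "4": "temporary failure",
    "5": "permanent failure",
    "6": "client certificate required",
}

def classify(code: str) -> str:
    # A client that only understands the first digit still behaves
    # correctly; the second digit merely refines the class
    # ("more gravy", as put above).
    return STATUS_CLASSES.get(code[:1], "invalid response")
```

So 20 and 21 both classify as "success", and 51 classifies as "permanent failure", with no ambiguity at the first-digit level.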
On 3/1/2020 12:22 AM, Bradley D. Thornton wrote: > > > > Well, section 1.3.2 of the Gemini spec-spec says two digit codes, but > single (first digit) is all that is required. So, a 2, a 20, and a 21 > are all success and there's no ambiguity as to anything occurring at the > first digit level, it's just more gravy with the second digit. Errata: There's no '2', the first character is followed by a zero on the most basic implementations. My bad. But we still don't have a Gemini status code analogous to that of the HTTP 418 - and IMNSHO, we should :P > -- Bradley D. Thornton Manager Network Services http://NorthTech.US TEL: +1.310.421.8268
On Sun, Mar 1, 2020 at 10:22 AM Bradley D. Thornton <Bradley at northtech.us> wrote: > On 2/28/2020 2:04 AM, Ciprian Dorin Craciun wrote: > > On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote: > >> Why is a numeric status code so bad? Yes, the rest of the protocol is > >> English centric (MIME types; left-to-right, UTF-8). It just seems that > >> using words (regardless of language) is just complexity for its own sake. > > > > Because numbers are hard to remember, and say nothing to a person that > > doesn't know the spec by heart. (For example although I do a lot of > > HTTP related work with regard to routing and such, I always don't > > remember which of the 4-5 HTTP redirect codes says "temporary redirect > > but keep the same method" as "opposed to temporary redirect but switch > > to `GET`".) > > Well, section 1.3.2 of the Gemini spec-spec says two digit codes, but > single (first digit) is all that is required. So, a 2, a 20, and a 21 > are all success and there's no ambiguity as to anything occurring at the > first digit level, it's just more gravy with the second digit. Although I didn't state this before, having two digits, which in fact are interpreted as a two level decision tree (the first digit denoting generic class of conditions, and the second digit denoting more fine-grained reasons), has at least the following problems:
It was thus said that the Great Ciprian Dorin Craciun once stated: > > * either it's not enough, given that we've already used 50% of the > "generic class of conditions" (i.e. first digits 1 through 6); soon > enough, as the protocol progresses and matures, we'll identify new > classes of conditions, and we'll have to start to either introduce a > "miscellaneous" category, or use values from other categories, > breaking thus the clear hierarchy; Before the mailing list, solderpunk and I went back and forth over the status codes. solderpunk was intent on using single digit codes, whereas I was pushing for three digit codes. As I wrote to him back in July of 2019: > With a two digit scheme, you have at most 100 error codes (00 through 99), > with very clearly delineated classes that makes it easy for a client to act > upon just the first digit. > > With your one character scheme (and yes, I'm calling it "one character" > and not "one hexadecimal") the grouping is less clear, and it can *still* be > extended out to a total of 94 codes (if you use other characters). Also, > what should happen when a client receives a code of 'a' through 'f'? Is it > only upper case allowed? Or the lower case version as well? Because in > hexadecimal, 0xF and 0xf are the same value. > > What are you really afraid of? Expansion? Gopher gives you two > results---the content or nothing. A bit brutal, but simple. You can cut > this down to just four cases: > > success > transitory error > permanent error > need authorization > > but that's still a bit brutal. Just because you have 100, or 400, error > codes doesn't mean they all will get used. I'm sure the top three codes > seen in the wild for HTTP are: > > 200 Okay > 304 Not modified > 404 Not found > > with the following codes just enough to show up: > > 302 Move temp > 301 Move perm > 403 Not authorized > 500 Internal error > > and the rest are rounding errors.
I can't seem to find any evidence to back > this up, but that's my gut feeling. I think a single character response > code is just too limiting and yet, ripe for more abuse than with a two-digit > numeric range. Also, the use of three digit status codes goes back a long time. In fact, it was first proposed in RFC-360, dated June 24, 1972! [1] And guess what? It was almost a one-to-one mapping of current HTTP status codes. 2xx were okay, 3xx were different, but I could see the mapping, 4xx were client errors and 5xx were server errors. There were also 1xx, but HTTP/1.1 defined 1xx status as well. And if anything, the fact that no new status classifications have come up in 48 years says that your fears of new categories might not be warranted. > * some conditions don't fall particularly well into clear categories; > for example `21 success with end of client certificate session` has to > do with TLS transient certificates management (which is `6x`); Fair enough, but solderpunk would have to fix that one. > in > fact this shouldn't even be a status code, but a "signal", because for > example a redirect or other failure could as well require the end of > client certificate session; Again, fair enough. I'm sure some of this is speculative anyway, since I don't think any servers have actually implemented this feature (I know I haven't). > * another example of "unclear" status codes are `42 CGI error` and `43 > proxy error`, which are part of the `4x temporary failure` group, but > might be in fact (especially in the case of 43, although granted we > have 53) permanent errors; (even `51 not found` can be seen as a > temporary error, because perhaps the resource will exist tomorrow;) Yes, solderpunk changed from client errors/server errors to temporary/permanent errors. I didn't fight it that much since I can see the logic in it.
> * and speaking of proxies, we have `43 temporary proxy error` and `53 > proxy request refused`, but we have no other proxy related statuses > like for example `6y` that states `proxy requires authentication`, > etc.; I can see the argument for an "AUTHORIZATION FOR PROXY" error, but by the same token, what type of certificate? (and even there, I think having three different types of certificates is certainly a bit of confusion). This may require some clarification from solderpunk in the meantime. > So, if we really want to keep things simple why not change this into: > > * (we only use one digit to denote success or failure); > * `0` (i.e. like in UNIX) means success, here is your document; > * `1` (i.e. again like in UNIX) means "undefined failure", the client > MUST display the meta field to the user as plain text; (please note > that this "soft"-forbids the client and server to implement any clever > "extensions";) I still like numeric values as they are language agnostic. I mean, what if I get back: Bhí teip ann an clár a chur i bhfeidhm Would you even know what language to translate from? Yes, most likely this would be English, but I am ornery enough to follow the letter of the law if not the spirit. > * `2` not found / gone; (i.e. the server is working fine, but what > you are searching for does not exist at the moment; perhaps it > existed in the past, perhaps later it will exist;) There is a distinction between "gone" and "not found". "Gone" means "it was once here, but has since been removed, please stop referencing this resource" (i.e. "remove it from your bookmarks file"), while "not found" means just that---it's not here. I mentioned to solderpunk that I wish gopher had a "gone" message (along with redirect, which I'll get to below), since there is a good reason to mark something as "gone" and not just "not found".
> * `3` redirect; neither temporary nor permanent; (because in fact > there isn't a clear definition and usage of temporary vs permanent;) I think there is: * permanent---this resource has permanently moved, and any future reference should use the new location (i.e. update your index or bookmark file!) * temporary---this reference is still a valid reference, but the actual content is, for whatever reason, located there. A valid reason for a temporary redirect might be to redirect users to the most current resource available, say, a specification. A base link like: gemini://gemini.example.com/foobar-spec could in fact do a temporary redirect to gemini://gemini.example.com/foobar-spec.1.3.2 One can always link directly to a specific version, but the current will
On Mon, Mar 2, 2020 at 3:39 AM Sean Conner <sean at conman.org> wrote: > > So, if we really want to keep things simple why not change this into: > > > > * (we only use one digit to denote success or failure); > > * `0` (i.e. like in UNIX) means success, here is your document; > > * `1` (i.e. again like in UNIX) means "undefined failure", the client > > MUST display the meta field to the user as plain text; (please note > > that this "soft"-forbids the client and server to implement any clever > > "extensions";) > > I still like numeric values as they are language agnostic. I mean, what > if I get back: > > Bhí teip ann an clár a chur i bhfeidhm > > Would you even know what language to translate from? If a client received such a status it would report to the user: "server error: invalid protocol". Why? Because I'm not advocating for "any sequence of ASCII characters", but a predefined (thus limited and known) list of tokens that are accepted. More clearly, there are two separate issues:
"Aaron Janse" <aaron at ajanse.me> writes: > Please no. PGP is a bit of a mess already. It's tough to > install/maintain (because it has a daemon), and it's really easy to > mess up. I think using something like NaCl could be much more > difficult to mess up than automated PGP. Yeah, in 2020, PGP is an elaborate foot-gun. The suggestion of Noise protocol is actually more interesting, as it's smaller and more future-proof than TLS. But while there are implementations in several languages, it's not as ubiquitous as TLS. (For instance, I had no trouble writing a Gemini server in Common Lisp, but I'd have to write my own bindings of the C implementation to use Noise protocol.) I'd have to read a lot more about it to know what its advantages are. -- Jason McBrayer | ?Strange is the night where black stars rise, jmcbray at carcosa.net | and strange moons circle through the skies, | but stranger still is lost Carcosa.? | ? Robert W. Chambers,The King in Yellow