An outsider's view of the `gemini://` protocol

1. Ciprian Dorin Craciun (ciprian.craciun (a) gmail.com)

Hello all!

[Disclaimer:  I'm not an active `gopher://` user, although long ago I
did implement my own Gopher server in Erlang and another one in Go;
however I do keep an eye on the Gopher mailing list, mostly because
I'm nostalgic for a "simpler" web...]

Today I've stumbled upon the `gemini://` protocol specification
(v0.10) and FAQ, and after reading them both, I thought that perhaps
an "outsider's" point of view could be useful.




First of all I get it that `gemini://` wants to "sit" in between
`gopher://` and `http://`;  however it seems to me that it more
closely resembles HTTP/0.9
(https://www.w3.org/Protocols/HTTP/AsImplemented.html);  i.e. it adds
only the virtual host and response MIME type capability on top of
HTTP/0.9 or Gopher (plus TLS, but that's transport related).
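
(Compare the wire formats:  an HTTP/0.9 request names only a path,
while a Gemini request is the full URL, which is exactly what buys the
virtual hosting;  `example.org` is of course just an illustration:)

	GET /document.txt<CRLF>                        (HTTP/0.9)
	gemini://example.org/document.txt<CRLF>        (Gemini)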

Although I do agree that HTTP/1.1 semantics (a large part of which is
nowadays included in HTTP/2 and HTTP/3) have become extremely complex
(from chunked encoding, to caching, to server side push via `Link`
headers, etc.), there are some features that I think are useful,
especially given some of the stated goals of `gemini://` (like for
example slow links, etc.):


* caching -- given that most content is going to be static, caching
should be quite useful;  however it doesn't seem to have been present
as a concern in either the spec, the FAQ, or the mailing list archive;
I'm not advocating for the whole set of HTTP caching headers, but
perhaps for a simple SHA of the body so that clients can just skip
downloading it (although this would imply a more elaborate protocol,
having a "headers" and separate "body" phase);  (see the sketch after
this list;)


* compression -- needless to say that `text/*` MIME types compress
very well, thus saving both bandwidth and caching storage;  (granted
one can use compression on the TLS side, although I think that one was
dropped due to security issues?);


* `Content-Length` -- I've seen this mentioned in the FAQ or the
mailing lists;  I think the days of "unreliable" protocols have
passed;  (i.e. we had better make sure that the intended document was
properly delivered, in its entirety and unaltered;)


* status codes -- although both Gemini and HTTP use numeric status
codes, I do believe that these are an artifact of ancient times, and
we could just replace them with proper symbols (perhaps hierarchical
in nature, like `redirect:temporary` or `failure:temporary:slow-down`);


* keep-alive -- although in Gopher and Gemini the served documents
seem to be self-contained, and usually connections will be idle while
the user is pondering what to read, in the case of crawlers having to
re-establish a new connection each time (especially a TLS one) would
eat a lot of resources and incur significant delays;  (not to mention
that repeated TCP connection establishment to the same port or target
IP might be misinterpreted as an attack by various security appliances
or cloud providers;)
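
To make the caching idea from the first point above concrete, here is
a minimal client-side sketch (using OpenSSL's `SHA256()`;  the
server-announced body hash is of course my proposal, not anything in
the current spec):

	#include <string.h>
	#include <openssl/sha.h>

	/* Sketch: the client keeps the cached body and skips
	   re-downloading when the hash announced by the server (a
	   hypothetical protocol extension) matches the cached copy. */
	static int cache_is_fresh(const unsigned char *cached_body, size_t len,
	                          const unsigned char announced[SHA256_DIGEST_LENGTH])
	{
		unsigned char digest[SHA256_DIGEST_LENGTH];
		SHA256(cached_body, len, digest);
		return memcmp(digest, announced, SHA256_DIGEST_LENGTH) == 0;
	}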




Now on the transport side, somewhat related to the previous point, I
think TLS transient certificates are overkill...  If one wants to
implement "sessions", one could introduce
"client-side-generated-cookies" which are functionally equivalent to
these transient certificates.  Instead of creating a transient
certificate, the client generates a unique token and sends that to the
server instead.  The server has no more control over the value of that
cookie than it does over a transient certificate.

Moreover the way sessions are signaled between the server and client,
piggy-backed on top of status codes, seems rather an afterthought than
part of an orthogonal design.  Perhaps these sessions should be "moved"
to a higher level (i.e. after transport and before the actual
transaction, just like in the case of the OSI stack).

Also these transient certificates are sold as "privacy enablers" or
"tracking preventers", which is far from the truth.  The server (based
on IP, ASN or other information) can easily map various transient
certificates as "possibly" belonging to the same person.  Thus just by
allowing these one opens up the possibility of tracking (even if only
for a given session).  Moreover, securely generating these transient
certificates does require some CPU power.




On a second thought, why TLS?  Why not something based on NaCL /
`libsodium` constructs, or even the "Noise Protocol"
(http://www.noiseprotocol.org/)?  For example I've tried to build the
Asuka Rust-based client and it pulled ~104 dependencies and took a few
minutes to compile;  this doesn't seem too lightweight...  Granted a
lot of those dependencies might have come from other direct
dependencies, and in general Rust takes a long time to compile, but it
does give a hint...

Why not just re-use PGP to sign / encrypt requests and replies?  With
regard to PGP, given that Gopher communities tend to be quite small,
and composed of mostly "techie" people, this goes hand-in-hand with
the "web-of-trust" that is enabled by PGP and can provide something
that TLS can't at this moment: actual "attribution" of servers to
human beings and trust delegation;  for example for a server one could
generate a pair of keys and other people could sign those keys as a
way to denote their "trust" in that server (and thus the hosted
content).  Why not take this a step further and allow each document
served to be signed, thus extending this "attribution" not only to the
servers, but to the actual contents?  This way a server could provide
a mirror / cached version of a certain document, while still proving
it is the original one.

In fact with such a PGP approach one would no longer authenticate the
server, but instead the actual document one receives;  thus the
server becomes a simple "conduit" through which the user downloads the
content, enabling one to proxy or mirror other servers and still keep
intact the cryptographic "proof of origin".




Now getting back to the `gemini://` protocol, another odd thing I
found is the "query" feature.  Gemini explicitly supports only `GET`
requests, and the `text/gemini` format doesn't support forms, yet it
still tries to implement a "single input-box form"...  Granted it's a
nice hack, but it's not "elegant"...  (Again, like in the case of
sessions, it seems more as an afterthought, even though this is the
way Gopher does it...)

Perhaps a simple "form" solution would be better?  Perhaps completely
eliminating these "queries" for the time being?  Or perhaps
introducing a new form of URL, like for example:
`gemini-query:?url=gemini://server/path&prompt=Please+enter+something`
which can be served either in-line (as was possible in Gopher) and /
or served as a redirect (thus eliminating another status code family).
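
(For example, served as a redirect this would ride on the existing
mechanism;  `30` below is the current temporary-redirect status, and
the URL is just an illustration:)

	30 gemini-query:?url=gemini://server/search&prompt=Please+enter+something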




Regarding the `text/gemini` format -- taking into account various
emails in the archive about reflowing, etc. -- I wonder whether it is
actually needed.  Why can't CommonMark be adopted as the HTML
equivalent, and a more up-to-date Gopher map variant as an alternative
for menus?  There are already countless safe CommonMark parsers
out there (for example in Rust there is one implemented by Google) and
the format is well understood and accepted by a large community
(especially the static site generators community).

Regarding an up-to-date Gopher map alternative, I think this is an
important piece of the Gopher ecosystem that is missing from today's
world:  a machine-parsable standard format for indexing documents.  I
very fondly remember "directory" sites of yesteryear (like DMOZ or the
countless other clones) that strove to categorize the internet not by
"machine learning" but by human curation.




In fact (and here I stop speaking about Gemini as it is right now, but
instead I try to summarize what I believe a proper alternative for the
"web" would be) if one puts together:

* and perhaps add support for content-based addressing (as opposed to
server-based addressing) (i.e. persistent URL's);

, we get closer to the initial "spirit" of both the "web" (i.e. the
90's era WWW) and Gopher, namely:

* documents that link to one-another;

* stable documents, with perhaps revisions;

* a clear (perhaps hierarchical) structure;

(Perhaps the closest to this ideal would be a Wikipedia style web...)




All in all I find the `gemini://` project quite interesting, and I'll
keep a close eye on it.  I'm also glad to see that the Gopher world
hasn't yet died, but instead spawned a modern alternative.

Also, although all of my above comments are somewhat in a negative
tone, please take them in a constructive manner, and please note that
I do appreciate other aspects of the Gemini proposal (from the
simplification of the protocol and allowing, as a first class citizen,
the proxying of other kinds of URL's, to the fact that `text/gemini`
mandates that the client is free to wrap the text as it sees fit).

Good work guys, and I hope you'll find this useful,
Ciprian.


2. Sean Conner (sean (a) conman.org)

It was thus said that the Great Ciprian Dorin Craciun once stated:
> Hello all!

  Hello.

  [ snip ]

> * caching -- given that most content is going to be static, caching
> should be quite useful;  however it doesn't seem to have been present
> as a concern neither in the spec, FAQ or the mailing list archive;
> I'm not advocating for the whole HTTP caching headers, but perhaps for
> a simple SHA of the body so that clients can just skip downloading it
> (although this would imply a more elaborate protocol, having a
> "headers" and separate "body" phase);

  I don't think solderpunk (creator of this protocol) expects Gemini to be a
replacement for HTTP---for him, it's more of a way to cut down on the bloat
that has become the web.  In fact, everything in Gemini could be done
with HTTP.  With that said, I have made oblique references to adding
something (a timestamp) to cut down on unneeded requests.  It hasn't been
taken up.

> * `Content-Length` -- I've seen this mentioned in the FAQ or the
> mailing lists;  I think the days of "unreliable" protocols has passed;
>  (i.e. we should better make sure that the intended document was
> properly delivered, in its entirety and unaltered;)

  I did bring this up early in the design, but it was rejected outright. 
This has since been brought up due to one Gemini site serving very large
files.  There has been some talk, but nothing has yet come from it.

> * status codes -- although both Gemini and HTTP use numeric status
> codes, I do believe that these are an artifact of ancient times, and
> we could just replace them with proper symbols (perhaps hierarchical
> in nature like `redirect:temporary` or `failure:temporary:slow-down`;

  I disagree.  Using "proper symbols" is overall harder to deal with. 
First, it tends to be English-centric.  I mean, we could go with:

	defectum:tempus:tardius

or how about

	teip:sealadach:níos-moille

  First off, the code has to be parsed, and while this is easy in languages
like Python or Perl, you run into ... issues, with Rust, C++ or Go (not to
mention the complete mess that is C).  A number is easy to parse, easy to
check, and its meaning can be translated into another language.  The Gemini
status codes (as well as HTTP and other three-digit status codes) don't even
have to be converted into a number---you can easily do a two level check:

	if (status[0] == '2')
		/* happy path */
	else if (status[0] == '3')
		/* redirection path */
	else if (status[0] == '4')
		/* temporary failure */
	else if (status[0] == '5')
		/* permanent failure */
	else if (status[0] == '6')
	{
		/* authorization needed */
		if (status[1] == '1')
			/* client cert required */
		else if (status[1] == '3')
			/* rejected! */
	}

  There was a long, drawn-out discussion between solderpunk and me about
status codes.  The compromise was the two digit codes currently in use.

> * keep-alive -- although in Gopher and Gemini the served documents
> seem to be self-contained, and usually connections will be idle while
> the user is pondering what to read, in case of crawlers having to
> re-establish each time a new connection (especially a TLS one) would
> eat a lot of resources and incur significant delays;  (not to mention
> that repeated TCP connection establishment to the same port or target
> IP might be misinterpreted as an attack by various security appliances
> or cloud providers;)

  I would think that would be a plus for this crowd, as it's less likely for
Gemini to be quickly exploited.

> Now on the transport side, somewhat related to the previous point, I
> think TLS transient certificates are an overkill...  If one wants to
> implement "sessions", one could introduce

  This is the fault of both myself and solderpunk.  When I implemented the
first Gemini server (yes, even before solderpunk, who created the protocol) I
included support for client certificates as a means of authentication of the
client.  My intent (besides playing around with that technology) was to have
fine grained control over server requests without the user having a
password, and to that end, I have two areas on my Gemini server that require
client certificates:

	gemini://gemini.conman.org/private/

		This area will accept *any* client certificate, making it
		easy for clients to test that they do, in fact, serve up a
		client certificate.

	gemini://gemini.conman.org/conman-labs-private/

		This area requires certificates signed by my local
		certificate authority (i.e. *I* give you the cert to use). 
		This was my actual intent.

It wasn't my intent to introduce a "cookie" like feature. solderpunk
interpreted this as a "cookie" like feature and called it "transient
certificates".  I still view this feature as "client certificates" myself. 
I personally think the use of the term "transient certificates" is confusing.

> On a second thought, why TLS?  Why not something based on NaCL /
> `libsodium` constructs, or even the "Noise Protocol"
> (http://www.noiseprotocol.org/)? 

	1) Never, *NEVER* implement crypto yourself.

	2) OpenSSL exists and has support in most (if not all) popular
	languages.

	3) I never even heard of the Noise Protocol.

> For example I've tried to build the
> Asuka Rust-based client and it pulled ~104 dependencies and took a few
> minutes to compile, this doesn't seem too lightweight...  

  So wait?  You tried to use something other than OpenSSL and it had too many
dependencies and took too long to compile?  Or did you mean to say that
the existing Rust-based client using OpenSSL had too many dependencies?  I
think you mean the latter, but it could be read as the former.

> Why not just re-use PGP to sign / encrypt requests and replies?  With
> regard to PGP, 

  There are issues with using PGP:

	https://latacora.micro.blog/2019/07/16/the-pgp-problem.html

> given that Gopher communities tend to be quite small,
> and composed of mostly "techie" people, this goes hand-in-hand with
> the "web-of-trust" that is enabled by PGP and can provide something
> that TLS can't at this moment: actual "attribution" of servers to
> human beings and trust delegation;  for example for a server one could
> generate a pair of keys and other people could sign those keys as a
> way to denote their "trust" in that server (and thus the hosted
> content).  Why not take this a step further and allow each document
> served to be signed, thus extending this "attribution" not only to the
> servers, but to the actual contents.  This way a server could provide
> a mirror / cached version of a certain document, while still proving
> it is the original one.

  The hardest problem with crypto is key management.  If anything, key
management with PGP seems more problematic than with OpenSSL and the CA
infrastructure (as bad as the CA infrastructure is).

> Now getting back to the `gemini://` protocol, another odd thing I
> found is the "query" feature.  Gemini explicitly supports only `GET`
> requests, and the `text/gemini` format doesn't support forms, yet it
> still tries to implement a "single input-box form"...  Granted it's a
> nice hack, but it's not "elegant"...  (Again, like in the case of
> sessions, it seems more as an afterthought, even though this is the
> way Gopher does it...)
> 
> Perhaps a simple "form" solution would be better?  Perhaps completely
> eliminating for the time these "queries"?  Or perhaps introducing a
> new form of URL's like for example:
> `gemini-query:?url=gemini://server/path&prompt=Please+enter+something`
> which can be served either in-line (as was possible in Gopher) and /
> or served as a redirect (thus eliminating another status code family).

  Forms lead to applications.  Applications lead to client side scripting. 
Client side scripting leads to the web ... 

  Of course there's pressure to expand the protocol.  solderpunk is trying
his hardest to keep that from happening and turning Gemini into another web
clone.

> Regarding the `text/gemini` format -- and taking into account various
> emails in the archive about reflowing, etc -- makes me wonder if it is
> actually needed.  Why can't CommonMark be adopted as the HTML
> equivalent, and a more up-to-date Gopher map variant as an alternative
> for menus?  There are already countless safe CommonMark parsers
> out-there (for example in Rust there is one implemented by Google) and
> the format is well understood and accepted by a large community
> (especially the static side generators community).

  It can.  RFC-7763 defines the media type text/markdown and RFC-7764 defines
known variations that can be specified.  Could be done right now without any
changes to Gemini.  Go for it.

> Regarding an up-to-date Gopher map alternative, I think this is an
> important piece of the Gopher ecosystem that is missing from today's
> world:  a machine-parsable standard format of indexing documents.  I
> very fondly remember "directory" sites of yesteryear (like DMOZ or the
> countless other clones) that strives to categorize the internet not by
> "machine learning" but by human curation.

  Could you provide an example of what you mean by this?  I'm not sure why a
map alternative is needed.

> * and perhaps add support for content-based addressing (as opposed to
> server-based addressing) (i.e. persistent URL's);

  There already exist such protocols---I'm not sure what a new one based
around Gemini would buy.

> (Perhaps the closest to this ideal would be a Wikipedia style web...)

  We already have that---the Wikipedia.

  -spc


3. Ciprian Dorin Craciun (ciprian.craciun (a) gmail.com)

On Fri, Feb 28, 2020 at 4:44 AM Sean Conner <sean at conman.org> wrote:
>   I disagree.  Using "proper symbols" is over all harder to deal with.
> First, it tends to be English-centric.  I mean, we could go with:
>
>         defectum:tempus:tardius
>
> or how about
>
>         teip:sealadach:níos-moille


The protocol is already English centric, for example the MIME types
(which are IANA standards), it uses left-to-right writing, it uses
UTF-8 which is optimized for Latin-based alphabets, etc.;  so if we
want to be politically correct, we could use Latin or Esperanto.


>   First off, the code has to be parsed, and while this is easy in languages
> like Python or Perl, you run into ... issues, with Rust, C++ or Go (not to
> mention the complete mess that is C).  A number is easy to parse, easy to
> check and whose meaning can be translated into another language.  The Gemini
> status codes (as well as HTTP and other three-digit status codes) don't even
> have to be converted into a number---you can easily do a two level check:
>
>         if (status[0] == '2')
>                 /* happy path */
>         else if (status[0] == '3')
>                 /* redirection path */
>         else if (status[0] == '4')
>                 /* temporary failure */
>         else if (status[0] == '5')
>                 /* permanent failure */
>         else if (status[0] == '6')
>         {
>                 /* authorization needed */
>                 if (status[1] == '1')
>                         /* client cert required */
>                 else if (status[1] == '3')
>                         /* rejected! */
>         }


OK, although I understand why things are harder in C, you present
above only the "easy part".  Please take into account the
line-reading, splitting into code and meta (and the protocol does say
one or multiple whitespaces in between), checking the `CRLF` at the
end.  Now assuming you've done all that, even the code above has a
couple of bugs:

* what if the server sends `99`?  (it is not covered);
* what if the server sends just `6`? (it is not covered, although
given that perhaps `status` is `\0` terminated it won't be a large
problem, but still it would fall through;)
* what if the server just sends an empty status code? (is it checked
by the parser?)

So if simplicity is a real concern, then why not introduce something
like `0:success` or `1:failure:temporary`?  (I.e. the first character
is either `0` or `1`;  only more advanced clients would parse the
rest.)
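
For example, the minimal client-side handling would then be just (a
sketch;  the `handle_*` / `reject_response` names are placeholders):

	if (status[0] == '0')
		handle_success(meta);	/* success: read the body */
	else if (status[0] == '1')
		handle_failure(meta);	/* failure: show `meta` to the user */
	else
		reject_response();	/* malformed response */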

Also taking into account that the client still has to handle relative
redirects, I think the status code parsing pales in comparison.


As minor issues:

* why `CRLF`?  it's easier (both in terms of availability of functions
and efficiency) to split lines by a single character `\n` than by a
string;
* why allow "one-or-more whitespaces" especially in protocol related
parts?  why not mandate a strict syntax?




> > Now on the transport side, somewhat related to the previous point, I
> > think TLS transient certificates are an overkill...  If one wants to
> > implement "sessions", one could introduce
>
>   This is the fault of both myself and solderpunk.  When I implemented the
> first Gemini server (yes, even before solderpunk, who created the protocol) I
> included support for client certificates as a means of authentication of the
> client.  My intent (besides playing around with that technology) was to have
> fine grained control over server requests without the user having a
> password, and to that end, I have two areas on my Gemini server that require
> client certificates:
>
> [...]
>
> It wasn't my intent to introduce a "cookie" like feature. solderpunk
> interpreted this as a "cookie" like feature and called it "transient
> certificates".  I still view this feature as "client certificates" myself.
> I personally think the use of the term "transient certificates" is confusing.


I was specifically targeting only the "transient certificates", not
proper "client certificates".

In fact I appreciate very much the usage of client certificates as
means to authenticate known clients.  (This is something I personally
use in production for back-office endpoints.)




> > On a second thought, why TLS?  Why not something based on NaCL /
> > `libsodium` constructs, or even the "Noise Protocol"
> > (http://www.noiseprotocol.org/)?
>
>         1) Never, *NEVER* implement crypto yourself.


I was never proposing to implement crypto ourselves.  `libsodium` /
NaCL provides very useful high-level constructs, tailored for specific
use-cases (like for example message encryption and signing), that are
proven to be safe, and exports them with a very simple API that can be
easily understood and used.


>         3) I never even heard of the Noise Protocol.


The "Noise Protocol" is currently used by WireGuard, WhatsApp and
possibly other applications that target network-based communications.
Although it is more complex than NaCL.

(It was just an example of more "current" frameworks.)


>         2) OpenSSL exists and has support in most (if not all) popular
>         languages.


Don't know what to say...  I find the OpenSSL documentation terrible,
and it's hard to use...  In fact given the complexity of TLS I would
say any wrapper, reimplementation, or alternative is just as bad.  For
example I played with Go's TLS library and even though it's manageable
it requires lots of attention to get things right.




> > For example I've tried to build the
> > Asuka Rust-based client and it pulled ~104 dependencies and took a few
> > minutes to compile, this doesn't seem too lightweight...
>
>   So wait?  You try to use something other than OpenSSL and it had too many
> dependencies and took too long to compile?  Or is did you mean to say that
> the existing Rust-based client for OpenSSL had too many dependencies?  I
> think you mean the later, but it could be read as the former.


Looking in https://tildegit.org/julienxx/asuka/src/branch/master/Cargo.toml
apparently it is using `native-tls`
(https://crates.io/crates/native-tls) which apparently uses
OpenSSL on Linux;  and this `native-tls` library isn't an "odd" one, it
is used by many high profile Rust libraries.  Removing it and checking
the dependency tree, it seems dropping it cuts the dependencies by
about 15 packages.

However as said earlier, perhaps it's the Rust ecosystem's fault, and
most likely other used libraries are also to blame for this, but
regardless, mandating the use of TLS doesn't simplify things.




> > Why not just re-use PGP to sign / encrypt requests and replies?  With
> > regard to PGP,
>
>   There are issues with using PGP:
>
>         https://latacora.micro.blog/2019/07/16/the-pgp-problem.html


There are issues with any technology, TLS included.

However I would say it's easier to integrate GnuPG (even through
subprocesses) in order to encrypt / decrypt payloads (especially given
how few of them there are in Gemini's ecosystem) than to implement
TLS.  Moreover it offers out-of-the-box the whole client side
certificate management, which adding to a TLS-based client would be
much more involved;  more on this below...
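
For example, a rough sketch of such a subprocess integration (the
paths are illustrative placeholders, and real code would avoid passing
them through a shell):

	#include <stdio.h>
	#include <stdlib.h>

	/* returns 1 on a valid signature, 0 otherwise;  `gpg --verify`
	   exits with status 0 when the signature checks out */
	static int verify_payload(const char *payload, const char *signature)
	{
		char cmd[1024];
		snprintf(cmd, sizeof cmd, "gpg --verify %s %s",
		         signature, payload);
		return system(cmd) == 0;
	}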


> > given that Gopher communities tend to be quite small,
> > and composed of mostly "techie" people, this goes hand-in-hand with
> > the "web-of-trust" that is enabled by PGP and can provide something
> > that TLS can't at this moment: actual "attribution" of servers to
> > human beings and trust delegation;  for example for a server one could
> > generate a pair of keys and other people could sign those keys as a
> > way to denote their "trust" in that server (and thus the hosted
> > content).  Why not take this a step further and allow each document
> > served to be signed, thus extending this "attribution" not only to the
> > servers, but to the actual contents.  This way a server could provide
> > a mirror / cached version of a certain document, while still proving
> > it is the original one.
>
>   The hardest problem with crypto is key management.  If anything, key
> management with PGP seems more problematic than with OpenSSL and the CA
> infrastructure (as bad as the CA infrastructure is).


One of the `gemini://` specifications explicitly states that the
server certificate authentication model is similar to SSH's:  accept
on first use and cache afterward.  However say you go with the actual
CA model:  now you need to juggle Let's Encrypt (every 3 months) (or
add support for ACME in your server), then juggle PEM files, etc.
Regardless, either way one will have to implement all this certificate
management from scratch.

Now on the client certificate side, again a client would have to
implement all that from scratch.

Thus on the contrary, PGP (with perhaps GnuPG) would simplify all this
because it already implements all these features, and has clearly
defined operations over all these entities, including a web-of-trust.

(In fact none of the package managers I know of use S/MIME, i.e. X.509
certificates and CA's, for package signatures, but instead delegate to
GnuPG...)




> > Now getting back to the `gemini://` protocol, another odd thing I
> > found is the "query" feature.  Gemini explicitly supports only `GET`
> > requests, and the `text/gemini` format doesn't support forms, yet it
> > still tries to implement a "single input-box form"...  Granted it's a
> > nice hack, but it's not "elegant"...  (Again, like in the case of
> > sessions, it seems more as an afterthought, even though this is the
> > way Gopher does it...)
> >
> > Perhaps a simple "form" solution would be better?  Perhaps completely
> > eliminating for the time these "queries"?  Or perhaps introducing a
> > new form of URL's like for example:
> > `gemini-query:?url=gemini://server/path&prompt=Please+enter+something`
> > which can be served either in-line (as was possible in Gopher) and /
> > or served as a redirect (thus eliminating another status code family).
>
>   Forms lead to applications.  Applications lead to client side scripting.
> Client side scripting leads to the web ...
>
>   Of course there's pressure to expand the protocol.  solderpunk is trying
> his hardest to keep that from happening and turning Gemini into another web
> clone.


But you are already implementing "applications" on top of Gemini (and
Gopher) through CGI...  And you are already implementing forms,
although "single-input" ones...  Even with this single input form one
could implement a wizard style "one input at a time" form...
Basically you give the technical possibility for "applications".

I wasn't talking about "client side scripting";  I was just saying
either drop this completely from the protocol, or specify it fully.
(At the moment nothing stops a client / server implementer from just
reusing the "question" and "answer" to send back and forth an actual
form specification and answer...)

(Also "client side scripting" can't be eradicated through the
protocol.  One is free to include for example JavaScript in the
client, and the protocol can't say "no".)




> > Regarding the `text/gemini` format -- and taking into account various
> > emails in the archive about reflowing, etc -- makes me wonder if it is
> > actually needed.  Why can't CommonMark be adopted as the HTML
> > equivalent, and a more up-to-date Gopher map variant as an alternative
> > for menus?  There are already countless safe CommonMark parsers
> > out-there (for example in Rust there is one implemented by Google) and
> > the format is well understood and accepted by a large community
> > (especially the static side generators community).
>
>   It can.  RFC-7763 defines the media type text/markdown and RFC-7764 define
> known variations that can be specified.  Could be done right now without any
> changes to Gemini.  Go for it.


I know "I can";  I can even use PDF as the default "document format"
in my own client / server.  I could even use Flash.  :)

However I was speaking about the "default", Gemini-endorsed format.




> > Regarding an up-to-date Gopher map alternative, I think this is an
> > important piece of the Gopher ecosystem that is missing from today's
> > world:  a machine-parsable standard format of indexing documents.  I
> > very fondly remember "directory" sites of yesteryear (like DMOZ or the
> > countless other clones) that strives to categorize the internet not by
> > "machine learning" but by human curation.
>
>   Could you provide an example of what you mean by this?  I'm not sure why a
> map alternative is needed.


One problem with today's web is that the actual "web structure" is
embedded in unstructured documents as links.  What I liked about
Gopher maps is that they gave a machine-readable, but still
user-friendly, way to map and categorize the "web contents".

Think about the following example:  I want to look for a cheap telecom
plan;  I open multiple telecom provider web sites, and now for each
one I have to "navigate" their "UX optimized" layouts (expanding
menus, drop-downs, burger buttons, etc.) (some placed on the top, some
on the right, etc.) to find the proper page that lists these plans.
Now imagine how that looks in Gopher:  each site would in fact provide
a Gopher-map that looks the same (at least in terms of layout) and I
can find the information I'm looking for much more easily.
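
(For reference, a Gopher map is just lines of tab-separated fields --
an item type glued to the display string, then selector, host and
port;  the host below is invented for the example:)

	1Residential plans	/plans/residential	gopher.telecom.example	70
	1Business plans	/plans/business	gopher.telecom.example	70
	0Contact	/contact.txt	gopher.telecom.example	70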

To be more "academic":  the current web pages (HTML) serve a couple of
distinct purposes:

* actual "content" -- self-contained documents, with
links mainly as bibliographic references;  (this would be equivalent
to PDF files;)

* "applications" -- interactive functionality (forms, widgets,
etc.);  (this would be equivalent to Flash;)

* "navigation" -- structure whose main purpose is to
help the user find what he is searching for;  (this would be
equivalent to site-maps, Gopher maps, RSS, etc.)

Now getting back to Gemini:

* `text/gemini` documents can serve as menus, but they
aren't specifically designed for this purpose;

How would such an "index" document look?  A machine readable format
(I don't have the specific syntax yet, perhaps JSON?, perhaps
something else?) that allows one to:

* describe each entry (with a title,
summary, author, date, some other standard meta-data like RSS/Atom
does);

* nest sub-indexes in-line (so
that one doesn't need multiple transactions to load a small depth,
well structured menu);
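
Purely to make this concrete, such an index could look something like
the following (every field name here is invented on the spot, not a
proposal for a standard):

	{ "title" : "Example telecom -- site index",
	  "entries" : [
	    { "title" : "Residential plans",
	      "summary" : "prices and terms for home subscriptions",
	      "updated" : "2020-02-28",
	      "url" : "gemini://telecom.example/plans/residential" },
	    { "title" : "Support",
	      "entries" : [
	        { "title" : "Contact",
	          "url" : "gemini://telecom.example/contact" } ] } ] }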





> > * and perhaps add support for content-based addressing (as opposed to
> > server-based addressing) (i.e. persistent URL's);
>
>   There already exist such protocols---I'm not sure what a new one based
> around Gemini would buy.


I agree that `gemini://` is first and foremost a "transfer" protocol.
However one can include a document's identity as a first class citizen
of the protocol.

For example say each document is identified by its SHA;  then when
replying with a document also send that SHA in the form of a permanent
URL like say
`gemini-object:?sha={SHA}&location=gemini://first-server/...&location=gemini://second-server/...`;
then a client (that perhaps has bookmarked that particular version of
that document) could send that URL to a server (of his choosing via
configuration, to the first one specified in `location`, etc.) and if
that server has that document it just replies with it, else it uses
`location`, else it returns 404.

Ciprian.


4. Sean Conner (sean (a) conman.org)

It was thus said that the Great Ciprian Dorin Craciun once stated:
> On Fri, Feb 28, 2020 at 4:44 AM Sean Conner <sean at conman.org> wrote:
> >   I disagree.  Using "proper symbols" is over all harder to deal with.
> > First, it tends to be English-centric.  I mean, we could go with:
> >
> >         defectum:tempus:tardius
> >
> > or how about
> >
> >         teip:sealadach:níos-moille
> 
> 
> The protocol is already English centric, for example the MIME types
> (which are IANA standards), it uses lef-to-right writing, it uses
> UTF-8 which is optimized for Latin-based alphabets, etc.;  so if we
> want to be politically correct, we could use Latin or Esperanto.

  Why is a numeric status code so bad?  Yes, the rest of the protocol is
English centric (MIME types; left-to-right, UTF-8).  It just seems that
using words (regardless of language) is complexity for its own sake.

> OK, although I understand why things are harder in C, you present
> above only the "easy part".  Please take into account the
> line-reading, splitting into code and meta (and the protocol does say
> one or multiple whitespaces in between), checking the `CRLF` at the
> end.  Now assuming you've done all that even the code above has a
> couple of bugs:
> * what if the server sends `99`?  (it is not covered);
> * what if the server sends just `6`? (it is not covered, although
> given that perhaps `status` is `\0` terminated it won't be a large
> problem, but still it would fall through;)
> * what if the server just sends an empty status code? (is it checked
> by the parser?)

  Oh, thanks for the client test suggestions.  I'll need to add those to my
client torture test (for a client, I would expect it to just reject the
response and indicate a server error to the user).

> As minor issues:
> * why `CRLF`?  it's easier (both in terms of availability of functions
> and efficiency) to split lines by a single character `\n` than by a
> string;

  That was discussed earlier on the list:

	gemini://gemi.dev/gemini-mailing-list/messages/000116.gmi

> * why allow "one-or-more whitespaces" especially in protocol related
> parts?  why not mandate a strict syntax?

  solderpunk will have to answer that one.

> > > On a second thought, why TLS?  Why not something based on NaCL /
> > > `libsodium` constructs, or even the "Noise Protocol"
> > > (http://www.noiseprotocol.org/)?
> >
> >         1) Never, *NEVER* implement crypto yourself.
> 
> I was never proposing to implement crypto ourselves.  `libsodium` /
> NaCL provides very useful high-level constructs, tailored for specific
> use-cases (like for example message encryption and signing), that are
> proven to be safe, and exports them with a very simple API that can be
> easily understood and used.

  TLS was chosen because the COMMUNICATIONS LINK is encrypted, not just the
payload.  All Eve (the eavesdropper) can see is what IP address you are
connecting to, not what content you are reading, nor (depending upon the TLS
version) what virtual server you might be talking to.

> >         2) OpenSSL exists and has support in most (if not all) popular
> >         languages.
> 
> Don't know what to say...  I find the OpenSSL documentation terrible,
> and it's hard to use...  In fact given the complexity of TLS I would
> say any wrapper, reimplementation, or alternative is as bad.  For
> example I played with Go's TLS library and even though it's manageable
> it requires lots of attention to get things right.

  Yes, it is horrible.  And people make do.  I know for myself I'm using
libtls, which is part of LibreSSL (a fork of OpenSSL) and which makes using
TLS trivial.  I was able, with just the header file tls.h and man pages, to
wrap libtls for Lua [1], which I use for my Gemini server GLV-1.12556 [2].  I
just wish libtls was more widely available.
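
To give a flavor of it, a minimal Gemini fetch with libtls looks
roughly like this (a sketch with all error handling elided;  a real
client would also pin the server certificate on first use instead of
blindly skipping verification):

	#include <string.h>
	#include <tls.h>

	int main(void)
	{
		const char req[] = "gemini://gemini.conman.org/\r\n";
		char buf[4096];

		tls_init();
		struct tls_config *cfg = tls_config_new();
		tls_config_insecure_noverifycert(cfg);	/* self-signed certs */
		struct tls *ctx = tls_client();
		tls_configure(ctx, cfg);
		tls_connect(ctx, "gemini.conman.org", "1965");
		tls_write(ctx, req, strlen(req));
		tls_read(ctx, buf, sizeof buf - 1);	/* status line + body */
		tls_close(ctx);
		tls_free(ctx);
		tls_config_free(cfg);
		return 0;
	}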

> > > Why not just re-use PGP to sign / encrypt requests and replies?  With
> > > regard to PGP,
> >
> >   There are issues with using PGP:
> >
> >         https://latacora.micro.blog/2019/07/16/the-pgp-problem.html
> 
> There are issues with any technology, TLS included.
> 
> However I would say it's easier to integrate GnuPG (even through
> subprocesses) in order to encrypt / decrypt payloads (especially given
> how low in count they are for Gemini's ecosystem) than implementing
> TLS.  Moreover it offers out-of-the-box the whole client side
> certificate management, which adding to a TLS-based client would be
> much more involved, more on this bellow...

  As I have mentioned, that only protects the payload, not the
communications channel.

> >   The hardest problem with crypto is key management.  If anything, key
> > management with PGP seems more problematic than with OpenSSL and the CA
> > infrastructure (as bad as the CA infrastructure is).
> 
> One of the `gemini://` specifications explicitly states that the
> server certificate authentication model is similar to SSH's first use
> accept and cache afterward.  However say you'll go with the actual CA
> model, now you need to juggle Let's Encrypt (each 3 months) (or add
> support for ACME in your server), then juggle PEM files, etc.
> Regardless, either way one will have to implement all this certificate
> management from scratch.

  Or self-signed certificates.

  Okay, we use NaCL.  Now what?  What's needed to secure the communication
channel?  A key exchange.  Again, rule 1---never implement crypto.

> >   Forms lead to applications.  Applications lead to client side scripting.
> > Client side scripting leads to the web ...
> >
> >   Of course there's pressure to expand the protocol.  solderpunk is trying
> > his hardest to keep that from happening and turning Gemini into another web
> > clone.
> 
> 
> But you are already implementing "applications" on-top of Gemini (and
> Gopher) through CGI...  

  Yes ... but there's only two Gemini servers that support CGI, GLV-1.12556
[2] and Jetforce [3] (two out of five Gemini server programs).  I
implemented CGI in GLV-1.12556 just because I could (and I think to prove a
point).  I technically don't need CGI support for the server I run since
it's just as easy for me to implement custom handlers [4].

> > > Regarding an up-to-date Gopher map alternative, I think this is an
> > > important piece of the Gopher ecosystem that is missing from today's
> > > world:  a machine-parsable standard format of indexing documents.  I
> > > very fondly remember "directory" sites of yesteryear (like DMOZ or the
> > > countless other clones) that strives to categorize the internet not by
> > > "machine learning" but by human curation.
> >
> >   Could you provide an example of what you mean by this?  I'm not sure why a
> > map alternative is needed.
> 
> One problem with today's web is that the actual "web structure" is
> embedded in unstructured documents as links.  What I liked about
> Gopher maps is that it gave a machine-readable, but still
> user-friendly, way to map and categorize the "web contents".

  One problem with that---incentives.  What's my incentive to make all this
information more easily machine readable?  On the web, you do that, and what
happens?  Google comes along, munches on all that sweet machine readable
data and serves it up directly to users, meaning the user just has to go to
Google for the information, not your server.  Given those incentives, I have
no reason to make my data easily machine readable when it means less
traffic.

  I recall the large push for RDF (Resource Description Framework) back
around 2004 or so ... embed machine parsable relations and metadata and it
would be oh so wonderful.  Some people even bothered to do all that work. 
And for what?  It was a pain to maintain, the tooling was poor, and Google
would just suck it up and serve it to users directly, no reason for anyone
to actually visit your site.

  As a user, that's great!  As a web site operator, not so much.

> > > * and perhaps add support for content-based addressing (as opposed to
> > > server-based addressing) (i.e. persistent URL's);
> >
> >   There already exist such protocols---I'm not sure what a new one based
> > around Gemini would buy.
> 
> I agree that `gemini://` is first and foremost a "transfer" protocol.
> However one can include a document's identity as a first class citizen
> of the protocol.
> 
> For example say each document is identified by its SHA;  then when
> replying with a document also send that SHA in form of a permanent URL
> like say
> `gemini-object:?sha={SHA}&location=gemini://first-server/...&location=gemini://second-server/...`;
>  then a client (that perhaps has bookmarked that particular version of
> that document) could send that URL to a server (of his choosing via
> configuration, to the first one specified in `location`, etc.) and if
> that server has that document just reply with that, else use
> `location`, else return 404.

  Hey, go ahead and implement that.  I'd like to see that ... 

  -spc (I got my feet wet in Gemini by implementing the first server ... )

[1]	https://github.com/spc476/lua-conmanorg/blob/master/src/tls.c

[2]	https://github.com/spc476/GLV-1.12556

[3]	https://github.com/michael-lazar/jetforce

[4]	gopher://gopher.conman.org/1Gopher:Ext:GLV-1/handlers/


5. Sean Conner (sean (a) conman.org)

It was thus said that the Great Ciprian Dorin Craciun once stated:
> 
> For example say each document is identified by its SHA;  then when
> replying with a document also send that SHA in form of a permanent URL
> like say
> `gemini-object:?sha={SHA}&location=gemini://first-server/...&location=gemini://second-server/...`;
>  then a client (that perhaps has bookmarked that particular version of
> that document) could send that URL to a server (of his choosing via
> configuration, to the first one specified in `location`, etc.) and if
> that server has that document just reply with that, else use
> `location`, else return 404.

  Actually, shouldn't the server return "defectum:permanens:non_inveni"?

  -spc (Forgot to ask that in my previous email ... )


6. Ciprian Dorin Craciun (ciprian.craciun (a) gmail.com)

On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote:
>   Why is a numeric status code so bad?  Yes, the rest of the protocol is
> English centric (MIME types; left-to-right, UTF-8).  It just seems that
> using words (regardless of language) is just complexity for its own sake.


Why did people use `/etc/hosts` files before DNS was invented?  Why do
we have `/etc/services`?  Why do we have `O_READ`?  Why do we have
`chmod +x`?

Because numbers are hard to remember, and say nothing to a person that
doesn't know the spec by heart.  (For example although I do a lot of
HTTP related work with regard to routing and such, I can never
remember which of the 4-5 HTTP redirect codes says "temporary redirect
but keep the same method" as opposed to "temporary redirect but switch
to `GET`".)




> > As minor issues:
> > * why `CRLF`?  it's easier (both in terms of availability of functions
> > and efficiency) to split lines by a single character `\n` than by a
> > string;
>
>   That was discussed earlier on the list:
>
>         gemini://gemi.dev/gemini-mailing-list/messages/000116.gmi


OK, reading that email the answer seems to be "because other protocols
have it"...  And even you admit that in your own code you also handle
just `LF`.

So then why bother?  Why not simplify the protocol?




> > > > On a second thought, why TLS?  Why not something based on NaCL /
> > > > `libsodium` constructs, or even the "Noise Protocol"
> > > > (http://www.noiseprotocol.org/)?
> > >
> > >         1) Never, *NEVER* implement crypto yourself.
> >
> > I was never proposing to implement crypto ourselves.  `libsodium` /
> > NaCL provides very useful high-level constructs, tailored for specific
> > use-cases (like for example message encryption and signing), that are
> > proven to be safe, and exports them with a very simple API that can be
> > easily understood and used.
>
>   TLS was chosen because the COMMUNICATIONS LINK is encrypted, not just the
> payload.  All Eve (the eavesdropper) can see is what IP address you are
> connecting to, not what content you are reading, nor (depending upon the TLS
> version) what virtual server you might be talking to.


Although I do agree that encryption at the "transport" level to hide
the entire traffic is a good idea, if you take into account that
`gemini://` requires one request and one reply per TCP connection
(thus TLS connection), there is no actual "communications link" here.

Basically you are using TLS to encrypt only one payload.  Moreover,
because there is exactly one request / one reply, one can just
look at the traffic pattern and deduce what the user is doing just by
analyzing the length of the stream (in both directions) and the time
the server takes to respond (which hints at static versus dynamically
generated content).  (Granted TLS records are padded, but even so,
having the size as a multiple of some fixed value still gives an
insight into what was requested.)

For example say one lives in a country where certain books (perhaps
about cryptography) are forbidden;  now imagine there is a library out
there that serves these books through `gemini://`;  now imagine the
country wants to see what books are read by its own citizens;  all it
has to do is record each session and deduce a response size range,
then crawl that library and see which books fit into that range.

Therefore I would say (I'm no cryptographer) TLS doesn't help at all
here, and neither does PGP, nor `libsodium` / NaCL...




Another related topic regarding TLS that just struck me:  given that
`gemini://` supports out-of-the-box virtual hosts, do you couple that
with TLS SNI?

If not, basically TLS provides just "obfuscation" rather than actual
end-to-end encryption.  Why I say that:  because the spec says one
should use SSH-style "do you trust this server" questions and keep
that certificate in mind.  But how about when the certificate expires,
or is revoked?  (SSH server public keys never expire...)  How does the
user know whether the certificate was rightfully replaced or he is a
victim of an MITM attack?




> > > > Why not just re-use PGP to sign / encrypt requests and replies?  With
> > > > regard to PGP,
> > >
> > >   There are issues with using PGP:
> > >
> > >         https://latacora.micro.blog/2019/07/16/the-pgp-problem.html
> >
> > There are issues with any technology, TLS included.
> >
> > However I would say it's easier to integrate GnuPG (even through
> > subprocesses) in order to encrypt / decrypt payloads (especially given
> > how low in count they are for Gemini's ecosystem) than implementing
> > TLS.  Moreover it offers out-of-the-box the whole client side
> > certificate management, which adding to a TLS-based client would be
> > much more involved, more on this bellow...
>
>   As I have mentioned, that only protects the payload, not the
> communications channel.


But as said, you don't have an actual communications channel because
you use TLS for a single request / reply payload pair...  :)




> > >   The hardest problem with crypto is key management.  If anything, key
> > > management with PGP seems more problematic than with OpenSSL and the CA
> > > infrastructure (as bad as the CA infrastructure is).
> >
> > One of the `gemini://` specifications explicitly states that the
> > server certificate authentication model is similar to SSH's first use
> > accept and cache afterward.  However say you'll go with the actual CA
> > model, now you need to juggle Let's Encrypt (each 3 months) (or add
> > support for ACME in your server), then juggle PEM files, etc.
> > Regardless, either way one will have to implement all this certificate
> > management from scratch.
>
>   Or self-signed certificates.
>
>   Okay, we use NaCL.  Now what?  What's needed to secure the communication
> channel?  A key exchange.  Again, rule 1---never implement crypto.


Given that one has the public key of the server (more on that later),
one could use the following on client / server sides:

    https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes

 ```
The crypto_box_seal() function encrypts a message m of length mlen for
a recipient whose public key is pk. It puts the ciphertext whose
length is crypto_box_SEALBYTES + mlen into c.

The function creates a new key pair for each message, and attaches the
public key to the ciphertext. The secret key is overwritten and is not
accessible after this function returns.

The crypto_box_seal_open() function decrypts the ciphertext c whose
length is clen, using the key pair (pk, sk), and puts the decrypted
message into m (clen - crypto_box_SEALBYTES bytes).
 ```

How does one get the public key of the server?  One could change the
protocol so that the server speaks first and sends its own public key.


My take on this:  given a set of clear requirements for the
`gemini://` protocol (and I've seen there is one), one can come up
with better solutions than TLS, ones that better fit the use-case.

(Again, just to be clear, I'm not saying "let's invent our own
crypto", but instead "let's look at other tested alternatives".  As a
side-note, NaCL, on which `libsodium` is based, was created by Daniel
J. Bernstein...)




> > > > Regarding an up-to-date Gopher map alternative, I think this is an
> > > > important piece of the Gopher ecosystem that is missing from today's
> > > > world:  a machine-parsable standard format of indexing documents.  I
> > > > very fondly remember "directory" sites of yesteryear (like DMOZ or the
> > > > countless other clones) that strives to categorize the internet not by
> > > > "machine learning" but by human curation.
> > >
> > >   Could you provide an example of what you mean by this?  I'm not sure why a
> > > map alternative is needed.
> >
> > One problem with today's web is that the actual "web structure" is
> > embedded in unstructured documents as links.  What I liked about
> > Gopher maps is that it gave a machine-readable, but still
> > user-friendly, way to map and categorize the "web contents".
>
>   One problem with that---incentives.  What's my incentive to make all this
> information more easily machine readable?  On the web, you do that, and what
> happens?  Google comes along, munches on all that sweet machine readable
> data and serves it up directly to users, meaning the user just has to go to
> Google for the information, not your server.  Given those incentives, I have
> no reason to make my data easily machine readable when it means less
> traffic.


The incentive is a clear one:  serving the end-user.  Given that we
can standardize on such an "index", we can create better "user-agents"
that are more useful to our actual users.  (And I'm not even touching
on people with various disabilities that hamper their interaction with
computers.)

For example say I'm exposing API documentation via `gemini://`.  How
do I handle the "all functions index page"?  Do I create a large
`text/gemini` file, or a large HTML file?  How does the user interact
with that?  With search?  Wouldn't he be better served by a searchable
interface which filters the options as he types, like `dmenu` / `rofi`
/ `fzf` (or the countless other clones) do?  (Currently each
programming language from Rust to Scheme tries to do something similar
with JavaScript and the result is horrible...)

Or, to take another approach, why do people use Google to search for
things?  Because our web pages are so poor when it comes to
structuring information that, more often than not, when I want to find
something on a site I just Google:  `site:example.com the topic i'm
interested in`.




>   I recall the large push for RDF (Resource Description Framework) back
> around 2004 or so ... embed machine parsable relations and metadata and it
> would be oh so wonderful.  Some people even bothered to do all that work. 
> And for what?  It was a pain to maintain, the tooling was poor, and Google
> would just suck it up and serve it to users directly, no reason for anyone
> to actually visit your site.

I'm not advocating for RDF (it was quite convoluted) or the semantic
web, or GraphQL, etc.  I'm just advocating for something better than
the Gopher map.




>   As a user, that's great!  As a web site operator, not so much.


OK...  Now here is something I don't understand:  aren't you building
Gemini sites for "users"?  Or are you building them for "operators"?

Because if the operator is what you optimize for, then why not just
SSH into the operator's server where he provides you with his
"favourite" BBS clone.




> > > > * and perhaps add support for content-based addressing (as opposed to
> > > > server-based addressing) (i.e. persistent URL's);
> > >
> > >   There already exist such protocols---I'm not sure what a new one based
> > > around Gemini would buy.
> >
> > I agree that `gemini://` is first and foremost a "transfer" protocol.
> > However one can include a document's identity as a first class citizen
> > of the protocol.
> >
> > For example say each document is identified by its SHA;  then when
> > replying with a document also send that SHA in form of a permanent URL
> > like say
> > `gemini-object:?sha={SHA}&location=gemini://first-server/...&location=gemini://second-server/...`;
> >  then a client (that perhaps has bookmarked that particular version of
> > that document) could send that URL to a server (of his choosing via
> > configuration, to the first one specified in `location`, etc.) and if
> > that server has that document just reply with that, else use
> > `location`, else return 404.
>
>   Hey, go ahead and implement that.  I'd like to see that ...


There is already FreeNet and IPFS that implement content-based
addressing.  I just wanted something in between that is still
"location" driven, but is "content identity" aware.

Ciprian.


7. solderpunk (solderpunk (a) SDF.ORG)

On Fri, Feb 28, 2020 at 01:16:30AM +0200, Ciprian Dorin Craciun wrote:
> Hello all!
> 
> Today I've stumbled upon the `gemini://` protocol specification
> (v0.10) and FAQ, and after reading them both, I thought that perhaps
> an "outsiders" point of view could be useful.

Howdy!

Thanks very much for taking the time to provide this outside
perspective.  I've done my best to take your comments in the
constructive fashion you intended them.  I'm going to reply relatively
briefly to some major points below - please don't take brevity as me
being dismissive, it's more to do with my available time!
 
> * caching -- given that most content is going to be static, caching
> should be quite useful;  however it doesn't seem to have been present
> as a concern neither in the spec, FAQ or the mailing list archive;
> I'm not advocating for the whole HTTP caching headers, but perhaps for
> a simple SHA of the body so that clients can just skip downloading it
> (although this would imply a more elaborate protocol, having a
> "headers" and separate "body" phase);

Not just a more elaborate protocol (although that does count, by itself,
against caching, as implementation simplicity is a driving goal of the
protocol), but a more extensible protocol.  I've fought since day one
against anything that acts to divide the response header into parts,
equivalent to the multiple header lines of HTTP.  Extensibility, for all
its benefits, is the eventual death of simplicity.

Caching is not a bad thing, but it pays off the most for large content.
Leaving caching out actively encourages content producers to make their
content as small as possible.  I like that.
 
> * compression -- needless to say that `text/*` MIME types compress
> very well, thus saving both bandwidth and caching storage;  (granted
> one can use compression on the TLS side, although I think that one was
> dropped due to security issues?);

As above, compression is not a bad thing, but for small content the
benefit is not proportionate to the implementation effort.  Gopherspace
is an existence proof that worthwhile textual content can be served
uncompressed and still be orders of magnitude smaller than the average
website which *does* use compression.

You're right about TLS compression having security problems.

> * `Content-Length` -- I've seen this mentioned in the FAQ or the
> mailing lists;  I think the days of "unreliable" protocols has passed;
>  (i.e. we should better make sure that the intended document was
> properly delivered, in its entirety and unaltered;)

This is definitely the biggest existing pain point in Gemini so far, I
think.  I might write about this in another email.  I still think for
various reasons we can live without this, but I won't swear that if the
right solution is proposed I won't consider it.

Someone did mention earlier on the list that TLS has a way to explicitly
signal a clean shut down of a connection, which would provide "in its
entirety".

> * status codes -- although both Gemini and HTTP use numeric status
> codes, I do believe that these are an artifact of ancient times, and
> we could just replace them with proper symbols (perhaps hierarchical
> in nature like `redirect:temporary` or `failure:temporary:slow-down`;

This seems to me like extra bytes with very little benefit?  The status
codes are supposed to be machine-readable, so what's wrong with numbers?

> * keep-alive -- although in Gopher and Gemini the served documents
> seem to be self-contained, and usually connections will be idle while
> the user is pondering what to read, in case of crawlers having to
> re-establish each time a new connection (especially a TLS one) would
> eat a lot of resources and incur significant delays;  (not to mention
> that repeated TCP connection establishment to the same port or target
> IP might be misinterpreted as an attack by various security appliances
> or cloud providers;)

The overhead of setting up a new TLS connection each time is a shame.
TLS 1.3 introduces new functionality to reuse previously negotiated
sessions, which is currently not widely supported in a lot of libraries
but I hope that this will become easier in the future and ease some of
the pain on this point.
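
Some libraries do expose it already; a sketch with Python's `ssl` module (host is a placeholder, and the server must actually issue session tickets for this to help):

~~~~
import socket, ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False        # self-signed certs, as elsewhere in Gemini
ctx.verify_mode = ssl.CERT_NONE

def connect(host, session=None):
    return ctx.wrap_socket(socket.create_connection((host, 1965)),
                           server_hostname=host, session=session)

first = connect("example.org")
first.sendall(b"gemini://example.org/\r\n")
first.recv(4096)                  # in TLS 1.3 the ticket arrives after the
saved = first.session             # handshake, so read before grabbing it
first.close()

second = connect("example.org", session=saved)   # abbreviated handshake
print(second.session_reused)      # True if the server honoured resumption
~~~~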
 
> Now on the transport side, somewhat related to the previous point, I
> think TLS transient certificates are an overkill...  If one wants to
> implement "sessions", one could introduce
> "client-side-generated-cookies" which are functionally equivalent to
> these transient certificates.  Instead of creating a transient
> certificate, the client generates a unique token and sends that to the
> server instead.  The server has no more control over the value of that
> cookie as it does for the transient certificate.
>
> Moreover the way sessions are signaled between the server and client,
> piggy-backed ontop of status codes, seems rather an afterthought than
> part of an orthogonal design.  Perhaps these sessions should "moved"
> to a higher level (i.e. after transport and before the actual
> transaction, just like in the case of OSI stack).
 
This is all true, but client certificate support was already in the
protocol for reasons unrelated to sessions, and since it was *possible* to
implement sessions using client certificates instead of adding some new
part to the protocol, I chose to do it.  This is part of the "maximise
power to weight" principle that has guided Gemini's design.  Once you
are paying the weight penalty for some part of the protocol, you should
extract as much power from it as you can by using it to solve any problem
you can.  This will lead to somewhat clunky solutions to problems
cobbled together from two or three existing parts, even when there is an
obvious neater solution that could be achieved with one non-existing
part, but I'm okay with that.
 
> Also these transient certificates are sold as "privacy enablers" or
> "tracking preventing" which is far from the truth.  The server (based
> on IP, ASN or other information) can easily map various transient
> certificates as "possibly" belonging to the same person.  Thus just by
> allowing these one opens up the possibility of tracking (even if only
> for a given session).  Moreover, securely generating these transient
> certificates does require some CPU power.

But servers can do that with raw requests anyway, right?

The CPU power point is well taken, believe me.  I have considered having
the spec (or maybe this belongs in our Best Practices document)
encourage implementers to support and to prefer the computationally
lighter ciphers in TLS (e.g. the ChaCha stream cipher).
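
For what such encouragement might look like in practice, a sketch using Python's `ssl` module (the certificate paths are placeholders; the cipher string is standard OpenSSL syntax):

~~~~
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")
# Prefer ChaCha20-Poly1305 over AES for TLS 1.2 clients; note that
# Python's ssl module offers no API to reorder TLS 1.3 cipher suites.
ctx.set_ciphers("ECDHE+CHACHA20:ECDHE+AESGCM")
~~~~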
 
> On a second thought, why TLS?  Why not something based on NaCL /
> `libsodium` constructs, or even the "Noise Protocol"
> (http://www.noiseprotocol.org/)?

Mostly because TLS library support is much more widespread than
anything else.

> For example I've tried to build the
> Asuka Rust-based client and it pulled ~104 dependencies and took a few
> minutes to compile, this doesn't seem too lightweight...

A slightly off-topic rant:  That's not Asuka's fault, it's not TLS's fault
and it's not Gemini's fault, that's Rust's fault.  Every single Rust
program I have ever tried to build has had over 100 dependencies.  Every
single one has had at least one dependency with a minimum required
version (of either the library, or Rust itself) which was released only
yesterday.  The Rust toolchain and community seem to support and even
actively encourage this unsustainable approach to development.  It
strikes me (as an outsider!) as a total mess.
 
> Why not just re-use PGP to sign / encrypt requests and replies?  With
> regard to PGP, given that Gopher communities tend to be quite small,
> and composed of mostly "techie" people, this goes hand-in-hand with
> the "web-of-trust"

I would prefer not to do anything like explicitly designing Gemini to
cater to a small and tight-knit group of techies.  I know it's that now,
and maybe that's all it will ever be, but I would like to give it a
decent chance of being more.

There is an `application/pgp-encrypted` MIME type that Gemini can serve
content with, and people can write clients to handle this, so
Gemininaut cypherpunks can do this if they want to!

> Now getting back to the `gemini://` protocol, another odd thing I
> found is the "query" feature.  Gemini explicitly supports only `GET`
> requests, and the `text/gemini` format doesn't support forms, yet it
> still tries to implement a "single input-box form"...  Granted it's a
> nice hack, but it's not "elegant"...  (Again, like in the case of
> sessions, it seems more as an afterthought, even though this is the
> way Gopher does it...)

> Perhaps a simple "form" solution would be better?  Perhaps completely
> eliminating for the time these "queries"?  Or perhaps introducing a
> new form of URL's like for example:
> `gemini-query:?url=gemini://server/path&prompt=Please+enter+something`
> which can be served either in-line (as was possible in Gopher) and /
> or served as a redirect (thus eliminating another status code family).

I did, back during the long, drawn-out contemplation of whether to use
one, two or three digit status codes, consider having the META content
for query status be a string in some kind of small DSL for defining a
form, but decided against it.  You can simulate the effect using a
sequency of "single input forms" tied together with a client certificate
session.  This is, IMHO, "elegant" in it's own way - a FORTHy kind of
elegance where you build complicated things up by combining a small set
of sharp primitives in creative ways.
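
A sketch of that combination from the client's side, assuming some `fetch(url)` helper that returns the parsed status, meta and body (certificate handling omitted): each `1x` response is a prompt, answered by re-requesting the same URL with the answer as the query string.

~~~~
from urllib.parse import quote

def run_form(fetch, url):
    while True:
        status, meta, body = fetch(url)        # fetch() is assumed given
        if status.startswith("1"):             # 1x INPUT: meta is the prompt
            answer = input(meta + " ")
            url = url.split("?")[0] + "?" + quote(answer)
        elif status.startswith("3"):           # 3x REDIRECT: meta is next URL
            url = meta
        else:                                  # success or failure: done
            return status, meta, body
~~~~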

> Regarding the `text/gemini` format -- and taking into account various
> emails in the archive about reflowing, etc -- makes me wonder if it is
> actually needed.  Why can't CommonMark be adopted as the HTML
> equivalent, and a more up-to-date Gopher map variant as an alternative
> for menus?  There are already countless safe CommonMark parsers
> out-there (for example in Rust there is one implemented by Google) and
> the format is well understood and accepted by a large community
> (especially the static side generators community).

Sorry, I'm still too busy recovering from the trauma of our text/gemini
discussion around Markdown to respond to this now. :)
 
> All in all I find the `gemini://` project quite interesting, and I'll
> keep an close eye on it.

Please do!  And please continue to share your thoughts with us here.  I
hope it doesn't seem too much like I've not taken some of your points
seriously enough and have just stubbornly stuck to previous decisions.
I really do see challenging questions regarding our design decisions as
valuable things, and tried to consider your questions seriously - and
I'll continue to do so in coming days.

Cheers,
Solderpunk

Link to individual message.

8. Aaron Janse (aaron (a) ajanse.me)

> The CPU power point is well taken, believe me.  I have considered having
> the spec (or maybe this belongs in our Best Practices document)
> encourage implementers to support and to prefer the computationally
> lighter ciphers in TLS (e.g. the ChaCha stream cipher).

This would be awesome. This would be really nice for people like me who
dream of one day implementing all the protocols for Gemini from scratch.
TLS 1.3's ChaCha20 & Poly1305 are much easier to implement than other
cipher suites (yes, yes, "don't write your own crypto," but my goal here
is novelty, not security of my specific client).

> There is an `application/pgp-encrypted` MIME type that Gemini can serve
> content with, and people can write clients that to handle this, so
> Gemininaut cypherpunks can do this if they want to!

Please no. PGP is a bit of a mess already. It's tough to install/maintain
(because it has a daemon), and it's really easy to mess up. I think using
something like NaCl could be much more difficult to mess up than automated
PGP.

---

Thanks again, everyone, for the thoughtful discussion. While I disagree on this
topic, I'm very optimistic about and excited by the future of Gemini as a whole.

Cheers!
Aaron Janse

Link to individual message.

9. Sean Conner (sean (a) conman.org)

It was thus said that the Great Ciprian Dorin Craciun once stated:
> On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote:
> >   Why is a numeric status code so bad?  Yes, the rest of the protocol is
> > English centric (MIME types; left-to-right, UTF-8).  It just seems that
> > using words (regardless of language) is just complexity for its own sake.
> 
> 
> Why did people use `/etc/hosts` files before DNS was invented?  Why do
> we have `/etc/services`?  Why do we have `O_READ`?  Why do we have
> `chmod +x`?

  True, but parsing the status code character by character is only one way
of doing it.  Another way is to just convert it to a number and do the
comparison.  When doing HTTP related things [1], I do have named constants
like HTTP_OKAY and HTTP_NOTFOUND.
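
That style is easy to mirror in any language; a sketch (the constant names are mine, the numeric values are from the spec):

~~~~
GEMINI_SUCCESS   = 20
GEMINI_NOT_FOUND = 51

status = int(header[:2])      # header line like "20 text/gemini"
if status // 10 == 2:         # any 2x counts as success
    pass                      # read the body
elif status == GEMINI_NOT_FOUND:
    pass                      # show "not found" to the user
~~~~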

> Because numbers are hard to remember, and say nothing to a person that
> doesn't know the spec by heart.  (For example although I do a lot of
> HTTP related work with regard to routing and such, I always don't
> remember which of the 4-5 HTTP redirect codes says "temporary redirect
> but keep the same method" as "opposed to temporary redirect but switch
> to `GET`".)

  But you have that anyway.  I have HTTP_MOVETEMP (hmmm, why isn't it
HTTP_REDIRECT_TEMPORARY?  I have to think on that ... ) but even then, I
have to know that causes clients to switch to GET and if I don't want that,
I have to use HTTP_MOVETEMP_M (hmm ... I almost typed HTTP_MOVETMP_M ...
something else to think about).  So even with symbolic names there are
issues.

  Perhaps it's me, but I don't mind looking up things if I don't recall
them.  I've been programming in C for 30 years now.  I *still* have to look
up the details to strftime() every single time I use it, but I recall that
rand() returns a number between 0 and RAND_MAX (inclusive), yet I use
strftime() way more often than I do rand().  

> > > As minor issues:
> > > * why `CRLF`?  it's easier (both in terms of availability of functions
> > > and efficiency) to split lines by a single character `\n` than by a
> > > string;
> >
> >   That was discussed earlier on the list:
> >
> >         gemini://gemi.dev/gemini-mailing-list/messages/000116.gmi
> 
> OK, reading that email the answer seems to be "because other protocols
> have it"...  And even you admit that in your own code you also handle
> just `LF`.
> 
> So then why bother?  Why not simplify the protocol?

  True, but there's the 800-pound gorilla to consider---Windows.  On
Windows, a call like:

	fgets(buffer,sizeof(buffer),stdin);

will read the next line into the buffer, and automatically convert CRLF into
just LF.  That's because Windows uses CRLF to mark end of lines.  It got
that from MS-DOS, which got that from CP/M, which got that from RT-11, which
got that from (I suspect) a literal interpretation of the ASCII spec from
the mid-60s [2].  Also the RFCs written in the 70s describing the early work
of the Internet also used a literal interpretation of ASCII.

  So there's a lot of protocols defined for the Internet that use CRLF. 
Could a switch be made to just LF?  Sure.  It's also about as likely as the
Internet byte order being switched from big-endian to little-endian.

> >   Okay, we use NaCL.  Now what?  What's needed to secure the communication
> > channel?  A key exchange.  Again, rule 1---never implement crypto.
> 
> 
> Given that one has the public key of the server (more on that later),
> one could use the following on client / server sides:
> 
>     https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes

  There's this wonderful talk by John Carmack:

	https://www.youtube.com/watch?v=dSCBCk4xVa0

which talks about ideas, and what might seem a good idea isn't when it comes
to an actual implementation.

  The linked page just talks about an API for signing and encrypting data. 
It says nothing about negotiating the cipher, key size, or anything remotely
like a protocol.  I would ask that if you feel this strongly about it, *do
it!*  Implement a client and server that uses these alternative crypto
systems and then we'll have something to talk about.

  When solderpunk first designed Gemini, I didn't agree with all his
descisions (especially the status codes), but I was interested.  I also
wanted to play around with TLS since I had finished writing a Lua interface
for libtls.  So I wrote my own server, with what I felt the status codes
should be.  The thing was---*there was a working implementation* that was
used to argue certain points.  And through that, we got the compromise of
the current status codes.

  You can argue for an idea.  But an idea *and an implementation* is
stronger than just the idea.  I think that's why my Gemini server is so
featureful---I went ahead and implemented my ideas to help argue for/against
ideas, or even to just present *something* to talk about (when I have no
opinion one way or the other).

> My take on this:  given a set of clear requirements for the
> `gemini://` protocol (which I've seen there are) one can come up with
> better solutions than TLS, ones that better fit the use-case.

  So do it.  One of the goals for Gemini is ease of implementation (of both
the server and the client), so this will go a long way to showing how easy
it is to implement your ideas.

> (Again, just to be clear, I'm not saying "lets invent our own crypto",
> but instead "let's look at other tested" alternatives.  As a
> side-note, NaCL, on which `libsodium` is based, was created by `Daniel
> J. Bernstein`...)

  Yes, I am aware of that.  I even installed djb's version of NaCL and
played around with it.  It's nice, but a protocol it is not.

> >   One problem with that---incentives.  What's my incentive to make all this
> > information more easily machine readable?  On the web, you do that, and what
> > happens?  Google comes along, munches on all that sweet machine readable
> > data and serves it up directly to users, meaning the user just has to go to
> > Google for the information, not your server.  Given those incentives, I have
> > no reason to make my data easily machine readable when it means less
> > traffic.
> 
> The incentive is a clear one:  for the end-user.  Given that we can
> standardize on such an "index", then we can create better
> "user-agents" that are more useful to our actual users.  (And I'm not
> even touching on the persons that have various disabilities that
> hamper their interaction with computers.)

  Okay, how does that incentivise me?

  It's easy enough to add machine readable annotations to HTML.  Heck, there
are plenty of semantic tags in HTML to help with machine readability.  Yet
why don't more people hand-code HTML?  Why is Markdown, which, I will add,
has no defined way of adding metadata except by including HTML, so popular?

> For example say I'm exposing a API documentation via `gemini://`.  How
> do I handle the "all functions index page"?  Do I create a large
> `text/gemini` file, or a large HTML file?  How does the user interact
> with that?  With search?  Wouldn't he be better served by a searchable
> interface which filters the options as he types, like `dmenu` / `rofi`
> / `fzf` (or the countless other clones) do?  (Currently each
> programming language from Rust to Scheme tries to do something similar
> with JavaScript and the result is horrible...)

  PHP (which I don't like personally) has incredible documentation, but the
PHP developers put a lot of work into creating the system to enable that. 
It's not just "make machine readable documentation" and poof---it's done.

  I would say that's mostly tooling, not an emergent property of HTML.

> Or, to take another approach, why do people use Google to search
> things?  Because our web pages are so poor when it comes to
> structuring information, that most often than not, when I want to find
> something on a site I just Google: `site:example.com the topic i'm
> interested in`.

  Web search engines were not initially designed to find stuff on a given
site, it was to find sites you didn't even knew existed, period.  The web
quickly grew from "here's a list of all known web sites" to "there's no way
for a single person to know what's out there."  Since then Google has grown
to be a better index of sites than sites themselves (although I think Google
isn't quite as good as it used to be).

  Creating and maintaining a web site structure isn't easy, and it's all too
easy to make a mistake that is hard to rectify, and I speak from experience
since my website [3] is now 22 years old [4], and I have a bunch of
redirects to rectify past organizational mistakes (and redirects were
another aspect I had to argue to add to Gemini, by the way---the
implementation helped).

> I'm not advocating for RDF (it was quite convoluted) or semantic web,
> or GraphQL, etc.  I'm just advocating something better than the Gopher
> map.

  Okay, create a format and post it.  That's the best way to get this
started.

> >   As a user, that's great!  As a web site operator, not so much.
> 
> OK...  Now here is something I don't understand:  aren't you building
> Gemini sites for "users"?  You are building it for "operators"?

  I'm building it primarily for me.  Much like my website (and gophersite
[5]) is mostly for my own benefit---if others like it, cool!  But it's not
solely for others.

> Because if the operator is what you optimize for, then why not just
> SSH into the operator's server where he provides you with his
> "favourite" BBS clone.

  Those do exist, but that's not something I want to do.

> >   Hey, go ahead and implement that.  I'd like to see that ...
> 
> There is already FreeNet and IPFS that implement content-based
> addressing.  I just wanted something in between that is still
> "location" driven, but is "content identity" aware.

  Again, what's stopping you from just doing it?  Waiting for consensus? 
Have you read the thread on text formatting?  It's literally half the
messages to this list.  I do have to wonder how far along Gemini would be if
I had not just gone ahead and implemented a server.

  -spc (In my opinion, working code trumps ideas ... )

[1]	Like my blog engine, written in C:

	https://github.com/spc476/mod_blog

[2]	A close reading of the actual ASCII standard reveals two control
	codes, CR and LF.  CR is defined as "returning the carriage head
	back to the start of a line" and LF is defined as "advancing to the
	next line, without changing the position of the carriage." So a
	literal reading of the spec says if you want to advance to the start
	of the next line, you send both a CR and LF.  There is no control
	code defined by ASCII that means "return the carriage to the start
	of the line and advance to the next line." There *is* such a control
	character, NEL, but that's defined by the ISO, not ANSI (and it
	happens to be either character 133 or <ESC>E).

	Over time, some systems have adopted one or the other to mean
	"return carriage to start of line and advance to next line." Most
	8-bit systems I've experienced used CR for that.  Unix picked LF.  A
	few (mostly DEC influenced, like CP/M) used both.

	The RFCs written in the 70s (when the Internet was first being
	developed) used a more literal interpretation of the ASCII standard
	and required both CRLF to mark the end of the line.

	There is also a similar issue with backspace.  ASCII defines BS as
	"move the carriage to the previous character position; if at the
	start of the line, don't do anything." DEL is defined as "ignore
	this character." Neither one means "move back one space and erase
	the character".  BS was intended to be used to create characters not
	defined by ASCII, like ä by issuing the sequence

		a<BS>"

	Over time, different systems have implemented the "move back one
	space and erase the character" by using either BS or DEL.

[3]	http://www.conman.org/

[4]	At the current domain.  It's a bit older than that, but it was under
	a different domain I didn't control, which is why my personal pages
	are under:

		http://www.conman.org/people/spc/
	
	and not the top level.  That move was painful enough as it was.

[5]	gopher://gopher.conman.org/

Link to individual message.

10. Ciprian Dorin Craciun (ciprian.craciun (a) gmail.com)

On Sat, Feb 29, 2020 at 1:42 AM Sean Conner <sean at conman.org> wrote:
>   Perhaps it's me, but I don't mind looking up things if I don't recall
> them.  I've been programming in C for 30 years now.  I *still* have to look
> up the details to strftime() every single time I use it, but I recall that
> rand() returns a number between 0 and MAX_RAND (inclusive), yet I use
> strftime() way more often than I do rand().


When one is developing code then yes, looking things up in the
documentation is OK.  However when one is reading code, a detour into
the documentation breaks your focus.




> > OK, reading that email the answer seems to be "because other protocols
> > have it"...  And even you admit that in your own code you also handle
> > just `LF`.  [...]
>
>   True, but there's the 800-pound gorilla to consider---Windows.  On
> Windows, a call like:
> [...]
>
>   So there's a lot of protocols defined for the Internet that use CRLF.
> Could a switch be made to just LF?  Sure.  It's also about as likely as the
> Internet byte order being switched from big-endian to little-endian.


OK, I'll drop the CRLF thing, but I find it odd that the only argument
to this is "because systems and protocols designed many years ago did
this (i.e. CRLF)", and to that you add "but anyway, all these systems
just ignore all that and behave internally like it wasn't so (i.e.
convert CRLF into LF)"...




As a minor note, I've seen C mentioned a lot of times, but please take
into account that many projects aren't developed in C anymore, but
instead in Python / Ruby / Go / Rust / other "newer" languages, that
are much easier to develop in than C.  Case in point, out of the 3
clients for Gemini, one is in Go, one in Rust and the other in
Python...




> > >   Okay, we use NaCL.  Now what?  What's needed to secure the communication
> > > channel?  A key exchange.  Again, rule 1---never implement crypto.
> >
> >
> > Given that one has the public key of the server (more on that later),
> > one could use the following on client / server sides:
> >
> >     https://libsodium.gitbook.io/doc/public-key_cryptography/sealed_boxes
>
>   The linked page just talks about an API for signing and ecrypting data.
> It says nothing about negotiating the cipher, key size, or anything remotely
> like a protocol.


(I have a hunch that you are not "acquainted" with NaCL / `libsodium`;
 the short story is this:  the designers of NaCL (again including
Daniel J. Bernstein) wanted to design and implement a secure, simple
to use, high level cryptographic library, that makes all the choices
for its users, so that ciphers, key sizes, padding, nonces, etc.,
aren't required to be handled by the user, and thus no mistakes would
be made on this front.)

In fact that link does say at the end under the section `Algorithm
details` what happens behind the scenes:

~~~~
Sealed boxes leverage the crypto_box construction (X25519, XSalsa20-Poly1305).

The format of a sealed box is:

ephemeral_pk || box(m, recipient_pk, ephemeral_sk,
nonce=blake2b(ephemeral_pk || recipient_pk))
~~~~
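
For the curious, the whole round trip through that construction is a few lines with PyNaCl (the Python binding of libsodium); every algorithm choice listed above is made inside the library:

~~~~
from nacl.public import PrivateKey, SealedBox

server_key = PrivateKey.generate()       # server's long-term keypair

# Client side: only the server's *public* key is needed; the ephemeral
# keypair and nonce from the construction above are handled internally.
ciphertext = SealedBox(server_key.public_key).encrypt(b"gemini://example.org/\r\n")

# Server side: decrypt with the private key.
assert SealedBox(server_key).decrypt(ciphertext) == b"gemini://example.org/\r\n"
~~~~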




> I would ask that if you feel this strongly about it, *do
> it!*  Implement a client and server that uses these alternative crypto
> systems and then we'll have something to talk about.


What is the chance you'll change your mind about TLS?  0.01%?  Are you
actually prepared to compare TLS against another proposal without bias
towards "legacy `gemini://` implementations currently using TLS"?




>   You can argue for an idea.  But an idea *and an implementation* is
> stronger than just the idea.  I think that's why my Gemini server is so
> featureful---I went ahead and implemented my ideas to help argue for/against
> ideas, or even to just present *something* to talk about (when I have no
> opinion one way or the other).


Perhaps I'll throw a proof-of-concept in Python or Go.  (Although as
said above, I think it won't change anything, as there is already a
lot of "investment" in TLS...)




> > >   One problem with that---incentives.  What's my incentive to make all this
> > > information more easily machine readable?  On the web, you do that, and what
> > > happens?  Google comes along, munches on all that sweet machine readable
> > > data and serves it up directly to users, meaning the user just has to go to
> > > Google for the information, not your server.  Given those incentives, I have
> > > no reason to make my data easily machine readable when it means less
> > > traffic.
> >
> > The incentive is a clear one:  for the end-user.  Given that we can
> > standardize on such an "index", then we can create better
> > "user-agents" that are more useful to our actual users.  (And I'm not
> > even touching on the persons that have various disabilities that
> > hamper their interaction with computers.)
>
>   Okay, how does that incentivise me?


I don't know what incentivizes one to publish content;  some just want
to push their ideas on the internet, others might want to help others
through tutorials or documentation, others hope that by sharing they
advertise themselves, and so on...

However all of the above reasons (perhaps except the first one) do
need to care about their users.




>   It's easy enough to add machine readable annotations to HTML.  Heck, there
> are plenty of semantic tags in HTML to help with machine readability.  Yet
> why don't more people hand-code HTML?  Why is Markdown, which, I will add,
> has no defined way of adding metadata except by including HTML, so popular?


I don't know where the HTML micro-formats popped out in this
discussion, as I advocated against this approach.  :)




> > I'm not advocating for RDF (it was quite convoluted) or semantic web,
> > or GraphQL, etc.  I'm just advocating something better than the Gopher
> > map.
>
>   Okay, create a format and post it.  That's the best way to get this
> started.


OK, I'll try to take a stab at that.  (Although like in the case of
TLS, I think there is already too much "investment" in the current way
things are done.)




> > >   Hey, go ahead and implement that.  I'd like to see that ...
> >
> > There is already FreeNet and IPFS that implement content-based
> > addressing.  I just wanted something in between that is still
> > "location" driven, but is "content identity" aware.
>
>   Again, what's stopping you from just doing it?  Waiting for consensus?


Yes, a little bit of consensus won't hurt anybody...  Else we end up
with TLS transient client certificates that act like cookies and which
require about 2 or 3 separate status codes to signal their
management...  :)

Ciprian.

Link to individual message.

11. Sean Conner (sean (a) conman.org)

It was thus said that the Great Ciprian Dorin Craciun once stated:
> On Sat, Feb 29, 2020 at 1:42 AM Sean Conner <sean at conman.org> wrote:
> >   True, but there's the 800-pound gorilla to consider---Windows.  On
> > Windows, a call like:
> > [...]
> >
> >   So there's a lot of protocols defined for the Internet that use CRLF.
> > Could a switch be made to just LF?  Sure.  It's also about as likely as the
> > Internet byte order being switched from big-endian to little-endian.
> 
> 
> OK, I'll drop the CRLF thing, but I find it odd that the only argument
> to this is "because systems and protocols designed many years ago did
> this (i.e. CRLF)", and to that you add "but anyway, all these systems
> just ignore all that and behave internally like it wasn't so (i.e.
> convert CRLF into LF)"...

  I have support to check for both CRLF and LF because I do quite a bit of
work with existing Internet protocols (which define the use of CRLF) and do
extensive testing with Unix (which defines only using LF) and it makes my
life easier to support both [1].  Besides, I think you are underestimating
the extent of Windows development out there, and I think (I can't prove)
that it's easier for a programmer under Unix to add the '\r' than it would
be for a Windows programmer to force Windows *not* to add the '\r'.
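
The "support both" approach costs essentially nothing in code; a sketch:

~~~~
def read_request_line(stream):
    """Accept a request line terminated by either CRLF or bare LF."""
    line = stream.readline()        # reads up to and including b"\n"
    return line.rstrip(b"\r\n")     # tolerates both conventions
~~~~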

> >   The linked page just talks about an API for signing and ecrypting data.
> > It says nothing about negotiating the cipher, key size, or anything remotely
> > like a protocol.
> 
> (I have a hunch that you are not "acquainted" with NaCL / `libsodium`;

  No, I'm aware of both NaCL (which as I stated before, I have installed on
my home system) and libsodium (which I haven't installed, having NaCL
already installed).

>  the short story is this:  the designers of NaCL (again including
> Daniel J. Bernstein) wanted to design and implement a secure, simple
> to use, high level cryptographic library, that makes all the choices
> for its users, so that ciphers, key sizes, padding, nonces, etc.,
> aren't required to be handled by the user, and thus no mistakes would
> be made on this front.)

  Yes, and I just found the Lua module I wrote for NaCL (*not* libsodium)
back in 2013 when I was last playing around with it.  

> In fact that link does say at the end under the section `Algorithm
> details` what happens behind the scenes:
> 
> ~~~~
> Sealed boxes leverage the crypto_box construction (X25519, XSalsa20-Poly1305).
> 
> The format of a sealed box is:
> 
> ephemeral_pk || box(m, recipient_pk, ephemeral_sk,
> nonce=blake2b(ephemeral_pk || recipient_pk))
> ~~~~

  I was going by what I recalled of NaCL, written by the highly esteemed Dr.
Daniel J. Bernstein, of having to make *some* choices in what underlying
function to use for encryption and being a bit concerned that the entirety
of NaCL had to be included in the Lua module due to linking issues [4].

> > I would ask that if you feel this strongly about it, *do
> > it!*  Implement a client and server that uses these alternative crypto
> > systems and then we'll have something to talk about.
> 
> What is the chance you'll change your mind about TLS?  0.01%?  

  Right now?  Sounds about right.  If you provide some "proof-of-concept"
that can be looked at?  It goes up.  

> Are you
> actually considering to compare TLS vs another proposal without bias
> towards "legacy `gemini://` implementations currently using TLS"?

  What I'm wary of is the "Hey!  We should implement my great idea!  And by
'we' I mean, someone else!" vibe I get when arguments like this pop up [5].

> >   You can argue for an idea.  But an idea *and an implementation* is
> > stronger than just the idea.  I think that's why my Gemini server is so
> > featureful---I went ahead and implemented my ideas to help argue for/against
> > ideas, or even to just present *something* to talk about (when I have no
> > opinion one way or the other).
> 
> Perhaps I'll throw a proof-of-concept in Python or Go.  (Although as
> said above, I think it won't change anything, as there is already a
> lot of "investment" in TLS...)

  So let me show you how much investment it took me to use TLS for my Gemini
server:

	local tls = require "org.conman.nfl.tls"
	
	local function main(ios)  -- main routine to handle a request
	  local request = ios:read("*l")  -- mimics the Lua file IO API
	  -- rest of code
	end

	local okay,err = tls.listen(CONF.network.addr,CONF.network.port,main,function(conf)
	  conf:verify_client_optional()   -- accept, but don't require, client certs
	  conf:insecure_no_verify_cert()  -- no CA check; Gemini certs are self-signed
	  return conf:cert_file(CONF.certificate.cert)
	     and conf:key_file (CONF.certificate.key)
	     and conf:protocols("all")
	end)

  That's it. [6]  Granted, I think I had an easier time of it than some
others because of the library I picked (libtls, which makes using TLS very
easy).  If the other non-TLS options are this easy then you might have a
case.  As solderpunk said, there are many, many libraries and modules
available for the most popular languages for TLS.  And "ease of
implementation" was one of the goals of Gemini.  If these alternatives to
TLS are just as easy to use, then a proof-of-concept should show that,
right?

  And for an indication of how easy it is for me to use TLS, a hypothetical
TCP-only version of Gemini would look very similar:

	local tcp = require "org.conman.nfl.tcp"

	local function main(ios)
	  local request = ios:read("*l")
	  -- rest of code
	end

	local okay,err = tcp.listen(CONF.network.addr,CONF.network.port,main)

  No other changes (except to remove the code to check for user
certificates) would be required.  That's how easy it should be.
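
For comparison, roughly the same amount of ceremony in Python's standard library (a bare sketch: no client certificates, no request validation; the certificate paths are placeholders):

~~~~
import socketserver, ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")

class Handler(socketserver.BaseRequestHandler):
    def handle(self):
        with ctx.wrap_socket(self.request, server_side=True) as tls:
            # A Gemini request is at most 1024 bytes plus CRLF.
            request = tls.recv(1026).decode("utf-8").rstrip("\r\n")
            tls.sendall(b"20 text/gemini\r\n# Hello\r\n")

with socketserver.ThreadingTCPServer(("", 1965), Handler) as srv:
    srv.serve_forever()
~~~~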

> >   It's easy enough to add machine readable annotations to HTML.  Heck, there
> > are plenty of semantic tags in HTML to help with machine readability.  Yet
> > why don't more people hand-code HTML?  Why is Markdown, which, I will add,
> > has no defined way of adding metadata except by including HTML, so popular?
> 
> I don't know where the HTML micro-formats popped out in this
> discussion, as I advocated against this approach.  :)

  Machine readable formats, or at least, machine readable bits.

> > > I'm not advocating for RDF (it was quite convoluted) or semantic web,
> > > or GraphQL, etc.  I'm just advocating something better than the Gopher
> > > map.
> >
> >   Okay, create a format and post it.  That's the best way to get this
> > started.
> 
> OK, I'll try to take a stab at that.  (Although like in the case of
> TLS, I think there is already too much "investment" in the current way
> things are done.)

  Dude, have you *read* the thread about text formatting?  Literally half
the messages to this list have been about that, and we're *still* talking
about it.

> >   Again, what's stopping you from just doing it?  Waiting for consensus?
> 
> Yes, a little bit of consensus won't hurt anybody...  Else we end up
> with TLS transient client certificates that act like cookies and which
> require about 2 or 3 separate status codes to signal their
> management...  :)

  Touché.

  -spc

[1]	Okay, I have code that parses SIP messages [2].  As defined by many
	(many, many) RFCs, the transport over IP requires handling of CRLF. 
	But to test the parser, it's easier to support just LF, since the
	testing I do is all under Unix [3].

	I also have code that deals with email messages, again, which are
	defined with CRLF, but on Unix, usually end with just LF.

[2]	At work.  At home, I don't have to deal with the horrors of SIP.

[3]	No Windows at all at home, or at work.

[4]	I can go over this in detail if you wish, but I'd rather not as it
	gets rather deep rather quickly.

[5]	It happens quite regularly on the Lua mailing list I'm on.  So much
	so that I outright ignore several people on that list.

[6]	Okay, it took a bit to write the Lua module around libtls (from
	LibreSSL), and some work to adapt it to my socket framework, but now
	that that is done, other people can leverage that work.

Link to individual message.

12. Bradley D. Thornton (Bradley (a) NorthTech.US)



On 2/28/2020 2:04 AM, Ciprian Dorin Craciun wrote:
> On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote:
>>   Why is a numeric status code so bad?  Yes, the rest of the protocol is
>> English centric (MIME types; left-to-right, UTF-8).  It just seems that
>> using words (regardless of language) is just complexity for its own sake.
> 
> 
> Why did people use `/etc/hosts` files before DNS was invented?  Why do
> we have `/etc/services`?  Why do we have `O_READ`?  Why do we have
> `chmod +x`?
> 
> Because numbers are hard to remember, and say nothing to a person that
> doesn't know the spec by heart.  (For example although I do a lot of
> HTTP related work with regard to routing and such, I always don't
> remember which of the 4-5 HTTP redirect codes says "temporary redirect
> but keep the same method" as "opposed to temporary redirect but switch
> to `GET`".)
> 

Well, section 1.3.2 of the Gemini spec-spec says two digit codes, but
single (first digit) is all that is required. So, a 2, a 20, and a 21
are all success and there's no ambiguity as to anything occurring at the
first digit level, it's just more gravy with the second digit.

I do fail to see why what appears to me to be a whole lot of work to
implement what you suggest, especially considering that most servers
will invariably choose to implement their own custom handlers for
status/error codes, much like one does in Apache so the server operator
themselves gets to choose what content to deliver as a result of a 404.

So there would be added framework for human readable, non-numeric status
codes (I would rather read the numerical codes in my logfiles), and then
as Gemini matures and stabilizes, devs will build frameworks so the
server operators can and will develop custom pages for the status codes
anyway. This seems, at best, somewhat redundant to me (ultimately).

A 5 (or 50) might not provide as complete a picture as one would like,
yet it's optional to serve the full two-digit code and still unambiguous
with respect to what's going on at the baseline - a permanent failure.

A 51 though, perhaps the most common user facing state where errors are
encountered, will certainly eventually be accommodated by some clever
little remark intended to amuse the user who just asked for something
that isn't there. Reinforcing my suggestion that the server operators
are going to want the devs to enable them to deliver cute little
messages during such fashion faux pas.

That's just kinda what I was pondering while reading the exchange.

-- 
Bradley D. Thornton
Manager Network Services
http://NorthTech.US
TEL: +1.310.421.8268

Link to individual message.

13. Bradley D. Thornton (Bradley (a) NorthTech.US)



On 3/1/2020 12:22 AM, Bradley D. Thornton wrote:
> 
> 

> 
> Well, section 1.3.2 of the Gemini spec-spec says two digit codes, but
> single (first digit) is all that is required. So, a 2, a 20, and a 21
> are all success and there's no ambituity as to anything occuring at the
> first digit level, it's just more gravy with the second digit.

Errata: There's no '2', the first character is followed by a zero on the
most basic implementations. My bad.

But we still don't have a Gemini status code analogous to that of the
HTTP 418 - and IMNSHO, we should :P


-- 
Bradley D. Thornton
Manager Network Services
http://NorthTech.US
TEL: +1.310.421.8268

Link to individual message.

14. Ciprian Dorin Craciun (ciprian.craciun (a) gmail.com)

On Sun, Mar 1, 2020 at 10:22 AM Bradley D. Thornton
<Bradley at northtech.us> wrote:
> On 2/28/2020 2:04 AM, Ciprian Dorin Craciun wrote:
> > On Fri, Feb 28, 2020 at 11:07 AM Sean Conner <sean at conman.org> wrote:
> >>   Why is a numeric status code so bad?  Yes, the rest of the protocol is
> >> English centric (MIME types; left-to-right, UTF-8).  It just seems that
> >> using words (regardless of language) is just complexity for its own sake.
> >
> > Because numbers are hard to remember, and say nothing to a person that
> > doesn't know the spec by heart.  (For example although I do a lot of
> > HTTP related work with regard to routing and such, I always don't
> > remember which of the 4-5 HTTP redirect codes says "temporary redirect
> > but keep the same method" as "opposed to temporary redirect but switch
> > to `GET`".)
>
> Well, section 1.3.2 of the Gemini spec-spec says two digit codes, but
> single (first digit) is all that is required. So, a 2, a 20, and a 21
> are all success and there's no ambituity as to anything occuring at the
> first digit level, it's just more gravy with the second digit.


Although I didn't state this before, having two digits, which in fact
are interpreted as a two level decision tree (the first digit denoting
generic class of conditions, and the second digit denoting more
fine-grained reasons), has at least the following problems:


* out of the 410 non-empty lines of specification (v0.10.0), 77 are dedicated
to section `1.3.2 Status Codes` and another 93 to the `Appendix 1 --
Full two digit status codes`, i.e. ~40% of the specification is
dedicated to only these two digits...  granted these two sections also
contain additional information about how to interpret them and the
meta field, etc.;  but regardless having so many status codes does add
additional complexity to the protocol;


"generic class of conditions" (i.e. first digits 1 through 6);  soon
enough, as the protocol progresses and matures, we'll identify new
classes of conditions, and we'll have to start to either introduce a
"miscellaneous" category, or use values from other categories,
breaking thus the clear hierarchy;


* some conditions don't fall particularly well into clear categories;
for example `21 success with end of client certificate session` has to
do with TLS transient certificates management (which is `6x`);  in
fact this shouldn't even be a status code, but a "signal", because for
example a redirect or other failure could as well require the end of
client certificate session;


* another example of "unclear" status codes are `42 CGI error` and `43
proxy error`, which are part of the `4x temporary failure` group, but
might be in fact (especially in the case of 43, although granted we
have 53) permanent errors;  (even `51 not found` can be seen as a
temporary error, because perhaps the resource will exist tomorrow;)


* and speaking of proxies, we have `43 temporary proxy error` and `53
proxy request refused`, but we have no other proxy related statuses
like for example `6y` that states `proxy requires authentication`,
etc.;




So, if we really want to keep things simple why not change this into:


* (we only use one digit to denote success or failure);
* `0` (i.e. like in UNIX) means success, here is your document;
* `1` (i.e. again like in UNIX) means "undefined failure", the client
MUST display the meta field to the user as plain text;  (please note
that this "soft"-forbids the client and server to implement any clever
"extensions";)

* `2` not found / gone;  (i.e. the server is working fine, but what
you are searching for does not exist at the moment;  perhaps it
existed in the past, perhaps later it will exist;)

* `3` redirect;  neither temporary nor permanent;  (because in fact
there isn't a clear definition and usage of temporary vs permanent;)

* (there is no `input` status, because that use-case
can be more elegantly solved with a redirect to a `gemini+query:...`
URL;)


* in case of failures, automated clients should just
log the failure, and not retry the same URL for more than 4 times in a
24 hour window;

* moreover, automated clients should stop querying a server altogether
if (for example, as sketched below):
  * they have received at least 100 consecutive errors;
  * over the last 200 requests, at least 100 of them were errors;
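
Both stop rules fit in a few lines; a sketch (the class name and structure are mine):

~~~~
from collections import deque

class BackoffTracker:
    def __init__(self):
        self.consecutive = 0
        self.window = deque(maxlen=200)   # last 200 requests, True = error

    def record(self, is_error):
        self.window.append(is_error)
        self.consecutive = self.consecutive + 1 if is_error else 0

    def should_stop(self):
        return self.consecutive >= 100 or sum(self.window) >= 100
~~~~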


How would we automate authentication given we have no `authentication
required` status?  We display the meta field to the user, the user
interprets the message, and if it prompts him to authenticate he
will do so through a menu, and the user-agent will perhaps remember
this decision.




> I do fail to see why what appears to me to be a whole lot of work to
> implement what you suggest,


Now getting back to my "symbolic" status codes proposal, it's no more
work, because currently the code looks like:

````
if (status[0] == '1') {
   ...
} else if (status[0] == '2') {
   ...
}
````

Meanwhile my proposal would require one to:

````
if (hasprefix (status, "success:")) {
   ...
} else if (hasprefix (status, "redirect:")) {
   ...
}
````

Granted now one has to implement, or find already implemented the
`hasprefix`, but all languages have it, and even in C one can
implement it as `strncmp (status, expected, strlen (expected)) == 0`.




> , especially considering that most servers
> will invariably choose to implement their own custom handlers for
> status/error codes, much like one does in Apache so the server operator
> themselves gets to choose what content to deliver as a result of a 404.


No, this proliferation of "status codes" won't happen because the
protocol won't allow for it.  (Although even today with numeric status
codes people can just invent their own, unless we clearly define
conditions for all 100 codes, and even then people can disregard their
definitions...)


The only marginal advantage I see for numeric codes is in logs as you've stated.

Ciprian.

Link to individual message.

15. Sean Conner (sean (a) conman.org)

It was thus said that the Great Ciprian Dorin Craciun once stated:
> 
> * either it's not enough, given that we've already used 50% of the
> "generic class of conditions" (i.e. first digits 1 through 6);  soon
> enough, as the protocol progresses and matures, we'll identify new
> classes of conditions, and we'll have to start to either introduce a
> "miscellaneous" category, or use values from other categories,
> breaking thus the clear hierarchy;

  Before the mailing list, solderpunk and I went back and forth over the
status codes.  solderpunk was intent on using single digit codes, whereas I
was pushing for three digit codes.  As I wrote to him back in July of 2019:

>   With a two digit scheme, you have at most 100 error codes (00 through 99),
> with very clearly delineated classes that makes it easy for a client to act
> upon just the first digit.
> 
>   With your one character scheme (and yes, I'm calling it "one character"
> and not "one hexidecimal") the grouping is less clear, and it can *still* be
> extended out to a total of 94 codes (if you use other characters).  Also,
> what should happen when a client receives a code of 'a' through 'f'?  Is it
> only upper case allowed?  Or the lower case version as well?  Because in
> hexadecimal, 0xF and 0xf are the same value.
> 
>   What are you really afraid of?  Expansion?  Gopher gives you two
> results---the content or nothing.  A bit brutal, but simple.  You can cut
> this down to just four cases:
> 
>         success
>         transitory error
>         permanent error
>         need authorization
> 
> but that's still a bit brutal.  Just because you have 100, or 400, error
> codes doesn't mean they all will get used.  I'm sure the top three codes
> seen in the wild for HTTP are:
> 
>         200     Okay
>         304     Not modified
>         404     Not found
> 
> with the following codes just enough to show up:
> 
>         302     Move temp
>         301     Move perm
>         403     Not authorized
>         500     Internal error
> 
> and the rest are rounding errors.  I can't seem to find any evidence to back
> this up, but that's my gut feeling.  I think a single character response
> code is just too limiting and yet, ripe for more abuse than with a two-digit
> numeric range.

  Also, the use of three digit status codes goes back a long time.  In fact,
it was first proposed in RFC-360, dated June 24, 1972! [1] And guess what? 
It was almost a one-to-one mapping of the current HTTP status codes.  2xx were
okay, 3xx were different, but I could see the mapping, 4xx were client
errors and 5xx were server errors.  There were also 1xx, but HTTP/1.1
defined 1xx status as well.

  And if anything, the fact that no new status classifications have come up
in 48 years says that your fears of new categories might not be warranted.

> * some conditions don't fall particularly well into clear categories;
> for example `21 success with end of client certificate session` has to
> do with TLS transient certificates management (which is `6x`);  

  Fair enough, but solderpunk would have to fix that one.

> in
> fact this shouldn't even be a status code, but a "signal", because for
> example a redirect or other failure could as well require the end of
> client certificate session;

  Again, fair enough.  I'm sure some of this is speculative anyway, since I
don't think any servers have actually implemented this feature (I know I
haven't).

> * another example of "unclear" status codes are `42 CGI error` and `43
> proxy error`, which are part of the `4x temporary failure` group, but
> might be in fact (especially in the case of 43, although granted we
> have 53) permanent errors;  (even `51 not found` can be seen as a
> temporary error, because perhaps the resource will exist tomorrow;)

  Yes, solderpunk changed from client errors/server errors to
temporary/permanent errors.  I didn't fight it that much since I can see the
logic in it.

> * and speaking of proxies, we have `43 temporary proxy error` and `53
> proxy request refused`, but we have no other proxy related statuses
> like for example `6y` that states `proxy requires authentication`,
> etc.;

  I can see the argument for an "AUTHORIZATION FOR PROXY" error, but by the
same token, what type of certificate? (and even there, I think having three
different types of certificates is certainly a bit confusing).  This may
require some clarification from solderpunk in the mean time.

> So, if we really want to keep things simple why not change this into:
> 
> * (we only use one digit to denote success or failure);
> * `0` (i.e. like in UNIX) means success, here is your document;
> * `1` (i.e. again like in UNIX) means "undefined failure", the client
> MUST display the meta field to the user as plain text;  (please note
> that this "soft"-forbids the client and server to implement any clever
> "extensions";)

  I still like numeric values as they are language agnostic.  I mean, what
if I get back:

	Bhí teip ann an clár a chur i bhfeidhm

Would you even know what language to translate from?

  Yes, most likely this would be English, but I am ornery enough to follow
the letter of the law if not the spirit.

> * `2` not found / gone;  (i.e. the server is working fine, but what
> you are searching for does not exist at the moment;  perhaps it
> existed in the past, perhaps later it will exist;)

  There is a distinction between "gone" and "not found".  "Gone" means "it
was once here, but has since been removed, please stop referencing this
resource" (i.e. "remove it from your bookmarks file"), while "not found"
means just that---it's not here.

  I mentioned to solderpunk that I wish gopher had a "gone" message (along
with redirect, which I'll get to below), since there is a good reason to
mark something as "gone" and not just "not found".

> * `3` redirect;  neither temporary nor permanent; (because in fact
> there isn't a clear definition and usage of temporary vs permanent;)

  I think there is:

	* permanent---this resource has permanently moved, and any future
		reference should use the new location (i.e. update your
		index or bookmark file!)

	* temporary---this reference is still a valid reference, but the
		actual content is, for whatever reason, currently located
		elsewhere.

  A valid reason for a temporary redirect might be to redirect users to the
most current resource available, say, a specification.  A base link like:

	gemini://gemini.example.com/foobar-spec

could in fact do a temporary redirect to

	gemini://gemini.example.com/foobar-spec.1.3.2

One can always link directly to a specific version, but the current will
*always* be found at a known location.


  The actions are the same, but the semantics are different.

> > I do fail to see why what appears to me to be a whole lot of work to
> > implement what you suggest,
> 
> Now getting back to my "symbolic" status codes proposal, it's no more
> work, because currently the code looks like:
> 
> ````
> if (status[0] == '1') {
>    ...
> } else if (status[0] == '2') {
>    ...
> }
> ````
> 
> Meanwhile my proposal would require one to:
> ````
> if (hasprefix (status, "success:")) {
>    ...
> } else if (hasprefix (status, "redirect:")) {
>    ...
> }
> ````
> 
> Granted now one has to implement, or find already implemented the
> `hasprefix`, but all languages have it, and even in C one can
> implement it as `strncmp (status, expected, strlen (expected)) == 0`.

  Nice, but you still need a function to check the second (and possibly
third) fields.  There's strtok() (a standard C function) but that can't be
used in multithreaded applications ... 

> > , especially considering that most servers
> > will invariably choose to implement their own custom handlers for
> > status/error codes, much like one does in Apache so the server operator
> > themselves gets to choose what content to deliver as a result of a 404.
> 
> No this proliferation of "status codes" won't happen because the
> protocol won't allow for it.  (Although even today with numeric status
> codes people can just invent their own, unless we clearly define
> conditions for all 100 codes, and even then people can disregard their
> definitions...)

  That can be done now.  I can program a web server to return a status of
700.  Now what clients will do in that case ... hmm ... okay, a quick test
revealed the following:

	Firefox		- treated a status code of 700 as a 200 and
			displayed the page

	Safari		- treated a status code of 700 as a 200 and 
			displayed the page

	Lynx		- warned of the nonstandard status code, and then
			proceeded to treat 700 as 200 and displayed the page.

  -spc

[1]	There is also RFC-354, an RFC with a lower number but one dated two
	weeks after RFC-360, that also uses three digit status codes similar
	to RFC-360.

Link to individual message.

16. Ciprian Dorin Craciun (ciprian.craciun (a) gmail.com)

On Mon, Mar 2, 2020 at 3:39 AM Sean Conner <sean at conman.org> wrote:
> > So, if we really want to keep things simple why not change this into:
> >
> > * (we only use one digit to denote success or failure);
> > * `0` (i.e. like in UNIX) means success, here is your document;
> > * `1` (i.e. again like in UNIX) means "undefined failure", the client
> > MUST display the meta field to the user as plain text;  (please note
> > that this "soft"-forbids the client and server to implement any clever
> > "extensions";)
>
>   I still like numeric values as they are language agnostic.  I mean, what
> If I get back:
>
>         Bhí teip ann an clár a chur i bhfeidhm
>
> Would you even know what language to translate from?


If a client received such a status it would reply to the user:
"server error: invalid protocol".  Why?  Because I'm not advocating
for "any sequence of ASCII characters", but a predefined (thus limited
and known) list of tokens that are accepted.


More clearly, there are two separate issues:

* how a status is represented on the wire -- numeric codes vs symbolic
ASCII tokens, etc. -- i.e. the serialization of an error condition;

different "response" classes do we want to specify;




> > * `2` not found / gone;  (i.e. the server is working fine, but what
> > you are searching for does not exist at the moment;  perhaps it
> > existed in the past, perhaps later it will exist;)
>
>   There is a distinction between "gone" and "not found".  "Gone" means "it
> was once here, but has since been removed, please stop referencing this
> resource" (i.e. "remove it from your bookmarks file"), while "not found"
> means just that---it's not here.
>
>   I mentioned to solderpunk that I wish gopher had a "gone" message (along
> with redirect, which I'll get to below), since there is a good reason to
> mark something as "gone" and not just "not found".


I understand the "philosophical" distinction between "gone" and "not
found";  but how often have you encountered a web server that properly
responds with "gone"?  Or, given that many `gemini://` servers will be
static servers using the file-system, how would one "know" if a
selector refers to a "gone" file?




> > * `3` redirect;  neither temporary nor permanent; (because in fact
> > there isn't a clear definition and usage of temporary vs permanent;)
>
>   A valid reason for a temorary redirect might be to redirect users to the
> most current resource available, say, a specification.  A base link like:
>
> [...]
>
> One can always link directly to a specific version, but the current will
> *always* be found at a known location.
>
>   The actions are the same, but the semantics are different.


I think that if one applied a case-by-case analysis, one could
find better solutions.

For example in your documentation case, how about instead of
redirecting we present a `text/gemini` resource that provides a short
summary, a link to the current version, and a list of links to
previous versions.

I would say it makes more sense because:

* it makes explicit to the reader that there are multiple versions;

need such a "links page", thus why not make proper use of it;

Ciprian.

Link to individual message.

17. Jason McBrayer (jmcbray (a) carcosa.net)

"Aaron Janse" <aaron at ajanse.me> writes:

> Please no. PGP is a bit of a mess already. It's tough to
> install/maintain (because it has a daemon), and it's really easy to
> mess up. I think using something like NaCl could be much more
> difficult to mess up than automated PGP.

Yeah, in 2020, PGP is an elaborate foot-gun.

The suggestion of Noise protocol is actually more interesting, as it's
smaller and more future-proof than TLS. But while there are
implementations in several languages, it's not as ubiquitous as TLS.
(For instance, I had no trouble writing a Gemini server in Common Lisp,
but I'd have to write my own bindings of the C implementation to use
Noise protocol.) I'd have to read a lot more about it to know what its
advantages are.

-- 
Jason McBrayer      | "Strange is the night where black stars rise,
jmcbray at carcosa.net | and strange moons circle through the skies,
                    | but stranger still is lost Carcosa."
                    | -- Robert W. Chambers, The King in Yellow

Link to individual message.

---

Previous Thread: WWW indexing concerns (was: Gemini Universal Search)

Next Thread: Is it too late to consider adding a subset of Markdown to the render spec?