Questions regarding "POST" request and line endings

1. Felix Queißner (felix (a) masterq32.de)

Hey!

I wanted to ask (and maybe discuss, if not done already) why Gemini has
no option to upload a file to a server except for a roughly 1024 byte
long URL string.

Imho this would be a practical use case, e.g. for web forums or other
services. As some Gemini servers already work with CGI scripts and
similar, I think it's reasonable to allow uploads via the spec.

And if people want to upload files, they will eventually use the query
string to do it over multiple requests...

Another question:
Why does gemini use <CR><LF> line endings instead of a single <LF> or
<CR> token? It makes the parser implementation more complex and imho
brings no benefit to the protocol and text format itself. I see no
reason why I should have a single <CR> or <LF> in a line, and it may
confuse users: "12345, World!<CR>Hello" may be printed as "Hello,
World!" on most text terminals which is imho undesirable for
non-interactive output.

If a text-based client requires output to be <CR><LF> instead of <LF>,
it can patch the data on the fly while outputting it.
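A minimal Python sketch of that on-the-fly patching, assuming the client already has the response body decoded as a `str`. The name `to_crlf` is hypothetical. A streaming client could apply the same function line by line as it prints.

```python
def to_crlf(text: str) -> str:
    """Convert bare <LF> line endings to <CR><LF> just before output.

    Existing <CR><LF> pairs are first collapsed to <LF>, so the second
    replacement does not turn them into <CR><CR><LF>.
    """
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


if __name__ == "__main__":
    # Mixed input: one bare LF, one CRLF.
    print(repr(to_crlf("# Title\nSome text\r\n")))
```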

Regards
xq


2. jan6 (a) tilde.ninja (jan6 (a) tilde.ninja)

May 17, 2020 3:55 PM, "Felix Queißner" <felix at masterq32.de> wrote:
> Why does gemini use <CR><LF> line endings instead of a single <LF> or
> <CR> token? It makes the parser implementation more complex and imho
> brings no benefit to the protocol and text format itself.

<CR><LF> is the Windows line ending; the HTTP spec, for example, also requires it.
No idea exactly why it's required.

Functionally, you could just split on <LF> and remove all <CR> occurrences, I think.
CR, the carriage return, returns the cursor to the start of the line,
which is almost certainly not wanted in the middle of the text. (As a fun
side note, classic Mac OS, before OS X, used ONLY <CR>, iirc.)
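In Python, the split-and-strip approach suggested here is essentially a one-liner. `split_lines` is a hypothetical helper name, not part of any Gemini library. A stray mid-line <CR> is simply deleted, so "12345, World!<CR>Hello" becomes "12345, World!Hello" instead of overwriting the start of the line.

```python
from typing import List


def split_lines(text: str) -> List[str]:
    """Split text/gemini content into lines, tolerating any line ending.

    Every <CR> is removed and the text is then split on <LF>, so CRLF,
    bare LF and stray CRs in the middle of a line are all handled alike.
    """
    return text.replace("\r", "").split("\n")


if __name__ == "__main__":
    print(split_lines("=> gemini://example.org/ Example\r\nplain line\nlast"))
```

A trailing <LF> produces an empty final element, which a renderer can either drop or treat as a blank line.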

Using <CR><LF> would let you netcat directly from Windows, or so I suppose...

I think it would be best if the server side accepted either
<CR><LF> or a bare <LF>.
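A lenient server along those lines could read the request as sketched below. This rests on assumptions: `read_request_line` is a hypothetical name, the stream is a binary file-like object such as one returned by `socket.makefile("rb")`, and the 1024-byte limit is taken from the URL length mentioned at the start of the thread.

```python
import io

MAX_URL_BYTES = 1024  # assumption: the request URL limit discussed in this thread


def read_request_line(stream) -> str:
    """Read one request line ending in either <CR><LF> or a bare <LF>.

    `stream` is any binary file-like object, e.g. what socket.makefile("rb")
    returns for an accepted connection.
    """
    # Read at most the URL plus two terminator bytes.
    raw = stream.readline(MAX_URL_BYTES + 2)
    if not raw.endswith(b"\n"):
        raise ValueError("request too long or not terminated by <LF>")
    line = raw[:-1]            # drop the <LF>
    if line.endswith(b"\r"):   # tolerate an optional <CR> before it
        line = line[:-1]
    if len(line) > MAX_URL_BYTES:
        raise ValueError("request URL longer than the limit")
    return line.decode("utf-8")


if __name__ == "__main__":
    for raw in (b"gemini://example.org/\r\n", b"gemini://example.org/\n"):
        print(read_request_line(io.BytesIO(raw)))
```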

And I'm not sure it really matters what the server returns, because you can 
strip out all <CR> characters,
and on Linux (probably all Unixes?) a <CR><LF> displays like a normal newline anyway.


3. solderpunk (solderpunk (a) SDF.ORG)

On Sun, May 17, 2020 at 02:55:03PM +0200, Felix Queißner wrote:
 
> I wanted to ask (and maybe discuss, if not done already) why Gemini has
> no option to upload a file to a server except for a roughly 1024 byte
> long URL string.

I have to admit, this idea has occasionally crossed my mind.  Most
recently, when Dave wrote a helper for Git so that people could `git
pull` over Gemini, which I thought was super cool - `git push` isn't
possible with Gemini as a read-only protocol.

It's not that I don't think there are good uses for this.

The original reason is that I was obsessed from day one with making it
extremely hard for people to be able to extend the core Gemini protocol.
HTTP, for example, allows as many headers as you like in
requests/responses.  Clients are expected to read them all, and handle
the ones they can handle.  This means anybody can come up with a new
header, and if it's popular many clients/servers will implement it, and
then it becomes a de facto part of the standard, and clients/servers
which don't handle it are seen as "broken" or "primitive".

This extensibility is of course a useful thing in many ways from an
engineering perspective.  But in the long term it is, IMHO, fundamentally
totally incompatible with ideals like simplicity and minimalism and
privacy and "anybody can implement it themselves over a weekend in <
1000 LOC".  Designers of protocols which are extensible effectively lose
a lot of control over their protocol.  It's pointless me trying very
hard to keep stuff which could be abused for tracking out of Gemini if
it can be snuck in by popular consensus this way, because inevitably it
will be.  You've just got to limit the scope for this kind of extension
everywhere you can.

If you take this idea seriously, you are basically forced to choose
one kind of "thing" a lot, and then have that thing be totally implicit.
If there's only one kind of Gemini request (something analogous to GET),
then we don't have to explicitly put anything in the request format
saying "this is a GET-ish request".  And if there's nothing explicit
there, nobody can write an "advanced" server which recognises a
different value in that place.

So, thinking from a perspective of simplifying HTTP, I had to choose
only one method, so I chose GET.  I had to choose only one response
header, so I chose Content-type (because my experience maintaining a
popular Gopher client convinced me this was the most sorely lacking bit
of information).  Several people convinced me to use full-blown URLs
instead of just paths as I originally specced for requests, which is
equivalent to choosing Host as the only request header.  Basically this
theme runs deep all throughout Gemini's design: wherever HTTP allows
several things, pick the one most fundamentally important/useful one,
and make it an implicit default with no scope for anything else.
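To make the "only one method, only one header" idea concrete, here is a sketch (the helper names are hypothetical) of a request that is nothing but an absolute URL terminated by <CR><LF>. The host, which HTTP carries in a separate Host header, is recovered from the URL itself.

```python
from urllib.parse import urlsplit


def build_request(url: str) -> bytes:
    """A Gemini request: the absolute URL followed by <CR><LF>, nothing else.

    There is no method field and no header block, so there is no slot in
    which an "advanced" client could put anything other than a URL.
    """
    return (url + "\r\n").encode("utf-8")


def request_host(request: bytes) -> str:
    """Recover the host (the job of HTTP's Host header) from the URL."""
    url = request.decode("utf-8").rstrip("\r\n")
    return urlsplit(url).hostname or ""


if __name__ == "__main__":
    req = build_request("gemini://example.org/docs/")
    print(req, request_host(req))
```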

If somebody can come up with a way to distinguish GET from POST style
requests without also opening up an obvious door to arbitrarily many
extra request types, I'll give it some thought.  But I'm not optimistic.

Insisting on non-extensibility necessarily imposes limits on how much
Gemini can do.  That's okay.  Limitations encourage creativity, and give
different things their own unique style/taste/whatever.  Gemini is never
going to be able to do everything that the web can do - it can't
possibly do that while remaining simpler.  We should accept this.
 
> Another question:
> Why does gemini use <CR><LF> line endings instead of a single <LF> or
> <CR> token? It makes the parser implementation more complex and imho
> brings no benefit to the protocol and text format itself. I see no
> reason why I should have a single <CR> or <LF> in a line, and it may
> confuse users: "12345, World!<CR>Hello" may be printed as "Hello,
> World!" on most text terminals which is imho undesirable for
> non-interactive output.

As recently mentioned, the spec doesn't actually explicitly say anything
about line endings in text/gemini content itself (although it should).
It does suggest that CRLF is needed at the end of => lines, but that was
unintentional on my part.  I agree that requiring CRLF for actual
content is strange and I suspect this will change in the next revision.

CRLF *is* clearly and deliberately specced in the non-content part of
the protocol, i.e. for requests and response headers.  And the honest
answer here is, well, that's how every internet protocol whose spec I've
ever looked at works - HTTP, Gopher, SMTP, IRC, for example, all do
this.  I admit to being ignorant as to the exact historical reason for 
this convention.  But it's a deep and wide convention adhered to by
people who know more than I do, and for that reason I'm reluctant to
break it without very good reason.

If people have strong feelings in either direction about the line
terminator to be used in the protocol and in text/gemini content, I'm
very happy to hear it.

Cheers,
Solderpunk


4. Felix Queißner (felix (a) masterq32.de)

First of all: thanks for the very extensive response!

> It's not that I don't think there are good uses for this.
> 
> The original reason is that I was obsessed from day one with making it
> extremely hard for people to be able to extend the core Gemini protocol.
> HTTP, for example, allows as many headers as you like in
> requests/responses.  Clients are expected to read them all, and handle
> the ones they can handle.  This means anybody can come up with a new
> header, and if it's popular many clients/servers will implement it, and
> then it becomes a de facto part of the standard, and clients/servers
> which don't handle it are seen as "broken" or "primitive".
Yes, I can understand this, and it was not my intention to add
extensibility to the protocol, just to allow a single, client-initiated
data upload to the server.

> This extensibility is of course a useful thing in many ways from an
> engineering perspective.  But in the long term it is, IMHO, fundamentally
> totally incompatible with ideals like simplicity and minimalism and
> privacy and "anybody can implement it themselves over a weekend in <
> 1000 LOC".  Designers of protocols which are extensible effectively lose
> a lot of control over their protocol.
Yes, true

> It's pointless me trying very
> hard to keep stuff which could be abused for tracking out of Gemini if
> it can be snuck in by popular consensus this way, because inevitably it
> will be.  You've just got to limit the scope for this kind of extension
> everywhere you can.
One proposal for more privacy and less tracking:
Explicitly allow clients to remove the query string from any request, as
a lot of web tracking is also done via request parameters (even before
cookies).

This would prevent servers from relying on per-user generated URLs
between pages, and the user could be asked whether they want to remove
the query parameters.
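A sketch of that client-side option, assuming a Python client. `strip_query` is a hypothetical name, and the decision of when to prompt the user is left out.

```python
from urllib.parse import urlsplit


def strip_query(url: str) -> str:
    """Return `url` with its query string removed (fragment kept).

    A client could apply this to every outgoing request, or only after
    asking the user, so per-user tokens in query parameters are never sent.
    """
    # SplitResult is a named tuple, so _replace() swaps out one component.
    return urlsplit(url)._replace(query="").geturl()


if __name__ == "__main__":
    print(strip_query("gemini://example.org/page?session=abc123"))
```

One caveat: Gemini's own 10 INPUT status delivers the user's input as the query string of the follow-up request, so a client stripping queries would presumably need to exempt those requests.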

> If you take this idea seriously, you are basically forced to choose
> one kind of "thing" a lot, and then have that thing be totally implicit.
> If there's only one kind of Gemini request (something analogous to GET),
> then we don't have to explicitly put anything in the request format
> saying "this is a GET-ish request".  And if there's nothing explicit
> there, nobody can write an "advanced" server which recognises a
> different value in that place.

Yeah, that's why I asked for a specific PUT in the first place. People
may start wanting more interactive Gemini-served pages and abuse
standard features like URL queries to get that kind of interactivity, and
at that point the server would be able to pretty easily "trick" the user
into following trackable links.

Having an explicit PUT option in the protocol, and preventing servers from
relying on queries, would make things simpler and more straightforward in
the long term.

> If somebody can come up with a way to distinguish GET from POST style
> requests without also opening up an obvious door to arbitrarily many
> extra request types, I'll give it some thought.  But I'm not optimistic.

I actually came up with an idea, but I don't know how good it is in the end:

Respec the 10 INPUT status so that it works like this:

1. Client sends usual request header
2. Server responds with "10 Your forum post:"
3. Client now has two options:
    1. The client drops the connection and sends no bytes. This would be
the status quo.
    2. The client now sends a single line with the MIME type of the
data, then sends the data, just like a server responding with a 20
status code (so, instead of the server sending data to the client, the
client sends data to the server).

This would allow several things:
1. The server can signal that the client needs to upload data, and the
client can then choose whether to upload or not.
2. With the MIME type in the upload header, the server can just drop the
connection after the MIME line, signalling to the client that the data
is unwanted.
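The proposed flow can be sketched as follows. This is purely hypothetical: it models the extension suggested in this message, not anything in the Gemini spec. The function names, the MIME-line length limit, and the way the end of the upload is detected (reading until the client closes its side of the connection) are all assumptions.

```python
import io
from typing import Optional, Set, Tuple

MAX_MIME_LINE = 1024  # assumption: the proposal gives no limit


def build_upload(mime: str, data: bytes) -> bytes:
    """Client side: what would be sent after receiving a "10 ..." header."""
    return mime.encode("utf-8") + b"\r\n" + data


def receive_upload(stream, accepted: Set[str]) -> Optional[Tuple[str, bytes]]:
    """Server side: read the MIME line, then the body until end of stream.

    Returns None where the server would simply close the connection:
    the client sent nothing (today's behaviour) or an unwanted type.
    """
    line = stream.readline(MAX_MIME_LINE + 2)
    if not line.endswith(b"\r\n"):
        return None  # no bytes at all, or a malformed/overlong MIME line
    mime = line[:-2].decode("utf-8")
    base_type = mime.split(";")[0].strip().lower()
    if base_type not in accepted:
        return None  # drop the connection right after the MIME line
    # Assumption: the client marks the end of the data by shutting down its
    # write side, so reading to EOF yields exactly the uploaded bytes.
    return mime, stream.read()


if __name__ == "__main__":
    upload = build_upload("text/plain; charset=utf-8", b"My forum post")
    print(receive_upload(io.BytesIO(upload), {"text/plain"}))
```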


> Insisting on non-extensibility necessarily imposes limits on how much
> Gemini can do.  That's okay.  Limitations encourage creativity, and give
> different things their own unique style/taste/whatever.  Gemini is never
> going to be able to do everything that the web can do - it can't
> possibly do that while remaining simpler.  We should accept this.

Yeah, true. But the first idea that comes to my mind when I'd like to
upload a file would be:

Chunk the file into 256-byte pieces, and upload all the data via a huge
number of requests, each containing a query like

  /path/?offset=X&length=Y&blob=Z

where X is the offset in the uploaded file, Y is the length of the
transferred data and Z would be the URL-encoded data itself.
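A sketch of that chunking workaround, with hypothetical names (`chunk_urls`, an `example.org` upload URL) and the 1024-byte request limit assumed. Percent-encoding can turn each byte into three characters, so a 256-byte chunk can grow to 768 characters, which still leaves about 256 bytes for the base URL and the other parameters.

```python
from typing import List
from urllib.parse import quote

CHUNK_SIZE = 256      # raw bytes per request, as in the message above
MAX_URL_BYTES = 1024  # assumption: Gemini's request URL limit


def chunk_urls(base: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Split `data` into chunks and build one query URL per chunk.

    Each URL carries the chunk's offset, its length, and the chunk itself
    percent-encoded, mirroring /path/?offset=X&length=Y&blob=Z.
    """
    urls = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        blob = quote(chunk, safe="")  # quote() accepts bytes as well as str
        url = f"{base}?offset={offset}&length={len(chunk)}&blob={blob}"
        if len(url.encode("utf-8")) > MAX_URL_BYTES:
            raise ValueError(f"URL for offset {offset} exceeds {MAX_URL_BYTES} bytes")
        urls.append(url)
    return urls


if __name__ == "__main__":
    for url in chunk_urls("gemini://example.org/upload/", b"hello world" * 30):
        print(url)
```

The server would still have to collect the chunks and reassemble them by `offset`, which is not shown.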


> As recently mentioned, the spec doesn't actually explicitly say anything
> about line endings in text/gemini content itself (although it should).
> It does suggest that CRLF is needed at the end of => lines, but that was
> unintentional on my part.  I agree that requiring CRLF for actual
> content is strange and I suspect this will change in the next revision.
> 
> CRLF *is* clearly and deliberately specced in the non-content part of
> the protocol, i.e. for requests and response headers.  And the honest
> answer here is, well, that's how every internet protocol whose spec I've
> ever looked at works - HTTP, Gopher, SMTP, IRC, for example, all do
> this.  I admit to being ignorant as to the exact historical reason for 
> this convention.  But it's a deep and wide convention adhered to by
> people who know more than I do, and for that reason I'm reluctant to
> break it without very good reason.
Thanks for clarifying!

> If people have strong feelings in either direction about the line
> terminator to be used in the protocol and in text/gemini content, I'm
> very happy to hear it.
I'd like to see a pure <LF> version, especially for the protocol header.
My client atm just reads until the first <LF>, then checks whether the
<CR> is there; if not, it drops the connection to the server and responds
with "InvalidResponse".

I assume a lot of servers/clients either ignore the <CR> or drop the
connection as a protocol violation, because both options are sane things
to do. It's not as if a lone <CR> or <LF> is allowed in the header
anyway.
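The strict client behaviour described above (read up to the first <LF>, require a <CR> immediately before it, otherwise fail with "InvalidResponse") could be sketched like this. The names and the 1029-byte header cap (two-digit status, a space, a META of up to 1024 bytes, then <CR><LF>) are assumptions.

```python
import io
from typing import Tuple

MAX_HEADER_BYTES = 1029  # assumption: 2-digit status + space + 1024-byte META + CRLF


class InvalidResponse(Exception):
    """The response header was not terminated by <CR><LF>."""


def read_response_header(stream) -> Tuple[str, str]:
    """Strict parsing: read up to the first <LF> and require a <CR> before it."""
    raw = stream.readline(MAX_HEADER_BYTES)
    if not raw.endswith(b"\r\n"):
        # A bare <LF>, no terminator at all, or an overlong header.
        raise InvalidResponse(repr(raw[:40]))
    status, _, meta = raw[:-2].decode("utf-8").partition(" ")
    return status, meta


if __name__ == "__main__":
    print(read_response_header(io.BytesIO(b"20 text/gemini\r\n# Hello\n")))
```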

Regards
xq


5. plugd (plugd (a) thelambdalab.xyz)

solderpunk writes:
> On Sun, May 17, 2020 at 02:55:03PM +0200, Felix Queißner wrote:
> I have to admit, this idea has occasionally crossed my mind.  Most
> recently, when Dave wrote a helper for Git so that people could `git
> pull` over Gemini, which I thought was super cool - `git push` isn't
> possible with Gemini as a read-only protocol.
>
> It's not that I don't think there are good uses for this.

For what it's worth, I personally don't even think allowing git pushes
over gemini is a good reason for modifying the protocol.  Git has its
own application-level protocol that works perfectly for this kind of
thing.  Gemini is very close to _perfect_ for its intended use: serving
up primarily textual content to humans in a relatively secure but
pleasantly simple way.  I really think it'd be tragic to compromise on
this for reasons that are tangential to the original goal.

Just my 2c,

Tim

