Questions on INPUT behavior

1. Michael Lazar (lazar.michael22 (a) gmail.com)

Hi all. I'm looking into implementing a few endpoints on my server that accept
user input, and I ended up with a few ambiguities about the gemini specification
that I would like to discuss:

 ```
The requested resource accepts a line of textual user input.
The <META> line is a prompt which should be displayed to the
user.  The same resource should then be requested again with
the user's input included as a query component.  Queries are
included in requests as per the usual generic URL definition
in RFC3986, i.e. separated from the path by a ?.  There is no
response body.
 ```

Here are my questions:

1. Should the query component be formatted as a "key=value" parameter?
   Or should it be added directly as the entire query component?

   A. "gemini://hostname.com/input?q=AbrahamLincoln"
   vs.
   B. "gemini://hostname.com/input?AbrahamLincoln"

2. Should the query component allow percent-sign escaping?
   If so, which characters should be escaped?

   A. "gemini://hostname.com/input?Hello%20world"
   vs.
   B. "gemini://hostname.com/input?Hello world"

3. Should a server be allowed to link to a URL with the user input pre-filled?
   E.g. Should this link, if placed in a text/gemini file, mean the same thing
   as the user manually typing in "Hello World"?

   A. "=>/input?Hello%20world"

   This also brings up a point that the above link would be impossible to
   define in a text/gemini file if percent-escaped spaces were not allowed.

4. If my server has an endpoint that does not request user input, can I
   re-purpose the query section for my own needs?

   A. "gemini://hostname.com/items?page=2&limit=20"

5. In the above example, what happens if a request to that URL returns a status
   code of 10? Should the client strip the existing query components from the
   URL, or append a new key=value pair to the end?

6. What widget should the client use to display the input prompt? A single line
   input, or a multi-line text box? Should newline characters even be allowed?

7. Should there be a maximum input length? Currently it is implicitly defined
   as 1024 bytes minus the length of the URL.
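
To make the implicit limit in question 7 concrete, here is a minimal Python
sketch. It assumes the 1024 bytes cover the whole request URL and that the
input is appended after a "?" in percent-encoded form. The constant and
function names are hypothetical and do not come from the spec.

```python
from urllib.parse import quote

# Figure taken from question 7; assumed here to cover the entire request URL.
MAX_REQUEST_URL_BYTES = 1024

def input_fits(base_url: str, user_input: str) -> bool:
    """Return True if base_url plus the encoded input stays within the limit."""
    # Percent-encoding can triple the size of a character, so measure the
    # encoded form rather than the raw input.
    request_url = base_url + "?" + quote(user_input, safe="")
    return len(request_url.encode("utf-8")) <= MAX_REQUEST_URL_BYTES
```

With "gemini://hostname.com/input" as the base, 28 bytes go to the URL and the
"?", which leaves 996 bytes for the encoded input under these assumptions.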

Personally, I have mixed feelings about gemini enabling user input in the first
place. I know that gopher supports it using the "7" item type, but I have only
seen a couple of compelling use cases for this in the wild. On the other hand,
it opens up a Pandora's box of complexity by allowing creative developers to
port over many features from HTTP.

I can already envision XSS style attacks. For example, say a gemini page
requests user input that will be rendered in a public text/gemini page like a
guestbook. The bad-actor submits a message that contains something like:

    "\n=>http://malicious.url\thttp://innocent.url".

Servers will need to be diligent in sanitizing their inputs, which kind of sucks
for what should be a simple protocol.
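
One possible server-side mitigation for the guestbook example, sketched in
Python. This is an illustration under assumptions, not a method prescribed by
the spec: it collapses all whitespace so the input cannot start a new line, and
it writes the entry after a fixed prefix (the "Visitor wrote: " text is
hypothetical) so the stored line never begins with "=>".

```python
def sanitize_line(text: str) -> str:
    """Collapse every run of whitespace (CR, LF, TAB, spaces) into one space."""
    return " ".join(text.split())

def guestbook_line(message: str) -> str:
    """Format one guestbook entry as a single text line."""
    # The fixed prefix is an assumption; it keeps user text away from the
    # start of the line, where "=>" would be read as a link.
    return "Visitor wrote: " + sanitize_line(message)
```

Applied to the message above, the embedded newline and tab become plain spaces
and the result is an ordinary text line rather than a link.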

- mozz

2. Sean Conner (sean (a) conman.org)

It was thus said that the Great Michael Lazar once stated:
> Hi all. I'm looking into implementing a few endpoints on my server that accept
> user input, and I ended up with a few ambiguities about the gemini specification
> that I would like to discuss:
> 
> ```
> The requested resource accepts a line of textual user input.
> The <META> line is a prompt which should be displayed to the
> user.  The same resource should then be requested again with
> the user's input included as a query component.  Queries are
> included in requests as per the usual generic URL definition
> in RFC3986, i.e. separated from the path by a ?.  There is no
> response body.
> ```

  I think that should read "there is no request body"---the response can
include content, dependent upon the status code.

> Here are my questions:
> 
> 1. Should the query component be formatted as a "key=value" parameter?
>    Or should it be added directly as the entire query component?
> 
>    A. "gemini://hostname.com/input?q=AbrahamLincoln"
>    vs.
>    B. "gemini://hostname.com/input?AbrahamLincoln"

  B, unless there's a way to designate the variable name (which I did
suggest to solderpunk---perhaps I should float it here).

> 2. Should the query component allow percent-sign escaping?
>    If so, which characters should be escaped?
> 
>    A. "gemini://hostname.com/input?Hello%20world"
>    vs.
>    B. "gemini://hostname.com/input?Hello world"

  The following characters in the query should be escaped:

	SPACE # % < > [ \ ] ^ { | } "

and unless you are sending name/value pairs, 

	= &

should also be escaped (this per RFC-3986).
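
A short Python sketch of that escaping, using the standard library's
urllib.parse.quote. Passing safe="" escapes everything except letters, digits
and "-", ".", "_", "~", which is stricter than the list above but covers all of
it, including "=" and "&". The function name is hypothetical.

```python
from urllib.parse import quote, unquote

def encode_query(user_input: str) -> str:
    """Percent-encode user input for use as the entire query component."""
    # safe="" means no reserved character is left unescaped.
    return quote(user_input, safe="")

# Decoding with unquote() restores the original input.
assert unquote(encode_query("Hello world")) == "Hello world"
```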

> 3. Should a server be allowed to link to a URL with the user input pre-filled?
>    E.g. Should this link, if placed in a text/gemini file, mean the same thing
>    as the user manually typing in "Hello World"?
> 
>    A. "=>/input?Hello%20world"

  I don't see why not.

	gemini://gemini.conman.org/cgi?Hello%20World

(this link works, by the way---although it's just a sample CGI script)

> 4. If my server has an endpoint that does not request user input, can I
>    re-purpose the query section for my own needs?
> 
>    A. "gemini://hostname.com/items?page=2&limit=20"

  Again, I don't see why not.  The query string is part of a URL, and
clients send URLs so this should be an issue client side.  What the server
side does with the query is up to the server.

> 5. In the above example, what happens if a request to that URL returns a status
>    code of 10? Should the client strip the existing query components from the
>    URL, or append a new key=value pair to the end?

  That's a good question and one I do not have an answer to.

> 6. What widget should the client use to display the input prompt? A single line
>    input, or a multi-line text box? Should newline characters even be allowed?

  It would depend upon the client.  I think the expectation is a single line
input, but I can see a multi-line box being useful as well.  

> 7. Should there be a maximum input length? Currently it is implicitly defined
>    as 1024 bytes minus the length of the URL.
> 
> Personally, I have mixed feelings about gemini enabling user input in the first
> place. I know that gopher supports it using the "7" item type, but I have only
> seen a couple of compelling use cases for this in the wild. On the other hand,
> it opens up a Pandora's box of complexity by allowing creative developers to
> port over many features from HTTP.

  True, and I've moved quite a few of my web-based projects to both Gemini
[1] and Gopher.

> I can already envision XSS style attacks. For example, say a gemini page
> requests user input that will be rendered in a public text/gemini page like a
> guestbook. The bad-actor submits a message that contains something like:
> 
>     "\n=>http://malicious.url\thttp://innocent.url".
> 
> Servers will need to be diligent in sanitizing their inputs, which kind of sucks
> for what should be a simple protocol.

  Experiment---point your favorite terminal based gopher client here:

	gopher://verisimilitudes.net/02019-08-18.ecma-48&utf-8	or
	gopher://verisimilitudes.net/02019-08-18.ecma-48%26utf-8

  This page directly embeds ECMA-48 escape sequences (aka ANSI escape codes)
in the text---what does your favorite terminal based gopher client do?  Does
it send the raw codes directly to the terminal?  Does it filter them out
entirely?  Does it show them? [2]

  Also, there's nothing that says a server *has* to support user input.

  -spc (

[1]	I even created a CGI interface to my Gemini server, and it can even
	translate HTTP status codes to Gemini ones.

[2]	My own gopher client (not yet published) does filter out escape
	sequences and not because of this page---it was a deliberate design
	choice I made when writing after doing a very deep dive into ECMA-48
	in the past year or two.
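
For readers who want to try the filtering approach, here is a minimal Python
sketch. It is not the unpublished client mentioned in [2]. It assumes that
removing 7-bit CSI sequences (ESC [ ... final byte) and then dropping the
remaining control characters is enough for display purposes. The names are
hypothetical.

```python
import re

# ECMA-48 CSI sequence: ESC [, parameter bytes 0x30-0x3F, intermediate bytes
# 0x20-0x2F, and one final byte 0x40-0x7E.
CSI_SEQUENCE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# Remaining C0 controls and DEL, keeping only TAB and LF (CR is dropped too).
OTHER_CONTROLS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")

def strip_escape_sequences(text: str) -> str:
    """Remove CSI sequences first, then any leftover control characters."""
    return OTHER_CONTROLS.sub("", CSI_SEQUENCE.sub("", text))
```

Other sequence types, such as OSC strings, lose only their control bytes here,
so their payload text would still be shown.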

3. solderpunk (solderpunk (a) SDF.ORG)

>   I think that should read "there is no request body"---the response can
> include content, dependent upon the status code.

Once the user *submits* their input, the response to *that* request can
of course have any status code and a possible response body, but in that
part of the spec (maybe it's unclear and needs changing) I'm talking
about the response with code 10, which shouldn't have a body, as it's
just delivering a prompt and the information that this resource wants an
input.

> >    A. "gemini://hostname.com/input?q=AbrahamLincoln"
> >    vs.
> >    B. "gemini://hostname.com/input?AbrahamLincoln"
> 
>   B, unless there's a way to designate the variable name (which I did
> suggest to solderpunk---perhaps I should float it here).

What I had in mind, and what I think most implementations do so far, is
indeed B.  It's possible I could be convinced otherwise, but I don't
really see the value in speccing a fixed variable name like `q`.  It's
perfectly cromulent according to the URL RFC to treat the query as a
string.  The key=value pair syntax is common in the web world mostly as
a way to make HTML forms work, and we don't have forms.
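
Treating the query as one string keeps the server side small. A minimal Python
sketch, assuming the server receives the full request URL and the input
arrives percent-encoded as in option B; the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit

def user_input_from_request(url: str) -> Optional[str]:
    """Return the decoded query as a single string, or None if there is none."""
    query = urlsplit(url).query
    if not query:
        # urlsplit() cannot tell "no query" from an empty one; both give None.
        return None
    # No key=value parsing: the whole query component is the user's input.
    return unquote(query)
```

A handler could answer with status 10 when this returns None and process the
input otherwise.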

I *did* consider a 1x status code whose <META> was some kind of
machine-readable description of a form, but that doesn't degrade nicely
in simple clients which ignore the second status digit, as then the
human user has to imagine the form in their head and type a suitable
response.

>   The following characters in the query should be escaped:
> 
> 	SPACE # % < > [ \ ] ^ { | } "
> 
> and unless you are sending name/value pairs, 
> 
> 	= &
> 
> should also be escaped (this per RFC-3986).

Yes, this is right.  Remember, Gemini requests are URLs, and we don't
make the rules for URLs, we (hopefully!) follow them.

> >    A. "=>/input?Hello%20world"
> 
>   I don't see why not.
> 
> 	gemini://gemini.conman.org/cgi?Hello%20World
> 

I don't see why not either.

> >    A. "gemini://hostname.com/items?page=2&limit=20"
> 
>   Again, I don't see why not.  The query string is part of a URL, and
> clients send URLs so this should be an issue client side.  What the server
> side does with the query is up to the server.

Agreed.

> > 5. In the above example, what happens if a request to that URL returns a status
> >    code of 10? Should the client strip the existing query components from the
> >    URL, or append a new key=value pair to the end?

Hmm.  If a client requests the URL above, it should include the query
string in the request.  So why would the server respond with a status 10
in that case?  I mean, it's currently not prohibited in the spec for a
server to do that, so this is a fair question.  I'm not sure whether we
*should* forbid it or spec some sensible client response.  Did you have
a use case in mind or are you just keeping an eye out for edge cases?
 
> > 6. What widget should the client use to display the input prompt? A single line
> >    input, or a multi-line text box? Should newline characters even be allowed?
> 
>   It would depend upon the client.  I think the expectation is a single line
> input, but I can see a multi-line box being useful as well.  

Good question, I think I agree with Sean above but I'll think on this...

> > 7. Should there be a maximum input length? Currently it is implicitly defined
> >    as 1024 bytes minus the length of the URL.

Hmm.  Do we think it's useful/worthwhile to spec a shorter explicit
limit?  I guess this is hard to answer without canonical applications of
user input being established...

Answers to all the stuff about inputs, XSS, etc. in the future...thanks
for the good questions!

- Solderpunk

4. Jason McBrayer (jmcbray (a) carcosa.net)

solderpunk writes:
>> > 5. In the above example, what happens if a request to that URL
>> > returns a status code of 10? Should the client strip the existing
>> > query components from the URL, or append a new key=value pair to
>> > the end?
>
> Hmm. If a client requests the URL above, it should include the query
> string in the request. So why would the server respond with a status
> 10 in that case? I mean, it's currently not prohibited in the spec for
> a server to do that, so this is a fair question. I'm not sure whether
> we *should* forbid it or spec some sensible client response. Did you
> have a use case in mind or are you just keeping an eye out for edge
> cases?

I can imagine a client asking successive questions in response to
answers. But I'm not sure what the client should do. We might want to
specify that Gemini URLs should not contain query parts except in
requests initiated by 1x responses. Or that the query part should never
be significant in a published URL; that it should be a pre-filled
suggestion that can be modified by the client.

--
Jason McBrayer      | "Strange is the night where black stars rise,
jmcbray at carcosa.net | and strange moons circle through the skies,
                    | but stranger still is lost Carcosa."
                    | -- Robert W. Chambers, The King in Yellow

5. Sean Conner (sean (a) conman.org)

It was thus said that the Great Jason McBrayer once stated:
> We might want to specify that Gemini URLs should not contain query parts
> except in requests initiated by 1x responses.

  What if I want to bookmark some search results?

> Or that the query part should never be significant in a published URL;

  See above.

> that it should be a pre-filled suggestion that can be modified by the
> client.

  One interesting aspect of Apache is that a generated index page accepts a
query string to sort the output.  "C=N" sorts by name; "C=S" sorts by size
(then by name); "O=A" sorts ascending; "O=D" sorts descending.  I would hate
to lose the ability to do that (not that I use it now).

  Again, a server doesn't have to support that, but it could.

  -spc (Or am I being evil for trying to bring too much of the web into
	Gemini?)

6. Jason McBrayer (jmcbray (a) carcosa.net)


Sean Conner writes:
>   -spc (Or am I being evil for trying to bring too much of the web into
> 	Gemini?)

No, bookmarking search results is a valid use-case.

Maybe servers should not be allowed to return a 1x response to a request
with a query string?

-- 
+----------------------------------------------------------------------+
| Jason F. McBrayer                                jmcbray at carcosa.net |
| The scalloped tatters of the King in Yellow must hide Yhtill forever.|

7. Sean Conner (sean (a) conman.org)

It was thus said that the Great Jason McBrayer once stated:
> 
> Sean Conner writes:
> >   -spc (Or am I being evil for trying to bring too much of the web into
> > 	Gemini?)
> 
> No, bookmarking search results is a valid use-case.
> 
> Maybe servers should not be allowed to return a 1x response to a request
> with a query string?

=> gemini://gemini.conman.org/hilo/	A Simple Guessing Game

  -spc

8. Michael Lazar (lazar.michael22 (a) gmail.com)

Jason McBrayer wrote:
> solderpunk writes:
> >> > 5. In the above example, what happens if a request to that URL
> >> > returns a status code of 10? Should the client strip the existing
> >> > query components from the URL, or append a new key=value pair to
> >> > the end?
> >
> > Hmm. If a client requests the URL above, it should include the query
> > string in the request. So why would the server respond with a status
> > 10 in that case? I mean, it's currently not prohibited in the spec for
> > a server to do that, so this is a fair question. I'm not sure whether
> > we *should* forbid it or spec some sensible client response. Did you
> > have a use case in mind or are you just keeping an eye out for edge
> > cases?
>
> I can imagine a client asking successive questions in response to
> answers. But I'm not sure what the client should do. We might want to
> specify that Gemini URLs should not contain query parts except in
> requests initiated by 1x responses. Or that the query part should never
> be significant in a published URL; that it should be a pre-filled
> suggestion that can be modified by the client.

Here's the example that I was thinking about when I wrote this question. I have
a gopher site that acts as a search engine for drink recipes [1]. Users can
filter their search results based on predefined tags, or they can search by
keyword using a gopher menu "7" query.

After adding some search filters, your path might look like this*:

    gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1

And then if you submit a search to that path, it will append the "q=" param to
the existing filters:

    gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1&q=irish

I'm trying to think of how I could re-implement this type of application in
gemini. And I can't really come up with a good solution if we assume that user
input clears the current query params.

This is admittedly a fringe case and I haven't seen anybody else do this type
of advanced search filtering in gopher. But it's fairly common to use query
params like this in HTTP when designing REST-like endpoints. For example, I
might have a link at the top of my search results that sets "?limit=20" to
control how many items are returned. I would want the user to be able to submit
a new search string while preserving their previous choice for the limit.
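
The append-and-preserve behaviour described above can be sketched in Python
with the standard urllib.parse helpers. This illustrates the idea only and is
not the code running on the gopher site; the function name and the choice to
drop any earlier "q" value are assumptions.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit

def append_search(url: str, term: str, key: str = "q") -> str:
    """Add key=term to the URL's query while keeping the other parameters."""
    parts = urlsplit(url)
    # Keep existing filters (repeated keys included), minus any earlier search.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != key]
    params.append((key, term))
    # quote_via=quote encodes spaces as %20 rather than "+".
    return parts._replace(query=urlencode(params, quote_via=quote)).geturl()
```

Called with the filtered URL above and the term "irish", this yields the same
URL with "&q=irish" appended.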


incorrectly treat everything after the "?" as a search query, or they strip it
off completely (I'm looking at you, lynx!) [2].

[1] gopher://mozz.us:7003/1/search
[2] https://www.w3.org/Addressing/URL/4_1_Gopher+.html

- mozz

9. Sean Conner (sean (a) conman.org)

It was thus said that the Great Michael Lazar once stated:
> 
> Here's the example that I was thinking about when I wrote this question. I have
> a gopher site that acts as a search engine for drink recipes [1]. Users can
> filter their search results based on predefined tags, or they can search by
> keyword using a gopher menu "7" query.
> 
> After adding some search filters, your path might look like this*:
> 
>     gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1

  Wait a second ... I thought gopher URLs don't support the query type. 
It's not mentioned at all in RFC-4266 (and there are no updates as far as I
can see).

> And then if you submit a search to that path, it will append the "q=" param to
> the existing filters:
> 
>     gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1&q=irish
> 
> I'm trying to think of how I could re-implement this type of application in
> gemini. And I can't really come up with a good solution if we assume that user
> input clears the current query params.

  Add on to the path component?  I did a very short version of that for my
guessing game:

	gemini://gemini.conman.org/hilo/

  -spc

10. Michael Lazar (lazar.michael22 (a) gmail.com)

On Wed, Aug 28, 2019 at 11:59 AM Sean Conner <sean at conman.org> wrote:
>   Wait a second ... I thought gopher URLs don't support the query type.
> It's not mentioned at all in RFC-4266 (and there are no updates as far as I
> can see).

From RFC-4266:

   <selector> is the Gopher selector string.  In the Gopher protocol,
   Gopher selector strings are a sequence of octets that may contain any
   octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
   (US-ASCII character LF), and 0D (US-ASCII character CR).

Aside from those 3 reserved octets, you can stick anything you want in the
selector part of the URL. Query strings don't have any special meaning to
gopher, they're just additional bytes in the selector that get sent to the
server.
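
The rule quoted from RFC-4266 is easy to check mechanically. A minimal Python
sketch with hypothetical names, which tests the three forbidden octets only
and does nothing about URL-level encoding:

```python
# Octets a Gopher selector may not contain: HT (0x09), LF (0x0A), CR (0x0D).
FORBIDDEN_OCTETS = frozenset(b"\t\n\r")

def is_valid_selector(selector: bytes) -> bool:
    """Return True if the selector contains none of the forbidden octets."""
    return not any(octet in FORBIDDEN_OCTETS for octet in selector)
```

By this rule "?", "&" and "=" are ordinary selector bytes, which is why the
search URL above is acceptable.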

>> And then if you submit a search to that path, it will append the "q=" param to
>> the existing filters:
>>
>>     gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1&q=irish
>>
>> I'm trying to think of how I could re-implement this type of application in
>> gemini. And I can't really come up with a good solution if we assume that user
>> input clears the current query params.
>
>  Add on to the path component?  I did a very short version of that for my
> guessing game:

This would lead to a bunch of different paths that point to the same resource:

    /search/tag_37/tag_5/image_1/
    /search/tag_5/tag_37/image_1/
    /search/image_1/tag_5/tag_37/
    ...

Of course, I could write my server to be able to understand and parse these
path components. But it throws the whole "path as a hierarchy" paradigm out
the window. I can only assume that this issue is why query param key-value
pairs were invented in the first place.

- mozz

11. julienXX (julien (a) sideburns.eu)

>   [..] I did a very short version of that for my
> guessing game:
>
>     gemini://gemini.conman.org/hilo/
>
>   -spc
>
I can't get your guessing game working in either Asuka or AV-98. When issuing
requests like gemini://gemini.conman.org/hilo/1058?query=32 I still get back a
status 10 code with the "Guess..." prompt. Is it supposed to be working?

12. Sean Conner (sean (a) conman.org)

It was thus said that the Great Michael Lazar once stated:
> On Wed, Aug 28, 2019 at 11:59 AM Sean Conner <sean at conman.org> wrote:
> >   Wait a second ... I thought gopher URLs don't support the query type.
> > It's not mentioned at all in RFC-4266 (and there are no updates as far as I
> > can see).
> 
> From RFC-4266:
> 
>    <selector> is the Gopher selector string.  In the Gopher protocol,
>    Gopher selector strings are a sequence of octets that may contain any
>    octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
>    (US-ASCII character LF), and 0D (US-ASCII character CR).
> 
> Aside from those 3 reserved octets, you can stick anything you want in the
> selector part of the URL. Query strings don't have any special meaning to
> gopher, they're just additional bytes in the selector that get sent to the
> server.

  I did try your site with my gopher client and it worked fine.  I'm just
not used to seeing query strings with gopher URLs.

> >  Add on to the path component?  I did a very short version of that for my
> > guessing game:
> 
> This would lead to a bunch of different paths that point to the same resource:
> 
>     /search/tag_37/tag_5/image_1/
>     /search/tag_5/tag_37/image_1/
>     /search/image_1/tag_5/tag_37/
>     ...

  Unless you maintain them in some canonical ordering (say, tags in sort
order, then images in sort order, etc.).  Might be more problem than it's
worth, but it is a solution.
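
  A minimal Python sketch of such a canonical ordering, for illustration.  It
assumes filters arrive as (name, value) pairs with integer values and sorts by
name, then by value; any fixed order would do, and the function name is
hypothetical.

```python
from typing import Iterable, Tuple

def canonical_search_path(filters: Iterable[Tuple[str, int]]) -> str:
    """Build one path for a set of filters, whatever order they arrive in."""
    # set() drops duplicates; sorted() orders by name, then numeric value.
    ordered = sorted(set(filters))
    return "/search/" + "".join(f"{name}_{value}/" for name, value in ordered)
```

  With this ordering every permutation of tag 37, tag 5 and image 1 maps to
/search/image_1/tag_5/tag_37/, so a server could redirect the other spellings
to that one path.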

  The path of a URL has nothing to do with a filesystem---it's just a
convention.  The link gemini://gemini.conman.org/qotd is not a file, nor a
directory.  Nor is gemini://gemini.conman.org/bible/ a directory (or a file
for that matter).  

> Of course, I could write my server to be able to understand and parse these
> path components. But it throws the whole "path as a hierarchy" paradigm out
> the window. I can only assume that this issue is why query param key-value
> pairs were invented in the first place.

  The path is an abstract reference to a resource.  It's only hierarchical in
your mind.  There is no spoon.

  -spc

13. Sean Conner (sean (a) conman.org)

It was thus said that the Great julienXX once stated:
> >   [..] I did a very short version of that for my
> > guessing game:
> >
> >     gemini://gemini.conman.org/hilo/
> >
> >   -spc
> >
> I can't get your guessing game working in either Asuka or AV-98. When issuing
> requests like gemini://gemini.conman.org/hilo/1058?query=32 I still get back a
> status 10 code with the "Guess..." prompt. Is it supposed to be working?

  I think this is the request your client (not that you wrote the client,
but it's the one you are using) is sending:

	gemini://gemini.conman.org/hilo/1058?query=32

  There's nothing in the spec that says the name is "query".  It should be

	gemini://gemini.conman.org/hilo/1058?32

  Or at least, that's what I'm expecting.  I'm thinking this part of the
spec needs a bit of clarification.
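
  For client authors, a minimal Python sketch of the request described above:
the user's input becomes the entire query, with no variable name.  Replacing
any query already on the URL is an assumption, since the thread leaves that
question open, and the function name is hypothetical.

```python
from urllib.parse import quote, urlsplit

def build_input_request(prompt_url: str, user_input: str) -> str:
    """Return the URL to request after the user answers a status 10 prompt."""
    parts = urlsplit(prompt_url)
    # Assumption: an existing query is replaced rather than extended.
    encoded = quote(user_input, safe="")
    return parts._replace(query=encoded, fragment="").geturl()
```

  For the guessing game, build_input_request() with the /hilo/1058 URL and the
input "32" gives gemini://gemini.conman.org/hilo/1058?32.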

  -spc

14. Michael Lazar (lazar.michael22 (a) gmail.com)

On Wed, Aug 28, 2019 at 1:16 PM Sean Conner <sean at conman.org> wrote:
> > Of course, I could write my server to be able to understand and parse these
> > path components. But it throws the whole "path as a hierarchy" paradigm out
> > the window. I can only assume that this issue is why query param key-value
> > pairs were invented in the first place.
>
>   The path is an abstract reference to a resource.  It's only hierarchical in
> your mind.  There is no spoon.
>
>   -spc

I used to think that way too, but then my gemini proxy got pwned by your torture
test :-) Then I discovered that I had to deal with splitting path components,
concatenating relative paths, resolving dot-segments, and a bunch of other
hierarchical junk that I didn't consider before and am probably still getting
wrong now.

All hail the spoon

- mozz

15. Sean Conner (sean (a) conman.org)

It was thus said that the Great Michael Lazar once stated:
> On Wed, Aug 28, 2019 at 1:16 PM Sean Conner <sean at conman.org> wrote:
> > > Of course, I could write my server to be able to understand and parse these
> > > path components. But it throws the whole "path as a hierarchy" paradigm out
> > > the window. I can only assume that this issue is why query param key-value
> > > pairs were invented in the first place.
> >
> >   The path is an abstract reference to a resource.  It's only hierarchical in
> > your mind.  There is no spoon.
> >
> >   -spc
> 
> I used to think that way too, but then my gemini proxy got pwned by your torture
> test :-) 

  Heh.  Sorry about that.

> Then I discovered that I had to deal with splitting path components,
> concatenating relative paths, resolving dot-segments, and a bunch of other
> hierarchical junk that I didn't consider before and am probably still
> getting wrong now.

  Well, the resolving dot-segments algorithm is mentioned in RFC-3986,
section 5.2.4, and the URL merging algorithm is section 5.2.2.  Implement
those (and there are test cases in the RFC) and you should be okay.
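
  As a starting point, here is a Python sketch of the dot-segment removal steps
from RFC-3986 section 5.2.4.  It follows the RFC's steps A through E and is
checked only against the two examples the RFC gives for that section; the
merge step from 5.2.2 is not included.

```python
def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments as described in RFC 3986, section 5.2.4."""
    output = []  # completed segments, each with its leading "/" if it had one
    while path:
        if path.startswith("../"):                 # step A
            path = path[3:]
        elif path.startswith("./"):                # step A
            path = path[2:]
        elif path.startswith("/./"):               # step B: "/./" becomes "/"
            path = path[2:]
        elif path == "/.":                         # step B
            path = "/"
        elif path.startswith("/../"):              # step C: drop last segment
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":                        # step C
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):                  # step D
            path = ""
        else:                                      # step E: move one segment
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)
```

  The RFC's own examples, "/a/b/c/./../../g" and "mid/content=5/../6", come out
as "/a/g" and "mid/6".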

> All hail the spoon

  -spc (Spoooooooooooooooooooooon!)

16. julienXX (julien (a) sideburns.eu)


>    I think this is the request your client (not that you wrote the client,
> but it's the one you are using) is sending:
>
> 	gemini://gemini.conman.org/hilo/1058?query=32
>
>    There's nothing in the spec that says the name is "query".  It should be
>
> 	gemini://gemini.conman.org/hilo/1058?32
>
>    Or at least, that's what I'm expecting.  I'm thinking this part of the
> spec needs a bit of clarification.
>
>    -spc
>
>
Oh my bad, I did not remember I had this keyword in my code.
