Hi all. I'm looking into implementing a few endpoints on my server that accept user input, and I ended up with a few ambiguities about the gemini specification that I would like to discuss: ``` The requested resource accepts a line of textual user input. The <META> line is a prompt which should be displayed to the user. The same resource should then be requested again with the user's input included as a query component. Queries are included in requests as per the usual generic URL definition in RFC3986, i.e. separated from the path by a ?. There is no response body. ``` Here are my questions: 1. Should the query component be formatted as a "key=value" parameter? Or should it be added directly as the entire query component? A. "gemini://hostname.com/input?q=AbrahamLincoln" vs. B. "gemini://hostname.com/input?AbrahamLincoln" 2. Should the query component allow percent-sign escaping? If so, which characters should be escaped? A. "gemini://hostname.com/input?Hello%20world" vs. B. "gemini://hostname.com/input?Hello world" 3. Should a server be allowed to link to a URL with the user input pre-filled? E.g. Should this link, if placed in a text/gemini file, mean the same thing as the user manually typing in "Hello World"? A. "=>/input?Hello%20world" This also brings up a point that the above link would be impossible to define in a text/gemini file if percent-escaped spaces were not allowed. 4. If my server has an endpoint that does not request user input, can I re-purpose the query section for my own needs? A. "gemini://hostname.com/items?page=2&limit=20" 5. In the above example, what happens if a request to that URL returns a status code of 10? Should the client strip the existing query components from the URL, or append a new key=value pair to the end? 6. What widget should the client use to display the input prompt? A single line input, or a multi-line text box? Should newline characters even be allowed? 7. Should there be a maximum input length? 
Currently it is implicitly defined as 1024 bytes minus the length of the URL. Personally, I have mixed feelings about gemini enabling user input in the first place. I know that gopher supports it using the "7" item type, but I have only seen a couple of compelling use cases for this in the wild. On the other hand, it opens up a Pandora's box of complexity by allowing creative developers to port over many features from HTTP. I can already envision XSS style attacks. For example, say a gemini page requests user input that will be rendered in a public text/gemini page like a guestbook. The bad-actor submits a message that contains something like: "\n=>http://malicious.url\thttp://innocent.url". Servers will need to be diligent in sanitizing their inputs, which kind of sucks for what should be a simple protocol. - mozz
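The guestbook attack described above depends on a newline surviving into the rendered text/gemini page. Below is a minimal Python sketch of one way a server might neutralize it. The function name `sanitize_guestbook_entry` is hypothetical (it is not taken from any existing Gemini server), and the sketch assumes each entry is rendered on a line of its own.

```python
def sanitize_guestbook_entry(message: str) -> str:
    """Collapse untrusted input into a single, non-link text/gemini line."""
    # Replace every C0 control character and DEL (newline, tab, ESC, ...)
    # with a space so the input cannot start a new line in the page.
    cleaned = "".join(
        " " if (ord(ch) < 0x20 or ord(ch) == 0x7F) else ch for ch in message
    )
    # Collapse runs of whitespace (this also catches Unicode line separators).
    cleaned = " ".join(cleaned.split())
    # Assumption: a line is only a link when it *begins* with "=>", so a
    # leading space is enough to defuse an entry that starts with one.
    if cleaned.startswith("=>"):
        cleaned = " " + cleaned
    return cleaned
```

With this in place the malicious message ends up as ordinary text in the middle of a line rather than as a separate link line.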
It was thus said that the Great Michael Lazar once stated: > Hi all. I'm looking into implementing a few endpoints on my server that accept > user input, and I ended up with a few ambiguities about the gemini specification > that I would like to discuss: > > ``` > The requested resource accepts a line of textual user input. > The <META> line is a prompt which should be displayed to the > user. The same resource should then be requested again with > the user's input included as a query component. Queries are > included in requests as per the usual generic URL definition > in RFC3986, i.e. separated from the path by a ?. There is no > response body. > ``` I think that should read "there is no request body"---the response can include content, dependent upon the status code. > Here are my questions: > > 1. Should the query component be formatted as a "key=value" parameter? > Or should it be added directly as the entire query component? > > A. "gemini://hostname.com/input?q=AbrahamLincoln" > vs. > B. "gemini://hostname.com/input?AbrahamLincoln" B, unless there's a way to designate the variable name (which I did suggest to solderpunk---perhaps I should float it here). > 2. Should the query component allow percent-sign escaping? > If so, which characters should be escaped? > > A. "gemini://hostname.com/input?Hello%20world" > vs. > B. "gemini://hostname.com/input?Hello world" The following characters in the query should be escaped: SPACE # % < > [ \ ] ^ { | } " and unless you are sending name/value pairs, = & should also be escaped (this per RFC-3986). > 3. Should a server be allowed to link to a URL with the user input pre-filled? > E.g. Should this link, if placed in a text/gemini file, mean the same thing > as the user manually typing in "Hello World"? > > A. "=>/input?Hello%20world" I don't see why not. gemini://gemini.conman.org/cgi?Hello%20World (this link works, by the way---although it's just a sample CGI script) > 4. 
If my server has an endpoint that does not request user input, can I > re-purpose the query section for my own needs? > > A. "gemini://hostname.com/items?page=2&limit=20" Again, I don't see why not. The query string is part of a URL, and clients send URLs so this should be an issue client side. What the server side does with the query is up to the server. > 5. In the above example, what happens if a request to that URL returns a status > code of 10? Should the client strip the existing query components from the > URL, or append a new key=value pair to the end? That's a good question and one I do not have an answer to. > 6. What widget should the client use to display the input prompt? A single line > input, or a multi-line text box? Should newline characters even be allowed? It would depend upon the client. I think the expectation is a single line input, but I can see a multi-line box being useful as well. > 7. Should there be a maximum input length? Currently it is implicitly defined > as 1024 bytes minus the length of the URL. > > Personally, I have mixed feelings about gemini enabling user input in the first > place. I know that gopher supports it using the "7" item type, but I have only > seen a couple of compelling use cases for this in the wild. On the other hand, > it opens up a Pandora's box of complexity by allowing creative developers to > port over many features from HTTP. True, and I've moved quite a few of my web-based projects to both Gemini [1] and Gopher. > I can already envision XSS style attacks. For example, say a gemini page > requests user input that will be rendered in a public text/gemini page like a > guestbook. The bad-actor submits a message that contains something like: > > "\n=>http://malicious.url\thttp://innocent.url". > > Servers will need to be diligent in sanitizing their inputs, which kind of sucks > for what should be a simple protocol. 
Experiment---point your favorite terminal based gopher client here: gopher://verisimilitudes.net/02019-08-18.ecma-48&utf-8 or gopher://verisimilitudes.net/02019-08-18.ecma-48%26utf-8 This page directly embeds ECMA-48 escape sequences (aka ANSI escape codes) in the text---what does your favorite terminal based gopher client do? Does it send the raw codes directly to the terminal? Does it filter them out entirely? Does it show them? [2] Also, there's nothing that says a server *has* to support user input. -spc [1] I even created a CGI interface to my Gemini server, and it can even translate HTTP status codes to Gemini ones. [2] My own gopher client (not yet published) does filter out escape sequences and not because of this page---it was a deliberate design choice I made when writing it after doing a very deep dive into ECMA-48 in the past year or two.
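For readers wondering what "filter them out entirely" could look like in a client, here is a simplified Python sketch. It is not Sean's implementation (his client is unpublished); the name `strip_escape_sequences` and the exact patterns are assumptions. It handles CSI sequences and simple two-part escapes only, and does not fully parse string-type sequences such as OSC.

```python
import re

# CSI sequences: ESC [ (or the 8-bit CSI, 0x9B), parameter bytes 0x30-0x3F,
# intermediate bytes 0x20-0x2F, then one final byte 0x40-0x7E.
_CSI = re.compile(r"(?:\x1b\[|\x9b)[0-?]*[ -/]*[@-~]")
# Other escape sequences: ESC, optional intermediates, one final byte.
_ESC = re.compile(r"\x1b[ -/]*[0-~]")
# Remaining control characters, keeping tab, LF and CR.
_CTRL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_escape_sequences(text: str) -> str:
    """Remove ECMA-48 escape sequences before writing text to a terminal."""
    text = _CSI.sub("", text)
    text = _ESC.sub("", text)
    return _CTRL.sub("", text)
```

The order matters: CSI sequences are removed first so that the more general escape pattern does not consume only their first two bytes.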
> I think that should read "there is no request body"---the response can > include content, dependent upon the status code. Once the user *submits* their input, the response to *that* request can of course have any status code and a possible response body, but in that part of the spec (maybe it's unclear and needs changing) I'm talking about the response with code 10, which shouldn't have a body, as it's just delivering a prompt and the information that this resource wants an input. > > A. "gemini://hostname.com/input?q=AbrahamLincoln" > > vs. > > B. "gemini://hostname.com/input?AbrahamLincoln" > > B, unless there's a way to designate the variable name (which I did > suggest to solderpunk---perhaps I should float it here). What I had in mind, and what I think most implementations do so far, is indeed B. It's possible I could be convinced otherwise, but I don't really see the value in speccing a fixed variable name like `q`. It's perfectly cromulent according to the URL RFC to treat the query as a string. The key=value pair syntax is common in the web world mostly as a way to make HTML forms work, and we don't have forms. I *did* consider a 1x status code whose <META> was some kind of machine-readable description of a form, but that doesn't degrade nicely in simple clients which ignore the second status digit, as then the human user has to imagine the form in their head and type a suitable response. > The following characters in the query should be escaped: > > SPACE # % < > [ \ ] ^ { | } " > > and unless you are sending name/value pairs, > > = & > > should also be escaped (this per RFC-3986). Yes, this is right. Remember, Gemini requests are URLs, and we don't make the rules for URLs, we (hopefully!) follow them. > > A. "=>/input?Hello%20world" > > I don't see why not. > > gemini://gemini.conman.org/cgi?Hello%20World > I don't see why not either. > > A. "gemini://hostname.com/items?page=2&limit=20" > > Again, I don't see why not. 
The query string is part of a URL, and > clients send URLs so this should be an issue client side. What the server > side does with the query is up to the server. Agreed. > > 5. In the above example, what happens if a request to that URL returns a status > > code of 10? Should the client strip the existing query components from the > > URL, or append a new key=value pair to the end? Hmm. If a client requests the URL above, it should include the query string in the request. So why would the server respond with a status 10 in that case? I mean, it's currently not prohibited in the spec for a server to do that, so this is a fair question. I'm not sure whether we *should* forbid it or spec some sensible client response. Did you have a use case in mind or are you just keeping an eye out for edge cases?
solderpunk writes: >> > 5. In the above example, what happens if a request to that URL >> > returns a status code of 10? Should the client strip the existing >> > query components from the URL, or append a new key=value pair to >> > the end? > > Hmm. If a client requests the URL above, it should include the query > string in the request. So why would the server respond with a status > 10 in that case? I mean, it's currently not prohibited in the spec for > a server to do that, so this is a fair question. I'm not sure whether > we *should* forbid it or spec some sensible client response. Did you > have a use case in mind or are you just keeping an eye out for edge > cases? I can imagine a client asking successive questions in response to answers. But I'm not sure what the client should do. We might want to specify that Gemini URLs should not contain query parts except in requests initiated by 1x responses. Or that the query part should never be significant in a published URL; that it should be a pre-filled suggestion that can be modified by the client. -- Jason McBrayer | "Strange is the night where black stars rise, jmcbray at carcosa.net | and strange moons circle through the skies, | but stranger still is lost Carcosa." | - Robert W. Chambers, The King in Yellow
It was thus said that the Great Jason McBrayer once stated: > We might want to specify that Gemini URLs should not contain query parts > except in requests initiated by 1x responses. What if I want to bookmark some search results? > Or that the query part should never be significant in a published URL; See above. > that it should be a pre-filled suggestion that can be modified by the > client. One interesting aspect of Apache is that a generated index page accepts a query string to sort the output. "C=N" sorts by name; "C=S" sorts by size (then by name); "O=A" sorts ascending; "O=D" sorts descending. I would hate to lose the ability to do that (not that I use it now). Again, a server doesn't have to support that, but it could. -spc (Or am I being evil for trying to bring too much of the web into Gemini?)
Sean Conner writes: > -spc (Or am I being evil for trying to bring too much of the web into > Gemini?) No, bookmarking search results is a valid use-case. Maybe servers should not be allowed to return a 1x response to a request with a query string? -- +----------------------------------------------------------------------+ | Jason F. McBrayer jmcbray at carcosa.net | | The scalloped tatters of the King in Yellow must hide Yhtill forever.|
It was thus said that the Great Jason McBrayer once stated: > > Sean Conner writes: > > -spc (Or am I being evil for trying to bring too much of the web into > > Gemini?) > > No, bookmarking search results is a valid use-case. > > Maybe servers should not be allowed to return a 1x response to a request > with a query string? => gemini://gemini.conman.org/hilo/ A Simple Guessing Game -spc
Jason McBrayer wrote: > solderpunk writes: > >> > 5. In the above example, what happens if a request to that URL > >> > returns a status code of 10? Should the client strip the existing > >> > query components from the URL, or append a new key=value pair to > >> > the end? > > > > Hmm. If a client requests the URL above, it should include the query > > string in the request. So why would the server respond with a status > > 10 in that case? I mean, it's currently not prohibited in the spec for > > a server to do that, so this is a fair question. I'm not sure whether > > we *should* forbid it or spec some sensible client response. Did you > > have a use case in mind or are you just keeping an eye out for edge > > cases? > > I can imagine a client asking successive questions in response to > answers. But I'm not sure what the client should do. We might want to > specify that Gemini URLs should not contain query parts except in > requests initiated by 1x responses. Or that the query part should never > be significant in a published URL; that it should be a pre-filled > suggestion that can be modified by the client. Here's the example that I was thinking about when I wrote this question. I have a gopher site that acts as a search engine for drink recipes [1]. Users can filter their search results based on predefined tags, or they can search by keyword using a gopher menu "7" query. After adding some search filters, your path might look like this*: gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1 And then if you submit a search to that path, it will append the "q=" param to the existing filters: gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1&q=irish I'm trying to think of how I could re-implement this type of application in gemini. And I can't really come up with a good solution if we assume that user input clears the current query params. This is admittedly a fringe case and I haven't seen anybody else do this type of advanced search filtering in gopher. 
But it's fairly common to use query params like this in HTTP when designing REST-like endpoints. For example, I might have a link at the top of my search results that sets "?limit=20" to control how many items are returned. I would want the user to be able to submit a new search string while preserving their previous choice for the limit.
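The open question here is what a client does with an existing query string when a status 10 response arrives. The two candidate behaviours can be sketched in Python as follows. Both function names are hypothetical, and the default key `q` is only an assumption borrowed from the gopher example above; nothing in the spec as quoted names a key.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def replace_query(url: str, user_input: str) -> str:
    """Behaviour 1: the user's input becomes the entire query component."""
    parts = urlsplit(url)
    query = quote(user_input, safe="")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def append_query(url: str, user_input: str, key: str = "q") -> str:
    """Behaviour 2: keep existing parameters and append a key=value pair."""
    parts = urlsplit(url)
    pair = f"{key}={quote(user_input, safe='')}"
    query = f"{parts.query}&{pair}" if parts.query else pair
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

Under the first behaviour the `page` and `limit` filters are lost when the user submits a search; under the second they are preserved, which is what the drink-recipe search relies on.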
It was thus said that the Great Michael Lazar once stated: > > Here's the example that I was thinking about when I wrote this question. I have > a gopher site that acts as a search engine for drink recipes [1]. Users can > filter their search results based on predefined tags, or they can search by > keyword using a gopher menu "7" query. > > After adding some search filters, your path might look like this*: > > gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1 Wait a second ... I thought gopher URLs don't support the query type. It's not mentioned at all in RFC-4266 (and there are no updates as far as I can see). > And then if you submit a search to that path, it will append the "q=" param to > the existing filters: > > gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1&q=irish > > I'm trying to think of how I could re-implement this type of application in > gemini. And I can't really come up with a good solution if we assume that user > input clears the current query params. Add on to the path component? I did a very short version of that for my guessing game: gemini://gemini.conman.org/hilo/ -spc
On Wed, Aug 28, 2019 at 11:59 AM Sean Conner <sean at conman.org> wrote: > Wait a second ... I thought gopher URLs don't support the query type. > It's not mentioned at all in RFC-4266 (and there are no updates as far as I > can see). From RFC-4266: <selector> is the Gopher selector string. In the Gopher protocol, Gopher selector strings are a sequence of octets that may contain any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal (US-ASCII character LF), and 0D (US-ASCII character CR). Aside from those 3 reserved octets, you can stick anything you want in the selector part of the URL. Query strings don't have any special meaning to gopher, they're just additional bytes in the selector that get sent to the server. >> And then if you submit a search to that path, it will append the "q=" param to >> the existing filters: >> >> gopher://mozz.us:7003/1/search?tag=37&tag=5&image=1&q=irish >> >> I'm trying to think of how I could re-implement this type of application in >> gemini. And I can't really come up with a good solution if we assume that user >> input clears the current query params. > > Add on to the path component? I did a very short version of that for my > guessing game: This would lead to a bunch of different paths that point to the same resource: /search/tag_37/tag_5/image_1/ /search/tag_5/tag_37/image_1/ /search/image_1/tag_5/tag_37/ ... Of course, I could write my server to be able to understand and parse these path components. But it throws the whole "path as a hierarchy" paradigm out the window. I can only assume that this issue is why query param key-value pairs were invented in the first place. - mozz
> [..] I did a very short version of that for my > guessing game: > > gemini://gemini.conman.org/hilo/ > > -spc I can't get your guessing game working in either Asuka or AV-98. When issuing requests like gemini://gemini.conman.org/hilo/1058?query=32 I still get back a status 10 code with the "Guess..." prompt. Is it supposed to be working?
It was thus said that the Great Michael Lazar once stated: > On Wed, Aug 28, 2019 at 11:59 AM Sean Conner <sean at conman.org> wrote: > > Wait a second ... I thought gopher URLs don't support the query type. > > It's not mentioned at all in RFC-4266 (and there are no updates as far as I > > can see). > > From RFC-4266: > > <selector> is the Gopher selector string. In the Gopher protocol, > Gopher selector strings are a sequence of octets that may contain any > octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal > (US-ASCII character LF), and 0D (US-ASCII character CR). > > Aside from those 3 reserved octets, you can stick anything you want in the > selector part of the URL. Query strings don't have any special meaning to > gopher, they're just additional bytes in the selector that get sent to the > server. I did try your site with my gopher client and it worked fine. I'm just not used to seeing query strings with gopher URLs. > > Add on to the path component? I did a very short version of that for my > > guessing game: > > This would lead to a bunch of different paths that point to the same resource: > > /search/tag_37/tag_5/image_1/ > /search/tag_5/tag_37/image_1/ > /search/image_1/tag_5/tag_37/ > ... Unless you maintain them in some canonical ordering (say, tags in sort order, then images in sort order, etc.). Might be more problem than it's worth, but it is a solution. The path of a URL has nothing to do with a filesystem---it's just a convention. The link gemini://gemini.conman.org/qotd is not a file, nor a directory. Nor is gemini://gemini.conman.org/bible/ a directory (or a file for that matter). > Of course, I could write my server to be able to understand and parse these > path components. But it throws the whole "path as a hierarchy" paradigm out > the window. I can only assume that this issue is why query param key-value > pairs were invented in the first place. The path is an abstract reference to a resource. 
It's only hierarchical in your mind. There is no spoon. -spc
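The canonical-ordering idea (tags in sort order, then images in sort order) can be sketched in a few lines of Python. The `tag_37` segment naming follows mozz's example paths; the function name and the `NAME_ORDER` table are assumptions made for illustration.

```python
# Hypothetical fixed ordering of filter kinds: all tags first, then images.
NAME_ORDER = {"tag": 0, "image": 1}


def canonical_search_path(filters: list[tuple[str, int]]) -> str:
    """Build one canonical path for a set of (name, value) search filters."""
    # Deduplicate, then sort by filter kind and by numeric value within a kind.
    ordered = sorted(set(filters), key=lambda f: (NAME_ORDER[f[0]], f[1]))
    return "/search/" + "".join(f"{name}_{value}/" for name, value in ordered)
```

A server using this approach could redirect any non-canonical ordering to the canonical path, so that every permutation of the same filters resolves to a single URL.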
It was thus said that the Great julienXX once stated: > > [..] I did a very short version of that for my > >guessing game: > > > > gemini://gemini.conman.org/hilo/ > > > > -spc > > > I can't get your guessing game working in either Asuka or AV-98. When > issuing requests > like gemini://gemini.conman.org/hilo/1058?query=32 I still get back a > status 10 code with > the "Guess..." prompt. Is it supposed to be working? I think this is the request your client (not that you wrote the client, but it's the one you are using) is sending: gemini://gemini.conman.org/hilo/1058?query=32 There's nothing in the spec that says the name is "query". It should be gemini://gemini.conman.org/hilo/1058?32 Or at least, that's what I'm expecting. I'm thinking this part of the spec needs a bit of clarification. -spc
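To make the expected request concrete, here is a hedged Python sketch of how a client might build the follow-up URL after a status 10 response: the input is percent-encoded and used as the whole query, with no variable name. The function name is hypothetical, and the 1024-byte limit is the figure mentioned earlier in the thread. `quote(..., safe="")` escapes more characters than the list given earlier strictly requires, which is still valid under RFC 3986.

```python
from urllib.parse import quote, urlsplit, urlunsplit

MAX_URL_BYTES = 1024  # request URL limit referred to earlier in the thread


def input_request_url(prompt_url: str, user_input: str) -> str:
    """Return the URL to request after the user answers a status 10 prompt."""
    parts = urlsplit(prompt_url)
    # The raw input is the entire query component: "?32", not "?query=32".
    query = quote(user_input, safe="")
    url = urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
    if len(url.encode("utf-8")) > MAX_URL_BYTES:
        raise ValueError("input too long for a single request URL")
    return url
```

For the guessing game, `input_request_url("gemini://gemini.conman.org/hilo/1058", "32")` yields the `?32` form that the server expects.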
On Wed, Aug 28, 2019 at 1:16 PM Sean Conner <sean at conman.org> wrote: > > Of course, I could write my server to be able to understand and parse these > > path components. But it throws the whole "path as a hierarchy" paradigm out > > the window. I can only assume that this issue is why query param key-value > > pairs were invented in the first place. > > The path is an abstract reference to a resource. It's only hierarchical in > your mind. There is no spoon. > > -spc I used to think that way too, but then my gemini proxy got pwned by your torture test :-) Then I discovered that I had to deal with splitting path components, concatenating relative paths, resolving dot-segments, and a bunch of other hierarchical junk that I didn't consider before and am probably still getting wrong now. All hail the spoon - mozz
It was thus said that the Great Michael Lazar once stated: > On Wed, Aug 28, 2019 at 1:16 PM Sean Conner <sean at conman.org> wrote: > > > Of course, I could write my server to be able to understand and parse these > > > path components. But it throws the whole "path as a hierarchy" paradigm out > > > the window. I can only assume that this issue is why query param key-value > > > pairs were invented in the first place. > > > > The path is an abstract reference to a resource. It's only hierarchical in > > your mind. There is no spoon. > > > > -spc > > I used to think that way too, but then my gemini proxy got pwned by your torture > test :-) Heh. Sorry about that. > Then I discovered that I had to deal with splitting path components, > concatenating relative paths, resolving dot-segments, and a bunch of other > hierarchical junk that I didn't consider before and am probably still > getting wrong now. Well, the resolving dot-segments algorithm is mentioned in RFC-3986, section 5.2.4, and the URL merging algorithm is section 5.2.2. Implement those (and there are test cases in the RFC) and you should be okay. > All hail the spoon -spc (Spoooooooooooooooooooooon!)
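As an illustration of the dot-segment step, here is a Python sketch that follows the steps of RFC 3986 section 5.2.4 as I read them. It is written for this thread rather than taken from any existing Gemini client or proxy, and it covers only dot-segment removal, not the full reference resolution of section 5.2.2.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a URL path (RFC 3986, 5.2.4)."""
    output: list[str] = []
    while path:
        if path.startswith("../"):          # step A
            path = path[3:]
        elif path.startswith("./"):         # step A
            path = path[2:]
        elif path.startswith("/./"):        # step B: "/./x" becomes "/x"
            path = path[2:]
        elif path == "/.":                  # step B
            path = "/"
        elif path.startswith("/../"):       # step C: drop the last output segment
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":                 # step C
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):           # step D
            path = ""
        else:                               # step E: move one segment to output
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)
```

The two worked examples from that section of the RFC, `/a/b/c/./../../g` and `mid/content=5/../6`, make convenient first test cases.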
> I think this is the request your client (not that you wrote the client, > but it's the one you are using) is sending: > > gemini://gemini.conman.org/hilo/1058?query=32 > > There's nothing in the spec that says the name is "query". It should be > > gemini://gemini.conman.org/hilo/1058?32 > > Or at least, that's what I'm expecting. I'm thinking this part of the > spec needs a bit of clarification. > > -spc Oh my bad, I did not remember I had this keyword in my code.
---