We already have good support for multiple encodings and (in the case of text/gemini) languages. However, two questions arise:

a) What character encoding is used for META parts intended for human consumption? TL;dr answer: UTF-8.

b) What language is used for those META parts, since the server does not know what languages are acceptable to the user? TL;dr answer: start with English, add other languages as necessary or useful.

Details:

BCP 18, IETF Policy on Character Sets and Languages <https://tools.ietf.org/html/bcp18>, says what a spec should say about character sets and languages. The MUSTard of this BCP is:

1) Specs MUST say which parts of the protocol are meant to be human-readable. The answer should be that the META of status lines 1x, 4x (except 44), 5x, and 6x are human-readable and everything else is part of the protocol.
2) Protocols MUST specify which character encoding is in use, and it MUST be possible for it to be UTF-8. Nailing that down for human-readable META text is what needs to be done. See (a).
3) Encodings that are used MUST be in the IANA registry. Because we are using media types, that happens already. No action needed.
4) Protocols MUST have a way (which can be a default) of communicating the encoding in use. Fixing (2) will fix this one also.
5) Protocols in which users have text presented to them MUST have a way of dealing with multiple languages. We have a problem here for 1x that isn't trivial to solve: what should a Russian search engine indexing both English and Russian documents return as the META to a 1x response? (6) is one approach.
6) Where there is no ability to negotiate languages (Gemini doesn't), then "i-default" language SHOULD be used. "i-default" text MUST be understandable to an English-speaking person, but MAY include text in other languages if appropriate (e.g. the languages of the capsule or server). See (b) and (6).
7) Protocols SHOULD use BCP 47 language tags to specify languages. We do.
8) Material on i18n SHOULD be collected into a special section so that it can be found by people concerned with i18n or L10n. That one's up to Solderpunk, though it will be necessary if the spec becomes one or more RFCs.
On Sun, Dec 27, 2020 at 02:40:42PM -0500, John Cowan wrote:

> b) What language is used for those META parts, since the server does not
> know what languages are acceptable to the user? TL;dr answer: start with
> English, add other languages as necessary or useful.

The best-case scenario, of course, is that everybody sees the human-readable META parts in their own language. The issue with that is either the client has to specify what language they expect it in, or the server has to provide it in every language it supports. Both are obviously flawed.

One counterproposal to this best-case scenario is that the response body being sent over (for successful requests) is also (probably) only in a single language. It would thus be natural to have the whole interface in that same language.

If the server offers the same file / page in different languages, they will have different URLs (most commonly <lang>.example.com/... or example.com/<lang>/...). In both of these cases, the server can easily recognize from the URL what language is expected and should provide an interface (including human-readable META text) in that same language. That would mean, for example, that the entire ://fr.example.com site should use a French interface.

We probably also want to disallow using example.com/...?lang=<lang> or anything similar, even if it's just in the Best Practices document.

It's the server's responsibility, but also their prerogative, to provide an interface in multiple languages. If they don't, and if the users of that server choose not to as well, then it is up to the client (and the user controlling it) to translate stuff. text/gemini's lang parameter helps here.

I think that this proposal resolves the general interface language issue. Have I missed anything?

~aravk | ~nothien
On Sun, Dec 27, 2020 at 3:06 PM Arav K. <nothien at uber.space> wrote:

> the server can easily recognize from the URL what language is
> expected and should provide an interface (including human-readable META
> text) in that same language. That would mean, for example, that the
> entire ://fr.example.com site should use a French interface.

And if you request gemini://example.com/la/non-exsistens.gmi and there is no support for Latin error messages, as there probably is not? Then what language should be used? With the exception of 1x responses, human-readable <META> reflects error situations, where by definition the server doesn't know what the user can or cannot understand.

> We probably also want to disallow using example.com/...?lang=<lang> or
> anything similar, even if it's just in the Best Practices document.

I have no idea why you would want to disallow that. Changes to the query string *are* changes to the URL, so that a particular language could be equally well indicated using the domain, the path, or the query, depending on the server's conventions.

> It's the server's responsibility, but also their prerogative, to provide
> an interface in multiple languages. If they don't, and if the users of
> that server choose not to as well, then it is up to the client (and the
> user controlling it) to translate stuff.

That's ideal, but it's a big burden on the client, which has to use something as general as Google Translate to convert the Russian error message being returned by the server to the Welsh expected by the user.

> text/gemini's lang parameter helps here.

Not really: again, we are talking about the language of error messages.

Another point is that people often google for the meaning of error messages, and that's made easier if they always look the same, or at least some part of them always looks the same.

John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
Where the wombat has walked, it will inevitably walk again.
(even through brick walls!)
It was thus said that the Great Arav K. once stated:
> On Sun, Dec 27, 2020 at 02:40:42PM -0500, John Cowan wrote:
> > b) What language is used for those META parts, since the server does not
> > know what languages are acceptable to the user? TL;dr answer: start with
> > English, add other languages as necessary or useful.
>
> The best-case scenario, of course, is that everybody sees the
> human-readable META parts in their own language. The issue with that is
> either the client has to specify what language they expect it in, or the
> server has to provide it in every language it supports. Both are
> obviously flawed.

Here's a list of response codes with the type of META information they use:

10 prompt, human text
11 prompt, human text
20 MIME type
30 URI
31 URI
40 error message, human text
41 error message, human text
42 error message, human text
43 error message, human text
44 SECONDS
50 error message, human text
51 error message, human text
52 error message, human text
53 error message, human text
59 error message, human text
60 error message, human text
61 error message, human text
62 error message, human text

The META types for the ranges 40-62 are a formality and can be safely ignored (I'm talking about the human text portion, not the actual status code) except for 44, which contains machine-usable data. My own server just spits out a generic text entry for each error code (the specific error is logged on my end---there's no need for me to send such info to the client).

It's really the META data for response codes 10 and 11 that need to be displayed directly to the user. How to deal with languages here is difficult.

-spc
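The list of response codes in the message above can be read as a lookup from status code to the kind of META it carries. What follows is only a minimal Python sketch of that reading: the names META_KIND, is_human_readable and must_display are invented here for illustration and do not come from the spec or from any Gemini library.

```python
# Hypothetical lookup built from the list in the message above.
META_KIND = {
    10: "prompt", 11: "prompt",
    20: "mime",
    30: "uri", 31: "uri",
    40: "error", 41: "error", 42: "error", 43: "error",
    44: "seconds",
    50: "error", 51: "error", 52: "error", 53: "error", 59: "error",
    60: "error", 61: "error", 62: "error",
}


def is_human_readable(status: int) -> bool:
    """True when META for this status is text meant for a person."""
    return META_KIND.get(status) in ("prompt", "error")


def must_display(status: int) -> bool:
    """True only for 1x prompts, the case the message singles out."""
    return META_KIND.get(status) == "prompt"
```

Under this reading, 44 is the one code in the 4x range whose META a client has to parse rather than show, and only 10 and 11 force the language question.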
> On Dec 27, 2020, at 23:57, Sean Conner <sean at conman.org> wrote:
>
> It's really the META data for response codes 10 and 11 that need to be
> displayed directly to the user. How to deal with languages here is
> difficult.

Could we prefix META with a language tag and call it a day?

10 EN Indica or Sativa?
It was thus said that the Great Petite Abeille once stated:
>
> > On Dec 27, 2020, at 23:57, Sean Conner <sean at conman.org> wrote:
> >
> > It's really the META data for response codes 10 and 11 that need to be
> > displayed directly to the user. How to deal with languages here is
> > difficult.
>
> Could we prefix META with a language tag and call it a day?
>
> 10 EN Indica or Sativa?

Potentially breaking change, and you see how people are reacting to the IRI/IDN threads.

-spc
> On Dec 28, 2020, at 00:13, Sean Conner <sean at conman.org> wrote:
>
> Potentially breaking change,

Not really. META is free form text. We could just structure it a bit by prefixing a language tag. No one gets hurt: it's just a display issue.
> On Dec 28, 2020, at 00:04, Petite Abeille <petite.abeille at gmail.com> wrote:
>
> Could we prefix META with a language tag and call it a day?

Alternatively/additionally, could we use the X.509 certificate structure to shoehorn in such information? There is a lot of free-form text in there...

Both in terms of language negotiation: META matches the client certificate, if any. And tagging: the server certificate advertises the META language.

A bit of a side-channel, but why not. Perhaps overdoing it though.
On Sun, Dec 27, 2020 at 6:04 PM Petite Abeille <petite.abeille at gmail.com> wrote:

> Could we prefix META with a language tag and call it a day?
>
> 10 EN Indica or Sativa?

I think that's the wrong way around. The server doesn't normally have to tell the client what language it's using (though there are obvious bad cases like "Chat?"). The problem is that the client can't tell the server what language the user would like to be prompted in.

Now that I think about it, though, that *can* be encoded in the URL readily enough, though not in a universal way. If a text/gemini file is in Greek, the textual part of a link line will also normally be Greek, in which case the URL should have "gr" in it someplace (assuming the server can handle it).
On Mon, Dec 28, 2020 at 03:04, cowan at ccil.org wrote:
> On Sun, Dec 27, 2020 at 6:04 PM Petite Abeille <petite.abeille at gmail.com>
> wrote:
>
> Now that I think about it, though, that *can* be encoded in the URL readily
> enough, though not in a universal way. If a text/gemini file is in Greek,
> the textual part of a link line will also normally be Greek, in which case
> the URL should have "gr" in it someplace (assuming the server can handle
> it).

I have a similar thought: I think we should somehow avoid input that is too generic. I mean, when the user is prompted for input, it's normally after having clicked on some link. Thus maybe we should think differently.

For an internationalized website, it's easy to imagine different sections, each one for a different language. Each of these pages will thus be in a specific language, maybe reflected in its URL. Then it's up to the CGIs running the site to be able, for a similar function, to serve an input request in the correct language, following the page where the user was when they clicked (again, obviously because of some difference in the URL).

Said otherwise, we should maybe avoid thinking too much about a specific problem, and consider it again as part of a much broader situation, with easier solutions around the corner.

--
Étienne Deparis
gemini://alltext.umaneti.net/
xmpp: etienne at depar.is
> On Dec 28, 2020, at 03:04, John Cowan <cowan at ccil.org> wrote:
>
> I think that's the wrong way around.

Oh, then I misunderstood the problem. I thought we had to systematically tag any end-user oriented text.

My bad. Apologies.
On Sun, Dec 27, 2020 at 03:41:14PM -0500, John Cowan wrote:

> And if you request gemini://example.com/la/non-exsistens.gmi and there
> is no support for Latin error messages, as there probably is not?
> Then what language should be used? With the exception of 1x
> responses, human-readable <META> reflects error situations, where by
> definition the server doesn't know what the user can or cannot
> understand.

If the server has a Latin section, it is expected to have a complete Latin interface. And the language that the user is expecting is generally encoded into the URL itself, as others have mentioned: the server knows that the /la/ section is requested, so it can use Latin error messages.

> I have no idea why you would want to disallow that. Changes to the
> query string *are* changes to the URL, so that a particular language
> could be equally well indicated using the domain, the path, or the
> query, depending on the server's conventions.

Because we don't want the query string to be used as it is in HTML, i.e. for arbitrary parameters. Using ?lang=<lang> is setting an arguably dangerous precedent.

> That's ideal, but it's a big burden on the client, which has to use
> something as general as Google Translate to convert the Russian error
> message being returned by the server to the Welsh expected by the
> user.

You're right, clients can't do translation. But the idea is that if you came across a site that only had <insert language you don't understand>, and you really wanted to see it, you would translate it manually. Similarly, if you use the <language> interface / section of a site, it's your responsibility (not the server's) to translate it. If the site offers a language interface / section that you do understand, use that. Otherwise, you'll have to translate.

> Not really: again, we are talking about the language of error messages.

You're right, never mind.
> Another point is that people often google for the meaning of error
> messages, and that's made easier if they always look the same, or at
> least some part of them always looks the same.

That's the whole point of the status code. The user's client can also present a generic description of the status code (in the user's language of choice) in addition to the error message from the META line.

The user can reasonably expect that the error messages are in the language of the interface / section specified in the URL, e.g. requesting gemini://example.com/la/~foo could return 51 "non est usor". If the user doesn't understand Latin and has still for some reason requested the Latin interface URL, they can still get a good idea of what's going on thanks to the code 51, which their client can/should explain as "Permanent Failure - Not Found".

~aravk | ~nothien
On Sun Dec 27, 2020 at 9:41 PM CET, John Cowan wrote:
> On Sun, Dec 27, 2020 at 3:06 PM Arav K. <nothien at uber.space> wrote:
>
> > the server can easily recognize from the URL what language is
> > expected and should provide an interface (including human-readable META
> > text) in that same language. That would mean, for example, that the
> > entire ://fr.example.com site should use a French interface.
>
> And if you request gemini://example.com/la/non-exsistens.gmi and there is
> no support for Latin error messages, as there probably is not? Then what
> language should be used? With the exception of 1x responses,
> human-readable <META> reflects error situations, where by definition the
> server doesn't know what the user can or cannot understand.

One of the motivations for having the second digits clarifying the exact nature of the error (besides allowing useful logging on the server side for identifying problems, and allowing the writing of more robust bots) was that clients could use them to provide *some* degree of localised error message. E.g. if a server written by an English-speaking programmer sends back "51 Not found", a client with a Finnish language interface could recognise the 51 status code and say to its users:

> Ei löytynyt! Palvelin sanoi: "Not found"

which a non-English-reading Finn would perceive as:

> Not found! Server said: "<mysterious foreign message>"

Which is slightly better than just:

> Server said: "<mysterious foreign message>"

And for people who read *some* English (or whatever language the server uses for errors) but not very much, having a localised translation of the error category first might be enough context to enable them to make enough sense of the full error message to have some understanding of what's going on.

Cheers,
Solderpunk
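The client behaviour described in the message above (recognise the status code, show a localised category first, then the server's own text) might look roughly like the following Python sketch. The CATEGORY and SERVER_SAID tables and the render_error function are hypothetical names made up for this illustration; a real client would carry a complete table for every status code and interface language it supports.

```python
# Hypothetical client-side tables: status code -> category text per UI language.
CATEGORY = {
    "en": {51: "Not found!"},
    "fi": {51: "Ei löytynyt!"},
}
SERVER_SAID = {"en": "Server said:", "fi": "Palvelin sanoi:"}


def render_error(status: int, meta: str, ui_lang: str) -> str:
    """Prefix the server's META with a category in the user's language."""
    said = SERVER_SAID.get(ui_lang, SERVER_SAID["en"])
    category = CATEGORY.get(ui_lang, {}).get(status)
    if category is None:
        # Unknown code for this language: fall back to the raw server text.
        return f'{said} "{meta}"'
    return f'{category} {said} "{meta}"'


print(render_error(51, "Not found", "fi"))
# prints: Ei löytynyt! Palvelin sanoi: "Not found"
```

The server's META is passed through untouched, so the text a user might search for stays the same whatever the client's language.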
> On Dec 28, 2020, at 11:10, Solderpunk <solderpunk at posteo.net> wrote:
>
> One of the motivations for having the second digits clarifying the exact
> nature of the error

Yes, but this doesn't help with 1x responses. They are meant to be presented to the end user, as a prompt. And they lack a language tag. This is my understanding of the crux of the issue. But perhaps I missed something.
On Mon, Dec 28, 2020 at 11:45:56AM +0100, Petite Abeille wrote:

> Yes, but this doesn't help with 1x responses. They are meant to be
> presented to the end user, as a prompt. And they lack a language tag.
> This is my understanding of the crux of the issue. But perhaps I
> missed something.

My point was that the client is communicating the language the server should use in the URL itself, e.g. for gemini://fr.example.com/search the server would send a prompt in French. If no language is specified, the server should assume a default language, which would often be English.

I thought this didn't need to be mentioned, but if the server supports the URL gemini://fr.example.com/, then it is expected to have a French interface for it. If it doesn't support a language, then it just shouldn't offer it, and it then becomes the user's responsibility to translate appropriately.

~aravk | ~nothien
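As a rough illustration of the convention proposed in the message above, a server could derive the interface language from the first host label or the first path segment and fall back to a default otherwise. This Python sketch makes several assumptions that are not in the thread: the SUPPORTED set, the prompt wording in PROMPTS, and the function names are all invented for the example.

```python
from urllib.parse import urlsplit

# Hypothetical server configuration: languages that have a full interface.
SUPPORTED = {"en", "fr"}
DEFAULT_LANG = "en"
PROMPTS = {"en": "Search terms?", "fr": "Termes de recherche ?"}


def interface_language(url: str) -> str:
    """Pick the interface language from <lang>.host or /<lang>/ in the URL."""
    parts = urlsplit(url)
    host_label = (parts.hostname or "").split(".")[0]
    if host_label in SUPPORTED:
        return host_label
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] in SUPPORTED:
        return segments[0]
    return DEFAULT_LANG


def prompt_response(url: str) -> str:
    """Build a status 10 response header in the language implied by the URL."""
    return f"10 {PROMPTS[interface_language(url)]}\r\n"
```

With these assumptions, gemini://fr.example.com/search gets a French prompt, while gemini://example.com/la/search falls back to English because no Latin interface is offered.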
> On Dec 28, 2020, at 11:57, Arav K. <nothien at uber.space> wrote:
>
> I thought this didn't need to be mentioned, but if the server supports
> the URL gemini://fr.example.com/, then it is expected to have a French
> interface for it. If it doesn't support a language, then it just
> shouldn't offer it, and it then becomes the user's responsibility to
> translate appropriately.

Sounds reasonable enough to me: language tags are conveyed by convention, through the expedient of embedding them in the user-generated content, i.e. the URL, and not through the protocol machinery, i.e. status codes, as per the specification.

I thought that John's point of contention was the protocol machinery, as opposed to the user-generated content.
On Sunday, December 27, 2020, 20:40:42 CET John Cowan wrote:
> b) What language is used for those META parts, since the server does not
> know what languages are acceptable to the user? TL;dr answer: start with
> English, add other languages as necessary or useful.

I agree with what has been said by some: the text in META should be in the same language as the rest of what the server is hosting. If it is a multi-language capsule, most likely there is an indication in the address to convey the language choice, or in the session if client certs or something similar are used to keep a session open. In all cases, the server knows in which language the pages are and should use the same one for META.
On Mon, Dec 28, 2020 at 4:12 AM Arav K. <nothien at uber.space> wrote:

> If the server has a Latin section, it is expected to have a complete
> Latin interface.

That makes very little sense to me. It's true that www.vatican.va provides a Latin user interface, but a site presenting English law is going to have an English interface only, even though the older laws are in Latin or Old Norman French. *Nobody* needs an Old Norman French user interface.

Similarly, gutenberg.org provides only an English interface, even though it provides e-books in 55 languages, from 37527 in English and 2356 in French down to 21 languages with a single book each. (There are other PG-like sites in and for many countries; see the WP article.)

> Because we don't want the query string to be used as it is in HTML, i.e.
> for arbitrary parameters. Using ?lang=<lang> is setting an arguably
> dangerous precedent.

I can't agree there either. The only requirement in Gemini imposed on the query string is that the URL sent after a 10 or 11 response contains whatever the user entered as the query string. There is nothing to prevent link lines from containing query strings themselves.

In the Gemini PG interface I plan to write as soon as I have a chance, the UI will be entirely in English, the only language I speak. When you are searching, you can include words in the query like "lang:en" or "media:text/plain" or "author:Twain", as well as plain words in any script, more or less like Google Search.

John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
All Gaul is divided into three parts: the part that cooks with lard and goose fat, the part that cooks with olive oil, and the part that cooks with butter. --David Chessler
John Cowan <cowan at ccil.org> wrote:
> That makes very little sense to me. It's true that www.vatican.va
> provides a Latin user interface, but a site presenting English law is
> going to have an English interface only, even though the older laws
> are in Latin or Old Norman French. *Nobody* needs an Old Norman
> French user interface.

Sorry, I wasn't clear. When I said "Latin section", I meant "section meant for Latin users". The site you're talking about isn't meant for Latin-speaking users, so it doesn't have a Latin interface. That's perfectly fine.

> Similarly, gutenberg.org provides only an English interface, even
> though it provides e-books in 55 languages, from 37527 in English and
> 2356 in French down to 21 languages with a single book each. (There
> are other PG-like sites in and for many countries; see the WP
> article.)

One example doesn't make the rule. I would argue that Project Gutenberg _should_ have interfaces in other languages, because it is offering (some) content that is almost exclusively going to be consumed by people speaking (possibly only) these other languages. I'm assuming that not that many monolingual English speakers/readers are reading those 2,356 French books; mainly French-speaking people are reading those books, and a French interface should be made available to them.

I can understand, however, that PG doesn't currently have the resources to translate its interface. But that doesn't mean that it should not be a goal.

> I can't agree there either. The only requirement in Gemini imposed on
> the query string is that the URL sent after a 10 or 11 response
> contains whatever the user entered as the query string. There is
> nothing to prevent link lines from containing query strings
> themselves.

We're talking about different things. I'm not talking about link lines and query strings. My point of contention is the use of HTML-style (<key>=<value>)* formatting for query strings.
> In the Gemini PG interface I plan to write as soon as I have a chance,
> the UI will be entirely in English, the only language I speak. When
> you are searching, you can include words in the query like "lang:en"
> or "media:text/plain" or "author:Twain", as well as plain words in any
> script, more or less like Google Search.

That makes perfect sense: you only speak English, you only design English interfaces. I have absolutely no problem with that. But you should at least open up the possibility of having other interfaces, even if you don't write them yourself. Non-English-readers will thank you for it.

~aravk | ~nothien
On Wed, Dec 30, 2020 at 3:25 AM <nothien at uber.space> wrote:

> Sorry, I wasn't clear. When I said "Latin section", I meant "section
> meant for Latin users". The site you're talking about isn't meant for
> Latin-speaking users, so it doesn't have a Latin interface. That's
> perfectly fine.

Ah, okay.

> One example doesn't make the rule. I would argue that Project Gutenberg
> _should_ have interfaces in other languages, because it is offering
> (some) content that is almost exclusively going to be consumed by people
> speaking (possibly only) these other languages.

Up to a point, certainly. The tail end of languages probably don't need interfaces.

> We're talking about different things. I'm not talking about link lines
> and query strings. My point of contention is the use of HTML-style
> (<key>=<value>)* formatting for query strings.

Suppose you have written an essay in French on the novels of Jules Verne in text/gemini format, and you want to link to a collection of the novels themselves. You can then insert this link line:

=> gemini://gemguten.example.com/advsearch.gmi?lang=fr&author=Jules&author=Verne Les oeuvres de Jules Verne en français

The user who selects this link will receive a text/gemini document, something like this:

=> gemini://gemguten.example.com/etext/5082 Verne, Jules, 1828-1905. Le château des Carpathes. [fr]
=> gemini://gemguten.example.com/etext/8174 Verne, Jules, 1828-1905. Kéraban-Le-Têtu, Volume I. [fr]
=> gemini://gemguten.example.com/etext/17832 Verne, Jules, 1828-1905. Une ville flottante. [fr]
...

Since this is not an _interactive_ search, the Gemini conventions about status 1x and the query string don't apply.

> But you
> should at least open up the possibility of having other interfaces, even
> if you don't write them yourself. Non-English-readers will thank you
> for it.

Absolutely.

John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
If I have not seen as far as other giants, it's because I have been standing on my head.
--Trond Engen
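For readers who would rather not type a link line like the Jules Verne one in the message above by hand, one way an author's own tooling might assemble it offline is sketched below in Python. This is only an illustration: search_link is a hypothetical helper, and the key=value query layout is simply the one used in that message, not anything the Gemini spec defines.

```python
from urllib.parse import urlencode

# Base URL taken from the example in the message above.
BASE = "gemini://gemguten.example.com/advsearch.gmi"


def search_link(terms, label: str) -> str:
    """Build a text/gemini link line from a list of (key, value) pairs."""
    # A list of pairs (not a dict) lets the same key, e.g. author, repeat.
    return f"=> {BASE}?{urlencode(terms)} {label}"


line = search_link(
    [("lang", "fr"), ("author", "Jules"), ("author", "Verne")],
    "Les oeuvres de Jules Verne en français",
)
print(line)
```

The printed line matches the link line in the message, with the query part being lang=fr&author=Jules&author=Verne.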
John Cowan <cowan at ccil.org> wrote:
> Up to a point, certainly. The tail end of languages probably don't
> need interfaces.

Yep, makes sense. Too much work (unless someone's willing to do it all for you).

> Suppose you have written an essay in French on the novels of Jules
> Verne in text/gemini format, and you want to link to a collection of
> the novels themselves. You can then insert this link line:
>
> => gemini://gemguten.example.com/advsearch.gmi?lang=fr&author=Jules&author=Verne Les oeuvres de Jules Verne en français
>
> The user who selects this link will receive a text/gemini document,
> something like this:
>
> => gemini://gemguten.example.com/etext/5082 Verne, Jules, 1828-1905. Le château des Carpathes. [fr]
> => gemini://gemguten.example.com/etext/8174 Verne, Jules, 1828-1905. Kéraban-Le-Têtu, Volume I. [fr]
> => gemini://gemguten.example.com/etext/17832 Verne, Jules, 1828-1905. Une ville flottante. [fr]
> ...
>
> Since this is not an _interactive_ search, the Gemini conventions
> about status 1x and the query string don't apply.

But it is an interactive search. When you point out to someone that they can use the advanced search page, you're going to point them to gemini://gemguten.example.com/advsearch.gmi. If you fill in a query string for them, that's fine, but you also had to write that out manually. There's no way in Gemini to automatically create that link (to do so you would have to give what you're looking for to a page that translates it to that format for you, but if the server has that translation ability it would be supported in the advsearch.gmi page itself).

In the end, someone had to write it out by hand, and that's not the right way to do it. I completely understand that such a search function is needed, and I obviously can't stop you from using this format if you want, but I do feel that there is a better way to pull it off.
For example, if you're just searching for an author, you could make an author-searching page where the query string is only the author name. But I don't know what the better way, if there is one, is yet.

Also, under my system, the URL you've given says nothing about the language of the interface (e.g. the "Les oeuvres de Jules Verne en français", which would presumably be in the header of the search page). Under my system, prepending 'fr.' to the domain would effectively request that the server use a French interface, so that everything from the returned text/gemini documents to error messages would be in French. But the URL would be otherwise unaffected.

~aravk | ~nothien
On Wed, Dec 30, 2020 at 5:15 PM <nothien at uber.space> wrote:

> But it is an interactive search. When you point to someone that they
> can use the advanced search page, you're going to point them to
> gemini://gemguten.example.com/advsearch.gmi.

If you chose such a link, you'd in principle get all 40,000+ document links, since there are no restrictions. But in fact you'd get an error page telling you that you asked for too many documents. There will be a different link altogether for interactive search, where you would be asked using a 10 response to enter keywords from the metadata (language, author, title, Library of Congress subject classification, etc.)

However, that's inherently less precise: if you provided keywords "Mark Twain", you'd get both books by him and books about him, such as _My Mark Twain_ by William Dean Howells.

> In the end, someone had to write it out by hand,

True. But it isn't particularly difficult, either. I'll put some samples on the interactive search page along with the actual interactive link.

> I completely understand that such a search function is
> needed, and I obviously can't stop you from using this format if you
> want, but I do feel that there is a better way to pull it off. For
> example, if you're just searching for an author, you could make an
> author-searching page where the query string is only the author name.

With so many pages to search, all the search terms are ANDed together: the more keywords, the less output to look through. (I'm not sure what the upper limit on results will be: for Google it's 1000.) You probably don't want all the works by one author anyhow: you want the ones you can read.

> Also, under my system, the URL you've given says nothing about the
> language of the interface (e.g. the "Les oeuvres de Jules Verne en
> français", which would presumably be in the header of the search page).

The search engine doesn't know what the link looks like. I suppose that could be passed in the query too: "...
&linktext=Les%20oeuvres%20de%20Jules%20Verne%20en%20français", for example.

> Under my system, prepending 'fr.' to the domain would effectively
> request that the server use a French interface, so that everything from
> the returned text/gemini documents to error messages would be in French.
> But the URL would be otherwise unaffected.

Fine as far as the error messages are concerned. But just because you want a French interface, it doesn't necessarily mean you want to reject English or German books from the search.

John Cowan http://vrici.lojban.org/~cowan cowan at ccil.org
It's the old, old story. Droid meets droid. Droid becomes chameleon. Droid loses chameleon, chameleon becomes blob, droid gets blob back again. It's a classic tale. --Kryten, Red Dwarf
John Cowan <cowan at ccil.org> wrote:
> If you chose such a link, you'd in principle get all 40,000+ document
> links, since there are no restrictions. But in fact you'd get an
> error page telling you that you asked for too many documents. There
> will be a different link altogether for interactive search, where you
> would be asked using a 10 response to enter keywords from the metadata
> (language, author, title, Library of Congress subject classification,
> etc.)

You could just combine the two pages and return a 10 when no query string is provided to advsearch.gmi.

> However, that's inherently less precise: if you provided keywords
> "Mark Twain", you'd get both books by him and books about him, such as
> _My Mark Twain_ by William Dean Howells.

So you provide no interactive way to create an advanced search filter, and you are replacing it with an interactive way to create something that is not an advanced search filter.

> With so many pages to search, all the search terms are ANDed together:
> the more keywords, the less output to look through. (I'm not sure
> what the upper limit on results will be: for Google it's 1000.) You
> probably don't want all the works by one author anyhow: you want the
> ones you can read.

You could simply prioritize books written in the language of the interface, so with the fr.gemguten.example.com/author/Jules%20Verne.gmi page, French books would show up at the top. But this solution doesn't scale to finding books in an arbitrary language.

> The search engine doesn't know what the link looks like. I suppose
> that could be passed in the query too: "...
> &linktext=Les%20oeuvres%20de%20Jules%20Verne%20en%20français", for
> example.

The server is smart enough to generate something along those lines on its own. Please don't make URLs that much longer.

> Fine as far as the error messages are concerned. But just because you
> want a French interface, it doesn't necessarily mean you want to
> reject English or German books from the search.
Of course not. The interface language is completely dissociated from the actual content of the pages; it only affects the language they are written in.

~aravk | ~nothien
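The suggestion in the message above to combine the two pages, answering with a 10 when advsearch.gmi is requested without a query string, could be handled along these lines. This is a hedged Python sketch only: the advsearch function, the prompt wording and the placeholder result body are assumptions made for the example, not a description of any existing server.

```python
from urllib.parse import unquote, urlsplit


def advsearch(url: str) -> str:
    """Return the response for a request to the combined search page."""
    query = urlsplit(url).query
    if not query:
        # No query string yet: ask the user for keywords with a 10 response.
        return "10 Enter search keywords\r\n"
    # A real handler would run the search here; this sketch only echoes it.
    return "20 text/gemini\r\n" + f"# Results for {unquote(query)}\n"
```

A hand-written link with a query string and the interactive prompt then share one URL, which is the point of combining them.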
On Mon, Dec 28, 2020 at 10:12:30AM +0100, Arav K. <nothien at uber.space> wrote a message of 84 lines which said:

> Because we don't want the query string to be used as it is in HTML, i.e.
> for arbitrary parameters. Using ?lang=<lang> is setting an arguably
> dangerous precedent.

This opinion requires some elaboration. There is no reason to choose paths rather than queries; both are part of the URL. The difference between the two is purely historical (at one time, ? indicated a dynamic page). Said otherwise, <gemini://capsule.example/foo/bar> and <gemini://capsule.example/foo?bar> have identical semantics. A Gemini client can deduce nothing from the fact that one uses a path and the other a query.

Note that Amazon managed to *patent* the idea of using parameters in the path. US "land of the crazy patents" patent n° 7,287,042 <http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=7,287,042.PN.&OS=PN/7,287,042&RS=PN/7,287,042>
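The point made in the message above, that path and query are both just parts of the URL and that the choice between them is the server's convention, can be illustrated with Python's standard URL parser. The resource_key function is a hypothetical routing helper invented for this sketch; it shows one server choosing to treat the two example URLs as the same resource, not a rule that all servers follow.

```python
from urllib.parse import urlsplit


def resource_key(url: str) -> tuple:
    """Flatten path segments and the query into one tuple of identifiers."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if parts.query:
        segments.append(parts.query)
    return tuple(segments)


a = "gemini://capsule.example/foo/bar"
b = "gemini://capsule.example/foo?bar"
print(urlsplit(a).path, repr(urlsplit(a).query))  # /foo/bar ''
print(urlsplit(b).path, repr(urlsplit(b).query))  # /foo 'bar'
print(resource_key(a) == resource_key(b))         # True
```

The parser splits the two URLs differently, but nothing in that split tells a client which one is "dynamic"; only the server's own mapping decides.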