💾 Archived View for gemi.dev › gemini-mailing-list › 000576.gmi captured on 2023-11-04 at 12:57:01. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

IETF policy on encodings and languages

📧 Messages: 24
🗣️ Authors: 9
📅 First Message: 2020-12-27 19:40
📅 Last Message: 2021-01-03 13:14

John Cowan <cowan (a) ccil.org>

📅 Sent: 2020-12-27 19:40
📧 Message 1 of 24

We already have good support for multiple encodings and (in the case of
text/gemini) languages.  However, two questions arise:

a) What character encoding is used for META parts intended for human
consumption?  TL;dr answer: UTF-8.

b) What language is used for those META parts, since the server does not
know what languages are acceptable to the user?  TL;dr answer: start with
English, add other languages as necessary or useful.

Details:

BCP 18, IETF Policy on Character Sets and Languages
<https://tools.ietf.org/html/bcp18>, says what a spec should say about
character sets and languages.  The MUSTard of this BCP is:

1) Specs MUST say which parts of the protocol are meant to be
human-readable.  The answer should be that the META of status lines 1x, 4x
(except 44), 5x, and 6x are human-readable and everything else is part of
the protocol.

2) Protocols MUST specify which character encoding is in use, and it MUST
be possible for it to be UTF-8.  Nailing that down for human-readable META
text is what needs to be done.  See (a).

3) Encodings that are used MUST be in the IANA registry.  Because we are
using media types, that happens already.  No action needed.

4) Protocols MUST have a way (which can be a default) of communicating the
encoding in use.  Fixing (2) will fix this one also.

5) Protocols in which users have text presented to them MUST have a way of
dealing with multiple languages.  We have a problem here for 1x that isn't
trivial to solve: what should a Russian search engine indexing both English
and Russian documents return as the META to a 1x response?  (6) is one
approach.

6) Where there is no ability to negotiate languages (Gemini doesn't), then
"i-default" language SHOULD be used.  "i-default" text MUST be
understandable to an English-speaking person, but MAY include text in other
languages if appropriate (e.g. the languages of the capsule or server).
See (b) and (5).

7) Protocols SHOULD use BCP 47 language tags to specify languages.  We do.

8) Material on i18n SHOULD be collected into a special section so that it
can be found by people concerned with i18n or L10n.  That one's up to
Solderpunk, though it will be necessary if the spec becomes one or more
RFCs.
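
A minimal sketch of points (1) and (2), assuming META is decoded as UTF-8 as
proposed in (a).  The function names parse_header and meta_is_human_readable
are made up for illustration; they come from neither the spec nor any library.

```python
def parse_header(raw: bytes) -> tuple:
    """Split a response header into (status, META), decoding it as UTF-8."""
    line = raw.decode("utf-8").rstrip("\r\n")
    status, _, meta = line.partition(" ")
    return int(status), meta


def meta_is_human_readable(status: int) -> bool:
    """True for 1x, 4x (except 44), 5x and 6x, as listed in point (1)."""
    if status == 44:  # META carries a number of seconds, not prose
        return False
    return status // 10 in (1, 4, 5, 6)


status, meta = parse_header("51 Ei löytynyt\r\n".encode("utf-8"))
print(status, meta, meta_is_human_readable(status))
```

Everything else (2x MIME types, 3x URIs, the 44 delay) stays machine-readable
protocol data under this reading.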

Link to individual message.

Arav K. <nothien (a) uber.space>

📅 Sent: 2020-12-27 20:06
📧 Message 2 of 24

On Sun, Dec 27, 2020 at 02:40:42PM -0500, John Cowan wrote:
> b) What language is used for those META parts, since the server does not
> know what languages are acceptable to the user?  TL;dr answer: start with
> English, add other languages as necessary or useful.

The best-case scenario, of course, is that everybody sees the
human-readable META parts in their own language.  The issue with that is
either the client has to specify what language they expect it in, or the
server has to provide it in every language it supports.  Both are
obviously flawed.

One counterproposal to this best-case scenario is that the response body
being sent over (for successful requests) is also (probably) only in a
single language.  It would thus be natural to have the whole interface
in that same language.  If the server offers the same file / page in
different languages, they will have different URLs (most commonly
<lang>.example.com/... or example.com/<lang>/...).  In both of these
cases, the server can easily recognize from the URL what language is
expected and should provide an interface (including human-readable META
text) in that same language.  That would mean, for example, that the
entire ://fr.example.com site should use a French interface.  We
probably also want to disallow using example.com/...?lang=<lang> or
anything similar, even if it's just in the Best Practices document.
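
A rough sketch of that convention, assuming the server keeps a set of
languages it has interfaces for (the SUPPORTED set and the function name
language_from_url are hypothetical).  It looks at the first host label and
then at the first path segment:

```python
from urllib.parse import urlsplit

SUPPORTED = {"en", "fr", "la"}  # hypothetical: languages this server offers


def language_from_url(url):
    """Return the language implied by the host or path, or None if there is none."""
    parts = urlsplit(url)
    host_label = (parts.hostname or "").split(".")[0]
    if host_label in SUPPORTED:
        return host_label
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] in SUPPORTED:
        return segments[0]
    return None


print(language_from_url("gemini://fr.example.com/search"))
print(language_from_url("gemini://example.com/la/index.gmi"))
```

A URL with neither marker yields None, which leaves the choice of fallback
language open.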

It's the server's responsibility, but also their prerogative, to provide
an interface in multiple languages.  If they don't, and if the users of
that server choose not to as well, then it is up to the client (and the
user controlling it) to translate stuff.  text/gemini's lang parameter
helps here.

I think that this proposal resolves the general interface language
issue.  Have I missed anything?

~aravk | ~nothien

John Cowan <cowan (a) ccil.org>

📅 Sent: 2020-12-27 20:41
📧 Message 3 of 24

On Sun, Dec 27, 2020 at 3:06 PM Arav K. <nothien at uber.space> wrote:

> the server can easily recognize from the URL what language is
> expected and should provide an interface (including human-readable META
> text) in that same language.  That would mean, for example, that the
> entire ://fr.example.com site should use a French interface.

And if you request gemini://example.com/la/non-exsistens.gmi and there is
no support for Latin error messages, as there probably is not?  Then what
language should be used?  With the exception of 1x responses,
human-readable <META> reflects error situations, where by definition the
server doesn't know what the user can or cannot understand.

> We probably also want to disallow using example.com/...?lang=<lang> or
> anything similar, even if it's just in the Best Practices document.
>

I have no idea why you would want to disallow that.  Changes to the query
string *are* changes to the URL, so that a particular language could be
equally well indicated using the domain, the path, or the query, depending
on the server's conventions.

> It's the server's responsibility, but also their prerogative, to provide
> an interface in multiple languages.  If they don't, and if the users of
> that server choose not to as well, then it is up to the client (and the
> user controlling it) to translate stuff.

That's ideal, but it's a big burden on the client, which has to use
something as general as Google Translate to convert the Russian error
message being returned by the server to the Welsh expected by the user.

> text/gemini's lang parameter helps here.

Not really: again, we are talking about the language of error messages.

Another point is that people often google for the meaning of error
messages, and that's made easier if they always look the same, or at least
some part of them always looks the same.

John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
Where the wombat has walked, it will inevitably walk again.
   (even through brick walls!)

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-12-27 22:57
📧 Message 4 of 24

It was thus said that the Great Arav K. once stated:
> On Sun, Dec 27, 2020 at 02:40:42PM -0500, John Cowan wrote:
> > b) What language is used for those META parts, since the server does not
> > know what languages are acceptable to the user?  TL;dr answer: start with
> > English, add other languages as necessary or useful.
> 
> The best-case scenario, of course, is that everybody sees the
> human-readable META parts in their own language.  The issue with that is
> either the client has to specify what language they expect it in, or the
> server has to provide it in every language it supports.  Both are
> obviously flawed.

  Here's a list of response codes with the type of META information they use:

	10	prompt, human text
	11	prompt, human text
	20	MIME type
	30	URI
	31	URI
	40	error message, human text
	41	error message, human text
	42	error message, human text
	43	error message, human text
	44	SECONDS
	50	error message, human text
	51	error message, human text
	52	error message, human text
	53	error message, human text
	59	error message, human text
	60	error message, human text
	61	error message, human text
	62	error message, human text

  The META types for the ranges 40-62 are a formality and can be safely
ignored (I'm talking about the human text portion, not the actual status
code) except for 44 which contains machine usable data.  My own server just
spits out a generic text entry for each error code (the specific error is
logged on my end---there's no need for me to send such info to the client).

  It's really the META data for response codes 10 and 11 that need to be
displayed directly to the user.  How to deal with languages here is
difficult.

  -spc

Petite Abeille <petite.abeille (a) gmail.com>

📅 Sent: 2020-12-27 23:04
📧 Message 5 of 24



> On Dec 27, 2020, at 23:57, Sean Conner <sean at conman.org> wrote:
> 
>  It's really the META data for response codes 10 and 11 that need to be
> displayed directly to the user.  How to deal with languages here is
> difficult.

Could we prefix META with a language tag and call it a day?

10 EN Indica or Sativa? ???
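
A sketch of what reading such a prefix might look like on the client side.
This is purely hypothetical: no such convention exists in the spec, and the
pattern below (a 2-3 letter tag with optional subtags, then a space) is only
an assumption about the shape of the prefix.

```python
import re

# Hypothetical prefix: a short language tag, a space, then the prompt text.
_TAGGED = re.compile(r"^([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*) (.+)$")


def split_language_tag(meta):
    """Return (tag, text); tag is None when META has no recognisable prefix."""
    m = _TAGGED.match(meta)
    if m is None:
        return None, meta
    return m.group(1).lower(), m.group(2)


print(split_language_tag("EN Indica or Sativa?"))
print(split_language_tag("Not found"))
```

Note that an untagged META such as "Not found" is split as well, because
"Not" looks like a three-letter tag.  Without some further rule a client
could not tell tagged text from untagged text.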

Sean Conner <sean (a) conman.org>

📅 Sent: 2020-12-27 23:13
📧 Message 6 of 24

It was thus said that the Great Petite Abeille once stated:
> 
> 
> > On Dec 27, 2020, at 23:57, Sean Conner <sean at conman.org> wrote:
> > 
> >  It's really the META data for response codes 10 and 11 that need to be
> > displayed directly to the user.  How to deal with languages here is
> > difficult.
> 
> Could we prefix META with a language tag and call it a day?
> 
> 10 EN Indica or Sativa? ???

  Potentially breaking change, and you see how people are reacting to the
IRI/IDN threads.

  -spc

Petite Abeille <petite.abeille (a) gmail.com>

📅 Sent: 2020-12-27 23:17
📧 Message 7 of 24



> On Dec 28, 2020, at 00:13, Sean Conner <sean at conman.org> wrote:
> 
>  Potentially breaking change,

Not really. META is free form text. We could just structure it a bit by 
prefixing a language tag. No one gets hurt: it's just a display issue.

Petite Abeille <petite.abeille (a) gmail.com>

📅 Sent: 2020-12-28 01:14
📧 Message 8 of 24

> On Dec 28, 2020, at 00:04, Petite Abeille <petite.abeille at gmail.com> wrote:
> 
> Could we prefix META with a language tag and call it a day?

Alternatively or additionally, could we use the X.509 certificate structure
to shoehorn in such information? There is a lot of free-form text in there...

Both in terms of language negotiation: META matches the client certificate,
if any. And tagging: the server certificate advertises the META language.

A bit of a side-channel, but why not. Perhaps overdoing it though.

John Cowan <cowan (a) ccil.org>

📅 Sent: 2020-12-28 02:04
📧 Message 9 of 24

On Sun, Dec 27, 2020 at 6:04 PM Petite Abeille <petite.abeille at gmail.com>
wrote:

> Could we prefix META with a language tag and call it a day?
>
> 10 EN Indica or Sativa? ???

I think that's the wrong way around.  The server doesn't normally have to
tell the client what language it's using (though there are obvious bad
cases like "Chat?")  The problem is that the client can't tell the server
what language the user would like to be prompted in.

Now that I think about it, though, that *can* be encoded in the URL readily
enough, though not in a universal way.  If a text/gemini file is in Greek,
the textual part of a link line will also normally be Greek, in which case
the URL should have "el" in it someplace (assuming the server can handle
it).

Étienne Deparis <etienne (a) depar.is>

📅 Sent: 2020-12-28 08:02
📧 Message 10 of 24

On Mon, 28 Dec 2020 at 03:04, cowan at ccil.org wrote:

> On Sun, Dec 27, 2020 at 6:04 PM Petite Abeille <petite.abeille at gmail.com>
> wrote:
>
> Now that I think about it, though, that *can* be encoded in the URL readily
> enough, though not in a universal way.  If a text/gemini file is in Greek,
> the textual part of a link line will also normally be Greek, in which case
> the URL should have "el" in it someplace (assuming the server can handle
> it).

I had a similar thought: I think we should somehow avoid overly generic
input. I mean, when the user is prompted for input, it's normally
after having clicked on some link. Thus maybe we should think
differently. For an internationalized site, it's easy to imagine different
sections, one for each language. Each of these pages
will then be in one specific language, maybe reflected in its
URL. It's then up to the CGIs running the site to serve, for a
given function, an input request in the correct language,
following the page the user was on when they clicked (again, obviously
because of some difference in the URL).

In other words, we should maybe avoid thinking too much about one specific
problem, and instead treat it as part of a much broader situation, with
easier solutions around the corner.

--
Étienne Deparis

gemini://alltext.umaneti.net/
xmpp: etienne at depar.is

Petite Abeille <petite.abeille (a) gmail.com>

📅 Sent: 2020-12-28 08:36
📧 Message 11 of 24



> On Dec 28, 2020, at 03:04, John Cowan <cowan at ccil.org> wrote:
> 
> I think that's the wrong way around. 

Oh, then I misunderstood the problem. I thought we had to 
systematically tag any end-user oriented text. My bad. Apologies.

Arav K. <nothien (a) uber.space>

📅 Sent: 2020-12-28 09:12
📧 Message 12 of 24

On Sun, Dec 27, 2020 at 03:41:14PM -0500, John Cowan wrote:
> And if you request gemini://example.com/la/non-exsistens.gmi and there
> is no support for Latin error messages, as there probably is not?
> Then what language should be used?  With the exception of 1x
> responses, human-readable <META> reflects error situations, where by
> definition the server doesn't know what the user can or cannot
> understand.

If the server has a Latin section, it is expected to have a complete
Latin interface.  And the language that the user is expecting is
generally encoded into the URL itself, as others have mentioned: the
server knows that the /la/ section is requested, so it can use Latin
error messages.

> I have no idea why you would want to disallow that.  Changes to the
> query string *are* changes to the URL, so that a particular language
> could be equally well indicated using the domain, the path, or the
> query, depending on the server's conventions.

Because we don't want the query string to be used as it is in HTML, i.e.
for arbitrary parameters.  Using ?lang=<lang> is setting an arguably
dangerous precedent.

> That's ideal, but it's a big burden on the client, which has to use
> something as general as Google Translate to convert the Russian error
> message being returned by the server to the Welsh expected by the
> user.

You're right, clients can't do translation.  But the idea is that if you
came across a site that only had <insert language you don't understand>,
and you really wanted to see it, you would translate it manually.
Similarly, if you use the <language> interface / section of a site, it's
your responsibility (not the server's) to translate it.  If the site
offers a language interface / section that you do understand, use that.
Otherwise, you'll have to translate.

> Not really: again, we are talking about the language of error messages.

You're right, never mind.

> Another point is that people often google for the meaning of error
> messages, and that's made easier if they always look the same, or at
> least some part of them always looks the same.

That's the whole point of the status code.  The user's client can also
present a generic description of the status code (in the user's language
of choice) in addition to the error message from the META line.  The
user can reasonably expect that the error messages are in the language
of the interface / section specified in the URL, e.g. requesting
gemini://example.com/la/~foo could return 51 "non est usor".  If the
user doesn't understand Latin and has still for some reason requested
the Latin interface URL, they can still get a good idea of what's going
on thanks to the code 51, which their client can/should explain as
"Permanent Failure - Not Found".

~aravk | ~nothien

Solderpunk <solderpunk (a) posteo.net>

📅 Sent: 2020-12-28 10:10
📧 Message 13 of 24

On Sun Dec 27, 2020 at 9:41 PM CET, John Cowan wrote:
> On Sun, Dec 27, 2020 at 3:06 PM Arav K. <nothien at uber.space> wrote:
>
>
> > the server can easily recognize from the URL what language is
> > expected and should provide an interface (including human-readable META
> > text) in that same language.  That would mean, for example, that the
> > entire ://fr.example.com site should use a French interface.
>
>
> And if you request gemini://example.com/la/non-exsistens.gmi and there
> is
> no support for Latin error messages, as there probably is not? Then what
> language should be used? With the exception of 1x responses,
> human-readable <META> reflects error situations, where by definition the
> server doesn't know what the user can or cannot understand.

One of the motivations for having the second digits clarifying the exact
nature of the error (besides allowing useful logging on the server side
for identifying problems, and allowing the writing of more robust bots)
was that clients could use them to provide *some* degree of localised
error message.  E.g. if a server written by an English-speaking programmer
sends back "51 Not found", a client with a Finnish language interface could
recognise the 51 status code and say to its users:

> Ei löytynyt!  Palvelin sanoi: "Not found"

which a non-English-reading Finn would perceive as:

> Not found!  Server said: "<mysterious foreign message>"

Which is slightly better than just:

> Server said: "<mysterious foreign message>"

And for people who read *some* English (or whatever language the server
uses for errors) but not very much, having a localised translation of the
error category first might be enough context to enable them to make
enough sense of the full error message to have some understanding of
what's going on.
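
A small sketch of that client-side behaviour, assuming the client ships its
own table of localised status descriptions.  The tables and the function name
render_error are invented for illustration, and only status 51 is filled in.

```python
# Hypothetical localisation tables kept by the client, keyed by UI language.
CATEGORY = {
    "fi": {51: "Ei löytynyt!"},
    "en": {51: "Not found!"},
}
SERVER_SAID = {"fi": "Palvelin sanoi", "en": "Server said"}


def render_error(status, meta, ui_lang):
    """Prefix the server's META with a localised category when one is known."""
    said = SERVER_SAID.get(ui_lang, SERVER_SAID["en"])
    quoted = f'{said}: "{meta}"'
    category = CATEGORY.get(ui_lang, {}).get(status)
    return f"{category} {quoted}" if category else quoted


print(render_error(51, "Not found", "fi"))
```

For a status the client has no translation for, only the quoted server text
is shown, which is the weaker case described above.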

Cheers,
Solderpunk

Petite Abeille <petite.abeille (a) gmail.com>

📅 Sent: 2020-12-28 10:45
📧 Message 14 of 24

> On Dec 28, 2020, at 11:10, Solderpunk <solderpunk at posteo.net> wrote:
> 
> One of the motivations for having the second digits clarifying the exact
> nature of the error

Yes, but this doesn't help with 1x responses. They are meant to be 
presented to the end user, as a prompt. And they lack a language tag. This 
is my understanding of the crux of the issue. But perhaps I missed something.

Arav K. <nothien (a) uber.space>

📅 Sent: 2020-12-28 10:57
📧 Message 15 of 24

On Mon, Dec 28, 2020 at 11:45:56AM +0100, Petite Abeille wrote:
> Yes, but this doesn't help with 1x responses. They are meant to be
> presented to the end user, as a prompt. And they lack a language tag.
> This is my understanding of the crux of the issue. But perhaps I
> missed something.

My point was that the client is communicating the language the server
should use in the URL itself, e.g. for gemini://fr.example.com/search
the server would send a prompt in French.  If no language is specified,
the server should assume a default language, which would often be
English.

I thought this didn't need to be mentioned, but if the server supports
the URL gemini://fr.example.com/, then it is expected to have a French
interface for it.  If it doesn't support a language, then it just
shouldn't offer it, and it then becomes the user's responsibility to
translate appropriately.
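
On the server side this could look roughly as follows.  It is a sketch only:
the prompt texts, the PROMPTS table and the name input_header are made up,
and English is assumed as the default language.

```python
# Hypothetical prompt table; a real server would hold one entry per interface.
PROMPTS = {
    "en": "Search terms?",
    "fr": "Termes de recherche ?",
}
DEFAULT_LANG = "en"  # assumption: English as the fallback


def input_header(lang):
    """Build a status 10 header, falling back to the default language."""
    prompt = PROMPTS.get(lang or DEFAULT_LANG, PROMPTS[DEFAULT_LANG])
    return f"10 {prompt}\r\n".encode("utf-8")


print(input_header("fr"))
print(input_header(None))
```

A language the server does not offer gets the default prompt, and translating
it is then left to the user.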

~aravk | ~nothien

Petite Abeille <petite.abeille (a) gmail.com>

📅 Sent: 2020-12-28 11:03
📧 Message 16 of 24

> On Dec 28, 2020, at 11:57, Arav K. <nothien at uber.space> wrote:
> 
> I thought this didn't need to be mentioned, but if the server supports
> the URL gemini://fr.example.com/, then it is expected to have a French
> interface for it.  If it doesn't support a language, then it just
> shouldn't offer it, and it then becomes the user's responsibility to
> translate appropriately.

Sounds reasonable enough to me: language tags are conveyed through the 
expedient of embedding them in the user generated content, i.e. the URL, 
by convention. And not the protocol machinery, i.e. status codes, as per 
the specification.

I thought that John's point of contention was the protocol machinery, as 
opposed to the user generated content.

Côme Chilliet <come (a) chilliet.eu>

📅 Sent: 2020-12-28 11:48
📧 Message 17 of 24

On Sunday, 27 December 2020, 20:40:42 CET, John Cowan wrote:
> b) What language is used for those META parts, since the server does not
> know what languages are acceptable to the user?  TL;dr answer: start with
> English, add other languages as necessary or useful.

I agree with what has been said by some, the text in META should be in the 
same language as the rest of what the server is hosting.
If it is a multi-language capsule, most likely there is an indication in 
the address to convey language choice. Or in the session if using client 
certs or whatever to keep a session open.
In all cases, the server knows in which language the pages are and should 
use the same one for META.

John Cowan <cowan (a) ccil.org>

📅 Sent: 2020-12-30 06:35
📧 Message 18 of 24

On Mon, Dec 28, 2020 at 4:12 AM Arav K. <nothien at uber.space> wrote:

> If the server has a Latin section, it is expected to have a complete
> Latin interface.

That makes very little sense to me.  It's true that www.vatican.va provides
a Latin user interface, but a site presenting English law is going to have
an English interface only, even though the older laws are in Latin or Old
Norman French.  *Nobody* needs an Old Norman French user interface.

Similarly, gutenberg.org provides only an English interface, even though it
provides e-books in 55 languages, from 37527 in English and 2356 in French
down to 21 languages with a single book each.  (There are other PG-like
sites in and for many countries; see the WP article.)

> Because we don't want the query string to be used as it is in HTML, i.e.
> for arbitrary parameters.  Using ?lang=<lang> is setting an arguably
> dangerous precedent.
>

I can't agree there either.  The only requirement in Gemini imposed on the
query string is that the URL sent after a 10 or 11 response contains
whatever the user entered as the query string.  There is nothing to prevent
link lines from containing query strings themselves.

In the Gemini PG interface I plan to write as soon as I have a chance, the
UI will be entirely in English, the only language I speak.  When you are
searching, you can include words in the query like "lang:en" or
"media:text/plain" or "author:Twain", as well as plain words in any script,
more or less like Google Search.
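
To make the query format concrete, here is one way such input might be
split into fielded terms and plain words.  This is only a sketch; the field
list and the name parse_search are assumptions, not the planned
implementation.

```python
FIELDS = {"lang", "media", "author"}  # the prefixes named above


def parse_search(query):
    """Split a query into (field, value) pairs and plain words."""
    fielded, words = [], []
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and value and name in FIELDS:
            fielded.append((name, value))
        else:
            words.append(token)
    return fielded, words


print(parse_search("lang:en author:Twain media:text/plain river"))
```

Splitting on the first colon only keeps a value like "text/plain" intact, and
unknown prefixes simply fall through as plain words.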

John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
All Gaul is divided into three parts: the part that cooks with lard and
goose fat, the part that cooks with olive oil, and the part that cooks
with butter.
  --David Chessler

nothien@uber.space <nothien (a) uber.space>

📅 Sent: 2020-12-30 08:25
📧 Message 19 of 24

John Cowan <cowan at ccil.org> wrote:
> That makes very little sense to me.  It's true that www.vatican.va
> provides a Latin user interface, but a site presenting English law is
> going to have an English interface only, even though the older laws
> are in Latin or Old Norman French.  *Nobody* needs an Old Norman
> French user interface.

Sorry, I wasn't clear.  When I said "Latin section", I meant "section
meant for Latin users".  The site you're talking about isn't meant for
Latin-speaking users, so it doesn't have a Latin interface.  That's
perfectly fine.

> Similarly, gutenberg.org provides only an English interface, even
> though it provides e-books in 55 languages, from 37527 in English and
> 2356 in French down to 21 languages with a single book each.  (There
> are other PG-like sites in and for many countries; see the WP
> article.)

One example doesn't make the rule.  I would argue that Project Gutenberg
_should_ have interfaces in other languages, because it is offering
(some) content that is almost exclusively going to be consumed by people
speaking (possibly only) these other languages.  I'm assuming that not
that many monolingual English speakers/readers are reading those 2,356
French books.  Mainly French-speaking people are reading those books,
and a French interface should be made available to them.  I can
understand, however, that PG doesn't currently have the resources to
translate its interface.  But that doesn't mean that it should not be a
goal.

> I can't agree there either.  The only requirement in Gemini imposed on
> the query string is that the URL sent after a 10 or 11 response
> contains whatever the user entered as the query string.  There is
> nothing to prevent link lines from containing query strings
> themselves.

We're talking about different things.  I'm not talking about link lines
and query strings.  My point of contention is the use of HTML-style
(<key>=<value>)* formatting for query strings.

> In the Gemini PG interface I plan to write as soon as I have a chance,
> the UI will be entirely in English, the only language I speak.  When
> you are searching, you can include words in the query like "lang:en"
> or "media:text/plain" or "author:Twain", as well as plain words in any
> script, more or less like Google Search.

That makes perfect sense: you only speak English, you only design
English interfaces.  I have absolutely no problem with that.  But you
should at least open up the possibility of having other interfaces, even
if you don't write them yourself.  Non-English-readers will thank you
for it.

~aravk | ~nothien

John Cowan <cowan (a) ccil.org>

📅 Sent: 2020-12-30 20:26
📧 Message 20 of 24

On Wed, Dec 30, 2020 at 3:25 AM <nothien at uber.space> wrote:

> Sorry, I wasn't clear.  When I said "Latin section", I meant "section
> meant for Latin users".  The site you're talking about isn't meant for
> Latin-speaking users, so it doesn't have a Latin interface.  That's
> perfectly fine.

Ah, okay.

> One example doesn't make the rule.  I would argue that Project Gutenberg
> _should_ have interfaces in other languages, because it is offering
> (some) content that is almost exclusively going to be consumed by people
> speaking (possibly only) these other languages.

Up to a point, certainly.  The tail end of languages probably don't need
interfaces.

> We're talking about different things.  I'm not talking about link lines
> and query strings.  My point of contention is the use of HTML-style
> (<key>=<value>)* formatting for query strings.
>

Suppose you have written an essay in French on the novels of Jules Verne in
text/gemini format, and you want to link to a collection of the novels
themselves.  You can then insert this link line:

=> gemini://gemguten.example.com/advsearch.gmi?lang=fr&author=Jules&author=Verne Les oeuvres de Jules Verne en français

The user who selects this link will receive a text/gemini document,
something like this:

=> gemini://gemguten.example.com/etext/5082 Verne, Jules, 1828-1905. Le château des Carpathes. [fr]
=> gemini://gemguten.example.com/etext/8174 Verne, Jules, 1828-1905. Kéraban-Le-Têtu, Volume I. [fr]
=> gemini://gemguten.example.com/etext/17832 Verne, Jules, 1828-1905. Une ville flottante. [fr]
...

Since this is not an _interactive_ search, the Gemini conventions about
status 1x and the query string don't apply.
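
For what it's worth, a server could read such a query string with standard
tools.  A sketch using Python's urllib.parse; the helper name search_params is
made up, and the URL is the example one from the link line above:

```python
from urllib.parse import parse_qs, urlsplit


def search_params(url):
    """Return the query parameters as a dict of lists (repeated keys are kept)."""
    return parse_qs(urlsplit(url).query)


params = search_params(
    "gemini://gemguten.example.com/advsearch.gmi?lang=fr&author=Jules&author=Verne"
)
print(params)
```

The repeated "author" key comes back as a list with both values, so the
server can treat them as two separate terms.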

> But you
> should at least open up the possibility of having other interfaces, even
> if you don't write them yourself.  Non-English-readers will thank you
> for it.

Absolutely.

John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
If I have not seen as far as other giants, it's because I have been
standing on my head.  --Trond Engen

nothien@uber.space <nothien (a) uber.space>

📅 Sent: 2020-12-30 22:16
📧 Message 21 of 24

John Cowan <cowan at ccil.org> wrote:
> Up to a point, certainly.  The tail end of languages probably don't
> need interfaces.

Yep, makes sense.  Too much work (unless someone's willing to do it all
for you).

> Suppose you have written an essay in French on the novels of Jules
> Verne in text/gemini format, and you want to link to a collection of
> the novels themselves.  You can then insert this link line:
> 
> => gemini://gemguten.example.com/advsearch.gmi?lang=fr&author=Jules&author=Verne Les oeuvres de Jules Verne en français
> 
> The user who selects this link will receive a text/gemini document,
> something like this:
> 
> => gemini://gemguten.example.com/etext/5082 Verne, Jules, 1828-1905. Le château des Carpathes. [fr]
> => gemini://gemguten.example.com/etext/8174 Verne, Jules, 1828-1905. Kéraban-Le-Têtu, Volume I. [fr]
> => gemini://gemguten.example.com/etext/17832 Verne, Jules, 1828-1905. Une ville flottante. [fr]
> ...
> 
> Since this is not an _interactive_ search, the Gemini conventions
> about status 1x and the query string don't apply.

But it is an interactive search.  When you point out to someone that they
can use the advanced search page, you're going to point them to
gemini://gemguten.example.com/advsearch.gmi.  If you fill in a query
string for them, that's fine, but you also had to write that out
manually.  There's no way in Gemini to automatically create that link
(as to do so you would have to give what you're looking for to a page to
translate it to that format for you, but if it has that translation
ability it would be supported in the advsearch.gmi page itself).  In the
end, someone had to write it out by hand, and that's not the right way
to do it.  I completely understand that such a search function is
needed, and I obviously can't stop you from using this format if you
want, but I do feel that there is a better way to pull it off.  For
example, if you're just searching for an author, you could make an
author-searching page where the query string is only the author name.
But I don't know what the better way, if there is one, is yet.

Also, under my system, the URL you've given says nothing about the
language of the interface (e.g. the "Les oeuvres de Jules Verne en
français", which would presumably be in the header of the search page).
Under my system, prepending 'fr.' to the domain would effectively
request that the server use a French interface, so that everything from
the returned text/gemini documents to error messages would be in French.
But the URL would be otherwise unaffected.
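
As a sketch of what "otherwise unaffected" means here: only the host gains a
language label, while path and query pass through untouched.  The function
name with_interface_language is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def with_interface_language(url, lang):
    """Prepend '<lang>.' to the host; leave path and query as they are."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=f"{lang}.{parts.netloc}"))


print(with_interface_language(
    "gemini://gemguten.example.com/advsearch.gmi?lang=fr&author=Jules&author=Verne",
    "fr",
))
```

In this reading the "fr." label selects the interface language, while the
"lang=fr" in the query remains a search restriction on the results.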

~aravk | ~nothien

John Cowan <cowan (a) ccil.org>

📅 Sent: 2020-12-31 03:54
📧 Message 22 of 24

On Wed, Dec 30, 2020 at 5:15 PM <nothien at uber.space> wrote:

> But it is an interactive search.  When you point out to someone that they
> can use the advanced search page, you're going to point them to
> gemini://gemguten.example.com/advsearch.gmi.

If you chose such a link, you'd in principle get all 40,000+ document
links, since there are no restrictions.  But in fact you'd get an error
page telling you that you asked for too many documents.  There will be a
different link altogether for interactive search, where you would be asked
using a 10 response to enter keywords from the metadata (language, author,
title, Library of Congress subject classification, etc.)  However, that's
inherently less precise: if you provided keywords "Mark Twain", you'd get
both books by him and books about him, such as _My Mark Twain_ by William
Dean Howells.

> In the end, someone had to write it out by hand,

True.  But it isn't particularly difficult, either.  I'll put some samples
on the interactive search page along with the actual interactive link.

> I completely understand that such a search function is
> needed, and I obviously can't stop you from using this format if you
> want, but I do feel that there is a better way to pull it off.  For
> example, if you're just searching for an author, you could make an
> author-searching page where the query string is only the author name.
>

With so many pages to search, all the search terms are ANDed together: the
more keywords, the less output to look through.  (I'm not sure what the
upper limit on results will be: for Google it's 1000.)  You probably don't
want all the works by one author anyhow: you want the ones you can read.
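
A minimal sketch of the ANDed keyword matching described above, assuming a hypothetical in-memory catalog. The record fields, the function name `search`, and the default limit of 1000 (borrowed from the Google figure, not a decided value) are illustrative assumptions, not the actual gemguten implementation.

```python
# Hypothetical catalog records; the field names are assumptions.
CATALOG = [
    {"author": "Mark Twain", "title": "Adventures of Huckleberry Finn", "language": "en"},
    {"author": "William Dean Howells", "title": "My Mark Twain", "language": "en"},
    {"author": "Jules Verne", "title": "Vingt mille lieues sous les mers", "language": "fr"},
]


def search(catalog, keywords, limit=1000):
    """Return records whose metadata contains every keyword (logical AND)."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for record in catalog:
        # Match against all metadata fields at once, case-insensitively.
        haystack = " ".join(record.values()).lower()
        if all(k in haystack for k in wanted):
            hits.append(record)
    return hits[:limit]
```

With this sketch, the keywords "mark" and "twain" match both the book by Twain and the book about him; adding "howells" narrows the result to the latter only.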

> Also, under my system, the URL you've given says nothing about the
> language of the interface (e.g. the "Les oeuvres de Jules Verne en
> français", which would presumably be in the header of the search page).
>

The search engine doesn't know what the link looks like.  I suppose that
could be passed in the query too: "...
&linktext=Les%20oeuvres%20de%20Jules%20Verne%20en%20français", for example.

> Under my system, prepending 'fr.' to the domain would effectively
> request that the server use a French interface, so that everything from
> the returned text/gemini documents to error messages would be in French.
> But the URL would be otherwise unaffected.
>

Fine as far as the error messages are concerned.  But just because you want
a French interface, it doesn't necessarily mean you want to reject English
or German books from the search.

John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
It's the old, old story.  Droid meets droid.  Droid becomes chameleon.
Droid loses chameleon, chameleon becomes blob, droid gets blob back
again.  It's a classic tale.  --Kryten, Red Dwarf


nothien@uber.space <nothien (a) uber.space>

📅 Sent: 2020-12-31 07:31
📧 Message 23 of 24

John Cowan <cowan at ccil.org> wrote:
> If you chose such a link, you'd in principle get all 40,000+ document
> links, since there are no restrictions.  But in fact you'd get an
> error page telling you that you asked for too many documents.  There
> will be a different link altogether for interactive search, where you
> would be asked using a 10 response to enter keywords from the metadata
> (language, author, title, Library of Congress subject classification,
> etc.)

You could just combine the two pages and return a 10 when no query
string is provided to advsearch.gmi.
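
A rough sketch of that combined page, assuming a hypothetical handler that only builds the Gemini response header. The function name `advsearch_header` and the prompt text are assumptions; the header layout (status, space, META, CRLF) and the meaning of statuses 10 and 20 come from the Gemini specification.

```python
from urllib.parse import urlsplit


def advsearch_header(url: str) -> str:
    """Build the response header for a request to the advanced search page."""
    query = urlsplit(url).query
    if not query:
        # No query string: status 10 asks the client to prompt the user.
        return "10 Enter search keywords\r\n"
    # Query string present: status 20, with the MIME type as META.
    return "20 text/gemini; charset=utf-8\r\n"
```

Requesting gemini://gemguten.example.com/advsearch.gmi with no query would yield the 10 prompt, while the same URL followed by a query string would go straight to results.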

> However, that's inherently less precise: if you provided keywords
> "Mark Twain", you'd get both books by him and books about him, such as
> _My Mark Twain_ by William Dean Howells.

So you provide no interactive way to build an advanced search filter,
and you are replacing it with an interactive way to build something
that is not an advanced search filter.

> With so many pages to search, all the search terms are ANDed together:
> the more keywords, the less output to look through.  (I'm not sure
> what the upper limit on results will be: for Google it's 1000.)  You
> probably don't want all the works by one author anyhow: you want the
> ones you can read.

You could simply prioritize books written in the language of the
interface - so with the fr.gemguten.example.com/author/Jules%20Verne.gmi
page, French books would show up at the top.  But this solution doesn't
scale to finding books in an arbitrary language.
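
A minimal sketch of that prioritization, assuming the interface language is taken from the first label of the host name. The set of supported interface languages, the record fields, and both function names are hypothetical.

```python
# Assumption: interface languages the server offers besides the default.
SUPPORTED_INTERFACES = {"fr", "de"}


def interface_language(host, default="en"):
    """Read the interface language from a prefix such as 'fr.' on the host."""
    first_label = host.split(".", 1)[0].lower()
    return first_label if first_label in SUPPORTED_INTERFACES else default


def prioritize(books, host):
    """List books in the interface language first; nothing is filtered out."""
    lang = interface_language(host)
    # sorted() is stable, so the original order is kept within each group.
    return sorted(books, key=lambda book: book["language"] != lang)
```

Books in other languages stay in the results and are only moved down the list, so a French interface does not hide English or German books.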

> The search engine doesn't know what the link looks like.  I suppose
> that could be passed in the query too: "...
> &linktext=Les%20oeuvres%20de%20Jules%20Verne%20en%20français", for
> example.

The server is smart enough to generate something along those lines on
its own.  Please don't make URLs that much longer.

> Fine as far as the error messages are concerned.  But just because you
> want a French interface, it doesn't necessarily mean you want to
> reject English or German books from the search.

Of course not.  The interface language is completely dissociated from
the actual content of the pages; it only affects the language they are
written in.

~aravk | ~nothien


Stephane Bortzmeyer <stephane (a) sources.org>

📅 Sent: 2021-01-03 13:14
📧 Message 24 of 24

On Mon, Dec 28, 2020 at 10:12:30AM +0100,
 Arav K. <nothien at uber.space> wrote 
 a message of 84 lines which said:

> Because we don't want the query string to be used as it is in HTML, i.e.
> for arbitrary parameters.  Using ?lang=<lang> is setting an arguably
> dangerous precedent.

This opinion requires some elaboration. There is no reason to choose paths
rather than queries: both are part of the URL. The difference between
the two is purely historical (at one time, ? indicated a dynamic page).

Put another way, <gemini://capsule.example/foo/bar> and
<gemini://capsule.example/foo?bar> have identical semantics. A Gemini
client can deduce nothing from the fact that one uses a path and the
other a query.
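
A small sketch of what a client actually sees, using Python's standard URL parser; the helper name `components` is an assumption. Parsing only splits each URL into syntactic parts, and nothing in the result says whether the server treats the resource as static or dynamic.

```python
from urllib.parse import urlsplit


def components(url: str):
    """Return the (path, query) pair that generic URL parsing yields."""
    parts = urlsplit(url)
    return parts.path, parts.query


for url in ("gemini://capsule.example/foo/bar",
            "gemini://capsule.example/foo?bar"):
    print(url, components(url))
```

The two URLs differ only in which component carries "bar"; how that is interpreted is entirely up to the server.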

Note that Amazon managed to *patent* the idea of using parameters in
the path. US "land of the crazy patents" patent n° 7,287,042
<http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=7,287,042.PN.&OS=PN/7,287,042&RS=PN/7,287,042>

