[SPEC-CHANGE] lang parameter, minor line type changes, clarifications...

1. solderpunk (solderpunk (a) SDF.ORG)

Ahoy!

I have just pushed some changes to the Gemini specification.  You can
see the new v0.13.0 spec at:



Perhaps the biggest change, in conceptual terms, is the introduction of
the "lang" parameter for text/gemini.  However, most clients will not
need to make any changes whatsoever on account of this.

The other changes are either small clarifications or enhancements of
existing functionality and have all been discussed previously on the
mailing list.

SUMMARY OF CHANGES:

text/gemini documents can now specify which natural language(s) they are
written in via a "lang" parameter to the media type.  Valid values of
"lang" are comma-separated lists of language tags, as defined in
RFC4646.  These are actually pretty powerful tags and can specify
language, script (which implies a particular direction of text
rendering), usage of regional variant, etc.  Of course, clients can pay
as much or as little attention to these details as their authors see
fit.  Servers never have to provide this parameter (although it would be
good practice to do so) and clients never have to pay attention to it.
See section 5.2 for full details.
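
As a rough illustration of the client side, here is a minimal Python
sketch that pulls the "lang" values out of a response's media type
string.  The function name parse_lang and the simple split-on-";"
parsing are assumptions made for this example, not something the spec
prescribes.

```python
def parse_lang(meta: str) -> list[str]:
    """Return the language tags from a text/gemini media type string, or []."""
    media_type, *params = [part.strip() for part in meta.split(";")]
    if media_type.lower() != "text/gemini":
        return []
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "lang":
            # "lang" holds a comma-separated list of language tags.
            return [tag.strip() for tag in value.split(",") if tag.strip()]
    return []


print(parse_lang("text/gemini; lang=en,fr"))  # ['en', 'fr']
print(parse_lang("text/gemini"))              # []
```

A client that does not care about language can skip this entirely, since
the parameter is optional on both sides.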

The definition of unordered list item lines has been changed so that
they begin not just with "*" but with "* ".  This allows the first word
of a regular text line to be *emphasised* in a common fashion without
the line being accidentally considered a list item.  GUS data suggests
that everybody, or almost everybody, is already writing their list items
this way, so this should not require any content updates by authors.
See section 5.4.2 for full details.

Lines beginning with ">" are now defined to be quote lines, as per
popular demand.  Nothing is prescribed about how clients should display
this.  I expect terminal-based clients will simply keep the ">" visible,
as its use to convey quotation is extremely widespread and familiar
from email and usenet.  When wrapping long lines to fit the screen, each
resultant line may have a ">" placed at the front.  This is mostly just
a styling matter, but I consider it to be styling which conveys
important semantic information and when quoting multiple paragraphs of
text it helps to disambiguate where the quotation ends.  This is the
last advanced line type I expect to ever add to the spec.  See section
5.4.3 for full details.
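
The wrapping behaviour described above might be sketched like this for a
terminal client.  This is only one possible rendering (nothing is
prescribed); the function name render_quote and the default width are
assumptions for the example.

```python
import textwrap


def render_quote(line: str, width: int = 40) -> list[str]:
    """Wrap one quote line, repeating "> " at the front of every row."""
    text = line[1:].strip()  # drop the leading ">" and surrounding whitespace
    # Reserve two columns for the "> " prefix added to each row.
    rows = textwrap.wrap(text, width=width - 2) or [""]
    return ["> " + row for row in rows]


for row in render_quote("> " + "a fairly long quoted sentence " * 3, width=30):
    print(row)
```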

Status code 11 has been defined for requesting "sensitive" input, like
passwords.  Clients should treat it exactly like status code 10 except
they should not echo the user's input to the screen.  This will allow us
to experiment with different authentication paradigms as part of client
certificate work-flows.  See Appendix 1 for full details.
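
For a terminal client, the difference between the two codes can be as
small as which prompt function gets called.  A minimal Python sketch,
assuming a hypothetical read_input helper whose two reader arguments
exist only to make the example easy to exercise:

```python
import getpass


def read_input(status: int, prompt: str, visible=input, hidden=getpass.getpass) -> str:
    """Ask the user for input, hiding what they type when the status is 11."""
    if status == 11:
        return hidden(prompt + " ")   # sensitive input: do not echo
    if status == 10:
        return visible(prompt + " ")  # ordinary input: echo as usual
    raise ValueError(f"not an input status: {status}")
```

Everything after the prompt (building and sending the follow-up request)
is the same for both codes.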

The need to use percent-encoding on reserved characters and spaces in
URLs, both in requests and in the link lines of text/gemini bodies, has
been made explicit due to observed variation in how clients/servers
actually handle this.  See sections 3.2.1 and 5.4.2 for full details.

The definition of link lines now clarifies that clients "MUST NOT
automatically make any network connections as part of displaying links
whose scheme corresponds to a network protocol (e.g. gemini://,
gopher://, https://, ftp://, etc.)".  See section 5.4.2 for full
details.

IMPLICATIONS FOR SERVER AUTHORS:

You SHOULD consider providing a way for admins and/or users to specify
which value of the "lang" parameter should be sent for text/gemini
content.

If your server automatically generates text/gemini content (e.g.
directory listings), you MUST make sure it uses percent-encoding in
its URLs (filenames with spaces in them are a good test case!).
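
A minimal sketch of what that could look like in a server written in
Python, using the standard library's urllib.parse.quote.  The helper
name listing_line is made up for this example, and using the raw
filename as the link label is just one possible choice.

```python
from urllib.parse import quote


def listing_line(filename: str) -> str:
    """Build a text/gemini link line for one directory entry."""
    # quote() keeps "/" and unreserved characters and percent-encodes the
    # rest, so the space in "my notes.gmi" becomes "%20".
    return f"=> {quote(filename)} {filename}"


print(listing_line("my notes.gmi"))  # => my%20notes.gmi my notes.gmi
```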

IMPLICATIONS FOR CLIENT AUTHORS:

Your client MAY make use of the value of the "lang" parameter in
interpreting text/gemini content (this will mostly be relevant for the
Rhapsode audio browser and perhaps for search engines).

If your client recognises unordered list item lines and treats them
differently from plain text lines, you MUST change the code which
identifies them to require a space after the *.

You MAY update your client to recognise the new quote line type.

You MAY update your client to treat status code 11 differently from
status code 10.

If your client supports status codes beginning with 1, you MUST be
percent-encoding the user input when formatting the subsequent
request.
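
In Python, for instance, this could be done with urllib.parse.quote.
The sketch below assumes the user's input replaces any existing query
on the URL that asked for it; the function name input_request_url is
hypothetical.

```python
from urllib.parse import quote


def input_request_url(url: str, user_input: str) -> str:
    """Append percent-encoded user input as the query of the follow-up request."""
    base = url.split("?", 1)[0]  # drop any query already on the URL
    # safe="" makes quote() encode reserved characters such as "/" and "&" too.
    return f"{base}?{quote(user_input, safe='')}"


print(input_request_url("gemini://example.org/search", "hello world & more"))
# gemini://example.org/search?hello%20world%20%26%20more
```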

If your client has been automatically making network connections you
MUST remove this behaviour and atone for your sins!

IMPLICATIONS FOR CONTENT AUTHORS:

If your content has unordered list item lines which do not include a
space after the initial *, you MUST insert that space.
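
Authors who want to check their existing files could use something
like the following Python sketch.  It only flags candidates: a line
starting with "*" but not "* " may be a list item missing its space,
or it may be deliberate *emphasis*, so a human still has to decide.
The function name is an assumption for the example.

```python
def suspicious_list_lines(document: str) -> list[tuple[int, str]]:
    """Return (line number, text) for lines starting with "*" but not "* "."""
    hits = []
    preformatted = False
    for number, line in enumerate(document.splitlines(), start=1):
        if line.startswith("```"):
            preformatted = not preformatted  # toggle lines switch preformatted mode
            continue
        if not preformatted and line.startswith("*") and not line.startswith("* "):
            hits.append((number, line))
    return hits


sample = "* fine\n*needs a space\n```\n*ignored inside preformatted text\n```\n"
print(suspicious_list_lines(sample))  # [(2, '*needs a space')]
```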

Cheers,
Solderpunk

2. James Tomasino (tomasino (a) lavabit.com)

On 6/7/20 4:27 PM, solderpunk wrote:
> The definition of unordered list item lines has been changed so that
> they begin not just with "*" but with "* ".
> Lines beginning with ">" are now defined to be quote lines, as per
> popular demand.

Just to be clear, lists have a mandatory space after them, but for
quotes and headings and links the whitespace is still set to zero-or-more?

3. solderpunk (solderpunk (a) SDF.ORG)

On Sun, Jun 07, 2020 at 04:41:49PM +0000, James Tomasino wrote:
> On 6/7/20 4:27 PM, solderpunk wrote:
> > The definition of unordered list item lines has been changed so that
> > they begin not just with "*" but with "* ".
> > Lines beginning with ">" are now defined to be quote lines, as per
> > popular demand.
> 
> Just to be clear, lists have a mandatory space after them, but for
> quotes and headings and links the whitespace is still set to zero-or-more?

That's how it currently is, yeah.

This is, indeed, an inconsistency.  It arises from the fact that * is
the only one of the characters which are significant for determining
line types which has a strongly-engrained alternative use which has led
to actual erroneous formatting in the wild.

We *could* change this.  I don't see a really strong argument against
it, and consistency is good for learnability.  Then again, assuming that
clients are stripping leading whitespace from these line types, if
people get "the wrong idea" that those spaces are mandatory, it doesn't
actually cause any harm.
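
To make the inconsistency concrete, here is a simplified Python sketch
of a line classifier as the rules currently stand.  It ignores
preformatted mode and heading levels, and the function name and return
labels are made up for the example.

```python
def line_type(line: str) -> str:
    """Classify one text/gemini line outside preformatted mode (simplified)."""
    if line.startswith("```"):
        return "preformat-toggle"
    if line.startswith("=>"):   # whitespace after the marker is optional
        return "link"
    if line.startswith("#"):    # whitespace after the marker is optional
        return "heading"
    if line.startswith("* "):   # the only marker with a mandatory space
        return "list-item"
    if line.startswith(">"):    # whitespace after the marker is optional
        return "quote"
    return "text"


print(line_type("*emphasised* first word"))  # text
print(line_type(">no space needed"))         # quote
```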

Cheers,
Solderpunk

4. Frank LENORMAND (lenormfml (a) gmail.com)

Hi,

Thanks for this update!

On Sun Jun  7 19:27:23 2020, solderpunk wrote:
> The definition of unordered list item lines has been changed so that
> they begin not just with "*" but with "* ".  This allows the first word
> of a regular text line to be *emphasised* in a common fashion without
> the line being accidentally considered a list item.  GUS data suggests
> that everybody, or almost everybody, is already writing their list items
> this way, so this should not require any content updates by authors.
> See section 5.4.2 for full details.

This amendment that prevents conflicts between list items and
*emphasised text* indirectly acknowledges that clients may not
render *emphasised text* verbatim.

Writers are no longer ensured by the standard that their text surrounded with
asterisks will not be decorated by the client, by extension.

Are there any plans to mention inline text decoration explicitly in the
specification? If yes, it follows that there should be a way for writers
to escape asterisks around words, in non pre-formatted blocks.

Regards,
-- 
Frank LENORMAND

5. solderpunk (solderpunk (a) SDF.ORG)

On Sun, Jun 07, 2020 at 09:38:04PM +0300, Frank LENORMAND wrote:
 
> This amendment that prevents conflicts between list items and
> *emphasised text* indirectly acknowledges that clients may not
> render *emphasised text* verbatim.
> 
> Writers are no longer ensured by the standard that their text surrounded with
> asterisks will not be decorated by the client, by extension.

Writers were *never* assured by the standard that text surrounded with
asterisks would not be decorated by the client!

Clients that want to do that can, but it's strictly an optional extra
out-of-spec nicety.  If anybody wants to do this, the onus is on them to
do it smartly enough that it doesn't interfere with the specced use of
asterisks for list items.  If they mess that up, presumably their users
will move to an either less ambitious or better written client which
doesn't mangle things.

> Are there any plans to mention inline text decoration explicitly in the
> specification? If yes, it follows that there should be a way for writers
> to escape asterisks around words, in non pre-formatted blocks.

There aren't.  Even if there were, it wouldn't follow that there would
need to be a way to escape them.  The spec says "authors should not expect
to exercise any control over the precise rendering of their text lines,
only of their actual textual content".  This extends to not being able
to opt out of various optional niceties that clients may choose to
implement above and beyond the spec.

Authors of text/gemini should never expect to influence the size,
weight, colour, font, alignment etc. of any of their text.  That's in
the client's hands, and that's a good thing.  The advanced line types
that exist may have common and semi-predictable consequences for
stylisation in extant clients, but the reason they are in there is
primarily to give some way to convey important *semantic* information.
It's true that the list item type doesn't *quite* live up to that ideal.
But I like pretty lists, so...

Cheers,
Solderpunk

6. Luke Emmet (luke.emmet (a) gmail.com)

Hello all

On 07-Jun-2020 17:27, solderpunk wrote:
> The definition of link lines now clarifies that clients "MUST NOT
> automatically make any network connections as part of displaying links
> whose scheme corresponds to a network protocol (e.g. gemini://,
> gopher://, https://, ftp://, etc.)".  See section 5.4.2 for full
> details.
>
> <snip>
>
> IMPLICATIONS FOR CLIENT AUTHORS:
>
> <snip>
>
> If your client has been automatically making network connections you
> MUST remove this behaviour and atone for your sins!
>
I think all the changes are sensible, apart from the wording that tries 
to specify client behaviour. It is not for the spec IMO to prescribe the 
client behaviour, rather it should specify the exchange format and 
markup (both of which it does well).

If a client must not make subsequent network requests when interpreting 
a page, does this mean that search engines and crawlers are now 
non-compliant clients? This seems to go much too far.

I would think the "MUST NOT" would be better as a "SHOULD NOT" in case 
you are adamant to try to shape client behaviour. In my view this is not 
in scope of a protocol and markup format specification.

Also minor point, I would recommend removing the "atone for your sins" 
sentence as it is overly informal for a spec.

I like the explicit requirements covering URL encoding, lang, and 
bullets. I wonder how authors will reliably signal the language to the 
server though, particularly as it may be on a page by page basis.

Otherwise keep up the good work!

Best wishes

  - Luke

7. Luke Emmet (luke.emmet (a) gmail.com)

A minor clarification below...

On 07-Jun-2020 22:47, Luke Emmet wrote:
> Hello all
>
> On 07-Jun-2020 17:27, solderpunk wrote:
>> The definition of link lines now clarifies that clients "MUST NOT
>> automatically make any network connections as part of displaying links
>> whose scheme corresponds to a network protocol (e.g. gemini://,
>> gopher://, https://, ftp://, etc.)".  See section 5.4.2 for full
>> details.
>>
>> <snip>
>>
>> IMPLICATIONS FOR CLIENT AUTHORS:
>>
>> <snip>
>>
>> If your client has been automatically making network connections you
>> MUST remove this behaviour and atone for your sins!
>>
> I think all the changes are sensible, apart from the wording that 
> tries to specify client behaviour. It is not for the spec IMO to 
> prescribe the client behaviour, rather it should specify the exchange 
> format and markup (both of which it does well).

Sorry, to clarify that particular point, my intended point was that it 
is not for the spec to prescribe *when* the client makes or does not 
make its network requests in light of its interpretation of the
textual content. The spec should stick to correct computer to computer 
exchange (protocol matters) and the markup format (both of which it does 
well).
>
> If a client must not make subsequent network requests when 
> interpreting a page, does this mean that search engines and crawlers 
> are now non-compliant clients? This seems to go much too far.
>
> I would think the "MUST NOT" would be better as a "SHOULD NOT" in case 
> you are adamant to try to shape client behaviour. In my view this is 
> not in scope of a protocol and markup format specification.
>
> Also minor point, I would recommend removing the "atone for your sins" 
> sentence as it is overly informal for a spec.
>
> I like the explicit requirements covering URL encoding, lang, and 
> bullets. I wonder how authors will reliably signal the language to the 
> server though, particularly as it may be on a page by page basis.
>
> Otherwise keep up the good work!
>
> Best wishes
>
>  - Luke
>

8. Sean Conner (sean (a) conman.org)

It was thus said that the Great solderpunk once stated:
> 
> IMPLICATIONS FOR CLIENT AUTHORS:
> 
> If your client has been automatically making network connections you
> MUST remove this behaviour and atone for your sins!

  [Citation needed]

  -spc

9. Matthew Graybosch (hello (a) matthewgraybosch.com)

On Sun, 07 Jun 2020 21:38:04 +0300
Frank LENORMAND <lenormfml at gmail.com> wrote:

> Writers are no longer ensured by the standard that their text
> surrounded with asterisks will not be decorated by the client, by
> extension.

Writer here, but speaking strictly for myself. I don't see a problem
with this change. Details below.

---

I use asterisks inline for emphasis because that's a habit I've taken
from Markdown. My understanding of the standard was that the
behavior of inline asterisks (as opposed to asterisks at the
beginning of a line to indicate a list item) is undefined and thus
client-dependent.

If the client uses them to denote *italics* and **bold**, great. If
not, I figure that readers familiar with plain-text email will still
interpret such text accordingly. Either way, as a self-hosting writer,
my job is to make the words and reliably serve them. What happens on
the client side is none of my business.

My understanding of the new changes to the standard is that the
behavior of inline asterisks is *still* client-dependent. That's fine
with me as a writer, TBH. If I wanted the illusion of precise control
over how clients render my documents, I'd go back to HTML/CSS--or
pick up groff again and just post PDFs. :)

-- 
Matthew Graybosch           https://www.matthewgraybosch.com
#include <disclaimer.h>	    gemini://starbreaker.org
Harrisburg,PA	 	    gemini://demifiend.org
"Out of order?! Even in the future nothing works."

10. Matthew Graybosch (hello (a) matthewgraybosch.com)

On Sun, 7 Jun 2020 16:27:23 +0000
solderpunk <solderpunk at SDF.ORG> wrote:

> I have just pushed some changes to the Gemini specification.  You can
> see the new v0.13.0 spec at:

Thanks for the work you've been doing. If you don't mind, I have a
couple of questions as a content author.

> Perhaps the biggest change, in conceptual terms, is the introduction
> of the "lang" parameter for text/gemini.  However, most clients will
> not need to make any changes whatsoever on account of this.

I just read the relevant part of the spec, but I'm still not clear on
how I should go about specifying that my text is US English. Do I just
have to add "lang=en_US" on the first line of a text/gemini file?

> Lines beginning with ">" are now defined to be quote lines, as per
> popular demand.

This will be handy, but now I'm wondering about pre-formatted quotes
(mainly for poetry, song lyrics, screenplays, etc.)

For example, if a Gemini content author wanted to quote from a modern
poem like T. S. Eliot's _The Waste Land_ and preserve the original
formatting, would they do something like this?

 ```
> "What is that noise?"
>                           The wind under the door.
> "What is that noise now? What is the wind doing?"
>                            Nothing again nothing.
>                                                         "Do
> "You know nothing? Do you see nothing? Do you remember
> "Nothing?"
 ```

I suspect that most clients would interpret this block as just
pre-formatted text and render it in monospace--which I acknowledge is
correct behavior according to the current spec but not necessarily what
a content author wants if they decide to load their own page to see how
it looks in various clients.

Furthermore, looking for quote characters inside a preformatted block
and then rendering that block in a variable-width font instead (when
applicable) sounds like a good way to introduce bugs.

I'd like to suggest instead that we import the | character followed by
a space to denote line blocks from reStructuredText as an advanced line
type.

=> https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#line-blocks

Of course, I suppose content authors could just use reStructuredText
instead, which should probably be the goto format for text that uses
footnotes/endnotes as well.

-- 
Matthew Graybosch           https://www.matthewgraybosch.com
#include <disclaimer.h>	    gemini://starbreaker.org
Harrisburg,PA	 	    gemini://demifiend.org
"Out of order?! Even in the future nothing works."

11. James Tomasino (tomasino (a) lavabit.com)

On 6/7/20 11:06 PM, Matthew Graybosch wrote:
> For example, if a Gemini content author wanted to quote from a modern
> poem like T. S. Eliot's _The Waste Land_ and preserve the original
> formatting, would they do something like this?
> 
> ```
>> "What is that noise?"
>>                           The wind under the door.
>> "What is that noise now? What is the wind doing?"
>>                            Nothing again nothing.
>>                                                         "Do
>> "You know nothing? Do you see nothing? Do you remember
>> "Nothing?"
> ```
> 
> I suspect that most clients would interpret this block as just
> pre-formatted text and render it in monospace--which I acknowledge is
> correct behavior according to the current spec but not necessarily what
> a content author wants if they decide to load their own page to see how
> it looks in various clients.

Using the preformatted fences is exactly what you'd want to use. Your
example above would preserve whitespace. That's one of the main purposes
of the ```.

I don't understand the need for an additional line type from your
description.

12. Matthew Graybosch (hello (a) matthewgraybosch.com)

On Sun, 7 Jun 2020 23:18:36 +0000
James Tomasino <tomasino at lavabit.com> wrote:

> Using the preformatted fences is exactly what you'd want to use. Your
> example above would preserve whitespace. That's one of the main
> purposes of the ```.
> 
> I don't understand the need for an additional line type from your
> description.
 
I must have been overthinking things, trying to get the best of both
worlds: a blockquote that preserves whitespace.

Sorry to have wasted people's time.

-- 
Matthew Graybosch           https://www.matthewgraybosch.com
#include <disclaimer.h>	    gemini://starbreaker.org
Harrisburg,PA	 	    gemini://demifiend.org
"Out of order?! Even in the future nothing works."

13. int 80h (int (a) 80h.dev)

On Sun Jun 7, 2020 at 3:06 PM EDT, Matthew Graybosch wrote:
> I just read the relevant part of the spec, but I'm still not clear on
> how I should go about specifying that my text is US English. Do I just
> have to add "lang=en_US" on the first line of a text/gemini file?

I'm not sure how other server writers did it but I added "lang" as an
option per vhost, currently on the dev branch.

I've suggested adding "lang" to the first line before[1] but I need to
reread that discussion to see if anyone else commented on it.

[1] gemini://gemi.dev/gemini-mailing-list/messages/000845.gmi

int 80h

14. Frank LENORMAND (lenormfml (a) gmail.com)

On Sun Jun  7 23:10:49 2020, solderpunk wrote:
> On Sun, Jun 07, 2020 at 09:38:04PM +0300, Frank LENORMAND wrote:
> > This amendment that prevents conflicts between list items and
> > *emphasised text* indirectly acknowledges that clients may not
> > render *emphasised text* verbatim.
> > 
> > Writers are no longer ensured by the standard that their text surrounded with
> > asterisks will not be decorated by the client, by extension.
> 
> Writers were *never* assured by the standard that text surrounded with
> asterisks would not be decorated by the client!
> 
> Clients that want to do that can, but it's strictly an optional extra
> out-of-spec nicety.  If anybody wants to do this, the onus is on them to
> do it smartly enough that it doesn't interfere with the specced use of
> asterisks for list items.  If they mess that up, presumably their users
> will move to an either less ambitious or better written client which
> doesn't mangle things.

The adjustment made to the bullet item format is an admission that in-line
formatting text is a thing. Writers MAY de-facto influence rendering,
otherwise you wouldn't have needed to make the amendment.

Before, writers could not reproach the clients' behaviour w.r.t.
interpreting asterisks, because nothing in the specification hinted that
it was acknowledged by the standard.

Now, clients MAY highlight \*\S.*\* patterns. The specification was amended
to make sure no ambiguous cases occur with bullet item lines.

Which means that clients who do choose to implement emphasising will be
asked for a way NOT to emphasise ALL such patterns, because the specification
never implied that was a thing, originally. But they are left with a
gaping hole, in the current state of the specification.

It follows that there should be a way for writers to escape asterisks around
words, in non pre-formatted blocks.

Regards,
-- 
Frank LENORMAND

15. solderpunk (solderpunk (a) SDF.ORG)

On Sun, Jun 07, 2020 at 06:40:33PM -0400, Matthew Graybosch wrote:

> I use asterisks inline for emphasis because that's a habit I've taken
> from Markdown. My understanding of the standard was that the
> behavior of inline asterisks (as opposed to asterisks at the
> beginning of a line to indicate a list item) is undefined and thus
> client-dependent.
> 
> If the client uses them to denote *italics* and **bold**, great. If
> not, I figure that readers familiar with plain-text email will still
> interpret such text accordingly. Either way, as a self-hosting writer,
> my job is to make the words and reliably serve them. What happens on
> the client side is none of my business.
> 
> My understanding of the new changes to the standard is that the
> behavior of inline asterisks is *still* client-dependent. That's fine
> with me as a writer, TBH. If I wanted the illusion of precise control
> over how clients render my documents, I'd go back to HTML/CSS--or
> pick up groff again and just post PDFs. :)

This is all correct - technically *and* ideologically. :)

Cheers,
Solderpunk

16. solderpunk (solderpunk (a) SDF.ORG)

On Sun, Jun 07, 2020 at 07:06:38PM -0400, Matthew Graybosch wrote:
 
> I just read the relevant part of the spec, but I'm still not clear on
> how I should go about specifying that my text is US English. Do I just
> have to add "lang=en_US" on the first line of a text/gemini file?

No, the "lang" parameter is a parameter to the text/gemini MIME type
which is part of the response header.  It doesn't go in the document
itself.  Server software will need to provide admins and/or users some
way to configure this.

This will be fairly easy for people who run their own server, and hence
have access to the config file, and who write in only a single language
- they can just set a single value which the server includes for all
.gmi files (or whatever extension has been configured to serve as
text/gemini).

Multi-lingual sites would probably work best with content in different
languages separated by the path hierarchy, and servers could let people
designate different languages depending on which regex the path matches.
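
A server could implement that with a small ordered table of patterns.
The sketch below is in Python; the rule table, the default language,
and the function name are all invented for illustration and do not
describe any existing server's configuration format.

```python
import re

# Hypothetical configuration: first matching pattern wins.
LANG_RULES = [
    (re.compile(r"^/fr/"), "fr"),
    (re.compile(r"^/de/"), "de"),
]
DEFAULT_LANG = "en"


def meta_for_path(path: str) -> str:
    """Pick the text/gemini media type string for a request path."""
    for pattern, lang in LANG_RULES:
        if pattern.search(path):
            return f"text/gemini; lang={lang}"
    return f"text/gemini; lang={DEFAULT_LANG}"


print(meta_for_path("/fr/journal/index.gmi"))  # text/gemini; lang=fr
print(meta_for_path("/about.gmi"))             # text/gemini; lang=en
```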

Multi-user sites will be trickiest of all and will require users to
either bother the admin, or for servers to implement something like
Apache's .htaccess files.

I don't deny that this is kind of painful.  But I don't see a way around
it - if we say that the first line of a text/gemini file should be
"#lang: en-US" or anything like that we have immediately opened the door
to arbitrarily many additional options.  This basically gives us HTTP's
open-ended response header structure and completely defeats the point of
having a response header which is explicitly a single line.
 
> > Lines beginning with ">" are now defined to be quote lines, as per
> > popular demand.
> 
> This will be handy, but now I'm wondering about pre-formatted quotes
> (mainly for poetry, song lyrics, screenplays, etc.)

I would expect those to be handled just with pre-formatted lines.  I get
that they're also a quote of sorts, but simplicity necessitates
sacrificing the ability for total semantic precision.  I hope we can live
with this.

(of course, something like an entire screenplay can always just be
served in its own document as text/plain)

> Furthermore, looking for quote characters inside a preformatted block
> and then rendering that block in a variable-width font instead (when
> applicable) sounds like a good way to introduce bugs.

It's also a clear violation of the spec!

Cheers,
Solderpunk

17. solderpunk (solderpunk (a) SDF.ORG)

On Sun, Jun 07, 2020 at 10:47:41PM +0100, Luke Emmet wrote:
 
> If a client must not make subsequent network requests when interpreting a
> page, does this mean that search engines and crawlers are now non-compliant
> clients? This seems to go much too far.
> 
> I would think the "MUST NOT" would be better as a "SHOULD NOT" in case you
> are adamant to try to shape client behaviour. In my view this is not in
> scope of a protocol and markup format specification.

You raise a good point, about search engines and the like.  Perhaps I
should specify "interactive clients".

And I get that this kind of prescription is beyond what would usually be
thought of as reasonable scope for a protocol spec.  I also realise that
it's entirely unenforceable.  In some very unlikely distant future where
the spec is being preened for actual IETF consideration I might remove
it.  But for now I want to do my best to make sure that Gemini develops
not just a technical specification but a strong cultural sense of the
right and wrong way to use that spec.

Cheers,
Solderpunk

18. Frank LENORMAND (lenormfml (a) gmail.com)

On Mon Jun  8 10:22:33 2020, solderpunk wrote:
> On Sun, Jun 07, 2020 at 06:40:33PM -0400, Matthew Graybosch wrote:
> > I use asterisks inline for emphasis because that's a habit I've taken
> > from Markdown. My understanding of the standard was that the
> > behavior of inline asterisks (as opposed to asterisks at the
> > beginning of a line to indicate a list item) is undefined and thus
> > client-dependent.
> > 
> > If the client uses them to denote *italics* and **bold**, great. If
> > not, I figure that readers familiar with plain-text email will still
> > interpret such text accordingly. Either way, as a self-hosting writer,
> > my job is to make the words and reliably serve them. What happens on
> > the client side is none of my business.
> > 
> > My understanding of the new changes to the standard is that the
> > behavior of inline asterisks is *still* client-dependent. That's fine
> > with me as a writer, TBH. If I wanted the illusion of precise control
> > over how clients render my documents, I'd go back to HTML/CSS--or
> > pick up groff again and just post PDFs. :)
> 
> This is all correct - technically *and* ideologically. :)

How are clients supposed to render words that are censored with asterisks?

Consider:

*ssh*le

The spec was modified to make sure clients understand the above isn't a
bullet item. Therefore writers are indirectly allowed by the standard to
influence the rendering, otherwise there would have been no conflict to
address in the first place.

So clients will render the above as "sshle", which isn't the writer's
intent. Only a matter of time until you see writers start escaping asterisks
in texts:

\*ssh\*le

And now, because you've left clients responsible for dealing with a concept
that the standard implies is allowed, some clients will render the above
verbatim, while others will have a concept of escaping, the latter being
against the philosophy of Gemini, from what I gather.

You can't pretend clients will fill-in the gaps by themselves if the
specification isn't pedantic about it.

Regards,
-- 
Frank LENORMAND

19. Hannu Hartikainen (hannu.hartikainen+gemini (a) gmail.com)

> On Sun, Jun 07, 2020 at 10:47:41PM +0100, Luke Emmet wrote:
> > If a client must not make subsequent network requests when interpreting a
> > page, does this mean that search engines and crawlers are now non-compliant
> > clients? This seems to go much too far.

The spec says

> Clients can present links to users in whatever fashion the client author
> wishes, however clients MUST NOT automatically make any network connections
> as part of displaying links whose scheme corresponds to a network protocol
> (e.g. gemini://, gopher://, https://, ftp://, etc.).

I find this reasonable: a crawler does not make any extra network
connections *when interpreting a page* or *as part of displaying links*.
Rather, it fetches single pages per spec, while building a graph of all
known pages (which it then fetches, still as single pages in a way
compatible with the spec). A crawler need not fetch any other pages in
order to add a single page to its index. If a search engine started
supporting inlining content from links it would be breaking the spec.
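
A rough Python sketch of that separation: indexing a page only needs
its body, and the link targets are merely collected for later,
independent fetches.  The function name is hypothetical, and resolving
relative links against the page URL is deliberately left out.

```python
def extract_link_targets(body: str) -> list[str]:
    """Collect link targets from a text/gemini body without fetching anything."""
    targets = []
    preformatted = False
    for line in body.splitlines():
        if line.startswith("```"):
            preformatted = not preformatted  # links inside preformatted text don't count
            continue
        if preformatted or not line.startswith("=>"):
            continue
        parts = line[2:].split(maxsplit=1)   # URL first, optional label after it
        if parts:
            targets.append(parts[0])
    return targets


page = "# Home\n=> gemini://example.org/a.gmi First page\n=>b.gmi\n"
print(extract_link_targets(page))  # ['gemini://example.org/a.gmi', 'b.gmi']
```

The returned targets would go into the crawler's queue; no connection
is made while the page itself is being processed.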

My two cents.

-Hannu

20. defdefred (defdefred (a) protonmail.com)

I wonder how much data these search engines consume across the web.
A good thing about static sites is that they can be cached and don't need to
be downloaded each time to check whether a change has occurred.
A timestamp should suffice.

21. Matthew Graybosch (hello (a) matthewgraybosch.com)

On Mon, 08 Jun 2020 00:07:32 -0400
"int 80h" <int at 80h.dev> wrote:

> I'm not sure how other server writers did it but I added "lang" as an
> option per vhost, currently on the dev branch.

lang in vhost config makes sense to me. Thanks.

-- 
Matthew Graybosch           https://www.matthewgraybosch.com
#include <disclaimer.h>	    gemini://starbreaker.org
Harrisburg,PA	 	    gemini://demifiend.org
"Out of order?! Even in the future nothing works."

22. int 80h (int (a) 80h.dev)

On Mon Jun 8, 2020 at 3:36 AM EDT, solderpunk wrote:
> No, the "lang" parameter is a parameter to the text/gemini MIME type
> which is part of the response header. It doesn't go in the document
> itself. Server software will need to provide admins and/or users some
> way to configure this.

The way I was thinking is have the server look at the first line for
"#lang" then strip it and put it in the response header. That way it
could be an implementation of the server and not the spec itself.
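
A minimal sketch of that idea in Python, assuming the first line looks like "#lang fr" (the exact syntax is not settled in this thread, and `pop_lang_line` is a made-up name):

```python
def pop_lang_line(body):
    """Split a leading "#lang <tag>" line off a document body.

    Returns (body_without_that_line, tag), or (body, None) when the
    first line is not a "#lang" line. The "#lang <tag>" syntax is an
    assumption made for this sketch.
    """
    first, _, rest = body.partition("\n")
    fields = first.split()
    if len(fields) == 2 and fields[0] == "#lang":
        return rest, fields[1]
    return body, None
```

The server would then send the returned tag as the "lang" parameter in the response header and the remaining body as the document.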

> Multi-lingual sites would probably work best with content in different
> languages separated by the path hierarchy, and servers could let people
> designate different languages depending on which regex the path matches.
>
> Multi-user sites will be trickiest of all and will require users to
> either bother the admin, or for servers to implement something like
> Apache's .htaccess files.

Something like .htaccess could work. This morning I was also thinking of
it being in the file name. Then it could be on a per file basis like
"index.fr.gmi" would be sent as "index.gmi" with "lang=fr" in the
response header.
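
The file name variant could look roughly like this in Python. The "<name>.<lang>.gmi" pattern and both function names are assumptions made for illustration; the header layout (status 20, a space, the media type with its parameter, CRLF) follows the spec.

```python
import re

# Hypothetical convention: "<name>.<lang>.gmi", e.g. "index.fr.gmi".
LANG_SUFFIX = re.compile(
    r"^(?P<stem>.+)\.(?P<lang>[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)\.gmi$"
)

def split_lang(filename):
    """Return (public_name, lang) for a file name; lang may be None."""
    m = LANG_SUFFIX.match(filename)
    if m is None:
        return filename, None
    return m.group("stem") + ".gmi", m.group("lang")

def response_header(lang=None):
    """Build a success header for text/gemini, with optional lang."""
    meta = "text/gemini"
    if lang:
        meta += "; lang=" + lang
    return "20 " + meta + "\r\n"
```

One caveat of this sketch: a name such as "notes.old.gmi" would be read as language "old", so a real server would probably want to check the tag against a list of known languages.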

int 80h

Link to individual message.

23. solderpunk (solderpunk (a) SDF.ORG)

On Mon, Jun 08, 2020 at 01:10:42PM -0400, int 80h wrote:
 
> The way I was thinking is have the server look at the first line for
> "#lang" then strip it and put it in the response header. That way it
> could be an implementation of the server and not the spec itself.

Oh, sorry, I misunderstood!  That seems perfectly cromulent, although
I'd be a bit wary myself of tying my content to a specific server
(unless this convention became widely adopted).

Cheers,
Solderpunk

Link to individual message.

24. int 80h (int (a) 80h.dev)

On Mon Jun 8, 2020 at 1:26 PM EDT, solderpunk wrote:
> Oh, sorry, I misunderstood! That seems perfectly cromulent, although
> I'd be a bit wary myself of tying my content to a specific server
> (unless this convention became widely adopted).

True that could get annoying if it wasn't widely adopted.

int 80h

Link to individual message.

25. Jason McBrayer (jmcbray (a) carcosa.net)

"int 80h" <int at 80h.dev> writes:

> On Sun Jun 7, 2020 at 3:06 PM EDT, Matthew Graybosch wrote:
>> I just read the relevant part of the spec, but I'm still not clear on
>> how I should go about specifying that my text is US English. Do I just
>> have to add "lang=en_US" on the first line of a text/gemini file?
>
> I'm not sure how other server writers did it but I added "lang" as an
> option per vhost, currently on the dev branch.

My intention (in Germinal) is to have a global default option, a default
per-virtual host (once I add virtual hosts), and provide some way of
supplying individual files with metadata -- possibly a TOML file in each
directory with a section for each file or something similar.
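
The lookup order described here (per-file metadata first, then the virtual host default, then the global default) might be sketched as follows. This is not Germinal's code; the function name and the dictionary shapes are assumptions, and the per-file table is taken to be already parsed, for instance from a TOML file with one section per file.

```python
def resolve_lang(path, vhost, global_default, vhost_defaults, file_meta):
    """Pick the lang value for one file, most specific setting first.

    file_meta:      {"index.gmi": {"lang": "fr"}, ...}  (per-file sections)
    vhost_defaults: {"example.org": "en", ...}
    """
    per_file = file_meta.get(path, {}).get("lang")
    if per_file:
        return per_file
    return vhost_defaults.get(vhost, global_default)
```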

-- 
+-----------------------------------------------------------+  
| Jason F. McBrayer                    jmcbray at carcosa.net  |  
| If someone conquers a thousand times a thousand others in |  
| battle, and someone else conquers himself, the latter one |  
| is the greatest of all conquerors.  --- The Dhammapada    |

Link to individual message.

26. Martin Keegan (martin (a) no.ucant.org)

On Mon, 8 Jun 2020, solderpunk wrote:

> Oh, sorry, I misunderstood!  That seems perfectly cromulent, although
> I'd be a bit wary myself of tying my content to a specific server
> (unless this convention became widely adopted).

There is a danger here of unwittingly reinventing YAML frontmatter.

Mk

-- 
Martin Keegan, +44 7779 296469, @mk270, https://mk.ucant.org/

Link to individual message.

27. Matthew Graybosch (hello (a) matthewgraybosch.com)

On Mon, 08 Jun 2020 11:24:44 +0300
Frank LENORMAND <lenormfml at gmail.com> wrote:

> How are clients supposed to render words that are censored with
> asterisks?

I can't speak for anybody else, but I stopped using asterisks for
self-censorship when I started using Markdown. Instead, I self-censor
with hyphens when I do it at all, because I can't be bothered to make
sure I've escaped every asterisk.

-- 
Matthew Graybosch           https://www.matthewgraybosch.com
#include <disclaimer.h>	    gemini://starbreaker.org
Harrisburg,PA	 	    gemini://demifiend.org
"Out of order?! Even in the future nothing works."

Link to individual message.

28. Case Duckworth (acdw (a) acdw.net)

On Mon, Jun 8, 2020, at 8:24 AM, Frank LENORMAND wrote:

> How are clients supposed to render words that are censored with asterisks?
> 
> Consider:
> 
> *ssh*le
> 
> The spec was modified to make sure clients understand the above isn't a
> bullet item. Therefore writers are indirectly allowed by the standard to
> influence the rendering, otherwise there would have been no conflict to
> address in the first place.
> 
> So clients will render the above as "sshle", which isn't the writer's
> intent. Only a matter of time until you see writers start escaping asterisks
> in texts:
> 
> \*ssh\*le
> 
> And now, because you've put the responsibility of dealing with a concept
> that the standard implies is allowed, some clients will render the above
> either verbatim, or have a concept of escaping, the latter being against
> the philosophy of Gemini, from what I gather.
> 
> You can't pretend clients will fill-in the gaps by themselves if the
> specification isn't pedantic about it.

AFAICT most clients do nothing with in-text formatting, and in fact if the 
document type is text/gemini they *should* not. text/gemini *only* 
specifies the 6 line-types for possibilities of special rendering, which 
means that your example of '*ssh*le' will render exactly as '*ssh*le' -- 
no escaping needed. If the mimetype is text/markdown, clients *could* 
render it differently, but text/markdown has its own escaping rules that a 
client would follow.
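
To make that concrete, here is a rough Python sketch of deciding a line's type from its first few characters only. The type names are made up for this example, and it is stateless: a real client would also treat every line between two preformatting toggles as preformatted text.

```python
FENCE = "`" * 3  # the preformatting toggle marker (three backticks)

def line_type(line):
    """Classify a text/gemini line by its prefix alone."""
    if line.startswith(FENCE):
        return "preformat-toggle"
    if line.startswith("=>"):
        return "link"
    if line.startswith("#"):
        return "heading"
    if line.startswith("* "):   # note the space after the asterisk
        return "list-item"
    if line.startswith(">"):
        return "quote"
    return "text"               # everything else is shown as written
```

With this rule '*ssh*le' and '*emphasised* word' both come out as "text", while '* item' is a list item.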

-- Case (acdw)

Link to individual message.

29. solderpunk (solderpunk (a) SDF.ORG)

On Mon, Jun 08, 2020 at 11:24:44AM +0300, Frank LENORMAND wrote:
 
> How are clients supposed to render words that are censored with asterisks?

However they like!  But "verbatim" seems like the only sensible option.

> The spec was modified to make sure clients understand the above isn't a
> bullet item. Therefore writers are indirectly allowed by the standard to
> influence the rendering, otherwise there would have been no conflict to
> address in the first place.

I really don't follow you here.  Writers are of course *allowed* to put
asterisks around their words if they want to but this doesn't imply that
they should expect clients to do anything in particular with that
information.  People use asterisks to emphasise words when writing
plain text content for Gopher where they know for sure that clients
won't do anything with it.

> So clients will render the above as "sshle", which isn't the writer's
> intent.

What on Earth makes you think clients will render that as sshle?  I
wouldn't use a client that did that.

Cheers,
Solderpunk

Link to individual message.

30. peteyboy (a) sdf.org (peteyboy (a) sdf.org)



>The adjustment made to the bullet item format is an admission that
>in-line formatting text is a thing.

I don't see it that way. It's simply an admission that people use * in 
plaintext for emphasis, and that there is a pretty straightforward way to 
make a markup that shouldn't interfere with people's text. It's really 
only important if people are turning existing files into gmi files, I 
would think.  It fits the Gemini "philosophy" of determining the line type 
unambiguously in the first 3 chars.

I've dreamed of a wiki-native protocol as well, but Gemini ain't trying to 
be it. There are several servers as I understand that just serve markdown, 
too. You can use one of those, as was suggested.



-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Link to individual message.

31. Petite Abeille (petite.abeille (a) gmail.com)



> On Jun 7, 2020, at 18:27, solderpunk <solderpunk at SDF.ORG> wrote:
> 
> If your client has been automatically making network connections you
> MUST remove this behaviour and atone for your sins!

Tuning down the zealotry MAY be beneficial.

Link to individual message.

32. Sean Conner (sean (a) conman.org)

It was thus said that the Great solderpunk once stated:
> On Mon, Jun 08, 2020 at 11:24:44AM +0300, Frank LENORMAND wrote:
>  
> > How are clients supposed to render words that are censored with asterisks?
> 
> However they like!  But "verbatim" seems like the only sensible option.
> 
> > The spec was modified to make sure clients understand the above isn't a
> > bullet item. Therefore writers are indirectly allowed by the standard to
> > influence the rendering, otherwise there would have been no conflict to
> > address in the first place.
> 
> I really don't follow you here.  Writers are of course *allowed* to put
> asterisks around their words if they want to but this doesn't imply that
> they should expect clients to do anything in particular with that
> information.  People use asterisks to emphasise words when writing
> plain text content for Gopher where they know for sure that clients
> won't do anything with it.

  A graphical Gemini browser *might* render text between asterisks with
italicised or bold text, to *emphasize* the emphasis that asterisks imply.
A text based Gemini browser might instead decide to bold or recolor the
text, again to emphasize the emphasis.

  -spc

Link to individual message.

33. James Tomasino (tomasino (a) lavabit.com)

On 6/9/20 8:46 PM, Sean Conner wrote:
>   A graphical Gemini browser *might* render text between asterisks with
> italicised or bold text, to *emphasize* the emphasis that asterisks imply.
> A text based Gemini browser might instead decide to bold or recolor the
> text, again to emphasize the emphasis.
> 
>   -spc

I'm reading this in Thunderbird, which has chosen to bold your examples
of emphatic text. It does not remove the asterisks, though. Also, if I
_underline_ a word, it renders it as such, but preserves the characters.
If clients are looking for good real-world examples to emulate when
going above-and-beyond the spec, I'd point there.
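
For a terminal client, that behaviour could be approximated with something like the Python below. It is a sketch under assumptions: the pattern and the `highlight` name are invented here, the output uses ANSI bold escapes, and only asterisk pairs that are not glued to a surrounding word are styled. The asterisks themselves are kept.

```python
import re

# *word* or *several words*, not directly attached to other word characters.
EMPHASIS = re.compile(r"(?<!\w)\*([^\s*](?:[^*\n]*[^\s*])?)\*(?!\w)")
BOLD, RESET = "\x1b[1m", "\x1b[0m"

def highlight(line):
    """Wrap *emphasised* runs in ANSI bold, leaving the asterisks in place."""
    return EMPHASIS.sub(lambda m: BOLD + m.group(0) + RESET, line)
```

A censored word such as "*ssh*le" is left untouched, because the closing asterisk is followed directly by a letter.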

Link to individual message.

34. solderpunk (solderpunk (a) SDF.ORG)

On Tue, Jun 09, 2020 at 04:46:00PM -0400, Sean Conner wrote:
> 
>   A graphical Gemini browser *might* render text between asterisks with
> italicised or bold text, to *emphasize* the emphasis that asterisks imply.
> A text based Gemini browser might instead decide to bold or recolor the
> text, again to emphasize the emphasis.

This is all fine.  I would gently advise that clients which do this

yields unpleasant results with some documents.

Cheers,
Solderpunk

Link to individual message.

35. jzs (jzs (a) sketchground.dk)

> > Multi-user sites will be trickiest of all and will require users to
> > either bother the admin, or for servers to implement something like
> > Apache's .htaccess files.
>
> Something like .htaccess could work. This morning I was also thinking of
> it being in the file name. Then it could be on a per file basis like
> "index.fr.gmi" would be sent as "index.gmi" with "lang=fr" in the
> response header.
>
> int 80h

One could also let the server interpret gmi files and strip the first x
lines of each file. You could use a pandoc title block, multimarkdown
metadata or YAML front matter. That way you can also insert a date etc. to
automatically generate, for example, Atom feeds. Of course, these "file
headers" would be stripped before sending the file to the client.
It's really up to the server how to deal with it.

/Jens.

Link to individual message.
