💾 Archived View for gemi.dev › gemini-mailing-list › 000564.gmi captured on 2023-11-04 at 12:56:22. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

[tech] [spec] Decide on use of URL fragment

📧 Messages: 3
🗣️ Authors: 3
📅 First Message: 2020-12-24 21:01
📅 Last Message: 2020-12-26 16:35

nervuri@disroot.org <nervuri (a) disroot.org>

📅 Sent: 2020-12-24 21:01
📧 Message 1 of 3

Hello and thank you all for Gemini! Coming to Gemini from the web feels 
like leaving the smog and breathing clean air again. The documentation was 
a delight to read through, the trade-offs Gemini makes are well thought 
out and argued for. I respect the effort, so I'm here to lend a hand.

One of the missing bits is the ability to link to a specific part of a 
Gemini page, which is quite important when dealing with large documents. 
At one point there was talk of a library on Gemini ( 
gemini://booksin.space/ ), with complete books in plain text. So, indeed, 
this function would be very useful.

  https://lists.orbitalfox.eu/archives/gemini/2020/003702.html

Gemini doesn't yet use the URL fragment (the part after "#"). The web 
does, but in a rigid way: if a web page doesn't contain elements with IDs 
or names, there's nothing visitors can do to link to a specific section.

Gemini browsers can do better. I'm opening this thread to gather all 
proposed solutions, so that a standard may be reached. In this message 
I'll provide an overview of the ground that has been covered so far. At 
the end I highlight the solutions that I think best fit the Gemini ethos.

This topic has been discussed on the mailing list, but the discussion didn't go very far:

  https://lists.orbitalfox.eu/archives/gemini/2020/001454.html
  https://lists.orbitalfox.eu/archives/gemini/2020/004291.html

Proposed solutions:


	Numbering schemes, such as #20 (line 20) or #3.2 (heading 3, subheading
2). Short and simple, but easily broken if the page is modified. RFC 5147
is another such scheme, which introduces optional integrity checks and
allows highlighting passages. But even with integrity checking, the
fragment becomes useless if changes are made before (or within) the
specified section. The fragility of these solutions is their major downside.


	Linking to headings, either by URL-encoding them directly into the
fragment (ex: #5.4%20Core%20line%20types) or converting them to kebab case
(ex: #core-line-types). Such links would be more robust than using
number-based fragments, but why limit them to headings?

A more flexible option for text/gemini and text/plain is to use the 
fragment for in-page search. Advanced browsers are bound to develop a 
"find in page" function which can be reused here, taking the fragment as 
input. This solution allows for linking to any point in the page and can 
be used to highlight passages.

I made a demo which includes a "link to here" option when selecting text:

https://nervuri.net/fragment-search/

It uses the following syntax:

#<text>[:<occurrence>]

where <text> is any URL-encoded excerpt from the page and <occurrence> 
is an optional integer (negative numbers can be used for backwards 
search). It's still partially a numeric scheme, so a link like "#a:98" 
(the 98th occurrence of the letter "a") is very easily broken when the 
document changes. But if a long enough search string is used, the link 
will withstand most changes.
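
A minimal sketch of how a client might resolve this syntax, in Python. The
helper names (parse_search_fragment, find_occurrence) are hypothetical and
not taken from the demo. It assumes that a literal ":" inside the search
text is percent-encoded as %3A, that a missing occurrence means the first
match, and that overlapping matches are counted:

```python
from urllib.parse import unquote


def parse_search_fragment(fragment):
    """Split '<text>[:<occurrence>]' into (text, occurrence)."""
    text, sep, tail = fragment.rpartition(":")
    if sep and text:
        try:
            occurrence = int(tail)
        except ValueError:
            occurrence = None
        # Only a non-zero integer tail counts as an occurrence number.
        if occurrence:
            return unquote(text), occurrence
    # No usable occurrence: the whole fragment is the search text.
    return unquote(fragment), 1


def find_occurrence(page, text, occurrence=1):
    """Return the character offset of the requested match, or -1."""
    if not text or occurrence == 0:
        return -1
    if occurrence > 0:
        pos = -1
        for _ in range(occurrence):
            pos = page.find(text, pos + 1)
            if pos == -1:
                return -1
        return pos
    # Negative occurrence: count matches from the end of the page.
    pos = len(page) + 1
    for _ in range(-occurrence):
        # Restrict the search to matches that start before the previous one.
        pos = page.rfind(text, 0, pos + len(text) - 1)
        if pos == -1:
            return -1
    return pos
```

A browser would scroll to the returned offset and fall back to the top of
the page when the result is -1, so a stale link still opens the document.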

It turns out that this idea has been around on the web for a long time:

  https://indieweb.org/fragmention

Most importantly, there is an emerging web standard which we can learn from:

  https://github.com/WICG/scroll-to-text-fragment
  https://temp.treora.com/text-fragments-ts/demo.html

The syntax is:

#:~:text=[prefix-,]textStart[,textEnd][,-suffix]

Leaving aside its complexity (which I know is a sticking point for 
Gemini), it is more robust and versatile than basic search, because:


	it allows specifying any fragment of text, no matter how long, using
only its beginning and end

	the "prefix-" and "-suffix" are better than an integer at disambiguating
a specific instance of repeating text on a dynamic page

	it allows highlighting multiple fragments of text, using the "&" separator.


The standard could be simplified for Gemini, as the security 
considerations don't apply and partial implementations are possible 
(having multiple quotes in the same URL is arguably overkill). Even so, 
its complexity probably rules it out for Gemini.
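
To give a feel for what a simplified, single-passage version could involve,
here is a rough Python sketch. It is not the WICG algorithm: the names
TextDirective, parse_text_directive and locate are made up for
illustration, and it assumes that literal "," and "-" inside the terms are
percent-encoded and that the prefix and suffix may be separated from the
passage by whitespace only:

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class TextDirective(NamedTuple):
    prefix: Optional[str]
    start: str
    end: Optional[str]
    suffix: Optional[str]


def parse_text_directive(value):
    """Parse '[prefix-,]textStart[,textEnd][,-suffix]'."""
    parts = value.split(",")
    prefix = suffix = None
    if len(parts) > 1 and parts[0].endswith("-"):
        prefix = unquote(parts.pop(0)[:-1])
    if len(parts) > 1 and parts[-1].startswith("-"):
        suffix = unquote(parts.pop()[1:])
    if not 1 <= len(parts) <= 2 or not parts[0]:
        raise ValueError("malformed text directive: " + value)
    end = unquote(parts[1]) if len(parts) == 2 else None
    return TextDirective(prefix, unquote(parts[0]), end, suffix)


def locate(page, d):
    """Return (start, end) offsets of the first matching passage, or None."""
    pos = 0
    while True:
        start = page.find(d.start, pos)
        if start == -1:
            return None
        pos = start + 1  # where to resume if this candidate is rejected
        if d.prefix is not None and not page[:start].rstrip().endswith(d.prefix):
            continue
        end = start + len(d.start)
        if d.end is not None:
            found = page.find(d.end, end)
            if found == -1:
                return None
            end = found + len(d.end)
        if d.suffix is not None and not page[end:].lstrip().startswith(d.suffix):
            continue
        return start, end
```

Even this cut-down form needs a retry loop for the prefix and suffix
checks, which hints at why the full standard may be too heavy for Gemini.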

It's interesting to note that all of these solutions fit together like 
pieces of Lego. Browsers can use all or none (or partial implementations), 
in a way that allows for graceful degradation. Multiple standards can even 
be used in the same link. Examples:

#char=100       (RFC 5147)
#line=10,20     (RFC 5147)
#heading=3.1.2
#heading=core-line-types
#text=some text
#text=some text:2
#text=from here,to here
#text=pre-,from here,to here,-post
#text=first,passage&text=second,passage
#text=word:3&text=some text
#line=10&text=some text
#regex=^(.+?):(-?\d+)$      (URL-encoded)
etc.
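
As a rough illustration of the graceful degradation mentioned above, a
client could split the fragment on "&" and "=" and simply skip the schemes
it does not implement. This Python sketch uses hypothetical names
(parse_fragment, usable, SUPPORTED) and assumes values are percent-encoded:

```python
from urllib.parse import unquote

# Schemes this imaginary client implements; everything else is ignored.
SUPPORTED = {"line", "text"}


def parse_fragment(fragment):
    """Return a list of (scheme, value) pairs from 'key=value&key=value'."""
    directives = []
    for part in fragment.split("&"):
        key, sep, value = part.partition("=")
        if sep and key:
            directives.append((key, unquote(value)))
    return directives


def usable(fragment):
    """Keep only the directives this client knows how to act on."""
    return [(k, v) for k, v in parse_fragment(fragment) if k in SUPPORTED]
```

A link such as #heading=3.1.2&text=some%20text would then still work in a
client that only knows "text", because the "heading" part is dropped.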

Ok, this got out of hand. Nobody wants all of these standards floating 
around in Gemini space, right? So, to conclude, I think the best options for Gemini are:


	basic search (#search%20me), perhaps with occurrence number
(#search%20me:3). I consider this the best fit for Gemini, as it has the
lowest complexity/(robustness*flexibility) ratio. It's trivial to
implement for browsers that already have a "find in page" function. Search
engines can also use this feature to generate links that take you directly
to the part of the page that matches your query. On the other hand, if
avoiding long ugly links is considered important, we can go with:


	headings only, using "lowercase kebab case of the alphabetic content,
with single hyphens" (ex: #core-line-types) - which is what Luke Emmet
suggested back in June (
https://lists.orbitalfox.eu/archives/gemini/2020/001462.html ). This is
also low-complexity, gives us shorter links and is very robust to changes.
But only linking to headings is quite restrictive. Also, it wouldn't work
for identical (sub)headings or headings without letters, unless an
occurrence number is added.


	just yesterday Luke Emmet had another good proposal on this topic: using
the first 12 chars of the base64-encoding of the heading (details here:
https://lists.orbitalfox.eu/archives/gemini/2020/004341.html ). Note that
it can be applied to any line in the file. Again, an occurrence number can
be added if needed. Hashes can be considered as alternatives to base64.
The problem with this solution is that it's the most opaque of all.
There's no way to tell what the fragment points to when looking at it, and
creating the link without the assistance of an advanced browser is pretty
much out of the question. But it's otherwise a good compromise which takes
URL length into account.
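
For comparison, here is a rough Python sketch of how the two heading-based
fragments could be derived. The function names are hypothetical, and the
details are my assumptions rather than Luke's exact rules: the leading "#"
markers are stripped, letters of any script count as alphabetic, and the
base64 variant encodes the heading as UTF-8 with the URL-safe alphabet:

```python
import base64
import re


def heading_text(line):
    """Strip the gemtext heading markers and surrounding whitespace."""
    return line.lstrip("#").strip()


def kebab_fragment(line):
    """Lowercase runs of letters joined by single hyphens."""
    words = re.findall(r"[^\W\d_]+", heading_text(line).lower())
    return "-".join(words)


def base64_fragment(line, length=12):
    """First `length` characters of the base64 encoding of the heading."""
    raw = heading_text(line).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")[:length]
```

Note that 12 base64 characters cover only the first 9 bytes of the
heading, so "Core line types" and "Core line numbers" would produce the
same fragment. The kebab form, in turn, comes out empty for a heading with
no letters. Both cases are where an occurrence number would be needed.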

Honorable mention to just using line numbers. Fragility aside, this is the 
simplest, most straightforward option. Maybe append a hash of the line for 
integrity checking, like in RFC 5147.

Fragment search can encourage use of very long links. I, for one, don't 
consider this to be a serious downside. Links are primarily meant for 
computers - that's why Gemini has the optional user-friendly link names, after all.

Whichever solution is chosen (if any), I think it should be part of the 
Gemini best practices document, because we want links to work the same way 
for everyone - at least in the more advanced browsers that will support 
this. I do encourage the community to choose one, as even the most 
rudimentary solution is better than nothing.

Finally, the fragment may also be used for other purposes. For instance, a 
hash could be used by the browser to check if a page has been modified, or 
if a file has been downloaded correctly. So it might be wise to use 
prefixes, like #find= and #hash:sha256=.
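
A small Python sketch of what the hash check could look like on the client
side. The function name check_hash is hypothetical, and the sketch assumes
the digest is written in hexadecimal and that directives are separated by
"&"; none of that is settled:

```python
import hashlib


def check_hash(fragment, body):
    """Return True/False for a sha256 match, or None if no hash is given."""
    for part in fragment.split("&"):
        key, sep, value = part.partition("=")
        if sep and key == "hash:sha256":
            # Compare case-insensitively, since hex digests may be uppercase.
            return hashlib.sha256(body).hexdigest() == value.lower()
    return None
```

The three-way result lets a browser tell "no hash in the link" (None)
apart from "hash present but the content changed" (False).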


nervuri <nervuri (a) disroot.org>

📅 Sent: 2020-12-26 16:28
📧 Message 2 of 3

> Finally, the fragment may also be used for other purposes. For instance,
> a hash could be used by the browser to check if a page has been modified,
> or if a file has been downloaded correctly. So it might be wise to use
> prefixes, like #find= and #hash:sha256=.

Two more quirky ideas for alternate uses of fragments:


	content-length (#size=1436) - can be added automatically in a
server-generated file listing. Optionally, a Gemini server could append
#size= to all links pointing to domains which it hosts. And browsers could
limit their use of #size to instances where the destination file is hosted
on the same domain as the page that links to it.


	certificate fingerprint (#cert:sha256=EB07E03...) - embeds a server's
(or a user's) perspective on the cert used by a linked-to domain. Can be
periodically refreshed by the server. If it doesn't match, browser shows a warning.


Petite Abeille <petite.abeille (a) gmail.com>

📅 Sent: 2020-12-26 16:35
📧 Message 3 of 3

> On Dec 26, 2020, at 17:28, nervuri <nervuri at disroot.org> wrote:
> 
> Two more quirky ideas for alternate uses of fragments:

Oh my... perhaps you should not have mentioned all these use cases so 
explicitly... another side channel to kill :D

The choices are clear:

(1) Keep fragments, and define their behavior in minute, baroque details.
(2) Keep them, but mark them as "undefined", i.e. anything goes.
(3) Kill them altogether.

I would just kill them in the context of gemini.

