💾 Archived View for lonelysilo.ca › article › links-and-anchors.gmi captured on 2021-12-17 at 13:26:06. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I thought I'd write an article about one foundational piece of the Gemini Semantics RFC from earlier. One of the things that separates text-based writing of the pre-internet computing era from what came after is the hyperlink. I recall reading simple text documents on my Apple ][ and they were rich with information, but there was only the most informal way of referring to other documents, even ones on the same disk. Often it was just a file name mentioned in the text itself.
For more information about ordering from Beagle Bros. print the order form in ORDERS
This worked fine for the most part because cross references between text files weren't very common. Files were mostly accessed from the local computer's file system, and computer users were used to finding and opening a file by hand. Once the internet arrived, with its explosion of easily accessible content and protocols, moving from document to document this way became a real chore.
This is where hypertext and hyperlinking became such a huge advance, and the whole concept of internet browsing took off. Hyperlinks were an enormous boost to usability once people knew they existed and how to use them. Maybe they were one of the first internet memes, since people (myself included) learned them by seeing how others used them. I find that discovering their full power is a journey.
Once hyperlinking was in common use there was a slight problem. Documents can be large, and links only brought you to the document. The user was left to sift through it to find the specific spot the referring link was pointing at. Maybe it's a specific paragraph, or perhaps a snippet of text or a bullet point. Sometimes the source document would quote the portion of interest, which helps somewhat but breaks the reader's concentration. There must be a better way.
This is where the concepts of fragments and anchors were developed. HTML gave the <a> (anchor) tag a special use for placing named anchors in a document. URLs began supporting a fragment at the end of a hyperlink, introduced by the # character, and the two worked fairly well together. Meanwhile, Plan 9's acme editor devised a parallel system built on Plan 9's single filesystem namespace. A mouse gesture opens a file path. An address after a colon selects a specific line number or text snippet within the destination file. That address is either a line number or a Unix-style regular expression between slashes.
place1.html:
<h1>Heading <a id="section5"></a></h1>
place2.html:
<a href="http://someserver/place1.html#section5">Section 5</a>
output.txt:
some/path/to/file1.cpp:19 Missing declaration for type 'foo'
README:
You'll notice that in some/path/to/file1.cpp:/type foo struct/ that
it sometimes doesn't compile.
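To make the two addressing styles above concrete, here is a rough Python sketch of how a tool might split each kind of reference into "which document" and "where in it". The function names are my own, and the Acme side is deliberately simplified: it only handles `file:N` and `file:/regexp/`, while real Acme addresses support much more.

```python
import re
from urllib.parse import urldefrag

def split_url(url):
    """Split an HTML-style link into (document URL, fragment/anchor id)."""
    base, fragment = urldefrag(url)
    return base, fragment

# Simplified Acme-style address: "file:N" (line number) or "file:/regexp/".
# Real Acme addresses also support ranges, character offsets and more.
ACME_ADDRESS = re.compile(r"([^:\s]+):(\d+|/[^/]*/)")

def split_acme_address(text):
    """Split 'file:address' into (file, address); address is None if absent."""
    m = ACME_ADDRESS.fullmatch(text)
    if m is None:
        return text, None
    return m.group(1), m.group(2)

print(split_url("http://someserver/place1.html#section5"))
print(split_acme_address("some/path/to/file1.cpp:19"))
print(split_acme_address("some/path/to/file1.cpp:/type foo struct/"))
```

Both schemes boil down to the same idea: a document locator plus an optional position inside it.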
Both approaches made a big difference to navigation within a document. The HTML/URL approach rose to much greater prominence as usage of the Web exploded several times over during the last couple of decades. However, HTML has serious problems with invisibility, complexity and malicious links. URLs also lack a way to highlight a specific region of text in the destination document unless that document cooperates by providing a convenient anchor, so references are often imprecise.
The Gemini protocol standard is in some ways an interesting mixture of the systems described so far. Its focus on text content really reminds me of the Apple ][ and Unix/Plan 9 days, but it also combines a stateless internet protocol with URLs. Because the focus is on the text, Gemini keeps URLs, which can be rather verbose, on their own link lines, separate from the lines and paragraphs of text. That last part is an interesting contrast with HTML, which now prefers to hide URLs from the user.
There are open questions about how Gemini handles both ease of navigation between links and the problem of precisely locating something within the target resource. The protocol doesn't spell out how to relate a place in a line of text to a specific link. Because of that, a convention has emerged where the link line gives the URL a name, and a paragraph anywhere in the document can refer to it by that name in square brackets. This does appear to help the user sift through the links that appear later in the document to find the right one to follow. Still, I'm sure more can be done to optimize this workflow in the Gemini browsers I've been using. I think it would be interesting to add mouse/keyboard gestures that recognize this convention and jump quickly to the matching link line, and I feel that Plan 9 Acme's approach could be a really good place to start.
This is my favorite [quiche recipe] because it artfully combines the cheese
and spinach with the egg in such a tender crust.
...
=> gemini://recipesr.us/quiche1.gmi (quiche recipe)
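As a rough idea of what such a browser gesture would need underneath, here is a Python sketch. It collects link lines whose label ends in a name in round brackets, then pairs each `[name]` in the prose with its URL. This is only a sketch under my own assumptions. The bracket convention isn't part of the Gemini spec, the function names are hypothetical, and preformatted blocks are not skipped here.

```python
import re

# A gemtext link line: "=>", optional whitespace, a URL, then an optional label.
LINK_LINE = re.compile(r"^=>\s*(\S+)(?:\s+(.*))?$")
# Assumed convention: the label ends with a name in round brackets.
NAMED_LABEL = re.compile(r"\(([^()]+)\)\s*$")
# A reference in prose: a name in square brackets.
REFERENCE = re.compile(r"\[([^\[\]]+)\]")

def named_links(doc):
    """Map each '(name)' at the end of a link label to that link's URL."""
    links = {}
    for line in doc.splitlines():
        m = LINK_LINE.match(line)
        if not m or not m.group(2):
            continue
        name = NAMED_LABEL.search(m.group(2))
        if name:
            links.setdefault(name.group(1), m.group(1))
    return links

def resolve_references(doc):
    """Pair every [name] outside link lines with its URL (None if unmatched)."""
    links = named_links(doc)
    found = []
    for line in doc.splitlines():
        if line.startswith("=>"):
            continue
        for ref in REFERENCE.findall(line):
            found.append((ref, links.get(ref)))
    return found

doc = """This is my favorite [quiche recipe] because it artfully combines the cheese
and spinach with the egg in such a tender crust.
=> gemini://recipesr.us/quiche1.gmi (quiche recipe)"""
print(resolve_references(doc))
```

A browser could use the same mapping in both directions: jump from a bracketed name to its link line, or from a link line back to every place that mentions it.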
On the problem of precise navigation within a linked document, the Gemini protocol similarly doesn't provide specific guidance. That's a good thing in my opinion, but time will be the ultimate judge. I might point out a convention that I've seen both in documents that predate Gemini and in some Gemini ones. English writing (and likely writing in other languages) has a kind of implicit anchoring mechanism. An acronym is first spelled out in full with the short form in round brackets, and the short form is then used elsewhere in the document.
The Ontario Academic Credit (OAC) system was an extra year of high-school in the province.
...
The OAC system was later dropped in an effort to cut costs to the public education system.
...
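This convention is regular enough that a tool could pick it up. Here's a small heuristic sketch in Python that finds short forms in round brackets and accepts them only when the preceding words' initials spell them out. The pattern and function name are my own assumptions. It will miss forms like "Department of Defense (DoD)".

```python
import re

# Assumed pattern: two or more capital letters in round brackets, e.g. "(OAC)".
SHORT_FORM = re.compile(r"\(([A-Z]{2,})\)")

def acronym_definitions(text):
    """Map each short form to the preceding words whose initials spell it out."""
    defs = {}
    for m in SHORT_FORM.finditer(text):
        short = m.group(1)
        words = text[:m.start()].split()
        candidate = words[-len(short):]
        # Accept the definition only if every initial matches, in order.
        if len(candidate) == len(short) and all(
            w[0].upper() == c for w, c in zip(candidate, short)
        ):
            defs.setdefault(short, " ".join(candidate))
    return defs

text = ("The Ontario Academic Credit (OAC) system was an extra year of high-school in the province.\n"
        "The OAC system was later dropped.")
print(acronym_definitions(text))
```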
I have noticed that a similar convention is sometimes used in simple text files to give document headings a short form in the title line, which can then be referenced later in the document. My idea is to formalize these conventions further as a way to declare anchors in a document. Because the Gemini protocol uses URIs for hyperlinks, a URI fragment can then jump to the first occurrence of that fragment in round brackets in the destination document.
course_curriculum.gmi:
...
# Introduction to Macro Economics (intro-macro-econ)
...
Note that intro-macro-econ is a pre-requisite for...
...
my_courses_this_semester.gmi:
...
=> gemini://some_host/course_curriculum.gmi#intro-macro-econ
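Here is a minimal Python sketch of how a client might follow such a link, assuming the rule described above: take the URI fragment, percent-decode it, and jump to the first line containing it in round brackets. The helper names are hypothetical.

```python
from urllib.parse import unquote, urldefrag

def anchor_for_url(url):
    """Split a link into (document URL, decoded fragment name)."""
    base, fragment = urldefrag(url)
    return base, unquote(fragment)

def find_named_anchor(doc, name):
    """Return the 0-based index of the first line containing '(name)', or None."""
    needle = "(" + name + ")"
    for i, line in enumerate(doc.splitlines()):
        if needle in line:
            return i
    return None

doc = """# Introduction to Macro Economics (intro-macro-econ)
...
Note that intro-macro-econ is a pre-requisite for..."""
base, name = anchor_for_url("gemini://some_host/course_curriculum.gmi#intro-macro-econ")
print(base, name, find_named_anchor(doc, name))
```

Only the bracketed occurrence counts as the anchor, so plain mentions of the short form later in the text are not mistaken for it.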
While this kind of precise reference works great when the destination document provides convenient anchors, there are times when you're not so lucky. For this case I recommend a new convention based on the Plan 9 Acme linking mechanism, where a snippet of text can be referenced using a Unix regex-like syntax in the URI fragment. If the URI fragment begins with a / character, the text up to a trailing / character is treated as a regular expression.
=> gemini://somehost/somedocument.gmi#/some.snippet.goes.here/
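A rough Python sketch of resolving such a fragment follows. It assumes the decoded fragment must start and end with `/`, and it uses Python's own `re` dialect as a stand-in, which is not identical to Plan 9's regular expressions. The function name is hypothetical. It returns the line where the first match starts.

```python
import re
from urllib.parse import unquote, urlsplit

def find_regex_anchor(doc, fragment):
    """Return the 0-based line index where the first regex match starts, or None.

    Assumed syntax: the decoded fragment is '/pattern/'.
    """
    pattern = unquote(fragment)
    if len(pattern) < 2 or not (pattern.startswith("/") and pattern.endswith("/")):
        return None  # not a regex fragment; let another rule handle it
    try:
        match = re.search(pattern[1:-1], doc)
    except re.error:
        return None  # malformed pattern: fall back to the top of the document
    if match is None:
        return None
    return doc.count("\n", 0, match.start())

fragment = urlsplit("gemini://somehost/somedocument.gmi#/some.snippet.goes.here/").fragment
doc = "intro\nsome snippet goes here, in context\n"
print(find_regex_anchor(doc, fragment))
```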
Regular expression pattern matching might seem like overkill here. But when you're dealing with a destination document you have no control over, it can be a super powerful and useful tool for digging into the specific detail you need. Also, this syntax doesn't preclude developing an alternate syntax that is wildcard-based or plain-text.
Note that Gemini has made the design decision (for better or worse) to require URIs to be encoded, so spaces and other reserved characters will need to be escaped with the ugly percent-encoding syntax (e.g. %20 for a space). While I feel that this isn't ideal, it should be workable in practice. This design decision could be discussed in a larger context over time.
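In practice a writer wouldn't need to do the escaping by hand. Here's a small Python sketch using the standard `urllib.parse.quote`, assuming a hypothetical pattern containing spaces. The slashes are kept as they are, while spaces (and characters such as a literal #) are percent-encoded so the fragment stays a valid URI component.

```python
from urllib.parse import quote, unquote

# A hypothetical regex fragment that contains spaces.
pattern = "/type foo struct/"
fragment = quote(pattern, safe="/")
url = "gemini://somehost/somedocument.gmi#" + fragment
print(url)

# Decoding recovers the original pattern on the client side.
assert unquote(fragment) == pattern
```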
Because Gemini is explicitly line-oriented, unlike HTML, line number references could also be useful for making precise references. Again, I suggest using Plan 9 Acme as the inspiration for anchoring to a specific line number.
gemini://my_source_code_repo/main.c#:255
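A line-number fragment is even simpler to resolve. This Python sketch assumes the Acme-like form `:N` with 1-based line numbers, and returns None for anything else or for a line that doesn't exist. The helper names are hypothetical.

```python
import re
from urllib.parse import urlsplit

LINE_FRAGMENT = re.compile(r":([0-9]+)")

def line_number(fragment):
    """Return N for an Acme-style ':N' fragment, or None for anything else."""
    m = LINE_FRAGMENT.fullmatch(fragment)
    return int(m.group(1)) if m else None

def select_line(doc, fragment):
    """Return the text of the 1-based line named by ':N', or None if out of range."""
    n = line_number(fragment)
    lines = doc.splitlines()
    if n is None or not 1 <= n <= len(lines):
        return None
    return lines[n - 1]

fragment = urlsplit("gemini://my_source_code_repo/main.c#:255").fragment
print(fragment, line_number(fragment))
```

Because the fragment starts with a colon, it can't be confused with a named anchor or a `/regex/` fragment, so a client could try all three rules in turn.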
I've decided to move some of these extra discussions into a separate section. They are a bit further removed from the conventions I've already observed in Gemini, and they may be edge cases of less interest to many people right now.
One problem that has come up in the work on Gemini Semantics is hyperlink repetition, which can harm readability. This happens when hyperlinks are very similar but not identical. For example, if you link to several different anchors within one destination document, you need to repeat the same scheme, host and path each time, and only the fragment at the end differs. I would like to propose a solution based on work done in the XML and RDF worlds, which encountered the same problem with URIs. They permitted link references made of two parts separated by a colon. The first part names a URI declared elsewhere in the document, and the second part is appended to that URI for this particular reference.
I am looking at the [emissions:co2-particles] in the report and finding
...
Meanwhile, the [emissions:ch4-particles] seem to be
...
=> gemini://scientific-journals/emissions-report-2021.gmi# (emissions)
Note that the example above is semantically equivalent to this example.
I am looking at the [emissions-co2-particles] in the report and finding
...
Meanwhile, the [emissions-ch4-particles] seem to be
...
=> gemini://scientific-journals/emissions-report-2021.gmi#co2-particles (emissions-co2-particles)
=> gemini://scientific-journals/emissions-report-2021.gmi#ch4-particles (emissions-ch4-particles)
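Here's a minimal Python sketch of the expansion rule, assuming the named links have already been collected into a dictionary. Under this rule, `[prefix:suffix]` becomes the URL named `prefix` with `suffix` appended, and a reference without a colon is a plain name lookup. The function name is hypothetical.

```python
def expand_reference(ref, links):
    """Expand '[prefix:suffix]' by appending suffix to the URL named prefix.

    A reference without a colon is looked up as a plain name.
    Returns None if the name or prefix is unknown.
    """
    prefix, sep, suffix = ref.partition(":")
    if not sep:
        return links.get(ref)
    base = links.get(prefix)
    return None if base is None else base + suffix

links = {"emissions": "gemini://scientific-journals/emissions-report-2021.gmi#"}
print(expand_reference("emissions:co2-particles", links))
print(expand_reference("emissions:ch4-particles", links))
```

Expanding the two prefixed references gives exactly the two full URLs in the second example, which is what makes the two forms equivalent.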
A nice side-effect of this proposal is a slightly different syntax to help with in-page navigation to anchors, which is something that wasn't adequately addressed earlier in this document. How does a Gemini document writer refer to another anchor in the same document? Here is an example of how this can be achieved as a variant of the example above.
# Section 1 (section-1)
...
# Section 10 (section-10)
Please refer to [:section-1] before completing this section.
In this case the link reference refers to an earlier anchor in the same document. Because the reference is marked with square brackets, tools such as browsers can make it easier for the user to navigate these references or discover internal back-links.
Now that we've come down this path there remains one more question. How does one create a link reference to the current document itself? I suggest that in this trivial case it would be an empty pair of square brackets []. You might be wondering why this would ever be needed in practice. Well, I found some cases while developing Gemini Semantics. You can learn more about them [here].
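Both same-document cases fit in a few lines. This Python sketch assumes an empty prefix means "this document". Under that assumption, `[:name]` becomes the current URL plus `#name`, and `[]` is the current URL itself, with any existing fragment dropped first. The function name and the example URL are hypothetical.

```python
from urllib.parse import urldefrag

def resolve_local_reference(ref, current_url):
    """Resolve '[]' and '[:name]' against the current document; else None."""
    base = urldefrag(current_url).url
    if ref == "":
        return base                   # '[]' is the document itself
    if ref.startswith(":"):
        return base + "#" + ref[1:]   # '[:name]' is an anchor in this document
    return None                       # not a same-document reference

current = "gemini://example.org/doc.gmi"  # hypothetical current document
print(resolve_local_reference(":section-1", current))
print(resolve_local_reference("", current))
```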