Metadata Without A Proposal

On Fri, 26 Feb 2021 11:51:08 +0100
nothien at uber.space wrote:

> Hi!
> 
> I've lost track of the currently raging metadata thread entirely, and so
> I've started this as a new post.
> 
> Thus far, I think there's general consensus on the following needs for
> any metadata proposal:
> 
> 1. Must degrade gracefully for clients that don't understand metadata.

Agreed.

> 2. Must not be English-specific.

What is the preferable alternative? We could use numbers to indicate
element type, but ultimately numbers are dependent on numeral systems,
which depend on language and culture.

If instead of using English directly, we define opaque strings of
characters for the tags, such that the tag "author" consistently means
"author", we really achieve the same thing. That is a simple solution
that is language independent.

Or we could use emoji, although I believe most computer users in the
world would have a harder time typing out a given emoji than a given
opaque, ASCII- and English-compatible string.

> 3. Must be machine-parsable.

We should consider the difference between needs and wants here. If I
have no interest in specifying a license for my work other than what
is implied by my sharing it, that doesn't necessarily mean I don't
want to specify a date or an author, so perhaps all or most elements
should be optional.

>   
> 4. Should affect presentation.
>   
>    gemtext as a whole is about separating content from presentation.
>    Some of the earlier metadata proposals referred to metadata for
>    presentation, e.g. to specify a color to view the text in.  This is
>    against the spirit of gemtext/Gemini (if not the spec).

Agreed, but as I understand it you do *not* want it to affect
presentation.

>   
> 5. Must be difficult to extend.
>   
>    Again, this comes from the general Gemini philosophy that anything
>    that can be misused will be misused.  This rules out lots of current
>    proposals because they specify tags, and the usage of tags can only be
>    controlled by convention, which is subject to change.

What do you propose that prevents conventional use from dictating
reality? And why is it important that the specification cannot be
extended? Unlike e.g. text/gemini, if a client doesn't support some
superset of the tags initially specified, there is no degradation. If
in the future we want to extend a metadata format to support e.g.
specifying where, in addition to when, it was written, the clients that
don't support it shouldn't suffer from it.

The only important concern to me is that there is a canonical
description of tags. That description can be extended indefinitely as
far as I'm concerned, for as long as the original meanings of the
initial set of supported tags aren't changed or overloaded by newer
tags.
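
To illustrate why I don't expect extension to cause degradation, here
is a minimal sketch in Python. Everything in it is an assumption made
for the example: the "key: value" line format, the tag names and the
function name are hypothetical, not part of any proposal. A client that
only knows the initial tags simply skips the ones added later.

```python
# Hypothetical tag set: what an older client understands.
# "location" stands in for a tag added to the canonical description later.
KNOWN_V1 = {"author", "date"}


def parse_metadata(text, known_tags):
    """Return the known tags found in text.

    Lines with unknown tags, or without a "key: value" shape, are skipped,
    so a newer document never breaks an older client.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in known_tags:
            result[key] = value.strip()
    return result


# Sample document (made up for illustration) using a newer tag.
SAMPLE = "author: Philip\ndate: 2021-02-26\nlocation: somewhere\n"

old_client = parse_metadata(SAMPLE, KNOWN_V1)
new_client = parse_metadata(SAMPLE, KNOWN_V1 | {"location"})
```

The old client ends up with the author and date it has always
understood, and the new client additionally sees the location. Neither
has to change the meaning of an existing tag.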

> 6. Must be accessible.
>   
>    Some proposals discussed the usage of emojis, and others have opted
>    for creating new unofficial line types.  These don't degrade
>    gracefully for things like screen readers, until they adopt the
>    metadata proposal.  That's not great.

Agreed.

I think that instead of defining ourselves what fields are important we
should start from a standard, e.g. DCMI with the element set defined in
IETF RFC 5013.

With that as a basis, if there is no suitable format already, we can
define a human-readable, text-compatible data format and a corresponding
text/xyz MIME type. Then, a text/gemini document whose author wants to
supply additional metadata can link to a metadata file, which the
server serves with the above MIME type. A client that does not support
the MIME type should fall back to displaying unknown text/* types as
plain text. A client that does support it can localize the elements,
including things like names and date and time formats. If the client is
a crawler, it should find the linked metadata document as a matter of
its normal operation because it is linked from the document.
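
A rough sketch of both sides of that, in Python. The MIME type string,
the "key: value" body format and the function names are placeholders I
made up for the example (I only said "text/xyz" above). The only thing
taken from the existing text/gemini spec is that link lines start with
"=>" followed by a URL and an optional label.

```python
# Hypothetical MIME type standing in for "text/xyz".
METADATA_MIME = "text/x-example-metadata"


def extract_links(gemtext):
    """Return the URLs of all link lines ("=> URL [label]") in a gemtext body."""
    links = []
    for line in gemtext.splitlines():
        if line.startswith("=>"):
            parts = line[2:].split(maxsplit=1)
            if parts:  # ignore a bare "=>" with no URL
                links.append(parts[0])
    return links


def handle_response(mime_header, body, supports_metadata):
    """Decide how a client treats a successful response, based on its MIME type."""
    mime = mime_header.split(";")[0].strip().lower()
    if mime == METADATA_MIME and supports_metadata:
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip().lower()] = value.strip()
        return ("metadata", fields)
    if mime.startswith("text/"):
        # Unknown text/* type: fall back to showing it as plain text.
        return ("plain", body)
    return ("unsupported", None)
```

A crawler would call something like extract_links() on every page it
fetches anyway, so the metadata file turns up without any special
handling. It only learns that the file is metadata from the MIME type
in the response header.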

Such formats already exist, but there is little interest in authoring
such files.

In that way, no extension or change to Gemini is necessary. No
specialized sub-formats for existing line types either.

Personally, I don't think this is a standard I would use either way.
Formalizing information like this mostly benefits robots. Humans can
interpret such information as indicated in the document itself, in a
much wider variety of formats. It's not my intention, primarily, to
serve robots.

-- 
Philip