[SPEC] Backwards-compatible metadata in Gemini

Lars Noodén lars.nooden at gmx.com

Wed Feb 24 13:01:34 GMT 2021

- - - - - - - - - - - - - - - - - - - 

On 2/24/21 12:31 AM, Oliver Simmons wrote:[snip]

# My conclusion on the metadata topic
I think the discussion of the how (format and what it would affect) has
been good, but we should stop with it until a good why (reasoning and use
case) has been found.[snip]

One example: when the capsule's owner has included date metadata, then searches there can be expanded or narrowed according to date. The same goes for title, author, subject, or any other field that has been maintained throughout the capsule. With machine-readable metadata in place, the system can then do the work when limiting searches.
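As a rough illustration of the date example, here is a minimal Python sketch. Gemini defines no metadata format, so the record layout, the field names (path, title, author, date), the sample documents, and the fielded_search helper are all assumptions made up for this example.

```python
from datetime import date

# Hypothetical per-document metadata records. Gemini itself defines no such
# format; the field names and sample values are assumptions for illustration.
DOCS = [
    {"path": "/log/a.gmi", "title": "Garden notes", "author": "alice", "date": date(2020, 5, 1)},
    {"path": "/log/b.gmi", "title": "Server notes", "author": "bob", "date": date(2021, 1, 15)},
    {"path": "/log/c.gmi", "title": "Garden plans", "author": "alice", "date": date(2021, 2, 20)},
]


def fielded_search(docs, author=None, since=None, until=None):
    """Return paths of documents matching every field constraint that is given."""
    hits = []
    for doc in docs:
        if author is not None and doc["author"] != author:
            continue
        if since is not None and doc["date"] < since:
            continue
        if until is not None and doc["date"] > until:
            continue
        hits.append(doc["path"])
    return hits
```

Calling fielded_search(DOCS, author="alice", since=date(2021, 1, 1)) narrows the result to the one document by that author from 2021 onward, and dropping a constraint widens the search again.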

Another example: browsing through a 'tag cloud', also known as subject categories, is very common. That is one way of browsing through a set of subjects in document metadata, and it is a case where metadata use is not niche.
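A tag cloud of this kind could be built from subject metadata roughly as follows. This is only a sketch: the "subjects" field, the sample documents, and the tag_cloud and browse helpers are hypothetical and not part of any Gemini specification.

```python
from collections import Counter

# Hypothetical records with a "subjects" list per document (an assumption;
# Gemini does not define such a field).
DOCS = [
    {"path": "/a.gmi", "subjects": ["gardening", "compost"]},
    {"path": "/b.gmi", "subjects": ["gardening"]},
    {"path": "/c.gmi", "subjects": ["networking", "compost"]},
]


def tag_cloud(docs):
    """Count how many documents carry each subject term."""
    counts = Counter()
    for doc in docs:
        counts.update(doc["subjects"])
    return counts


def browse(docs, subject):
    """List the paths of the documents filed under one subject."""
    return [doc["path"] for doc in docs if subject in doc["subjects"]]
```

The counts give the relative weight of each term in the cloud, and browse() is what following one of the terms would return.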

An example of when metadata is not allowed or missing would be plain full-text searches. One well-known confound for full-text searching is when a page talks about a topic in great detail without actually repeating strings pertaining to that topic. A lot of technical writers within ICT know of this problem and pepper their writing with expected search terms. Writers, especially researchers, in other fields just write to the topic and might not even include the subject terms more than once, if even that. Thus full-text searching is quite inaccurate, even with stemming, and does not scale well.
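The confound can be shown with a small Python sketch. The two documents, their "subjects" field, and both search helpers are invented for illustration; the point is only that a page can be about a topic without containing the search term.

```python
# Hypothetical documents: the first is about composting but never uses the
# word; the second mentions the word only in passing. All values are made up.
DOCS = [
    {
        "path": "/essay.gmi",
        "subjects": ["composting"],
        "text": "Layer kitchen scraps with dry leaves and turn the heap weekly.",
    },
    {
        "path": "/misc.gmi",
        "subjects": ["shopping"],
        "text": "I bought a composting bin but never used it.",
    },
]


def full_text_search(docs, term):
    """Naive case-insensitive substring match over the document body."""
    term = term.lower()
    return [d["path"] for d in docs if term in d["text"].lower()]


def subject_search(docs, term):
    """Exact match against the subject metadata field."""
    return [d["path"] for d in docs if term in d["subjects"]]
```

Here full_text_search(DOCS, "composting") finds only the passing mention in /misc.gmi, while subject_search(DOCS, "composting") finds the essay that is actually about the topic.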

The actual metadata content can be made up as needed, as in an uncontrolled vocabulary, or it can conform to an agreed-upon, restricted set, such as ERIC or LCSH.

After capsules reach substantial size, both in number and length of documents, it becomes impractical to rummage for content manually and full-text searching becomes increasingly inaccurate. It gets even harder to do relevant searches if the content in the pages is all about the same topic or similar, overlapping topics which share vocabulary. There, without the fielded searches which document metadata enables, the material is effectively lost. So some of the use cases for document metadata deal with trying to retrieve material from largish capsules.

/Lars