[SPEC] Backwards-compatible metadata in Gemini

On 2/24/21 12:31 AM, Oliver Simmons wrote:
[snip]
> # My conclusion on the metadata topic
>
> I think the discussion of the how (format and what it would affect) has
> been good, but we should stop with it until a good why (reasoning and use
> case) has been found.
[snip]

One example: when a capsule's owner has included date metadata, searches
of that capsule can be widened or narrowed by date.  The same goes for
title, author, subject, or any other field that has been maintained
throughout the capsule.  With machine-readable metadata in place, the
system can do the work of limiting searches.
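
To make the fielded search concrete, here is a minimal sketch in Python.
It assumes each page carries a small metadata record; the Page class, its
field names, and the sample paths are all hypothetical and are not taken
from any proposed format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    # Hypothetical record: one capsule page plus the metadata its owner kept.
    path: str
    title: str
    author: str
    published: date
    subjects: tuple = ()


def search(pages, *, author=None, since=None, until=None, subject=None):
    """Return the pages that satisfy every field constraint that was given."""
    hits = []
    for page in pages:
        if author is not None and page.author != author:
            continue
        if since is not None and page.published < since:
            continue
        if until is not None and page.published > until:
            continue
        if subject is not None and subject not in page.subjects:
            continue
        hits.append(page)
    return hits


# Made-up sample data for illustration only.
pages = [
    Page("/log/a.gmi", "Winter notes", "alice", date(2020, 12, 1), ("gardening",)),
    Page("/log/b.gmi", "Spring notes", "alice", date(2021, 2, 10), ("gardening", "weather")),
    Page("/log/c.gmi", "Router setup", "bob", date(2021, 2, 20), ("networking",)),
]

# Narrow by date only, then narrow further by author.
recent = search(pages, since=date(2021, 1, 1))
recent_alice = search(pages, since=date(2021, 1, 1), author="alice")
```

Dropping a constraint widens the result set and adding one narrows it,
with no need to guess at strings in the body text.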

Another example: browsing through a 'tag cloud', also known as subject
categories, is very common.  That is one way of browsing through the set
of subjects in document metadata, and it is far from a niche use.
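
As a rough sketch of how a tag cloud falls out of subject metadata, the
Python below counts how many pages carry each subject.  The
subjects_by_page mapping and its contents are made up for illustration;
how the subjects would actually be stored is still an open question.

```python
from collections import Counter

# Hypothetical subject metadata, keyed by page path.
subjects_by_page = {
    "/log/a.gmi": ["gardening"],
    "/log/b.gmi": ["gardening", "weather"],
    "/log/c.gmi": ["networking"],
}


def tag_cloud(subjects_by_page):
    """Return (subject, page_count) pairs, most used subject first."""
    counts = Counter()
    for subjects in subjects_by_page.values():
        # Use a set so a subject repeated on one page is counted once.
        counts.update(set(subjects))
    return counts.most_common()


cloud = tag_cloud(subjects_by_page)
```

The counts can then drive whatever presentation the capsule prefers, for
instance a list of links ordered by how often each subject is used.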

An example of what happens when metadata is not allowed or is missing
would be plain full-text searches.  One well-known problem with
full-text searching is that a page can cover a topic in great detail
without actually repeating the strings pertaining to that topic.  A lot
of technical writers within ICT know of this problem and pepper their
writing with expected search terms.  Writers in other fields, especially
researchers, just write to the topic and might not include the subject
terms more than once, if even that.  Thus full-text searching is quite
inaccurate, even with stemming, and does not scale well.
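
A small Python illustration of that failure, under the assumption that a
page has both body text and a subject field (the doc dictionary and both
helper functions are hypothetical): the text below is about gardening but
never uses the word, so only the subject lookup finds it.

```python
# Made-up page: the body never mentions its own subject term.
doc = {
    "text": "Prune the canes in late winter, then mulch the beds "
            "before the first thaw.",
    "subjects": ["gardening"],
}


def full_text_match(doc, term):
    """Naive case-insensitive substring search over the body text."""
    return term.lower() in doc["text"].lower()


def subject_match(doc, term):
    """Case-insensitive exact match against the subject metadata."""
    return term.lower() in [s.lower() for s in doc["subjects"]]


found_in_text = full_text_match(doc, "gardening")
found_in_subjects = subject_match(doc, "gardening")
```

Stemming would not help here either, since no form of the word appears
in the body at all.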

The actual metadata content can be made up as needed, as in uncontrolled
vocabulary, or it can conform to an agreed upon, restricted set, such as
ERIC or LCSH.
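
The difference between the two approaches can be sketched in Python as
follows.  The CONTROLLED set is a tiny made-up stand-in, not real ERIC or
LCSH headings, and check_subjects is a hypothetical helper.

```python
# Hypothetical stand-in for an agreed-upon, restricted vocabulary.
CONTROLLED = frozenset({"gardening", "networking", "weather"})


def check_subjects(subjects, vocabulary=None):
    """Split subjects into (accepted, rejected).

    With vocabulary=None the vocabulary is uncontrolled and every
    subject is accepted as written.
    """
    if vocabulary is None:
        return list(subjects), []
    accepted = [s for s in subjects if s in vocabulary]
    rejected = [s for s in subjects if s not in vocabulary]
    return accepted, rejected


free = check_subjects(["gardening", "compost-tea"])
strict = check_subjects(["gardening", "compost-tea"], CONTROLLED)
```

An uncontrolled vocabulary costs the author nothing up front, while a
restricted one catches stray or misspelled terms before they fragment
the subject list.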

After capsules reach substantial size, both in number and length of
documents, it becomes impractical to rummage for content manually and
full-text searching becomes increasingly inaccurate.  It gets even harder to do
relevant searches if the content in the pages is all about the same
topic or similar, overlapping topics which share vocabulary.  There,
without the fielded searches which document metadata enables, the
material is effectively lost.  So some of the use cases for document
metadata deal with retrieving material from largish capsules.

/Lars
