On 11/19/20 5:44 PM, Solderpunk wrote:
> * The text of the first heading line (e.g. # My gemlog) to appear in the
> document is equivalent to the <feed><title> element of an Atom feed.
> All other heading lines are ignored.
> * All other lines which are not link lines are ignored.
> * All link lines with labels where the first whitespace-separated
> component of the label is a recognised datestamp are treated as
> equivalent to an Atom <entry> element, where:
> - The link's URL provides <entry><link>
> - The datestamp provides <entry><updated>
> - The content of the label after the datestamp provides
> <entry><title>
> * All other link lines are ignored

So I went and updated my aggregator code to match this format. Then I went to SpaceWalk to look for pages to parse.

The output:
gemini://tilde.team/~emilis/aggregated-test.gmi

The code:
https://tildegit.org/emilis/gmi-feed-aggregator

Some notes:

1. I added support for relative URLs to my aggregator. It has now become the most complex part of the code.
2. Thanks to relative URL support, it is possible to parse just the home/index pages of some gemlogs; no separate feed file is needed.
3. Some gemlogs are missing the h1, or it is too generic ("Blog", "Updates"). I skipped those.
4. Some gemlogs have extra characters appended to the date, e.g. "2020-11-19: And then the title". I had to skip these too.
5. I removed the timestamps from the link text and prefixed the feed titles to the entry titles to distinguish between sources. This seems OK, but one may need to hover over a link to be sure where it comes from. I could add hostnames to the link text, but I am not sure whether it would become too long.

--
Emilis Dambauskas
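The quoted rules, together with notes 1, 3, 4 and 5, can be illustrated with a small Python sketch. This is not the code from the gmi-feed-aggregator repository; it is a minimal illustration under stated assumptions: a "recognised datestamp" is taken to be a bare YYYY-MM-DD date, and the names `parse_feed`, `render_links`, `Feed` and `Entry` are hypothetical. Relative links are resolved with `urllib.parse.urljoin`, which only resolves schemes listed in `urllib.parse.uses_relative` and `uses_netloc`, so the sketch registers `gemini` there first.

```python
import re
import urllib.parse
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# urljoin() only resolves relative references for schemes listed in these
# registries; adding "gemini" is a common workaround.
for _registry in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
    if "gemini" not in _registry:
        _registry.append("gemini")

PREFORMAT_TOGGLE = "`" * 3  # gemtext preformatting toggle: three backticks
# Assumption: a "recognised datestamp" is a bare ISO 8601 date, so
# "2020-11-19:" (note 4) does not qualify.
DATESTAMP = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


@dataclass
class Entry:
    link: str      # <entry><link>
    updated: date  # <entry><updated>
    title: str     # <entry><title>


@dataclass
class Feed:
    title: Optional[str] = None  # <feed><title>
    entries: List[Entry] = field(default_factory=list)


def parse_feed(text: str, base_url: str) -> Feed:
    """Hypothetical parser: read any gemtext page (e.g. an index page) as a feed."""
    feed = Feed()
    preformatted = False
    for line in text.splitlines():
        if line.startswith(PREFORMAT_TOGGLE):
            preformatted = not preformatted
            continue
        if preformatted:
            continue  # text inside preformatted blocks is never a heading or link
        if line.startswith("#"):
            if feed.title is None:  # only the first heading line counts
                feed.title = line.lstrip("#").strip()
            continue
        if not line.startswith("=>"):
            continue  # all other non-link lines are ignored
        parts = line[2:].strip().split(maxsplit=1)
        if len(parts) < 2:
            continue  # a link without a label cannot carry a datestamp
        url, label = parts
        words = label.split(maxsplit=1)
        if not DATESTAMP.fullmatch(words[0]):
            continue  # first label component is not a datestamp
        try:
            updated = date.fromisoformat(words[0])
        except ValueError:
            continue  # right shape but not a real date, e.g. 2020-13-45
        title = words[1] if len(words) > 1 else ""
        # Resolve relative links against the page URL (note 1).
        feed.entries.append(Entry(urllib.parse.urljoin(base_url, url), updated, title))
    return feed


def render_links(feeds: List[Feed]) -> str:
    """Merge entries newest first; link text is "feed title: entry title" (note 5)."""
    # Feeds without a heading are skipped (note 3).
    rows = [(e, f.title) for f in feeds if f.title for e in f.entries]
    rows.sort(key=lambda row: row[0].updated, reverse=True)
    return "\n".join(f"=> {e.link} {title}: {e.title}" for e, title in rows)


page = """# My gemlog
## Posts
=> 2020-11-19-hello.gmi 2020-11-19 Hello world
=> 2020-11-18-notes.gmi 2020-11-18: Skipped, colon after the date
=> /about.gmi About me
"""
print(render_links([parse_feed(page, "gemini://example.org/gemlog/")]))
# => gemini://example.org/gemlog/2020-11-19-hello.gmi My gemlog: Hello world
```

Skipping generic titles such as "Blog" (note 3) is left out here, since deciding what counts as "too generic" is a judgment call; a hostname could be appended to the link text in `render_links` if provenance needs to be clearer.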