Again on feeds in Gemini format

On 11/19/20 5:44 PM, Solderpunk wrote:
> * The text of the first heading line (e.g. # My gemlog) to appear in the
>    document is equivalent to the <feed><title> element of an Atom feed.
>    All other heading lines are ignored.
> * All other lines which are not link lines are ignored.
> * All link lines with labels where the first whitespace-separated
>    component of the label is a recognised datestamp are treated as
>    equivalent to an Atom <entry> element, where:
>      - The link's URL provides <entry><link>
>      - The datestamp provides <entry><updated>
>      - The content of the label after the datestamp provides
>        <entry><title>
> * All other link lines are ignored
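
The rules quoted above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from my aggregator: the name `parse_feed`, the dictionary keys, and the strict YYYY-MM-DD datestamp pattern are assumptions made for the example. It also skips preformatted blocks, which the quoted rules do not mention.

```python
import re
from datetime import date

# A gemtext link line: "=>", optional whitespace, URL, optional label.
LINK_RE = re.compile(r"^=>[ \t]*(\S+)(?:[ \t]+(.*))?$")
# Assumption: a "recognised datestamp" is exactly YYYY-MM-DD.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_feed(text):
    """Return (feed_title, entries) for a gemtext document."""
    title = None
    entries = []
    preformatted = False
    for line in text.splitlines():
        if line.startswith("```"):
            # Toggle lines switch preformatted mode on and off.
            preformatted = not preformatted
            continue
        if preformatted:
            continue
        if line.startswith("#"):
            # Only the first heading line is used; later ones are ignored.
            if title is None:
                title = line.lstrip("#").strip()
            continue
        match = LINK_RE.match(line)
        if not match or not match.group(2):
            continue  # not a link line, or a link line without a label
        parts = match.group(2).split(None, 1)
        if not DATE_RE.fullmatch(parts[0]):
            continue  # e.g. "2020-11-19:" is not a bare datestamp
        try:
            updated = date.fromisoformat(parts[0])
        except ValueError:
            continue  # right shape, but not a real calendar date
        entries.append({
            "link": match.group(1),
            "updated": updated,
            "title": parts[1].strip() if len(parts) > 1 else "",
        })
    return title, entries
```

The link URL is returned exactly as written in the document, so relative links still need to be resolved against the page URL afterwards.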

So I updated my aggregator code to match this format, then went to
SpaceWalk to look for pages to parse.

The output:
gemini://tilde.team/~emilis/aggregated-test.gmi

The code:
https://tildegit.org/emilis/gmi-feed-aggregator

Some notes:

1. I added support for relative URLs to my aggregator. It is currently
the most complex part of the code.

2. Due to relative URL support it is possible to just parse Home/index 
pages of some of the gemlogs -- no separate feed file is needed.

3. Some gemlogs are missing the h1, or it is too generic ("Blog",
"Updates"). I skipped those.

4. Some gemlogs have extra characters appended to the date, e.g.
"2020-11-19: And then the title". I had to skip these too.

5. I removed the timestamps from the link text and prefixed the feed
titles to the entry titles to distinguish between sources. This seems
OK, but one may need to hover over a link to be sure where it comes
from. I could add hostnames to the link text, but I am not sure whether
the link text would then become too long.
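
To illustrate note 1: in Python, `urllib.parse.urljoin` only resolves relative references for schemes it knows about, and as far as I know `gemini` is not one of them, so the reference comes back unchanged. The sketch below works around that by borrowing the `http` scheme for the join and swapping it back afterwards. The function name `resolve_gemini_url` and the example paths are made up for illustration; this is not necessarily how my aggregator does it.

```python
from urllib.parse import urljoin, urlsplit


def resolve_gemini_url(base, ref):
    """Resolve a link from a gemtext page against the page's own URL."""
    if urlsplit(ref).scheme:
        # The link already names a scheme (gemini, https, mailto, ...).
        return ref
    base_parts = urlsplit(base)
    if base_parts.scheme != "gemini":
        return urljoin(base, ref)
    # Pretend the base is an http URL so that urljoin applies the
    # generic relative-reference rules, then restore the scheme.
    http_base = base_parts._replace(scheme="http").geturl()
    joined = urlsplit(urljoin(http_base, ref))
    return joined._replace(scheme="gemini").geturl()
```

For example, resolving `posts/first.gmi` against `gemini://tilde.team/~emilis/index.gmi` gives `gemini://tilde.team/~emilis/posts/first.gmi`, which is what makes it possible to parse a Home/index page directly.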


--
Emilis Dambauskas
