2022-09-17

Musings on SmolZINE, continued

#gemini

I wrote:

And what is in this list now? Ok, I can look at the 203 entries. But I can try to extract the domain names of the capsules and see whether some are mentioned a lot more than others.

/en/2022/20220915-musings-on-smolzine.gmi

kelbot was quick to point out that my shell one-liner seemed a bit strange, since pollux.casa and flounder were indeed referenced in SmolZINE multiple times. He was right, of course: I had not "normalized" the domains to their last two fields. So, try it again, Sam:

So let's explain the resulting one-liner in pieces. From all issues of SmolZINE, which are files in the current directory, grep the gemini links. Be sure to suppress the file names (-h). Also be sure to anchor the search at the beginning of the line with '^':

grep -h '^=> gemini://' smolzine-issue-*.gmi |
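At this point the matched lines look like this (made-up examples, not quotes from an actual issue):

=> gemini://alice.example.org/gemlog/post.gmi A post by Alice
=> gemini://example.circumlunar.space/ A circumlunar capsule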

The resulting lines start with the marker for links and a blank (the blank is optional in gemtext, but apparently always present in this output). So I ask good ol' awk to just print the second field:

awk '{print $2;}' |
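With the made-up lines from above, only the URLs remain:

gemini://alice.example.org/gemlog/post.gmi
gemini://example.circumlunar.space/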

Now I have a list of all the gemini:// URLs. In order to extract the last two components of each domain, I ask good ol' sed to massage the strings:

sed -e 's|^.*gemini://||' \
-e 's|/.*$||' \
-e 's|^.*\.\([^\.][^\.]*\)\.\([^\.][^\.]*\)$|\1\.\2|' \
-e 's|:[0-9][0-9]*$||' |
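To make the four expressions concrete, here is how the first made-up URL from above travels through them, one expression at a time:

gemini://alice.example.org/gemlog/post.gmi    (input)
alice.example.org/gemlog/post.gmi             (scheme stripped)
alice.example.org                             (path stripped)
example.org                                   (reduced to the last two fields)
example.org                                   (no trailing port, so the last expression changes nothing)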

The sed step removed the subdomains from the domain names, so sorting is meaningful only after this point (LANG=C makes sort use plain byte order, independent of the current locale):

LANG=C sort |

And then we count adjacent identical lines and sort the resulting list numerically, in descending order:

uniq -c | sort -nr
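A quick sanity check of the counting on a toy input (made-up domains again):

printf 'example.org\nexample.com\nexample.org\n' | LANG=C sort | uniq -c | sort -nr
      2 example.org
      1 example.com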

Now repeat after me (I broke the one-liner for readability):

grep -h '^=> gemini://' smolzine-issue-*.gmi |
  awk '{print $2;}' |
  sed -e 's|^.*gemini://||' -e 's|/.*$||' \
  -e 's|^.*\.\([^\.][^\.]*\)\.\([^\.][^\.]*\)$|\1\.\2|' \
  -e 's|:[0-9][0-9]*$||' |
  LANG=C sort | uniq -c | sort -nr
      9 circumlunar.space
      8 flounder.online
      6 warmedal.se
      6 tilde.team
      5 transjovian.org
      5 smol.pub
      4 yesterweb.org
      4 midnight.pub
      4 locrian.zone
      3 tilde.pink
      3 thegonz.net
      3 skylarhill.me
      3 skyjake.fi
      3 rawtext.club
      3 pollux.casa
      3 gemi.dev
      3 breadpunk.club
      2 usebox.net
      2 tilde.club
      2 thurk.org
      2 susa.net
      2 noulin.net
      2 nightfall.city
      2 mozz.us
      2 gemlog.blue
      2 dimakrasner.com
      2 bacardi55.io
      2 antipod.de
      2 alchemi.dev
      1 yysu.xyz
      1 yujiri.xyz
      1 ynh.fr
...

And there they are: circumlunar.space, flounder.online, tilde.team, smol.pub ... and, further down, pollux.casa with its three mentions. Great.

Now, while we are at it, can we not generate an index pointing to the issues in which each domain is referenced? Sure we can, and it is still just a one-liner. For each unique domain, grep -l lists the issue files that reference it; sed strips the file names down to the issue numbers, and tr joins the numbers onto a single line:

grep -h '^=> gemini://' smolzine-issue-*.gmi |
  awk '{print $2;}' |
  sed -e 's|^.*gemini://||' -e 's|/.*$||' \
  -e 's|^.*\.\([^\.][^\.]*\)\.\([^\.][^\.]*\)$|\1\.\2|' \
  -e 's|:[0-9][0-9]*$||' |
  LANG=C sort | uniq |
  while read D; do \
    X=$(grep -l "^=> .*$D" smolzine-issue-*.gmi | sed -e 's|^.*-||' -e 's|\.gmi$||' | sort -n | tr '\n' ' ') ; \
    echo -e "$D:\t$X" ; \
  done
0x80.org:       5
1436.ninja:     11 13
725.be: 17
7irb.tk:        9
adele.work:     32
adventuregameclub.com:  19
ainent.xyz:     18
ajul.io:        22
alchemi.dev:    6
alkali.me:      31
antipod.de:     9 12
atyh.net:       25
bacardi55.io:   7 24
beyondneolithic.life:   27
bjornwestergard.com:    21
bortzmeyer.org: 32
breadpunk.club: 12
bunburya.eu:    6
cadadr.space:   5
cadence.moe:    11
calmwaters.xyz: 1
campaignwiki.org:       5
chriswere.uk:   4
circumlunar.space:      2 9 11 14 16 19 32
...

However, is this thing useful? Not so much, because here the subdomains should maybe stay. And the protocol, too, so that gopher and http links show up in the list as well. And can we make each number a link to the issue? Sure we can, one line per link. And wouldn't it be nice if we could somehow scrape the preceding description into the result as well? It surely would.
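For the record, here is a sketch of the first two of these wishes: keep the subdomains, and turn each issue number into a gemtext link pointing to the local issue file. It assumes, as before, that the issues live as smolzine-issue-*.gmi in the current directory; the '## domain' headings and the link labels are my choice, and the protocols and descriptions remain to be done:

grep -h '^=> gemini://' smolzine-issue-*.gmi |
  awk '{print $2;}' |
  sed -e 's|^gemini://||' -e 's|/.*$||' -e 's|:[0-9][0-9]*$||' |
  LANG=C sort -u |
  while read D; do
    echo "## $D"                 # one heading per domain
    grep -l "^=> .*$D" smolzine-issue-*.gmi | sort -t- -k3 -n |
    while read F; do             # one link line per issue
      N=${F##*-}
      echo "=> $F Issue ${N%.gmi}"
    done
  done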

And where do we stop? I'll stop right here.

Have the appropriate amount of fun!
