I guess it's curiosity, because if it were ego I certainly wouldn't be publishing content to Gemini. Regardless, I'm interested in how many requests this site is getting.
I've thrown together a simple awk script to count the requests per hour for each URL:
```
/Got request for/ {
    # The requested URL is the last field, wrapped in quotes;
    # strip the quotes and bump its counter.
    gsub(/\"/, "", $NF)
    data[$NF] += 1
}

END {
    # Stamp each count with the hour being reported on.
    for (u in data)
        print date_prefix "00", data[u], u
}
```
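Agate logs each request to the journal on a line that ends with the quoted URL, which is why the script only keys off the "Got request for" marker and the last field. To sanity-check it, the same pipeline can be run by hand with an explicit hour prefix (the date below is just an example):

```
# Run the report for one hour by hand; the prefix matches
# journalctl's "Nov 02 03:" timestamp style.
journalctl -u agate.service \
    | awk -v date_prefix="Nov 02 03:" -f bin/popularity_report.awk
```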
I've wrapped it in Make:
```
popularity_report: date_prefix = $(shell date -u --date="1 hour ago" +%b\ %d\ %H:)
popularity_report: log_cmd = journalctl -u agate.service
popularity_report:
	@$(log_cmd) \
	| awk -v date_prefix="$(date_prefix)" -f bin/popularity_report.awk
```
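The first two lines are target-specific variables: date_prefix expands to the previous hour in UTC, something like "Nov 02 03:", which the awk script completes to "Nov 02 03:00" so every row is stamped with the hour it covers.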
And I trigger it hourly with cron:
```
2 * * * * make -C /usr/local/src/shit.cx popularity_report | grep -v ^make >> ~/popularity_report.txt
```
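It runs a couple of minutes past the hour, and the grep -v ^make strips make's own chatter (the "make: Entering directory" lines that -C produces) so only report rows land in the file.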
And with that, I have a running log of how many requests each page got per hour. It looks like this:
```
Nov 02 03:00 2 gemini://shit.cx/2020/10/31/i-picked-up-my-next-bike-build.gmi
Nov 02 03:00 1 gemini://shit.cx/2020/10/30/a-new-keyboard-build.gmi
Nov 02 03:00 1 gemini://shit.cx/2020/10/30/the-shit-cx-infra.gmi
Nov 02 03:00 6 gemini://shit.cx/
```
It isn't fancy, but it's collecting all the data I expect I'll need. One day, if I want to do something extravagant, I can feed it back into awk and gnuplot. For now, the main thing is that the data will be there if and when it's wanted.
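If that day comes, the extravagant version might look something like this. It's a sketch only, untested: it assumes the report format above, sums the counts across all URLs, and plots the hourly totals.

```
# Total requests per hour across all URLs, plotted with gnuplot.
# Report fields: month day hour:minute count url.
# Plotting "with points" means the arbitrary order of awk's
# array iteration doesn't matter.
awk '{ sum[$1 "_" $2 "_" $3] += $4 }
     END { for (t in sum) print t, sum[t] }' ~/popularity_report.txt \
    | gnuplot -p -e 'set xdata time; set timefmt "%b_%d_%H:%M"; plot "-" using 1:2 with points title "requests/hour"'
```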
One day it might fill the disk, but that can is still a long way down the road.
---
created_at: 2020-11-02T04:26+00:00
email: jon@shit.cx
tags: gemini meta scripting
The content for this site is CC-BY-SA-4.0.