shit.cx

measuring traffic to shit.cx

I guess it's curiosity, because if it were ego I certainly wouldn't be publishing content to Gemini. Regardless, I'm interested in how many requests this site is getting.

I've thrown together a simple awk script to count the requests per hour for each URL.

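# Count requests per URL for the hour named by date_prefix (e.g. "Nov 02 03:").
# Assumes journalctl's default output, where each line starts with a timestamp
# such as "Nov 02 03:15:22", and that the quoted URL is the last field.
index($0, date_prefix) == 1 && /Got request for/ { gsub(/"/, "", $NF); data[$NF] += 1 }
END {
  for (u in data)
    print date_prefix "00", data[u], u
}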

I've wrapped it in a Make target.

popularity_report: date_prefix = $(shell date -u --date="1 hour ago" +%b\ %d\ %H:)
popularity_report: log_cmd = journalctl -u agate.service
popularity_report:
        @$(log_cmd) \
                | awk -v date_prefix="$(date_prefix)" -f bin/popularity_report.awk

And I trigger it hourly with cron.

2 * * * * make -C /usr/local/src/shit.cx popularity_report | grep -v ^make >> ~/popularity_report.txt

And with that, I have a running log of how many requests each page got per hour. It looks like this:

Nov 02 03:00 2 gemini://shit.cx/2020/10/31/i-picked-up-my-next-bike-build.gmi
Nov 02 03:00 1 gemini://shit.cx/2020/10/30/a-new-keyboard-build.gmi
Nov 02 03:00 1 gemini://shit.cx/2020/10/30/the-shit-cx-infra.gmi
Nov 02 03:00 6 gemini://shit.cx/

It isn't fancy, but it's collecting all the data I expect I'll need. If I ever need to do something extravagant, I can feed it back into awk and gnuplot. For now, the main thing is that the data is captured if and when it's wanted.
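
As a rough sketch of what feeding the report back into awk could look like, the snippet below sums the hourly counts into one total per URL, busiest first. It assumes the five-field layout shown in the sample output (month, day, hour, count, URL); the function name total_requests is made up for illustration and isn't part of the setup described here.

```shell
#!/bin/sh
# Sum the hourly counts in a report file into one total per URL.
# Assumes each line looks like: "Nov 02 03:00 2 gemini://shit.cx/..."
# total_requests is a hypothetical helper name, not part of the original setup.
total_requests() {
    awk '
        NF >= 5 { total[$5] += $4 }    # field 4 is the count, field 5 the URL
        END     { for (u in total) print total[u], u }
    ' "$1" | sort -k1,1nr -k2          # highest total first, ties ordered by URL
}

# Example: total_requests ~/popularity_report.txt
```

The output is one "total URL" pair per line, which should be simple enough to hand to a plotting tool later if that day ever comes.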

One day it might fill the disk, but that can is still a long way down the road.

---

created_at: 2020-11-02T04:26+00:00

email: jon@shit.cx

tags: gemini meta scripting

The content for this site is CC-BY-SA-4.0.