I was looking at the stats for Antenna and they looked really weird. More than a hundred posts, but only from 20-something feeds. It didn't look right. I made a quick count of the different domains: 48. Then I checked the database, listing all the different feeds. And there it was.
When a feed is ingested this happens:
Now, there is a certain case that I had to consider from the start: what if two feeds have a link to the same post? The URL to an entry is a primary key in the database. There can only be one post with a particular URL. Which feed is the correct one? It's impossible to know, and any judgment can be wrong. The solution I chose was to "upsert" all entries. That is, I insert them in the database, but if there is a collision in the primary key I update the feed URL, title, author, and timestamp instead. If this were to be abused I would of course look at other options, the first being to block offending feeds.
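The upsert described above can be sketched roughly like this. It is a minimal illustration that assumes an SQLite database; the table name `entries`, its columns, and the helper names `create_schema` and `upsert_entry` are made up for the example and are not taken from Antenna's actual code.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the entry URL is the primary key, as in the post.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS entries (
            url      TEXT PRIMARY KEY,
            feed_url TEXT NOT NULL,
            title    TEXT,
            author   TEXT,
            updated  TEXT
        )
        """
    )


def upsert_entry(conn, url, feed_url, title, author, updated):
    # Insert the entry; if the URL already exists, overwrite the feed URL,
    # title, author, and timestamp. Needs SQLite 3.24 or later for ON CONFLICT.
    conn.execute(
        """
        INSERT INTO entries (url, feed_url, title, author, updated)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            feed_url = excluded.feed_url,
            title    = excluded.title,
            author   = excluded.author,
            updated  = excluded.updated
        """,
        (url, feed_url, title, author, updated),
    )
    conn.commit()
```

With this approach the last feed to submit a given entry URL "wins" the attribution, which is exactly what makes the miscount below possible.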
Back to the issue at hand! Here's what caused the miscount:
2022-01-27T23:34:14.677411 INFO: Feed 'gemini://warmedal.se/~antenna/atom.xml' added to queue.
2022-01-27T23:35:02.442104 INFO: validating feeds: {'gemini://warmedal.se/~antenna/atom.xml'}
2022-01-27T23:35:02.443968 ERROR: feed URL 'gemini://warmedal.se/~antenna/atom.xml' is blocked by rules.
2022-01-27T23:48:22.633613 INFO: Feed 'gemini://warmedal.se/./~antenna/atom.xml' added to queue.
2022-01-27T23:50:01.865552 INFO: validating feeds: {'gemini://warmedal.se/./~antenna/atom.xml'}
2022-01-27T23:50:02.060685 INFO: fetched feed from 'gemini://warmedal.se/./~antenna/atom.xml'
2022-01-27T23:50:02.062693 INFO: attempting to parse feed 'gemini://warmedal.se/./~antenna/atom.xml' as gemlog feed
To avoid recursion of Antenna I've blocked its own feeds, both gemsub (the index page) and the atom feed. However, someone toyed around with it and found a way around the block! The extra "/./" in the path makes the URL differ from the blocked one as a string, even though it points to the same file. This caused Antenna to ingest itself, so that all the currently submitted feeds were attributed to gemini://warmedal.se/~antenna/atom.xml, i.e. Antenna itself.
"Clever girl" as Muldoon would say.
I'm not entirely sure what the best way to prevent this kind of thing is, but the fix I've added is to remove all "/.." and "/." from feed URLs. I've done the same for entry URLs, to avoid getting several links to the same post manipulated the same way.
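One way to do this kind of cleanup is sketched below. Note that it is not a copy of the fix in Antenna: instead of deleting the "/." and "/.." strings outright, it resolves dot segments roughly the way RFC 3986 describes, and only then compares against the block rules. The names `BLOCKED_FEEDS`, `remove_dot_segments`, `normalize_url`, and `is_blocked` are hypothetical, and the sketch does not handle percent-encoded dots such as "%2e".

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical blocklist; the real rules in Antenna are not shown in the post.
BLOCKED_FEEDS = {
    "gemini://warmedal.se/~antenna/atom.xml",
}


def remove_dot_segments(path: str) -> str:
    # Resolve "." and ".." in an absolute path such as "/./~antenna/atom.xml".
    segments = path.split("/")
    kept = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            # Step up one level, but never above the root (the leading "").
            if len(kept) > 1:
                kept.pop()
            continue
        kept.append(segment)
    # A trailing "." or ".." refers to a directory, so keep the trailing slash.
    if segments[-1] in (".", ".."):
        kept.append("")
    return "/".join(kept)


def normalize_url(url: str) -> str:
    # Rebuild the URL with a cleaned path; scheme, host, and query are kept.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


def is_blocked(url: str) -> bool:
    # Compare the normalized form, so "/./" tricks hit the same rule.
    return normalize_url(url) in BLOCKED_FEEDS
```

Calling `is_blocked("gemini://warmedal.se/./~antenna/atom.xml")` with this sketch normalizes the URL to the blocked form before the lookup, so the detour through "/./" no longer slips past the rule.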
I assume this was done in good humour, but I'm a tiny bit miffed that my stats for the last month or so are very unreliable. At one point on January 27th there were 107 entries on Antenna, attributed to only 2 feeds 🤪
Well, no real harm done!
-- CC0 ew0k, 2022-01-30