Are Geocache Logs Getting Shorter?

2023-08-24

Background and hypothesis

When geocachers find a geocache, they typically "log" their find both in the cache's paper logbook and on one of the online listing sites on which the cache's coordinates can be found. (I have a dream that someday cache logging could be powered by Webmentions or ActivityPub or some similar decentralised-Web technology, so that cachers can log their finds on any site on which a cache is listed, or even on their own site, and have all the dots joined up... but that's pretty far-fetched, I'm afraid. It's not stopping some of us from experimenting with possible future standards, though...)

Photograph showing a medium-sized geocache container with its contents laid out around it: various pieces of swag for trade, plus a notebook.

I've been finding and hiding geocaches for... a long while, so I've seen lots of log entries from people who've found my caches (and those of others). And it feels to me like the average length of a geocaching log entry is getting shorter.

Screenshot of a digital log entry from Geocaching.com, titled "MagicV77 found Grove Farm" on 22 August 2023. The entirety of the log entry itself is a thumbs-up emoji.

"It feels to me like..." isn't very scientific, though. Let's see if we can do better.

Getting the data

To test my hypothesis, I needed a decade or so of logs. I didn't want to compare old caches to new caches (in case people are biased by the logs before them) so I used Geocaching.com's own search to open the pages for the 500 caches closest to me that are each at least 10 years old.

Browser tab bar showing many hundreds of Geocaching.com tabs.

I hacked together a quick userscript to save all of the logs in a way that was easier than copy-pasting each of them but still didn't involve hitting Geocaching.com's API or automating bulk-scraping (which would violate their terms of service). Clicking each of several hundred tabs once every few minutes in the background while I got on with other things wasn't as much of an ordeal as you might think... but it did take a while.



Needless to say I only had to go through the cycle a couple of times before I set up a keyboard shortcut.

I mashed that together into a CSV file and for the first time looked at the size of my sample data: ~134,000 log entries, spanning 20 years. I filtered out everything over 10 years old (because some of the caches might have no logs that old) and stripped out everything that wasn't a "found it" or "didn't find it" log.
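For anyone who'd rather repeat that filtering step in code than in a spreadsheet, here's a minimal Python sketch. It isn't the process used for this post. The column names (`log_type`, `logged_on`, `text`), the exact log-type labels and the sample rows are all assumptions made up for illustration, so adjust them to match whatever your own export looks like.

```python
import csv
import io
from datetime import date

# Assumed log-type labels; check them against your own export.
KEPT_LOG_TYPES = {"Found it", "Didn't find it"}

def filter_logs(rows, cutoff):
    """Keep found/didn't-find logs dated on or after the cutoff date."""
    kept = []
    for row in rows:
        if row["log_type"] not in KEPT_LOG_TYPES:
            continue  # drop notes, maintenance logs and so on
        if date.fromisoformat(row["logged_on"]) < cutoff:
            continue  # drop logs older than the cutoff
        kept.append(row)
    return kept

# Made-up sample rows standing in for the real CSV export.
SAMPLE_CSV = """log_type,logged_on,text
Found it,2009-05-01,Great hide in a lovely spot. Thanks for bringing us here!
Found it,2016-03-12,Quick find on the way home. TFTC
Didn't find it,2021-07-04,Searched for ten minutes with no luck.
Write note,2022-01-15,Checked on the cache and all is well.
Found it,2023-08-22,TFTC
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
recent = filter_logs(rows, cutoff=date(2013, 8, 24))
print(len(rows), "->", len(recent))
```

Passing the cutoff in as an explicit date (rather than computing "ten years ago" inside the function) keeps the sketch simple and makes the result repeatable.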

That gave me a far more reasonable ~80,000 records with which I could make Excel cry. (Just for fun, try asking Excel to extrapolate a second-order polynomial trendline across 80,000 pairs of datapoints. Just don't do it if you're hoping to use your computer for anything in the next quarter hour.)

Results

It looks like my hunch is right. The wordcount of "found" logs on traditional and multi-stage caches has generally decreased over time:

Graph showing word counts (log10) of geocache logs on different dates from August 2013 through August 2023. There's a slight downward trend.
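A trend like this one can be roughly reproduced outside of Excel. The sketch below is an assumption-laden illustration rather than the method behind the graph: it counts words by splitting on whitespace, skips empty logs (because log10 of zero is undefined), and fits a straight least-squares line instead of the second-order polynomial mentioned earlier. The function names and sample logs are invented for the example.

```python
import math
from datetime import date

def word_count(text):
    """Count whitespace-separated words; a lone emoji counts as one."""
    return len(text.split())

def trend_slope(points):
    """Least-squares slope of y against x for a list of (x, y) pairs.

    Needs at least two distinct x values, otherwise it divides by zero.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

def log_length_trend(logs):
    """Slope of log10(word count) per year for (date, text) pairs."""
    points = []
    for logged_on, text in logs:
        words = word_count(text)
        if words == 0:
            continue  # assumption: ignore logs with no text at all
        years = logged_on.toordinal() / 365.25
        points.append((years, math.log10(words)))
    return trend_slope(points)

# Made-up logs: 100 words, then 10 words, then a single emoji.
sample_logs = [
    (date(2013, 8, 24), " ".join(["word"] * 100)),
    (date(2018, 8, 24), " ".join(["word"] * 10)),
    (date(2023, 8, 24), "👍"),
]
slope = log_length_trend(sample_logs)
print(f"{slope:.3f} log10(words) per year")
```

A negative slope means logs are getting shorter over time. Working in log10 means each step of 1 is a tenfold change in word count, which matches the scale used on the graphs.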

"Did not find" logs, which can be really helpful for cache owners to diagnose problems with their caches, have an even more pronounced dip:

Graph showing word counts (log10) of geocache logs on different dates from August 2013 through August 2023. There's a pronounced downward trend.

When I first saw that deep dip in the average length of "did not find" logs, my first thought was to wonder whether the sample might not be representative because the did-not-find rate itself might have fallen over time. But no, the opposite is true:

Graph showing how the "did not find" rate in my samples has climbed from an average of 4% to an average of 7.5% over the last decade.
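The did-not-find rate is just the share of each year's logs that are "did not find" logs. Here's a short Python sketch of that calculation. The log-type label and the sample counts are assumptions, with the counts picked purely so that the output echoes the 4% and 7.5% figures from the graph.

```python
from collections import Counter

DNF = "Didn't find it"  # assumed label for a did-not-find log

def dnf_rate_by_year(logs):
    """Return {year: fraction of that year's logs that are DNFs}.

    `logs` is an iterable of (year, log_type) pairs.
    """
    totals = Counter()
    dnfs = Counter()
    for year, log_type in logs:
        totals[year] += 1
        if log_type == DNF:
            dnfs[year] += 1
    return {year: dnfs[year] / totals[year] for year in sorted(totals)}

# Made-up counts: 1 DNF in 25 logs for 2013, 3 DNFs in 40 logs for 2023.
sample = (
    [(2013, "Found it")] * 24 + [(2013, DNF)] * 1
    + [(2023, "Found it")] * 37 + [(2023, DNF)] * 3
)
rates = dnf_rate_by_year(sample)
for year, rate in rates.items():
    print(year, f"{rate:.1%}")
```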

Strangely, the only place that the trend is reversed is in "found" logs of virtual caches, which have seen a slight increase in verbosity.

Graph showing word counts (log10) of geocache logs on different dates from August 2013 through August 2023. There's a slight upward trend.

Conclusion

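Within the limitations of my research (80,000 logs from 500 caches each 10+ years old, near me), there are a handful of clear trends over the last decade: "found" logs on traditional and multi-stage caches have got shorter, "did not find" logs have got shorter still (even as the did-not-find rate has climbed from around 4% to around 7.5%), and "found" logs on virtual caches have bucked the trend by getting slightly longer.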

Are these trends a sign of shortening attention spans? Increased use of mobile phones for logging? Use of emoji and acronyms to pack more detail into shorter messages? I don't know.

I'd love to see some wider research, perhaps by somebody at Geocaching.com HQ (who has database access and is thus able to easily extract enough data for a wider analysis!). I'm also very interested in whether the identity of the cache finder has an impact on log length: is it impacted by how long ago they started 'caching? Whether or not they have hidden caches of their own? How many caches they've found?

But personally, I'm just pleased to have been able to have a question in the back of my mind and - through a little bit of code and a little bit of data-mashing - have a pretty good go at answering it.

Links

W3C standards document for Webmention

W3C standards document for ActivityPub

Indieweb wiki page discussing microformat standardisation for geocache logs (as a 'checkin' post kind), with Dan contributing

User cachemania on Flickr

CC BY-SA license

My first geocache log, from January 2010.

My geocaches

Geocaching.com log GL1AJ2JNC, which consists of a single emoji

Geocaching.com cache GC86M6V "Grove Farm", which recently received a single-emoji log

Userscript to export Geocaching.com logs for a particular geocache as JSON

Geocaching.com blog post announcing Virtual Rewards