An offline city database

Somehow March is already drawing to a close, and I'm scrambling to handle all the things I have obligated myself to handle this month, one of which is my submission for the inaugural OFFLFIRSOCH. To this end I've just pushed the rough first implementation of my very imaginatively named tool, `city`, to a public repo. Hey, after ROOPHLOCH and OFFLFIRSOCH plus VF-1 and AV-98, I am entitled to one exceptionally mundane project name, okay? I had hoped to have this a little more polished by now, but the essential functionality is already there. It'll likely be "done" in the space of a week or so.

`city` Git repo

city is designed to provide an offline solution to two problems I routinely solve via the absurd use of a web browser: answering "what time is it in city X right now?" and "how far is city X from city Y?". I am often interested in this second question in the context of utility radio DXing. This past winter I finally did some decoding of Digital Selective Calling (DSC) messages from coastal stations on 2187.5 kHz, and regularly found myself wondering just how far away some small coastal city I'd never heard of was.

Cities don't move around much, and while they do change their timezone occasionally, it's a pretty rare event, so the information required to answer these questions doesn't really go stale. This makes these tasks a prime candidate for having an offline solution. I ended up using the "Gazetteer" dataset from GeoNames, a CC-BY licensed dataset which provides basic information, including timezone and latitude and longitude, for a huge number of cities around the world. You can download simple tab delimited files for all cities with populations over 500, over 1,000, over 5,000 or over 15,000. Obviously the files get larger as you include smaller cities. I went with the 15,000 cut-off for this project to keep things small, as this was kind of an experimental effort which might be superseded in future (more on that later).
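For reference, each line of these GeoNames dumps describes one city in 19 tab-separated columns: geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, cc2, four admin codes, population, elevation, dem, timezone and modification date. A minimal Lua sketch of reading one such line into a table follows. `parse_geonames_line` is a hypothetical helper, not part of `city`. Falling back to the `dem` column when `elevation` is blank is also an assumption.

```lua
-- Hypothetical helper (not part of city): parse one line of a GeoNames
-- dump such as cities15000.txt into a Lua table.

-- Split on tabs, keeping empty fields (GeoNames leaves many columns blank).
local function split_tabs(line)
  local fields = {}
  for field in (line .. "\t"):gmatch("([^\t]*)\t") do
    fields[#fields + 1] = field
  end
  return fields
end

local function parse_geonames_line(line)
  local f = split_tabs(line)
  return {
    name    = f[2],
    ascii   = f[3],
    lat     = tonumber(f[5]),
    lon     = tonumber(f[6]),
    country = f[9],
    pop     = tonumber(f[15]),
    -- Assumption: use the DEM estimate (column 17) when elevation is blank.
    elev    = tonumber(f[16]) or tonumber(f[17]),
    tz      = f[18],
  }
end
```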

GeoNames

Gazetteer data dumps

The basic user experience is like this: you just list one or more cities as command line arguments and hit enter:

$ city Adelaide
Adelaide:
	Local time: Fri 29 Mar 2024 22:27:30 ACDT
	Location: -34.93°, 138.60°
	Elevation: 59m
	Population: 1387290

(future editions will add some polish to this output, like formatting -34.93° as 34.93° S and 1387290 as 1,387,290)
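Both of these bits of polish are small string-formatting jobs. Here is a minimal Lua sketch. The function names are hypothetical and do not come from `city.lua`.

```lua
-- Hypothetical formatters for the planned output polish.

-- Render a signed coordinate as an unsigned value plus hemisphere letter,
-- e.g. -34.93 with ("N", "S") becomes "34.93° S".
local function format_coord(value, positive, negative)
  local hemisphere = value < 0 and negative or positive
  return string.format("%.2f° %s", math.abs(value), hemisphere)
end

local function format_lat(lat) return format_coord(lat, "N", "S") end
local function format_lon(lon) return format_coord(lon, "E", "W") end

-- Insert thousands separators, e.g. 1387290 becomes "1,387,290".
local function with_commas(n)
  local sign, digits = string.format("%d", n):match("^(-?)(%d+)$")
  -- Reverse, put a comma after every third digit, reverse back, then drop
  -- the stray leading comma left when the digit count is a multiple of 3.
  local grouped = digits:reverse():gsub("(%d%d%d)", "%1,"):reverse()
  return sign .. grouped:gsub("^,", "")
end
```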

If your city has a space in its name, use quotes to enclose it as appropriate for your shell of choice:

$ city 'San Francisco'
San Francisco:
	Local time: Fri 29 Mar 2024 05:00:20 PDT
	Location: 37.77°, -122.42°
	Elevation: 28m
	Population: 864816

(yes, at the time of writing you really do need to type 'San Francisco' with an uppercase S and F; it will claim not to find 'san francisco'. This will naturally be fixed soon)
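Plain Lua's `os.date` only knows about the process's own timezone (or UTC), so it needs some help from the system to turn a stored zone name like `Asia/Shanghai` into a local time. One possible approach is sketched below. It assumes a POSIX shell and a `date` command that honours the `TZ` environment variable. It is not necessarily how `city` does it, and it gives up some portability. `local_time` is a hypothetical name.

```lua
-- Sketch (assumes a POSIX shell and a `date` that honours TZ): ask the
-- system for the current wall-clock time in an IANA timezone.
local function local_time(tz)
  -- Only accept characters found in IANA zone names, so the value can be
  -- pasted into a shell command without quoting problems.
  if not tz:match("^[%w/_+%-]+$") then
    return nil, "unexpected timezone name: " .. tz
  end
  local pipe = io.popen("TZ=" .. tz .. " date '+%a %d %b %Y %H:%M:%S %Z'")
  if not pipe then
    return nil, "could not run date"
  end
  local line = pipe:read("*l")
  pipe:close()
  return line
end

print(local_time("Australia/Adelaide"))
```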

If you list more than a single city, you'll get this display for each of them, plus a table of distances at the end:

$ city Helsinki Tampere Turku
Helsinki:
	Local time: Fri 29 Mar 2024 14:02:43 EET
	Location: 60.17°, 24.94°
	Elevation: 26m
	Population: 658864
Tampere:
	Local time: Fri 29 Mar 2024 14:02:43 EET
	Location: 61.50°, 23.79°
	Elevation: 114m
	Population: 244315
Turku:
	Local time: Fri 29 Mar 2024 14:02:43 EET
	Location: 60.45°, 22.27°
	Elevation: 22m
	Population: 195301
Distance between Helsinki and Tampere is 160.40 km
Distance between Helsinki and Turku is 150.17 km
Distance between Tampere and Turku is 142.40 km

(I think I'd like to get the distance table sorted from nearest to furthest or vice versa. The distances for the "Finnish Triangle" here *are* sorted, but that's a happy coincidence arising from the order I listed the cities in, which was alphabetical but also happens to be by decreasing population, ihana!)
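Sorting is easy once the pairwise distances are in a list. The post doesn't say which distance formula `city` uses, so the sketch below assumes the common haversine great-circle formula. It treats the Earth as a sphere with a mean radius of 6371 km and ignores its slight flattening. `haversine_km` and `distance_table` are hypothetical names. The sketch lists the nearest pair first; flip the comparison to get the furthest first.

```lua
-- Sketch: pairwise great-circle distances, nearest pair first.
-- Assumes a spherical Earth (haversine formula, mean radius 6371 km);
-- not necessarily the formula city uses.
local EARTH_RADIUS_KM = 6371.0

local function haversine_km(lat1, lon1, lat2, lon2)
  local rad = math.pi / 180
  local dlat = (lat2 - lat1) * rad
  local dlon = (lon2 - lon1) * rad
  local a = math.sin(dlat / 2) ^ 2
    + math.cos(lat1 * rad) * math.cos(lat2 * rad) * math.sin(dlon / 2) ^ 2
  -- Clamp guards against rounding pushing the argument just above 1.
  return 2 * EARTH_RADIUS_KM * math.asin(math.min(1, math.sqrt(a)))
end

-- Build each unordered pair once, then sort by ascending distance.
local function distance_table(cities)
  local rows = {}
  for i = 1, #cities - 1 do
    for j = i + 1, #cities do
      local a, b = cities[i], cities[j]
      rows[#rows + 1] = {
        from = a.name,
        to = b.name,
        km = haversine_km(a.lat, a.lon, b.lat, b.lon),
      }
    end
  end
  table.sort(rows, function(x, y) return x.km < y.km end)
  return rows
end

-- Usage with the two-decimal coordinates printed above.
local finland = {
  {name = "Helsinki", lat = 60.17, lon = 24.94},
  {name = "Tampere",  lat = 61.50, lon = 23.79},
  {name = "Turku",    lat = 60.45, lon = 22.27},
}
for _, row in ipairs(distance_table(finland)) do
  print(string.format("Distance between %s and %s is %.2f km", row.from, row.to, row.km))
end
```

Because those coordinates are rounded, the printed kilometres may differ slightly from `city`'s own output.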

That's pretty much the whole shebang. I guess I'll add some method of getting output in miles and feet, and I also need a way of specifying a country so you can disambiguate Paris, France from Paris, Texas, but that's really it. Does what it says on the tin, as they say.
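Imperial output only needs division by two exact constants: the international mile is defined as 1.609344 km and the foot as 0.3048 m. Here is a trivial sketch with hypothetical helper names.

```lua
-- Hypothetical helpers for miles-and-feet output. Both factors are exact
-- by definition of the international mile and foot.
local KM_PER_MILE = 1.609344
local METRES_PER_FOOT = 0.3048

local function km_to_miles(km) return km / KM_PER_MILE end
local function metres_to_feet(m) return m / METRES_PER_FOOT end

print(string.format("%.2f mi", km_to_miles(160.40)))  -- prints 99.67 mi
print(string.format("%.0f ft", metres_to_feet(59)))   -- prints 194 ft
```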

In terms of implementation, I wanted to experiment with writing this tool in a manner which complies well with my current conceptualisation of permacomputing, which has changed quite a lot since I first encountered the term. I no longer think of permacomputing at all in environmental/ecological terms, at least not primarily. I think of it instead as computing in a way that emphasises natural immunity against "bit rot" (not an unavoidable natural phenomenon of computing but an artificial and self-inflicted one) and resisting the externally-driven obsolescence of tools and skills. Of course, doing this also has enviro/eco benefits, but it has other benefits too which are more psycho-social-cultural in nature, and honestly I think those are at least as important, if not more so. Not that the environmental issues aren't important, but they are honestly much better addressed by simply computing less than by computing differently. Anyway, much more detail on this line of thinking in a future post, I hope. The main points here are that I wanted maximum portability, minimum dependencies, and ultralight or ideally zero coupling to any toolchains for either development or installation.

To these ends, city is a single Lua script. Lua is a very portable and widely ported language whose standard implementation is written in a very mature and unchanging language (C89), and it is one of the very few languages whose major releases have reliably become less frequent, not more, over the course of its history. The single file is called `city.lua` in the repo to facilitate easy syntax highlighting etc., but the notion is that you "install" it by placing a copy of, or link to, this file named just `city` in /usr/local/bin or wherever the heck you wanna put it, as long as it's in your $PATH (or whatever the local equivalent concept is in your preferred computing environment). This single file contains both the city data and the logic to search it. The city data is stored in a single large table variable, which is not populated by parsing a copy of the original tab-delimited file; rather, it is written in the source code as a huge literal. The first 11 non-empty, non-comment lines of `city.lua` look like this:

all_cities = {
        {name="Shanghai", ascii=nil, lat=31.22222, lon=121.45806, country="CN", pop=22315474, elev=12, tz="Asia/Shanghai"},
        {name="Beijing", ascii=nil, lat=39.9075, lon=116.39723, country="CN", pop=18960744, elev=49, tz="Asia/Shanghai"},
        {name="Shenzhen", ascii=nil, lat=22.54554, lon=114.0683, country="CN", pop=17494398, elev=4, tz="Asia/Shanghai"},
        {name="Guangzhou", ascii=nil, lat=23.11667, lon=113.25, country="CN", pop=16096724, elev=15, tz="Asia/Shanghai"},
        {name="Kinshasa", ascii=nil, lat=-4.32758, lon=15.31357, country="CD", pop=16000000, elev=281, tz="Africa/Kinshasa"},
        {name="Lagos", ascii=nil, lat=6.45407, lon=3.39467, country="NG", pop=15388000, elev=11, tz="Africa/Lagos"},
        {name="Istanbul", ascii=nil, lat=41.01384, lon=28.94966, country="TR", pop=14804116, elev=39, tz="Europe/Istanbul"},
        {name="Chengdu", ascii=nil, lat=30.66667, lon=104.06667, country="CN", pop=13568357, elev=499, tz="Asia/Shanghai"},
        {name="Mumbai", ascii=nil, lat=19.07283, lon=72.88261, country="IN", pop=12691836, elev=8, tz="Asia/Kolkata"},
        {name="São Paulo", ascii="Sao Paulo", lat=-23.5475, lon=-46.63611, country="BR", pop=12400232, elev=769, tz="America/Sao_Paulo"},

(I'm sharing the first 11 lines and not the first 10 because having São Paulo in there lets me point out that `city` supports searching by "plain" and accented variants of names, which is nice)
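The post doesn't show how the tab-delimited dump was turned into this literal. One plausible generator step is sketched below with a hypothetical `to_lua_literal` function, which writes each record as a line like those above. It stores `ascii` only when it differs from `name`. That rule is an inference from the Shanghai and São Paulo lines, not something the post states.

```lua
-- Hypothetical generator step: render one city record as a line of the
-- all_cities literal. Not taken from city.lua.

-- Numbers via %.14g so 31.22222 stays "31.22222" and 12 stays "12";
-- missing values become nil.
local function num(v)
  if v == nil then return "nil" end
  return string.format("%.14g", v)
end

local function to_lua_literal(c)
  -- Keep the ASCII variant only when it adds something to the name.
  local ascii = "nil"
  if c.ascii and c.ascii ~= "" and c.ascii ~= c.name then
    ascii = string.format("%q", c.ascii)  -- %q yields a quoted Lua string
  end
  return string.format(
    "{name=%q, ascii=%s, lat=%s, lon=%s, country=%q, pop=%s, elev=%s, tz=%q},",
    c.name, ascii, num(c.lat), num(c.lon), c.country, num(c.pop),
    num(c.elev), c.tz)
end
```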

You will notice immediately that this is woefully inefficient in terms of storage space. The strings "name", "ascii", "lat", "lon", etc. literally occur nearly twenty-eight thousand times in city.lua, in exactly the same order on every line, and the timezones are written out in full each time. That's true even though "Asia/Shanghai" occupies 13 bytes, enough for a 13-digit number, which is vastly more than necessary to enumerate all the world's timezones (there are only a few hundred of them). From an orthodox software engineering perspective, this solution is cringe-inducing, but you know what, doing it this way the entire file is just over 3 megabytes, which has been trivially small on a PC for decades, so who cares? And, yep, your queries are looked up in this big table by stepping through it from top to bottom in order. The cities are stored in decreasing order of population on the assumption that queries for big cities will happen more often than for small cities, so this ordering makes the common case fastest. This is also cringe compared to using a real database, but on my 13-year-old laptop this approach does not provide a subjectively slower user experience than using Google, so who cares? Doing the "right thing" and storing this data in an external SQLite file would require depending upon additional third-party libraries, would reduce the range of systems the tool could be easily ported to, and would complicate installation. It's a bad trade-off in this context.
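For the curious, here's roughly what a top-to-bottom search over a table shaped like `all_cities` looks like in Lua. This is a sketch, not the code in `city.lua`, and `find_city` is a hypothetical name. It also includes case-insensitive matching and an optional country-code filter. Those are the fixes planned earlier in this post, and the current version doesn't have them yet. Because the table is sorted by population, the first match is also the most populous city of that name.

```lua
-- Sketch of a linear search over a table shaped like all_cities.
-- Matches the display name or the ASCII variant, ignoring ASCII case;
-- string.lower does not fold accented capitals in the default C locale.
local function find_city(cities, query, country)
  local q = query:lower()
  for _, c in ipairs(cities) do
    local matches = c.name:lower() == q or (c.ascii and c.ascii:lower() == q)
    if matches and (country == nil or c.country == country) then
      return c  -- first hit wins: the table is in decreasing population order
    end
  end
  return nil
end

-- e.g. find_city(all_cities, "paris", "FR")
```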

Of course, this data could be used for a lot of different purposes, and I quite enjoy the idea of having it in a local database with a suite of tools which all make use of it, using an environment variable to convey the filename of the SQLite file to all of those tools. But this highly integrated standalone tool idea has merit too, and is much more OFFLFIRSOCH-friendly, so I used it first. The costs of the inefficiency are small enough for a one-off tool that they are worth paying for the benefits, but the more tools you bake this dataset into, the more space you waste. Then again, even ten tools would total maybe 35 meg, which remains trivial, so maybe this also doesn't matter, and one might argue that having the same data stored in multiple files in multiple formats in multiple languages actually provides an additional layer of resilience...

Anyway, that's `city`. It scratches a personal itch and I'm certain I will continue using it. I hope it's useful for some other folk, too. I will add some of the "polish" features mentioned above in the coming days or weeks, but I'm not really interested in adding additional functionality. If you send me a *really* good idea in the next week or so I might consider it, but otherwise once I have the polish added I will call it 1.0.0 and it will go into a long-term maintenance mode. I will make a new release once per year, probably each March as part of OFFLFIRSOCH since that's a convenient reminder, where I update the city data to the latest version from GeoNames. That need is mostly driven by the population data, which is the only thing which will change appreciably over time. Perhaps it was a bad idea to even include that, as without it things would be essentially static. But, well, it was there and it was easy to add, and it's nice to have quick and easy access to that too (elevation ended up in there for the same reason). Having population data which is one year out of date is not a huge problem for many casual purposes.