💾 Archived View for thrig.me › blog › 2024 › 12 › 15 › sqlite-bootstrap.gmi captured on 2024-12-17 at 10:40:49. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

SQLite Bootstrap

/blog/2024/12/12/bdb.gmi

SQLite is slightly more complicated than Berkeley DB, notably that a schema is required to create the various tables and whatnot, while with BDB one can simply create the file, as the equivalent of a SQL schema lives in the code that encodes and decodes whatever complicated structures need be stored in a Berkeley DB.

There are various places one can perform a sqlite database bootstrap; one approach for a standalone script (where only one script interacts with the file or is responsible for the bootstrap) is to include the bootstrap code within the script itself, something along the lines of:

    #-quicklisp
    (let ((quicklisp-init (merge-pathnames "quicklisp/setup.lisp"
                                           (user-homedir-pathname))))
      (when (probe-file quicklisp-init)
        (load quicklisp-init)))
    (ql:quickload :dbi)

    (defconstant +dbfile+ "sales.db")

    ; bootstrap (may need more error handling)
    (unless (probe-file +dbfile+)
      (format *error-output* "bootstrap '~a'~&" +dbfile+)
      (dbi:with-connection
        (dbh :sqlite3 :database-name +dbfile+)
        (dolist
            (statement
              '("CREATE TABLE IF NOT EXISTS emp (ename TEXT, deptid INTEGER);"
                "CREATE TABLE IF NOT EXISTS dept (deptid INTEGER, dname TEXT);"))
          (dbi:do-sql dbh statement))))
    (unless (probe-file +dbfile+)
      (format *error-output* "unable to bootstrap '~a'~&" +dbfile+))

    ; script logic ...
    (dbi:with-connection
      (dbh :sqlite3 :database-name +dbfile+)
      ...

A disadvantage here is that every time the script is run it must do a bootstrap check, and the bootstrap code may needlessly embiggen the script (or puts everything in one place for easy reference, which may be an advantage for a standalone script). Also there may be race conditions between database use and creation if multiple scripts access the database file at the same time; to avoid this, one might create the database into a temporary file, then rename it into place?

Another way to solve this is via configuration management, or for less organized people to remember to run some "do the initial setup for this host" script that builds out all the necessary sqlite database files, and maybe populates them from backups or some other authoritative source, if relevant. The bootstrap can be as simple or complicated as you want to make it, and the disadvantage is that random scripts may fail in random ways if the necessary database files have not been bootstrapped, as they may not waste code that checks on what the configuration management should have done. With configuration management you can usually know whether a host built correctly, less so if someone forgot to run some script or the other, and one of the last things configuration management might do is to actually start daemons or enable cron jobs if everything went well. This makes for less of a race condition compared to a script that might randomly be creating a database file while other random scripts try to use that database. Another thing to think about is how well the system repairs itself (or reports the error and what to do about it) if the database file is deleted, accidentally or otherwise, and how much CPU that wastes over time. A job run at system startup might be a good compromise between "each script run tries to bootstrap everything" and "only the initial bootstrap does the database creation stuff".

    $ cat Makefile
    sales.db: sales.schema
        sqlite3 sales.db < sales.schema
    $ cat sales.schema
    CREATE TABLE IF NOT EXISTS emp (
      ename TEXT,
      deptid INTEGER
    );
    CREATE TABLE IF NOT EXISTS dept (
      deptid INTEGER,
      dname TEXT
    );
    $ make
    sqlite3 sales.db < sales.schema
    $ sqlite3 sales.db 
    SQLite version 3.44.2 2023-11-24 11:41:44
    Enter ".help" for usage hints.
    sqlite> .schema
    CREATE TABLE emp (
      ename TEXT,
      deptid INTEGER
    );
    CREATE TABLE dept (
      deptid INTEGER,
      dname TEXT
    );
    sqlite> .q

The banal sales database is due to some "I should learn me some more about joins" tutorial, sorry about that. Usually I try for something interesting or useful like Goblin attack tables.

Database management can be additionally complicated by schema modifications, though for me those have been so infrequent that I've muddled through them ad hoc; if your schema changes are more frequent then you'd probably want more automation and less winging it.

A Makefile solution may not be ideal, as that scatters Makefile and schema files across the system, but configuration management could install from a central directory to where the files need to go. Another approach would be to place all the schema in a single directory (or tree) and then a script or multiple script runs could create all the database if need be. Multiple script runs would be slightly more expensive, though would give a more usable exit status if a particular database fails to bootstrap, and configuration management could then more easily not run any subsequent steps that rely on that database existing, such as to start up services. One script to run would be easier to document on a wiki, if you do not have configuration management handling that.

Why so many sqlite databases, one might ask? This depends on your system setup; you might end up with a local user database, local firewall blacklist (and goodlist and ratelimit) tables, and various other such utility databases, especially if you are coordinating logs or user accounts between distinct services. These database in turn might be synced from or with various central services, though I like systems that can operate more or less unimpaired even when most or all of the central services are on fire, hence not relying only on some central service. Contrast this to "everything is dead, lol" when too much is chained to LDAP or Active Directory or CrowdStrike.

Bootstrap Script

One can start small, and use guarded commands to ensure that the database is created, or does not exist. One may need some way to pass flags to sqlite3 on a per-database level, or to use a particular init file, though ideally a site would use the same configuration for all the sqlite3 database files. Reality is hardly ever so ideal; for example, company A and B could merge and have different database setup styles.

    #!/bin/sh
    dbfile=${1:?Usage: strapsq dbfile schema-file}
    scfile=${2:?Usage: strapsq dbfile schema-file}
    TMPDB=`mktemp -t "$dbfile".XXXXXXXXXX` || exit 1
    trap "rm -- $TMPDB 2>/dev/null" EXIT
    test -f "$dbfile" || sqlite3 -- "$TMPDB" < "$scfile" && {
        chmod 640 "$TMPDB"
        mv -- "$TMPDB" "$dbfile"
        test -f "$dbfile" || exit 1
    }

sqlite3 (as of version 3.44.2) does not appear use a rename(2) call, so the database creation may not be atomic, and may create an empty file if the schema fails to parse. Hence the use of mktemp(1). This extra work may help avoid post-bootstrap code from having to do various sanity checks that the database is ready to use. However, a temporary file may run afoul on some operating systems weird security context rules.

On the other hand, this bootstrap script isn't really doing much, and may need a loop somewhere to create multiple databases and those with different options. How the script would restore from backups or from a central source would also be very site and database instance specific. And maybe some or most of the guarded command logic could be moved into configuration management, to say less of triggering subsequent service starts. Meanwhile the "IF NOT EXISTS" in the schema either creates something, or at least does not fail if anything else already has.

More could be done with the schema configuration, which is in SQL, apart from any sqlite3_config(3) options used. For one, the schema should be wrapped in a transaction. During a system bootstrap there should not, in theory, be other connections to contend with. In practice, there could already be other code trying to use the database, so obtaining an exclusive lock may be a good default (unless the process with the exclusive lock wedges, thus creating a denial of service).

    BEGIN EXCLUSIVE TRANSACTION;
    CREATE TABLE IF NOT EXISTS emp (
        ename TEXT,
        deptid INTEGER
    );
    CREATE TABLE IF NOT EXISTS dept (
        deptid INTEGER,
        dname TEXT
    );
    COMMIT;

A transaction better ensures that all or none of the schema is applied, not some of it if there's a bad statement, or the system crashes. We probably still want the "remove the database file if the initial schema setup fails", as this will give a hopefully clearer error rather than trying to use a file without any tables in it. Or maybe not.

    $ cat selectall.pl 
    #!/usr/bin/perl
    use 5.38.0;
    use DBI;
    use Data::Dumper;
    my $dbh = DBI->connect( "dbi:SQLite:dbname=$ARGV[0]", '', '',
        { RaiseError => 1 } );
    my $ret = $dbh->selectall_hashref(q{SELECT * FROM foo}, '', {});
    warn Dumper $ret;
    $ rm nosuchfile
    rm: nosuchfile: No such file or directory
    $ perl selectall.pl nosuchfile
    DBD::SQLite::db selectall_hashref failed: no such table: foo at selectall.pl line 7.
    DBD::SQLite::db selectall_hashref failed: no such table: foo at selectall.pl line 7.
    $ stat -f '%z' nosuchfile
    0
    $ rm nosuchfile
    $ sqlite3 nosuchfile < /dev/null
    $ stat -f '%z' nosuchfile
    stat: nosuchfile: No such file or directory
    $ printf 'SELECT * FROM foo\n' | sqlite3 nosuchfile
    Parse error near line 1: no such table: foo
    $ stat -f '%z' nosuchfile
    0

SQLite interacting with a nonexistent file may create an empty file. This means the bootstrap code shown above is problematic, as that uses PROBE-FILE to only test whether the database exists. The bootstrap must also test whether the file is empty, and if so perform the bootstrap. I had hoped there might have been a good error message "no such file" rather than a random SQL query failure when the database file was not present; good error messages are easier to document—"if no database file, run the database bootstrap script"—while random SQL errors more likely result in the on-call escalating the issue right away. SQL? That's above my pay grade.

    #!/bin/sh
    dbfile=${1:?Usage: strapsq dbfile schema-file [flags ..]}
    scfile=${2:?Usage: strapsq dbfile schema-file [flags ..]}
    test -f "$dbfile" -a -s "$dbfile" && exit 0
    shift 2
    TMPDB=`mktemp -t "$dbfile".XXXXXXXXXX` || exit 1
    trap "rm -- $TMPDB 2>/dev/null" EXIT
    sqlite3 "$@" -- "$TMPDB" < "$scfile" && {
        chmod 640 "$TMPDB"
        mv -- "$TMPDB" "$dbfile"
        test -f "$dbfile" -a -s "$dbfile" || {
            echo >&2 "strapsq: unable to create '$dbfile'"
            exit 1
        }
    }

This version complexifies the test(1) calls to ensure that the regular file is not empty, and allows arbitrary flags to be passed to sqlite, for better or worse. There can be weirdness if the dbfile is a directory, but hopefully that should be a rare thing, and there is an error message in addition to a non-zero exit code when that happens.

    $ strapsq sales.db sales.schema -bail
    $ echo $?
    0

With configuration management, one would copy this script onto systems with SQLite databases, and probably also copy a directory (or tree) of schema files onto those system. Then for each database, strapsq is run with the appropriate schema, and if that goes well then other tasks can be triggered, such as to copy over the scripts that use the database, or to start services or cron jobs. If strapsq fails for a particular database, then these triggers would not happen, and hopefully a warning or ideally error is raised that someone can poke into why things went sideways. The shell really isn't good here, as it would require multiple test(1) calls instead of a single stat(2) and then from that decide what to do.

Yet another approach with configuration management would be to pre-create the database file elsewhere, and to simply copy in that entire file. This could be a restore from backups, or an empty (or mostly empty) database could live somewhere in version control. On initial system setup it is safer to directly clobber the file; if the system has already been running then re-running the initial system setup code might destroy production data. On the other hand, poking at a file with stat(2) calls before doing something with that maybe-not-there file is a race condition, though an attacker probably has better things to do if they already have that sort of access. The main risk would be a race between "code that first sets up the database" and "code that uses the database", which is why it may make sense to only copy the uses-database code onto the system after the database is bootstrapped.

Things become messier once the database is bootstrapped, is being used, and you need to make a schema change on it.