2024-10-27 Upgrading GoToSocial from 0.16.0 to 0.17.1

Here's something to watch out for if you're like me: disable all the infrastructure that watches over your processes. In my case, the problem was Monit. It checks the website every five minutes, and if it fails to connect three times in a row, it restarts the server, breaking the migration. 😭

# prevent monit from interrupting the migration with a restart!
monit unmonitor gotosocial
systemctl stop gotosocial
# prevent it from being started again at boot
systemctl disable gotosocial
# backup!
mkdir backup
cp sqlite.db backup/
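SQLite in WAL mode can keep committed transactions in `sqlite.db-wal` until they are checkpointed into the main file, so a copy of `sqlite.db` alone may miss them if the service did not shut down cleanly. Here's a minimal sketch of a backup step that also copies the `-wal` and `-shm` sidecar files whenever they exist. The function name `backup_sqlite` is made up for this example, and it assumes GoToSocial is already stopped:

```shell
# backup_sqlite DB DEST: copy an SQLite database file plus its
# write-ahead log (-wal) and shared-memory (-shm) sidecars, if present.
# Hypothetical helper; only run it while gotosocial is stopped.
backup_sqlite() {
  db=$1
  dest=$2
  mkdir -p "$dest" || return 1
  # -p keeps mode and timestamps, useful when comparing files later
  cp -p "$db" "$dest/" || return 1
  for suffix in -wal -shm; do
    if [ -e "$db$suffix" ]; then
      cp -p "$db$suffix" "$dest/" || return 1
    fi
  done
}
```

With the paths from above, that would be `backup_sqlite sqlite.db backup`.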

Now you're ready to extract the new version over the old one, compare your config file with the example provided, and start it again.

systemctl enable gotosocial
systemctl start gotosocial
journalctl --unit gotosocial --follow

Don't be like me and start Monit again right away: my Monit config checks the URL every five minutes and restarts GoToSocial if the site is not up, which is a big problem if the migration takes more than a handful of minutes.
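One way around that is to hand the process back to Monit only once the site answers again. A minimal sketch: `wait_until`, a made-up helper, retries a command a fixed number of times with a pause in between and succeeds as soon as the command does:

```shell
# wait_until TRIES DELAY CMD...: run CMD up to TRIES times, sleeping
# DELAY seconds between attempts; return 0 as soon as CMD succeeds,
# 1 if every attempt failed.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    # no pointless sleep after the final attempt
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, with `https://social.example.org/` standing in for your instance URL, `wait_until 360 10 curl -fsS -o /dev/null https://social.example.org/ && monit monitor gotosocial` waits up to about an hour before re-enabling monitoring. If the migration takes longer than that, Monit simply stays off until you turn it back on by hand.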

I ended up in a borked migration restart loop, stopped it all again, overwrote the borked database file with the backup, and redid the migration.

#Administration #GoToSocial

To avoid the Wazero compilation step on future starts, @dumpsterqueer@superseriousbusiness.org pointed me at this:

You can instruct GoToSocial on where to store the Wazero artifacts by setting the environment variable `GTS_WAZERO_COMPILATION_CACHE` to a directory, which will be used by GtS to store two smallish artifacts of ~50MiB or so each (~100MiB total). โ€“ Configuration Overview

I'll try that.
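With systemd, one way to set that variable is a drop-in file for the service. This sketch assumes the unit is called `gotosocial.service`, as in the `systemctl` commands above; the drop-in name `wazero-cache.conf` and the helper `write_wazero_dropin` are made up:

```shell
# write_wazero_dropin UNIT_DIR CACHE_DIR: write a systemd drop-in that
# sets GTS_WAZERO_COMPILATION_CACHE for gotosocial.service.
# The file name wazero-cache.conf is arbitrary (hypothetical).
write_wazero_dropin() {
  unit_dir=$1
  cache_dir=$2
  mkdir -p "$unit_dir/gotosocial.service.d" || return 1
  cat > "$unit_dir/gotosocial.service.d/wazero-cache.conf" <<EOF
[Service]
Environment=GTS_WAZERO_COMPILATION_CACHE=$cache_dir
EOF
}
```

Run it as root with `/etc/systemd/system` as the first argument, create the cache directory and make it writable for the `gotosocial` user, then run `systemctl daemon-reload` and restart the service. If your unit file restricts writable paths, the cache directory needs to be among them.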

It looks like a side effect of GoToSocial implementing the direct messages API is that the Toot! app I'm using now shows all my former direct messages in its special user interface (those bubbles on the right-hand side). I have to open every single one of them to dismiss it. 🤨

The `.recover` command didn't work when I tried it:

# sqlite3 sqlite.db ".recover" | sqlite3 new.db
sql error: SQL logic error (1)

So then I tried the following:

monit unmonitor gotosocial
systemctl stop gotosocial
sqlite3 sqlite.db ".dump" > db.sql
mkdir backup
mv sqlite.db backup/
sudo -u gotosocial sqlite3 sqlite.db < db.sql
gzip backup/sqlite.db
gzip db.sql

Some errors that I saw:

A few lines about accounts with no URI, even though `accounts.uri` is a NOT NULL column.

Many, many such lines:

no such table: sqlite_stat4

Then this one:

NOT NULL constraint failed: conversations.thread_id (19)

I started to feel bad about the whole thing.

I aborted the operation before the gzip command had finished, and restored the old database file.

mv backup/sqlite.db .
systemctl start gotosocial

As it turns out, my GoToSocial instance now seems to be unreachable. The service starts and `htop` shows processes churning. The log shows I/O timeouts and "No Content: wrote 0B" messages scrolling by. Oof! 😓

Looking at the timestamps again, it seems that the recovery command left a `sqlite.db-shm` and a `sqlite.db-wal` file in place.

-rw-r--r--     1 gotosocial gotosocial 10445488128 29. Okt 22:47 sqlite.db
-rw-r--r--     1 gotosocial gotosocial       32768 29. Okt 23:19 sqlite.db-shm
-rw-r--r--     1 gotosocial gotosocial      341992 29. Okt 23:19 sqlite.db-wal

That can't be right. So I'm going to stop `gotosocial`, move these two files away, and start it again.
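A word of caution: SQLite treats the `-wal` file as part of the database, so moving it away drops any committed changes that haven't been checkpointed into `sqlite.db` yet. Keeping the files instead of deleting them leaves a way back. A minimal sketch, with the helper name `stash_sidecars` made up and gotosocial assumed to be stopped, moves them into a new timestamped directory:

```shell
# stash_sidecars DB DEST: move DB-wal and DB-shm (if present) into a
# new timestamped directory under DEST, then print that directory.
# Hypothetical helper; never run it while gotosocial is running.
stash_sidecars() {
  db=$1
  dir=$2/sidecars-$(date +%Y%m%d-%H%M%S)
  mkdir -p "$dir" || return 1
  for suffix in -wal -shm; do
    if [ -e "$db$suffix" ]; then
      mv "$db$suffix" "$dir/" || return 1
    fi
  done
  echo "$dir"
}
```

For example, `stash_sidecars sqlite.db backup` leaves `sqlite.db` where it is and prints the directory the sidecars went to.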

Sadly, no luck.

Perhaps there is a database recovery going on? I can't tell. This time around I see the typical startup messages, something about "recovered queued tasks", about 12 requests that look like regular requests, and then nothing.

I'll let it run for a bit.

I restarted it again. It seems to work?

error dereferencing remote status … : enrichStatus: failed to dereference status author … : enrichAccount: error putting in database: sqlite3: database disk image is malformed (code=11 extended=11)
0xc0091c61e0: error processing: CreateAnnounce: error dereferencing announce: EnrichAnnounce: error fetching boost target … : enrichStatus: failed to dereference status author … : enrichAccount: error putting in database: sqlite3: database disk image is malformed (code=11 extended=11)

There's something about these authors that's not working.

The code in `account.go`:

// This is new, put it in the database.
err := d.state.DB.PutAccount(ctx, latestAcc)
if err != nil {
	return nil, nil, gtserror.Newf("error putting in database: %w", err)
}

I feel that this is where things are going wrong. Something about the accounts table.

I'm going to make an offline copy of the `sqlite.db` file. Sadly, `.recover` doesn't work on my laptop either.

$ sqlite3 sqlite.db ".recover" > data.sql
sql error: SQL logic error (1)

Not looking good! I'm going to try the dump.

sqlite3 sqlite.db ".dump" > data.sql
sqlite3 recovery.db < data.sql 2>&1 |tee recovery.log

Let's look at the log file and list the errors!

+-------------+---------------+-----------------------------------------------------+
| Occurrences | Type          | Error                                               |
+-------------+---------------+-----------------------------------------------------+
|         454 | Runtime error | UNIQUE constraint failed: media_attachments.id      |
|          69 | Runtime error | NOT NULL constraint failed: accounts.uri            |
|        2111 | Parse error   | no such table: sqlite_stat4                         |
|           1 | Runtime error | NOT NULL constraint failed: conversations.thread_id |
+-------------+---------------+-----------------------------------------------------+
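Counts like these can be produced with `sed`, `sort` and `uniq -c`. This sketch assumes each error in `recovery.log` starts a line with `Parse error near line N:` or `Runtime error near line N:`, which is what recent `sqlite3` shells print (older versions write `Error: near line N:` instead, and the pattern would need adjusting). The helper name `summarize_errors` is made up:

```shell
# summarize_errors LOG: count sqlite3 shell errors by type and message,
# most frequent first. Drops the trailing extended error code such as
# " (19)" so identical messages are grouped; SQL context lines that the
# shell prints after a parse error are ignored.
summarize_errors() {
  sed -nE 's/ \([0-9]+\)$//; s/^(Parse error|Runtime error) near line [0-9]+: /\1: /p' "$1" |
    sort | uniq -c | sort -rn
}
```

Running `summarize_errors recovery.log` prints one line per distinct error, prefixed with its count.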

I ended up filing an issue.

And then, later that day, I used `.dump` again. This time around, there was a `COMMIT` at the end of the dump, so no change was required.

sqlite3 sqlite.db ".dump" > data.sql
tail data.sql  # verify that there is a COMMIT at the end
sqlite3 recovery.db < data.sql 2>&1 |tee recovery.log
rsync --archive --itemize-changes recovery.db "sibirocobombus.root:/home/gotosocial/sqlite.db"
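The `tail` check can be made explicit before the import. When `.dump` runs into errors, the `sqlite3` shell ends the dump with `ROLLBACK; -- due to errors` instead of `COMMIT;`, and replaying such a file rolls back the whole transaction it opened at the top. A minimal sketch; the helper name `dump_is_complete` is made up:

```shell
# dump_is_complete FILE: succeed only if the last line of an sqlite3
# ".dump" output is "COMMIT;" (a dump that hit errors ends with
# "ROLLBACK; -- due to errors" instead).
dump_is_complete() {
  [ "$(tail -n 1 "$1")" = "COMMIT;" ]
}
```

For example: `dump_is_complete data.sql && sqlite3 recovery.db < data.sql 2>&1 | tee recovery.log`. Before copying `recovery.db` to the server, `sqlite3 recovery.db "PRAGMA integrity_check;"` should print `ok`.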

The recovery log showed all the errors mentioned above, and I used the new database anyway.