Here's something to watch out for, if you're like me: disable all the infrastructure that watches over your processes. In my case, the problem was Monit. It checks the website every five minutes, and if it fails to connect three times in a row, it restarts the server, breaking the migration. 😭
    systemctl stop gotosocial
    # prevent systemctl from restarting it
    systemctl disable gotosocial
    # prevent monit from interrupting the migration with a restart!
    monit unmonitor gotosocial
    # backup!
    mkdir backup
    cp sqlite.db backup/
Now you're ready to extract the new version over the old one, compare your config file with the example provided, and start it again.
    systemctl enable gotosocial
    systemctl start gotosocial
    journalctl --unit gotosocial --follow
Don't be like me and start Monit again at this point: my Monit config checks the URL every five minutes and restarts GoToSocial if the site is not up, which is a big problem if the migration takes more than a handful of minutes.
I ended up with a borked migration restart loop, so I stopped it all again, overwrote the borked database file with the backup, and redid the migration.
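One way to avoid re-enabling Monit too early is to wait until the site answers again. This is a minimal sketch, not what I actually ran: the function name `wait_until`, the retry counts, and the URL in the usage note are made up for illustration.

```shell
#!/bin/sh
# wait_until: retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (hypothetical helper, not part of GoToSocial or Monit)
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Run the remaining arguments as the check command.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Something like `wait_until 600 60 curl --fail --silent https://example.org/ && monit monitor gotosocial` would then check once a minute for up to ten hours and only hand control back to Monit once the site responds.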
#Administration #GoToSocial
In order to avoid future compilation, @dumpsterqueer@superseriousbusiness.org pointed me at this:
You can instruct GoToSocial on where to store the Wazero artifacts by setting the environment variable `GTS_WAZERO_COMPILATION_CACHE` to a directory, which will be used by GtS to store two smallish artifacts of ~50MiB or so each (~100MiB total). -- Configuration Overview
I'll try that.
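Since my GoToSocial runs as a systemd service, one way to set the variable would be a drop-in file. This is a sketch under assumptions: the helper name `write_cache_dropin`, the file name `wazero-cache.conf`, and the paths in the usage note are my own choices, not something the GoToSocial documentation prescribes.

```shell
#!/bin/sh
# write_cache_dropin: write a systemd drop-in that sets
# GTS_WAZERO_COMPILATION_CACHE for the service.
# $1: drop-in directory, $2: cache directory
# (hypothetical helper; file name is an arbitrary choice)
write_cache_dropin() {
    mkdir -p "$1" &&
        printf '[Service]\nEnvironment=GTS_WAZERO_COMPILATION_CACHE=%s\n' "$2" \
            > "$1/wazero-cache.conf"
}
```

As root, `write_cache_dropin /etc/systemd/system/gotosocial.service.d /home/gotosocial/wazero-cache` followed by `systemctl daemon-reload` and a restart of the service should do it, assuming the cache directory exists and is writable by the `gotosocial` user.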
It looks like a side-effect of GoToSocial implementing the direct messages API is that the Toot! app I'm using is showing me all my former direct messages using its special user interface (those bubbles on the right-hand side). I have to open every single one of them to dismiss it. 🤨
The `.recover` command didn't work when I tried it:
    # sqlite3 sqlite.db ".recover" | sqlite3 new.db
    sql error: SQL logic error (1)
So then I tried the following:
    monit unmonitor gotosocial
    systemctl stop gotosocial
    sqlite3 sqlite.db ".dump" > db.sql
    mkdir backup
    mv sqlite.db backup/
    sudo -u gotosocial sqlite3 sqlite.db < db.sql
    gzip backup/sqlite.db
    gzip db.sql
Some errors that I saw:
A few lines about accounts with no account_uri even though that was a NOT NULL column.
Many, many such lines:
no such table: sqlite_stat4
Then this one:
NOT NULL constraint failed: conversations.thread_id (19)
I started to feel bad about the whole thing.
I aborted the operation. The gzip command hadn't finished yet, so I restored the old database file.
    mv backup/sqlite.db .
    systemctl start gotosocial
As it turns out, now my GoToSocial instance seems to be unreachable. The service starts, `htop` shows processes churning. The log shows i/o timeouts and "No Content: wrote 0B" log messages scrolling by. Oof!
Looking at the timestamps again, it seems that the recovery command left a `sqlite.db-shm` and a `sqlite.db-wal` file in place.
    -rw-r--r-- 1 gotosocial gotosocial 10445488128 29. Okt 22:47 sqlite.db
    -rw-r--r-- 1 gotosocial gotosocial       32768 29. Okt 23:19 sqlite.db-shm
    -rw-r--r-- 1 gotosocial gotosocial      341992 29. Okt 23:19 sqlite.db-wal
That can't be right. So I'm going to stop `gotosocial`, move these two files away, and start it again.
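Moving the two files away could look like the sketch below. The helper name `move_sidecars` and the target directory are made up. It moves rather than deletes, because a `-wal` file can hold transactions that never made it into the main database file, and the service should be stopped while doing this.

```shell
#!/bin/sh
# move_sidecars: move the -shm and -wal files next to a SQLite database
# into another directory, leaving the database file itself alone.
# $1: path to the database file, $2: directory to move the files to
# (hypothetical helper)
move_sidecars() {
    db=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for suffix in -shm -wal; do
        if [ -e "$db$suffix" ]; then
            mv "$db$suffix" "$dest/" || return 1
        fi
    done
}
```

With the service stopped, `move_sidecars sqlite.db backup/sidecars` keeps both files around in case they turn out to be needed after all.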
Sadly, no luck.
Perhaps there is a database recovery going on? I can't tell. This time around I see the typical startup messages, something about "recovered queued tasks", about 12 requests that look like regular requests, and then nothing.
I'll let it run for a bit.
I restarted it again. It seems to work?
error dereferencing remote status … : enrichStatus: failed to dereference status author … : enrichAccount: error putting in database: sqlite3: database disk image is malformed (code=11 extended=11)
0xc0091c61e0: error processing: CreateAnnounce: error dereferencing announce: EnrichAnnounce: error fetching boost target … : enrichStatus: failed to dereference status author … : enrichAccount: error putting in database: sqlite3: database disk image is malformed (code=11 extended=11)
There's something about these authors that's not working.
The code in `account.go`:
    // This is new, put it in the database.
    err := d.state.DB.PutAccount(ctx, latestAcc)
    if err != nil {
        return nil, nil, gtserror.Newf("error putting in database: %w", err)
    }
I feel that this is where things are going wrong. Something about the accounts table.
I'm going to make an offline copy of the `sqlite.db` file. Sadly, `.recover` doesn't work on my laptop either.
    $ sqlite3 sqlite.db ".recover" > data.sql
    sql error: SQL logic error (1)
Not looking good! I'm going to try the dump.
    sqlite3 sqlite.db ".dump" > data.sql
    sqlite3 recovery.db < data.sql 2>&1 | tee recovery.log
Let's look at the log file and list the errors!
    +-------------+---------------+--------------------------------+
    | Occurrences | Type          | Error                          |
    +-------------+---------------+--------------------------------+
    | 454         | Runtime error | UNIQUE constraint failed:      |
    |             |               | media_attachments.id           |
    | 69          | Runtime error | NOT NULL constraint failed:    |
    |             |               | accounts.uri                   |
    | 2111        | Parse error   | no such table: sqlite_stat4    |
    | 1           | Runtime error | NOT NULL constraint failed:    |
    |             |               | conversations.thread_id        |
    +-------------+---------------+--------------------------------+
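A summary like this can be produced with a few standard tools. The sketch below is not the exact command I used. It assumes the log lines look like `Runtime error near line 12: … (19)` and `Parse error near line 5: …`, which is what my `sqlite3` prints, and the function name `summarize_errors` is made up.

```shell
#!/bin/sh
# summarize_errors: count distinct sqlite3 error messages in a log file.
# $1: log file, e.g. the recovery.log written via tee
# Assumes lines of the form "<Type> error near line N: <message> (<code>)".
# (hypothetical helper)
summarize_errors() {
    grep -E '^(Runtime|Parse) error' "$1" |
        # Drop the line number and the trailing error code so that
        # identical errors collapse into one line.
        sed -E 's/ near line [0-9]+//; s/ \([0-9]+\)$//' |
        sort | uniq -c | sort -rn
}
```

Running `summarize_errors recovery.log` prints one line per distinct error, most frequent first, with the count in front.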
I ended up filing an issue.
And then, later that day, I used `.dump`. This time around, there was a `COMMIT` at the end of the dump, so no change was required.
    sqlite3 sqlite.db ".dump" > data.sql
    tail data.sql # verify that there is a COMMIT at the end
    sqlite3 recovery.db < data.sql 2>&1 | tee recovery.log
    rsync --archive --itemize-changes recovery.db "sibirocobombus.root:/home/gotosocial/sqlite.db"
The recovery log showed all the errors mentioned above, and I used the new database anyway.
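The `tail` check can be turned into a small test so it doesn't depend on me reading the output correctly. This is a sketch with a made-up function name, `dump_ends_with_commit`. It relies on `.dump` ending with `COMMIT;` when all went well, and with a `ROLLBACK;` line when it ran into errors.

```shell
#!/bin/sh
# dump_ends_with_commit: succeed if the last non-empty line of an SQL dump
# is exactly "COMMIT;".
# $1: dump file
# (hypothetical helper)
dump_ends_with_commit() {
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    [ "$last" = "COMMIT;" ]
}
```

Used as `dump_ends_with_commit data.sql || echo "dump is incomplete"`, it flags a dump that should not be loaded as-is.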
To illustrate: I just saw in my log that `alter table statuses drop column visibility` took 31 min.
This time it took from 8:30 until 15:40 for the migration. More than seven hours!
Everything is slow. Unbearably slow. Semaphore claims the Internet is down and shows me cached posts. `toot tui` gives me an exception. But perhaps, slowly, things improve. I hope that this is the 7h backlog that needs to go through.
2024-10-27-upgrade-gotosocial-1.jpg
Perhaps the problem is the age of my SQLite?
    alex@sibirocobombus ~> sqlite3 -version
    3.40.1 2022-12-28 14:03:47 df5c253c0b3dd24916e4ec7cf77d3db5294cc9fd45ae7b9c5e82ad8197f3alt1
2024-10-27-upgrade-gotosocial-2.jpg
2024-10-27-upgrade-gotosocial-3.jpg
@dumpsterqueer@superseriousbusiness.org suggested running a manual ANALYZE and so I did:
    root@sibirocobombus /h/gotosocial# sudo -u gotosocial sqlite3 sqlite.db
    SQLite version 3.40.1 2022-12-28 14:03:47
    Enter ".help" for usage hints.
    sqlite> PRAGMA analysis_limit=0; ANALYZE;
    0
    sqlite> .quit
Surprisingly, this ran in less than five minutes! The ANALYZE that was part of migration ran for 1h and 50 min.
At the moment I don't see much difference, but who knows.
The first fix was to use the "nowasm" build offered by GoToSocial. It worked! It's a good solution because I have `ffmpeg` and `sqlite3` libraries installed.
One avenue the maintainers wanted to explore was a CPU feature that wasn't optional: SSE4.1. To check, look at `/proc/cpuinfo`. Does it expose `sse4_1`? Mine did not.
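The check can be scripted, too. This is a small sketch for Linux on x86, where `/proc/cpuinfo` has a `flags` line for each CPU. The function name `has_cpu_flag` is made up, and the optional second argument only exists so the function can be pointed at a saved copy of the file.

```shell
#!/bin/sh
# has_cpu_flag: succeed if the given CPU flag is listed in cpuinfo.
# $1: flag name, e.g. sse4_1
# $2: cpuinfo file (optional, defaults to /proc/cpuinfo)
# (hypothetical helper)
has_cpu_flag() {
    # Only look at the "flags" lines and match the flag as a whole word,
    # so that sse4_1 does not match sse4_2 or similar.
    grep -E '^flags[[:space:]]*:' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}
```

On the server, `has_cpu_flag sse4_1 && echo yes || echo no` answers the question directly.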
When I contacted my hosting provider support I got the help required, however. It was a setting I had to switch on and then power cycle the virtual machine. The support person even hopped onto the issue tracker to tell others about it. And with that, the problem was fixed! `/proc/cpuinfo` shows the `sse4_1` flag and I'm back on the regular GoToSocial build. 🥳