💾 Archived View for jacksonchen666.com › posts › 2022-12-03 › 14-33-00 › index.gmi captured on 2024-12-17 at 09:31:35. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

reducing the database size for matrix synapse

2022-12-03 14:33:00Z (last updated 2023-10-16 08:55:03Z)

let me check on the database size of synapse...

postgres=# SELECT pg_size_pretty( pg_database_size( 'synapse' ) );
 pg_size_pretty
----------------
 13 GB
(1 row)

yeah now is probably a good time to start cleaning up the database again
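to see where the space is going before cleaning up, the largest tables can be listed. a minimal sketch, assuming psql can connect to the database as a user that is allowed to read it (the `largest_tables` helper name is made up for this example):

```shell
# list the 10 largest tables, counting their indexes and TOAST data too.
# "largest_tables" is a made-up helper name; $1 is the database name.
# assumption: psql can connect without extra options on your system.
largest_tables() {
    psql --dbname "$1" <<'SQL'
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
SQL
}

# usage (not run here): largest_tables synapse
```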

rationale for this blog post

i have found a resource for compressing the synapse database, but it is very outdated (it uses endpoints that no longer exist) and relies on curl requests (i'd prefer synadm). so instead, i wrote this blog post, which takes the steps outlined in that resource and adapts them for people who prefer synadm (with links to docs for curl users).

mentioned resource

synadm

some notes before continuing

URL encoding
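for curl users, room IDs contain characters such as ! and : that should be percent-encoded before they are put into an admin API URL. a minimal sketch of an encoder in plain POSIX shell, assuming ASCII-only input (the `urlencode` helper name is made up for this example):

```shell
# percent-encode everything except unreserved characters (A-Z a-z 0-9 . _ ~ -).
# "urlencode" is a made-up helper name. assumption: the input is ASCII only,
# which is what room IDs normally look like.
urlencode() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        tail=${rest#?}          # everything after the first character
        char=${rest%"$tail"}    # the first character itself
        case $char in
            [A-Za-z0-9._~-]) out=$out$char ;;
            # a leading ' makes printf use the character's numeric value
            *) out=$out$(printf '%%%02X' "'$char") ;;
        esac
        rest=$tail
    done
    printf '%s\n' "$out"
}

# usage (not run here):
#   urlencode '!abc:example.org'
```

the encoded ID can then be placed into the path of an admin API request instead of the raw room ID.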

setting up

because this guide will use synadm and synapse's admin API endpoints, you will need an access token for an admin account on your homeserver.

synapse admin API endpoints

for synadm, follow the configuration guide:

synadm configuration guide

for curl users: if you don't know how to get an access token, you can "borrow" an access token in your client or look at the matrix spec for logging in to get a token.

(for curl users) matrix spec for getting login methods

(for curl users) matrix spec for logging in

finding rooms where everyone on your homeserver left it

synadm room list -s joined_local_members -r

the above command lists the rooms, starting with the ones that the fewest users on your homeserver have joined.

list rooms admin API ("order_by" and "dir" parameters are relevant)

the "joined_local_members" number indicates how many users on your homeserver are joined to that room. if it's 0, you can probably safely delete that room without getting complaints about your homeserver leaving rooms "randomly".

for a neat command that outputs just the room id for rooms everyone on your homeserver left (for up to 500 rooms):

synadm -o json room list -s joined_local_members -r -l 500 | jq -r '.rooms[] | select(.joined_local_members == 0) | .room_id'

(requires jq, json command line processor)
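for curl users, roughly the same list can be fetched from the list rooms admin API directly. a minimal sketch, assuming HOMESERVER_URL and ACCESS_TOKEN are set by you (both variable names and the helper name are made up for this example):

```shell
# HOMESERVER_URL and ACCESS_TOKEN are assumed to be set by you, e.g.
#   HOMESERVER_URL=https://matrix.example.org  ACCESS_TOKEN=...
# "list_rooms_fewest_local_first" is a made-up helper name.
list_rooms_fewest_local_first() {
    # dir=b reverses the default order, so rooms with the fewest local
    # members come first; $1 is the number of rooms to fetch (default 100)
    curl --fail --silent --show-error \
        --header "Authorization: Bearer $ACCESS_TOKEN" \
        "$HOMESERVER_URL/_synapse/admin/v1/rooms?order_by=joined_local_members&dir=b&limit=${1:-100}"
}

# usage (not run here), reusing the jq filter from the synadm command:
#   list_rooms_fewest_local_first 500 | jq -r '.rooms[] | select(.joined_local_members == 0) | .room_id'
```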

deleting rooms

now that you found some rooms that you will delete, it is time to delete those rooms from the database.

NOTE: we do not want to block a room. blocking a room prevents joining that room. in case you do need to unblock a room, you can use something like `synadm room block -u '<room_id>'` for synadm. see admin API docs for unblocking rooms if needed.

admin API docs for unblocking rooms

here's the command for deleting a room:

synadm room delete '<ROOM_ID>'

delete room admin API

replace "<ROOM_ID>" with the room ID, and wrap it in single quotes so that your shell does not interpret the ! character (which triggers history expansion in interactive shells like bash). double quotes do not help, since the shell would still expand the ! and change the command. alternatively, you can escape the ! with a backslash.
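when there are many rooms to delete, typing each command gets tedious. a minimal sketch of a loop, assuming synadm is already configured. synadm may ask for confirmation for each room, which is why the room IDs are passed as arguments instead of being piped in (the `delete_rooms` helper name is made up for this example):

```shell
# delete every room ID given as an argument, one synadm call per room.
# "delete_rooms" is a made-up helper name.
# inside a function, "$room_id" in double quotes is fine: history expansion
# of ! only applies to command lines typed into an interactive shell.
delete_rooms() {
    for room_id in "$@"; do
        printf 'deleting %s\n' "$room_id"
        # stop at the first failure so problems do not go unnoticed
        synadm room delete "$room_id" || return 1
    done
}

# usage (not run here):
#   delete_rooms '!first:example.org' '!second:example.org'
```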

i deleted a few rooms, including matrix HQ and the blackarch room, because they are pretty large and no local users were in them.

running state compression

with rust-synapse-compress-state, state is compressed so that it takes up less space.

rust-synapse-compress-state

the repository has an automatic and simple tool to go over states and compress them. the steps for building and running the tool have already been documented, so i won't document them again here.

automatic and simple tool

extra: purging old cached remote media

media is usually impossible to further compress. so instead, this will be about deleting old media.

the command/admin API decides what to delete based on when each piece of media was last accessed.

here's the incomplete synadm command:

synadm media purge

purge remote media admin API
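for curl users, the purge remote media admin API takes a "before_ts" parameter, which is a unix timestamp in milliseconds. a minimal sketch for computing it and sending the request, assuming HOMESERVER_URL and ACCESS_TOKEN are set by you (both variable names and both helper names are made up for this example):

```shell
# print the unix timestamp (in milliseconds) of N days ago.
# $1 = number of days, $2 = optional "now" in seconds (defaults to date +%s)
# "days_ago_ms" is a made-up helper name.
days_ago_ms() {
    now=${2:-$(date +%s)}
    echo $(( ($now - $1 * 86400) * 1000 ))
}

# ask synapse to purge cached remote media last accessed more than $1 days ago.
# HOMESERVER_URL and ACCESS_TOKEN are assumed to be set by you.
purge_remote_media() {
    curl --fail --silent --show-error --request POST \
        --header "Authorization: Bearer $ACCESS_TOKEN" \
        --header "Content-Type: application/json" \
        --data '{}' \
        "$HOMESERVER_URL/_synapse/admin/v1/purge_media_cache?before_ts=$(days_ago_ms "$1")"
}

# usage (not run here): purge_remote_media 30
```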

database reindexing

updates and deletes in the database can leave junk in indexes. reindexing recreates the indexes, so that they don't contain old junk.

postgres routine reindexing docs

postgres >= 12

for postgres >= 12, run the following query (substituting "<database_name>" appropriately):

REINDEX DATABASE CONCURRENTLY <database_name>;

you can run that without shutting down synapse first, as the "CONCURRENTLY" option makes postgres lock the tables as little as possible, allowing normal function of synapse.

postgres 12 documentation for REINDEX
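one thing to watch for with "CONCURRENTLY": if the reindex fails or gets interrupted, postgres can leave invalid indexes behind, and those take up space until they are dropped. a minimal sketch for listing them, assuming psql can connect to the database (the `find_invalid_indexes` helper name is made up for this example):

```shell
# list indexes that postgres has marked as invalid in the given database.
# "find_invalid_indexes" is a made-up helper name; $1 is the database name.
# assumption: psql can connect without extra options on your system.
find_invalid_indexes() {
    psql --dbname "$1" --command \
        'SELECT indexrelid::regclass AS index_name FROM pg_index WHERE NOT indisvalid;'
}

# usage (not run here): find_invalid_indexes synapse
```

an empty result means there is nothing left over to clean up.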

postgres 11

concurrent reindexing is not an option in postgres 11, so **you should stop synapse before running REINDEX on the synapse database**. the SQL command would look like this:

REINDEX DATABASE <database_name>;

postgres 11 documentation for REINDEX

reindexing made the database size 3GB smaller.

database vacuuming

NOTE: a "FULL" vacuum takes an exclusive lock on each table while rewriting it, which can prevent synapse from working, and it requires extra disk space, so **stop synapse and make sure you have enough free space before running a full vacuum**. if that cannot happen, remove "FULL" from the SQL statement to get at least some vacuuming of the database.

NOTE: **ensure you're connected to the correct database.** else, you'll vacuum the wrong one.

VACUUM FULL;

postgres 11 documentation for VACUUM

vacuuming stats

before vacuuming (after reindexing):

synapse=# SELECT pg_database_size( 'synapse' );
 pg_database_size
------------------
      10654737187
(1 row)

(10.6 GB)

anyways, after all this vacuuming mess:

synapse=# SELECT pg_database_size( 'synapse' );
 pg_database_size
------------------
       5637989155
(1 row)

(5.6 GB)

synapse=# SELECT pg_size_pretty( pg_database_size( 'synapse' ) );
 pg_size_pretty
----------------
 5377 MB
(1 row)

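(the labels don't quite line up because of units: pg_size_pretty uses binary units, so its "MB" is 1024 * 1024 bytes, while the GB figures in parentheses are decimal. 5377 MB and 5.6 GB describe the same size.)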
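the byte counts from pg_database_size can be converted by hand to see where the different numbers come from. a minimal sketch using awk (the `bytes_to_units` helper name is made up for this example):

```shell
# print a byte count in decimal GB, binary GiB and binary MiB.
# "bytes_to_units" is a made-up helper name; $1 is the size in bytes.
bytes_to_units() {
    awk -v b="$1" 'BEGIN {
        printf "%.2f GB (decimal), %.2f GiB (binary), %.0f MiB\n",
            b / 1000000000, b / (1024 * 1024 * 1024), b / (1024 * 1024)
    }'
}

# usage:
#   bytes_to_units 5637989155
```

for the size after vacuuming, this prints "5.64 GB (decimal), 5.25 GiB (binary), 5377 MiB", which matches the 5377 MB reported by pg_size_pretty.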

anyways, hope that helped, even in the future. if something doesn't work and you followed the guide as intended, please throw an email at me.

public inbox (comments and discussions)

public inbox archives

(mailing list etiquette for public inbox)