2017-11-18 Mastodon Backup

OK, it seems that *Weapon vs. AC* is going to close down and that *Dice Camp* is the “new” RPG instance. I wanted a backup of all my toots. Mark Damon Hughes has written Mastotool, which does that using screen scraping. MastoUserScrape by BobC does the same thing but doesn’t download all your media.

One thing to note, though, is that scraping your user page (`https://octodon.social/@kensanata` and the like) means that you’re *not seeing any replies!*

I wanted to export all my toots, including replies, so I tried working from the Atom feed instead. I got pretty far, I think, but I noticed that the chain of Atom pages simply ends after a while:

~/src/mastotool $ ./Mastotool.py --backup https://octodon.social/@kensanata --save kensanata@octodonsocial.atom
Downloading https://octodon.social/@kensanata...
Downloading https://octodon.social/users/kensanata.atom...
Downloading https://octodon.social/users/kensanata.atom?max_id=242067...
Downloading https://octodon.social/users/kensanata.atom?max_id=238093...
Downloading https://octodon.social/users/kensanata.atom?max_id=235430...
Downloading https://octodon.social/users/kensanata.atom?max_id=230812...
Downloading https://octodon.social/users/kensanata.atom?max_id=228327...
Done

Why doesn’t the last one have a *next* link? I raised an issue on GitHub and apparently that’s a bug.
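
To illustrate what “the chain of Atom pages” means: each page of the feed is expected to carry a `rel="next"` link to the following page, and a crawler stops as soon as a page lacks one. The following is a minimal sketch of that idea, not the code Mastotool actually uses. The helper names, the `example.org` URLs and the two sample pages are all made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def next_link(atom_xml):
    """Return the href of the feed's rel="next" link, or None if it has none."""
    feed = ET.fromstring(atom_xml)
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def crawl(first_url, fetch):
    """Follow "next" links starting at first_url.

    fetch(url) must return the XML text of that page. Returns the list of
    URLs visited, in order; the chain ends at the first page without a link.
    """
    visited = []
    url = first_url
    while url and url not in visited:  # also guards against link loops
        visited.append(url)
        url = next_link(fetch(url))
    return visited


# Two hypothetical pages standing in for a real feed.
PAGES = {
    "https://example.org/users/alice.atom":
        '<feed xmlns="http://www.w3.org/2005/Atom">'
        '<link rel="next" href="https://example.org/users/alice.atom?max_id=2"/>'
        '</feed>',
    "https://example.org/users/alice.atom?max_id=2":
        '<feed xmlns="http://www.w3.org/2005/Atom"></feed>',
}

for url in crawl("https://example.org/users/alice.atom", PAGES.__getitem__):
    print("Downloading %s..." % url)
```

A page that is missing its *next* link looks exactly like the last page to such a crawler, which is why the backup stops early without any error.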

I wrote a new tool, mastodon-backup:

$ ./mastodon-backup.py kensanata@octodon.social
Get user info
Get statuses
We have 1276 statuses

This was the correct number of statuses a few minutes ago! As always, OAuth took a few extra cycles to get right, but I think the Mastodon.py library does everything I need.
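
For readers curious how “Get statuses” can work with Mastodon.py: the API returns statuses one page at a time, and the library's `fetch_next` returns the following page, or `None` when there is none left. Here is a rough sketch of that loop. It is not taken from mastodon-backup itself; the function name `fetch_all_statuses`, the page size and the `user.secret` file name in the usage comment are assumptions.

```python
def fetch_all_statuses(api, account_id, page_size=40):
    """Collect every status of an account by walking the paginated API.

    `api` is expected to be a logged-in mastodon.Mastodon instance.
    """
    statuses = []
    page = api.account_statuses(account_id, limit=page_size)
    while page:
        statuses.extend(page)
        page = api.fetch_next(page)  # None once there are no more pages
    return statuses


# Usage, assuming an access token was saved to the (hypothetical) file
# user.secret during an earlier OAuth login:
#
#   from mastodon import Mastodon
#   api = Mastodon(access_token="user.secret",
#                  api_base_url="https://octodon.social")
#   me = api.account_verify_credentials()
#   statuses = fetch_all_statuses(api, me["id"])
#   print("We have %d statuses" % len(statuses))
```

Mastodon.py also offers `fetch_remaining`, which does the same walk in one call; the explicit loop just makes it easier to add a progress indicator.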

Current state of mastodon-backup: it will log in and download all your statuses into a JSON file. For the format, see “Toot dicts” in the *Mastodon.py* documentation, which says:

Unless otherwise specified, all data is returned as python dictionaries, matching the JSON format used by the API. Dates returned by the API are in ISO 8601 format and are parsed into python datetime objects.

This is great. Documentation of the JSON format used by the API is available from the Mastodon documentation.
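
One practical consequence of dates being parsed into `datetime` objects: the standard `json` module cannot write them out by itself, so saving toot dicts to a file needs a small conversion hook. The sketch below shows one way to do it, assuming the dates should go back to ISO 8601 strings; the helper names and the sample toot are invented, and mastodon-backup may well handle this differently.

```python
import json
from datetime import datetime, timezone


def encode_extra(value):
    """json.dumps hook: store datetime objects as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError("Cannot serialize %r" % type(value))


def revive_dates(obj):
    """json.loads hook: turn "created_at" strings back into datetime objects."""
    created = obj.get("created_at")
    if isinstance(created, str):
        obj["created_at"] = datetime.fromisoformat(created)
    return obj


# A made-up, heavily trimmed toot dict.
toot = {
    "id": 1,
    "created_at": datetime(2017, 11, 18, 12, 0, tzinfo=timezone.utc),
    "content": "<p>Hello</p>",
}

text = json.dumps([toot], indent=2, default=encode_extra)
restored = json.loads(text, object_hook=revive_dates)
```

After the round trip, `restored` holds the same dictionaries as before, with `created_at` as a `datetime` again.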

My goal is to write a transformer that will just turn the JSON file into an HTML file. Alternatively, if you already have such a tool, please contribute? 😄
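
Such a transformer can be quite small, because the `content` field of a status is already HTML. What follows is a bare-bones sketch under that assumption, using only the `url`, `created_at` and `content` fields of each status. The function name and page layout are made up and are not what mastodon-backup ended up with.

```python
import html


def statuses_to_html(statuses, title="Mastodon backup"):
    """Render toot dicts as one HTML page, in the order given."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        "<title>%s</title></head><body>" % html.escape(title),
    ]
    for status in statuses:
        link = html.escape(status.get("url") or "")
        when = html.escape(str(status["created_at"]))
        parts.append("<article>")
        parts.append('<p><a href="%s">%s</a></p>' % (link, when))
        # "content" is already HTML as delivered by the server; it is
        # inserted unescaped, so this trusts the instance's sanitizing.
        parts.append(status["content"])
        parts.append("</article>")
    parts.append("</body></html>")
    return "\n".join(parts)


# A made-up status, trimmed to the three fields used above.
page = statuses_to_html([{
    "url": "https://example.org/@alice/1",
    "created_at": "2017-11-18T12:00:00+00:00",
    "content": "<p>Hello</p>",
}])
```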

Some time later: OK, got text and HTML export. Mission accomplished!

Some time later: OK, got media backup, but only for the toots I authored myself, not for boosts. There are still some issues with the progress bar, and I want the HTML export to use images from the media backup instead of hitting the instance with hundreds of image source links.
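
The two ideas here, skipping boosts and pointing the HTML at local copies, can be sketched as a mapping from remote media URLs to local file names. This is only an illustration: it assumes boosts are recognizable by a non-empty `reblog` field and that attachments sit in `media_attachments` with a `url` each, and the `media` directory and helper names are invented.

```python
import posixpath
from urllib.parse import urlparse


def media_plan(statuses, media_dir="media"):
    """Map each attachment URL of self-authored toots to a local path."""
    plan = {}
    for status in statuses:
        if status.get("reblog"):  # a boost: the media is somebody else's
            continue
        for attachment in status.get("media_attachments", []):
            url = attachment.get("url")
            if url:
                # Reuse the path of the URL so that names cannot clash.
                local = urlparse(url).path.lstrip("/")
                plan[url] = posixpath.join(media_dir, local)
    return plan


def image_src(url, plan):
    """Use the local copy in the HTML export if there is one."""
    return plan.get(url, url)
```

The actual download would then loop over the plan and save each URL to its path, for example with `urllib.request.urlretrieve`, skipping files that already exist.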

Some time later: I think I’m done?

​#Mastodon

Comments

On Windows, I used Cygwin to install `python3` and `python3-pip` to get started, but I got an error:

...
ModuleNotFoundError: No module named 'pip._vendor.requests'

So I switched and used `easy_install`.

easy_install-3.6 Mastodon.py
easy_install-3.6 html2text

– Alex 2017-11-20 10:29 UTC