💾 Archived View for text.adventuregameclub.com › tech › 2021-06-21-kindle-to-sqlite3.gmi captured on 2023-06-16 at 16:29:41. Gemini links have been rewritten to link to archived content
I use a little Python script to grab clippings out of my (early model) kindle's `My Clippings.txt` file and import them into an sqlite3 database. The clips and notes are in a consistent format, so it's possible to parse the file with some regular expressions and a little logic. I found a script that did most of the work already and converted it to send the clips to a database.
The sqlite3 database has a table for holding these clips. This is the basic structure of the table:
CREATE TABLE IF NOT EXISTS "clips" (
    `clipID` INTEGER PRIMARY KEY AUTOINCREMENT,
    `bid` INTEGER,
    `type` TEXT,
    `location` INTEGER NOT NULL,
    `text` TEXT,
    `datestring` TEXT
);
For a little added efficiency, the import script will check for the latest date in the table and ignore any clips in `My Clippings.txt` from before that date.
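A minimal sketch of that date check, not the script's actual code. It assumes the `datestring` column holds text like `Monday, June 21, 2021, 10:15 PM`; that format, `DATE_FORMAT`, and the helper names `latest_clip_date` and `is_new` are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime

# Assumed layout of the stored date strings; a real kindle may write
# something different, in which case this format needs adjusting.
DATE_FORMAT = "%A, %B %d, %Y, %I:%M %p"


def latest_clip_date(conn):
    """Return the newest parsed date in the clips table, or None if it is empty."""
    rows = conn.execute("SELECT datestring FROM clips").fetchall()
    dates = [datetime.strptime(row[0], DATE_FORMAT) for row in rows if row[0]]
    return max(dates) if dates else None


def is_new(clip_date, cutoff):
    """Keep a clip only when it is newer than the cutoff (or there is no cutoff)."""
    return cutoff is None or clip_date > cutoff
```

With strings in the assumed format, a plain `MAX(datestring)` in SQL would compare the text alphabetically (weekday name first) rather than chronologically, which is why the sketch parses each value before taking the maximum.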
This works well until my kindle runs out of charge completely and resets the date back to 1970. Then, because the kindle doesn't really show the date anywhere, I don't notice the problem for a while. (You'll see there's a little hack in the code that I use to grab those clips.)
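One way to handle the reset-clock case can be sketched like this. It is not the hack used in my script, just an illustration: `SANITY_FLOOR` and `should_import` are hypothetical names, and the 2005 threshold is an arbitrary assumption.

```python
from datetime import datetime

# Hypothetical threshold: anything dated before this is assumed to come
# from a reset clock rather than a real reading session.
SANITY_FLOOR = datetime(2005, 1, 1)


def should_import(clip_date, cutoff):
    """Decide whether a clip passes the date filter.

    clip_date: datetime parsed from the clipping
    cutoff: newest datetime already in the database, or None
    """
    if clip_date < SANITY_FLOOR:
        # The clock was probably reset, so the date cannot be trusted.
        # Let the clip through instead of silently dropping it.
        return True
    return cutoff is None or clip_date > cutoff
```

Clips let through this way would still need a separate duplicate check, since their dates no longer tell old clips from new ones.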
There's also a table for books. Here's the basic structure:
CREATE TABLE IF NOT EXISTS "books" (
    `id` INTEGER,
    `book` TEXT,
    `Title` TEXT,
    `SubTitle` TEXT,
    `Author` TEXT,
    PRIMARY KEY(`id`)
);
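To show how the two tables might fit together, here is a small sketch that creates both in an in-memory database and joins them. It assumes `clips.bid` holds the `id` of the matching row in `books`; the schema above doesn't declare that relationship, so treat it as a guess. The helper names `open_db` and `clips_with_titles` are made up for this example.

```python
import sqlite3

# The two table definitions from above, unchanged.
SCHEMA = """
CREATE TABLE IF NOT EXISTS "clips" (
    `clipID` INTEGER PRIMARY KEY AUTOINCREMENT,
    `bid` INTEGER,
    `type` TEXT,
    `location` INTEGER NOT NULL,
    `text` TEXT,
    `datestring` TEXT
);
CREATE TABLE IF NOT EXISTS "books" (
    `id` INTEGER,
    `book` TEXT,
    `Title` TEXT,
    `SubTitle` TEXT,
    `Author` TEXT,
    PRIMARY KEY(`id`)
);
"""


def open_db(path=":memory:"):
    """Open a database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def clips_with_titles(conn):
    """List (title, location, text) for every clip, assuming bid -> books.id."""
    return conn.execute(
        "SELECT books.Title, clips.location, clips.text "
        "FROM clips JOIN books ON clips.bid = books.id "
        "ORDER BY books.Title, clips.location"
    ).fetchall()
```

Calling `clips_with_titles(open_db())` on a fresh database returns an empty list; rows appear once matching entries exist in both tables.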
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import re
from datetime import datetime
import sqlite3
from shutil import copyfile
import zc.lockfile

# EDIT THESE TWO FILEPATHS
DATABASE = u"/home/xxxx/Sync/SRS/kindleClips.sqlite3"
MYCLIPPINGS = u"/media/xxxx/Kindle/documents/My Clippings.txt"

BOUNDARY = u"==========\r\n"
TEMPFILE = u"/tmp/kindleClippings.txt"

book_ids = {}


def get_sections(filename):
    with open(filename, 'rb') as f:
        content = f.read().decode('utf-8')
    content = content.replace(u'\ufeff', u'')
    return content.split(BOUNDARY)


def get_clip(section):
    clip = {}

    lines = [l for l in section.split(u'\r\n') if l]
    if len(lines) != 3:
        return

    clip['book'] = lines[0]
    match = re.search(r'(\d+)-\d+', lines[1])  # Matches only highlights
    if not match:
        match = re.search(r'(\d+)', lines[1])
    if not match:
        return
    position = match.group(1)

    # Grab Date String
    dmatch = re.search(r'Added on (.*)