2013-07-26 Extracting Starred URLs from Google Reader Takeout Data

Google Takeout

Here’s how I did it. First, take a look at the file `starred.json`.

(setq starred-items
      (with-current-buffer "starred.json (Google Reader-takeout.zip)"
        (goto-char (point-min))
        (json-read)))
(mapcar (lambda (item) (car item)) starred-items)
⇒ (items direction updated author title id)

I’m interested in `items`, which happens to be an array. Let’s see what each item contains.

(mapcar (lambda (item) (car item))
        (aref (cdr (assoc-string "items" starred-items)) 0))
⇒ (origin annotations comments author content replies alternate updated published title categories id timestampUsec crawlTimeMsec)
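
To see why `car` yields the keys and why `aref` is needed, here is a minimal sketch on a made-up miniature of the file. The name `my-sample-starred` and the sample data are my own assumptions for illustration; the real `starred.json` has many more fields. It also assumes the default settings of `json.el`, where objects are read as alists with symbol keys and arrays as vectors.

```elisp
(require 'json)

;; Hypothetical miniature of starred.json, not real takeout data.
(defvar my-sample-starred
  (json-read-from-string
   "{\"id\": \"starred\",
     \"items\": [{\"title\": \"Example post\",
                  \"alternate\": [{\"href\": \"http://example.org/post\"}]}]}"))

;; JSON objects become alists, so the `car' of each pair is a key.
(mapcar #'car my-sample-starred)

;; JSON arrays become vectors, hence `aref' instead of `nth'.
(vectorp (cdr (assq 'items my-sample-starred)))

;; The keys of the first item, as in the snippet above.
(mapcar #'car (aref (cdr (assq 'items my-sample-starred)) 0))
```

The order of the keys may differ between Emacs versions, so it is safer to look keys up by name than to rely on position.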

As it happens, the URL I’m interested in is part of `alternate`. Let’s make sure there’s always exactly one entry:

(mapc (lambda (item)
        (when (not (= 1 (length (cdr (assoc-string "alternate" item)))))
          (error "%S" item)))
      (cdr (assoc-string "items" starred-items)))

Phew! Let’s produce a first list of URLs and the respective titles:

(mapc (lambda (item)
        (let ((href (cdr (assoc-string "href" (aref (cdr (assoc-string "alternate" item)) 0))))
              (title (cdr (assoc-string "title" item))))
          (insert (format "* [%s %s]\n" href title))))
      (cdr (assoc-string "items" starred-items)))
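
The same traversal can be tried without touching a buffer. This is only a sketch: `my-starred-lines` is a hypothetical helper of mine that returns the lines as strings instead of inserting them, and the JSON is invented sample data.

```elisp
(require 'json)

(defun my-starred-lines (starred)
  "Return one wiki list line per item in STARRED, a parsed takeout alist."
  (mapcar (lambda (item)
            ;; Take the href of the first (and only) alternate entry.
            (let ((href (cdr (assq 'href (aref (cdr (assq 'alternate item)) 0))))
                  (title (cdr (assq 'title item))))
              (format "* [%s %s]" href title)))
          (cdr (assq 'items starred))))

(my-starred-lines
 (json-read-from-string
  "{\"items\": [{\"title\": \"Example post\",
                 \"alternate\": [{\"href\": \"http://example.org/post\"}]}]}"))
;; ⇒ ("* [http://example.org/post Example post]")
```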
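
Some of the URLs point to `feedproxy.google.com` and merely redirect to the real article, so I need a function that finds the redirection target:

(require 'url)

(defun redirection-target (url)
  "Return the address that URL redirects to, using a HEAD request."
  (save-match-data
    (let ((url-request-method "HEAD")
          (retrieval-done nil)
          (spinner "-\\|/")
          (n 0))
      (url-retrieve
       url
       (lambda (status &rest ignore)
         ;; Fall back to the original URL if there was no redirect.
         (setq retrieval-done t
               url (or (plist-get status :redirect) url)
               url (replace-regexp-in-string "blogspot\\.ch" "blogspot.com" url)
               url (replace-regexp-in-string "\\?utm.*" "" url))))
      ;; Poll until the asynchronous callback has run.
      (while (not retrieval-done)
        (sit-for 1)
        (message "Waiting... %c"
                 (aref spinner (setq n (mod (1+ n) (length spinner))))))
      url)))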
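
The cleanup part can be checked on its own, without any network access. A small sketch, assuming the intent is to map `blogspot.ch` back to `blogspot.com` and to drop everything from `?utm` onward; `my-clean-url` and the sample URL are hypothetical.

```elisp
(defun my-clean-url (url)
  "Return URL with the country redirect undone and utm tracking removed."
  (replace-regexp-in-string
   "\\?utm.*" ""
   (replace-regexp-in-string "blogspot\\.ch" "blogspot.com" url)))

(my-clean-url "http://example.blogspot.ch/2013/07/post.html?utm_source=feedburner")
;; ⇒ "http://example.blogspot.com/2013/07/post.html"
```

Note that this simple pattern also drops any parameters that follow the first `utm` parameter, which was fine for my list.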

Now I can run the following search and replace operation in the buffer where I generated my list:

(while (re-search-forward "http://feedproxy\\.google\\.com/\\S-+" nil t)
  (replace-match (redirection-target (match-string 0))))
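
To try the loop before hitting the network, a stand-in resolver will do. This is a sketch under assumptions: `my-fake-target` is a hypothetical stub, and the feedproxy URL is invented. The stub wraps its work in `save-match-data`, because `replace-match` relies on the match data from the preceding search still being intact.

```elisp
(defun my-fake-target (url)
  "Hypothetical stand-in for a real resolver; needs no network access."
  ;; Keep the caller's match data intact for `replace-match'.
  (save-match-data
    (replace-regexp-in-string "feedproxy\\.google\\.com/.*" "example.org/post" url)))

(with-temp-buffer
  (insert "* [http://feedproxy.google.com/~r/feed/abc Some title]\n")
  (goto-char (point-min))
  (while (re-search-forward "http://feedproxy\\.google\\.com/\\S-+" nil t)
    ;; FIXEDCASE and LITERAL insert the URL exactly as returned.
    (replace-match (my-fake-target (match-string 0)) t t))
  (buffer-string))
;; ⇒ "* [http://example.org/post Some title]\n"
```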

Phew, thank you, Emacs!

​#Emacs