Scraping Instagram without an account

DATE: 2019-10-20

AUTHOR: John L. Godlee

There are lots of people I would like to follow on Instagram, mostly woodworkers, bicycle people, and outdoors people. It seems to be a really good method of delivering content. Unfortunately for Instagram, there is absolutely no way I would make an account with them. I fear it would be too much of a time sink, and I'm paranoid about giving too much detail of my personal interests to Facebook.

I found a command line tool called InstaLooter[1], which can scrape public Instagram profiles without an account and save the images to my local machine, where I can look through them at my leisure, in the spirit of RSS. This is how I set it up.

1: https://github.com/althonos/InstaLooter

I created a text file called .ig_subs.txt, which lives in my $HOME. The file holds a list of Instagram usernames, one per line, for the accounts I want to scrape:

kelsoparadiso
lloyd.kahn
exploringalternatives
barnthespoon
terrybarentsen
woodlands.co.uk
zedoutdoors
mossy_bottom

Then I made a shell script called insta_dl, which lives on my PATH:

#!/bin/bash

# Make directory if it doesn't exist
mkdir -p $HOME/Downloads/ig

# make newlines the only separator
IFS=$'\n'

# disable globbing
set -f          

# Loop
for i in $(cat "$HOME/.ig_subs.txt"); do
  instalooter user "$i" "$HOME/Downloads/ig/" -n 1 -N -T '{username}.{date}.{id}'
done

instalooter user $i downloads photos from each user i. -n 1 only downloads the most recent post, whether that post is one photo or multiple. -N only downloads images which don't already exist in the destination directory ($HOME/Downloads/ig/), based on the filename. -T {username}.{date}.{id} sets the filename of each photo. {id} is unique for each photo on Instagram, so it uniquely identifies each file downloaded for use by -N. The filenames then look something like this:

exploringalternatives.2019-09-27.2142383070393557093.jpg
kelsoparadiso.2019-10-09.2150831532411304437.jpg
kelsoparadiso.2019-10-09.2150831532419588103.jpg
kelsoparadiso.2019-10-09.2150831532419839765.jpg
lloyd.kahn.2019-10-11.2152638264107259024.jpg
mossy_bottom.2019-10-09.2151026330651686709.jpg
terrybarentsen.2019-10-03.2146722625883638769.jpg
terrybarentsen.2019-10-03.2146722625900303797.jpg
terrybarentsen.2019-10-03.2146722625950630270.jpg
woodlands.co.uk.2019-10-11.2152273592812162360.jpg
zedoutdoors.2019-10-02.2145942922787735607.jpg

If I wanted to, I could sort each image into its own directory based on username or date, but I don't need that.
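For anyone who does want per-account directories, here is a minimal sketch of how the filing could work. The function names ig_username and file_by_user are hypothetical, and the sketch assumes every file follows the {username}.{date}.{id}.jpg pattern shown above. Usernames can contain dots (lloyd.kahn, woodlands.co.uk), so the name is split from the right rather than the left.

```shell
# Hypothetical helper: print the username part of "username.date.id.ext".
# Fields are stripped from the right because usernames may contain dots.
ig_username() {
  local name="${1##*/}"   # drop any leading directory
  name="${name%.*}"       # drop the extension
  name="${name%.*}"       # drop the post id
  name="${name%.*}"       # drop the date
  printf '%s\n' "$name"
}

# Hypothetical helper: move each .jpg in a directory into a
# subdirectory named after the account it came from.
file_by_user() {
  local dir="$1" f user
  for f in "$dir"/*.jpg; do
    [ -e "$f" ] || continue   # nothing matched, the glob stayed literal
    user="$(ig_username "$f")"
    mkdir -p "$dir/$user"
    mv -- "$f" "$dir/$user/"
  done
}
```

Running file_by_user "$HOME/Downloads/ig" after a download would then leave one directory per account. Note that moving the files would stop -N from recognising them in the top-level directory, so this is probably better suited to a separate archive copy.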

I can now create a cron job or a LaunchAgent to run this automatically in the background every day or every week.
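As a rough sketch of the cron route, the snippet below prints a crontab entry that would run the script daily. The 07:00 schedule, the $HOME/bin/insta_dl location, and the log file path are all assumptions for illustration, as is the function name insta_dl_cron_line. Cron usually runs with a minimal PATH, so instalooter itself may need to be referenced by its full path inside the script.

```shell
# Hypothetical crontab entry: run insta_dl every day at 07:00 and
# append its output to a log. Both paths are assumptions; adjust them
# to wherever the script actually lives.
insta_dl_cron_line() {
  printf '%s\n' '0 7 * * * "$HOME/bin/insta_dl" >> "$HOME/.insta_dl.log" 2>&1'
}

# Print the entry for review.
insta_dl_cron_line

# To install it alongside any existing entries, uncomment:
# ( crontab -l 2>/dev/null; insta_dl_cron_line ) | crontab -
```

For a weekly run, the fifth field could be set to a day of the week, for example 0 7 * * 1 for Mondays.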

Update - 2019_10_31

I updated the insta_dl shell script so that it also grabs the caption of each Instagram post downloaded and stores it in a text file. InstaLooter can download post metadata as a JSON file when given the -d flag (--dump-json). I then use jq to parse the JSON file for each post and extract the full name of the account (.owner.full_name), the @username of the account (.owner.username) and the caption of the post (.edge_media_to_caption[][][].text). Finally, I use sed to put a blank line after each caption to make the file easier to read, and delete the original JSON files:

#!/bin/bash

# Make directory if it doesn't exist
DIR="$HOME/Downloads/ig"
mkdir -p "$DIR"

# make newlines the only separator
IFS=$'\n'

# Loop
for i in $(cat "$HOME/.ig_subs.txt"); do
    instalooter user "$i" "$DIR" -v -d -n 1 -N -T '{username}.{date}.{id}'
done

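# Extract "Full Name (username): caption" from each post's JSON metadata
for i in "$DIR"/*.json; do
    jq '(.owner.full_name + " (" + .owner.username + "): " + .edge_media_to_caption[][][].text)' "$i"
done > "$DIR/description.txt"

# Put a blank line after each caption
sed -i 'G' "$DIR/description.txt"

# Delete the JSON files now that the captions have been extracted
rm "$DIR"/*.json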
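
To illustrate what the jq filter produces, here is a minimal sketch run against a made-up JSON fragment. The sample only mimics the fields the filter reads and is not real InstaLooter output, and the function name caption_line is hypothetical. The sketch also adds -r, which the script above does not use, so that jq prints the raw string without surrounding JSON quotes.

```shell
# Hypothetical sample containing only the fields the filter reads.
sample='{"owner":{"full_name":"Jane Doe","username":"janedoe"},"edge_media_to_caption":{"edges":[{"node":{"text":"New spoon on the bench"}}]}}'

# Same filter as in the script. The three [] steps walk from
# edge_media_to_caption into edges, then each edge, then its node.
caption_line() {
  jq -r '(.owner.full_name + " (" + .owner.username + "): " + .edge_media_to_caption[][][].text)'
}

printf '%s\n' "$sample" | caption_line
# prints: Jane Doe (janedoe): New spoon on the bench
```

If a post has no caption, the innermost list is empty and the filter prints nothing for that post, so it would simply be missing from description.txt.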