💾 Archived View for sylvaindurand.org › use-tor-with-python › index.gmi captured on 2022-04-29 at 11:23:20. Gemini links have been rewritten to link to archived content
This page shows how to use Tor to access data anonymously from a Python script. This can be particularly useful if you want to build a scraper without being banned by the server you are querying.
The installation of Tor depends on your system, and is detailed on the official website. On Debian or Raspbian, we use:
sudo apt-get install tor
To launch Tor, just run:
sudo service tor start
To check if it works, simply run the following command from a terminal:
curl --socks5-hostname localhost:9050 -s https://check.torproject.org/ | grep -m 1 Congratulations | xargs
This command will display:
Congratulations. This browser is configured to use Tor.
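If the command prints nothing, one thing worth ruling out first is whether Tor's SOCKS port is accepting connections at all. The following is a minimal sketch using only the standard library; the function name `is_port_open` is hypothetical, and the default port 9050 is simply the one used in the curl command above. It only shows that something is listening on that port, not that the listener is Tor.

```python
import socket


def is_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: it only proves that *something* listens on the
    port, not that the listener is Tor.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, ...
        return False


if __name__ == "__main__":
    print("Tor SOCKS port reachable:", is_port_open())
```

If this reports that the port is closed, the service is probably not running or listens on a different port.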
To request a page, use the `requests` library. If you do not have it, just install it:
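pip install requests
pip install requests[socks]
pip install requests[security]
If the last command fails, try installing the build requirements of `cryptography` (on Python 3 systems the last package is named `python3-dev`; on recent versions of `requests` the `security` extra is a no-op and can be skipped):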
sudo apt-get install build-essential libssl-dev libffi-dev python-dev
We then use, in Python:
import requests
You can check your IP address without Tor with the command:
requests.get('https://ident.me').text
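To use Tor, we tell `requests` to go through Tor's SOCKS proxy. The `socks5h` scheme makes the proxy resolve hostnames as well, so that DNS lookups do not leak outside Tor; `url` stands for the page you want to fetch:
proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}
requests.get(url, proxies=proxies).text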
You should now get a different IP address with:
requests.get('https://ident.me', proxies=proxies).text
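The two checks above can be combined into one comparison. This is only a sketch: `build_tor_proxies`, `compare_ips` and `fetch_with_requests` are hypothetical names, and the `socks5h` scheme is used by default on the assumption that hostnames should be resolved through Tor rather than locally. The fetch function is passed in as a parameter so that the comparison logic can be tried without a network connection.

```python
def build_tor_proxies(host="127.0.0.1", port=9050, remote_dns=True):
    """Build a proxies dict for requests pointing at a local Tor SOCKS port.

    With remote_dns=True the 'socks5h' scheme is used, which asks the
    proxy to resolve hostnames (assumption: you want DNS to go through Tor).
    """
    scheme = "socks5h" if remote_dns else "socks5"
    address = f"{scheme}://{host}:{port}"
    return {"http": address, "https": address}


def compare_ips(fetch, url="https://ident.me", proxies=None):
    """Return (direct_ip, tor_ip, differs) using fetch(url, proxies) -> str."""
    proxies = proxies or build_tor_proxies()
    direct = fetch(url, None).strip()
    via_tor = fetch(url, proxies).strip()
    return direct, via_tor, direct != via_tor


def fetch_with_requests(url, proxies):
    """Real fetcher; needs requests with SOCKS support installed."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.get(url, proxies=proxies, timeout=30)
    response.raise_for_status()
    return response.text
```

With Tor running, `compare_ips(fetch_with_requests)` should return two different addresses and `True` as the last element.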
If you need a new identity, and therefore a new IP address, you need to install `stem`:
pip install stem
Tor's control port must also be enabled so that the script can request a new identity. Edit the configuration file:
sudo nano /etc/tor/torrc
We use the parameters:
ControlPort 9051
CookieAuthentication 1
Then we restart Tor to take into account these modifications:
sudo service tor restart
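With Python, we now use the following code (the script needs read access to Tor's authentication cookie, which may mean running it as root or as a member of the `debian-tor` group):
from stem import Signal
from stem.control import Controller

with Controller.from_port(port=9051) as c:
    c.authenticate()
    c.signal(Signal.NEWNYM)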
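To check it, we see whether we get a new IP address (Tor rate-limits identity changes, so the new circuit can take a few seconds to appear, and the new exit node may occasionally be the same as the old one):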
requests.get('https://api.ipify.org', proxies=proxies).text
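Since the new address may not show up immediately, it can help to poll for it instead of checking once. The sketch below is one way to do that: `renew_identity` and `wait_for_new_ip` are hypothetical names, the retry count and delay are arbitrary, and `get_newnym_wait()` is assumed to be available in your version of `stem`. The polling function takes the IP lookup and the sleep function as parameters so it can be tried without Tor.

```python
import time


def renew_identity(control_port=9051):
    """Ask Tor for a new identity through the control port."""
    from stem import Signal  # imported lazily; requires `pip install stem`
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        controller.authenticate()
        # Tor rate-limits NEWNYM; wait until it will honour another one.
        time.sleep(controller.get_newnym_wait())
        controller.signal(Signal.NEWNYM)


def wait_for_new_ip(get_ip, old_ip, attempts=5, delay=5.0, sleep=time.sleep):
    """Poll get_ip() until it differs from old_ip.

    Returns the new address, or None if it never changed within
    `attempts` tries (arbitrary defaults, adjust to taste).
    """
    for _ in range(attempts):
        current = get_ip()
        if current != old_ip:
            return current
        sleep(delay)
    return None
```

A typical use would be to record the current address, call `renew_identity()`, then call `wait_for_new_ip` with a function that fetches `https://api.ipify.org` through the proxy.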
If anonymity is required, it may be useful to change the user agent, which reveals information about our client to the server. To do this, install `fake_useragent`:
pip install fake_useragent
We can then use, in Python:
from fake_useragent import UserAgent
headers = { 'User-Agent': UserAgent().random }
requests.get(url, proxies=proxies, headers=headers).text
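If `fake_useragent` is not installed or cannot load its data, the script would otherwise stop there. The following sketch falls back to a small hard-coded list in that case; `random_headers` and `FALLBACK_USER_AGENTS` are hypothetical names, and the strings in the list are illustrative examples only, not a curated set.

```python
import random

# Illustrative values only; replace with user agents that suit your needs.
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0",
]


def random_headers(user_agents=None, rng=random):
    """Return a headers dict with a randomly chosen User-Agent.

    If user_agents is given, pick from that list. Otherwise try
    fake_useragent and fall back to FALLBACK_USER_AGENTS on any failure.
    """
    if user_agents is None:
        try:
            from fake_useragent import UserAgent

            return {"User-Agent": UserAgent().random}
        except Exception:
            user_agents = FALLBACK_USER_AGENTS
    return {"User-Agent": rng.choice(user_agents)}
```

The result can be passed directly as `headers=random_headers()` to `requests.get`.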
If your Python script is run regularly from a cron job, it may be useful to add a random delay so that the access times are not too regular:
import random, time
wait = random.uniform(0, 2*60*60)  # up to two hours, in seconds
time.sleep(wait)
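One thing to watch for is a delay longer than the interval between two cron runs, which would let runs overlap. A possible variant, sketched below with hypothetical names (`pick_delay`, `sleep_before_run`), bounds the delay by the cron interval minus the time the script is expected to take; both values are assumptions you would supply yourself.

```python
import random
import time


def pick_delay(interval_seconds, expected_runtime=0.0, rng=random):
    """Pick a random delay that still leaves room for the run itself.

    The delay is drawn uniformly from [0, interval - expected_runtime],
    or is 0 if the script is expected to take longer than the interval.
    """
    window = max(0.0, interval_seconds - expected_runtime)
    return rng.uniform(0, window)


def sleep_before_run(interval_seconds, expected_runtime=0.0, sleep=time.sleep):
    """Sleep for a bounded random delay and return the time waited."""
    wait = pick_delay(interval_seconds, expected_runtime)
    sleep(wait)
    return wait
```

For an hourly job that takes about ten minutes, `sleep_before_run(3600, expected_runtime=600)` waits at most fifty minutes.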