Python is particularly effective when you want to browse web pages or perform actions on them automatically. Libraries like Beautiful Soup are enough when you only need to scrape pages, but it is sometimes necessary to perform actions on pages that require JavaScript, or even to imitate human behavior. To do this, it is possible to control a full web browser from Python.
We will use Selenium, which is easily installed with:
pip install selenium
To be able to use it, you also need a driver for the browser you want to control. We will use geckodriver, the driver for Firefox and its Gecko rendering engine (Firefox itself must be installed).
To do this, download `geckodriver`, extract it, and place it in `/usr/local/bin`:
tar xvfz geckodriver-version-platform.tar.gz
mv geckodriver /usr/local/bin
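Before going further, it can help to confirm that the driver is actually reachable. Here is a minimal sketch using only the standard library; the helper name `find_driver` is made up for this example:

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver if it is on the PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("geckodriver not found: check that /usr/local/bin is on your PATH")
    else:
        print(f"geckodriver found at {path}")
```

If this reports nothing found, Selenium will most likely fail to start Firefox as well.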
To use Selenium, simply import it at the beginning of the file:
from selenium import webdriver
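You can then start Firefox. Since `/usr/local/bin` is normally on the `PATH`, Selenium finds `geckodriver` without being given its location (recent versions of Selenium no longer accept the `executable_path` argument):
driver = webdriver.Firefox()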
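To click on an element, for example one selected by its class name, we use (the `find_element_by_*` methods have been removed from recent versions of Selenium):
from selenium.webdriver.common.by import By
driver.find_element(By.CLASS_NAME, 'my_class').click()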
To close the browser and free the memory, you can exit Selenium with:
driver.quit()
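If the script crashes before reaching `driver.quit()`, Firefox stays open in the background. One possible way to guard against this is a small context manager; this is only a sketch, and the name `browser` as well as its `factory` parameter are invented for the example:

```python
from contextlib import contextmanager


@contextmanager
def browser(factory):
    """Create a driver with factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs even if the block raised, so Firefox never stays in memory.
        driver.quit()


# Possible usage, assuming Selenium is installed:
#
#     from selenium import webdriver
#     with browser(webdriver.Firefox) as driver:
#         driver.get("https://example.org")
```

The helper only relies on the driver having a `quit()` method, so it works with any way of creating the driver.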
If you want to launch it without a graphical user interface, you can use:
options = webdriver.FirefoxOptions()
options.add_argument('-headless')
driver = webdriver.Firefox(options=options)
If you use a proxy, or Tor (which acts as a SOCKS proxy, with local IP `127.0.0.1` and port `9050`), it is possible to connect to it with Selenium using the following options:
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy.socks", '127.0.0.1')
profile.set_preference("network.proxy.socks_port", 9050)
profile.set_preference("network.proxy.socks_remote_dns", False)
profile.update_preferences()
You can then attach the profile to the browser options and start Firefox with them:
options = webdriver.FirefoxOptions()
options.profile = profile
driver = webdriver.Firefox(options=options)
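If nothing is listening on the proxy port, pages will simply fail to load, which can be confusing. As a rough sketch, the standard library can check beforehand that the port answers. The function name `port_is_open` is hypothetical, and the check only tells you that something accepts connections on that port, not that it is really Tor:

```python
import socket


def port_is_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("A service is listening on 127.0.0.1:9050")
    else:
        print("Nothing answers on 127.0.0.1:9050: is Tor running?")
```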
Other options are available, for example to disable the cache:
profile.set_preference("browser.cache.disk.enable", False)
profile.set_preference("browser.cache.memory.enable", False)
profile.set_preference("browser.cache.offline.enable", False)
profile.set_preference("network.http.use-cache", False)
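When the list of preferences grows, it may be more readable to keep them in a dictionary and apply them in a loop. The sketch below assumes only that the target object has a `set_preference(name, value)` method, as the `profile` object above does; `NO_CACHE_PREFS` and `apply_preferences` are names chosen for this example:

```python
# Preference names and values are the ones used in this article.
NO_CACHE_PREFS = {
    "browser.cache.disk.enable": False,
    "browser.cache.memory.enable": False,
    "browser.cache.offline.enable": False,
    "network.http.use-cache": False,
}


def apply_preferences(target, prefs):
    """Call target.set_preference(name, value) for every entry of prefs."""
    for name, value in prefs.items():
        target.set_preference(name, value)
    return target


# Possible usage with the profile created earlier:
#
#     apply_preferences(profile, NO_CACHE_PREFS)
```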
It is also possible to clear cookies with:
driver.delete_all_cookies()
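It can be useful to know what was actually removed, for instance to log it. A minimal sketch, relying on the driver's `get_cookies()` method, which returns the cookies visible to the current page as a list of dictionaries; the wrapper name `clear_cookies` is made up here:

```python
def clear_cookies(driver):
    """Delete the cookies of the current session and return the removed ones."""
    removed = driver.get_cookies()  # list of dicts, one per cookie
    driver.delete_all_cookies()
    return removed


# Possible usage with an open driver:
#
#     for cookie in clear_cookies(driver):
#         print("removed", cookie["name"])
```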