samedi 7 avril 2018

Download files with Python, Selenium and Chrome headless

I have recurring tasks these days that consist to automatically download files on the Internet. Usually, requests, beautifulsoup and some tricks do the job effectively. Sometimes though, I have to play it hard and ask Selenium and Chromium* headless to do the heavy lifting. Alas, asking Chromium to automatically download files is not clear.

I found the solution in Chrome tracker, and you read it bellow written in Python:

from selenium import webdriver

# Let's create some option to make Chromium go headless
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('disable-gpu')

# Launch the browser 
browser = webdriver.Chrome(chrome_options=options)
download_dir = tempfile.TemporaryDirectory().name
os.mkdir(download_dir)

# Send a command to tell chrome to download files in download_dir without
# asking.
browser.command_executor._commands["send_command"] = (
    "POST",
    '/session/$sessionId/chromium/send_command'
)
params = {
    'cmd': 'Page.setDownloadBehavior',
    'params': {
        'behavior': 'allow',
        'downloadPath': download_dir
    }
}
browser.execute("send_command", params)

There you go, happy scraping!

* Of course, it works with regular Chrome too!