Data scraping

Download binary data like pictures (jpg, png, etc.)

This post is based on a Stack Overflow answer. First of all, you should use the requests package instead of raw urllib, urllib2 or urllib3.

Downloading a typical HTML page is simple:

import requests
response = requests.get('http://blog.itkun.de/index.html')
print(response.text)

But what if you want to download something else, for example an image?

You can also use requests for this, but you have to do a little bit more, since the data being downloaded is binary.

pic = requests.get('http://blog.itkun.de/favicon.png', stream=True)

Passing stream=True lets you read the response body as a stream instead of getting it all at once as a string.

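if pic.ok:
    with open("favicon.png", "wb") as fobj:
        # iter_content yields the body piece by piece, so the whole file never sits in memory
        for chunk in pic.iter_content(chunk_size=8192):
            fobj.write(chunk)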

Checking whether the request was ok is just as useful in the case of HTML, therefore I do not count this line ;) You open a file in binary write mode using a context manager. The tricky part is that you write the file in chunks, which is possible thanks to the streaming. But if you are using Python >= 3.5 you can also use pathlib and write the bytes to the file directly:

import pathlib
# pic.content is the complete response body as bytes
pathlib.Path("favicon.png").write_bytes(pic.content)
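If you need the chunked variant more than once, it can be wrapped in a small helper. This is only a minimal sketch: the name save_chunks is made up for this post, and the function deliberately accepts any iterable of bytes objects so it does not depend on requests itself.

```python
def save_chunks(chunks, target):
    """Write an iterable of bytes objects to target and return the byte count.

    save_chunks is a hypothetical helper, not part of requests.
    """
    written = 0
    with open(target, "wb") as fobj:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += fobj.write(chunk)
    return written
```

With the streamed response from above, it could be called as save_chunks(pic.iter_content(chunk_size=8192), "favicon.png").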

To get the file name out of your request (you probably know it beforehand), you can combine Python 3.4's pathlib (or os.path as a Python < 3.4 user) with some properties of your request object:

import pathlib
name = pathlib.Path(pic.request.path_url).name

or for Python < 3.4

from os import path
# the attribute is path_url, not path.url
name = path.basename(pic.request.path_url)
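One caveat: path_url also contains the query string, so a URL ending in something like ?v=2 would leave that part in the file name. The following sketch, assuming Python 3, parses the URL with the standard library first. The function name filename_from_url, the fallback name download.bin and the ?v=2 example are all made up for illustration.

```python
import pathlib
from urllib.parse import urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path component of url, ignoring query and fragment.

    Falls back to default (a hypothetical placeholder name) when the
    URL path has no file name, e.g. for 'http://blog.itkun.de/'.
    """
    url_path = urlsplit(url).path  # path only, without '?...' or '#...'
    name = pathlib.PurePosixPath(url_path).name
    return name or default
```

With the response from above, you could pass pic.url, which holds the URL of the response as a string.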
