PolarSPARC

Hands-on with Python Requests


Bhaskar S 07/03/2021


Overview

requests is an elegant, human-friendly, and popular Python library for making HTTP requests.


Installation

These instructions assume a Linux desktop running Ubuntu 20.04 LTS. To install the requests Python module, open a terminal window and execute the following command:

$ pip3 install requests

On successful installation, we should be ready to start using Python requests.
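One quick way to confirm that the module is importable is to print its version. The following is a minimal sketch; the version string printed depends on what pip3 installed on your system:

```python
import requests

# The package exposes its version as a plain string, e.g. '2.22.0'
print('requests version: %s' % requests.__version__)
```

If the import fails with ModuleNotFoundError, the module was most likely installed for a different Python interpreter than the one being run.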


Hands-on Python requests

The following is a simple Python script that makes a GET request to the Hacker News site:


first.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def main():
    url = 'http://news.ycombinator.com/newest'
    logging.info('URL to GET: %s' % url)
    res = requests.get(url)
    logging.info('Type of res: %s' % type(res))
    logging.info('URL: %s, Status code: %d, Content length: %d' % (res.url, res.status_code, len(res.content)))


if __name__ == '__main__':
    main()

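A few aspects of first.py above deserve a short explanation. The call requests.get(url) sends an HTTP GET request and returns a requests.models.Response object. On that object, url is the final URL that was fetched, status_code is the HTTP status code, and content is the response body as raw bytes.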

To run the Python script first.py, execute the following command:

$ python3 first.py

The following would be a typical output:

Output.1

2021-07-03 11:05:57,422 - URL to GET: http://news.ycombinator.com/newest
2021-07-03 11:05:57,967 - Type of res: <class 'requests.models.Response'>
2021-07-03 11:05:57,967 - URL: https://news.ycombinator.com/newest, Status code: 200, Content length: 41978

The interesting part is that the URL reported in the response is 'https://news.ycombinator.com/newest', whereas the request was made to 'http://news.ycombinator.com/newest'.

The illustration below shows the same request made from a Chrome browser with the developer tools open:

Request from Browser
Figure.1

As is evident from the illustration above, there is an HTTP redirection (301) involved.

By default, Python requests follows Location redirections for all the HTTP verbs, except for HEAD.
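The difference between GET and HEAD can be observed without touching a real site. The following is a minimal sketch, assuming a throwaway local server (the RedirectHandler class and the /old and /new paths are hypothetical, made up for this illustration) that answers /old with a 301 pointing to /new:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old answers 301 -> /new, anything else answers 200."""

    def _respond(self, send_body):
        if self.path == '/old':
            self.send_response(301)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'hello'
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            if send_body:
                self.wfile.write(body)

    def do_GET(self):
        self._respond(send_body=True)

    def do_HEAD(self):
        self._respond(send_body=False)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def compare_get_and_head():
    # Port 0 asks the operating system for any free port
    server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = 'http://127.0.0.1:%d/old' % server.server_port
        get_res = requests.get(url)    # follows the 301 by default
        head_res = requests.head(url)  # does not follow the 301 by default
        return (get_res.status_code, len(get_res.history),
                head_res.status_code, len(head_res.history))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(compare_get_and_head())
```

With this setup the GET ends at status 200 with one entry in its history, while the HEAD stops at the 301 with an empty history. Passing allow_redirects=True to requests.head would make it follow the redirection as well.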

The following is a simple Python script that makes the same GET request to the Hacker News site and shows the redirection:


second.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def main():
    url = 'http://news.ycombinator.com/newest'
    logging.info('URL to GET: %s' % url)
    res = requests.get(url)
    logging.info('Unicode content size: %d, Encoding: %s, Headers: %s' % (len(res.text), res.encoding, res.headers))
    if res.history:
        for his in res.history:
            logging.info('History: status: %d, headers: %s' % (his.status_code, his.headers))


if __name__ == '__main__':
    main()

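A few aspects of second.py above deserve a short explanation. The attribute text is the response body decoded to a Unicode string using the character set reported in encoding, and headers is a case-insensitive dictionary of the response headers. The attribute history is a list of the Response objects for the redirections that were followed, from the oldest to the most recent.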
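The content length logged by first.py counts bytes, whereas the size logged by second.py counts decoded characters, so the two can differ for the same body. The following is a minimal sketch of that difference, assuming a throwaway local server (the Utf8Handler class and the BODY value are hypothetical, made up for this illustration) that serves a short UTF-8 body with non-ASCII characters:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# 6 characters that occupy 9 bytes when encoded as UTF-8
BODY = 'caf\u00e9 \u2615'.encode('utf-8')


class Utf8Handler(BaseHTTPRequestHandler):
    """Hypothetical handler that always serves BODY as UTF-8 text."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_sizes():
    server = HTTPServer(('127.0.0.1', 0), Utf8Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        res = requests.get('http://127.0.0.1:%d/' % server.server_port)
        # content is bytes, text is str decoded using res.encoding
        return len(res.content), len(res.text), res.encoding
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(fetch_sizes())
```

Here the byte count is 9 and the character count is 6, with the encoding taken from the charset in the Content-Type header.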

To run the Python script second.py, execute the following command:

$ python3 second.py

The following would be a typical output:

Output.2

2021-07-03 12:08:01,099 - URL to GET: http://news.ycombinator.com/newest
2021-07-03 12:08:01,654 - Unicode content size: 41126, Encoding: utf-8, Headers: {'Server': 'nginx', 'Date': 'Sat, 03 Jul 2021 16:08:01 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Cache-Control': 'private; max-age=0', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Referrer-Policy': 'origin', 'Strict-Transport-Security': 'max-age=31556900', 'Content-Security-Policy': "default-src 'self'; script-src 'self' 'unsafe-inline' https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://cdnjs.cloudflare.com/; frame-src 'self' https://www.google.com/recaptcha/; style-src 'self' 'unsafe-inline'", 'Content-Encoding': 'gzip'}
2021-07-03 12:08:01,654 - History: status: 301, headers: {'Server': 'nginx', 'Date': 'Sat, 03 Jul 2021 16:08:01 GMT', 'Content-Type': 'text/html', 'Content-Length': '178', 'Connection': 'keep-alive', 'Location': 'https://news.ycombinator.com/newest'}

To disable the default redirection handling, pass allow_redirects=False in the call. The following is a simple Python script that makes the same GET request to the Hacker News site with redirection disabled:


third.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def main():
    url = 'http://news.ycombinator.com/newest'
    logging.info('URL to GET: %s' % url)
    res = requests.get(url, allow_redirects=False)
    logging.info('Unicode content: %s' % res.text)
    logging.info('Status code: %d, Location: %s' % (res.status_code, res.headers['Location']))
    if res.history:
        for his in res.history:
            logging.info('History: status: %d, headers: %s' % (his.status_code, his.headers))


if __name__ == '__main__':
    main()

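A few aspects of third.py above deserve a short explanation. With allow_redirects=False, requests returns the 301 response itself instead of following it, so text holds the small HTML page generated by the server and the Location header holds the redirection target. Since no redirection is followed, history is empty and the loop logs nothing.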

To run the Python script third.py, execute the following command:

$ python3 third.py

The following would be a typical output:

Output.3

2021-07-03 13:36:37,936 - URL to GET: http://news.ycombinator.com/newest
2021-07-03 13:36:38,134 - Unicode content: <html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>

2021-07-03 13:36:38,134 - Status code: 301, Location: https://news.ycombinator.com/newest

Until now, we have been exploring the HTTP GET method. The other commonly used methods are POST, PUT, and DELETE. In the following simple Python script, we demonstrate the use of these common HTTP methods by making requests to the simple HTTP request/response site https://httpbin.org:


fourth.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def http_get():
    url = 'https://httpbin.org/get'
    logging.info('[GET] URL: %s' % url)
    res = requests.get(url)
    logging.info('[GET] Status code: %d' % res.status_code)
    logging.info('[GET] Content: %s' % res.text)
    logging.info('[GET] Headers: %s' % res.headers)


def http_post():
    url = 'https://httpbin.org/post'
    payload = {'abc': '123', 'def': '456'}
    logging.info('[POST] URL: %s' % url)
    res = requests.post(url, data=payload)
    logging.info('[POST] Status code: %d' % res.status_code)
    logging.info('[POST] Content: %s' % res.text)
    logging.info('[POST] Headers: %s' % res.headers)


def http_put():
    url = 'https://httpbin.org/put'
    payload = {'abc': '789'}
    logging.info('[PUT] URL: %s' % url)
    res = requests.put(url, data=payload)
    logging.info('[PUT] Status code: %d' % res.status_code)
    logging.info('[PUT] Content: %s' % res.text)
    logging.info('[PUT] Headers: %s' % res.headers)


def http_delete():
    url = 'https://httpbin.org/delete'
    logging.info('[DELETE] URL: %s' % url)
    res = requests.delete(url)
    logging.info('[DELETE] Status code: %d' % res.status_code)
    logging.info('[DELETE] Content: %s' % res.text)
    logging.info('[DELETE] Headers: %s' % res.headers)


if __name__ == '__main__':
    http_get()
    http_post()
    http_put()
    http_delete()

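A few aspects of fourth.py above deserve a short explanation. A dictionary passed through the data parameter is form-encoded and sent with the Content-Type header set to application/x-www-form-urlencoded. The httpbin.org site echoes back what it received, which is why the values appear under "form" in the POST and PUT responses.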
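To see what the data parameter puts on the wire, one can build the request without sending it. The following is a minimal sketch using requests.Request and its prepare() method; the helper name prepare_form_post is made up for this illustration, and nothing is sent over the network:

```python
import requests


def prepare_form_post():
    payload = {'abc': '123', 'def': '456'}
    # Build the request and encode it, but do not send it
    req = requests.Request('POST', 'https://httpbin.org/post', data=payload)
    prepared = req.prepare()
    return (prepared.body,
            prepared.headers['Content-Type'],
            prepared.headers['Content-Length'])


if __name__ == '__main__':
    body, content_type, content_length = prepare_form_post()
    print('Body: %s' % body)
    print('Content-Type: %s' % content_type)
    print('Content-Length: %s' % content_length)
```

The body is the string abc=123&def=456, which is 15 bytes long and matches the Content-Length of 15 that httpbin.org reports for the POST request.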

To run the Python script fourth.py, execute the following command:

$ python3 fourth.py

The following would be a typical output:

Output.4

2021-07-03 14:20:11,307 - [GET] URL: https://httpbin.org/get
2021-07-03 14:20:11,426 - [GET] Status code: 200
2021-07-03 14:20:11,427 - [GET] Content: {
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-60e0aa5b-38ffcad8663ebd196c350476"
  }, 
  "origin": "173.71.122.117", 
  "url": "https://httpbin.org/get"
}

2021-07-03 14:20:11,427 - [GET] Headers: {'Date': 'Sat, 03 Jul 2021 18:20:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '308', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
2021-07-03 14:20:11,427 - [POST] URL: https://httpbin.org/post
2021-07-03 14:20:11,557 - [POST] Status code: 200
2021-07-03 14:20:11,558 - [POST] Content: {
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "abc": "123", 
    "def": "456"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "15", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-60e0aa5b-7ceaddef2d5d20555ac9f775"
  }, 
  "json": null, 
  "origin": "173.71.122.117", 
  "url": "https://httpbin.org/post"
}

2021-07-03 14:20:11,558 - [POST] Headers: {'Date': 'Sat, 03 Jul 2021 18:20:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '498', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
2021-07-03 14:20:11,558 - [PUT] URL: https://httpbin.org/put
2021-07-03 14:20:11,677 - [PUT] Status code: 200
2021-07-03 14:20:11,677 - [PUT] Content: {
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "abc": "789"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "7", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-60e0aa5b-7d8ec6be57aaee1e3f250932"
  }, 
  "json": null, 
  "origin": "173.71.122.117", 
  "url": "https://httpbin.org/put"
}

2021-07-03 14:20:11,677 - [PUT] Headers: {'Date': 'Sat, 03 Jul 2021 18:20:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '477', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
2021-07-03 14:20:11,677 - [DELETE] URL: https://httpbin.org/delete
2021-07-03 14:20:11,790 - [DELETE] Status code: 200
2021-07-03 14:20:11,791 - [DELETE] Content: {
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "0", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-60e0aa5b-5bf7b2721da78f1344ff15b6"
  }, 
  "json": null, 
  "origin": "173.71.122.117", 
  "url": "https://httpbin.org/delete"
}

2021-07-03 14:20:11,791 - [DELETE] Headers: {'Date': 'Sat, 03 Jul 2021 18:20:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '402', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

Often, we need to interact with HTTP-based REST API services using GET (to query for resources), POST (to create a new resource), PUT (to update an existing resource), or DELETE (to delete a resource). REST services typically take a JSON payload and respond with a JSON payload. In the following simple Python script, we demonstrate the POST and PUT methods by making API requests to the fake JSON API service at https://jsonplaceholder.typicode.com/:


fifth.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def api_post():
    url = 'https://jsonplaceholder.typicode.com/posts'
    headers = {'Content-Type': 'application/json'}
    json = {'title': 'Learning Python, 5th',
            'body': 'An in-depth introductory Python book',
            'userId': 3}
    logging.info('[POST] URL: %s' % url)
    res = requests.post(url, headers=headers, json=json)
    logging.info('[POST] Status code: %d' % res.status_code)
    logging.info('[POST] Response: %s' % res.json())


def api_put():
    url = 'https://jsonplaceholder.typicode.com/posts/1'
    headers = {'Content-Type': 'application/json'}
    json = {'id': 101,
            'title': 'Learning Python, 5th',
            'body': 'A comprehensive, in-depth introduction to the core Python language',
            'userId': 3}
    logging.info('[PUT] URL: %s' % url)
    res = requests.put(url, headers=headers, json=json)
    logging.info('[PUT] Status code: %d' % res.status_code)
    logging.info('[PUT] Response: %s' % res.json())


if __name__ == '__main__':
    api_post()
    api_put()

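A few aspects of fifth.py above deserve a short explanation. A dictionary passed through the json parameter is serialized to JSON and sent with the Content-Type header set to application/json, so the explicit headers dictionary in the script is redundant but harmless. The json() method on the response parses the JSON body into a Python object.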
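The effect of the json parameter can be checked without sending anything. The following is a minimal sketch that prepares a POST request with no explicit headers; the helper name prepare_json_post and the shortened payload are made up for this illustration:

```python
import json

import requests


def prepare_json_post():
    payload = {'title': 'Learning Python, 5th', 'userId': 3}
    # No headers are passed in; the json parameter sets Content-Type itself
    req = requests.Request('POST', 'https://jsonplaceholder.typicode.com/posts', json=payload)
    prepared = req.prepare()
    # Parse the encoded body back to confirm it is the JSON form of the payload
    return prepared.headers['Content-Type'], json.loads(prepared.body)


if __name__ == '__main__':
    content_type, decoded = prepare_json_post()
    print('Content-Type: %s' % content_type)
    print('Decoded body: %s' % decoded)
```

The prepared request carries Content-Type application/json even though no headers dictionary was supplied.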

To run the Python script fifth.py, execute the following command:

$ python3 fifth.py

The following would be a typical output:

Output.5

2021-07-03 16:46:25,555 - [POST] URL: https://jsonplaceholder.typicode.com/posts
2021-07-03 16:46:25,658 - [POST] Status code: 201
2021-07-03 16:46:25,659 - [POST] Response: {'title': 'Learning Python, 5th', 'body': 'An in-depth introductory Python book', 'userId': 3, 'id': 101}
2021-07-03 16:46:25,659 - [PUT] URL: https://jsonplaceholder.typicode.com/posts/1
2021-07-03 16:46:25,756 - [PUT] Status code: 200
2021-07-03 16:46:25,757 - [PUT] Response: {'id': 1, 'title': 'Learning Python, 5th', 'body': 'A comprehensive, in-depth introduction to the core Python language', 'userId': 3}

The following is a simple Python script that makes a GET request to the PolarSPARC site:


sixth.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def main():
    url = 'https://www.polarsparc.com/'
    logging.info('URL to GET: %s' % url)
    res = requests.get(url)
    logging.info('URL: %s, Status code: %d, Content: %s' % (res.url, res.status_code, res.content))


if __name__ == '__main__':
    main()

To run the Python script sixth.py, execute the following command:

$ python3 sixth.py

The following would be a typical output:

Output.6

2021-07-03 20:35:52,452 - URL to GET: https://www.polarsparc.com/
2021-07-03 20:35:52,701 - URL: https://www.polarsparc.com/, Status code: 406, Content: b'Not Acceptable!

Not Acceptable!

An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.

'

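The status code 406 (Not Acceptable) in this case is generated by Mod_Security on the server. The server appears to reject the default User-Agent header sent by requests (of the form python-requests/<version>, as seen in Output.4) and expects a different User-Agent value.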
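To see which User-Agent value requests sends by default, and how a custom header replaces it, one can prepare the request through a Session without sending it. The following is a minimal sketch; the helper name user_agents is made up for this illustration:

```python
import requests


def user_agents():
    url = 'https://www.polarsparc.com/'
    session = requests.Session()
    # prepare_request merges the session's default headers into the request
    default_req = session.prepare_request(requests.Request('GET', url))
    custom_req = session.prepare_request(
        requests.Request('GET', url, headers={'User-Agent': 'python'}))
    return default_req.headers['User-Agent'], custom_req.headers['User-Agent']


if __name__ == '__main__':
    default_ua, custom_ua = user_agents()
    print('Default User-Agent: %s' % default_ua)
    print('Custom User-Agent: %s' % custom_ua)
```

The default value starts with python-requests/ followed by the installed version, and the header supplied by the caller takes precedence over it.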

The following is the Python script with the fix to make a GET request to the PolarSPARC site:


seventh.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def main():
    url = 'https://www.polarsparc.com/'
    logging.info('URL to GET: %s' % url)
    headers = {'User-Agent': 'python'}
    res = requests.get(url, headers=headers)
    logging.info('URL: %s, Status code: %d, Content: %s' % (res.url, res.status_code, res.content))


if __name__ == '__main__':
    main()

Shifting gears, there are certain sites that require one to authenticate before they grant access to the content. One of the simplest authentication methods is the Basic Authentication mechanism, in which one must provide a user-id and a password to gain access to the site.

The following is the Python script that makes a GET request to a basic authentication protected link hosted on the site https://httpbin.org:


eighth.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def main():
    url = 'https://httpbin.org/basic-auth/admin/S3cr3t'
    logging.info('URL to GET: %s' % url)
    res = requests.get(url)
    logging.info('Status code: %d, Headers: %s' % (res.status_code, res.headers))


if __name__ == '__main__':
    main()

To run the Python script eighth.py, execute the following command:

$ python3 eighth.py

The following would be a typical output:

Output.7

2021-07-03 21:35:29,585 - URL to GET: https://httpbin.org/basic-auth/admin/S3cr3t
2021-07-03 21:35:29,721 - Status code: 401, Headers: {'Date': 'Sun, 04 Jul 2021 01:35:29 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'WWW-Authenticate': 'Basic realm="Fake Realm"', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}

The status code 401 means that the server is denying access to the specified URL and is expecting basic authentication, which is indicated by the response header WWW-Authenticate.

The following is the Python script with the fix to make a GET request to the basic authentication protected link hosted on the site https://httpbin.org:


nineth.py
#
# @Author: Bhaskar S
# @Blog:   https://www.polarsparc.com
# @Date:   03 Jul 2021
#

import logging
import requests
from requests.auth import HTTPBasicAuth

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


def main():
    url = 'https://httpbin.org/basic-auth/admin/S3cr3t'
    logging.info('URL to GET: %s' % url)
    res = requests.get(url, auth=HTTPBasicAuth('admin', 'S3cr3t'))
    logging.info('Status code: %d, Headers: %s' % (res.status_code, res.headers))


if __name__ == '__main__':
    main()

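A few aspects of nineth.py above deserve a short explanation. The HTTPBasicAuth object passed through the auth parameter carries the user-id and the password, which requests sends to the server in the Authorization request header. Passing the tuple ('admin', 'S3cr3t') as the auth parameter is an equivalent shorthand. With the correct credentials, the request should succeed with a status code of 200 instead of 401.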
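Under the hood, Basic Authentication is just an Authorization header containing the Base64 encoding of user-id:password. The following is a minimal sketch that prepares the request without sending it and compares the header with a hand-built value; the helper name basic_auth_header is made up for this illustration:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth


def basic_auth_header(user, password):
    url = 'https://httpbin.org/basic-auth/admin/S3cr3t'
    # Prepare the request only; nothing is sent over the network
    req = requests.Request('GET', url, auth=HTTPBasicAuth(user, password))
    return req.prepare().headers['Authorization']


if __name__ == '__main__':
    header = basic_auth_header('admin', 'S3cr3t')
    expected = 'Basic ' + base64.b64encode(b'admin:S3cr3t').decode('ascii')
    print('Authorization: %s' % header)
    print('Matches hand-built value: %s' % (header == expected))
```

Because Base64 is an encoding and not encryption, the credentials are readable by anyone who can see the request, which is why Basic Authentication should only be used over HTTPS.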


References

Requests: HTTP for Humans


© PolarSPARC