Understanding HyperText Transfer Protocol (HTTP)


Bhaskar S 02/15/2009


We live in an Internet age where the World Wide Web (WWW) is part of our lives. Have you ever wondered how the Web works? What happens when you type a web address in your browser and press the Enter key? At a very high level, the Web Browser requests information from the Web Server corresponding to the web address, and the Web Server responds by sending the appropriate information. Behind the scenes, the Web Browser and the Web Server communicate with each other using the HyperText Transfer Protocol (HTTP for short).

HTTP is the application-level request-response communication protocol of the Web. When we type a web address like “www.polarsparc.com” in our Web Browser, the browser makes a request to the Web Server for the web content of PolarSPARC. The Web Server sends the web content back as a response, and the Web Browser renders it. The following diagram illustrates this scenario:

HTTP Client and Server
Figure.1

For convenience, we will refer to the Web Browser as the Client and the Web Server as the Server.

The Web Server is a container for various web resources such as text, images, videos, etc. Every web resource on a Web Server can be identified uniquely using a Uniform Resource Identifier (or URI for short). When a URI is used as an address to locate a web resource on a Web Server, it is known as a Uniform Resource Locator (or URL for short). For example, to locate the original HTTP protocol specification from the Internet Engineering Task Force (IETF), we would use the URL http://www.rfc-editor.org/rfc/rfc1945.txt.

The format of any URI is as follows:

protocol://[user-id:password@]host[:port][/resource-path][?query-string][#fragment]

For example, if we once again look at the URI http://www.rfc-editor.org/rfc/rfc1945.txt, the protocol is http, the host is www.rfc-editor.org, the port is not specified and hence defaults to 80, and the resource-path is /rfc/rfc1945.txt.
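
To make the URI breakdown concrete, here is a minimal sketch using `urlsplit` from the Python standard library. The helper name `describe_uri` and the table of default ports are our own assumptions for illustration; the dictionary keys simply mirror the names used in the URI format above.

```python
from urllib.parse import urlsplit

# Assumed defaults for illustration: a port that is not specified falls back to these.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_uri(uri):
    """Split a URI into the parts named in the URI format above (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "protocol": parts.scheme,
        "user-id": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "resource-path": parts.path,
        "query-string": parts.query,
        "fragment": parts.fragment,
    }


info = describe_uri("http://www.rfc-editor.org/rfc/rfc1945.txt")
print(info["protocol"], info["host"], info["port"], info["resource-path"])
# prints: http www.rfc-editor.org 80 /rfc/rfc1945.txt
```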

It is clear that the Web Browser is the one that initiates action by sending an HTTP request to the Web Server. What does the HTTP request message look like? An HTTP request has the following structure:

Listing.1
<HTTP Method> <URI> <HTTP Version> <CR><LF>
<Request Header Name-1><:> <Request Header Value-1> <CR><LF>
<Request Header Name-2><:> <Request Header Value-2> <CR><LF>
...<CR><LF>
...<CR><LF>
<Request Header Name-n><:> <Request Header Value-n> <CR><LF>
<CR><LF>
[Optional Body of Data]

<CR><LF> represents Carriage Return and Line Feed.

<URI> is the resource path.

<HTTP Version> can be HTTP/1.1 for the current version or HTTP/1.0 for the older version of the HTTP protocol.

<HTTP Method> is typically one of the following: GET, HEAD, POST, PUT, DELETE, OPTIONS, or TRACE. These are sometimes referred to as HTTP Verbs. The following table describes each of these HTTP methods:

HTTP Method | Description
GET | This is one of the most common methods. It is used to request a resource from the server
HEAD | This is similar to the GET method except that the server returns only meta information about the requested resource and not the actual resource. This method can be used to check whether a resource is present or to determine the type of the resource
POST | This is the other most used method. It is used to send the [Optional Body of Data] to the server. This method is used when the data in a user form needs to be submitted to the server for further processing
PUT | This method is used to create (or replace) a resource on the server whose path is provided by <URI> and whose content is [Optional Body of Data]
DELETE | This method is used to tell the server to delete the resource specified by <URI>. The server can override this request from a client and not delete the resource
OPTIONS | This method is used to get the various features supported by the server or the features supported for the given resource
TRACE | This method is usually used to discover information about the various intermediaries between the client and the server, namely, proxy, firewall, etc. The server loops the received request back in its response, which makes this method useful for diagnostics. Most web servers disable this method due to security vulnerabilities

<Request Header Name><:><Request Header Value> indicates the HTTP request header(s) that the client can send to the server. The following table lists and describes the most commonly used request headers from the client:

Request Header | Description
Accept | This header indicates the types of data the client can handle. Here are some examples:
    Accept: */* -- Indicates it can accept any type of data response
    Accept: text/html -- Indicates it can only accept html data response
Accept-Encoding | This header indicates the type of data encoding (compression types) the client can handle. Here is an example:
    Accept-Encoding: gzip -- Indicates it can accept gzipped data
Authorization | This header carries the user authentication credentials to the server. The most common scheme is Basic authentication. Here is an example:
    Authorization: Basic STa59v3wUPNb
Host | This header indicates the server (and optionally the port) from which the resource <URI> is being requested. Here is an example:
    Host: www.polarsparc.com
If-Modified-Since | This header tells the server to send the content for the requested resource <URI> only if it has been modified since the specified time. Here is an example:
    If-Modified-Since: Sat, 21 Feb 2009 16:50:00 GMT
User-Agent | This header identifies the client software from which the request is sent. Here is an example:
    User-Agent: Mozilla/5.0 (X11; Linux i686, en-US) Gecko/20090209
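
The request structure from Listing.1 can be assembled with plain string handling. The sketch below is only an illustration: the function names `build_request` and `basic_auth_value` and the `example-client` user agent are hypothetical, not part of any library. It also shows how a Basic `Authorization` value is formed, namely the Base64 encoding of `user-id:password`.

```python
import base64

CRLF = "\r\n"


def build_request(method, resource_path, headers, body="", version="HTTP/1.1"):
    """Assemble a request line, the header lines, a blank line, and an optional body."""
    lines = [f"{method} {resource_path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF + body


def basic_auth_value(user_id, password):
    """Return the value of a Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


request = build_request(
    "GET",
    "/",
    {
        "Accept": "text/html",
        "Host": "www.polarsparc.com",
        "User-Agent": "example-client",  # made-up client name
    },
)
print(repr(request))
```

Printing with `repr` makes the `\r\n` line endings and the blank line that ends the headers visible.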

When a Web Browser sends an HTTP request, the Web Server has to send an HTTP response. What does the HTTP response message look like? An HTTP response has the following structure:

Listing.2
<HTTP Version> <Status Code> <Reason Phrase> <CR><LF>
<Response Header Name-1><:> <Response Header Value-1> <CR><LF>
<Response Header Name-2><:> <Response Header Value-2> <CR><LF>
...<CR><LF>
...<CR><LF>
<Response Header Name-n><:> <Response Header Value-n> <CR><LF>
<CR><LF>
[Optional Resource Content]

<Status Code> indicates the result code of processing a client request. <Reason Phrase> is the human readable interpretation of the result code. The following table describes some of the common codes and phrases:

Status Code | Reason Phrase | Description
100 | Continue | Indicates that the server received the request and the client should continue and expect a final response from the server
200 | OK | Indicates the client request was processed successfully
201 | Created | Indicates that the resource corresponding to HTTP PUT was successfully created
301 | Moved Permanently | Indicates the resource indicated by <URI> has been moved to a new location and hence has a new resource path
304 | Not Modified | Indicates that the content has not changed since the last time it was requested by the client
400 | Bad Request | Indicates that the client request is not correct
401 | Unauthorized | Indicates that the client needs to authenticate before accessing the resource
404 | Not Found | Indicates that the resource indicated by <URI> was not found
500 | Internal Server Error | Indicates an internal error on the server
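
The codes and phrases in the table above can be looked up programmatically. As a small sketch, Python's standard `http.HTTPStatus` enumeration maps each code to its reason phrase. The helper `describe_status` is hypothetical, and the grouping of codes by their first digit is an addition of ours that the table above does not spell out.

```python
from http import HTTPStatus

# Status codes are grouped into classes by their first digit.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return '<code> <reason phrase> (<class>)' for a standard status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({STATUS_CLASSES[code // 100]})"


for code in (100, 200, 201, 301, 304, 400, 401, 404, 500):
    print(describe_status(code))
```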

<Response Header Name><:><Response Header Value> indicates the HTTP response header(s) that the server sends to the client. The following table lists and describes the most commonly used response headers from the server:

Response Header | Description
Content-Type | This header indicates the type of [Optional Resource Content]. Here is an example:
    Content-Type: text/xml -- Indicates the type of resource content in the response to be XML
Content-Encoding | This header indicates the type of resource content encoding (compression types). Here is an example:
    Content-Encoding: gzip -- Indicates that the resource content is gzipped
Content-Length | This header indicates the length of the resource content in bytes. Here is an example:
    Content-Length: 1245
Expires | This header indicates the time after which the resource content is to be considered stale. This is basically to invalidate any cache. Here is an example:
    Expires: Sat, 21 Feb 2009 16:30:00 GMT
Last-Modified | This header indicates the last time the resource content was modified. Here is an example:
    Last-Modified: Sat, 21 Feb 2009 16:50:00 GMT
Location | This response header indicates a redirect and provides a new <URI> the client needs to use for the requested resource. Here is an example:
    Location: http://www.polarsparc.com/unknown/location/res_one
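
Going the other way, a raw response in the shape of Listing.2 can be taken apart again. The following is a minimal sketch, not a full HTTP parser: `parse_response` is a hypothetical helper, the sample bytes are made up for illustration, and details such as repeated headers or chunked bodies are deliberately ignored.

```python
def parse_response(raw):
    """Split raw response bytes into version, status code, reason phrase, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: <HTTP Version> <Status Code> <Reason Phrase>
    parts = lines[0].split(" ", 2)
    version, status_code = parts[0], int(parts[1])
    reason = parts[2].strip() if len(parts) > 2 else ""

    # Header names are case-insensitive, so store them in lower case.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, status_code, reason, headers, body


sample = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: http://www.w3.org/\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
version, status_code, reason, headers, body = parse_response(sample)
print(status_code, reason, headers["location"])
# prints: 301 Moved Permanently http://www.w3.org/
```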

Having explored both the HTTP request and the HTTP response messages, we can illustrate the complete web interaction with the following diagram:

HTTP Request and Response
Figure.2

With this we have covered the basics of the HTTP request-response protocol.

We can try some of the HTTP requests using the telnet command to see HTTP in action. Open a new terminal and type the following HTTP request:

$ telnet www.yahoo.com 80


Input.1

GET / HTTP/1.1
Accept: text/html
Host: www.yahoo.com
User-Agent: telnet
<Press Enter twice>

The following is the HTTP response from www.yahoo.com:

Output.1

HTTP/1.1 200 OK 
Date: Sun, 22 Feb 2009 03:46:12 GMT
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"
Cache-Control: private
Vary: User-Agent
X-XRDS-Location: http://open.login.yahooapis.com/openid20/www.yahoo.com/xrds
Last-Modified: Sun, 22 Feb 2009 03:04:24 GMT
Accept-Ranges: bytes
Content-Length: 9490
Connection: close
Content-Type: text/html; charset=utf-8

[<html content here>]

<!-- pbt 1235271720 -->Connection closed by foreign host.

Let's try another HTTP request as follows:

$ telnet www.w3c.org 80


Input.2

GET / HTTP/1.1
Accept: text/html
Host: www.w3c.org
User-Agent: telnet
<Press Enter twice>

The following is the HTTP response from www.w3c.org:

Output.2

HTTP/1.1 301 Moved Permanently 
Date: Sun, 22 Feb 2009 03:43:07 GMT
Server: Apache/2
Location: http://www.w3.org/
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.w3.org/">here</a>.</p>
</body></html>
Connection closed by foreign host.
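
The same kind of exchange can be scripted instead of typed into telnet. The sketch below sends a hand-written request over a plain TCP socket and reads until the server closes the connection. So that it runs without depending on an external site, it talks to a small stand-in server started on the local machine using Python's `http.server` module. The names `HelloHandler`, `send_raw_request`, and `demo_against_local_server` are hypothetical, and the stand-in server answers with an `HTTP/1.0` status line, which is that module's default.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in server for the demo: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the per-request log line


def send_raw_request(host, port, request_text, timeout=5.0):
    """Send the request text as-is and return every byte the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_text.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo_against_local_server():
    """Start the stand-in server on a free local port and send it one GET request."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request = (
            "GET / HTTP/1.1\r\n"
            "Accept: text/html\r\n"
            f"Host: 127.0.0.1:{port}\r\n"
            "User-Agent: telnet\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        return send_raw_request("127.0.0.1", port, request).decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo_against_local_server())
```

Pointing `send_raw_request` at a real host on port 80 should behave much like the telnet sessions above, although the headers returned will differ from server to server.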

This should give you an idea of how the HTTP protocol works!