Understanding the HTTP Protocol
In this first article, we present experiments that give students a deeper understanding of the basic HTTP protocol, which enables a web client to talk to a web server and retrieve web pages. This operation forms the basis of almost every internet-based interaction, and most popular textbooks (e.g., Kurose and Ross, Peterson and Davie, Forouzan) introduce the topic with illustrative HTTP requests and responses, such as those shown below (Table 1 and Table 2). All these textbooks clearly specify the structure of both the request and response messages, but only  gently encourages students to dissect the workings of the protocol. We now describe experiments that help students explore in significantly greater detail how HTTP headers affect application behavior, to improve their grasp of the concepts. Students who fail to understand these concepts beyond the level of definitions struggle to make efficient use of the HTTP protocol when they develop network applications.
The experiments described below require just two machines connected via a simple network, for example two laptops connected through a Wi-Fi access point or directly with an Ethernet cable.
Table 1: HTTP request message
Table 2: HTTP response message
Before defining the experiments and describing how they are conducted, let us briefly review the components of a web page and the HTTP protocol. A web page consists of HTML content together with embedded objects such as images and videos. The client communicates with a web server by sending an HTTP request message, and the server replies with a response message. The request message contains the URL of the web page. The URL consists of two parts: the hostname of the server where the web page resides, and the path of the web page object on the server. For example, the URL http://myweb.com/welcome.html has myweb.com as the hostname and /welcome.html as the path of the web page on the server.
An example of an HTTP request message is shown in Table 1. The first line (called the request line) has three parts: the HTTP method, the URL, and the protocol version (HTTP/1.1). The request line is followed by a number of request headers, each on a separate line. There is no request data in this request message. An example of the corresponding HTTP response message is shown in Table 2. The first line of the response (called the status line) consists of three parts: the protocol version, the status code (200), and a description of the status code (OK). The status line is followed by response headers, which are followed by an empty line and then the response data (i.e., the contents of the web page). The status code tells the client how to interpret the response, and belongs to one of the following five categories:
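The split of a URL into hostname and path can be checked with a few lines of Python. This is only an illustrative sketch using the standard library's urllib.parse module; the helper name split_url is our own and is not part of the experiments. Note that the path is reported with its leading slash, which is how it appears in the request line.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (hostname, path) for an HTTP URL."""
    parts = urlsplit(url)
    # An empty path refers to the root document, which is requested as "/".
    return parts.hostname, parts.path or "/"


if __name__ == "__main__":
    print(split_url("http://myweb.com/welcome.html"))
    # ('myweb.com', '/welcome.html')
```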
- 1xx: Informational
- 2xx: Success
- 3xx: Redirection (further client action needed)
- 4xx: Client error
- 5xx: Server error
The status code consists of three digits: the first digit identifies the category, and the remaining two digits (xx) identify the specific response within the category. The category 1xx indicates that the request has been received and is being processed. The category 2xx indicates that the request was successfully processed and the requested content is part of the response. The most common status code is 200 (OK), meaning that the complete requested content has been sent. The category 3xx indicates that the client needs to take additional action for the request to be completed; the request cannot be served in its current form. The most common codes in this category are 301 and 302, which inform the client that the resource has moved to a new URL (permanently for 301, temporarily for 302) and that the client needs to make a new request using this new URL. The category 4xx indicates that the client's request contains an error and cannot be served; the client needs to correct the error. The most common code in this category is 404 (Not Found), which means that no content exists at the specified path. The last category, 5xx, informs the client that the server cannot serve the request because something unexpected happened on the server side. The most common code in this category is 500 (Internal Server Error). In this article, we describe simple experiments in which students trigger and observe status codes 200, 400, 403, and 404.
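The structure of the status line and the meaning of the first digit can be made concrete with a short Python sketch. The function names parse_status_line and category are hypothetical helpers introduced here for illustration, and the category labels follow the standard HTTP terminology rather than any particular textbook.

```python
# Category names keyed by the first digit of the status code.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def parse_status_line(line):
    """Split a status line into (version, code, reason)."""
    # The reason phrase may contain spaces, so split at most twice.
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


def category(code):
    """Map a three-digit status code to its category name."""
    return CATEGORIES.get(code // 100, "Unknown")


if __name__ == "__main__":
    for line in ("HTTP/1.1 200 OK", "HTTP/1.1 404 Not Found"):
        version, code, reason = parse_status_line(line)
        print(version, code, reason, "->", category(code))
```

Students can feed the status lines they capture in the experiments into parse_status_line to confirm which category each response falls into.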
Figure 1: Basic setup for understanding the HTTP protocol
HTTP/1.1 is currently the most commonly used version of the protocol, so we focus on how HTTP works in version 1.1. For our experiments, we use the setup shown in Figure 1, consisting of one client and one server. The web server runs Ubuntu with the Apache web server software. The client is also an Ubuntu system, but Windows- and Mac-based systems can also be used for the client.
Suppose the IP address of our server is 10.1.1.1 and that of our client is 10.1.1.101. To access our server by the name myweb.com while avoiding DNS-related issues, we simply add the entry “10.1.1.1 myweb.com” to the /etc/hosts file of the client. We assume that the Apache web server’s root directory (DocumentRoot, from where web pages are served) is defined as /var/www/html. In this directory, create a simple HTML page named welcome.html (as shown in Table 3).
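Before running the experiments, it can be useful to confirm that the hosts entry is in place. The following is a minimal sketch, assuming the usual hosts-file layout of an IP address followed by one or more names, with "#" starting a comment; the function name hosts_lookup is hypothetical.

```python
def hosts_lookup(hosts_text, name):
    """Return the IP address mapped to `name` in hosts-file text, or None."""
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address; the rest are names for that address.
        address, *names = line.split()
        if name in names:
            return address
    return None
```

On the client, calling hosts_lookup(open("/etc/hosts").read(), "myweb.com") should return "10.1.1.1" once the entry described above has been added.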
Table 3: A simple HTML web page
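With the page in place, the request and response can also be observed from a small script instead of a browser. The sketch below is one possible way to do this with a plain TCP socket from Python's standard library; it is not the tool used in the experiments. The function names build_request and fetch, the use of port 80, and the "Connection: close" header (which lets the script simply read until the server closes the connection) are assumptions made for illustration.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",         # mandatory header in HTTP/1.1
        "Connection: close",     # ask the server to close after replying
        "",                      # empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=80, timeout=5.0):
    """Send a GET request and return (status_line, headers, body)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    raw = b"".join(chunks)
    # Headers and body are separated by the first empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *headers = head.decode("iso-8859-1").split("\r\n")
    return status_line, headers, body
```

If the setup matches the description above, calling fetch("myweb.com", "/welcome.html") on the client should return a status line beginning with HTTP/1.1 200, the list of response headers, and the bytes of welcome.html as the body. Requesting a path that does not exist is an easy way to see a 404 status line instead.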