Home > 2issue > Experiential Learning of Networking Technologies- HTTP Protocol Mechanisms for High Performance Applications

Experiential Learning of Networking Technologies- HTTP Protocol Mechanisms for High Performance Applications

Cache Control and Content Caching

The webpage pictures.html that we have defined has only 10 images embedded in it. A typical web page contains many more images [9], and several of these images may be common across multiple pages associated with a particular website. When multiple such web pages from the same website are accessed, the browser accesses the common images repeatedly from the same URL. Images can be large, so downloading the same image repeatedly not only wastes network bandwidth and computing resources for both the client and the server, it also delays the display of the web page. To overcome these inefficiencies, images are typically cached by browsers. However, before a browser can use an image from its cache, it must ensure that this image has not changed on the web server since the time it was cached.If the image has changed, the browser must replace the outdated image in its cache with a freshly downloaded version of the image. However, if the image has not changed, the browser can display the content from its cache rather than fetching it again from the web server, significantly improving the web page display performance.

The HTTP header Cache-Control is the key header that controls how cached content should be used by the browser and when this content expires (i.e., when it needs to be refreshed). One of the fields of this header is max-age, which defines the maximum cache validity duration (in seconds)9. The value associated with max-ageis relative to the time when the request is made. Once these many seconds have elapsed, the cache becomes invalid and needs to be refreshed. The Cache-Control header has several other fields. The field public indicates that any entity (e.g., a proxy server in the path between the client and the web server) can cache and use this content. In contrast, the field private indicates that this content can be cached only by the client, and should not be cached by any intermediary proxy server. The field no-cache indicates that the web client can cache this content (despite the name of this field), but this content must be revalidated with the web server before the browser renders it. The field no-store indicates that the content cannot be cached at all. A flowchart depicting the use of these Cache-Control fields is shown in Figure 8. Further details can be found in the following references:[6], [8],[10].

Figure 8: Processing of Cache-Control fields

In addition to Cache-Control, the HTTP headers relevant to understand caching areIf-Modified-Since and If-None-Match in the HTTP request, as well as ETag and Last-Modified in the HTTP response. The Last-Modified header should not be confused with the Date header in the response, which only indicates the date and time at which this response was sent by the web server. (It has no bearing on when the content was created or updated.) An example of an HTTP request with its accompanying headerswhen fetching an image is shown in Listing 2. The corresponding HTTP response with headers from the web server is shown in Listing 3. In this case, the HTTP status 304 Not Modified indicates that the browser should use the content from its cache. We will now understand how the browser and the web server collaborate to reach this decision.

Listing 2 : A typical HTTP Request header for a cached image

GET /img/img-09.jpg HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Firefox/54.0
Accept: */*
Accept-Language: en-US,en;q=0.7,hi;q=0.3
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
If-Modified-Since: Sat, 05 Aug 2017 16:44:12 GMT
If-None-Match: “5009-5560452ce2725”
Cache-Control: max-age=0

Listing 3: The HTTP response from the web server for the cached image requested by the browser

HTTP/1.1 304 Not Modified
Date: Sun, 06 Aug 2017 10:12:36 GMT
Server: Apache/2.4.18 (Ubuntu)
Connection: Keep-Alive
Keep-Alive: timeout=5, max=7
ETag: “5009-5560452ce2725”

First, consider the case when the browser requests the web server for some resource for the very first time. The web server’s response will contain the header Last-Modified followed by the date and time at which this resource was updated/modified on the web server. The browser stores this Last-Modified date while caching the contents of this resource. When the browser needs to reload this cached resource, it makes an HTTP request with the stored value of Last-Modified in the If-Modified-Since request header. The web server compares the date in the request header with the last modification date of the requested resource. If the latter date is earlier, the web server does not send the contents of the requested resource but instead sends the HTTP Response with status code value 304 (Not Modified) to inform the browser that it can safely render the resource from its cache.

Experimenting with Content Caching

To gain hands-on experience with caching, open the web developer tools in Firefox (Tools ->Web Developer ->Network). Enter the URL corresponding to a previously cached resource (e.g.,http://myweb.com/img/img-05.jpg). Under the Header column, select Raw headers to show both the request and response headers (as shown in Figure 9).














Figure 9: Request and Response headers when cached content is accessed

The request contains the header If-Modified-Since: Sat,12 Aug 2017 11:19:32 GMT. The web server responds with the HTTP Status code 304, so the browser displays the cached image. To see how this header changes when the cached content becomes outdated, change the modified date of this image on the web server (e.g.,sudo touch /var/www/html/img/img-05.jpg). When the page is refreshed, the web server recognizes that the document’s modified date is later than the value received in the request header If-Modified-Since. In this case, the web server responds with the full image content with status code 200 OK.
A complication arises when a resource on the web server is modified by replacing it with a version that was created earlier. In this case, the methodology described above would cause the browser to render the cached content, despite the modification made to the resource. To deal with such situations, the HTTP protocol provides the ETag header which is a unique signature associated with the resource. The browser stores the ETag value alongside the cached content, and sends this value in the request header. For example, the request in Figure 9 includes the header If-None-Match: “3e80-5568c9a9cc40e”. The web server’s response includes the same value in the ETag header. If the resource had been modified, the web server would assign it a different ETag, and would therefore response with the updated resource and a status code in the category {2xx}.

[9] A related HTTP header is Expires, which has the same meaning as the max-age field in the cache-control header. When both cache-control and Expires headers are present, the former takes precedence and the latter is ignored.


Leave a Comment:

Your email address will not be published. Required fields are marked *