Professor, CSE Dept, KSIT
Professor, CAOS, IISC
NOTE: Several readers have requested access to the source code described in this series of articles. We are delighted by this response from our readers, and we will be making all the code for this article available at this link: https://github.com/rprustagi/EL-Understanding-Web-Security.git. The code and example samples for earlier articles are accessible at https://github.com/rprustagi/Experiential-Learning.git.
From its inception, the Internet has aimed to enable end-users to quickly and easily access information by interconnecting the computers that store this information. The Internet's design is thus based on an open architecture: individual networks can be designed separately, but they connect to other networks using open protocols. Initial implementations, led by the academic fraternity and research labs, focused on information exchange; security was less of a concern at that time. When web-based information dissemination (via news, email, etc.) took off in the 1990s, security concerns centered on malware detection and prevention. It was not until the advent of web-based commerce that security became critical.
The first version of the web security protocol Secure Sockets Layer (SSL) was developed by Netscape in 1994 to provide secure communication between web servers and web browsers. Working at the Transport and Session layers of the OSI 7-layer model, SSL can support higher-layer (application) protocols such as Hyper Text Transfer Protocol (HTTP), File Transfer Protocol (FTP), Telnet, etc. The SSL protocol supports several encryption schemes, with provisions to support new encryption algorithms in the future. SSL subsequently evolved into Transport Layer Security (TLS), the current version of which is TLS v1.2 (at the time of writing, TLS v1.3 is still in the draft stage). When SSL/TLS is the underlying protocol used by HTTP for web communications, the combination is called HTTPS.
It is important to understand the meaning of, and the difference between, three key terms related to security: authentication, confidentiality and integrity. As a concrete example, suppose person P uses HTTPS to access a website W to carry out an e-commerce transaction T. To begin with, person P would like to authenticate website W, i.e., P wants to be certain that his/her browser is communicating with website W, and not with some other website purporting to be W. To ensure this, HTTPS requires the web server to provide a website certificate containing the website name W, the validity period of the certificate and the identity of a well-known Certificate Authority (CA) which has digitally signed this certificate. Modern browsers are preconfigured to trust certificates issued by well-known CAs such as Verisign, Thawte, GoDaddy, etc. and by their intermediate (subordinate) CAs. Once the browser has accepted the CA's signature on the certificate, it verifies that the certificate is still within its validity period and that the website name in the certificate matches what person P entered in the browser. If either of these checks fails, the browser warns person P that it cannot authenticate website W.
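The two browser-side checks just described (validity period and name match) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular browser implements them: it assumes the certificate has already been parsed into the dictionary shape returned by Python's `ssl.SSLSocket.getpeercert()`, and the function name `check_certificate` is hypothetical. Verifying the CA's signature is left to the TLS library and is not shown.

```python
import ssl
import time


def check_certificate(cert, hostname, now=None):
    """Return a list of problems; an empty list means both checks passed.

    ASSUMPTION: `cert` has the dict shape produced by
    ssl.SSLSocket.getpeercert(). The CA signature is NOT checked here.
    """
    problems = []
    now = time.time() if now is None else now

    # Check 1: the certificate must be inside its validity period.
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    if not (not_before <= now <= not_after):
        problems.append("certificate is outside its validity period")

    # Check 2: the requested name must appear in the certificate.
    # Names come from the DNS entries of subjectAltName; falling back to the
    # subject's commonName is older behaviour (modern browsers use SAN only).
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    if not names:
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.append(value)
    if hostname.lower() not in (name.lower() for name in names):
        problems.append(f"certificate names {names} do not match {hostname!r}")

    return problems
```

For example, a certificate listing `www.example.com` and valid from January 2018 to January 2019 passes for that name at a mid-2018 timestamp, but yields one problem when checked against `www.example.org`, or against the right name at a 2020 timestamp.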
Next, P would like to ensure that sensitive data associated with transaction T (password, bank details, etc.) remains confidential, i.e., the raw data should not be accessible to any third party (such as others in person P's local network, IT staff, or intermediate nodes in the network) while it is being transmitted to W. TLS enables W and P's browser to agree on a shared session key, which is then used to encrypt all messages in that session. Even if this data is captured, it cannot be decrypted (except by brute force over an infeasible time-span of several million years using the fastest supercomputers available). Lastly, P would like to ensure the integrity of the data transmitted to and from W, i.e., it should not be possible for any third party to tamper with the data undetected. HTTPS ensures this too, by protecting each record with a cryptographic integrity check (a message authentication code), so that any modification in transit is detected.
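To make "agree on a shared session key" and "detect tampering" concrete, here is a toy sketch using only Python's standard library. It swaps in a textbook Diffie-Hellman exchange over a deliberately small group (p = 2^127 - 1) and SHA-256/HMAC; these parameters and all function names are assumptions for illustration. Real TLS uses standardized 2048-bit-plus groups or elliptic curves, derives keys with its own PRF, and also encrypts the data, which the standard library cannot do and is not shown.

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: toy public parameters for illustration only. 2**127 - 1 is a
# Mersenne prime, but a group this small must never be used in practice.
P = 2**127 - 1
G = 3


def make_keypair():
    """Pick a secret exponent and compute the value sent in the clear."""
    private = secrets.randbelow(P - 3) + 2  # secret exponent in [2, P-2]
    public = pow(G, private, P)             # G^private mod P
    return private, public


def derive_session_key(my_private, their_public):
    """Both sides compute G^(a*b) mod P and hash it into a 256-bit key."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def tag(key, message):
    """Message authentication code over the message (HMAC-SHA256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, received_tag):
    """Constant-time comparison of the expected and received tags."""
    return hmac.compare_digest(tag(key, message), received_tag)
```

The browser and the server each call `make_keypair()`, exchange only the public values, and then compute identical keys with `derive_session_key`; a tag computed with that key over `b"transfer 100"` will not verify for the altered message `b"transfer 900"`. Note that this exchange on its own authenticates nobody: an attacker in the middle could run it separately with each side, which is why HTTPS binds it to the certificate checks described earlier.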
At this point, you may believe that HTTPS is secure against all types of attacks, such as Man-In-The-Middle (MITM) attacks. In fact, security can be compromised in several ways, and we shall see that MITM attacks are possible. Some vulnerabilities arise from bugs in the implementation of TLS and HTTPS, but these are typically exploitable only by experts, and only for a limited time, since flaws can be patched quickly once they are found. The vulnerabilities we describe below are much easier to exploit and are often hard to eradicate, making them particularly worrisome.
We will not cover general web attacks such as Cross-Site Scripting (XSS), injection attacks, Distributed Denial of Service (DDoS) attacks, etc. in this article, as these are not specific to HTTPS.
Unfortunately, many sites serve content using HTTPS but do not deploy SSL certificates correctly. A website certificate can be invalid for several reasons: it may have expired; it may refer to an outdated website name (the website's name may have changed after the certificate was deployed); a certificate issued for one domain (say, example.com) may be deployed on a subdomain (say, shop.example.com) that it does not cover; or the browser may consider the certificate's CA untrustworthy (developers often use self-signed certificates while testing a website, but these will be rejected by users' browsers). Obtaining a certificate from a well-known authority can be expensive, and it requires additional procedural work, which also adds to the cost.
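The subdomain case is a frequent source of confusion, so the following sketch shows a simplified name-matching rule. It is an assumption-labelled subset of the rules in RFC 6125, not a complete implementation, and the function name `name_matches` is hypothetical: an exact name matches only itself, and a wildcard is honoured only as the whole left-most label, standing for exactly one label.

```python
def name_matches(pattern, hostname):
    """Simplified certificate name matching (ASSUMPTION: subset of RFC 6125).

    - Comparison is case-insensitive; a trailing dot is ignored.
    - "*" is honoured only as the entire left-most label and covers exactly
      one label. Partial wildcards such as "w*" are compared literally,
      so they never match a real hostname.
    - A wildcard needs at least two fixed labels after it ("*.com" is refused).
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False  # a wildcard can never stand for zero or several labels

    first, *rest = pattern_labels
    if first == "*":
        return len(rest) >= 2 and host_labels[0] != "" and rest == host_labels[1:]
    return pattern_labels == host_labels
```

Under these rules a certificate for `example.com` does not cover `shop.example.com`, and even `*.example.com` covers `shop.example.com` but neither the bare `example.com` nor the deeper `a.b.example.com`. A site that reuses its main certificate on subdomains therefore triggers exactly the warnings described above.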
A common vulnerability relates to the way in which most web browsers handle these invalid website certificates: they display "invalid certificate" warnings and typically offer an easy option to ignore them (known as a "click-through" option). Attackers count on the fact that many users will take this easy option and will therefore be compromised.
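The same choice exists for programs that speak HTTPS, which makes the risk of clicking through easy to see. The sketch below uses Python's `ssl` module; the helper names are hypothetical. A default context verifies the certificate chain and the hostname, so an invalid certificate makes the handshake fail with `ssl.SSLCertVerificationError`. The second context is the programmatic equivalent of clicking through: it accepts any certificate, from anyone, for any name, which is exactly what a MITM attacker needs.

```python
import ssl


def strict_context():
    """What a careful client uses: verify the chain and the hostname."""
    return ssl.create_default_context()  # CERT_REQUIRED + check_hostname=True


def click_through_context():
    """Equivalent of 'ignore the warning and continue': checks nothing.

    Shown only to illustrate the risk; do not use this in real code.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be switched off before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # accept any certificate at all
    return ctx
```

The order of the two assignments matters: Python refuses to set `verify_mode` to `CERT_NONE` while `check_hostname` is still enabled.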
An additional challenge arises when a user behind a proxy connects to a website using HTTPS. When the user enters the URL, the local router/firewall redirects the request to the proxy server, which responds with an authentication web page. Because it is the proxy server that establishes the SSL connection and returns this page, the SSL certificate it presents contains the proxy server's own website name, which differs from the website name entered by the user. The browser detects this discrepancy and displays an "invalid certificate" warning.
Faced with this warning, careful users may refuse to proceed even though the website is legitimate. Alternatively, the proxy server can be set up to simply reject the connection; this results in a timeout, which the user perceives as a broken network connection. Since both these options give the user a poor experience, the network could instead allow direct HTTPS access to websites, bypassing the proxy server. However, this could open security loopholes and would likely violate the security policies of the local network. Thus, most network setups with proxy servers require that users start browsing with HTTP, and websites cannot stop supporting HTTP-based access.
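Why plain HTTP is needed becomes clear from what the proxy actually does with an unauthenticated user's first request. The sketch below, built on Python's standard `http.server` module, answers every HTTP request with a redirect to a login page. The portal URL and the class name are hypothetical. This works only because plain HTTP carries no server authentication, so the browser accepts the reply as if it came from the site the user typed; the same trick over HTTPS would require a certificate for that site's name, which the proxy does not have.

```python
import http.server

# ASSUMPTION: hypothetical login page of the proxy, for illustration only.
LOGIN_URL = "http://portal.example/login"


class CaptivePortalHandler(http.server.BaseHTTPRequestHandler):
    """Redirect every plain-HTTP request to the proxy's login page."""

    def do_GET(self):
        # Whatever Host the user asked for, reply with a redirect. Nothing in
        # plain HTTP lets the browser detect that the real site never answered.
        self.send_response(302)
        self.send_header("Location", LOGIN_URL)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet
```

To try it, serve it with `http.server.HTTPServer(("127.0.0.1", 8080), CaptivePortalHandler).serve_forever()` and request any URL through that address: the response is a 302 pointing at `LOGIN_URL`, regardless of the `Host` header sent.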