CSE Department, KSIT
CAOS – Indian Institute of Science
TCP, the transport-layer protocol in the network stack, provides end-to-end reliable delivery to the application layer. A reliable connection implies that data will be delivered in-order, error-free and without loss or duplication. Developers will occasionally face situations where TCP connections do not work as expected. Rather than just blaming “the network” for some unspecified fault and hoping that the problem will resolve itself, a developer with a clear understanding of the low-level details of TCP connections will be able to accurately diagnose the trouble. In this article, we delve into the three phases of a TCP connection between a client and a server: connection setup, data transfer and tear down. In each of these phases, the client and server exchange messages, and the TCP connection transitions through different states. We will explain the conditions that define each of these TCP states, how the connection transitions from one state to another, and what this means for debugging issues that arise during connection setup.
Internet usage today is dominated by web browsers and mobile apps accessing online content. HTTP is the underlying protocol used by browsers, and most mobile apps also make use of REST (Representational State Transfer)/HTTP as well as other protocols. All such communications make use of TCP as the underlying protocol. TCP provides end-to-end reliable delivery between applications on two machines (end points of communications), located anywhere in the world. For example, when a user types google.com to search, the web browser establishes a TCP connection between the user’s browser and one of Google’s servers.
TCP at the transport layer is implemented only in end-systems. Intermediate internet routers do not implement TCP and are “unaware” of TCP data packets unless they make use of Network Address Port Translation (NAPT). Reliable delivery in TCP means that whenever TCP delivers data to the application, the delivered data is uncorrupted, in-order, and free of loss or duplicates. A common belief is that TCP reliability also implies guaranteed delivery. This is a misconception. For example, when an application (browser) hands data to the transport layer at the sending side and the transport layer accepts it (e.g., TCP send() returns success to the application), this means only that if the transport layer succeeds in delivering the data to the other end of the communication, the data will arrive in-order, error-free and without any loss or duplication. If TCP is unable to deliver the data to the other end (e.g., if the underlying network connecting the two machines breaks down and there is no alternative route available), this does not violate the definition of TCP reliability. TCP is also characterized as a streaming service, which means that it treats the entire input data from the sender as a stream of bytes and does not preserve the boundaries between individual send() calls.
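The two points above (send() succeeding before the peer has read anything, and the byte-stream nature of TCP) can be illustrated with a small loopback experiment. The following is a minimal sketch in Python, not a definitive test harness; the function name stream_demo is hypothetical, and the experiment assumes small payloads that fit in the kernel socket buffers.

```python
import socket


def stream_demo(chunks):
    """Send each chunk with its own sendall() call; return what the peer reads."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname(), timeout=5)
    receiver, _ = listener.accept()
    try:
        for chunk in chunks:
            # sendall() returning without error only means the local TCP
            # stack accepted the bytes; the receiver has not read them yet.
            sender.sendall(chunk)
        sender.close()  # end of stream: the receiver will see an empty read

        received = bytearray()
        while True:
            data = receiver.recv(4096)
            if not data:  # empty read: the sender closed its side
                break
            received.extend(data)
        return bytes(received)
    finally:
        receiver.close()
        listener.close()
        sender.close()  # harmless if already closed


if __name__ == "__main__":
    # Three separate sends arrive as one undifferentiated byte stream.
    print(stream_demo([b"GET ", b"/index", b".html"]))
```

The receiver cannot tell how many send calls produced the bytes it reads, so an application that needs message boundaries has to add its own framing, such as a length prefix or a delimiter.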
TCP communication involves three phases: (1) connection setup, (2) data transfer, and (3) tear down. For each TCP connection, the two machines communicating via this connection independently maintain information about the present state of this connection from their own perspective (as explained below).
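To make the three phases concrete, the following sketch maps them onto the socket calls a client typically issues. It is an illustration under stated assumptions rather than a reference implementation: the helper names echo_once and run_three_phases are hypothetical, and the server is a throwaway echo loop running on the loopback interface.

```python
import socket
import threading


def echo_once(listener):
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # empty read: the client has started tear down
                break
            conn.sendall(data)


def run_three_phases(payload):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_once, args=(listener,))
    server.start()

    # Phase 1: connection setup (the handshake happens inside connect).
    client = socket.create_connection(listener.getsockname(), timeout=5)

    # Phase 2: data transfer.
    client.sendall(payload)
    echoed = b""
    while len(echoed) < len(payload):
        data = client.recv(1024)
        if not data:
            break
        echoed += data

    # Phase 3: tear down. shutdown() closes our sending direction only;
    # we then wait for the server to close its side before releasing the socket.
    client.shutdown(socket.SHUT_WR)
    while client.recv(1024):
        pass
    client.close()

    server.join()
    listener.close()
    return echoed


if __name__ == "__main__":
    print(run_three_phases(b"ping"))
```

Note that the client and the server each track the connection separately: the client finishes sending first, while the server is still free to send until it closes its own side.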
Each TCP connection can be in one of the 11 possible states listed in Table 1. When the TCP connection operates normally (i.e., there is no packet loss or corruption), the machines at the two ends correctly update the state of the TCP connection, and an application developer or someone maintaining network operations does not need to be aware of the state-level details. However, if the network encounters abnormalities such as link breakdown or intermediate router crashes that result in packet loss and/or retransmission in Phases 1 or 3¹, an understanding of these details is critical to diagnose the problem and resolve it.
Table 1: TCP connection states in the various phases of communication
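For readers who want to observe these states on a live machine, the sketch below lists the 11 standard TCP states and counts how many connections are currently in each one on a Linux host. Several things here are assumptions on our part: the grouping of states by phase may not match Table 1 exactly, the names STATES_BY_PHASE, LINUX_STATE_CODES and count_states are hypothetical, and the numeric codes are those we understand the Linux kernel to report in the "st" column of /proc/net/tcp (IPv4 sockets only).

```python
from collections import Counter
from pathlib import Path

# The 11 standard TCP states. The grouping by phase is an assumption made
# for illustration; CLOSED is both the starting and the final state.
STATES_BY_PHASE = {
    "connection setup": ["CLOSED", "LISTEN", "SYN_SENT", "SYN_RECEIVED"],
    "data transfer": ["ESTABLISHED"],
    "tear down": ["FIN_WAIT_1", "FIN_WAIT_2", "CLOSE_WAIT",
                  "CLOSING", "LAST_ACK", "TIME_WAIT"],
}

# Hexadecimal state codes as reported by Linux in /proc/net/tcp (assumed).
LINUX_STATE_CODES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECEIVED",
    0x04: "FIN_WAIT_1", 0x05: "FIN_WAIT_2", 0x06: "TIME_WAIT",
    0x07: "CLOSED", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        code = int(fields[3], 16)  # fourth column holds the state code
        counts[LINUX_STATE_CODES.get(code, f"UNKNOWN_{code:#04x}")] += 1
    return counts


if __name__ == "__main__":
    path = Path("/proc/net/tcp")
    if path.exists():  # Linux only; other systems expose this differently
        for state, n in sorted(count_states(path.read_text()).items()):
            print(f"{state:13} {n}")
```

A large and growing count in a single state is often the first visible hint that one of the two ends is not progressing through the connection as expected.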
The key purpose of this article is to provide this understanding. In the experience of the authors, most developers are only familiar with the network programming interfaces (known as socket() programming) provided by the TCP stack for their chosen programming language. When such developers encounter abnormalities as noted above, they are often unable to resolve such issues. A naïve yet common “solution” is to close (discard) the current connection and start afresh with a new connection, in the hope that the issue does not recur. Apart from being wasteful (in terms of bandwidth and computing resources) in the near-term, this approach leaves the problem unresolved and can prove costly for enterprises in the long term. This article will focus on abnormalities that occur during Phase 1, and we will defer the discussion related to Phase 2 and Phase 3 to the next article.
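As an alternative to blindly closing and reconnecting, a program can at least record how a connection attempt failed, since different failures in Phase 1 point to different causes. The following is a minimal sketch under our own assumptions, not a complete diagnostic tool; the function name probe_connection_setup and the three-second default timeout are hypothetical choices.

```python
import errno
import socket


def probe_connection_setup(host, port, timeout=3.0):
    """Attempt a TCP connection and classify the outcome of Phase 1."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "established"  # the handshake completed
    except ConnectionRefusedError:
        # The remote host answered but rejected the attempt, which
        # typically means nothing is listening on that port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer arrived in time: the request or its reply may have been
        # lost or filtered, or the server may be too busy to respond.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to the host.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Probe a local port; the result depends on what is running locally.
    print(probe_connection_setup("127.0.0.1", 8080))
```

A "refused" result suggests checking the server process, whereas a "timeout" suggests checking the network path or any filtering in between. Retrying without recording this distinction discards exactly the information needed for diagnosis.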
¹ Such abnormalities have limited impact in Phase 2. If they occur, packets are retransmitted and acknowledged at some later point in time, but the TCP connection remains in the ESTABLISHED state throughout this phase.