Network Working Group                                          R. Braden
Request for Comments: 1644                                           ISI
Category: Experimental                                         July 1994

                T/TCP -- TCP Extensions for Transactions
                        Functional Specification

Status of this Memo

This memo describes an Experimental Protocol for the Internet community, and requests discussion and suggestions for improvements. It does not specify an Internet Standard. Distribution is unlimited.

Abstract

This memo specifies T/TCP, an experimental TCP extension for efficient transaction-oriented (request/response) service. This backwards-compatible extension could fill the gap between the current connection-oriented TCP and the datagram-based UDP.

This work was supported in part by the National Science Foundation under Grant Number NCR-8922231.

Table of Contents

1. INTRODUCTION
2. OVERVIEW
   2.1 Bypassing the Three-Way Handshake
   2.2 Transaction Sequences
   2.3 Protocol Correctness
   2.4 Truncating TIME-WAIT State
   2.5 Transition to Standard TCP Operation
3. FUNCTIONAL SPECIFICATION
   3.1 Data Structures
   3.2 New TCP Options
   3.3 Connection States
   3.4 T/TCP Processing Rules
   3.5 User Interface
4. IMPLEMENTATION ISSUES
   4.1 RFC-1323 Extensions
   4.2 Minimal Packet Sequence
   4.3 RTT Measurement
   4.4 Cache Implementation
   4.5 CPU Performance
   4.6 Pre-SYN Queue
6. ACKNOWLEDGMENTS
7. REFERENCES
APPENDIX A. ALGORITHM SUMMARY
Security Considerations
Author's Address

Braden [Page 1] RFC 1644 Transaction/TCP July 1994

1. INTRODUCTION

TCP was designed around the virtual circuit model, to support streaming of data. Another common mode of communication is a client-server interaction, a request message followed by a response message. The request/response paradigm is used by application-layer protocols that implement transaction processing or remote procedure calls, as well as by a number of network control and management protocols (e.g., DNS and SNMP). Currently, many Internet user programs that need request/response communication use UDP, and when they require transport protocol functions such as reliable delivery they must effectively build their own private transport protocol at the application layer.

Request/response, or "transaction-oriented", communication has the following features:

(a) The fundamental interaction is a request followed by a response.

(b) An explicit open or close phase may impose excessive overhead.

(c) At-most-once semantics is required; that is, a transaction must not be "replayed" as the result of a duplicate request packet.

(d) The minimum transaction latency for a client should be RTT + SPT, where RTT is the round-trip time and SPT is the server processing time.

(e) In favorable circumstances, a reliable request/response handshake should be achievable with exactly one packet in each direction.
This memo concerns T/TCP, a backwards-compatible extension of TCP to provide efficient transaction-oriented service in addition to virtual-circuit service. T/TCP provides all the features listed above, except for (e); the minimum exchange for T/TCP is three segments.

In this memo, we use the term "transaction" for an elementary request/response packet sequence. This is not intended to imply any of the semantics often associated with application-layer transaction processing, like 3-phase commits. It is expected that T/TCP can be used as the transport layer underlying such an application-layer service, but the semantics of T/TCP is limited to transport-layer services such as reliable, ordered delivery and at-most-once operation.

An earlier memo [RFC-1379] presented the concepts involved in T/TCP. However, the real-world usefulness of these ideas depends upon practical issues like implementation complexity and performance. To help explore these issues, this memo presents a functional specification for a particular embodiment of the ideas presented in RFC-1379. The specific algorithms in this memo, though, represent a later evolution than RFC-1379. In particular, Appendix A in RFC-1379 explained the difficulties in truncating TIME-WAIT state. Experience with an implementation of the RFC-1379 algorithms in a workstation later showed, however, that accumulation of TCB's in TIME-WAIT state is an intolerable problem; this necessity led to a simple solution for truncating TIME-WAIT state, described in this memo.

Section 2 introduces the T/TCP extensions, and Section 3 contains the complete specification of T/TCP. Section 4 discusses some implementation issues, and Appendix A contains an algorithmic summary. This document assumes familiarity with the standard TCP specification [STD-007].

2. OVERVIEW

The TCP protocol is highly symmetric between the two ends of a connection.
This symmetry is not lost in T/TCP; for example, T/TCP supports TCP's symmetric simultaneous open from both sides (Section 2.3 below). However, transaction sequences use T/TCP in a highly asymmetrical manner. It is convenient to use the terms "client host" and "server host" for the host that initiates a connection and the host that responds, respectively.

The goal of T/TCP is to allow each transaction, i.e., each request/response sequence, to be efficiently performed as a single incarnation of a TCP connection. Standard TCP imposes two performance problems for transaction-oriented communication. First, a TCP connection is opened with a "3-way handshake", which must complete successfully before data can be transferred. The 3-way handshake adds an extra RTT (round trip time) to the latency of a transaction.

The second performance problem is that closing a TCP connection leaves one or both ends in TIME-WAIT state for a time 2*MSL, where MSL is the maximum segment lifetime (defined to be 120 seconds). TIME-WAIT state severely limits the rate of successive transactions between the same (host,port) pair, since a new incarnation of the connection cannot be opened until the TIME-WAIT delay expires. RFC-1379 explained why the alternative approach, using a different user port for each transaction between a pair of hosts, also limits the transaction rate: (1) the 16-bit port space limits the rate to 2**16/240 (about 273) transactions per second, and (2) more practically, an excessive amount of kernel space would be occupied by TCP state blocks in TIME-WAIT state [RFC-1379].

T/TCP solves these two performance problems for transactions, by (1) bypassing the 3-way handshake (3WHS) and (2) shortening the delay in TIME-WAIT state.

2.1 Bypassing the Three-Way Handshake

T/TCP introduces a 32-bit incarnation number, called a "connection count" (CC), that is carried in a TCP option in each segment.
A distinct CC value is assigned to each direction of an open connection. A T/TCP implementation assigns monotonically increasing CC values to successive connections that it opens actively or passively.

T/TCP uses the monotonic property of CC values in initial <SYN> segments to bypass the 3WHS, using a mechanism that we call TCP Accelerated Open (TAO). Under the TAO mechanism, a host caches a small amount of state per remote host. Specifically, a T/TCP host that is acting as a server keeps a cache containing the last valid CC value that it has received from each different client host. If an initial <SYN> segment (i.e., a segment containing a SYN bit but no ACK bit) from a particular client host carries a CC value larger than the corresponding cached value, the monotonic property of CC's ensures that the <SYN> segment must be new and can therefore be accepted immediately. Otherwise, the server host does not know whether the <SYN> segment is an old duplicate or was simply delivered out of order; it therefore executes a normal 3WHS to validate the <SYN>. Thus, the TAO mechanism provides an optimization, with the normal TCP mechanism as a fallback.

The CC value carried in non-<SYN> segments is used to protect against old duplicate segments from earlier incarnations of the same connection (we call such segments 'antique duplicates' for short). In the case of short connections (e.g., transactions), these CC values allow TIME-WAIT state delay to be safely truncated, as discussed in Section 2.3.

T/TCP defines three new TCP options, each of which carries one 32-bit CC value. These options are named CC, CC.NEW, and CC.ECHO. The CC option is normally used; CC.NEW and CC.ECHO have special functions, as follows.

(a) CC.NEW

Correctness of the TAO mechanism requires that clients generate monotonically increasing CC values for successive connection initiations. These values can be generated using a simple global counter.
There are certain circumstances (discussed below in Section 2.2) when the client knows that monotonicity may be violated; in this case, it sends a CC.NEW rather than a CC option in the initial <SYN> segment. Receiving a CC.NEW causes the server to invalidate its cache entry and do a 3WHS.

(b) CC.ECHO

When a server host sends a <SYN,ACK> segment, it echoes the connection count from the initial <SYN> in a CC.ECHO option, which is used by the client host to validate the <SYN,ACK> segment.

Figure 1 illustrates the TAO mechanism bypassing a 3WHS. The cached CC values, denoted by cache.CC[host], are shown on each side. The server host compares the new CC value x in segment #1 against x0, its cached value for client host A; this comparison is called the "TAO test". Since x > x0, the <SYN> must be new and can be accepted immediately; the data in the segment can therefore be delivered to the user process B, and the cached value is updated. If the TAO test failed (x <= x0), the server host would do a normal three-way handshake to validate the <SYN> segment, but the cache would not be updated.

        TCP A  (Client)                               TCP B (Server)
        _______________                               ______________

                                                        cache.CC[A]
                                                             V

                                                          [ x0 ]

    #1        --> <SYN, data1, CC=x> -->      (TAO test OK (x > x0) =>
                                               data1->user_B and
                                               cache.CC[A]= x; )

                                                          [ x ]

    #2        <-- <SYN, ACK(data1), data2, CC=y, CC.ECHO=x> <--
        (data2->user_A;)

              Figure 1.  TAO: Three-Way Handshake is Bypassed

The CC value x is echoed in a CC.ECHO option in the <SYN,ACK> segment (#2); the client side uses this option to validate the segment. Since segment #2 is valid, its data2 is delivered to the client user process. Segment #2 also carries B's CC value; this is used by A to validate non-SYN segments from B, as explained in Section 2.4.

Implementing the T/TCP extensions expands the connection control block (TCB) to include the two CC values for the connection; call these variables TCB.CCsend and TCB.CCrecv (or CCsend, CCrecv for short). For example, the sequence shown in Figure 1 sets TCB.CCsend = x and TCB.CCrecv = y at host A, and vice versa at host B.
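The server-side TAO test and cache update shown in Figure 1 might be sketched as follows. This is only an illustrative sketch, not the memo's specification: the names TaoServer, cache_cc, and on_initial_syn are hypothetical, an absent or invalidated cache entry is assumed to fail the test, and 32-bit wraparound of CC values is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaoServer:
    """Hypothetical server-side TAO state: one cached CC value per client host."""
    cache_cc: Dict[str, Optional[int]] = field(default_factory=dict)

    def on_initial_syn(self, client: str, seg_cc: int, is_cc_new: bool = False) -> str:
        """Return "accept" if the 3WHS can be bypassed, otherwise "3whs"."""
        if is_cc_new:
            # CC.NEW: invalidate the cache entry and fall back to a 3WHS.
            self.cache_cc[client] = None
            return "3whs"
        cached = self.cache_cc.get(client)
        # Assumption: with no usable cached value, the test cannot succeed.
        if cached is not None and seg_cc > cached:
            # TAO test OK (x > x0): the SYN must be new, so update the cache.
            self.cache_cc[client] = seg_cc
            return "accept"
        # TAO test failed (x <= x0): normal 3WHS, cache left unchanged.
        return "3whs"


server = TaoServer(cache_cc={"A": 10})      # x0 = 10 cached for client A
first = server.on_initial_syn("A", 11)      # x = 11 > x0, so "accept"
second = server.on_initial_syn("A", 11)     # same CC again, so "3whs"
```

Here the second call stands for a duplicate of the first SYN: because its CC is no longer larger than the cached value, it cannot be accepted without a handshake.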
Any segment that is received with a CC option containing a value SEG.CC different from TCB.CCrecv will be rejected as an antique duplicate.

2.2 Transaction Sequences

T/TCP applies the TAO mechanism described in the previous section to perform a transaction sequence. Figure 2 shows a minimal transaction, when the request and response data can each fit into a single segment. This requires three segments and completes in one round-trip time (RTT). If the TAO test had failed on segment #1, B would have queued data1 and the FIN for later processing, and then it would have returned a <SYN,ACK> segment to A, to perform a normal 3WHS.

        TCP A  (Client)                               TCP B (Server)
        _______________                               ______________

        CLOSED                                                LISTEN

    #1  SYN-SENT*    --> <SYN,data1,FIN,CC=x> -->        CLOSE-WAIT*
                                                       (TAO test OK)
                                                     (data1->user_B)

                                                    <-- LAST-ACK*
    #2  TIME-WAIT   <-- <SYN,ACK(FIN),data2,FIN,CC=y,CC.ECHO=x>
      (data2->user_A)

    #3  TIME-WAIT   --> <ACK(FIN),CC=x> -->                   CLOSED

    (timeout)
        CLOSED

              Figure 2: Minimal T/TCP Transaction Sequence

T/TCP extensions require additional connection states, e.g., the SYN-SENT*, CLOSE-WAIT*, and LAST-ACK* states shown in Figure 2. Section 3.3 describes these new connection states.

To obtain the minimal 3-segment sequence shown in Figure 2, the server host must delay acknowledging segment #1 so the response may be piggy-backed on segment #2. If the application takes longer than this delay to compute the response, the normal TCP retransmission mechanism in TCP B will send an acknowledgment to forestall a retransmission from TCP A. Figure 3 shows an example of a slow server application. Although the sequence in Figure 3 does contain a 3-way handshake, the TAO mechanism has allowed the request data to be accepted immediately, so that the c