Network Working Group                                  M. Allman, Editor
Request for Comments: 2760   NASA Glenn Research Center/BBN Technologies
Category: Informational                                       S. Dawkins
                                                                  Nortel
                                                               D. Glover
                                                               J. Griner
                                                                 D. Tran
                                              NASA Glenn Research Center
                                                            T. Henderson
                                    University of California at Berkeley
                                                            J. Heidemann
                                                                J. Touch
                                   University of Southern California/ISI
                                                                H. Kruse
                                                            S. Ostermann
                                                         Ohio University
                                                                K. Scott
                                                   The MITRE Corporation
                                                                J. Semke
                                        Pittsburgh Supercomputing Center
                                                           February 2000


               Ongoing TCP Research Related to Satellites


Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

Abstract

   This document outlines possible TCP enhancements that may allow TCP
   to better utilize the available bandwidth provided by networks
   containing satellite links.  The algorithms and mechanisms outlined
   have not been judged to be mature enough to be recommended by the
   IETF.  The goal of this document is to educate researchers as to the
   current work and progress being done in TCP research related to
   satellite networks.


Allman, et al.               Informational                      [Page 1]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


Table of Contents

   1         Introduction. . . . . . . . . . . . . . . . . . . .  2
   2         Satellite Architectures . . . . . . . . . . . . . .  3
   2.1       Asymmetric Satellite Networks . . . . . . . . . . .  3
   2.2       Satellite Link as Last Hop. . . . . . . . . . . . .  3
   2.3       Hybrid Satellite Networks     . . . . . . . . . . .  4
   2.4       Point-to-Point Satellite Networks . . . . . . . . .  4
   2.5       Multiple Satellite Hops . . . . . . . . . . . . . .  4
   3         Mitigations . . . . . . . . . . . . . . . . . . . .  4
   3.1       TCP For Transactions. . . . . . . . . . . . . . . .  4
   3.2       Slow Start. . . . . . . . . . . . . . . . . . . . .  5
   3.2.1     Larger Initial Window . . . . . . . . . . . . . . .  6
   3.2.2     Byte Counting . . . . . . . . . . . . . . . . . . .  7
   3.2.3     Delayed ACKs After Slow Start . . . . . . . . . . .  9
   3.2.4     Terminating Slow Start. . . . . . . . . . . . . . . 11
   3.3       Loss Recovery . . . . . . . . . . . . . . . . . . . 12
   3.3.1     Non-SACK Based Mechanisms . . . . . . . . . . . . . 12
   3.3.2     SACK Based Mechanisms . . . . . . . . . . . . . . . 13
   3.3.3     Explicit Congestion Notification. . . . . . . . . . 16
   3.3.4     Detecting Corruption Loss . . . . . . . . . . . . . 18
   3.4       Congestion Avoidance. . . . . . . . . . . . . . . . 21
   3.5       Multiple Data Connections . . . . . . . . . . . . . 22
   3.6       Pacing TCP Segments . . . . . . . . . . . . . . . . 24
   3.7       TCP Header Compression. . . . . . . . . . . . . . . 26
   3.8       Sharing TCP State Among Similar Connections . . . . 29
   3.9       ACK Congestion Control. . . . . . . . . . . . . . . 32
   3.10      ACK Filtering . . . . . . . . . . . . . . . . . . . 34
   4         Conclusions . . . . . . . . . . . . . . . . . . . . 36
   5         Security Considerations . . . . . . . . . . . . . . 36
   6         Acknowledgments . . . . . . . . . . . . . . . . . . 37
   7         References. . . . . . . . . . . . . . . . . . . . . 37
   8         Authors' Addresses. . . . . . . . . . . . . . . . . 43
   9         Full Copyright Statement. . . . . . . . . . . . . . 46

1   Introduction

   This document outlines mechanisms that may help the Transmission
   Control Protocol (TCP) [Pos81] better utilize the bandwidth provided
   by long-delay satellite environments.  These mechanisms may also help
   in other environments or for other protocols.  The proposals outlined
   in this document are currently being studied throughout the research
   community.  Therefore, these mechanisms are not mature enough to be
   recommended for wide-spread use by the IETF.  However, some of these
   mechanisms may be safely used today.  It is hoped that this document
   will stimulate further study into the described mechanisms.  If, at


Allman, et al.               Informational                      [Page 2]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   some point, the mechanisms discussed in this memo prove to be safe
   and appropriate to be recommended for general use, the appropriate
   IETF documents will be written.

   It should be noted that non-TCP mechanisms that help performance over
   satellite links do exist (e.g., application-level changes, queueing
   disciplines, etc.).  However, outlining these non-TCP mitigations is
   beyond the scope of this document and therefore is left as future
   work.  Additionally, there are a number of mitigations to TCP's
   performance problems that involve very active intervention by
   gateways along the end-to-end path from the sender to the receiver.
   Documenting the pros and cons of such solutions is also left as
   future work.

2   Satellite Architectures

   Specific characteristics of satellite links and the impact these
   characteristics have on TCP are presented in RFC 2488 [AGS99].  This
   section discusses several possible topologies where satellite links
   may be integrated into the global Internet.  The mitigation outlined
   in section 3 will include a discussion of which environment the
   mechanism is expected to benefit.

2.1 Asymmetric Satellite Networks

   Some satellite networks exhibit a bandwidth asymmetry, a larger data
   rate in one direction than the reverse direction, because of limits
   on the transmission power and the antenna size at one end of the
   link.  Meanwhile, some other satellite systems are unidirectional and
   use a non-satellite return path (such as a dialup modem link).  The
   nature of most TCP traffic is asymmetric with data flowing in one
   direction and acknowledgments in opposite direction.  However, the
   term asymmetric in this document refers to different physical
   capacities in the forward and return links.  Asymmetry has been shown
   to be a problem for TCP [BPK97,BPK98].

2.2 Satellite Link as Last Hop

   Satellite links that provide service directly to end users, as
   opposed to satellite links located in the middle of a network, may
   allow for specialized design of protocols used over the last hop.
   Some satellite providers use the satellite link as a shared high
   speed downlink to users with a lower speed, non-shared terrestrial
   link that is used as a return link for requests and acknowledgments.
   Many times this creates an asymmetric network, as discussed above.


Allman, et al.               Informational                      [Page 3]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


2.3 Hybrid Satellite Networks

   In the more general case, satellite links may be located at any point
   in the network topology.  In this case, the satellite link acts as
   just another link between two gateways.  In this environment, a given
   connection may be sent over terrestrial links (including terrestrial
   wireless), as well as satellite links.  On the other hand, a
   connection could also travel over only the terrestrial network or
   only over the satellite portion of the network.

2.4 Point-to-Point Satellite Networks

   In point-to-point satellite networks, the only hop in the network is
   over the satellite link.  This pure satellite environment exhibits
   only the problems associated with the satellite links, as outlined in
   [AGS99].  Since this is a private network, some mitigations that are
   not appropriate for shared networks can be considered.

2.5 Multiple Satellite Hops

   In some situations, network traffic may traverse multiple satellite
   hops between the source and the destination.  Such an environment
   aggravates the satellite characteristics described in [AGS99].

3   Mitigations

   The following sections will discuss various techniques for mitigating
   the problems TCP faces in the satellite environment.  Each of the
   following sections will be organized as follows: First, each
   mitigation will be briefly outlined.  Next, research work involving
   the mechanism in question will be briefly discussed.  Next the
   implementation issues of the mechanism will be presented (including
   whether or not the particular mechanism presents any dangers to
   shared networks).  Then a discussion of the mechanism's potential
   with regard to the topologies outlined above is given.  Finally, the
   relationships and possible interactions with other TCP mechanisms are
   outlined.  The reader is expected to be familiar with the TCP
   terminology used in [AGS99].

3.1 TCP For Transactions

3.1.1 Mitigation Description

   TCP uses a three-way handshake to setup a connection between two
   hosts [Pos81].  This connection setup requires 1-1.5 round-trip times
   (RTTs), depending upon whether the data sender started the connection
   actively or passively.  This startup time can be eliminated by using
   TCP extensions for transactions (T/TCP) [Bra94].  After the first


Allman, et al.               Informational                      [Page 4]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   connection between a pair of hosts is established, T/TCP is able to
   bypass the three-way handshake, allowing the data sender to begin
   transmitting data in the first segment sent (along with the SYN).
   This is especially helpful for short request/response traffic, as it
   saves a potentially long setup phase when no useful data is being
   transmitted.

3.1.2 Research

   T/TCP is outlined and analyzed in [Bra92,Bra94].

3.1.3 Implementation Issues

   T/TCP requires changes in the TCP stacks of both the data sender and
   the data receiver.  While T/TCP is safe to implement in shared
   networks from a congestion control perspective, several security
   implications of sending data in the first data segment have been
   identified [ddKI99].

3.1.4 Topology Considerations

   It is expected that T/TCP will be equally beneficial in all
   environments outlined in section 2.

3.1.5 Possible Interaction and Relationships with Other Research

   T/TCP allows data transfer to start more rapidly, much like using a
   larger initial congestion window (see section 3.2.1), delayed ACKs
   after slow start (section 3.2.3) or byte counting (section 3.2.2).

3.2 Slow Start

   The slow start algorithm is used to gradually increase the size of
   TCP's congestion window (cwnd) [Jac88,Ste97,APS99].  The algorithm is
   an important safe-guard against transmitting an inappropriate amount
   of data into the network when the connection starts up.  However,
   slow start can also waste available network capacity, especially in
   long-delay networks [All97a,Hay97].  Slow start is particularly
   inefficient for transfers that are short compared to the
   delay*bandwidth product of the network (e.g., WWW transfers).

   Delayed ACKs are another source of wasted capacity during the slow
   start phase.  RFC 1122 [Bra89] suggests data receivers refrain from
   ACKing every incoming data segment.  However, every second full-sized
   segment should be ACKed.  If a second full-sized segment does not
   arrive within a given timeout, an ACK must be generated (this timeout
   cannot exceed 500 ms).  Since the data sender increases the size of
   cwnd based on the number of arriving ACKs, reducing the number of


Allman, et al.               Informational                      [Page 5]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   ACKs slows the cwnd growth rate.  In addition, when TCP starts
   sending, it sends 1 segment.  When using delayed ACKs a second
   segment must arrive before an ACK is sent.  Therefore, the receiver
   is always forced to wait for the delayed ACK timer to expire before
   ACKing the first segment, which also increases the transfer time.

   Several proposals have suggested ways to make slow start less time
   consuming.  These proposals are briefly outlined below and references
   to the research work given.

3.2.1 Larger Initial Window

3.2.1.1 Mitigation Description

   One method that will reduce the amount of time required by slow start
   (and therefore, the amount of wasted capacity) is to increase the
   initial value of cwnd.  An experimental TCP extension outlined in
   [AFP98] allows the initial size of cwnd to be increased from 1
   segment to that given in equation (1).

               min (4*MSS, max (2*MSS, 4380 bytes))               (1)

   By increasing the initial value of cwnd, more packets are sent during
   the first RTT of data transmission, which will trigger more ACKs,
   allowing the congestion window to open more rapidly.  In addition, by
   sending at least 2 segments initially, the first segment does not
   need to wait for the delayed ACK timer to expire as is the case when
   the initial size of cwnd is 1 segment (as discussed above).
   Therefore, the value of cwnd given in equation 1 saves up to 3 RTTs
   and a delayed ACK timeout when compared to an initial cwnd of 1
   segment.

   Also, we note that RFC 2581 [APS99], a standards-track document,
   allows a TCP to use an initial cwnd of up to 2 segments.  This change
   is highly recommended for satellite networks.

3.2.1.2 Research

   Several researchers have studied the use of a larger initial window
   in various environments.  [Nic97] and [KAGT98] show a reduction in
   WWW page transfer time over hybrid fiber coax (HFC) and satellite
   links respectively.  Furthermore, it has been shown that using an
   initial cwnd of 4 segments does not negatively impact overall
   performance over dialup modem links with a small number of buffers
   [SP98].  [AHO98] shows an improvement in transfer time for 16 KB
   files across the Internet and dialup modem links when using a larger
   initial value for cwnd.  However, a slight increase in dropped


Allman, et al.               Informational                      [Page 6]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   segments was also shown.  Finally, [PN98] shows improved transfer
   time for WWW traffic in simulations with competing traffic, in
   addition to a small increase in the drop rate.

3.2.1.3 Implementation Issues

   The use of a larger initial cwnd value requires changes to the
   sender's TCP stack.  Using an initial congestion window of 2 segments
   is allowed by RFC 2581 [APS99].  Using an initial congestion window
   of 3 or 4 segments is not expected to present any danger of
   congestion collapse [AFP98], however may degrade performance in some
   networks.

3.2.1.4 Topology Considerations

   It is expected that the use of a large initial window would be
   equally beneficial to all network architectures outlined in section
   2.

3.2.1.5 Possible Interaction and Relationships with Other Research

   Using a fixed larger initial congestion window decreases the impact
   of a long RTT on transfer time (especially for short transfers) at
   the cost of bursting data into a network with unknown conditions.  A
   mechanism that mitigates bursts may make the use of a larger initial
   congestion window more appropriate (e.g., limiting the size of line-
   rate bursts [FF96] or pacing the segments in a burst [VH97a]).

   Also, using delayed ACKs only after slow start (as outlined in
   section 3.2.3) offers an alternative way to immediately ACK the first
   segment of a transfer and open the congestion window more rapidly.
   Finally, using some form of TCP state sharing among a number of
   connections (as discussed in 3.8) may provide an alternative to using
   a fixed larger initial window.

3.2.2 Byte Counting

3.2.2.1 Mitigation Description

   As discussed above, the wide-spread use of delayed ACKs increases the
   time needed by a TCP sender to increase the size of the congestion
   window during slow start.  This is especially harmful to flows
   traversing long-delay GEO satellite links.  One mechanism that has
   been suggested to mitigate the problems caused by delayed ACKs is the
   use of "byte counting", rather than standard ACK counting
   [All97a,All98].  Using standard ACK counting, the congestion window
   is increased by 1 segment for each ACK received during slow start.
   However, using byte counting the congestion window increase is based


Allman, et al.               Informational                      [Page 7]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   on the number of previously unacknowledged bytes covered by each
   incoming ACK, rather than on the number of ACKs received.  This makes
   the increase relative to the amount of data transmitted, rather than
   being dependent on the ACK interval used by the receiver.

   Two forms of byte counting are studied in [All98].  The first is
   unlimited byte counting (UBC).  This mechanism simply uses the number
   of previously unacknowledged bytes to increase the congestion window
   each time an ACK arrives.  The second form is limited byte counting
   (LBC).  LBC limits the amount of cwnd increase to 2 segments.  This
   limit throttles the size of the burst of data sent in response to a
   "stretch ACK" [Pax97].  Stretch ACKs are acknowledgments that cover
   more than 2 segments of previously unacknowledged data.  Stretch ACKs
   can occur by design [Joh95] (although this is not standard), due to
   implementation bugs [All97b,PADHV99] or due to ACK loss.  [All98]
   shows that LBC prevents large line-rate bursts when compared to UBC,
   and therefore offers fewer dropped segments and better performance.
   In addition, UBC causes large bursts during slow start based loss
   recovery due to the large cumulative ACKs that can arrive during loss
   recovery.  The behavior of UBC during loss recovery can cause large
   decreases in performance and [All98] strongly recommends UBC not be
   deployed without further study into mitigating the large bursts.

   Note: The standards track RFC 2581 [APS99] allows a TCP to use byte
   counting to increase cwnd during congestion avoidance, however not
   during slow start.

3.2.2.2 Research

   Using byte counting, as opposed to standard ACK counting, has been
   shown to reduce the amount of time needed to increase the value of
   cwnd to an appropriate size in satellite networks [All97a].  In
   addition, [All98] presents a simulation comparison of byte counting
   and the standard cwnd increase algorithm in uncongested networks and
   networks with competing traffic.  This study found that the limited
   form of byte counting outlined above can improve performance, while
   also increasing the drop rate slightly.

   [BPK97,BPK98] also investigated unlimited byte counting in
   conjunction with various ACK filtering algorithms (discussed in
   section 3.10) in asymmetric networks.


Allman, et al.               Informational                      [Page 8]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


3.2.2.3 Implementation Issues

   Changing from ACK counting to byte counting requires changes to the
   data sender's TCP stack.  Byte counting violates the algorithm for
   increasing the congestion window outlined in RFC 2581 [APS99] (by
   making congestion window growth more aggressive during slow start)
   and therefore should not be used in shared networks.

3.2.2.4 Topology Considerations

   It has been suggested by some (and roundly criticized by others) that
   byte counting will allow TCP to provide uniform cwnd increase,
   regardless of the ACKing behavior of the receiver.  In addition, byte
   counting also mitigates the retarded window growth provided by
   receivers that generate stretch ACKs because of the capacity of the
   return link, as discussed in [BPK97,BPK98].  Therefore, this change
   is expected to be especially beneficial to asymmetric networks.

3.2.2.5 Possible Interaction and Relationships with Other Research

   Unlimited byte counting should not be used without a method to
   mitigate the potentially large line-rate bursts the algorithm can
   cause.  Also, LBC may send bursts that are too large for the given
   network conditions.  In this case, LBC may also benefit from some
   algorithm that would lessen the impact of line-rate bursts of
   segments.  Also note that using delayed ACKs only after slow start
   (as outlined in section 3.2.3) negates the limited byte counting
   algorithm because each ACK covers only one segment during slow start.
   Therefore, both ACK counting and byte counting yield the same
   increase in the congestion window at this point (in the first RTT).

3.2.3 Delayed ACKs After Slow Start

3.2.3.1 Mitigation Description

   As discussed above, TCP senders use the number of incoming ACKs to
   increase the congestion window during slow start.  And, since delayed
   ACKs reduce the number of ACKs returned by the receiver by roughly
   half, the rate of growth of the congestion window is reduced.  One
   proposed solution to this problem is to use delayed ACKs only after
   the slow start (DAASS) phase.  This provides more ACKs while TCP is
   aggressively increasing the congestion window and less ACKs while TCP
   is in steady state, which conserves network resources.


Allman, et al.               Informational                      [Page 9]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


3.2.3.2 Research

   [All98] shows that in simulation, using delayed ACKs after slow start
   (DAASS) improves transfer time when compared to a receiver that
   always generates delayed ACKs.  However, DAASS also slightly
   increases the loss rate due to the increased rate of cwnd growth.

3.2.3.3 Implementation Issues

   The major problem with DAASS is in the implementation.  The receiver
   has to somehow know when the sender is using the slow start
   algorithm.  The receiver could implement a heuristic that attempts to
   watch the change in the amount of data being received and change the
   ACKing behavior accordingly.  Or, the sender could send a message (a
   flipped bit in the TCP header, perhaps) indicating that it was using
   slow start.  The implementation of DAASS is, therefore, an open
   issue.

   Using DAASS does not violate the TCP congestion control specification
   [APS99].  However, the standards (RFC 2581 [APS99]) currently
   recommend using delayed acknowledgments and DAASS goes (partially)
   against this recommendation.

3.2.3.4 Topology Considerations

   DAASS should work equally well in all scenarios presented in section
   2.  However, in asymmetric networks it may aggravate ACK congestion
   in the return link, due to the increased number of ACKs (see sections
   3.9 and 3.10 for a more detailed discussion of ACK congestion).

3.2.3.5 Possible Interaction and Relationships with Other Research

   DAASS has several possible interactions with other proposals made in
   the research community.  DAASS can aggravate congestion on the path
   between the data receiver and the data sender due to the increased
   number of returning acknowledgments.  This can have an especially
   adverse effect on asymmetric networks that are prone to experiencing
   ACK congestion.  As outlined in sections 3.9 and 3.10, several
   mitigations have been proposed to reduce the number of ACKs that are
   passed over a low-bandwidth return link.  Using DAASS will increase
   the number of ACKs sent by the receiver.  The interaction between
   DAASS and the methods for reducing the number of ACKs is an open
   research question.  Also, as noted in section 3.2.1.5 above, DAASS
   provides some of the same benefits as using a larger initial
   congestion window and therefore it may not be desirable to use both
   mechanisms together.  However, this remains an open question.
   Finally, DAASS and limited byte counting are both used to increase


Allman, et al.               Informational                     [Page 10]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   the rate at which the congestion window is opened.  The DAASS
   algorithm substantially reduces the impact limited byte counting has
   on the rate of congestion window increase.

3.2.4 Terminating Slow Start

3.2.4.1 Mitigation Description

   The initial slow start phase is used by TCP to determine an
   appropriate congestion window size for the given network conditions
   [Jac88].  Slow start is terminated when TCP detects congestion, or
   when the size of cwnd reaches the size of the receiver's advertised
   window.  Slow start is also terminated if cwnd grows beyond a certain
   size.  The threshold at which TCP ends slow start and begins using
   the congestion avoidance algorithm is called "ssthresh" [Jac88].  In
   most implementations, the initial value for ssthresh is the
   receiver's advertised window.  During slow start, TCP roughly doubles
   the size of cwnd every RTT and therefore can overwhelm the network
   with at most twice as many segments as the network can handle.  By
   setting ssthresh to a value less than the receiver's advertised
   window initially, the sender may avoid overwhelming the network with
   twice the appropriate number of segments.  Hoe [Hoe96] proposes using
   the packet-pair algorithm [Kes91] and the measured RTT to determine a
   more appropriate value for ssthresh.  The algorithm observes the
   spacing between the first few returning ACKs to determine the
   bandwidth of the bottleneck link.  Together with the measured RTT,
   the delay*bandwidth product is determined and ssthresh is set to this
   value.  When TCP's cwnd reaches this reduced ssthresh, slow start is
   terminated and transmission continues using congestion avoidance,
   which is a more conservative algorithm for increasing the size of the
   congestion window.

3.2.4.2 Research

   It has been shown that estimating ssthresh can improve performance
   and decrease packet loss in simulations [Hoe96].  However, obtaining
   an accurate estimate of the available bandwidth in a dynamic network
   is very challenging, especially attempting to do so on the sending
   side of the TCP connection [AP99].  Therefore, before this mechanism
   is widely deployed, bandwidth estimation must be studied in a more
   detail.

3.2.4.3 Implementation Issues

   As outlined in [Hoe96], estimating ssthresh requires changes to the
   data sender's TCP stack.  As suggested in [AP99], bandwidth estimates
   may be more accurate when taken by the TCP receiver, and therefore
   both sender and receiver changes would be required.  Estimating


Allman, et al.               Informational                     [Page 11]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   ssthresh is safe to implement in production networks from a
   congestion control perspective, as it can only make TCP more
   conservative than outlined in RFC 2581 [APS99] (assuming the TCP
   implementation is using an initial ssthresh of infinity as allowed by
   [APS99]).

3.2.4.4 Topology Considerations

   It is expected that this mechanism will work equally well in all
   symmetric topologies outlined in section 2.  However, asymmetric
   links pose a special problem, as the rate of the returning ACKs may
   not be the bottleneck bandwidth in the forward direction.  This can
   lead to the sender setting ssthresh too low.  Premature termination
   of slow start can hurt performance, as congestion avoidance opens
   cwnd more conservatively.  Receiver-based bandwidth estimators do not
   suffer from this problem.

3.2.4.5 Possible Interaction and Relationships with Other Research

   Terminating slow start at the right time is useful to avoid multiple
   dropped segments.  However, using a selective acknowledgment-based
   loss recovery scheme (as outlined in section 3.3.2) can drastically
   improve TCP's ability to quickly recover from multiple lost segments
   Therefore, it may not be as important to terminate slow start before
   a large loss event occurs.  [AP99] shows that using delayed
   acknowledgments [Bra89] reduces the effectiveness of sender-side
   bandwidth estimation.  Therefore, using delayed ACKs only during slow
   start (as outlined in section 3.2.3) may make bandwidth estimation
   more feasible.

3.3 Loss Recovery

3.3.1 Non-SACK Based Mechanisms

3.3.1.1 Mitigation Description

   Several similar algorithms have been developed and studied that
   improve TCP's ability to recover from multiple lost segments in a
   window of data without relying on the (often long) retransmission
   timeout.  These sender-side algorithms, known as NewReno TCP, do not
   depend on the availability of selective acknowledgments (SACKs)
   [MMFR96].

   These algorithms generally work by updating the fast recovery
   algorithm to use information provided by "partial ACKs" to trigger
   retransmissions.  A partial ACK covers some new data, but not all
   data outstanding when a particular loss event starts.  For instance,
   consider the case when segment N is retransmitted using the fast


Allman, et al.               Informational                     [Page 12]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   retransmit algorithm and segment M is the last segment sent when
   segment N is resent.  If segment N is the only segment lost, the ACK
   elicited by the retransmission of segment N would be for segment M.
   If, however, segment N+1 was also lost, the ACK elicited by the
   retransmission of segment N will be N+1.  This can be taken as an
   indication that segment N+1 was lost and used to trigger a
   retransmission.

3.3.1.2 Research

   Hoe [Hoe95,Hoe96] introduced the idea of using partial ACKs to
   trigger retransmissions and showed that doing so could improve
   performance.  [FF96] shows that in some cases using partial ACKs to
   trigger retransmissions reduces the time required to recover from
   multiple lost segments.  However, [FF96] also shows that in some
   cases (many lost segments) relying on the RTO timer can improve
   performance over simply using partial ACKs to trigger all
   retransmissions.  [HK99] shows that using partial ACKs to trigger
   retransmissions, in conjunction with SACK, improves performance when
   compared to TCP using fast retransmit/fast recovery in a satellite
   environment.  Finally, [FH99] describes several slightly different
   variants of NewReno.

3.3.1.3 Implementation Issues

   Implementing these fast recovery enhancements requires changes to the
   sender-side TCP stack.  These changes can safely be implemented in
   production networks and are allowed by RFC 2581 [APS99].

3.3.1.4 Topology Considerations

   It is expected that these changes will work well in all environments
   outlined in section 2.

3.3.1.5 Possible Interaction and Relationships with Other Research

   See section 3.3.2.2.5.

3.3.2 SACK Based Mechanisms

3.3.2.1 Fast Recovery with SACK

3.3.2.1.1 Mitigation Description

   Fall and Floyd [FF96] describe a conservative extension to the fast
   recovery algorithm that takes into account information provided by
   selective acknowledgments (SACKs) [MMFR96] sent by the receiver.  The
   algorithm starts after fast retransmit triggers the resending of a


Allman, et al.               Informational                     [Page 13]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   segment.  As with fast retransmit, the algorithm cuts cwnd in half
   when a loss is detected.  The algorithm keeps a variable called
   "pipe", which is an estimate of the number of outstanding segments in
   the network.  The pipe variable is decremented by 1 segment for each
   duplicate ACK that arrives with new SACK information.  The pipe
   variable is incremented by 1 for each new or retransmitted segment
   sent.  A segment may be sent when the value of pipe is less than cwnd
   (this segment is either a retransmission per the SACK information or
   a new segment if the SACK information indicates that no more
   retransmits are needed).

   This algorithm generally allows TCP to recover from multiple segment
   losses in a window of data within one RTT of loss detection.  Like
   the forward acknowledgment (FACK) algorithm described below, the SACK
   information allows the pipe algorithm to decouple the choice of when
   to send a segment from the choice of what segment to send.

   [APS99] allows the use of this algorithm, as it is consistent with
   the spirit of the fast recovery algorithm.

3.3.2.1.2 Research

   [FF96] shows that the above described SACK algorithm performs better
   than several non-SACK based recovery algorithms when 1--4 segments
   are lost from a window of data.  [AHKO97] shows that the algorithm
   improves performance over satellite links.  Hayes [Hay97] shows the
   in certain circumstances, the SACK algorithm can hurt performance by
   generating a large line-rate burst of data at the end of loss
   recovery, which causes further loss.

3.3.2.1.3 Implementation Issues

   This algorithm is implemented in the sender's TCP stack.  However, it
   relies on SACK information generated by the receiver.  This algorithm
   is safe for shared networks and is allowed by RFC 2581 [APS99].

3.3.2.1.4 Topology Considerations

   It is expected that the pipe algorithm will work equally well in all
   scenarios presented in section 2.

3.3.2.1.5 Possible Interaction and Relationships with Other Research

   See section 3.3.2.2.5.


Allman, et al.               Informational                     [Page 14]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


3.3.2.2 Forward Acknowledgments

3.3.2.2.1 Mitigation Description

   The Forward Acknowledgment (FACK) algorithm [MM96a,MM96b] was
   developed to improve TCP congestion control during loss recovery.
   FACK uses TCP SACK options to glean additional information about the
   congestion state, adding more precise control to the injection of
   data into the network during recovery.  FACK decouples the congestion
   control algorithms from the data recovery algorithms to provide a
   simple and direct way to use SACK to improve congestion control.  Due
   to the separation of these two algorithms, new data may be sent
   during recovery to sustain TCP's self-clock when there is no further
   data to retransmit.

   The most recent version of FACK is Rate-Halving [MM96b], in which one
   packet is sent for every two ACKs received during recovery.
   Transmitting a segment for every-other ACK has the result of reducing
   the congestion window in one round trip to half of the number of
   packets that were successfully handled by the network (so when cwnd
   is too large by more than a factor of two it still gets reduced to
   half of what the network can sustain).  Another important aspect of
   FACK with Rate-Halving is that it sustains the ACK self-clock during
   recovery because transmitting a packet for every-other ACK does not
   require half a cwnd of data to drain from the network before
   transmitting, as required by the fast recovery algorithm
   [Ste97,APS99].

   In addition, the FACK with Rate-Halving implementation provides
   Thresholded Retransmission to each lost segment.  "Tcprexmtthresh" is
   the number of duplicate ACKs required by TCP to trigger a fast
   retransmit and enter recovery.  FACK applies thresholded
   retransmission to all segments by waiting until tcprexmtthresh SACK
   blocks indicate that a given segment is missing before resending the
   segment.  This allows reasonable behavior on links that reorder
   segments.  As described above, FACK sends a segment for every second
   ACK received during recovery.  New segments are transmitted except
   when tcprexmtthresh SACK blocks have been observed for a dropped
   segment, at which point the dropped segment is retransmitted.

   [APS99] allows the use of this algorithm, as it is consistent with
   the spirit of the fast recovery algorithm.

3.3.2.2.2 Research

   The original FACK algorithm is outlined in [MM96a].  The algorithm
   was later enhanced to include Rate-Halving [MM96b].  The real-world
   performance of FACK with Rate-Halving was shown to be much closer to


Allman, et al.               Informational                     [Page 15]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   the theoretical maximum for TCP than either TCP Reno or the SACK-
   based extensions to fast recovery outlined in section 3.3.2.1
   [MSMO97].

3.3.2.2.3 Implementation Issues

   In order to use FACK, the sender's TCP stack must be modified.  In
   addition, the receiver must be able to generate SACK options to
   obtain the full benefit of using FACK.  The FACK algorithm is safe
   for shared networks and is allowed by RFC 2581 [APS99].

3.3.2.2.4 Topology Considerations

   FACK is expected to improve performance in all environments outlined
   in section 2.  Since it is better able to sustain its self-clock than
   TCP Reno, it may be considerably more attractive over long delay
   paths.

3.3.2.2.5 Possible Interaction and Relationships with Other Research

   Both SACK based loss recovery algorithms described above (the fast
   recovery enhancement and the FACK algorithm) are similar in that they
   attempt to effectively repair multiple lost segments from a window of
   data.  Which of the SACK-based loss recovery algorithms to use is
   still an open research question.  In addition, these algorithms are
   similar to the non-SACK NewReno algorithm described in section 3.3.1,
   in that they attempt to recover from multiple lost segments without
   reverting to using the retransmission timer.  As has been shown, the
   above SACK based algorithms are more robust than the NewReno
   algorithm.  However, the SACK algorithm requires a cooperating TCP
   receiver, which the NewReno algorithm does not.  A reasonable TCP
   implementation might include both a SACK-based and a NewReno-based
   loss recovery algorithm such that the sender can use the most
   appropriate loss recovery algorithm based on whether or not the
   receiver supports SACKs.  Finally, both SACK-based and non-SACK-based
   versions of fast recovery have been shown to transmit a large burst
   of data upon leaving loss recovery, in some cases [Hay97].
   Therefore, the algorithms may benefit from some burst suppression
   algorithm.

3.3.3 Explicit Congestion Notification

3.3.3.1 Mitigation Description

   Explicit congestion notification (ECN) allows routers to inform TCP
   senders about imminent congestion without dropping segments.  Two
   major forms of ECN have been studied.  A router employing backward
   ECN (BECN), transmits messages directly to the data originator


Allman, et al.               Informational                     [Page 16]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   informing it of congestion.  IP routers can accomplish this with an
   ICMP Source Quench message.  The arrival of a BECN signal may or may
   not mean that a TCP data segment has been dropped, but it is a clear
   indication that the TCP sender should reduce its sending rate (i.e.,
   the value of cwnd).  The second major form of congestion notification
   is forward ECN (FECN).  FECN routers mark data segments with a
   special tag when congestion is imminent, but forward the data
   segment.  The data receiver then echos the congestion information
   back to the sender in the ACK packet.  A description of a FECN
   mechanism for TCP/IP is given in [RF99].

   As described in [RF99], senders transmit segments with an "ECN-
   Capable Transport" bit set in the IP header of each packet.  If a
   router employing an active queueing strategy, such as Random Early
   Detection (RED) [FJ93,BCC+98], would otherwise drop this segment, an
   "Congestion Experienced" bit in the IP header is set instead.  Upon
   reception, the information is echoed back to TCP senders using a bit
   in the TCP header.  The TCP sender adjusts the congestion window just
   as it would if a segment was dropped.

   The implementation of ECN as specified in [RF99] requires the
   deployment of active queue management mechanisms in the affected
   routers.  This allows the routers to signal congestion by sending TCP
   a small number of "congestion signals" (segment drops or ECN
   messages), rather than discarding a large number of segments, as can
   happen when TCP overwhelms a drop-tail router queue.

   Since satellite networks generally have higher bit-error rates than
   terrestrial networks, determining whether a segment was lost due to
   congestion or corruption may allow TCP to achieve better performance
   in high BER environments than currently possible (due to TCP's
   assumption that all loss is due to congestion).  While not a solution
   to this problem, adding an ECN mechanism to TCP may be a part of a
   mechanism that will help achieve this goal.  See section 3.3.4 for a
   more detailed discussion of differentiating between corruption and
   congestion based losses.

3.3.3.2 Research

   [Flo94] shows that ECN is effective in reducing the segment loss rate
   which yields better performance especially for short and interactive
   TCP connections.  Furthermore, [Flo94] also shows that ECN avoids
   some unnecessary, and costly TCP retransmission timeouts.  Finally,
   [Flo94] also considers some of the advantages and disadvantages of
   various forms of explicit congestion notification.


Allman, et al.               Informational                     [Page 17]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


3.3.3.3 Implementation Issues

   Deployment of ECN requires changes to the TCP implementation on both
   sender and receiver.  Additionally, deployment of ECN requires
   deployment of some active queue management infrastructure in routers.
   RED is assumed in most ECN discussions, because RED is already
   identifying segments to drop, even before its buffer space is
   exhausted.  ECN simply allows the delivery of "marked" segments while
   still notifying the end nodes that congestion is occurring along the
   path.  ECN is safe (from a congestion control perspective) for shared
   networks, as it maintains the same TCP congestion control principles
   as are used when congestion is detected via segment drops.

3.3.3.4 Topology Considerations

   It is expected that none of the environments outlined in section 2
   will present a bias towards or against ECN traffic.

3.3.3.5 Possible Interaction and Relationships with Other Research

   Note that some form of active queueing is necessary to use ECN (e.g.,
   RED queueing).

3.3.4 Detecting Corruption Loss

   Differentiating between congestion (loss of segments due to router
   buffer overflow or imminent buffer overflow) and corruption (loss of
   segments due to damaged bits) is a difficult problem for TCP.  This
   differentiation is particularly important because the action that TCP
   should take in the two cases is entirely different.  In the case of
   corruption, TCP should merely retransmit the damaged segment as soon
   as its loss is detected; there is no need for TCP to adjust its
   congestion window.  On the other hand, as has been widely discussed
   above, when the TCP sender detects congestion, it should immediately
   reduce its congestion window to avoid making the congestion worse.

   TCP's defined behavior, as motivated by [Jac88,Jac90] and defined in
   [Bra89,Ste97,APS99], is to assume that all loss is due to congestion
   and to trigger the congestion control algorithms, as defined in
   [Ste97,APS99].  The loss may be detected using the fast retransmit
   algorithm, or in the worst case is detected by the expiration of
   TCP's retransmission timer.

   TCP's assumption that loss is due to congestion rather than
   corruption is a conservative mechanism that prevents congestion
   collapse [Jac88,FF98].  Over satellite networks, however, as in many
   wireless environments, loss due to corruption is more common than on
   terrestrial networks.  One common partial solution to this problem is


Allman, et al.               Informational                     [Page 18]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   to add Forward Error Correction (FEC) to the data that's sent over
   the satellite/wireless link.  A more complete discussion of the
   benefits of FEC can be found in [AGS99].  However, given that FEC
   does not always work or cannot be universally applied, other
   mechanisms have been studied to attempt to make TCP able to
   differentiate between congestion-based and corruption-based loss.

   TCP segments that have been corrupted are most often dropped by
   intervening routers when link-level checksum mechanisms detect that
   an incoming frame has errors.  Occasionally, a TCP segment containing
   an error may survive without detection until it arrives at the TCP
   receiving host, at which point it will almost always either fail the
   IP header checksum or the TCP checksum and be discarded as in the
   link-level error case.  Unfortunately, in either of these cases, it's
   not generally safe for the node detecting the corruption to return
   information about the corrupt packet to the TCP sender because the
   sending address itself might have been corrupted.

3.3.4.1 Mitigation Description

   Because the probability of link errors on a satellite link is
   relatively greater than on a hardwired link, it is particularly
   important that the TCP sender retransmit these lost segments without
   reducing its congestion window.  Because corrupt segments do not
   indicate congestion, there is no need for the TCP sender to enter a
   congestion avoidance phase, which may waste available bandwidth.
   Simulations performed in [SF98] show a performance improvement when
   TCP can properly differentiate between between corruption and
   congestion of wireless links.

   Perhaps the greatest research challenge in detecting corruption is
   getting TCP (a transport-layer protocol) to receive appropriate
   information from either the network layer (IP) or the link layer.
   Much of the work done to date has involved link-layer mechanisms that
   retransmit damaged segments.  The challenge seems to be to get these
   mechanisms to make repairs in such a way that TCP understands what
   happened and can respond appropriately.

3.3.4.2 Research

   Research into corruption detection to date has focused primarily on
   making the link level detect errors and then perform link-level
   retransmissions.  This work is summarized in [BKVP97,BPSK96].  One of
   the problems with this promising technique is that it causes an
   effective reordering of the segments from the TCP receiver's point of
   view.  As a simple example, if segments A B C D are sent across a
   noisy link and segment B is corrupted, segments C and D may have
   already crossed the link before B can be retransmitted at the link


Allman, et al.               Informational                     [Page 19]

RFC 2760       Ongoing TCP Research Related to Satellites  February 2000


   level, causing them to arrive at the TCP receiver in the order A C D
   B.  This segment reordering would cause the TCP receiver to generate
   duplicate ACKs upon the arrival of segments C and D.  If the
   reordering was bad enough, the sender would trigger the fast
   retransmit algorithm in the