[4] An Introduction to TCP/IP

by Jay Hauben
jrh29@columbia.edu

I. Introduction

The Internet as we know it in 1998, although vast, is still a new and developing communications technology. It is based on a number of ingenious engineering accomplishments. This article will look at one of the most important, the Transmission Control Protocol and Internet Protocol suite, known as TCP/IP.

Any quantitative description of the Internet includes the number of networks interconnected (hence the name Internet, from internetworking), the number of computers among which electronic data can be exchanged, and ultimately the number of people who can communicate with this vast computer and network resource and also with each other. The elements that comprise the Internet are computers and networks of computers. These, being physical entities, require careful design based on solid engineering principles in order to perform reliably. The Internet itself is more than the sum of its elements. It too requires careful and evolving design based on principles similar to those for computers and networks and on some unique to the Internet.

II. The Internet

The Internet is the successful interconnecting of many different networks to give the illusion of being one big computer network. What the networks have in common is that they all use packet switching technology(1). On the other hand, each of the connected networks may have its own addressing mechanism, packet size, speed, etc. Any of the computers on the connected networks, no matter what its operating system or other characteristics, can communicate via the Internet if it has software implemented on it that conforms to the set of protocols which resulted from open research funded by the Advanced Research Projects Agency (ARPA) of the United States Department of Defense in the late 1970s(2). That set of protocols is built around the Internet Protocol (IP) and the Transmission Control Protocol (TCP).
Informally, the set of protocols is called TCP/IP (pronounced by saying the names of the letters T-C-P-I-P).

The Internet Protocol is the common agreement to have software on every computer on the Internet add a bit of additional information to each of the packets that it sends out. Without such software a computer cannot be connected to the Internet even if Internet traffic passes over the network that the computer is attached to. A packet that has the additional information required by IP is called an IP datagram. To each IP datagram the computer adds its own network addressing information. The whole package is called a network frame. It is network frames containing IP datagrams, rather than ordinary packets, that a computer must send onto its local packet switching network in order to communicate with a computer on another network via the Internet.

If the communication is between computers on the same network, the network information is enough to deliver the frame to its intended destination computer. If the communication is intended for a computer on a different network, the network information directs the frame to the closest computer that serves to connect the local network with a different network. Such a special purpose computer is called a router (sometimes a gateway). It is such routers that make internetworking possible. The Internet is not a single giant network of computers. It is hundreds of thousands of networks interconnected by routers.

A router is a high speed, electronic, digital computer very much like all the other computers in use today. What makes a router special is that it has all the hardware and connections necessary to be able to connect to and communicate on two or more different networks. It also has the software to create and interpret network frames for each network it is attached to. In addition it must have the capabilities required by IP.
It must have software that can remove network information from the network frames that come to it and read the IP information in the datagrams. Based on the IP information it can add new network information to create an appropriate network frame and send it out on that different network. But how does it know where to send the IP datagram?

The entire process of Internet communication requires that each computer participating in the Internet has a unique digital address. The unique addresses of the source and destination are part of the IP information added to packets to make IP datagrams. The unique number assigned to a computer is its Internet Protocol or IP address. The IP address is a binary string of 32 digits. Therefore the Internet can provide communication among 2 to the 32nd power, or about 4 billion 300 million, computers (two unique addresses for every three people in the world). Internet addresses are written, for example, like 129.77.19.130. Each such address has two parts, a network ID and a host ID. In this example 129.77 (the network ID) identifies that this computer is part of a particular university network and 19.130 (the host ID) identifies which particular computer it is.

A router's IP software examines the IP information to determine the destination network from the network ID of the destination address. Then the software consults a routing table to pick the next router to send the IP datagram to so that it takes the "shortest" path. A path is short only if it is active and not congested. Ingenious software programs called routing daemons send and receive short messages among adjacent routers characterizing the condition on each path. These messages are analyzed and the routing table is continually updated. In this way IP datagrams pass from router to router over different networks until they reach a router connected to their destination network.
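
The addressing and routing-table steps described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not how any real router is programmed: the 16-bit split between network ID and host ID simply matches the 129.77.19.130 example, and the routing table entries and router names are hypothetical.

```python
def parse_ipv4(dotted):
    """Convert dotted text such as '129.77.19.130' into one 32-bit number."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(p < 0 or p > 255 for p in parts):
        raise ValueError("not a dotted IP address: " + dotted)
    value = 0
    for p in parts:
        value = (value << 8) | p   # each of the four numbers fills 8 bits
    return value


def split_address(dotted, network_bits=16):
    """Return (network ID, host ID) for an address.

    network_bits=16 is an assumption chosen to match the article's
    example, in which the first two numbers name the network.
    """
    value = parse_ipv4(dotted)
    host_bits = 32 - network_bits
    network_id = value >> host_bits
    host_id = value & ((1 << host_bits) - 1)
    return network_id, host_id


# Hypothetical routing table: network ID -> name of the next router.
ROUTING_TABLE = {
    split_address("129.77.0.0")[0]: "router-A",
    split_address("172.16.0.0")[0]: "router-B",
}


def next_hop(destination, table=ROUTING_TABLE, default="default-router"):
    """Pick the next router using only the network ID of the destination."""
    network_id, _host_id = split_address(destination)
    return table.get(network_id, default)


print(2 ** 32)                           # 4294967296 possible addresses
print(split_address("129.77.19.130"))    # (33101, 4994)
print(next_hop("129.77.19.130"))         # router-A
```

Note that the lookup uses only the network ID. The host ID matters only to the last router, the one attached to the destination network. A real routing table would also be updated continually by the routing daemons, which this sketch leaves out.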
The router connected to the destination network puts network information into the network frame that delivers the datagram to its destination computer. The IP datagram is essentially unchanged by this whole process. Each router has put next router information along with the IP datagram into the next network frame. When the IP datagram finally reaches its destination it carries no information about how it got there, and different packets from the original source may have taken different paths to get to the same destination.

IP as described above requires nothing of the interconnected networks except that they are packet switching networks with IP compliant routers. If a transmitting network uses a very small frame size, the IP software can even fragment an IP datagram into a few smaller ones to fit the network's frame size. It is this minimum requirement by the Internet Protocol that makes it possible for a great variety of networks to participate in the Internet. But this minimum requirement also results in little or no error detection. IP arranges for a best-effort process but has no guarantee of reliability. The remainder of the TCP/IP set of protocols adds a sufficient level of reliability to make the Internet useful.

There are problems that IP does not solve. For example, interspersed network frames from many computers can sometimes arrive faster than a router can route them. A small backlog of data can be stored on most routers, but if too many frames keep arriving some must be discarded. This possibility was anticipated. On most computers on the Internet, except routers, software that behaves according to the Transmission Control Protocol (TCP) is installed. When IP datagrams arrive at the destination computer, the TCP compliant software scans the information put into the IP datagram at the source. From this information the software can put packets, if they are all there, back together again. If there are duplications the software will discard all but the first copy of such packets to have arrived.
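
The reassembly and duplicate handling just described can be sketched in a few lines of Python. This is a simplified illustration, not the actual TCP algorithm: the function names are hypothetical, and labeling each packet with the position of its first byte in the original data is an assumption made here to keep the example short.

```python
def split_into_packets(data, size):
    """Cut data into (position, chunk) pairs of at most `size` bytes each."""
    return [(pos, data[pos:pos + size]) for pos in range(0, len(data), size)]


def reassemble(arrivals, total_length):
    """Rebuild the original data from packets in their order of arrival.

    Packets may be duplicated or out of order. Returns the data if all
    of it has arrived, otherwise None.
    """
    first_copies = {}
    for pos, chunk in arrivals:
        # Keep only the first copy of each packet to arrive.
        first_copies.setdefault(pos, chunk)

    data = b""
    for pos in sorted(first_copies):
        if pos != len(data):
            return None            # a gap: some packet is still missing
        data += first_copies[pos]
    return data if len(data) == total_length else None


message = b"An Introduction to TCP/IP"
packets = split_into_packets(message, 8)

# Out of order, with one packet delivered twice.
arrivals = [packets[2], packets[0], packets[0], packets[3], packets[1]]
print(reassemble(arrivals, len(message)) == message)   # True

# One packet lost along the way.
print(reassemble([packets[0], packets[2], packets[3]], len(message)))   # None
```

The second call returns None because a packet is missing. That is the case the receiver cannot repair by itself.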
But what if some IP datagrams have been lost? As a destination computer receives data, the TCP software sends a short message back over the Internet to the original source computer specifying what data has arrived. Such a message is called an "acknowledgment". Every time TCP and IP software send out data, the TCP software starts a timer (sets a number and decreases it periodically using the computer's internal clock) and waits for an acknowledgment. If an acknowledgment arrives first, the timer is canceled. If the timer expires before an acknowledgment is received back, the TCP software retransmits the data. In this way missing data can usually be replaced at the destination computer in a reasonable time.

To achieve efficient data transfer the timeout interval cannot be preset. It needs to be longer for more distant destinations and for times of greater network congestion, and shorter for closer destinations and times of normal network traffic. TCP automatically adjusts the timeout interval based on the delays it currently measures between sending data and receiving the acknowledgment. These delays reflect both the distance to the destination and the level of congestion. This ability to dynamically adjust the timeout interval contributes greatly to the success of the Internet.

Having been designed together and engineered to perform two separate but related and needed tasks, TCP and IP complement each other. IP makes possible the travel of packets over different networks, but it, and thus the routers, are not concerned with data loss or data reassembly. The Internet is possible because so little is required of the intervening networks. TCP makes the Internet reliable by detecting and correcting duplications, out of order arrival and data loss using an acknowledgment and timeout mechanism with dynamically adjusted timeout intervals.

III. Conclusion

The Internet is a wonderful engineering achievement.
Since January 1, 1983, the cutoff date of the old ARPANET protocols, TCP/IP technology has successfully dealt with tremendous increases in usage and in the speed of connecting computers. This is a testament to the success of the TCP/IP protocol design and implementation process. Douglas Comer highlighted the features of this process as follows:

* TCP/IP protocol software and the Internet were designed by talented dedicated people.
* The Internet was a dream that inspired and challenged the research team.
* Researchers were allowed to experiment, even when there was no short-term economic payoff. Indeed, Internet research often used new, innovative technologies that were expensive compared to existing technologies.
* Instead of dreaming about a system that solved all problems, researchers built the Internet to operate efficiently.
* Researchers insisted that each part of the Internet work well in practice before they adopted it as standard.
* Internet technology solves an important, practical problem; the problem occurs whenever an organization has multiple networks.

(from The Internet Book)

The high speed, electronic, digital, stored program controlled computer and the TCP/IP Internet are major historic breakthroughs in engineering technology. Every such breakthrough in the past, like the printing press, the steam engine, the telephone and the airplane, has had profound effects on human society. The computer and the Internet have already begun to have such effects and this promises to be just the beginning. In the long run, despite the growing pains and dislocations, every great technological breakthrough serves to make possible a more fulfilling and comfortable life for more people. The computer and the Internet have the potential to speed up this process, although it may take a hard fight for most people to experience any of the improvement. We live, however, in a time of great invention and great potential.

The TCP/IP Internet is a major historical achievement.
It provides human society with a new global communications technology with great promise and potential. This Internet has sustained unprecedented growth both in the number of its users and the volume of messages it handles daily. In the 15 years since the cutover from the NCP ARPANET to the TCP/IP Internet, the Internet has proven itself founded on solid principles.

But there can be setbacks and false steps. As proposals for further development of the Internet are made, it would be proper to expect that they reaffirm and build on the proven principles. But there is, for example, research currently being undertaken to "make IP more reliable." Since the principle of minimal requirement on component networks is IP's strength, such research if implemented would be a fundamental change for the Internet. By giving up a guarantee of reliability, IP has made possible the interconnection of the most diverse of networks. To require greater reliability at the IP level could be an imposition of undue conformity on the component networks. That would be a backwards step. When today's Internet is developed and improved, the principles of TCP and IP will in all likelihood play crucial roles in that development.

Bibliography

Comer, Douglas E. Internetworking with TCP/IP Vol I: Principles, Protocols, and Architecture 2nd Edition. Englewood Cliffs, NJ. Prentice Hall. 1991.

Comer, Douglas E. The Internet Book. Englewood Cliffs, NJ. Prentice Hall. 1995.

Hauben, Michael and Ronda Hauben. Netizens: On the History and Impact of Usenet and the Internet. Los Alamitos, CA. IEEE Computer Society Press. 1997.

Lynch, Daniel C. and Marshall T. Rose. Editors. Internet Systems Handbook. Reading, MA. Addison-Wesley. 1993.

Stevens, W. Richard. TCP/IP Illustrated, Vol 1 Protocols. Reading, MA. Addison-Wesley. 1994.

Strandh, Sigvard. The History of the Machine. New York. Dorset Press. 1989 (Copyright 1979, AB NORDBOK, Gothenburg, Sweden).

---------------------

Notes:

1. See part IV of "The Computer and the Internet", the longer version of this paper accessible at http://www.ais.org/~jrh/paper.s98.html or by email from the author at jrh@ais.org.

2. Ibid.