==Phrack Inc.==
Volume Three, Issue 28, File #3 of 12
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> <>
<> Introduction to the Internet Protocols <>
<> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <>
<> Chapter Eight Of The Future Transcendent Saga <>
<> <>
<> Part One of Two Files <>
<> <>
<> Presented by Knight Lightning <>
<> July 3, 1989 <>
<> <>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Prologue
~~~~~~~~
Much of the material in this file comes from "Introduction to the
Internet Protocols" by Charles L. Hedrick of Rutgers University.
That material is copyrighted and is used in this file by
permission. The passage of time and changes in the wide area
networks have made it necessary for some details of the file to
be updated and, in some cases, reworded for better understanding
by our readers. Also, Unix is a trademark of AT&T Technologies,
Inc. -- Just thought I'd let you know.
If you are not already familiar with TCP/IP, I would suggest that
you read "Introduction to MIDNET" (Phrack Inc., Volume Three,
Issue 27, File 3 of 12) for more information. That file is
Chapter Seven of The Future Transcendent Saga and contains
information about TCP/IP and how it is used within the National
Science Foundation Network (NSFnet).
Table of Contents - Part One
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Introduction
- What Is TCP/IP?
- General Description Of The TCP/IP Protocols
    The TCP Level
    The IP Level
    The Ethernet Level
Introduction
~~~~~~~~~~~~
This article is a brief introduction to TCP/IP, followed by
suggestions on what to read for more information. This is not
intended to be a complete description, but it can give you a
reasonable idea of the capabilities of the protocols. However,
if you need to know any details of the technology, you will want
to read the standards yourself.
Throughout the article, you will find references to the
standards, in the form of "RFC" (Request For Comments) or "IEN"
(Internet Experiment Notes) numbers -- these are document
numbers. The final section (in Part Two) explains how you can
get copies of those standards.
What Is TCP/IP?
~~~~~~~~~~~~~~~
TCP/IP is a set of protocols developed to allow cooperating
computers to share resources across a network. It was developed
by a community of researchers centered around the ARPAnet.
First, some basic definitions. The most accurate name for the set
of protocols I am describing is the "Internet protocol suite."
TCP and IP are two of the protocols in this suite (they will be
described below). Because TCP and IP are the best known of the
protocols, it has become common to use the term TCP/IP to refer
to the whole family.
The Internet is a collection of networks, including the ARPAnet,
NSFnet, regional networks such as MIDnet (described in Chapter
Seven of the Future Transcendent Saga), local networks at a
number of university and research institutions, and a number of
military networks. The term "Internet" applies to this entire
set of networks.
The subset of them that is managed by the Department of Defense
is referred to as the "DDN" (Defense Data Network). This
includes some research-oriented networks, such as the ARPAnet, as
well as more strictly military ones (because much of the funding
for Internet protocol developments is done via the DDN
organization, the terms Internet and DDN can sometimes seem
equivalent).
All of these networks are connected to each other. Users can
send messages from any of them to any other, except where there
are security or other policy restrictions on access. Officially
speaking, the Internet protocol documents are simply standards
adopted by the Internet community for its own use. The
Department of Defense once issued a MILSPEC definition of TCP/IP
that was intended to be a more formal definition, appropriate for
use in purchasing specifications. However, most of the TCP/IP
community continues to use the Internet standards. The MILSPEC
version is intended to be consistent with it.
Whatever it is called, TCP/IP is a family of protocols. A few
provide "low-level" functions needed for many applications.
These include IP, TCP, and UDP (all of which will be described in
a bit more detail later in this file). Others are protocols for
doing specific tasks, e.g. transferring files between computers,
sending mail, or finding out who is logged in on another
computer.
Initially TCP/IP was used mostly between minicomputers or
mainframes. These machines had their own disks, and generally
were self-contained. Thus the most important "traditional"
TCP/IP services are:
- File Transfer -- The file transfer protocol (FTP) allows a
user on any computer to get files from another computer, or
to send files to another computer. Security is handled by
requiring the user to specify a user name and password for
the other computer.
Provisions are made for handling file transfer between
machines with different character sets, end-of-line
conventions, etc. This is not quite the same as "network
file system" or "netbios" protocols, which will be
described later. Instead, FTP is a utility that you run
any time you want to access a file on another system. You
use it to copy the file to your own system. You then can
work with the local copy. (See RFC 959 for specifications
for FTP.)
- Remote Login -- The network terminal protocol (TELNET)
allows a user to log in on any other computer on the
network. You start a remote session by specifying a
computer to connect to. From that time until you finish
the session, anything you type is sent to the other
computer. Note that you are really still talking to your
own computer, but the telnet program effectively makes your
computer invisible while it is running. Every character
you type is sent directly to the other system. Generally,
the connection to the remote computer behaves much like a
dialup connection. That is, the remote system will ask you
to log in and give a password, in whatever manner it would
normally ask a user who had just dialed it up.
When you log off of the other computer, the telnet program
exits, and you will find yourself talking to your own
computer. Microcomputer implementations of telnet
generally include a terminal emulator for some common type
of terminal. (See RFCs 854 and 855 for specifications for
telnet. By the way, the telnet protocol should not be
confused with Telenet, a vendor of commercial network
services.)
- Computer Mail -- This allows you to send messages to users
on other computers. Originally, people tended to use only
one or two specific computers and they would maintain "mail
files" on those machines. The computer mail system is
simply a way for you to add a message to another user's
mail file. There are some problems with this in an
environment where microcomputers are used.
The most serious is that a micro is not well suited to
receive computer mail. When you send mail, the mail
software expects to be able to open a connection to the
addressee's computer, in order to send the mail. If this
is a microcomputer, it may be turned off, or it may be
running an application other than the mail system. For
this reason, mail is normally handled by a larger system,
where it is practical to have a mail server running all the
time. Microcomputer mail software then becomes a user
interface that retrieves mail from the mail server. (See
RFCs 821 and 822 for specifications for computer mail. See
RFC 937 for a protocol designed for microcomputers to use
in reading mail from a mail server.)
These services should be present in any implementation of TCP/IP,
except that micro-oriented implementations may not support
computer mail. These traditional applications still play a very
important role in TCP/IP-based networks. However, more recently,
the way in which networks are used has been changing. The older
model of a number of large, self-sufficient computers is
beginning to change. Now many installations have several kinds
of computers, including microcomputers, workstations,
minicomputers, and mainframes. These computers are likely to be
configured to perform specialized tasks. Although people are
still likely to work with one specific computer, that computer
will call on other systems on the net for specialized services.
This has led to the "server/client" model of network services. A
server is a system that provides a specific service for the rest
of the network. A client is another system that uses that
service. Note that the server and client need not be on
different computers. They could be different programs running on
the same computer. Here are the kinds of servers typically
present in a modern computer setup. Also note that these
computer services can all be provided within the framework of
TCP/IP.
- Network file systems. This allows a system to access files on
another computer in a somewhat more closely integrated fashion
than FTP. A network file system provides the illusion that
disks or other devices from one system are directly connected
to other systems. There is no need to use a special network
utility to access a file on another system. Your computer
simply thinks it has some extra disk drives. These extra
"virtual" drives refer to the other system's disks. This
capability is useful for several different purposes. It lets
you put large disks on a few computers, but still give others
access to the disk space. Aside from the obvious economic
benefits, this allows people working on several computers to
share common files. It makes system maintenance and backup
easier, because you don't have to worry about updating and
backing up copies on lots of different machines. A number of
vendors now offer high-performance diskless computers. These
computers have no disk drives at all. They are entirely
dependent upon disks attached to common "file servers". (See
RFCs 1001 and 1002 for a description of PC-oriented NetBIOS
over TCP. In the workstation and minicomputer area, Sun's
Network File System is more likely to be used. Protocol
specifications for it are available from Sun Microsystems.)
- Remote printing. This allows you to access printers on other
computers as if they were directly attached to yours. (The
most commonly used protocol is the remote lineprinter protocol
from Berkeley Unix. Unfortunately, there is no protocol
document for this. However, the C code is easily obtained from
Berkeley, so implementations are common.)
- Remote execution. This allows you to request that a
particular program be run on a different computer. This is
useful when you can do most of your work on a small computer,
but a few tasks require the resources of a larger system.
There are a number of different kinds of remote execution.
Some operate on a command by command basis. That is, you
request that a specific command or set of commands should run
on some specific computer. (More sophisticated versions will
choose a system that happens to be free.) However, there are
also "remote procedure call" systems that allow a program to
call a subroutine that will run on another computer. (There
are many protocols of this sort. Berkeley Unix contains two
servers to execute commands remotely: rsh and rexec. The
Unix "man" pages describe the protocols that they use. The
user-contributed software with Berkeley 4.3 contains a
"distributed shell" that will distribute tasks among a set of
systems, depending upon load.)
- Name servers. In large installations, there are a number of
different collections of names that have to be managed. This
includes users and their passwords, names and network
addresses for computers, and accounts. It becomes very
tedious to keep this data up to date on all of the computers.
Thus the databases are kept on a small number of systems.
Other systems access the data over the network. (RFCs 882 and
883 describe the name server protocol used to keep track of
host names and Internet addresses on the Internet. This is
now a required part of any TCP/IP implementation. IEN 116
describes an older name server protocol that is used by a few
terminal servers and other products to look up host names.
Sun's Yellow Pages system is designed as a general mechanism
to handle user names, file sharing groups, and other databases
commonly used by Unix systems. It is widely available
commercially. Its protocol definition is available from Sun.)
- Terminal servers. Many installations no longer connect
terminals directly to computers. Instead they connect them to
terminal servers. A terminal server is simply a small
computer that only knows how to run telnet (or some other
protocol to do remote login). If your terminal is connected
to one of these, you simply type the name of a computer, and
you are connected to it. Generally it is possible to have
active connections to more than one computer at the same time.
The terminal server will have provisions to switch between
connections rapidly, and to notify you when output is waiting
for another connection. (Terminal servers use the telnet
protocol, already mentioned. However, any real terminal server
will also have to support name service and a number of other
protocols.)
- Network-oriented window systems. Until recently,
high-performance graphics programs had to execute on a
computer that had a bit-mapped graphics screen directly
attached to it. Network window systems allow a program to use
a display on a different computer. Full-scale network window
systems provide an interface that lets you distribute jobs to
the systems that are best suited to handle them, but still
give you a single graphically-based user interface. (The most
widely-implemented window system is X. A protocol description
is available from MIT's Project Athena. A reference
implementation is publicly available from MIT. A number of
vendors are also supporting NeWS, a window system defined by
Sun. Both of these systems are designed to use TCP/IP.)
Note that some of the protocols described above were designed by
Berkeley, Sun, or other organizations. Thus they are not
officially part of the Internet protocol suite. However, they are
implemented using TCP/IP, just as normal TCP/IP application
protocols are. Since the protocol definitions are not considered
proprietary, and since commercially-supported implementations are
widely available, it is reasonable to think of these protocols as
being effectively part of the Internet suite.
Note that the list above is simply a sample of the sort of
services available through TCP/IP. However, it does contain the
majority of the "major" applications. The other commonly-used
protocols tend to be specialized facilities for getting
information of various kinds, such as who is logged in, the time
of day, etc. If you need a facility that is not listed
here, I encourage you to look through the current edition of
Internet Protocols (currently RFC 1011), which lists all of the
available protocols, and also to look at some of the major TCP/IP
implementations to see what various vendors have added.
General Description Of The TCP/IP Protocols
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TCP/IP is a layered set of protocols. In order to understand
what this means, it is useful to look at an example. A typical
situation is sending mail. First, there is a protocol for mail.
This defines a set of commands which one machine sends to
another, e.g. commands to specify who the sender of the message
is, who it is being sent to, and then the text of the message.
However, this protocol assumes that there is a way to communicate
reliably between the two computers. Mail, like other application
protocols, simply defines a set of commands and messages to be
sent. It is designed to be used together with TCP and IP.
TCP is responsible for making sure that the commands get through
to the other end. It keeps track of what is sent, and
retransmits anything that did not get through. If any message
is too large for one datagram, e.g. the text of the mail, TCP
will split it up into several datagrams, and make sure that they
all arrive correctly. Since these functions are needed for many
applications, they are put together into a separate protocol,
rather than being part of the specifications for sending mail.
You can think of TCP as forming a library of routines that
applications can use when they need reliable network
communications with another computer.
Similarly, TCP calls on the services of IP. Although the
services that TCP supplies are needed by many applications, there
are still some kinds of applications that don't need them.
However, there are some services that every application needs. So
these services are put together into IP. As with TCP, you can
think of IP as a library of routines that TCP calls on, but which
is also available to applications that don't use TCP. This
strategy of building several levels of protocol is called
"layering." I like to think of the applications programs such as
mail, TCP, and IP, as being separate "layers," each of which
calls on the services of the layer below it. Generally, TCP/IP
applications use 4 layers:
- An application protocol such as mail.
- A protocol such as TCP that provides services needed by many
applications.
- IP, which provides the basic service of getting datagrams to
their destination.
- The protocols needed to manage a specific physical medium, such
as Ethernet or a point to point line.
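To make the idea of layers concrete, here is a small sketch in Python, chosen only for brevity. Each function stands for one layer. It adds its own header and then calls the layer below it. The function names are made up for illustration. The one-letter tags are placeholders for real headers, and they use the same letters as the diagrams later in this file.

```python
# Sketch of layering: each function adds its own "header" and calls the
# layer below it. The one-letter tags are placeholders, not real headers.


def ethernet_send(datagram: bytes, wire: list) -> None:
    """Medium layer: add a header ("E") and a trailing checksum ("C")."""
    wire.append(b"E" + datagram + b"C")


def ip_send(segment: bytes, wire: list) -> None:
    """IP layer: add an IP header ("I") and pass the datagram down."""
    ethernet_send(b"I" + segment, wire)


def tcp_send(chunk: bytes, wire: list) -> None:
    """TCP layer: add a TCP header ("T") and hand the result to IP."""
    ip_send(b"T" + chunk, wire)


def mail_send(text: str, wire: list) -> None:
    """Application layer: the mail protocol only decides what to send."""
    tcp_send(text.encode("ascii"), wire)


wire: list = []
mail_send("....", wire)
print(wire)  # [b'EIT....C']
```

The mail layer never touches a header belonging to another layer. That separation is the point of layering.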
TCP/IP is based on the "catenet model." (This is described in
more detail in IEN 48.) This model assumes that there are a
large number of independent networks connected together by
gateways. The user should be able to access computers or other
resources on any of these networks. Datagrams will often pass
through a dozen different networks before getting to their final
destination. The routing needed to accomplish this should be
completely invisible to the user. As far as the user is
concerned, all he needs to know in order to access another system
is an "Internet address." This is an address that looks like
128.6.4.194. It is actually a 32-bit number. However, it is
normally written as 4 decimal numbers, each representing 8 bits
of the address. (The term "octet" is used by Internet
documentation for such 8-bit chunks. The term "byte" is not
used, because TCP/IP is supported by some computers that have
byte sizes other than 8 bits.)
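Converting between the dotted form and the underlying 32-bit number is simple arithmetic, because each octet supplies the next 8 bits. Here is a minimal Python sketch. The function names are hypothetical.

```python
def dotted_to_int(addr: str) -> int:
    """Convert a dotted-decimal Internet address to its 32-bit value."""
    octets = [int(part) for part in addr.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError("expected four octets in the range 0-255: " + addr)
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each octet supplies the next 8 bits
    return value


def int_to_dotted(value: int) -> str:
    """Convert a 32-bit value back to the usual dotted-decimal form."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit value")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


n = dotted_to_int("128.6.4.194")
print(n, hex(n))         # 2147878082 0x800604c2
print(int_to_dotted(n))  # 128.6.4.194
```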
Generally the structure of the address gives you some information
about how to get to the system. For example, 128.6 is a network
number assigned by a central authority to Rutgers University.
Rutgers uses the next octet to indicate which of the campus
Ethernets is involved. 128.6.4 happens to be an Ethernet used by
the Computer Science Department. The last octet allows for up to
254 systems on each Ethernet. (It is 254 because 0 and 255 are
not allowed, for reasons that will be discussed later.) Note
that 128.6.4.194 and 128.6.5.194 would be different systems. The
structure of an Internet address is described in a bit more
detail later.
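Here is a sketch of pulling an address apart the way Rutgers does. It assumes the layout just described: the first two octets give the network, the third gives the Ethernet, and the last gives the host. Other sites may divide their addresses differently, so treat `split_address` as a hypothetical helper for this one example.

```python
def split_address(addr: str) -> tuple:
    """Split an address into (network, Ethernet, host), assuming the
    Rutgers-style layout described above."""
    parts = addr.split(".")
    if len(parts) != 4:
        raise ValueError("expected four octets: " + addr)
    # bytes() rejects values outside 0-255, so bad octets raise ValueError.
    value = int.from_bytes(bytes(int(p) for p in parts), "big")
    network = f"{value >> 24}.{(value >> 16) & 0xFF}"  # first two octets
    ethernet = (value >> 8) & 0xFF                      # third octet
    host = value & 0xFF                                 # last octet
    return network, ethernet, host


# Host values 0 and 255 are reserved, which leaves 254 per Ethernet.
USABLE_HOSTS = [h for h in range(256) if h not in (0, 255)]

print(split_address("128.6.4.194"))  # ('128.6', 4, 194)
print(split_address("128.6.5.194"))  # ('128.6', 5, 194)
print(len(USABLE_HOSTS))             # 254
```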
Of course I normally refer to systems by name, rather than by
Internet address. When I specify a name, the network software
looks it up in a database, and comes up with the corresponding
Internet address. Most of the network software deals strictly in
terms of the address. (RFC 882 describes the name server
technology used to handle this lookup.)
TCP/IP is built on "connectionless" technology. Information is
transferred as a sequence of "datagrams." A datagram is a
collection of data that is sent as a single message. Each of
these datagrams is sent through the network individually. There
are provisions to open connections (i.e., to start a conversation
that will continue for some time). However, at some level,
information from those connections is broken up into datagrams,
and those datagrams are treated by the network as completely
separate. For example, suppose you want to transfer a 15000
octet file. Most networks can't handle a 15000 octet datagram.
So the protocols will break this up into something like 30
500-octet datagrams. Each of these datagrams will be sent to the
other end. At that point, they will be put back together into
the 15000-octet file. However while those datagrams are in
transit, the network doesn't know that there is any connection
between them. It is perfectly possible that datagram 14 will
actually arrive before datagram 13. It is also possible that
somewhere in the network, an error will occur, and some datagram
won't get through at all. In that case, that datagram has to be
sent again.
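The following Python sketch acts out the 15000-octet example under simplified assumptions. Pieces are keyed by octet offset. The "network" is just a shuffle plus one dropped piece, and the lost piece is simply resent. This is not how any real TCP is coded, and `split` and `reassemble` are hypothetical names.

```python
import random

SIZE = 500  # datagram payload size used in the example above


def split(data: bytes, size: int = SIZE) -> dict:
    """Break data into pieces keyed by the offset of their first octet."""
    return {off: data[off:off + size] for off in range(0, len(data), size)}


def reassemble(pieces: dict, total: int, size: int = SIZE) -> bytes:
    """Rebuild the original data; refuse if any piece is still missing."""
    missing = [off for off in range(0, total, size) if off not in pieces]
    if missing:
        raise ValueError(f"missing pieces at offsets {missing}")
    return b"".join(pieces[off] for off in sorted(pieces))


file_data = bytes(i % 256 for i in range(15000))
pieces = split(file_data)
print(len(pieces))  # 30

# Simulate the network: datagrams arrive in random order and one is lost.
order = list(pieces)
random.Random(7).shuffle(order)
lost = order.pop()
received = {off: pieces[off] for off in order}

try:
    reassemble(received, len(file_data))
except ValueError as err:
    print(err)  # names the one missing offset

received[lost] = pieces[lost]  # the lost datagram is sent again
assert reassemble(received, len(file_data)) == file_data
```

Arrival order does not matter because each piece carries its position. Only a missing piece holds up reassembly.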
Note by the way that the terms "datagram" and "packet" often seem
to be nearly interchangeable. Technically, datagram is the right
word to use when describing TCP/IP. A datagram is a unit of
data, which is what the protocols deal with. A packet is a
physical thing, appearing on an Ethernet or some wire. In most
cases a packet simply contains a datagram, so there is very
little difference. However, they can differ. When TCP/IP is used
on top of X.25, the X.25 interface breaks the datagrams up into
128-byte packets. This is invisible to IP, because the packets
are put back together into a single datagram at the other end
before being processed by TCP/IP. So in this case, one IP
datagram would be carried by several packets. However, with most
media, there are efficiency advantages to sending one datagram
per packet, and so the distinction tends to vanish.
Two separate protocols are involved in handling TCP/IP datagrams.
TCP (the "transmission control protocol") is responsible for
breaking up the message into datagrams, reassembling them at the
other end, resending anything that gets lost, and putting things
back in the right order. IP (the "internet protocol") is
responsible for routing individual datagrams. It may seem like
TCP is doing all the work. However, in the Internet, simply
getting a datagram to its destination can be a complex job. A
connection may require the datagram to go through several
networks at Rutgers, a serial line to the John von Neumann
Supercomputer Center, a couple of Ethernets there, a series of
56 Kbps phone lines to another NSFnet site, and more Ethernets on
another campus. Keeping track of the routes to all of the
destinations and handling incompatibilities among different
transport media turns out to be a complex job. Note that the
interface between TCP and IP is fairly simple. TCP simply hands
IP a datagram with a destination. IP doesn't know how this
datagram relates to any datagram before it or after it.
It may have occurred to you that something is missing here. I
have talked about Internet addresses, but not about how you keep
track of multiple connections to a given system. Clearly it
isn't enough to get a datagram to the right destination. TCP has
to know which connection this datagram is part of. This task is
referred to as "demultiplexing." In fact, there are several
levels of demultiplexing going on in TCP/IP. The information
needed to do this demultiplexing is contained in a series of
"headers." A header is simply a few extra octets tacked onto the
beginning of a datagram by some protocol in order to keep track
of it. It's a lot like putting a letter into an envelope and
putting an address on the outside of the envelope, except that
with modern networks it happens several times. It's like you put the
letter into a little envelope, your secretary puts that into a
somewhat bigger envelope, the campus mail center puts that
envelope into a still bigger one, etc. Here is an overview of
the headers that get stuck on a message that passes through a
typical TCP/IP network:
It starts with a single data stream, say a file you are trying to
send to some other computer:
......................................................
TCP breaks it up into manageable chunks. (In order to do this,
TCP has to know how large a datagram your network can handle.
Actually, the TCPs at each end say how big a datagram they can
handle, and then they pick the smallest size.)
.... .... .... .... .... .... .... ....
TCP puts a header at the front of each datagram. This header
actually contains at least 20 octets, but the most important ones
are a source and destination "port number" and a "sequence
number." The port numbers are used to keep track of different
conversations. Suppose 3 different people are transferring
files. Your TCP might allocate port numbers 1000, 1001, and 1002
to these transfers. When you are sending a datagram, this
becomes the "source" port number, since you are the source of the
datagram. Of course the TCP at the other end has assigned a port
number of its own for the conversation. Your TCP has to know the
port number used by the other end as well. (It finds out when
the connection starts, as I will explain below.) It puts this in
the "destination" port field. Of course if the other end sends a
datagram back to you, the source and destination port numbers
will be reversed, since then it will be the source and you will
be the destination. Each datagram has a sequence number. This
is used so that the other end can make sure that it gets the
datagrams in the right order, and that it hasn't missed any.
(See the TCP specification for details.) TCP doesn't number the
datagrams, but the octets. So if there are 500 octets of data in
each datagram, the first datagram might be numbered 0, the second
500, the next 1000, the next 1500, etc. Finally, I will mention
the Checksum. This is a number that is computed by adding up all
the octets in the datagram (more or less - see the TCP spec).
The result is put in the header. TCP at the other end computes
the checksum again. If they disagree, then something bad
happened to the datagram in transmission, and it is thrown away.
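Both ideas in the last paragraph fit in a few lines of Python. The sequence numbers count octets. The checksum shown is the one's-complement sum of 16-bit words described in RFC 1071, which is what "adding up the octets (more or less)" means in practice. This sketch leaves one thing out: the real TCP checksum also covers a "pseudo-header" containing the IP addresses (see RFC 793).

```python
def sequence_numbers(total_octets: int, segment_size: int, initial: int = 0) -> list:
    """Sequence number carried by each datagram: TCP numbers octets."""
    return [(initial + off) % 2**32 for off in range(0, total_octets, segment_size)]


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


print(sequence_numbers(2000, 500))  # [0, 500, 1000, 1500]

payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
csum = internet_checksum(payload)
print(hex(csum))  # 0x220d

# The receiver's check: summing the data plus the transmitted checksum gives 0.
print(internet_checksum(payload + csum.to_bytes(2, "big")))  # 0
```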
So here's what the datagram looks like now.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 your data ... next 500 octets                 |
|                            ......                             |
If you abbreviate the TCP header as "T", the whole file now looks like this:
T.... T.... T.... T.... T.... T.... T....
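To illustrate the layout above, here is a Python sketch that packs and unpacks the 20-octet header with the standard `struct` module. It makes three simplifying choices. It assumes no options, so Data Offset is 5 words. It leaves the checksum at zero. The port numbers 1000 and 2000 are arbitrary examples.

```python
import struct

# Flag bits, in the low six bits of the word that also holds Data Offset.
URG, ACK, PSH, RST, SYN, FIN = 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
TCP_FORMAT = "!HHIIHHHH"  # network (big-endian) byte order, 20 octets total


def pack_tcp_header(src_port: int, dst_port: int, seq: int, ack: int,
                    flags: int, window: int, checksum: int = 0,
                    urgent: int = 0) -> bytes:
    data_offset = 5  # header length in 32-bit words; 5 means no options
    offset_and_flags = (data_offset << 12) | (flags & 0x3F)  # reserved bits 0
    return struct.pack(TCP_FORMAT, src_port, dst_port, seq, ack,
                       offset_and_flags, window, checksum, urgent)


def unpack_tcp_header(header: bytes) -> dict:
    (src_port, dst_port, seq, ack, offset_and_flags,
     window, checksum, urgent) = struct.unpack(TCP_FORMAT, header[:20])
    return {
        "src_port": src_port, "dst_port": dst_port, "seq": seq, "ack": ack,
        "data_offset": offset_and_flags >> 12, "flags": offset_and_flags & 0x3F,
        "window": window, "checksum": checksum, "urgent": urgent,
    }


hdr = pack_tcp_header(1000, 2000, seq=500, ack=1500, flags=ACK, window=4096)
print(len(hdr))                       # 20
print(unpack_tcp_header(hdr)["ack"])  # 1500
```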
You will note that there are items in the header that I have not
described above. They are generally involved with managing the
connection. In order to make sure the datagram has arrived at
its destination, the recipient has to send back an
"acknowledgement." This is a datagram whose "Acknowledgement
number" field is filled in. For example, sending a packet with
an acknowledgement of 1500 indicates that you have received all
the data before octet number 1500, which is the next one you
expect. If the sender doesn't get an
acknowledgement within a reasonable amount of time, it sends the
data again. The window is used to control how much data can be
in transit at any one time. It is not practical to wait for each
datagram to be acknowledged before sending the next one. That
would slow things down too much. On the other hand, you can't
just keep sending, or a fast computer might overrun the capacity
of a slow one to absorb data. Thus each end indicates how much
new data it is currently prepared to absorb by putting the number
of octets in its "Window" field. As the computer receives data,
the amount of space left in its window decreases. When it goes
to zero, the sender has to stop. As the receiver processes the
data, it increases its window, indicating that it is ready to
accept more data. Often the same datagram can be used to
acknowledge receipt of a set of data and to give permission for
additional new data (by an updated window). The "Urgent" field
allows one end to tell the other to skip ahead in its processing
to a particular octet. This is often useful for handling
asynchronous events, for example when you type a control
character or other command that interrupts output. The other
fields are not pertinent to understanding what I am trying to
explain in this article.
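The effect of the window can be seen with a toy simulation. This sketch assumes an idealized receiver that absorbs everything at once and acknowledges it in one datagram per round trip. It therefore only counts how many round trips a transfer needs for a given window size. `transfer` is a hypothetical name, and real TCPs behave in far more detail.

```python
def transfer(total_octets: int, segment_size: int, window: int) -> tuple:
    """Count (round trips, datagrams) when at most `window` octets may be
    outstanding without an acknowledgement."""
    if window <= 0 or segment_size <= 0:
        raise ValueError("a zero window means the sender must wait")
    next_seq = 0  # next octet the sender will send
    acked = 0     # acknowledgement number: every octet before it has arrived
    rounds = segments = 0
    while acked < total_octets:
        # Send until the data runs out or the window is full.
        while next_seq < total_octets and next_seq - acked < window:
            size = min(segment_size, total_octets - next_seq,
                       window - (next_seq - acked))
            next_seq += size
            segments += 1
        # Simplification: the receiver takes everything at once and
        # acknowledges it in a single datagram, reopening the window.
        acked = next_seq
        rounds += 1
    return rounds, segments


print(transfer(15000, 500, 500))    # (30, 30): one datagram per round trip
print(transfer(15000, 500, 2000))   # (8, 30)
print(transfer(15000, 500, 15000))  # (1, 30)
```

A window of one datagram amounts to waiting for each acknowledgement. A larger window lets several datagrams be in transit at once.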
TCP sends each datagram to IP. Of course it has to tell IP the
Internet address of the computer at the other end. Note that
this is all IP is concerned about. It doesn't care about what is
in the datagram, or even in the TCP header. IP's job is simply
to find a route for the datagram and get it to the other end. In
order to allow gateways or other intermediate systems to forward
the datagram, it adds its own header. The main things in this
header are the source and destination Internet address (32-bit
addresses, like 128.6.4.194), the protocol number, and another
checksum. The source Internet address is simply the address of
your machine. (This is necessary so the other end knows where
the datagram came from.) The destination Internet address is the
address of the other machine. (This is necessary so any gateways
in the middle know where you want the datagram to go.) The
protocol number tells IP at the other end to send the datagram to
TCP.
Although most IP traffic uses TCP, there are other protocols that
can use IP, so you have to tell IP which protocol to send the
datagram to. Finally, the checksum allows IP at the other end to
verify that the header wasn't damaged in transit. Note that TCP
and IP have separate checksums. IP needs to be able to verify
that the header didn't get damaged in transit, or it could send a
message to the wrong place. It is both more efficient and safer
to have TCP compute a separate checksum for the TCP header and
data. Once IP has tacked on its header, here's what the message
looks like:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               TCP header, then your data ......               |
|                                                               |
If you represent the IP header by an "I", your file now looks like this:
IT.... IT.... IT.... IT.... IT.... IT.... IT....
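Here is a minimal Python sketch of building that 20-octet IP header (no options) and filling in its checksum, which covers only the header. It uses the one's-complement checksum algorithm from RFC 1071. The time-to-live value of 30 is an arbitrary example, and `pack_ip_header` is a hypothetical name. Protocol number 6 is the one assigned to TCP.

```python
import struct

PROTO_TCP = 6  # IP protocol number for TCP


def checksum16(data: bytes) -> int:
    """RFC 1071 Internet checksum over 16-bit words (length is even here)."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def octets(addr: str) -> bytes:
    """'128.6.4.194' -> the four octets of the address."""
    return bytes(int(x) for x in addr.split("."))


def pack_ip_header(src: str, dst: str, payload_len: int, ttl: int,
                   protocol: int = PROTO_TCP, ident: int = 0) -> bytes:
    header = struct.pack("!BBHHHBBH4s4s",
                         (4 << 4) | 5,      # version 4, header length 5 words
                         0,                 # type of service
                         20 + payload_len,  # total length in octets
                         ident, 0,          # identification; flags/fragment offset
                         ttl, protocol,
                         0,                 # checksum placeholder
                         octets(src), octets(dst))
    csum = checksum16(header)  # computed over the header only, not the data
    return header[:10] + struct.pack("!H", csum) + header[12:]


hdr = pack_ip_header("128.6.4.194", "128.6.5.194", payload_len=520, ttl=30)
print(len(hdr))         # 20
print(checksum16(hdr))  # 0, so the header arrived undamaged
print(hdr[9])           # 6: deliver to TCP
```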
Again, the header contains some additional fields that will not
be discussed in this article because they are not relevant to
understanding the process. The flags and fragment offset are
used to keep track of the pieces when a datagram has to be split
up. This can happen when datagrams are forwarded through a
network for which they are too big. (This will be discussed a
bit more below.) The time to live is a number that is decremented
whenever the datagram passes through a system. When it goes to
zero, the datagram is discarded. This is done in case a loop
develops in the system somehow. Of course this should be
impossible, but well-designed networks are built to cope with
"impossible" conditions.
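A short sketch shows why the time to live matters. The routing tables here are hypothetical and deliberately broken, so that two gateways point at each other. Without the decrement, a datagram sent into that loop would circulate forever.

```python
# Hypothetical next-hop tables. gw-a and gw-b have been misconfigured to
# point at each other, so a datagram sent through them loops.
NEXT_HOP = {"gw-a": "gw-b", "gw-b": "gw-a", "gw-c": "host"}


def deliver(start: str, destination: str, ttl: int) -> tuple:
    """Forward a datagram hop by hop; return (path, delivered)."""
    path = []
    node = start
    while node != destination:
        ttl -= 1                # every system that forwards it decrements TTL
        path.append(node)
        if ttl <= 0:
            return path, False  # TTL exhausted: the datagram is discarded
        node = NEXT_HOP[node]
    return path, True


print(deliver("gw-c", "host", ttl=30))  # (['gw-c'], True)
print(deliver("gw-a", "host", ttl=4))   # (['gw-a', 'gw-b', 'gw-a', 'gw-b'], False)
```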
At this point, it's possible that no more headers are needed. If
your computer happens to have a direct phone line connecting it
to the destination computer, or to a gateway, it may simply send
the datagrams out on the line (though likely a synchronous
protocol such as HDLC would be used, and it would add at least a
few octets at the beginning and end).
Most networks these days use Ethernet, which has its own
addresses. The people who designed Ethernet wanted to make sure
that no two machines would end up with the same Ethernet address.
Furthermore, they didn't want the user to have to worry about
assigning addresses. So each Ethernet controller comes with an
address built-in from the factory. In order to make sure that
they would never have to reuse addresses, the Ethernet designers
allocated 48 bits for the Ethernet address. People who make
Ethernet equipment have to register with a central authority, to
make sure that the numbers they assign don't overlap any other
manufacturer. Ethernet is a "broadcast medium." That is, it is
in effect like an old party line telephone. When you send a
packet out on the Ethernet, every machine on the network sees the
packet. So something is needed to make sure that the right
machine gets it. As you might guess, this involves the Ethernet
header.
Every Ethernet packet has a 14-octet header that includes the
source and destination Ethernet address, and a type code. Each
machine is supposed to pay attention only to packets with its own
Ethernet address in the destination field. (It's perfectly
possible to cheat, which is one reason that Ethernet
communications are not terribly secure.) Note that there is no
connection between the Ethernet address and the Internet address.
Each machine has to have a table of what Ethernet address
corresponds to what Internet address. (I will describe how this
table is constructed a bit later.) In addition to the addresses,
the header contains a type code. The type code is to allow for
several different protocol families to be used on the same
network. So you can use TCP/IP, DECnet, Xerox NS, etc. at the
same time. Each of them will put a different value in the type
field. Finally, there is a checksum. The Ethernet controller
computes a checksum of the entire packet. When the other end
receives the packet, it recomputes the checksum, and throws the
packet away if the answer disagrees with the original. The
checksum is put on the end of the packet, not in the header. The
final result is that your message looks like this:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Ethernet destination address (first 32 bits)          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Ethernet dest (last 16 bits) |Ethernet source (first 16 bits)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Ethernet source address (last 32 bits)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Type code           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|          IP header, then TCP header, then your data           |
|                                                               |
|                              ...                              |
|                                                               |
|                       end of your data                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Ethernet Checksum                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
If you represent the Ethernet header with "E", and the Ethernet
checksum with "C", your file now looks like this:
EIT....C EIT....C EIT....C EIT....C EIT....C
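This Python sketch assembles such a packet. The table from Internet address to Ethernet address is hypothetical, because how that table is really built is covered later. The Ethernet addresses themselves are made up. Type code 0x0800 is the value assigned to IP.

`zlib.crc32` uses the same CRC-32 polynomial as the Ethernet checksum. However, this sketch ignores details a real controller handles, such as bit order on the wire, the preamble, and padding short packets to the 64-octet minimum. Writing the checksum in little-endian order is also an assumption.

```python
import struct
import zlib

ETHERTYPE_IP = 0x0800  # Ethernet type code assigned to IP

# Hypothetical table from Internet address to Ethernet address.
ETHERNET_FOR_IP = {"128.6.4.194": bytes.fromhex("020000000001")}


def build_frame(dst_ip: str, src_mac: bytes, ip_datagram: bytes) -> bytes:
    dst_mac = ETHERNET_FOR_IP[dst_ip]  # KeyError if the mapping is unknown
    header = struct.pack("!6s6sH", dst_mac, src_mac, ETHERTYPE_IP)  # 14 octets
    body = header + ip_datagram
    # The checksum goes on the end of the packet, not in the header.
    return body + struct.pack("<I", zlib.crc32(body))


def check_frame(frame: bytes) -> bool:
    """Receiver side: recompute the checksum and compare."""
    body, trailer = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == trailer


frame = build_frame("128.6.4.194", bytes.fromhex("020000000002"), b"IT....")
print(len(frame))          # 24 = 14 header + 6 data + 4 checksum
print(check_frame(frame))  # True
```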
When these packets are received by the other end, of course all
the headers are removed. The Ethernet interface removes the
Ethernet header and the checksum. It looks at the type code.
Since the type code is the one assigned to IP, the Ethernet
device driver passes the datagram up to IP. IP removes the IP
header. It looks at the IP protocol field. Since the protocol
type is TCP, it passes the datagram up to TCP. TCP now looks at
the sequence number. It uses the sequence numbers and other
information to combine all the datagrams into the original file.
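The receiving side can be sketched the same way. Each layer strips its header and uses one field to decide who gets the rest. In this hypothetical Python version, the connection table, the port numbers, and the `make_frame` test helper are all made up. Checksum verification is skipped to keep the sketch short.

```python
import struct

ETHERTYPE_IP = 0x0800
PROTO_TCP = 6

# Hypothetical table of open connections, keyed by (our port, their port);
# each entry collects (sequence number, data) pairs as they arrive.
connections = {(1000, 2000): []}


def receive(frame: bytes) -> None:
    # Ethernet: check the type code, then strip the 14-octet header and the
    # 4-octet trailing checksum (verifying checksums is skipped here).
    if struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IP:
        return  # some other protocol family, e.g. DECnet
    datagram = frame[14:-4]

    # IP: check the protocol field, then strip the IP header.
    header_len = (datagram[0] & 0x0F) * 4
    total_len = struct.unpack("!H", datagram[2:4])[0]
    if datagram[9] != PROTO_TCP:
        return
    segment = datagram[header_len:total_len]

    # TCP: the port numbers pick the connection; the sequence number is kept
    # so the data can be put back in order.
    their_port, our_port, seq = struct.unpack("!HHI", segment[:8])
    data_offset = (segment[12] >> 4) * 4
    conn = connections.get((our_port, their_port))
    if conn is not None:
        conn.append((seq, segment[data_offset:]))


def make_frame(seq: int, data: bytes) -> bytes:
    """Build a test frame from port 2000 to port 1000 (checksums left at 0)."""
    tcp = struct.pack("!HHIIHHHH", 2000, 1000, seq, 0, 5 << 12, 4096, 0, 0) + data
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 30,
                     PROTO_TCP, 0, bytes(4), bytes(4)) + tcp
    eth = struct.pack("!6s6sH", bytes(6), bytes(6), ETHERTYPE_IP)
    return eth + ip + bytes(4)  # 4 zero octets stand in for the checksum


# Datagrams may arrive out of order; sequence numbers sort them out.
for seq, chunk in [(5, b"world"), (0, b"hello")]:
    receive(make_frame(seq, chunk))
print(b"".join(data for _, data in sorted(connections[(1000, 2000)])))  # b'helloworld'
```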
This ends my initial summary of TCP/IP. There are still some
crucial concepts I have not gotten to, so in part two, I will go
back and add details in several areas. (For detailed
descriptions of the items discussed here, see RFC 793 for TCP,
RFC 791 for IP, and RFCs 894 and 826 for sending IP over
Ethernet.)
__________________________________________________________________