
The End-of-Line Story

The ASCII standard for text does not define a unique end-of-line (EOL)
character.  Instead, ASCII defines two independent and orthogonal
movements of the print head: Carriage Return (CR) and Line Feed (LF).
(IBM's EBCDIC did not make this mistake; it defined a single New Line
(NL) character.)  Early operating system designers had to adopt some
"end-of-line" convention using CR and LF; some used LF, some used
CR, and some used a two-octet sequence: LF CR or CR LF.
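This Python sketch is an illustration, not part of the original text. In ASCII, CR and LF are the octets 0x0D and 0x0A. The sketch encodes the same two lines under each of the four conventions above. It then shows what a reader that assumes LF-only line endings makes of each. The names `CONVENTIONS`, `encode`, and `split_lf_only` are hypothetical.

```python
CR = b"\r"  # Carriage Return, octet 0x0D
LF = b"\n"  # Line Feed, octet 0x0A

# The end-of-line conventions early systems adopted (keys are just labels).
CONVENTIONS = {
    "LF": LF,
    "CR": CR,
    "LF CR": LF + CR,
    "CR LF": CR + LF,
}

def encode(lines, eol):
    """Terminate each line with the given EOL sequence and join them."""
    return b"".join(line + eol for line in lines)

def split_lf_only(data):
    """A naive reader that treats LF as the only line terminator."""
    parts = data.split(LF)
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece that follows a final LF
    return parts

if __name__ == "__main__":
    for name, eol in CONVENTIONS.items():
        data = encode([b"hello", b"world"], eol)
        print(name, data, "->", split_lf_only(data))
```

The LF-only reader sees CR-only text as one long line. It sees CR LF text as lines that each keep a trailing CR, which is the stray ^M that shows up when conventions are mixed.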

During the early ARPAnet research days (~1970-1972), this end-of-line
diversity among operating systems made network communication between
diverse host systems difficult.  After some discussion (recorded in
early RFCs), the researchers adopted a single convention:

	ASCII text transmitted across the network *must* use the
	two-character sequence:  CR LF.

This choice was designed to spread the pain equally among all
operating systems of the day; each had to translate to and from the CR
LF convention when text was transferred across the network.
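Here is a minimal Python sketch of what "translate to and from the CR LF convention" means for a host with any of the native conventions above. `to_network` and `from_network` are hypothetical names. The sketch deliberately ignores edge cases that real Telnet or FTP programs must handle, such as bare CR or LF octets that are not line endings.

```python
CRLF = b"\r\n"  # the network end-of-line convention

def to_network(data: bytes, native_eol: bytes) -> bytes:
    """Rewrite each native EOL as CR LF before sending text to the network."""
    if native_eol == CRLF:
        return data  # already in network form
    return data.replace(native_eol, CRLF)

def from_network(data: bytes, native_eol: bytes) -> bytes:
    """Rewrite each CR LF received from the network as the native EOL."""
    if native_eol == CRLF:
        return data
    return data.replace(CRLF, native_eol)
```

As an example, text moving from an LF host to a CR-only host is translated twice. `to_network(b"one\ntwo\n", b"\n")` puts `b"one\r\ntwo\r\n"` on the wire. On arrival, `from_network(..., b"\r")` turns it into `b"one\rtwo\r"`. Each side knows only its own convention and the network one.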

This EOL convention was the core of the initial Telnet protocol
definition (negotiated options were added later).  Jon Postel was
one of the principal protocol policemen enforcing the CR LF
requirement.  He carried Telnet's EOL = CR LF convention into FTP and
SMTP on the ARPAnet, and these protocols were later taken essentially
unchanged into the Internet.

Few people today are aware of the EOL issue, because systems generally
(but not always!) make it transparent.  For example, the RFC Editor
stores the official RFC archive on a Unix system whose native EOL is a
single LF.  When you click on a link for an RFC from the RFC Editor Web
page, your browser uses an FTP client to retrieve the ASCII text.  The
RFC Editor's FTP server translates the LF ending each text line into
CR LF for transmission across the Internet, and your FTP client in
turn translates each CR LF into whatever EOL convention your system
uses.
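The client-side half of this translation has one subtlety: data arrives in arbitrary chunks, so a CR LF pair can be split across two reads. This Python sketch assumes a client whose native EOL is LF. It holds back a trailing CR until the next chunk shows whether an LF follows. The class `CrlfToLf` is hypothetical and is not taken from any particular FTP client.

```python
class CrlfToLf:
    """Incrementally convert network CR LF line endings to a native LF."""

    def __init__(self) -> None:
        self._pending_cr = False  # True if the previous chunk ended in CR

    def feed(self, chunk: bytes) -> bytes:
        """Convert one received chunk; may withhold a final CR."""
        if self._pending_cr:
            chunk = b"\r" + chunk  # restore the CR held back last time
            self._pending_cr = False
        if chunk.endswith(b"\r"):
            # The matching LF may arrive in the next chunk; wait for it.
            chunk = chunk[:-1]
            self._pending_cr = True
        return chunk.replace(b"\r\n", b"\n")

    def finish(self) -> bytes:
        """At end of stream, emit a held-back CR (no LF ever followed it)."""
        leftover = b"\r" if self._pending_cr else b""
        self._pending_cr = False
        return leftover
```

For example, feeding `b"line one\r"` and then `b"\nline two\r\n"`, followed by `finish()`, yields `b"line one\nline two\n"` in total. A naive per-chunk `replace` would have left a stray CR after "line one".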

Many people today use Windows, which is based on MS-DOS; MS-DOS came
along later and adopted CR LF as its EOL convention.  This simplifies
the picture: no EOL translation is actually required when MS-DOS
systems move text across the Internet.

RFC 2223, "Instructions for RFC Authors", describes the format of an
RFC; it says that every line of an RFC is to be ended by CR LF.  This
means CR LF *as transmitted across the Internet*; the text is actually
stored at ISI and other Unix sites with LF as the EOL delimiter.

It should all work, magically.  However, misconfiguration or mismatches
can still cause confusion about EOL.  For example, you may see an extra
^M (Control M, or CR) at the end of every line of an RFC.  Or you may
be missing the CR entirely, causing bad formatting on a Windows
system.  On a Unix system, you may have to run the dos2unix utility
to remove spurious ^M characters.
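To diagnose these symptoms, it helps to count which line endings a file actually contains. This Python sketch does that. It also performs the basic CR LF to LF conversion that dos2unix is used for. `count_eols` and `strip_cr` are hypothetical helpers, not the dos2unix program itself.

```python
def count_eols(data: bytes) -> dict:
    """Count CR LF pairs, lone LFs, and lone CRs in a byte string."""
    crlf = data.count(b"\r\n")  # each pair holds exactly one CR and one LF
    return {
        "CR LF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by a CR
        "CR": data.count(b"\r") - crlf,  # CRs not followed by an LF
    }

def strip_cr(data: bytes) -> bytes:
    """Convert CR LF line endings to LF, leaving lone CRs untouched."""
    return data.replace(b"\r\n", b"\n")
```

An RFC that shows ^M at the end of every line on a Unix system would report only "CR LF" endings. One that formats badly on Windows would report only "LF".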

Note that if you use binary mode FTP, the file is transferred literally
byte-by-byte, so the source host's end of line is sent across the
network.  This normally works OK because it is assumed that binary mode
FTP is used only between like systems.  The RFC Editor Web page
includes tar'd and zip'd collections of RFCs
(www.rfc-editor.org/download.html).  These compressed files are binary
and therefore contain buried EOL sequences that are not translated in
transit.  The tar.Z files use the
Unix convention (LF), while the .zip files are assumed to be destined
for Windows machines and therefore use the MS-DOS convention.
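This Python sketch models the difference between the two FTP transfer modes. It is a simplification, not FTP's actual TYPE A / TYPE I machinery. In ASCII mode, text passes through the network CR LF form and into the receiver's convention. In binary mode, the bytes are copied unchanged. `transfer` is a hypothetical function.

```python
CRLF = b"\r\n"  # the network end-of-line convention

def transfer(data: bytes, mode: str, sender_eol: bytes, receiver_eol: bytes) -> bytes:
    """Model what the receiver stores after an 'ascii' or 'binary' transfer."""
    if mode == "binary":
        return data  # byte-for-byte copy: the sender's EOL convention survives
    if mode != "ascii":
        raise ValueError(f"unknown mode: {mode!r}")
    # ASCII mode: sender converts to CR LF, receiver converts to its own EOL.
    wire = data if sender_eol == CRLF else data.replace(sender_eol, CRLF)
    return wire if receiver_eol == CRLF else wire.replace(CRLF, receiver_eol)
```

A Unix text file sent to a Windows machine in binary mode arrives with bare LFs. That is why the .zip collections described above are prepared with the MS-DOS convention rather than relying on any translation in transit.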

RFC Editor
18 April 2004