The End-of-Line Story

The ASCII standard for text does not define a unique end-of-line (EOL) character. Instead, ASCII defines two independent and orthogonal movements of the print head: Carriage Return (CR) and Line Feed (LF). (IBM's EBCDIC did not make this mistake; it defined a single New Line (NL) character.) Early operating system designers had to adopt some "end-of-line" convention using CR and LF; some used LF, some used CR, and some used a two-octet sequence: LF CR or CR LF.

During the early ARPAnet research days (~1970-1972), this end-of-line diversity among operating systems made network communication between diverse host systems difficult. After some discussion (recorded in early RFCs), the researchers adopted a single convention: ASCII text transmitted across the network *must* use the two-character sequence: CR LF. This choice was designed to spread the pain equally among all operating systems of the day; each had to translate to and from the CR LF convention when text was transferred across the network.

This EOL convention was the core of the initial Telnet protocol definition (negotiated options were added later). Jon Postel was one of the principal protocol policemen enforcing the CR LF requirement. He carried the EOL = CR LF convention from Telnet into FTP and SMTP on the ARPAnet, and later these protocols were taken essentially unchanged into the Internet.

Few people today are aware of the EOL issue, because systems generally (but not always!) make it transparent. For example, the RFC Editor stores the official RFC archive on a Unix system whose native EOL is a single LF. When you click on a link for an RFC from the RFC Editor Web page, your browser uses an FTP client to retrieve the ASCII text. The RFC Editor's FTP server translates the LF in each text line into CR LF for transmission across the Internet, and your FTP client in turn translates each CR LF into whatever the EOL convention of your system is.
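
The two-way translation just described can be sketched in a few lines of Python. This is only an illustration, not the code of any actual FTP server or client: the names to_network, from_network, and local_eol are invented here, and the sketch assumes the input consistently uses a single EOL convention.

```python
CRLF = b"\r\n"  # the network EOL convention: CR followed by LF


def to_network(data: bytes, local_eol: bytes = b"\n") -> bytes:
    """Sender side: rewrite the host's native EOL as CR LF.

    Hypothetical helper. Assumes `data` uses `local_eol` consistently;
    text that already contains CR LF on an LF host would come out
    with a doubled CR.
    """
    if local_eol == CRLF:
        return data  # native convention already matches the network
    return data.replace(local_eol, CRLF)


def from_network(data: bytes, local_eol: bytes = b"\n") -> bytes:
    """Receiver side: rewrite CR LF as the host's native EOL."""
    if local_eol == CRLF:
        return data  # nothing to translate
    return data.replace(CRLF, local_eol)


# A Unix-style sender (LF) and a receiver whose native EOL is a lone CR.
wire = to_network(b"line one\nline two\n")
print(wire)                        # b'line one\r\nline two\r\n'
print(from_network(wire, b"\r"))   # b'line one\rline two\r'
```

Neither end needs to know the other's native convention; each translates only between its own EOL and CR LF.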

Many today use Windows, based on MS-DOS, which came along later and adopted CR LF as its EOL convention. This simplifies the picture; no EOL translation is actually required when MS-DOS systems move text across the Internet.

RFC 2223, "Instructions for RFC Authors", describes the format of an RFC; it says that every line of an RFC is to be ended by CR LF. This means *as transmitted across the Internet*; the text is actually stored at ISI and other Unix sites with LF as the EOL delimiter.

It should all work, magically. However, misconfiguration or mismatches can still cause confusion about EOL. For example, you may see an extra ^M (Control M, or CR) at the end of every line of an RFC. Or you may be missing the CR entirely, causing bad formatting on a Windows system. On a Unix system, you may have to run the dos2unix utility to remove spurious ^M characters.

Note that if you use binary mode FTP, the file is transferred literally byte-by-byte, so the source host's end-of-line convention is sent across the network unchanged. This normally works OK because it is assumed that binary mode FTP is used only between like systems.

The RFC Editor Web page includes tar'd and zip'd collections of RFCs (www.rfc-editor.org/download.html). These compressed files are binary and therefore contain buried EOL sequences. The tar.Z files use the Unix convention (LF), while the .zip files are assumed to be destined for Windows machines and therefore use the MS-DOS convention (CR LF).

RFC Editor
18 April 2004