TKB: So, you know how you sometimes need to output something before the data you need to do it has been read from the input? So normally you save the input data in memory until you’ve read it all, then you find the information you need to output the early bit, then process the input data from memory to produce the output?
CPB: Yep, I have been there before!
TKB: But that is pain if you have to deal with large files and don’t have enough memory. Usually you’ve got more disk space than memory, so there is an interesting trick you can do instead that I learned from how Unroff (the roff translator written in Elk Scheme) handled cross references and tables of contents when converting to HTML. You don’t have the section titles for the table of contents or the names of the HTML files they’ll be in (because you’re chucking the output into multiple files) until you have read the whole document. So, just record the position in the file where the table of contents would go, and has you encounter sections record the section title and the output file it is in and then at the end of generating the files copy the file up to the point where the table of contents is, then output the table of contents with all the correct section titles and links to the output html files, and then copy the rest of the file. In fact, Unroff generalized to multiple files in such a way that you could record positions to use later in any file and add the necessary information no matter where in the input and output you were. Very useful!
TKB: Back to unroff, I downloaded the source and groffed the manual, and see it is a little more primitive than I remembered. You used a stream-position function to record the output positions you were interested in, and added the filename, position, and text to be inserted to a list, and at the end of the program you passed all those to the file-insertions procedure (probably as the result of an exit event handler), which did all the rewriting. This was supported by the streams functionality: “Input, output, and storage of text lines in unroff are centered around a new Scheme data type named stream and a set of primitives that work on streams. A stream can act as a source (input stream) or as a sink (output stream) for lines of text. Streams not only serve as the basis for input and output operations and for the exchange of text with shell commands, but can also be used to temporarily buffer lines of text (e.g. foot- notes or tables of contents) and to implement user-defined macros in a simple way.”
Unroff was written in Elk Scheme, which was a neat early Scheme. The versions from the original pages are still available, but may not run on modern Unixen.
Elk Scheme: The Extension Language Kit
Elk: Distribution and Source Page
There is a newer version of Elk Scheme that may run on modern Unixen.
Elk Scheme - the Extension Language Kit, from Sam Voy
Here are the manuals that come with Unroff in PDF: