Newsgroups: alt.etext
From: dell@wiretap.spies.com (Thomas Dell)
Subject: [GUTNBERG-L] Ascii v Graphics (Cornell Project)
Message-ID:
Organization: The Internet Wiretap
Date: Sat, 6 Mar 1993 02:04:49 GMT

Newsgroups: bit.listserv.gutnberg
Approved-By: "Michael S. Hart"
Message-ID: <9303051704.AA16004@rkd.msi.cornell.edu>
Date: Fri, 5 Mar 1993 11:03:32 CST
Sender: Project Gutenberg Email List
From: Keith Dennis
Subject: please post
Lines: 92

Subject: Graphics Versus ASCII

I feel that there are several points that have not been adequately
addressed concerning the use of ASCII versus graphics.

1. COST. At Cornell there is a project (funded jointly with Xerox and
the CPA) to scan in a number of old books that are falling apart. The
cost is now down to about $100 for a typical 300-page book. This is
independent of the complexity of the text/diagrams/pictures. The books
are scanned in at 600 dpi and can be reprinted on demand for any
individual or institution and delivered at cost. Sometime in the near
future these images will be available over the net; a few are
currently available on an experimental basis here. Approximately 600
of these are in the field of mathematics, with the others in many
other areas. To type such texts in a format suitable to convey the
information contained therein (e.g., TeX) would cost between $5 and
$10 per page, for a total of $1500 to $3000 per 300-page book. (See
footnote at end.)

2. PRESERVATION. Due to the rapid deterioration of many books from the
late 19th century, it is imperative that we save their contents now.
We cannot wait until standards are set and we cannot afford to have
them all retyped.

3. ACCESS. Many important manuscripts and books are only available in
the rare book rooms of institutions to which most of us do not have
access. Making these available as electronic images, or even on paper,
is preferable to not having them available. The proliferation of such
copies also assures their preservation, at least in some form.

4.
CONVERTIBILITY. Once documents have been scanned, the possibilities
for converting them to ASCII or some other format, either via software
or by the human mind and hand, have significantly increased. In the
long run, significant extra costs have not been incurred. In fact, the
more of the original information retained, the easier it will be to
put documents in whatever formats we may want in the future. The
failure to keep all pertinent information is one of the main problems
with the plain ASCII format. In addition to not knowing what
information should be retained, there is the question of accuracy.
Retyping documents or using current OCR technology will certainly
create errors.

5. CONTENT. As previously noted, many manuscripts are currently
distributed over the net in TeX format. Those who write such papers
can usually read them easily without special software; of course they
look much nicer after processing. For scientific discussions, it is
absolutely necessary to have a way of expressing information
precisely. TeX is one way of doing that. It is certainly true that not
everyone can read such things, nor do they want to. It is also true
that not everyone wants to read all of the books in the Gutenberg
project. This is not a criticism of either method, rather a simple
observation that different people and fields have different interests
and needs.

6. SEARCHABILITY. There exist previewers (e.g., one from ArborText)
for dvi files (the device-independent files created by TeX) that will
search for text in documents which appear as graphics on the user's
screen. Thus technical documents can be seen with complicated
equations displayed correctly AND searched at the same time.

7. THE FUTURE. It is not clear what hardware or software will be
common twenty years from now. For that matter, it is not clear what
standards there will be. I believe that it is certain that plain ASCII
will not always be the standard.
Unicode or some other system will certainly replace it as the common
denominator. More than likely different types of information will have
standard formats and there will be "viewers" available for every type
of hardware that will seamlessly allow the user to view, print, watch
or hear documents without needing to know anything about their format.
If such systems exist, then more than likely any documents which are
created now will be readily convertible to the required formats. It
seems quite likely that the documents created by all of the different
projects in the various formats will be useful to everyone in the
future.

__________________________________________________________________

I have recently started a project to make a number of older
mathematics materials available in electronic format. These are TeX
files so they can be searched as well as being previewed or printed.
The first title to be released will be the collected works of Galois,
in the original French as well as an English translation. These will
be made available as shareware at a very low cost (just enough to
recover the typing expense).

Keith Dennis
Chairman, Department of Mathematics
Cornell University
Ithaca, New York 14853-7901