Newsgroups: bit.listserv.gutnberg
From: willem@pegun.law.columbia.edu (System administrator (Willem Scholten))
Subject: Janus Project
Message-ID: <1993Feb14.221854.2666@news.columbia.edu>
Organization: Columbia University
Date: Sun, 14 Feb 1993 22:18:54 GMT
Lines: 166

In response to the messages cited below, here is the official project description of project JANUS. For additional information, please contact us at janus-info@sparc-1.law.columbia.edu, or connect via anonymous ftp to ftp.law.columbia.edu, where the directory /pub/janus will hold additional postings from time to time.

-Willem.

---------------------------------- description ----------------------------------

Project JANUS
Large-scale Imaging and Text-Retrieval Project

Introduction

JANUS is a five-year project of Columbia Law Library to develop an electronic, or 'virtual', library that combines digital conversion and storage of document text and graphics, massively parallel supercomputing, and advanced user-friendly search and retrieval software. When completed in 1996, JANUS will be a 'library of the future', providing unprecedented nationwide electronic access to large-scale databases of full text, graphics, images, data, sound and video materials.

JANUS is a joint venture of Thinking Machines Corporation (TMC) of Cambridge, MA, the world's largest manufacturer of highly parallel supercomputers, and Columbia Law Library, the nation's third largest collection of legal materials. JANUS uses the world's leading parallel supercomputer, the Connection Machine System (CM-2)*, on loan from TMC and installed at the Law School Library. Columbia and TMC are jointly developing retrieval software for use on large-scale databases.

A 'Virtual' Library

Advances in optical scanning allow Project JANUS to convert and store in digital form an exact copy of a document, page by page.
Project JANUS is developing software that correctly interprets text that may have been corrupted during the optical character recognition (OCR) process. By 1996, Columbia Law Library expects to convert about 10,000 to 12,000 volumes a year to electronic storage, and it expects continued growth in its collection of new documents available only in electronic form. JANUS' optical scanning process also stores this exact copy of the document in a file that can be searched and manipulated by software developed by Project JANUS.

JANUS' unique and powerful ability to search for and retrieve specific information from large-scale text, graphics, image, sound and video databases best demonstrates the 'library of the future.' From any computer terminal connected to the JANUS system, a user seeking information on a subject can type a word, phrase or sentence in ordinary English and receive, within seconds, specific references listed in order of relevance. The user can view the exact copy of the complete text, opened at the position most relevant to the query, along with related images and sounds. The user can save retrieved items, or sections of items, as needed. The user can continue searching for additional references by entering new words or phrases, or by highlighting words, phrases or sections of text already retrieved, and again receive within seconds more references listed in order of relevance.

JANUS' Database Software

Project JANUS accomplishes free-text searches by using a non-Boolean natural language algorithm that incorporates frequency-based indexing, relational value weighting, and relevance feedback from the individual user.
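The description names the ingredients of the search algorithm but not its formulas. As a rough illustration only, the Python sketch below substitutes standard textbook techniques for two of them: tf-idf term weighting with cosine ranking for "frequency-based indexing", and positive-only Rocchio feedback for "relevance feedback". It does not attempt "relational value weighting", which the description does not define, and it is not WAISSeeker's actual method. The function names, the smoothed idf formula, the alpha/beta weights and the three sample documents are all assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Illustrative stand-in for frequency-based indexing plus relevance feedback;
# NOT WAISSeeker's actual algorithm (which the description does not specify).

Vector = Dict[str, float]

def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters as terms."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs: List[str]):
    """Return one tf-idf vector per document plus the collection's idf table."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # documents containing each term
    idf = {t: math.log(n / k) + 1.0 for t, k in df.items()}  # smoothed idf (assumed)
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf

def vectorize(query: str, idf: Vector) -> Vector:
    """Weight a plain-English query; terms absent from the collection are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def rank(query_vec: Vector, vectors: List[Vector]) -> List[int]:
    """Document indices from most to least similar; no Boolean filtering."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]

def rocchio(query_vec: Vector, vectors: List[Vector], relevant: List[int],
            alpha: float = 1.0, beta: float = 0.75) -> Vector:
    """Positive-only Rocchio: add the centroid of user-marked documents to the query."""
    new = {t: alpha * w for t, w in query_vec.items()}
    for i in relevant:
        for t, w in vectors[i].items():
            new[t] = new.get(t, 0.0) + beta * w / len(relevant)
    return new

# Hypothetical sample collection and query.
docs = [
    "copyright law and the fair use of licensed electronic texts",
    "parallel supercomputers for full text retrieval",
    "fair use defenses in copyright litigation",
]
vectors, idf = build_index(docs)
query = vectorize("fair use", idf)
first_pass = rank(query, vectors)                          # [2, 0, 1]
second_pass = rank(rocchio(query, vectors, [0]), vectors)  # [0, 2, 1]
```

In this toy collection the query "fair use" first ranks the shorter litigation document highest; once the user marks the licensing document as relevant, the feedback step pulls the query toward that document's vocabulary and it moves to the top. Every document receives a graded score rather than a Boolean match, which is what allows results to be listed in order of relevance.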
JANUS' retrieval system, WAISSeeker, harnesses the immense "number-crunching" power of parallel computing and builds on TMC's work with the Wide Area Information Server (WAIS). It allows users to search the full text of documents for arbitrary combinations of words and, unlike keyword systems, does not require users to have prior knowledge of the discipline they are searching.

JANUS is also developing a Graphical User Interface (GUI) to make all processes easy to use. In addition to receiving clear instructions during the feedback and information retrieval phases of the process, users will be able to incorporate retrieved material into their word-processing documents.

JANUS' development of WAISSeeker will provide a storage, search and retrieval system of unprecedented power, flexibility and accuracy. Both researchers and general users will have easy access to large-scale databases of complete text, images and sounds, and will retrieve the items most relevant to their individual queries.

Interconnection, Standards and Copyright Considerations

JANUS will first serve scholars on Columbia's campus. Because JANUS uses the Z39.50 standard as the communication protocol of its WAIS interface, once operational it will be accessible from any remote computer using a WAIS server and, via the Internet, can serve users nationally and internationally. A large-bandwidth network channel, such as the proposed National Research and Education Network (NREN), would allow a large number of users to browse and work in the Columbia Law Library from any NREN connection in the nation.

JANUS' database control system is designed to provide a datafile tagging and logging structure to protect copyrights and intellectual property. Columbia Law School is collaborating with Simon & Schuster to set standards for copyright protection applicable to wide-scale electronic retrieval and dissemination of information.
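The description says only that the database control system provides "a datafile tagging and logging structure"; its actual design is not given. The hypothetical Python sketch below shows one way such a structure could work: each datafile carries a rights tag, and every retrieval attempt is appended to a log that can later be totalled per license, in the spirit of the stated goal of tracking and verifying use of licensed materials. The class names, fields, license-check rule, documents, users and license IDs are all invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set

# Hypothetical sketch: the real JANUS tagging and logging format is not described.

@dataclass(frozen=True)
class RightsTag:
    """Copyright tag attached to one stored datafile."""
    doc_id: str
    holder: str
    license_id: Optional[str]  # None marks public-domain material

@dataclass
class AccessLog:
    """Append-only record of every retrieval attempt."""
    entries: List[dict] = field(default_factory=list)

    def record(self, user: str, tag: RightsTag, action: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "doc_id": tag.doc_id,
            "license_id": tag.license_id,
            "action": action,
        })

    def uses_of_license(self, license_id: str) -> int:
        """Count successful retrievals under one license, e.g. for a publisher report."""
        return sum(1 for e in self.entries
                   if e["license_id"] == license_id and e["action"] == "retrieve")

def retrieve(user: str, tag: RightsTag,
             licensed_users: Dict[str, Set[str]], log: AccessLog) -> bool:
    """Serve public-domain items to anyone, licensed items only to licensees; log every attempt."""
    allowed = tag.license_id is None or user in licensed_users.get(tag.license_id, set())
    log.record(user, tag, "retrieve" if allowed else "denied")
    return allowed

# Usage with made-up documents, users and license IDs.
log = AccessLog()
public = RightsTag("old-reporter-vol-1", "public domain", None)
treatise = RightsTag("treatise-1992", "Example Publisher", "LIC-001")
licensees = {"LIC-001": {"scholar1"}}

results = [
    retrieve("scholar1", treatise, licensees, log),  # licensed user: allowed
    retrieve("guest", treatise, licensees, log),     # no license: denied, but logged
    retrieve("guest", public, licensees, log),       # public domain: allowed
]
```

Logging denied attempts as well as successful ones is a design choice in this sketch: it keeps a complete audit trail, while the per-license count reports only actual uses.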
Project JANUS 1991-1996: "The Library of the Future"

Project JANUS is nearing the end of Phase One, in which it has been developing and testing its natural language algorithm and extensively testing the system's ability to retrieve corrupted text. Phase Two will bring a limited-use production system available to researchers at the Columbia Law Library and via the Internet; in this phase, JANUS will acquire large-scale databases and develop procedures for supporting multiple concurrent users of WAISSeeker. In its final phase, JANUS will become a full production system, with staffing for the Law School 'virtual' library in place. By then, JANUS will have established relations with publishers to allow use of copyrighted materials directly in electronic form, and will have developed programs to track and verify use of licensed materials electronically.

*Connection Machine, CM-2 and Thinking Machines are trademarks of Thinking Machines Corporation. The CM-2 is equipped with 32,000 processors and has 256 megabytes of main memory and 20 gigabytes (billions of bytes) of hard disk storage.

In article <930211111626.18248@charlie.usd.edu>, MJENSEN@charlie.usd.edu writes:
|> ----------------------------Original message----------------------------
|> This is an interesting project. I know some of the people involved.
|> It is unfortunately confined primarily to public domain works because
|> of copyright clearance problems.
|>
|> Mary Brandt Jensen                 University of South Dakota
|> Director of the Law Library        School of Law
|> Associate Professor of Law         414 E. Clark St.
|> MJENSEN@CHARLIE.USD.EDU            Vermillion, SD 57069-2390
|> (605) 677 6363                     Fax (605) 677 5417
|>
|> ----------------------------Original message----------------------------
|> In the Feb. 8, 1993 issue of _The Wall Street Journal_ there is
|> an article on the print to etext efforts of Chicago-Kent School
|> of Law and Columbia University Law Library. The article is by
|> William M. Bulkeley, on page B6, title "Libraries Shift From
|> Books to Computers"
|>
|> Douglas Winship
|> Austin, Texas
|> winship@tenet.edu

--
===============================================================================
Willem Scholten                    Mail: willem@lawmail.law.columbia.edu
Columbia Univ Law School           FedEx mail: willem@sparc-1.law.columbia.edu
Computer Center                    Fax: 1 212 854 7946
435 West 116th Street Room 7W1
New York, NY 10027                 ph: 1 212 854 7938
===============================================================================