Newsgroups: uk.misc,alt.etext
From: dominic@natcorp.ox.ac.uk (Dominic Dunlop)
Subject: Written-to-be-spoken material needed for British national Corpus
Message-ID: <1993Nov2.102827.7917@onionsnatcorp.ox.ac.uk>
Organization: British National Corpus, Oxford University, GB
Date: Tue, 2 Nov 1993 10:28:27 GMT
Lines: 39

Examining the various quotas for particular text types that we have to fill for the British National Corpus, we find that, despite exploring obvious avenues like broadcasts and published plays, the ``written to be spoken'' bucket is nowhere near full.

So, if you have a play, a script (even scripts for cold-calling double-glazing salespeople will do), or perhaps a sermon, ideally already in electronic form; if you have some connection with Britain (you were born here, or you'd lived here for a couple of years immediately before writing the thing); and if you're willing to let us have indefinite rights for non-exclusive academic use of the smaller of 90% of the work or 45,000 words, at least for the European Community, and ideally for a larger territory, then please get in touch.

A bit of background:

The British National Corpus is a major collaborative research project involving government, industry and the research community. Its objective is the creation and distribution of a 100 million-word computerised corpus of modern English text, for use in linguistic research.

The participants in the project are Oxford University Press, which leads the consortium, W R Chambers Ltd, Longman Group UK Ltd, the British Library and the Universities of Oxford (Oxford University Computing Services) and Lancaster (the Unit for Computer Research on the English Language).

Government funding for the project is provided under the Advanced Technology Programme, which is a component of the DTI/SERC Joint Framework for Information Technology (JFIT). The project, which started in January 1991, is planned to run for 39 months.

The primary goal of the project is to provide the research and UK industrial community with state-of-the-art corpus and lexical resources, as a solid basis for the development and exploitation of new products in the rapidly expanding field of natural language processing (NLP) as applied to British English. These resources will be made widely available under appropriate licensing conditions and at minimum cost to the academic research community and also to the wider industrial research community.

--
Dominic Dunlop