💾 Archived View for gemini.theuse.net › utzoointro.gmi captured on 2022-01-08 at 13:41:19. Gemini links have been rewritten to link to archived content
Well, the thank-yous have been rather ebullient all day long today, and I feel
somewhat embarrassed by the attention, especially given how long it took us to
get the archive online and visible! It has to be close to 10 years now. Sigh.
The story is more a story of fits and starts than of resolve. And our
contribution accounts for some (most?) of the first 10 years of the Google archive.
If I recall correctly, the issue of Henry Spencer's (actually, the University
of Toronto Department of Zoology's) NetNews archive was raised at a Usenix
conference in the early '90s. The question: can we get at it? Bruce Jones
was especially interested in this. Henry's answer was that it really wasn't
going to be easy because he had neither the disk space nor the tape drive to
pull them all down to make them available.
I, it turned out, did. So one bright winter day I drove the two hours from London
(Ontario, Canada) to Toronto (Ontario, Canada) in my shiny new pickup truck,
picked up 141 magtapes from the Zoology department at UofT, and brought them
back to the Department of Computer Science at the University of Western
Ontario. (A not unimpressive bandwidth, by the way, of some 18Mb/sec :-)
Never underestimate the bandwidth of a pickup truck on the highway!)
Then, with the help of several people (some of whom have not yet been credited),
we started to pull the data off the tapes and onto disks in both the
Computer Science department and the Robarts Research Institute. Lance Bailey,
then with the Robarts Research Institute, did the pulling there, and I, with
assistance from Bob Webber, did it at Computer Science. Bruce Jones from
UCSD took some vacation time and came up here to help pull data down for a
week or so as well.
But we quickly ran out of space and time: Lance left Robarts for UBC, Bruce's
vacation ended, and Bob and I got busy doing other things (like our jobs).
As a result, the archive project made very little progress over the next few years.
Then Brewster Kahle started pushing on us (thanks Brewster!) to get it done.
He even bought us a large disk to hold the archive when we truly ran out of
space. With the help of Sue Thielen, who was out of work and bored, we got
all of the rest of the tapes read down onto that disk. Unfortunately, that
disk was not "close enough" to either a tape drive or the ftp server to
make the data available to anyone. And it wasn't organized in any useful way.
Brewster pushed very gently for a very long time, but the new archive project
was far from the top of the list of projects I was supposed to be working
on, and I just never got it going again.
Late this summer Michael Schmitt from Google started pushing as well.
And as luck would have it, I was able to hire a student to do the final
sorting of the archive as well. And, that luck still holding, I managed
to "steal" enough space on the ftp server for the entire archive! But
it still took months to get that figured out and the archive transferred
to a machine from which Google could pull it. It was the middle of October
before we were able to make the collection available to Google. And it is
actually available, although totally unsorted, to anyone who wants it and can
deal with pulling some 160 files ranging in size from 1.4Mb to 65Mb. Just drop
me a line to say please and we'll arrange to make it visible to you.
I'd still like to impose a bit more order on the raw archives than we have
but the time just hasn't allowed for that...
David Wiseman, Dec 11, 2001