Consider the hardware: a computer system with close to 400 parallel processors, 100 terabytes of disk space, hundreds of gigs of RAM, all for under a half-million dollars. As you'll read in this interview, the folks at the Archive have turned clusters of PCs into a single parallel computer running the biggest database in existence, and wrote their own operating system, P2, which allows programmers with no expertise in parallel systems to program the system.
Via Flutterby [1]: How The Wayback Machine Works [2]
I find this stuff fascinating. Google [3] runs off 8,000 servers, this site has 100 terabytes of storage, and my friend Kelly [4] works at a place that processes gigs of log files every day.
He said that before he optimized the processing, it sometimes took about 30 hours to process one day's worth of logs. Now it can finish, even for a really busy day, in under 22 hours. He works for a really busy site, and I found the inner workings quite interesting.
[2] http://www.oreillynet.com/pub/a/webservices/2002/01/18/brewster.html