Tonight Mark [1] and I replaced a bad disk on swift, the colocated server currently serving up our sites. The bad disk is the system disk; the websites themselves (along with some other services we have) all reside on another disk.
There was much discussion before heading over there as to the best way to approach the problem of copying the data off the bad drive. The first method would be to install the new disk into the machine and do a disk-to-disk copy. The downside is that swift is a 1U (one Rack Unit, 1.75″) system with no room for a third drive (no matter how temporary). Also, the unit is designed to run with the cover on, and we were unsure how it would deal with running uncovered. The other option would be a network-based copy, from swift to another machine with the new drive in it. The problem here was speed: even though we could hook the second machine directly to swift (on the secondary ethernet port) at 100Mbps (Megabits per second), it would still take a while to copy over several gigs worth of files. We took a second computer along (the Windows box Spring [2] and I share) and put off the final decision until we got to the colocation facility.
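As a rough back-of-the-envelope sketch of why speed was a worry, here is the best-case arithmetic for a 100Mbps link. The sizes are hypothetical (all I can say is "several gigs"), and the function name is made up for illustration. Real throughput would be lower once protocol overhead and a failing disk are factored in.

```python
def transfer_seconds(gigabytes: float, link_mbps: float = 100.0) -> float:
    """Best-case seconds to move `gigabytes` (10**9 bytes each) over the link.

    Ignores protocol overhead, disk speed and retries, so treat the result
    as a lower bound rather than an estimate.
    """
    bits = gigabytes * 1e9 * 8          # bytes -> bits
    return bits / (link_mbps * 1e6)     # megabits/s -> bits/s


if __name__ == "__main__":
    # Hypothetical sizes; the actual amount of data was "several gigs".
    for size_gb in (1, 5, 10):
        minutes = transfer_seconds(size_gb) / 60
        print(f"{size_gb:>2} GB at 100 Mbps: at least {minutes:.1f} minutes")
```

At the wire rate, 100Mbps works out to 12.5 megabytes per second, so each gigabyte costs at least 80 seconds.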
When we got there and examined swift, we decided to use the temporary computer and do a network copy. We had some difficulty getting the Windows box to recognize the new SCSI (Small Computer System Interface) disk (Mark had some extra SCSI controllers and disks); it was certainly news to me that the BIOS (Basic Input/Output System) setup was on the hard drive instead of in ROM (Read-Only Memory), much like the very old days of PCs (Personal Computers). Once we straightened that out, it was pretty straightforward to boot Gentoo [3] from a live CD (Compact Disc), then partition and format the new drive.
Then it was time to copy the files. It took some work to figure out how to run rsync over the rsync protocol, and it still took us two attempts to get everything (the first time, rsync ran without root privileges, which limited the number of files copied). Once that finished (and still on the temporary machine) we recompiled the kernel to support SCSI, then set about making the drive bootable.
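For the curious, a pull over the rsync protocol looks roughly like the command built below. This is a minimal sketch, not the exact command we typed: the host address, the module name `root` and the mount point are all hypothetical, and it assumes an rsync daemon on the source machine exporting the filesystem as a module.

```python
import shlex


def rsync_daemon_pull(host: str, module: str, dest: str) -> list[str]:
    """Build an rsync command that pulls a daemon module into `dest`.

    -a            archive mode (permissions, owners, times, symlinks, devices)
    -H            preserve hard links
    --numeric-ids keep raw uid/gid numbers instead of mapping by name
    """
    source = f"rsync://{host}/{module}/"   # rsync:// URL means the daemon protocol
    return ["rsync", "-aH", "--numeric-ids", source, dest]


if __name__ == "__main__":
    # Hypothetical address, module name and mount point.
    cmd = rsync_daemon_pull("192.168.1.1", "root", "/mnt/newdisk/")
    print(shlex.join(cmd))
```

The privilege problem can bite on either end: the daemon has to be able to read every file on the source, and the receiving rsync has to run as root to recreate ownership faithfully.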
The problem here was that Gentoo was a bit too aggressive in identifying hardware, and since the Linux kernel sticks USB (Universal Serial Bus) storage devices under the SCSI layer, the hard drive ended up with an ID that it wouldn't have in swift. We ended up having to reboot the Gentoo CD, remove the loaded USB drivers, mount the SCSI drive, and then make the drive bootable. Once that was done, the temporary system booted up without a problem.
We then removed the drive and controller, cleaned the area (so we could have room to move about) and spent a few minutes making a game plan for swapping the bad drive for the new one. The physical swap went fairly smoothly. It was reconfiguring the BIOS that proved to be rather difficult: we couldn't get into the BIOS configuration. A search of possible key sequences to get into the BIOS configuration revealed:
We ran down the entire list, and not one worked. Mark then had the brainstorm to hold down the keys as the machine was powered up. The first key he tried, DEL, got us into the BIOS.
Talk about having plenty of time to get into the BIOS configuration.
Once the BIOS was configured with the new drive, it rebooted without a problem.
All told, we spent maybe five hours doing the drive swap, with the websites unavailable for maybe fifteen minutes tops. It was a bit scary at times though, watching the copying go with numerous disk errors. But so far, nothing important seems to have been corrupted, unlike most of the files in Mark's home directory (but he had current backups of that data anyway).