Ah, “Project White Elephant.” Haven't talked about that in a while, but then, I haven't had much involvement with it for a while.
Until today.
The current sub-project is pretty straightforward. We have two machines, let's call them chip and dale (again, the names have been changed to protect [DELETED-the guilty-DELETED] me, but thematically, the pseudonyms are close enough). chip is the primary machine, which handles email, the websites, DNS (Domain Name System), what have you, for “Project White Elephant.” dale is there solely to pick up if (or when) chip dies.
Now, chip has an IP (Internet Protocol) address of C, and dale has an IP address of D. The services, DNS, SMTP (Simple Mail Transfer Protocol), POP (Post Office Protocol), IMAP (Internet Message Access Protocol) and HTTP (Hypertext Transfer Protocol) (among others), are not bound to address C or D, but to V. chip's network card is configured to listen on V, but in the event that chip goes down, dale will then configure its card to listen on V and take over DNS, SMTP, POP, IMAP and HTTP (among others).
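For the curious, the takeover decision on dale boils down to something like the sketch below. This is only an illustration under my own assumptions: the function names, the three-probe threshold, and the `probe`, `claim_v` and `start_services` callables are all hypothetical stand-ins, not anything Blech or the actual setup provides.

```python
from typing import Callable


def should_take_over(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True only if every one of `attempts` probes of chip fails.

    `probe` is a hypothetical health check that returns True when chip
    answers. Requiring several consecutive failures keeps dale from
    grabbing V because of a single dropped packet.
    """
    for _ in range(attempts):
        if probe():
            return False  # chip answered; leave V where it is
    return True


def failover_step(
    probe: Callable[[], bool],
    claim_v: Callable[[], None],
    start_services: Callable[[], None],
    attempts: int = 3,
) -> bool:
    """Run one check on dale; claim V and start services if chip looks dead.

    `claim_v` and `start_services` are hypothetical hooks for "configure
    the card to listen on V" and "start DNS, SMTP, POP, IMAP and HTTP".
    Returns True if dale took over.
    """
    if not should_take_over(probe, attempts):
        return False
    claim_v()          # the address has to be up first ...
    start_services()   # ... so the services have something to bind to
    return True
```

The ordering matters: the services are bound to V, so dale has to own the address before anything is started.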
Not trivial, but doable, though the details can be a bit tedious. Simplifying things a bit, you need to make sure that the configuration of all the services on chip is copied over to dale and that you can start the services on dale without error. Oh, you also need to replicate any data files (websites, email, etc.) from chip to dale.
But …
Sadly, we're using Blech, a … <shudder> … control panel (yes, yes, I know … I said we weren't going to use a control panel for “Project White Elephant” [1]—that order has since been rescinded—sigh).
Sure, a control panel makes simple yet tedious operations, such as configuring a new site on the system, easy and relatively painless. But attempt anything outside its prescribed set of procedures, and things that would otherwise be possible without the constraints of a control panel end up either too difficult or outright impossible.
Today's particular problem had to do with site replication from chip to dale. We could configure the site on chip, and it even got pushed out to dale, but the configuration of the web server (Apache [2]) wasn't replicated properly, and sites pushed out to dale would end up with an IP address of D and not V.
I was on the phone with one of our contacts in “Project White Elephant” and I swear, the phone conversation was straight out of a bad Star Trek episode:
“Okay, we can set up the routers to preferentially route V to chip and then if it goes down, switch over to dale.”
“Wouldn't that require the configuration of RIP (Routing Information Protocol) on the servers to initiate the router switch from chip to dale?”
“Yes, you're right. We can't do that. How about creating a special instance of DNS on dale such that if chip goes down, dale picks up, and—”
“Terrible idea. Each DNS change on chip requires tracking that and updating the private copy on dale.”
“Sure, but it can be scripted.”
“But Doctor, you miss the point. Zone A has a serial number of N. chip goes down. dale takes over, making sure to update the serial number of A to N plus 1. chip then comes back up and starts serving out zone A with a serial number of N.”
“And of course, other DNS servers would ignore that since it's an older serial number, meaning—”
“We'd have to update the zones back into chip such that Blech will accept it, and remember, Blech stores everything in a database and overwrites the DNS configuration files—”
“Meaning we'd have to script the changes back into the database or manually update the information through Blech.”
“Exactly.”
“Okay, what about not running Blech on dale? Then just copy the sites, email and zones over?”
“Then you would have to either translate the configuration from chip to the non-Blech configuration we set up on dale, which means that K [being the admin who is responsible for running these boxes and can't use the command line—don't ask] won't be able to handle that box. Or we set it up just like Blech.”
“Dash it all! Okay, what about reversing the polarity of the flux capacitor and letting the backwash flow into the Jeffries Tubes?”
“Nice in theory, but you know what they say about theory and practice, right?”
“ ‘In theory, there is no difference between theory and practice, but in practice, there is.’ ”
“Exactly. Do that, and you run a risk of the back pressure rupturing the Jeffries Tubes, and let's not even get into the problem of stuck bits on the condenser plate if you reverse the polarity of the flux capacitor.”
“You're right! I forgot about that!”
Only the conversation was much longer. And not as interesting.
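The one bit of that conversation that wasn't technobabble, the zone serial number snag, can be sketched in a few lines. DNS compares serials with the wraparound arithmetic of RFC 1982; the function name and the sample serial below are my own, purely for illustration.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS          # serials wrap around at 2**32
HALF = 1 << (SERIAL_BITS - 1)   # half the number space


def serial_is_newer(candidate: int, current: int) -> bool:
    """True if `candidate` counts as newer than `current` (RFC 1982 style).

    A serial is newer when it is ahead of the other by less than half
    the number space, modulo 2**32. A difference of exactly half the
    space is undefined in the RFC; this sketch treats it as not newer.
    """
    diff = (candidate - current) % MOD
    return 0 < diff < HALF


# Hypothetical serial N for zone A.
n = 2024010101

# dale takes over and bumps the serial to N + 1: other servers pick it up.
dale_accepted = serial_is_newer(n + 1, n)

# chip comes back still serving N: that looks older, so it gets ignored.
chip_accepted = serial_is_newer(n, n + 1)
```

Which is the whole problem: once dale has handed out N plus 1, whatever chip serves after it comes back is stale until the zone is pushed back into Blech's database with a higher serial.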
In the end, the only real sticking point was the websites. DNS isn't a problem if we can assign the services to address V. SMTP isn't a problem since you can set the MX (Mail eXchange) records for incoming email to do the right thing. And since we did find a way to replicate the existing mailboxes between the two machines (which doesn't impinge on any configurations), and assuming we can get the IP address switchover working, POP and IMAP aren't real issues (and in that regard, Blech does seem to replicate most of the site data between chip and dale, stuff like users and whatnot). That leaves HTTP. And after a quick test of simply copying the web server configuration, plus the ability to start and stop the web server via the command line (which amazingly enough is a standard command!), I think we can pull this off.
Now if only we could do something about those stuck bits on the condenser plate …