“Don't Panic!”

While Mark [1] and I were doing a fast recovery of a customer machine [2], we received a call from John, the paper millionaire of a dotcom company and former member of a Grateful Dead cover band, to say he couldn't get to his servers, located in the very same co-location facility we were currently at.

Mark goes over to John's machines. All servers are up, but he can't ping out. In fact, he can't get past the first hop. Mark then heads over to the core room, I remain in the co-location room, and we all get on a conference call.

Network seems okay—link light is on at both ends of the connection. No traffic. Jiggle the cord. Oh! A few packets. Then major lossage again. Repeat.

John is freaking out because he needs to be on a plane early and it's now 3:30 am or thereabouts. He finally conferences in the main sysadmin for Atlantic Internet [3] because Mark and I can't figure out what's going on.

Neither could the sysadmin. Everything seems okay. Only there's no traffic. John, panicking, is yelling at Mark. Mark is yelling back at John not to panic. Meanwhile, we can barely hear the sysadmin over the conference call. Pandemonium reigns.

I quickly grab the network analyzer they have (way too cool) and hook it to John's side of the connection. It lights up like a Christmas tree. Low utilization, high collisions and an even larger rate of errors. I then take the unit to the Atlantic Internet side. Nothing. Normal traffic from John's servers.

We then plug the network analyzer into the Cisco Catalyst 5000 which is serving as the main switch. Actually, it's more like three switched hubs than a real switch—there are 24 ports grouped into three sections. Each section is a hub, but switched between sections.

The network analyzer lights up like a Christmas tree.

The consensus seems to be that the Catalyst is hosed. It probably didn't survive a DoS attack a few days earlier and was slowly going bad. So it was some quick work to rerun a few cables to nearby switches and remove the Catalyst from service.

Mark and I didn't leave the office until 5 am.

[1] http://www.conman.org/people/myg/

[2] /boston/2000/04/14.1

[3] http://www.aibusiness.net/
