It was a misconfiguration

Yesterday's problem [1]? It turned out to be a misconfiguration. Or rather, the configuration file format changed enough to break the configuration files checked in for regression testing.

Sometime since the last regression test, parameters that deal with time gained an optional suffix to denote the time unit (for example, “9s” for 9 seconds, or “3d” for 3 days), and the base unit for non-suffixed values changed (from seconds to milliseconds, I'm guessing). So what was once configured to time out in 15 seconds would now time out in 15 milliseconds, and thus one component would think the other side had timed out.
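To make the failure concrete, here is a minimal sketch in Python of how such a parser might behave. It is not the actual code: the function name, the unit table, and the `default_unit` parameter are all made up for illustration, and the seconds-to-milliseconds change is my guess, as noted above.

```python
import re

# Hypothetical unit table: multipliers to convert each suffix to milliseconds.
UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

def parse_duration_ms(value, default_unit="ms"):
    """Parse a value such as "9s", "3d" or "15" into milliseconds.

    A bare number is interpreted in default_unit (an assumed parameter,
    standing in for the parser's built-in base unit).
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", str(value))
    if match is None:
        raise ValueError(f"not a duration: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit
    if unit not in UNIT_MS:
        raise ValueError(f"unknown time unit: {unit!r}")
    return int(number) * UNIT_MS[unit]

# The same unsuffixed setting under the assumed old and new base units.
old_timeout = parse_duration_ms("15", default_unit="s")   # read as seconds
new_timeout = parse_duration_ms("15", default_unit="ms")  # read as milliseconds
fixed_timeout = parse_duration_ms("15s")                  # explicit suffix
```

Under these assumptions, the old configuration files still parse without complaint, which is why the breakage is easy to miss. Adding an explicit suffix to every time parameter removes the dependence on the base unit.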

I saw the initial changes, but I neglected to update a few key parameters properly. It's an easy thing to miss; it took me two tries to change all the affected parameters.

Sigh.

But that aside, the regression test finally ran (well, it's still running—it takes hours for the thing to run).

[1] /boston/2012/01/17.2
