I wonder how the unit test cultists would deal with the testing I do

I thought that as long as I'm going to such lengths to get “push-button testing” implemented, I might as well mention some of the techniques I've used, on the off chance that it might help someone out there. The techniques I use are probably only relevant to the stuff I work on and may not apply elsewhere, but it certainly can't hurt to mention them.

So I don't have unit tests (whatever they are) per se, but I do have what is referred to as a “regression test,” which tests “Project: Sippy-Cup [1],” “Project: Lumbergh [2]” and “Project: Cleese [3].” The reason is that, taken individually, each of those projects can be considered a “unit,” but to, say, test “Project: Lumbergh” alone would require something to act like “Project: Sippy-Cup” (which feeds requests into “Project: Lumbergh”) and “Project: Cleese” (which is notified by “Project: Lumbergh” in some circumstances [4]), so why not run those as well? “Project: Lumbergh” also talks to two different DNS (Domain Name System) servers for various information about a phone number, so when running it, I need something to respond back. I also need an endpoint for “Project: Cleese” to talk to, so what's one more process? Oh, “Project: Lumbergh” will also talk to cell phones, or at least expect a cell phone to request data in some circumstances, so I have a “simulated cell phone” running as well.

Each test case is now a separate file, which describes how to set up the data for the test (the two phone numbers, what names, what feature we're testing, etc.) as well as what the expected results are (we get a name, or the reputation, or a different phone number, depending upon what's being tested). This way, we can have the regression test run one test, some of the tests, or “all the things!” The regression test will read in all the test cases and generate all the data required to run them. It will then start the seven programs with configurations generated on the fly, and start feeding SIP (Session Initiation Protocol) messages into the maelstrom, recording what goes on and checking the results as each test runs. And when a test fails, the test case information is recorded in an output file for later analysis.
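
To make that concrete, here is a minimal sketch, in Python, of what a test-case-per-file runner like that could look like. The file format (JSON), the field names, and the function names are all assumptions for illustration; the post doesn't say what format or language the real harness uses.

    #!/usr/bin/env python3
    # Hypothetical sketch: one test case per file, and a runner that can run
    # one test, a subset, or "all the things!"  Nothing here is the real
    # harness; the file format and names are made up for illustration.
    import json
    import pathlib
    import sys

    def load_cases(directory="cases"):
        """Read every test-case file (here, JSON) and return the parsed cases."""
        cases = []
        for path in sorted(pathlib.Path(directory).glob("*.json")):
            with open(path) as fp:
                case = json.load(fp)   # phone numbers, names, feature, expected result
            case["name"] = path.stem   # use the file name as the test's name
            cases.append(case)
        return cases

    def run_one(case):
        """Stand-in for the real work: generate configs, feed SIP, check results."""
        print("running", case["name"], "expecting", case.get("expect"))

    def main():
        wanted = set(sys.argv[1:])     # no arguments means "all the things!"
        for case in load_cases():
            if wanted and case["name"] not in wanted:
                continue
            run_one(case)

    if __name__ == "__main__":
        main()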

So far, nothing out of the ordinary. That's pretty much how the previous regression test worked, except it generated all 15,852 test cases. But it's how I test some of the weirder border cases that I want to talk about.

First up—ensuring something that's not supposed to happen didn't happen [5]. In some circumstances, “Project: Lumbergh” will notify “Project: Cleese,” and I have to make sure it happens when it's supposed to, and not when it's not supposed to. I've already mentioned part of my solution [6] to this, but the updated version of that is: the regression test has a side channel to the fake endpoint that “Project: Cleese” talks to. Before each test, the regression test will send that component the test data and whether it should expect a request or not. The fake endpoint will simply record this data for later use. If a request is made for that particular test case, it will also be noted for later. Once the regression test has finished running all the tests and waited a few seconds for any last-second requests to clear, it “runs” one more test—it queries the fake endpoint for a count of requests it actually received and compares it to the number the regression test thinks should have happened (and outputs success or failure). Upon termination of the regression test (as everything is being shut down), the fake endpoint will then go through its list of tests it received from the regression test, and record any discrepancies (a query that was supposed to happen didn't, or a query that wasn't supposed to happen, did). This is recorded in another file for later analysis (which is why I send over all the data to the fake endpoint—just to make it easier to see the conditions of the test in one place).
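
A minimal sketch of the bookkeeping that fake endpoint could be doing, assuming an in-process class rather than the actual processes, wire protocol and side channel; the class name, method names and file name here are all hypothetical.

    import json

    class FakeEndpoint:
        """Record per-test expectations from the side channel, tally the
        requests that actually arrive, and report any mismatch at shutdown."""

        def __init__(self):
            self.tests = {}  # test id -> {"data": ..., "expected": bool, "got": bool}

        def expect(self, test_id, data, should_query):
            """Side channel: called by the regression test before each test runs."""
            self.tests[test_id] = {"data": data, "expected": should_query, "got": False}

        def record_request(self, test_id):
            """Called whenever a request for this test actually arrives."""
            entry = self.tests.setdefault(
                test_id, {"data": None, "expected": False, "got": False})
            entry["got"] = True

        def actual_count(self):
            """Answer the final 'how many requests did you really see?' query."""
            return sum(1 for t in self.tests.values() if t["got"])

        def write_discrepancies(self, path="endpoint-discrepancies.json"):
            """At shutdown, record every test where expectation and reality differ."""
            bad = {tid: t for tid, t in self.tests.items() if t["expected"] != t["got"]}
            with open(path, "w") as fp:
                json.dump(bad, fp, indent=2)

Keeping the full test data in each record is what makes the discrepancy file readable on its own, which is the point of sending it all over the side channel in the first place.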

Second—“Project: Lumbergh” talking to multiple DNS servers. It will generally send out both requests at once, given the rather demanding timing constraints on us, so we have to support reply A coming before reply B, reply B coming before reply A, reply A timing out but getting reply B, and reply B timing out but getting reply A. How to test for those nightmare scenarios automatically? Oh, “Project: Lumbergh” also maintains a continuous “heartbeat” to these services, and if those replies don't get through, the servers will be taken out of rotation by “Project: Lumbergh,” and once the last one is gone, “Project: Lumbergh” effectively shuts down. The nightmare just got worse.

Well, again, I have written my own fake endpoints for these services (not terribly hard, as the data is fixed, and it's not like I'm going for speed here). And again, I added a side channel for the regression test to communicate with the fake endpoints. After starting up these fake endpoints, the regression test informs the endpoints which entry is considered the “heartbeat,” so no delay whatsoever will be applied to that query. Then, before any test is run, the regression test will inform the endpoints how long to delay the response—all the way from “no delay” to “don't even respond” (except for the “heartbeat”—that will always happen), as it's part of the testing data.
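
The delay handling could look something like this sketch; the class and method names, the None-means-no-reply convention and the canned answers are assumptions for illustration, not the real implementation.

    import time

    # Canned answers; the real services return fixed data too, which is what
    # makes faking them practical.  These entries are made up.
    ANSWERS = {"heartbeat.example": "ok", "+15551234567": "some fixed record"}

    class FakeDNS:
        def __init__(self):
            self.heartbeat = None  # the one query that is never delayed or dropped
            self.delay = 0.0       # seconds to stall; None means "don't even respond"

        def set_heartbeat(self, name):
            """Side channel, sent once at startup: mark the heartbeat entry."""
            self.heartbeat = name

        def set_delay(self, delay):
            """Side channel, sent before each test, straight from the test data."""
            self.delay = delay

        def respond(self, query):
            """Return the canned answer, possibly late, possibly never."""
            if query == self.heartbeat:
                return ANSWERS.get(query)  # heartbeats always get a prompt reply
            if self.delay is None:
                return None                # simulate a timeout: no reply at all
            time.sleep(self.delay)         # simulate a slow reply
            return ANSWERS.get(query)

Run two of these with different delays and you can force every ordering above (A before B, B before A, either one timing out) without ever touching the heartbeat.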

Yes, it all works. Yes, it's a pain to write. Yes, it's a bunch of code to test. No, I don't have XXXXXXX unit tests or regression tests for the regression test—I'm only willing to go so far.

I just hope we don't have to implement 100% test code coverage, because I'm not looking forward to forcing system calls to fail.

[1] /boston/2014/03/05.1

[2] /boston/2018/09/11.2

[3] /boston/2018/09/11.2

[4] /boston/2021/07/07.1

[5] /boston/2021/06/07.3

[6] /boston/2021/06/02.1
