We're still IOT (Interoperability Testing) testing [1] (yes, five months, but we're doing IOT testing with The Monopolistic “We Don't Have To Care, We're The” Phone Company where things move slower than molasses in Antarctica) and I'm finding it interesting (in the Chinese sense [2] of the term) to see “actual real world data” rather than the “pristine theoretically correct data” we've been using for testing (because in theory, there is no difference between theory and practice, but in practice …).
For instance, I've been reeiving reports that “Project: Sippy Cup [3]” (which implements a subsection of SIP (Session Initiation Protocol) [4]) was only processing less than 5% of the OPTIONS requests (which informs the other side if we're ready for processing, but also informs the other side what other type of requests we support). Odd, as our internal testing was handling every OPTIONS request we sent.
So I added some addtional logging:
and I get back:
>
```
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9901: OPTIONS from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9902: send OPTIONS response to ip6:[fc00:0:60:8a1::9]:5060 datasize=581 sentbytes=581 err=Success
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9901: OPTIONS from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9902: send OPTIONS response to ip6:[fc00:0:60:8a1::8]:5060 datasize=581 sentbytes=581 err=Success
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9901: OPTIONS from ip6:[fc00:0:60:8a1::9]:5060
Sippy-Cup9902: send OPTIONS response to ip6:[fc00:0:60:8a1::9]:5060 datasize=581 sentbytes=581 err=Success
Sippy-Cup9900: received from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9901: OPTIONS from ip6:[fc00:0:60:8a1::8]:5060
Sippy-Cup9902: send OPTIONS response to ip6:[fc00:0:60:8a1::8]:5060 datasize=581 sentbytes=581 err=Success
```
At first, I thought perhaps the output sent was filtered to just those three messages and some other messages, like, “Sippy-Cup0014: ip6:[fc00:0:60:8a1::9]:5060 sent garbage” or “Sippy-Cup0055: ip6:[fc00:0:60:8a1::9]:5060 sent an unknown call” were filtered out.
Nope. What I saw is what got logged.
Fortunately, the actual network packets had been captured so I could look at them. And I immediately saw the problem.
“Project: Sippy-Cup” is written in Lua [5], since it's easier to parse text in Lua than in C (it's easy to parse text with LPeg [6]), and each request is handled in a Lua coroutine [7]—it makes following the code a lot easier since it can be written in a straight-forward procedural way rather than with callbacks or complex state machines and what not. The only trick here is that for any operation that might block, the code has to manually insert a call to coroutine.yield() to let another coroutine run (if there are any to run).
In SIP (Session Initiation Protocol), all the messages pertaining to a phone call each have a unique identifier, and it's this identifier I use to know which coroutine to run when a message comes in (or spwan a new coroutine if this is a new call). The OPTIONS request isn't really part of a normal call, but I had assumed, like other SIP requests, that they too, would have a unique identifier.
It turns out that no, The Monopolistic “We Don't Have To Care, We're The” Phone Company is gladly using the same “unique” identifier for each OPTIONS request, mostly … occasionally they'll generate a new “unique” identifier just to keep us on our toes.
Or something like that.
Okay, that did reveal the actual bug though—I wasn't releasing the reference to the coroutine when processing an OPTIONS request. Had I done that, it wouldn't have mattered, since I wouldn't have a reference to a coroutine with that particular “unique” identifier, and a new coroutine would have been created to handle the OPTIONS request.
Which goes to show you that not only is input weirder than expected, but even in a language with garbage collection [8], you can still leak resources.
[2] http://en.wikipedia.org/wiki/May_you_live_in_interesting_times
[4] http://en.wikipedia.org/wiki/Session_Initiation_Protocol
[6] http://www.inf.puc-rio.br/~roberto/lpeg/
[7] http://www.lua.org/pil/9.html
[8] http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)