I started “Project: Sippy-Cup [1]” six years ago as a “proof-of-concept,” and it ended up in production (without my knowledge and, at the time, to my dismay) a year or two later. The code was written to be correct, not fast. And for the four or five years it's been running, performance has never been a real issue. But that's changing, as the projected traffic levels shoot past “oh my” and into “oh my God” territory.
“Project: Sippy-Cup” processes a lot of SIP (Session Initiation Protocol) messages, which are text-based, so there's a lot of text processing. I use LPEG [2] because it makes parsers easy to write, but the resulting parsers aren't necessarily as fast as they could be.
There are two issues with LPEG: it has infinite look-ahead, and it uses ordered choice. So the code that checks for the SIP method:
method = lpeg.P"ACK" + lpeg.P"BYE" + lpeg.P"CANCEL" + lpeg.P"INFO" + lpeg.P"INVITE" + lpeg.P"MESSAGE" + lpeg.P"NOTIFY" + lpeg.P"OPTIONS" + lpeg.P"PRACK" + lpeg.P"PUBLISH" + lpeg.P"REFER" + lpeg.P"REGISTER" + lpeg.P"SUBSCRIBE" + lpeg.P"UPDATE" + (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1
will first compare the input to “ACK”; if that doesn't match, it backtracks and compares the input to “BYE”, and so on down the list until it reaches the last rule, which is a “catch-all” rule. It would be easy to reorder the list so the checks run from “most likely” to “least likely,” but really, the entire list could be removed, leaving just the catch-all:
method = (lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1
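As a quick check (my own example, not code from the project), the catch-all still matches any token-style method name; wrapping it in a capture returns the text it matched:

local lpeg = require "lpeg"

-- the simplified catch-all rule from above, wrapped in a capture
local method = lpeg.C((lpeg.R("AZ","az","09") + lpeg.S"-.!%*_+`'~")^1)

print(method:match("INVITE")) -- INVITE
print(method:match("FOOBAR")) -- FOOBAR (unknown methods match too)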
I have the same issue with the SIP headers: there are 100 headers that are “parsed” (for various values of “parsed”), but I only really look at a dozen of them. The rest just slow things down and could be handled by a generic parsing rule (sketched below). The full set of headers was added during the “proof-of-concept” stage, since I wasn't sure at the time which headers would be critical and which ones wouldn't, and I've never gone back and cleaned up the code.
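For the headers I don't care about, a generic rule only needs to grab the name and the raw value. Something along these lines would do (a simplified sketch of my own; it ignores details like header folding and isn't the project's actual grammar):

local lpeg = require "lpeg"

-- generic header: capture the name, skip the colon and whitespace,
-- then capture everything up to the CRLF
local name   = lpeg.C((lpeg.R("AZ","az","09") + lpeg.P"-")^1)
local ws     = lpeg.S" \t"^0
local value  = lpeg.C((lpeg.P(1) - lpeg.P"\r\n")^0)
local header = name * ws * lpeg.P":" * ws * value * lpeg.P"\r\n"

print(header:match("X-Obscure-Header: some value\r\n")) --> X-Obscure-Header  some value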
Another aspect is the sheer number of validity checks the code makes on each incoming SIP message. Thanks to a managerial mandate at the time, many of the checks don't really have any effect on the processing, so they could go. (I wanted strict checking that bailed on any error; my manager at the time did not want such strictness. No need to guess who won, but I still track parsing irregularities.)
So while I feel these are two areas where the code could be made faster, I don't know if that's where the time is spent, and so it's time to profile the code.
The issue now is that the system profiler will profile the code as C, not as Lua. I don't need to profile the code to know the Lua VM (Virtual Machine) gets called all the time. What I need to know is what Lua code is called all the time. But it can't hurt to try the system profiler, right? And given that the regression test has over 12,000 test cases, we should get some good profiling information, right?
Table: Original Profile. Each sample counts as 0.01 seconds.

  %      cumulative    self               self     total
 time     seconds     seconds    calls   ms/call  ms/call   name
------------------------------------------------------------------
 13.32       3.47        3.47                               match
 12.74       6.79        3.32                               luaV_execute
  9.31       9.22        2.43                               luaS_newlstr
  6.83      11.00        1.78                               luaD_precall
  5.31      12.38        1.39                               luaH_getstr
  3.38      13.26        0.88                               luaD_poscall
  2.57      13.93        0.67                               index2adr
  2.19      14.50        0.57                               luaV_gettable
Not bad at all. The function match() is the LPEG execution engine, which matches my initial thoughts on the code. It wasn't much work to remove the extraneous SIP headers I don't bother with, and to simplify the method parsing (see above). Re-profiling the code gives:
Table: Modified LPEG. Each sample counts as 0.01 seconds.

  %      cumulative    self               self     total
 time     seconds     seconds    calls   ms/call  ms/call   name
------------------------------------------------------------------
 14.25       3.67        3.67                               luaV_execute
 11.22       6.56        2.89                               luaS_newlstr
 10.49       9.26        2.70                               match
  6.33      10.89        1.63                               luaD_precall
  5.20      12.23        1.34                               luaH_getstr
  2.76      12.94        0.71                               index2adr
  2.58      13.61        0.67                               luaD_poscall
  2.41      14.23        0.62                               luaV_gettable
match() drops from first to third place, so that's good. And a load test done by the QA (Quality Assurance) engineer showed an easy 25% increase in message processing.
But that's really as far as I can go with the C-level profile. I did a run where I removed most of the validation checks (but only after I saw that none of them had been triggered over the past 30 days) and didn't see much of a speed improvement. So I really need to profile the Lua code as Lua code, not as C.
That's going to take some work.
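Just to give an idea of the direction: a crude sampling profiler can be built on Lua's debug library, something like the following sketch (my own illustration, not the profiler this project will end up using):

local counts = {}

local function hook()
  -- level 2 is the function that was running when the hook fired
  local info = debug.getinfo(2,"Sl")
  if info then
    local where = info.short_src .. ":" .. (info.currentline or 0)
    counts[where] = (counts[where] or 0) + 1
  end
end

debug.sethook(hook,"",10000) -- sample every 10,000 VM instructions (arbitrary)

-- ... run the code under test ...

debug.sethook() -- turn sampling off

-- dump the hottest source lines
local list = {}
for where,hits in pairs(counts) do
  list[#list + 1] = { where = where , hits = hits }
end
table.sort(list,function(a,b) return a.hits > b.hits end)
for i = 1 , math.min(#list,20) do
  print(list[i].hits,list[i].where)
end

Even something that crude points at Lua source lines instead of VM internals, which is exactly the information the C-level profile can't give.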