It was bugging me [1].
I checked last night, and found the following:
Table: CPU utilization after a week

             CPU    requests
------------------------------
gopher     28:37        8360
gemini      6:02       24345
How can a TCP (Transmission Control Protocol) service use more CPU time than a TLS (Transport Layer Security) service? Especially when TLS runs over TCP? And when TLS generates six times the packets that TCP does?
There's some difference in how I handle TCP and TLS that's causing this. Profiling the code didn't reveal much, as my attempts kept showing that gopher [2] should be using less CPU than Gemini [3], so it came down to staring at code and thinking hard.
It wasn't easy. Yes, there are some differences between the TCP interface [4] and the TLS interface [5], so it was a matter of going through the code, bringing the two implementations closer together, and seeing if that fixed things. I would change the code, run a test, see it make no difference, and try again. Eventually, I came across this bit of code [6]:
-- ---------------------------------------------------------------------
-- It might be a bug *somewhere*, but on Linux, this is required to get
-- Unix domain sockets to work with the NFL driver.  There's a race
-- condition where writing data then calling close() may cause the
-- other side to receive no data.  This does NOT appear to happen with
-- TCP sockets, but this doesn't hurt the TCP side in any case.
-- ---------------------------------------------------------------------

while self.__socket.sendqueue and self.__socket.sendqueue > 0 do
  nfl.SOCKETS:update(self.__socket,'w')
  coroutine.yield()
end
This bit of code wasn't in the TLS implementation, and as the comment says, it “shouldn't” hurt the TCP path. But it was only needed for local (or Unix) domain sockets to begin with, so let me try removing it and run a test.
Table: Test before removing the code, 275 requests; time in seconds

          CPU      real     user      sys
------------------------------------------
gopher     18    46.726   13.971    4.059
gemini      4   107.928    3.951    0.322
Table: Test after removing the code, 275 requests; time in seconds

          CPU      real     user      sys
------------------------------------------
gopher      0    52.403    0.635    0.185
gemini      4   103.290    3.957    0.285
Now, about that code … this is my best guess as to what was going on.
The sendqueue field only exists on Linux, so I first check that it exists, and then check its value (if only to avoid the “attempt to compare number with nil” Lua error on non-Linux systems). It reports the number of bytes the kernel has yet to send; while it's nonzero, we switch the socket to trigger an event when it becomes writable, then yield. Even worse, the check isn't just a simple variable fetch, but ends up being an ioctl() system call. Twice!
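To make that cost concrete, here's a minimal Python sketch (not the author's Lua) of what reading sendqueue amounts to on Linux: the value comes from the SIOCOUTQ ioctl, so every check in that while loop is a trip into the kernel rather than a field access.

```python
# Linux-only sketch: read the kernel's unsent-byte count for a socket.
import array
import fcntl
import socket
from termios import TIOCOUTQ as SIOCOUTQ  # same ioctl number on Linux

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
a.send(b"hello")                  # queue data the peer hasn't read yet

buf = array.array("i", [0])
fcntl.ioctl(a.fileno(), SIOCOUTQ, buf)   # one syscall per check
print(buf[0] > 0)                        # data still charged to the sender
```

(For Unix domain sockets the count includes kernel bookkeeping overhead, so it can exceed the payload size; the point is only that each check is a syscall.)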
Now the “write” event code:
if event.write then
  if #ios.__output > 0 then
    local bytes,err = ios.__socket:send(nil,ios.__output)
    if err == 0 then
      ios.__wbytes = ios.__wbytes + bytes
      ios.__output = ios.__output:sub(bytes + 1,-1)
    else
      syslog('error',"socket:send() = %s",errno[err])
      nfl.SOCKETS:remove(ios.__socket)
      nfl.schedule(ios.__co,false,errno[err],err)
    end
  end

  if #ios.__output == 0 then
    nfl.SOCKETS:update(ios.__socket,'r')
    nfl.schedule(ios.__co,true)
  end
end
There's probably no data left to be sent (ios.__output is empty), so we immediately switch the socket back to trigger on readability, then resume the coroutine handling the socket. That coroutine goes back to the while loop, sets the socket back to trigger on writability, and the two keep bouncing like that until the kernel buffer has been sent.
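That ping-pong can be sketched with a toy simulation (plain Python, invented names, no real sockets): a draining coroutine asks for write events while the kernel queue is nonzero, and the write handler, finding nothing of its own to send, immediately flips back to read interest and resumes it. One wakeup per kernel-queue check, none of them useful.

```python
def drain(kernel_queue):
    """Stand-in for the Lua while-loop draining the kernel send queue."""
    for pending in kernel_queue:      # each read models one ioctl()
        if pending == 0:
            return
        yield "w"                     # switch socket to write interest

def simulate(kernel_queue):
    """Count event-loop wakeups until the kernel send queue drains."""
    wakeups = 0
    for _ in drain(kernel_queue):
        # "write" event fires; the userland buffer is empty, so the
        # handler flips back to read interest and resumes the coroutine
        wakeups += 1
    return wakeups

# Kernel drains 5 bytes one at a time: five pointless wakeups.
print(simulate([5, 4, 3, 2, 1, 0]))   # → 5
```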
As I wrote, this fixed an issue I had with local domain sockets—it wasn't an issue with an IP (Internet Protocol) socket, and in fact, was the wrong thing to do for an IP socket.
And as it is with these types of bugs, finding the root cause is not trivial, but the fix almost always is.
[2] https://en.wikipedia.org/wiki/Gopher_(protocol)
[3] https://geminiprotocol.net/