It was bugging me [1].
I checked last night, and found the following:
Table: CPU utilization after a week

             CPU    requests
------------------------------
gopher     28:37        8360
gemini      6:02       24345
How can a TCP (Transmission Control Protocol) service use more CPU time than a TLS (Transport Layer Security) service? Especially when TLS runs over TCP? And when TLS generates six times the packets that TCP does?
There's some difference in how I handle TCP and TLS that's causing this. Profiling the code didn't reveal much, as my attempts kept showing that gopher [2] should be using less CPU than Gemini [3], so it came down to staring at code and thinking hard.
It wasn't easy. Yes, there are some differences between the TCP interface [4] and the TLS interface [5], so it was a matter of going through the code, bringing the two implementations closer together, and seeing if that fixed things. I would change the code, run a test, see it make no difference, and try again. Eventually, I came across this bit of code [6]:
-- ---------------------------------------------------------------------
-- It might be a bug *somewhere*, but on Linux, this is required to get
-- Unix domain sockets to work with the NFL driver.  There's a race
-- condition where writing data then calling close() may cause the
-- other side to receive no data.  This does NOT appear to happen with
-- TCP sockets, but this doesn't hurt the TCP side in any case.
-- ---------------------------------------------------------------------

while self.__socket.sendqueue and self.__socket.sendqueue > 0 do
  nfl.SOCKETS:update(self.__socket,'w')
  coroutine.yield()
end
This bit of code wasn't in the TLS implementation, and as the comment says, it “shouldn't” hurt the TCP path. But it was only needed for local (or Unix) domain sockets to begin with, so let me try removing it and run a test.
Table: Test before removing the code, 275 requests; time in seconds

          CPU      real     user      sys
------------------------------------------
gopher     18    46.726   13.971    4.059
gemini      4   107.928    3.951    0.322
Table: Test after removing the code, 275 requests; time in seconds

          CPU      real     user      sys
------------------------------------------
gopher      0    52.403    0.635    0.185
gemini      4   103.290    3.957    0.285
Now, about that code … this is my best guess as to what was going on.
The sendqueue field only exists on Linux, so I first check that it exists, and then check its value (if only to avoid the “attempt to compare number with nil” Lua error on non-Linux systems). It reports the number of bytes the kernel has yet to send; while it's nonzero, we switch the socket to trigger an event when it becomes writable, then yield. Even worse, the check isn't just a simple variable fetch, but ends up being an ioctl() system call. Twice!
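To make that cost concrete, here's a minimal Python sketch (not the author's Lua) of what reading sendqueue amounts to on Linux: the value comes from the SIOCOUTQ ioctl, so every check in that while loop is a trip into the kernel rather than a field access.

```python
# Linux-only sketch: read the kernel's unsent-byte count for a socket.
import array
import fcntl
import socket
from termios import TIOCOUTQ as SIOCOUTQ  # same ioctl number on Linux

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
a.send(b"hello")                  # queue data the peer hasn't read yet

buf = array.array("i", [0])
fcntl.ioctl(a.fileno(), SIOCOUTQ, buf)   # one syscall per check
print(buf[0] > 0)                        # data still charged to the sender
```

(For Unix domain sockets the count includes kernel bookkeeping overhead, so it can exceed the payload size; the point is only that each check is a syscall.)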
Now the “write” event code:
if event.write then
  if #ios.__output > 0 then
    local bytes,err = ios.__socket:send(nil,ios.__output)
    if err == 0 then
      ios.__wbytes = ios.__wbytes + bytes
      ios.__output = ios.__output:sub(bytes + 1,-1)
    else
      syslog('error',"socket:send() = %s",errno[err])
      nfl.SOCKETS:remove(ios.__socket)
      nfl.schedule(ios.__co,false,errno[err],err)
    end
  end

  if #ios.__output == 0 then
    nfl.SOCKETS:update(ios.__socket,'r')
    nfl.schedule(ios.__co,true)
  end
end
There's probably no data left to be sent (ios.__output is empty), so we immediately switch the socket back to trigger on readability, then resume the coroutine handling the socket. That coroutine goes back to the while loop, sets the socket back to trigger on writability, and the two keep bouncing like that until the kernel buffer has been sent.
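That ping-pong can be sketched with a toy simulation (plain Python, invented names, no real sockets): a draining coroutine asks for write events while the kernel queue is nonzero, and the write handler, finding nothing of its own to send, immediately flips back to read interest and resumes it. One wakeup per kernel-queue check, none of them useful.

```python
def drain(kernel_queue):
    """Stand-in for the Lua while-loop draining the kernel send queue."""
    for pending in kernel_queue:      # each read models one ioctl()
        if pending == 0:
            return
        yield "w"                     # switch socket to write interest

def simulate(kernel_queue):
    """Count event-loop wakeups until the kernel send queue drains."""
    wakeups = 0
    for _ in drain(kernel_queue):
        # "write" event fires; the userland buffer is empty, so the
        # handler flips back to read interest and resumes the coroutine
        wakeups += 1
    return wakeups

# Kernel drains 5 bytes one at a time: five pointless wakeups.
print(simulate([5, 4, 3, 2, 1, 0]))   # → 5
```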
As I wrote, this fixed an issue I had with local domain sockets—it wasn't an issue with an IP (Internet Protocol) socket, and in fact, was the wrong thing to do for an IP socket.
And as it is with these types of bugs, finding the root cause is not trivial, but the fix almost always is.
[2] https://en.wikipedia.org/wiki/Gopher_(protocol)
[3] https://geminiprotocol.net/