I have two services written in Lua [1]: a gopher server [2] and a Gemini server [3]. They both serve roughly the same data (mainly my blog [4]), and yet the gopher server accumulates more CPU (Central Processing Unit) time than the Gemini server, even though the Gemini server uses TLS (Transport Layer Security) and serves more requests. And not by a little bit either:
Table: CPU utilization
gopher  17:26
Gemini   0:45
So I started investigating the issue. It wasn't TCP_NODELAY [5] (via Lobsters [6]), as latency wasn't the issue (but I disabled Nagle's algorithm anyway).
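For illustration, here's how one would disable Nagle's algorithm with LuaSocket (a minimal sketch, not the code my servers actually use):

  -- A sketch with LuaSocket: accept a connection and turn off
  -- Nagle's algorithm on it.  Port 7070 is used here just so the
  -- example doesn't need privileges for port 70.
  local socket = require "socket"

  local server = assert(socket.bind("*",7070))
  local conn   = assert(server:accept())
  conn:setoption("tcp-nodelay",true) -- disable Nagle's algorithm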
Looking further into it, the issue seemed to be one of buffering. The gopher code wasn't buffering any data over TCP (Transmission Control Protocol); furthermore, it was issuing tons of small writes. My thinking here was: of course! The TCP code was making tons of system calls, whereas the TLS code (thanks to the library I'm using [7]) must be doing the buffering for me.
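To illustrate the difference (a made-up sketch, not the actual server code): sending each piece of a gopher menu as it's generated costs a system call per piece, while collecting the pieces and sending them in one go costs a single system call.

  -- Hypothetical sketch of the idea, not the actual server code.

  -- Unbuffered: one send(), and thus one system call, per tiny piece.
  local function send_menu_unbuffered(conn,items)
    for _,item in ipairs(items) do
      conn:send(item .. "\r\n")
    end
    conn:send(".\r\n")
  end

  -- Buffered: accumulate the pieces, then one send() for the whole reply.
  local function send_menu_buffered(conn,items)
    local buf = {}
    for _,item in ipairs(items) do
      buf[#buf + 1] = item
      buf[#buf + 1] = "\r\n"
    end
    buf[#buf + 1] = ".\r\n"
    conn:send(table.concat(buf))
  end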
So I added buffering to the gopher server, and now, after about 12 hours (I restarted both servers when I made the change), I have:
Table: new CPU utilization
gopher  2:25
Gemini  2:13
I … I don't know what to make of this. Obviously things have improved for gopher, but did I somehow make Gemini worse? (I did change some low-level code [8] that both the TCP and TLS paths use; I use full buffering for TCP, and no buffering for TLS.) Or is the load just more evenly spread now?
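The low-level change amounts to something like this (again a hypothetical sketch, not the actual code in [8]): a shared writer that takes a buffering mode, full buffering for the TCP path and none for the TLS path.

  -- Hypothetical sketch of a shared writer with a buffering mode;
  -- the real low-level code is more involved than this.
  local function make_writer(conn,mode)
    local buf = {}
    return {
      write = function(s)
        if mode == 'no' then
          conn:send(s)                 -- TLS path: write through immediately
        else
          buf[#buf + 1] = s            -- TCP path: accumulate
        end
      end,
      flush = function()
        if #buf > 0 then
          conn:send(table.concat(buf)) -- one system call for the whole buffer
          buf = {}
        end
      end,
    }
  end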
It's clear that gopher is still accumulating more CPU time, just not as bad as it was. Perhaps more buffering is required? I'll leave this for a few days and see what happens.
[2] https://github.com/spc476/port70
[3] https://github.com/spc476/GLV-1.12556
[4] https://boston.conman.org/
[5] https://brooker.co.za/blog/2024/05/09/nagle.html
[6] https://lobste.rs/s/kocje7/it_s_always_tcp_nodelay_every_damn_time