Yet more adventures in profiling

While I'm now satisfied with the memory usage [1], I've started watching the CPU (Central Processing Unit) utilization, and I noticed with some dismay that it's quite high (even with the new HTML (HyperText Markup Language) parser being faster overall). I started an instance a day ago (the 27^TH), and it has already accumulated 27 minutes, 35 seconds of CPU time. By contrast, the web server has only accumulated 37 seconds of CPU time since the 25^TH.

That's a large difference.

The server in question is written in Lua. I have another server written in Lua, and it has only accumulated 1 minute, 26 seconds since the 25^TH.

There are two differences between the servers that might account for the discrepancy, the most obvious one being the use of TLS.

But to be sure, I need a controlled experiment. Since both servers basically do the same thing (mainly, serve up this blog via gopher [2] and Gemini [3], and convert the HTML to text formats, thus the need for an HTML parser), it was easy enough to generate a list of comparable requests for both and profile the execution.

Unfortunately, profiling across shared objects doesn't necessarily work all that well (at least on Linux). I recompiled both Lua and all the Lua modules I use (at least the ones written in C), but it only showed the time spent in the main Lua VM (Virtual Machine) and nowhere else.

I then spent the time constructing a self-contained executable (containing Lua, plus all the modules comprising the application) of port70 [4] (the gopher server) and another one for GLV-1.12556 [5] (the Gemini server). Pretty easy to do, if a bit tedious in tracking down all the modules to include in the executables. I didn't bother with any optimizations for these runs, as I'm trying to get a feel for where the time is spent.
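Since all of the dependencies come in through require(), a quick sanity check at startup can at least catch a module I forgot to bundle. This is only a sketch of the idea, not code from either server, and the module names are placeholders rather than the actual dependency list:

-- In a self-contained build, every module the application requires
-- should already be registered in package.preload by the C host
-- program; anything missing would fall back to a filesystem search
-- (or fail outright).  The names below are examples only.
local needed =
{
  "lpeg",           -- used for all parsing aside from HTML
  "org.conman.net", -- illustrative name, not the real list
}

for _,name in ipairs(needed) do
  if not package.preload[name] then
    io.stderr:write("not bundled: ",name,"\n")
  end
end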

I profiled each executable, making the same 1,171 requests (well, “same” meaning “requesting the same content”) to each program.
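The replay itself is nothing fancy. Something along these lines (a sketch using LuaSocket, with a placeholder host, port and request file, not the actual script I used) is enough to throw the gopher requests at the server; the Gemini side works the same way, only wrapped in a TLS client connection:

local socket = require "socket" -- LuaSocket

-- Sketch: replay a list of gopher selectors (one per line in a file)
-- against a locally running server.  The host, port and filename are
-- placeholders.
for selector in io.lines("requests.txt") do
  local conn = assert(socket.connect("127.0.0.1",70))
  conn:send(selector .. "\r\n")
  repeat
    local data = conn:receive(4096) -- just drain the reply
  until not data
  conn:close()
end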

First, port70, the gopher server, straight TCP (Transmission Control Protocol) connection. It accumulated 14 seconds of CPU time during the profiling run, and the results:

Table: Each sample counts as 0.01 seconds.
% time	cumulative seconds	self seconds	calls	self ms/call	total ms/call	name
------------------------------
18.06	0.56	0.56	33881	0.00	0.00	luaV_execute
17.74	1.11	0.55	986744	0.00	0.00	match
4.03	1.24	0.13	28481743	0.00	0.00	lua_gettop
3.55	1.35	0.11	22087107	0.00	0.00	index2value
2.58	1.43	0.08	11321831	0.00	0.00	yymatchChar
2.26	1.50	0.07	6478653	0.00	0.00	touserdata
2.26	1.57	0.07	2063343	0.00	0.00	pushcapture
1.94	1.63	0.06	2074113	0.00	0.00	lua_getmetatable
1.94	1.69	0.06	2068487	0.00	0.00	auxgetstr
1.61	1.74	0.05	2222138	0.00	0.00	luaS_new
1.29	1.78	0.04	5469355	0.00	0.00	luaV_equalobj
1.29	1.82	0.04	5239401	0.00	0.00	luaH_getshortstr
1.29	1.86	0.04	2042852	0.00	0.00	luaL_checkudata
1.29	1.90	0.04	1207086	0.00	0.00	lua_tolstring
1.29	1.94	0.04	1070855	0.00	0.00	luaT_gettmbyobj
1.29	1.98	0.04	175585	0.00	0.00	internshrstr

Nothing terribly surprising there. The function luaV_execute() is the main driver for the Lua VM, so its presence is expected. match() is from LPEG (Lua Parsing Expression Grammar) [6], which is used for all parsing aside from HTML. And yymatchChar() is from the HTML parser I wrote, so again, no terrible surprise there.
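As an aside, the reason LPEG shows up as a single hot function is that, once a pattern is compiled, calling its match method runs entirely inside LPEG's C matching engine. A trivial, purely illustrative example:

local lpeg = require "lpeg"

-- A crude "word, space, digits" matcher.  However complicated the
-- grammar gets, pattern:match() is one call into LPEG's C engine,
-- which is why match() appears as a single large entry in the profile.
local word   = lpeg.C(lpeg.R("az","AZ")^1)
local digits = lpeg.C(lpeg.R("09")^1)
local line   = word * lpeg.P" " * digits * -1

print(line:match("status 200")) --> status  200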

Now, GLV-1.12556, the Gemini server, using TLS. This accumulated 1 minute, 24 seconds of CPU time during the profiling run. The results:

Table: Each sample counts as 0.01 seconds.
% time	cumulative seconds	self seconds	calls	self ms/call	total ms/call	name
------------------------------
8.06	0.10	0.10	30070	0.00	0.01	luaV_execute
7.26	0.19	0.09	1494750	0.00	0.00	luaH_getshortstr
5.65	0.26	0.07	11943	0.01	0.01	match
4.03	0.31	0.05	535091	0.00	0.00	luaD_precall
4.03	0.36	0.05	502074	0.00	0.00	moveresults
3.23	0.40	0.04	129596	0.00	0.00	luaS_hash
2.42	0.43	0.03	11321831	0.00	0.00	yymatchChar
2.42	0.46	0.03	4218262	0.00	0.00	yyText
2.42	0.49	0.03	3293376	0.00	0.00	yymatchString
2.42	0.52	0.03	1508070	0.00	0.00	yyrefill
2.42	0.55	0.03	377362	0.00	0.00	luaH_newkey
1.61	0.57	0.02	2898350	0.00	0.00	index2value
1.61	0.59	0.02	2531258	0.00	0.00	lua_gettop
1.61	0.61	0.02	1081241	0.00	0.00	yy_CHAR
1.61	0.63	0.02	230982	0.00	0.00	luaV_equalobj
1.61	0.65	0.02	174295	0.00	0.00	luaV_finishget
1.61	0.67	0.02	136553	0.00	0.00	luaT_gettmbyobj
1.61	0.69	0.02	129534	0.00	0.00	internshrstr
1.61	0.71	0.02	10363	0.00	0.00	traversestrongtable
1.61	0.73	0.02	4684	0.00	0.00	lua_resume

It's satisfying to see the same number of calls to yymatchChar(), but the times are smaller overall and there's less of a spike in luaV_execute(), leading me to believe the time is actually being spent in TLS. That time isn't showing up here because I haven't compiled the TLS library for profiling, and it's dynamically linked in anyway. I'm fairly confident that TLS is eating the CPU time and that it's not necessarily my code. I'm apprehensive about attempting to recompile the TLS library with profiling in mind, but that's the next logical step if I want to know for sure.

Sigh.

[1] /boston/2020/12/19.1

[2] https://en.wikipedia.org/wiki/Gopher_(protocol)

[3] https://gemini.circumlunar.space/

[4] https://github.com/spc476/port70

[5] https://github.com/spc476/GLV-1.12556

[6] http://www.inf.puc-rio.br/~roberto/lpeg/
