Some perils of handling signals in Lua

This is one of those “Oh, yeah, I didn't think that through, did I?” type of bug.

I wrote a signal module [1] for Lua [2], which can handle both ANSI C [3] and POSIX signals [4] with largely the same API (the POSIX [5] implementation has some additional functions defined).

Handling signals in Lua is not that straightforward because of the nature of signals: you are effectively writing multithreaded code [6]. You just can't call back into Lua from the signal handler (while the Lua VM (Virtual Machine) has no static data and each Lua state is isolated unto itself, two threads sharing a Lua state can lead to problems). The only Lua function you can safely call is lua_sethook() [7], which installs a hook that stops the Lua VM at the next VM instruction (it's typically used for debugging and signal handling [8]). That hook can then call back into Lua [9]. It is a bit convoluted (the signal handler will call lua_sethook() and return; the Lua VM will resume and then call the hook), but it does allow you to write signal handlers in Lua:

>
```
signal.catch('windowchange',function()
  print("Wheeee! Our terminal just resized!")
end)
```

and not have it blow up on you.
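The "do almost nothing in the handler, do the real work later" idea can be sketched in plain C without Lua at all. This is only an analogy and not the module's code: `run_vm()` is a hypothetical stand-in for the Lua VM loop, `deferred_work()` stands in for the handler written in Lua, and a flag plays the role that lua_sethook() plays in the real thing.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Set by the signal handler, consumed by the "VM" loop.  A
 * volatile sig_atomic_t is the one kind of object ISO C lets a
 * handler write safely. */
static volatile sig_atomic_t pending = 0;
static int handled = 0;  /* how many times the deferred work ran */

/* The real signal handler: request the work and return. */
static void on_signal(int sig)
{
  (void)sig;
  pending = 1;
}

/* Hypothetical stand-in for the handler written in Lua. */
static void deferred_work(void)
{
  handled++;
  puts("signal handled outside the signal handler");
}

/* Hypothetical stand-in for the VM: before each "instruction",
 * run the deferred work if a signal asked for it. */
static int run_vm(int instructions)
{
  for (int pc = 0; pc < instructions; pc++)
  {
    if (pending)
    {
      pending = 0;
      deferred_work();  /* normal context, anything goes */
    }
    if (pc == 2)
      raise(SIGINT);    /* simulate a signal arriving mid-run */
  }
  return handled;
}

/* Install the handler, run five "instructions", and return how
 * many times the deferred work ran. */
int demo_deferred(void)
{
  void (*prev)(int) = signal(SIGINT, on_signal);
  assert(prev != SIG_ERR);
  handled = 0;
  int count = run_vm(5);
  signal(SIGINT, prev);  /* put the old handler back */
  return count;
}
```

Calling `demo_deferred()` from a `main()` returns 1: the signal raised during instruction 2 is only acted upon at the top of instruction 3. The deferred work depends entirely on the loop getting another turn.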

So, with that in mind, I give you this code:

>
```
local net    = require "org.conman.net"
local clock  = require "org.conman.clock"
local signal = require "org.conman.signal"

local raddr = net.address("127.0.0.1",'udp','echo')
local sock  = net.socket(raddr.family,'udp')

-- once a second, send the current time to the echo server
signal.catch('alarm',function()
  sock:send(raddr,tostring(clock.get()))
end)
clock.itimer(1)

local previous = clock.get()

while true do
  local _,data = sock:recv()
  local now = clock.get()
  if data then
    local zen = tonumber(data)
    -- round trip time, then time since the previous packet
    print(string.format("%.7f\t%.7f",now - zen,now - previous))
    previous = now
  end
end
```

This is a UDP (User Datagram Protocol) echo client program. signal.catch() handles the alarm signal (SIGALRM) by sending a packet of data (which is just the current time) to the echo server. clock.itimer() informs the kernel to send the alarm signal once a second. So once a second, our program receives the alarm signal and sends the current time. Then, in an infinite loop, we just wait for packets to arrive (which should be the packets we sent to the echo server—they're “echoed” back to us) and we calculate how long the packet took round trip and how long it was from the previous packet. The output looks like:

>
```
0.0002971 1.0014961
0.0003922 0.9999950
0.0002851 0.9998930
0.0003171 1.0000319
0.0003910 0.9999740
0.0002551 0.9998641
0.0003359 1.0000808
```

The first column is the round trip time (in seconds) for the packet (around 3 to 4 ten-thousandths of a second), and the second column is how long (in seconds) it has been since the previous packet (a second, give or take a few ten-thousandths).

But our call to sock:recv() is interrupted by the alarm signal. Unfortunately, one side effect of signals is that they will interrupt “long running” system calls, which are almost always system calls dealing with I/O, such as read() or write(). When such a call is interrupted, the system call will return an error of EINTR. We can see this if we change the code a bit:

>
```
local net    = require "org.conman.net"
local clock  = require "org.conman.clock"
local signal = require "org.conman.signal"
local errno  = require "org.conman.errno"

local raddr = net.address("192.168.90.118",'udp',22222)
local sock  = net.socket(raddr.family,'udp')

signal.default('int')
signal.catch('alarm',function()
  sock:send(raddr,tostring(clock.get()))
end)
clock.itimer(1)

local previous = clock.get()

while true do
  local _,data,err = sock:recv()
  local now = clock.get()
  if data then
    local zen = tonumber(data)
    print(string.format("%.7f\t%.7f",now - zen,now - previous))
    previous = now
  else
    print(">>>",errno[err])
  end
end
```

and when we run it:

>
```
>>> Interrupted system call
0.0003049 1.0015509
>>> Interrupted system call
0.0002320 0.9998269
>>> Interrupted system call
0.0002131 0.9999812
>>> Interrupted system call
0.0001860 0.9999728
>>> Interrupted system call
0.0002639 0.9999781
```

With POSIX, you can specify that for a given signal, system calls are to be automatically restarted so you can dispense with EINTR error handling.
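At the C level, the restart behavior is selected with the SA_RESTART flag to sigaction(). Here is a minimal sketch of the difference, not taken from the signal module: `read_across_alarm()` is a hypothetical helper, and a pipe stands in for the socket. The handler writes one byte into the pipe (write() is on the POSIX async-signal-safe list), so a restarted read() has something to find.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;  /* write end of the pipe, used by the handler */

/* Runs in signal context: only async-signal-safe calls allowed. */
static void on_alarm(int sig)
{
  int saved = errno;  /* don't disturb errno in the interrupted code */
  (void)sig;
  ssize_t n = write(wake_fd, "x", 1);
  (void)n;
  errno = saved;
}

/* Block in read() while SIGALRM arrives.  `flags` is 0 or SA_RESTART.
 * Returns the byte count from read(), or -errno if read() failed. */
int read_across_alarm(int flags)
{
  int fds[2];
  int rc = pipe(fds);
  assert(rc == 0);
  wake_fd = fds[1];

  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sa.sa_flags   = flags;
  sigemptyset(&sa.sa_mask);
  rc = sigaction(SIGALRM, &sa, NULL);
  assert(rc == 0);

  /* One-shot timer: deliver SIGALRM in 50 milliseconds. */
  struct itimerval tv;
  memset(&tv, 0, sizeof tv);
  tv.it_value.tv_usec = 50000;
  rc = setitimer(ITIMER_REAL, &tv, NULL);
  assert(rc == 0);

  char c;
  ssize_t n = read(fds[0], &c, 1);  /* pipe is empty, so this blocks */
  int result = (n < 0) ? -errno : (int)n;

  close(fds[0]);
  close(fds[1]);
  return result;
}
```

With `flags` set to 0, the interrupted read() should fail with EINTR and the byte is left sitting in the pipe. With SA_RESTART, the kernel re-issues the read() after the handler returns, and it picks up the byte the handler just wrote. Note that this only works because the handler does its work right there, in C.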

And here's where we finally get to the “Oh, yeah, I didn't think that through, did I?” type of bug.

Not wanting the code to be interrupted by the alarm signal, I changed the call to signal.catch() so it would restart any system calls:

>
```
signal.catch('alarm',function()
  sock:send(raddr,tostring(clock.get()))
end,'restart')
```

When I ran the code, I got nothing! There was simply no output happening. It caught me by surprise and it took me several minutes to figure out what was happening (or rather, what wasn't happening).

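And thus we get to the punchline: the Lua VM doesn't resume because we're still in a system call! The signal handler in C runs, calls lua_sethook() and returns, but the kernel then restarts recvfrom() instead of letting it fail with EINTR. Control never gets back to the Lua VM, so the hook never runs, the signal handler written in Lua is never called, and no packet is sent. We're stuck in recvfrom() waiting for data that will never arrive.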

D'oh!
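The same trap can be reproduced in a few lines of C, under some loudly stated assumptions: a flag stands in for the hook set by lua_sethook(), a pipe stands in for the socket, and `ticks_before_hook_ran()` is a hypothetical name. So that the sketch terminates instead of hanging like the real program did, the handler cheats on the third alarm and writes a byte to release the read().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t pending = 0;  /* "please run the hook" */
static volatile sig_atomic_t ticks   = 0;  /* alarms seen, capped at 3 */
static int rescue_fd = -1;

/* Deferred style: the handler only *requests* work.  The write on the
 * third alarm exists purely so this demo can finish. */
static void on_alarm(int sig)
{
  (void)sig;
  pending = 1;
  if (ticks < 3 && ++ticks == 3)
  {
    ssize_t n = write(rescue_fd, "x", 1);
    (void)n;
  }
}

/* Returns how many alarms had fired by the time the code after read()
 * finally got a chance to notice the pending request. */
int ticks_before_hook_ran(void)
{
  int fds[2];
  int rc = pipe(fds);
  assert(rc == 0);
  rescue_fd = fds[1];
  pending   = 0;
  ticks     = 0;

  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sa.sa_flags   = SA_RESTART;  /* the "restart" option */
  sigemptyset(&sa.sa_mask);
  rc = sigaction(SIGALRM, &sa, NULL);
  assert(rc == 0);

  /* Repeating timer: SIGALRM every 20 milliseconds. */
  struct itimerval tv;
  memset(&tv, 0, sizeof tv);
  tv.it_value.tv_usec    = 20000;
  tv.it_interval.tv_usec = 20000;
  rc = setitimer(ITIMER_REAL, &tv, NULL);
  assert(rc == 0);

  /* Restarted after each alarm until the rescue byte shows up; the
   * pending flag is set the whole time and nobody looks at it. */
  char c;
  ssize_t n = read(fds[0], &c, 1);
  assert(n == 1);

  /* Only now are we "back in the VM". */
  int missed = pending ? (int)ticks : 0;

  memset(&tv, 0, sizeof tv);  /* disarm the timer */
  setitimer(ITIMER_REAL, &tv, NULL);
  close(fds[0]);
  close(fds[1]);
  return missed;
}
```

The function returns 3: three alarms arrived and asked for the deferred work before the code after read() saw any of them. Take away the rescue byte and the count just keeps climbing while read() restarts forever, which is the silent program above. The earlier version without 'restart' avoids this because every EINTR hands control back to the loop.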

If the above code were written in C, there would be no issue; clock_gettime() and sendto() (the system calls underlying the Lua functions clock.get() and sock:send() respectively) are safe to call from a signal handler [10]. I may not have been able to safely convert the time to text (since snprintf(), the usual standard C function for converting numbers to text, isn't documented as being safe to call in a signal handler), but sending the raw binary values would be okay in that case.
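For illustration, here is roughly what that could look like, as a sketch rather than a port of the Lua program. The assumptions: a Unix datagram socketpair stands in for the UDP socket and the echo server (so send() is used where the real thing would use sendto()), the timestamp goes out as a raw struct timespec, and `handler_sends_timestamp()` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static int out_fd = -1;  /* socket the handler sends on */

/* clock_gettime() and send() are both on the POSIX list of
 * async-signal-safe functions, so the work is done right here. */
static void on_alarm(int sig)
{
  int saved = errno;
  struct timespec now;
  (void)sig;
  if (clock_gettime(CLOCK_REALTIME, &now) == 0)
  {
    /* raw binary, no snprintf() */
    ssize_t n = send(out_fd, &now, sizeof now, 0);
    (void)n;
  }
  errno = saved;
}

/* Returns 1 if a full timestamp arrived while we sat in a restarted
 * recv(), 0 otherwise. */
int handler_sends_timestamp(void)
{
  int sv[2];
  int rc = socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);
  assert(rc == 0);
  out_fd = sv[0];

  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sa.sa_flags   = SA_RESTART;  /* same choice that hung the Lua version */
  sigemptyset(&sa.sa_mask);
  rc = sigaction(SIGALRM, &sa, NULL);
  assert(rc == 0);

  /* One-shot timer: deliver SIGALRM in 50 milliseconds. */
  struct itimerval tv;
  memset(&tv, 0, sizeof tv);
  tv.it_value.tv_usec = 50000;
  rc = setitimer(ITIMER_REAL, &tv, NULL);
  assert(rc == 0);

  /* Blocks, is interrupted, restarts, then finds the datagram. */
  struct timespec sent;
  ssize_t n = recv(sv[1], &sent, sizeof sent, 0);

  close(sv[0]);
  close(sv[1]);
  return n == (ssize_t)sizeof sent;
}
```

Here the restart is harmless, because the packet is sent from inside the signal handler itself and nothing is waiting on an interpreter loop.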

But this isn't C, it's Lua. And what we have here is a type of leaky abstraction [11]. That 20/20 hindsight is such a bastard.

[1] https://github.com/spc476/lua-conmanorg/blob/master/src/signal.c

[2] http://www.lua.org/

[3] https://github.com/spc476/lua-conmanorg/blob/163cc0a9659c68c9a730c72c5d9d30492ea93b16/src/signal.c#L70

[4] https://github.com/spc476/lua-conmanorg/blob/163cc0a9659c68c9a730c72c5d9d30492ea93b16/src/signal.c#L486

[5] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04

[6] /boston/2007/10/18.1

[7] http://www.lua.org/manual/5.3/manual.html#lua_sethook

[8] http://www.lua.org/source/5.3/lua.c.html#laction

[9] http://www.lua.org/source/5.3/lua.c.html#lstop

[10] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03

[11] http://www.joelonsoftware.com/articles/LeakyAbstractions.html
