At work, I test the various components of “Project: Wolowizard [1].” These tests usually require running multiple copies of a program on a single computer. I use Lua [2] (with help from a module [3]) to start and monitor the programs being tested. The code starts N copies, and if any of the programs crashes, the reason is logged. It's fairly straightforward code.
Now, one of the components of “Project: Wolowizard” was updated to support a new project (“Project: Sippy-Cup”) and that component is occasionally crashing on an assert [4], but the problem is: there are no core files to check.
And I've spent the past two days trying to figure out why there are no core files to check.
The first culprit: have we told the system not to generate core files? Yup. The account under which the program runs (root) has a core file size limit of zero bytes. There are a few ways to fix this, and I picked what, to me, was the simplest solution: in the Lua script that runs the programs, set the core file size to “unlimited.” And this is easy enough to do:
```
proc = require "org.conman.process"
proc.limits.hard.core = "inf"
proc.limits.soft.core = "inf"
```
Slight digression: you can set resource limits on everything from maximum memory usage to core file size. Each limit has a soft value, which is the one actually enforced, and a hard value, which caps the soft one. Any process can lower its limits, or raise its soft limit as far as the hard limit, but only a process running as root can raise the hard limit. Since the program I'm running is running as root, setting both the hard and soft limits to “infinity” is easy.
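For the curious, here is a minimal C sketch of the same idea using the POSIX getrlimit() and setrlimit() calls. It is not the code from the Lua module, and the function name raise_core_limit is made up for illustration. It only lifts the soft limit up to the existing hard limit, which works without root.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file limit as far as the
 * current hard limit allows. Any process may do this; raising the hard
 * limit itself would need root.
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
static int raise_core_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    lim.rlim_cur = lim.rlim_max;   /* soft limit := hard limit */

    return setrlimit(RLIMIT_CORE, &lim);
}
```

A process running as root could additionally set lim.rlim_max to RLIM_INFINITY before the setrlimit() call, which is what the Lua snippet above asks for.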
But there was still a disturbing lack of core files.
I checked the code of the Lua module I was using, and yes, I flubbed the parsing code. I made the fix, my tests showed I got the logic right, I installed the updated module, and still no core files.
I did a bunch more tests and checked off the following reasons for the lack of core files: it wasn't because the program dropped permissions; it wasn't because the program couldn't write the core file in its current working directory; and the program is not setuid [5]. It was clear there was something wrong with the module.
I was able to isolate the issue to the following:
```
struct rlimit limit;
lua_Number    ival;
/* ... */
if (lua_isnumber(L,3))
  ival = lua_tonumber(L,3);
/* ... */
if (ival >= RLIM_INFINITY)
  ival = RLIM_INFINITY;
limit.rlim_cur = ival;
Now, lua_Number is of type double (a floating point value), and limit.rlim_cur is some form of integer. ival was properly HUGE_VAL (the C floating point equivalent of “infinity”) but limit.rlim_cur was 0.
But it worked on my home system just fine.
Then it dawned on me: my home system was a 32-bit system! That was the system where I wrote the patch and did the initial testing; the systems at work are all 64-bit systems. Some digging revealed that the definition of RLIM_INFINITY on the 64-bit system was
```
((unsigned long int)(~0UL))
```
or in other words: the largest unsigned long integer value. And on a 64-bit system, an unsigned long integer is 64 bits in size.
I do believe I was bit by an IEEE (Institute of Electrical and Electronics Engineers) 754 floating point implementation detail.
Lua treats all numbers as type double, and on modern systems, that means IEEE 754 floating point [6]. A double can store 53-bit integers without loss [7], so on a 32-bit system you can pass integer values into and out of a double without issue (32 being less than 53). Because I did my initial testing of the Lua module on a 32-bit system, there was no issue.
But on a 64-bit system … it gets interesting. Doing some empirical testing, I found the largest integer value you can store into a double and get something out is 18,446,744,073,709,550,591 (and what you get out is 18,446,744,073,709,549,568; I'll leave the reason for the discrepancy for the reader). Anything larger rounds up to 2^64, which no longer fits in 64 bits, and you get zero back out. (Strictly speaking, converting an out-of-range double to an integer is undefined behavior in C; zero is simply what these systems produced.)
So, no wonder I wasn't getting any core files! I was inadvertently setting the core file size to zero bytes!
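The rounding can be poked at with a few lines of C. This is a sketch assuming IEEE 754 doubles with the default round-to-nearest mode; the helper names are invented for illustration, and it deliberately never converts a double of 2^64 or more back to an integer, since that conversion is undefined.

```c
#include <stdint.h>

/* Convert to double and back. Only defined when the rounded double is
 * still below 2^64, so callers must stay at or below 2^64 - 1025. */
static uint64_t round_trip(uint64_t n)
{
    volatile double d = (double)n;   /* volatile: force a real 64-bit double */
    return (uint64_t)d;
}

/* 1 if n survives the trip through a double unchanged, 0 otherwise. */
static int survives(uint64_t n)
{
    return round_trip(n) == n;
}

/* 1 if converting n to double lands exactly on 2^64, a value that no
 * longer fits in a uint64_t. */
static int rounds_to_two_pow_64(uint64_t n)
{
    volatile double d = (double)n;
    return d == 18446744073709551616.0;   /* 2^64, exactly representable */
}
```

With these, survives(9007199254740992ULL) (that's 2^53) holds while survives(9007199254740993ULL) does not, and rounds_to_two_pow_64(UINT64_MAX) is true, which is exactly what happens to RLIM_INFINITY once it's compared against a double.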
Sigh.
Off to fix the code …
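One possible shape for such a fix, as a sketch only: do the range check while the value is still a double, and only convert values known to fit. The helper name rlim_from_double is hypothetical and not part of the module; it assumes that any double below (double)RLIM_INFINITY fits in an rlim_t, which holds for the 32-bit and 64-bit definitions discussed above.

```c
#include <sys/resource.h>

/* Hypothetical helper (not from the module): map a Lua-style double onto
 * an rlim_t without ever converting an out-of-range value. */
static rlim_t rlim_from_double(double v)
{
    if (!(v >= 0.0))                  /* negative values and NaN */
        return 0;
    if (v >= (double)RLIM_INFINITY)   /* covers HUGE_VAL and 2^64 */
        return RLIM_INFINITY;         /* assign the integer constant directly */
    return (rlim_t)v;                 /* now known to be in range */
}
```

The point of the design is that RLIM_INFINITY is assigned as an integer and never round-trips through ival, so the 2^64 rounding never gets a chance to bite.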
[3] https://github.com/spc476/lua-conmanorg/blob/master/src/process.c
[4] http://en.wikipedia.org/wiki/Assertion_(software_development)
[5] http://en.wikipedia.org/wiki/Setuid
[6] http://en.wikipedia.org/wiki/IEEE_floating_point
[7] http://en.wikipedia.org/wiki/Double-precision_floating-point_format