Perhaps an 80M script is a bit excessive …

Every so often I'll do a bit of work on an unimportant project, just to keep myself sane while working in PHP and Drupal [1].

About a month ago I decided to save the data from my email indexer program [2] as a Lua program, something like:

```
emails =
{
  filelist =
  {
    {
      file = "/home/spc/Mail/sent",
      size = 902273,
      time = "Tue, 10 Nov 2009 09:35:10 GMT",
    },
    {
      file = "/home/spc/LINUS/Archive.mail/20060607/cctalk",
      size = 230140,
      time = "Tue, 02 May 2006 18:22:28 GMT",
    },
    -- and so on ...
  },
  mbox =
  {
    {
      info = { mboxfile = 1, oh={45, 322}, ob={368, 15}},
      ['Message-ID'] = "<20081021051331.GA30804@lucy.localdomain>",
      ['From'] = { "Sean Conner <sean@conman.org>",},
      ['To'] = { "sean@conman.org",},
      ['Subject'] = "This is a test",
      ['Date'] = "Tue, 21 Oct 2008 01:13:31 -0400",
      ['MIME-Version'] = "1.0",
      ['Content-Type'] =
      {
        "text/plain",
        "charset=us-ascii",
      },
      mimeheaders =
      {
        ['Content-Disposition'] = "inline",
      },
      ['Lines'] = 1,
      extraheaders =
      {
        ['User-Agent'] = "Mutt/1.4.1i",
        ['Status'] = "RO",
      },
    },
    -- and so on ...
  }
}
```

That way, I could load it into the Lua interpreter and work with the data in Lua, instead of writing a bunch of C code. I checked the output to make sure it was valid Lua, and everything was fine.
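As a minimal sketch of what working with the data in Lua might look like: with a real data file (say a hypothetical `emails.lua`) you would just call `dofile("emails.lua")`. To keep the example self-contained, a trimmed copy of the data is inlined as a string instead. It also assumes that `info.mboxfile` is an index into `filelist`, which the format suggests but the post doesn't state.

```lua
-- Minimal sketch: run data written as Lua code, then query the result.
-- The inlined text stands in for a hypothetical file such as "emails.lua";
-- with a real file, dofile("emails.lua") does the same thing.
local data = [[
emails =
{
  filelist =
  {
    { file = "/home/spc/Mail/sent", size = 902273, time = "Tue, 10 Nov 2009 09:35:10 GMT" },
  },
  mbox =
  {
    {
      info = { mboxfile = 1, oh = {45, 322}, ob = {368, 15} },
      ['Message-ID'] = "<20081021051331.GA30804@lucy.localdomain>",
      ['Subject']    = "This is a test",
    },
  },
}
]]

-- loadstring() in Lua 5.1; load() accepts a string in Lua 5.2 and later.
local load_string = loadstring or load

-- Compiling the text yields a function; calling it creates the global 'emails'.
local chunk = assert(load_string(data, "=emails"))
chunk()

-- Walk the messages and look up the mailbox file each one came from
-- (ASSUMPTION: info.mboxfile is an index into emails.filelist).
for _, msg in ipairs(emails.mbox) do
  local mbox = emails.filelist[msg.info.mboxfile]
  print(msg['Subject'], mbox.file)
end
```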

That is, until I threw 80,919 messages from 2,360 email files I had lying around (going back to 1991) at it. Then all I got from Lua was:

```
lua: constant table overflow
```

Hmmm … okay, maybe throwing an 80MB (megabyte) file at the Lua interpreter wasn't such a good idea.

But then tonight I decided to give it one more try. The Lua source code didn't reveal any obvious settings to tweak, so I did a bit of searching. And yes, I'm not the only one with that problem [3]. Reading further, I learned that while there isn't a limit on the size of a Lua table, there is a limit on the number of constants in a single Lua function [4].

But the code isn't in a Lua function.

Or is it?

It is. When you load Lua code from an external source, it gets compiled into an anonymous function that then has to be called. So the solution is to break the initialization into several functions. From some experimenting, I found that things would work (with this particular data set) if I initialized no more than 16,384 items per function.
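Here is a minimal sketch of that workaround, written in Lua for illustration. It is not the post's generator, which isn't shown, so `generate`, `init`, and `per_function` are hypothetical names. Each group of entries is placed in its own function, defined inside a `do ... end` block and called immediately. Because Lua compiles every function with its own constant table, no single function has to hold all the constants. Entries are assumed to already be Lua table constructors in text form.

```lua
-- Sketch: emit a Lua data file whose initialization is split across several
-- functions, so no single function has to hold every constant.
-- 'entries' (hypothetical) is a list of strings, each a Lua table constructor.
-- 'per_function' (hypothetical) is how many entries go in each function.
local function generate(entries, per_function)
  local out = { "emails = { mbox = {} }", "local mbox = emails.mbox" }
  for first = 1, #entries, per_function do
    local last = math.min(first + per_function - 1, #entries)
    -- Each do ... end block defines and immediately calls its own function;
    -- the local 'init' goes out of scope at 'end', so the main chunk does
    -- not accumulate locals no matter how many groups there are.
    out[#out + 1] = "do"
    out[#out + 1] = "  local function init()"
    for i = first, last do
      out[#out + 1] = string.format("    mbox[%d] = %s", i, entries[i])
    end
    out[#out + 1] = "  end"
    out[#out + 1] = "  init()"
    out[#out + 1] = "end"
  end
  return table.concat(out, "\n") .. "\n"
end

-- Usage: build some sample entries, generate the source, then load it.
local entries = {}
for i = 1, 10 do
  entries[i] = string.format("{ subject = %q, lines = %d }", "Message " .. i, i)
end

local source = generate(entries, 4)        -- three functions: 4 + 4 + 2 entries
local load_string = loadstring or load     -- Lua 5.1 vs. Lua 5.2 and later
local chunk = assert(load_string(source, "=generated"))
chunk()                                     -- the loaded chunk is itself a function
print(#emails.mbox, emails.mbox[10].subject)
```

The post settled on 16,384 entries per function by experiment with its particular data, so treat `per_function` as something to tune rather than a fixed Lua limit.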

But there's a difference between “it worked” and “this is a usable solution.”

Generating the Lua code? 30 seconds.

Loading the Lua code into the interpreter? Six minutes and an overheated CPU (Central Processing Unit).

Interesting …

Update Tuesday, November 10th, 2009

I managed to hit the worst-case run time [5] with the code. Change the order of things, and it runs in about 15 seconds. Go figure …

Update Wednesday, February 3rd, 2010

It was a bug in Lua that has since been fixed [6].

[1] /boston/2009/09/15.1

[2] /boston/2009/06/01.1

[3] http://lua-users.org/lists/lua-l/2008-02/msg00255.html

[4] http://lua-users.org/lists/lua-l/2007-06/msg00231.html

[5] /boston/2009/11/10.1

[6] /boston/2010/02/03.1
