Serializing Lua data, part III

Continuing the series on Lua serialization [1], we resume with the Lua [2] userdata type. At this point, all I'm doing is conjecture, since I haven't actually bothered with serializing userdata and this post is about why I didn't bother.

The Lua manual [3] has this to say about userdata:

The type userdata is provided to allow arbitrary C data to be stored in Lua variables. A userdata value represents a block of raw memory. There are two kinds of userdata: full userdata, which is an object with a block of memory managed by Lua, and light userdata, which is simply a C pointer value. Userdata has no predefined operations in Lua, except assignment and identity test. By using metatables, the programmer can define operations for full userdata values (see §2.4 [4]). Userdata values cannot be created or modified in Lua, only through the C API (Application Programming Interface). This guarantees the integrity of data owned by the host program.

Already we're in trouble with respect to serialization. The light userdata is just a pointer, no more, no less. There is no indication of just how much memory it points to, much less what it points to or what the memory even represents. That's not to say that a light userdata has no meaning, but such meaning is only applicable to any code written in C (or C++) and not to Lua.

The serialization code gets a light userdata and … meh? There is nothing we can safely serialize about a light userdata. One answer might be to serialize the pointer value, but a pointer value only has meaning within an instance of a program. Run the same program again, and the value of the light userdata might not be the same (nor should you assume it'll be the same). Give that pointer value to another program, and who knows what it points to in that other program.

We could try reading the memory the light userdata points to, but as I said, there is no indication of how much memory we can read, if any [5]. In general, serialization of light userdata is out of the question (I'm not saying it can't be done, I'm just saying I'm not writing a general purpose serializer that will attempt it).

A full userdata, while it still has no meaning for Lua, does have a length associated with it, so it is conceivable we could serialize the actual memory associated with a full userdata but again, we run into trouble. For instance, let's try to serialize the only standard Lua userdata—the result of opening up a file.

The userdata associated with a file in Lua is the following structure:

>
```
typedef struct luaL_Stream {
FILE *f;
lua_CFunction closef;
} luaL_Stream;
```

The first field is a pointer to a FILE, an opaque type (even to C programmers) that represents an “open file,” and sending the actual FILE value across is meaningless; one, because the contents differ from system (say, Microsoft Windows) to system (Linux), and two, it supposes that the “file” being opened exists when this is deserialized and used.

Now, given that CBOR (Concise Binary Object Representation) allows additional semantics to be attatched to data, we could include the contents of the “open file” as part of the serialization and semantically tag the data as a “file, please create and open this.” We could, but there are a few issues even with this—what if the “open file” isn't a file at all? What if it's the output from a command? Or a network connection? How you do serialize a connected TCP connection? But say that we do have, in fact, a file, one that happens to be two gigabytes in size? Would you really want to serialize a two gigabyte file?

So solve those issues, there's still the issue of how the file was opened. Was it opened “read only?” We need to send a copy. “Write only?” No need to send anything. But the issue here is—there is no standard way to determine if a file was opened “read-only” or for “write-only” or even “read/write but writes are appended.” There are non-standard ways of doing it, but it varies from system to system.

Now, about that second field. It's a pointer to a C function, and as I stated before, serializing a Lua function in C is not possible [6]. I worked around it by tagging the name of the function, but there's no name for this function, at least, one known to Lua. And why point to a hidden function to close the file? Because for files, you need to call the C function fclose(), but for a “file” opened by popen() (a function to run another process and either write data to it, or read data from it) you need to call the C function pclose(). That's why the function pointer.

And that's for a well known, standard Lua userdata. There are other types of userdata, depending upon the Lua modules in use, such as LPeg [7] (where the full userdata for that represents an entirely different virtual machine geared for parsing text) or even my own org.conman.pollset [8] (which is usually used to multiplex I/O with network connections, but can be used for other devices such as serial ports, and whose implementation depends on the operating system, as if things weren't difficult enough).

This also depends upon the module handing a given full userdata already loaded when deserializing. I'm not saying this is all impossible, but I do think it's beyond the scope of a general library to handle. Perhaps having the user register some type of callback to handle userdata might be required. I already support callbacks for tagged CBOR data, so adding support for full userdata using callbacks shouldn't be that hard.

All that's left—threads.

[1] /boston/2015/03/31.1

[2] http://www.lua.org/

[3] http://www.lua.org/manual/5.3/manual.html

[4] http://www.lua.org/manual/5.3/manual.html#2.4

[5] https://www.google.com/search?q=malloc(0)

[6] /boston/2015/03/31.1

[7] http://www.inf.puc-rio.br/~roberto/lpeg/

[8] https://github.com/spc476/lua-conmanorg/blob/master/src/pollset.c

Gemini Mention this post

Contact the author