Managing TLS connections using Lua and Lua coroutines

Getting libtls [1] working with Lua [2] wasn't as straightforward and I thought it would be. It works (for the most part) but I had to change my entire approach. The code is an ugly mess and there's quite a bit of duplication in several spots.

But! I can request web pages, in Lua [3], via HTTPS (HyperText Transfer Protocol-Secure) in an event loop based around select() (or poll() or epoll() or whatever is the low level event notification scheme used). Woot! And I'm going into excruciating detail on this.

Back on Friday, when I wrote some “proof-of-concept [4]” code, I had thought I could switch coroutines [5] in the user-supplied I/O (Input/Output) callback routines [6] (and if coroutines existed in C [7], that is where you would yield to another coroutine). It was easy enough to extend the callback to a Lua routine— in the routine that wraps the libtls function tls_connect_cbs():

static int Ltls_connect_cbs(lua_State *L)
{
  struct tls **tls = lua_touserdata(L,1);
  int rc           = tls_connect_cbs(
			*tls,
			Xtls_read,
			Xtls_write,
			L,
			luaL_checkstring(L,2)
		     );
  
  if (rc != 0)
  {
    lua_pushboolean(L,false);
    return 1;
  }
  
  lua_settop(L,5);
  lua_pushlightuserdata(L,*tls);
  lua_getuservalue(L,1);
  lua_pushvalue(L,1);
  lua_setfield(L,-2,"_ctx");
  lua_pushvalue(L,2);
  lua_setfield(L,-2,"_servername");
  lua_pushvalue(L,3);
  lua_setfield(L,-2,"_userdata");
  lua_pushvalue(L,4);
  lua_setfield(L,-2,"_readf");
  lua_pushvalue(L,5);
  lua_setfield(L,-2,"_writef");
  
  lua_settable(L,LUA_REGISTRYINDEX);
  lua_pushboolean(L,true);
  return 1;
}

I pass in the two callback functions, and I'm using the Lua state context as the userdata in the callbacks. I then create a Lua table, populate it with some useful information, such as the Lua functions to call, and associate it in the Lua registry with the value of the libtls context. Then, when libtls calls one of the callbacks:

static ssize_t Xtls_write(struct tls *tls,void const *buf,size_t buflen,void *cb_arg)
{
  lua_State *L = cb_arg;
  ssize_t    len;
  
  lua_pushlightuserdata(L,tls);
  lua_gettable(L,LUA_REGISTRYINDEX);
  lua_getfield(L,-1,"_writef");
  lua_getfield(L,-2,"_ctx");
  lua_pushlstring(L,buf,buflen);
  lua_getfield(L,-4,"_userdata");
  lua_call(L,3,1);
  
  len = lua_tonumber(L,-1);
  lua_pop(L,2);
  return len;
}

I get the Lua state via the user argument. From that, and the libtls context, I obtain the data I cached into the Lua table, which give me the Lua function to call. Said function can then call coroutine.yield().

Straightforward, easy, and wrong! I got the dreaded “attempt to yield across metamethod/C-call boundary” error. Darn.

The attempted flow looks like (yellow boxes are Lua functions; green boxes are C functions):

{data=tls.read()} → [Ltls_read(lua)] → [tls_read(ctx)] → [Xtls_read(ctx,lua)] → [lua_call(lua)] → {my_callback()} → {coroutine.yield()} {}=Lua function []=C function [8]

There are four layers of C functions that can't be yielded through. Lua does have a way of dealing with intervening C functions, but it's somewhat clunky.

{luaf_a()} → [cf_orig(lua)] → [lua_callk(lua,cf_c)] → {luaf_b()} → {coroutine.yield} / {coroutine.resume} → {luaf_b()*} → [cf_c(lua)*] → {luaf_a()} [9]

In this case, the Lua function lua_callk() [10] is handled specially [11] so it doesn't cause an error. The function cf() needs to be split in half—the portion prior to calling into Lua, and the second half to handle things after a potential call to coroutine.yield(). That's represented above by the functions cf_orig() and cf_c(). The “*” represent the functions returning, not calling. coroutine.resume() will restart luaf_b() right after it's call to coroutine.yield(). And when luaf_b() returns, it “returns” to cf_c(), which does whatever and finally returns, which “returns” to luaf_a().

But in the case I'm dealing with just doesn't work with that model. The code calling into Lua doesn't have the signature:

int function(lua_State *lua_State);

but the signature:

ssize_t function(struct tls *ctx,void *buf,size_t buflen,void *cb_arg);

Not only are the return types different, but they have completely different semantics—for libtls, it's the number of bytes transferred, whereas for Lua, it's the number of items being returned to Lua.

No, I had to rethink the entire approach, and do the call to coroutine.yield() a bit higher in the call stack. Which also meant I had to push dealing with TLS_WANT_POLLIN and TLS_WANT_POLLOUT back to the caller. The documentation states:

> * TLS_WANT_POLLIN The underlying read file descriptor needs to be readable in order to continue.
* TLS_WANT_POLLOUT The underlying write file descriptor needs to be writeable in order to continue.
In the case of blocking file descriptors, the same function call should be repeated immediately. In the case of non-blocking file descriptors, the same function call should be repeated when the required condition has been met.

And here I was, trying to hide such concerns from the user. Ah well.

I eventually got it working, but man, is it ugly. The Lua code wants to read data, so I have to call into libtls. That in turn, calls back into my code, and if I don't have any, I need to return TLS_WANT_POLLIN, which bubbles up through libtls back to my code, which can then yield.

Meanwhile, from the other end, I get data from the network. I can't just feed it into libtls, I have to feed it when libtls calls the callback for the data. But when I get the data, I may need to resume the coroutine, so I have to track that information as well.

I can almost understand the code (and yes, I wrote it; did I mention it's ugly?)

But I'm happy. The following code works in my existing network framework (boy does that sound wierd):

local function request(item)
  syslog('debug',"requesting %s",item.url)
  local u = url:match(item.url)

  -- -------------------------------------------------------
  -- asynchronous DNS lookup---blocks the current coroutine
  -- until a result is returned via the network.
  -- -------------------------------------------------------

  local addr = dns.address(u.host,'ip','tcp',u.port)
  
  if not addr then
    syslog('error',"finished %s---could not look up address",u.host)
    return
  end

  -- ---------------------------------------------------------
  -- This has nothing to do with the iPhone operating system,
  -- but everything to do with "Input/Output Stream"
  -- ---------------------------------------------------------

  local ios
  
  if u.scheme == 'http' then
    ios = tcp.connecta(addr[1]) -- connect via TCP
  else
    ios = tls.connecta(addr[1],u.host) -- connect via TLS
  end
  
  if not ios then
    syslog('error',"could not connect to %s",u.host)
    return
  end
  
  local path    = table.concat(u.path,'/')
  local fhname  = "header/" .. item.hdr
  local fbname  = "body/"   .. item.body
  local fh      = io.open(fhname,"w")
  local fb      = io.open(fbname,"w")
  
  local command = string.format([[
GET /%s HTTP/1.0
Host: %s
Connection: close
User-Agent: TLSTest/2.0 (Lua TLS Testing Program)
Accept: */*

]],path,u.host)

  ios:write(command)
  
  fh:write(ios:read("*h"))
  
  repeat
    local data = ios:read("*a")
    fb:write(data)
  until data == ""
  
  fb:close()
  fh:close()
  ios:close()

  syslog('debug',"finished %s %s",item.url,tostring(addr[1]))
end

Any number of requests can be started and they all run concurrently, which is just what I wanted.

Now, the code I have for the Lua wrapper for libtls covers just what I need to do this. More work is required to finish covering the rest of the API (Application Programming Interface). I also have to clean up the Lua code that backs the above sample code so that I might have a chance of understanding it at some point in the future.

And until I get the working code published, you can look at the “proof-of-concept” Lua coroutine code [12] I worked from (and no, the above code sample will not work as is with this “proof-of-concept” code).

[1] https://man.openbsd.org/tls_init.3

[2] /boston/2018/07/19.1

[3] http://www.lua.org/

[4] https://github.com/spc476/libtls-examples

[5] https://en.wikipedia.org/wiki/Coroutine

[6] https://github.com/spc476/libtls-examples/blob/c9df11bbe1f57c1a5d0efe0896ef3fb131d44ad9/get3.c#L114

[7] /boston/2017/02/27.1

[8] /boston/2018/07/23/callback.gif

[9] /boston/2018/07/23/coroutine.gif

[10] https://www.lua.org/manual/5.3/manual.html#lua_callk

[11] https://www.lua.org/manual/5.3/manual.html#4.7

[12] https://github.com/spc476/libtls-examples/blob/fca85f268c8a2d7e9a913d537d4acb067df5b6c8/Lua/get4.lua

Gemini Mention this post

Contact the author