Trantor's new TLS backend, CSPRNG and improvements


I've spent a few months rewriting Trantor's TLS infrastructure. The old one was based purely on OpenSSL and messy. It was hacked together and not very well documented. Over time we added more and more features to it, and it got harder and harder to even understand what's going on. This and my need for ALPN pushed me to rewrite the entire damn thing.

I knew from the get-go that it was going to be a huge undertaking. But I didn't realize how complicated, and yet surprisingly straightforward, OpenSSL made things. Here I'll talk about the new features added as a result of this rewrite, and the bugs that I found and fixed.

TLS backend

I want a plan B in case OpenSSL gets a critical flaw again. The new backend architecture is designed to support multiple TLS libraries. It took a few tries to get the backend architecture right. Turns out, even as a maintainer, there are features that I didn't know Trantor supported.

I chose Botan as that plan B because it's really easy to use, has no symbol collisions with OpenSSL and is popular enough to be in most distros. I first encountered Botan when I needed an easy to use password hash, but then grew to like its strict and correct-by-default design. In the end there are several invisible changes. First, instead of letting OpenSSL write directly to file descriptors, Trantor now uses memory BIOs (and the equivalent on Botan) for IO abstraction. Second, from outside of the library, whether TLS is available is now mostly run-time information, as checking if OpenSSL is used is no longer sufficient. And I don't want to write an unholy mess of ifdefs.
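One way to picture the IO abstraction: the TLS backend never touches a socket. It is fed bytes that the event loop has read, and hands back bytes to send through a callback. The following is only a minimal sketch of that shape. `TlsProvider`, `PassthroughProvider`, `makeProvider` and every member name are hypothetical and not Trantor's actual classes, and the stand-in backend does no cryptography at all.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: names are illustrative, not Trantor's real API.
class TlsProvider
{
  public:
    using Sink = std::function<void(const std::string &)>;
    virtual ~TlsProvider() = default;

    // Bytes the event loop read from the socket are fed in here.
    virtual void recvFromWire(const std::string &bytes) = 0;
    // Application data that should be protected and sent.
    virtual void sendData(const std::string &plain) = 0;
    // Reported at run time, so callers need no ifdefs.
    virtual std::string backendName() const = 0;

    void setWireSink(Sink s) { toWire_ = std::move(s); }
    void setAppSink(Sink s) { toApp_ = std::move(s); }

  protected:
    Sink toWire_;  // called with bytes that must go out on the socket
    Sink toApp_;   // called with data ready for the application
};

// Stand-in backend that passes bytes through unchanged. A real backend
// would run them through OpenSSL memory BIOs or the Botan equivalent.
class PassthroughProvider : public TlsProvider
{
  public:
    void recvFromWire(const std::string &bytes) override
    {
        if (toApp_)
            toApp_(bytes);
    }
    void sendData(const std::string &plain) override
    {
        if (toWire_)
            toWire_(plain);
    }
    std::string backendName() const override { return "None"; }
};

// A real library would pick the backend at build time; only one exists here.
inline std::unique_ptr<TlsProvider> makeProvider()
{
    return std::make_unique<PassthroughProvider>();
}
```

Since the connection code only sees the interface, swapping OpenSSL for Botan would not change how data moves between the socket and the application.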

TLS Policy

The old interface controlling properties of TLS connections is ugly and very coarse. Cramming SNI and CA options into it was already awkward. Security-options wise, it can only choose whether to accept old protocols (<= TLS 1.1) and whether the certificate shall be verified. The API does not allow just checking that the domain and expiry are valid, which would be very useful in protocols like Gemini that use TOFU instead of CAs. The new TLSPolicy API makes it easy.

What it used to look like to make a client for Gemini:

client.enableSSL(/* useOldTLS= */ false,
    /* validateCert= */ false,
    /* hostname= */ "the.hostname.com",
    /* confCmds= */ {},
    /* keyPath= */ "/path/to/client_cert.key",
    /* certPath= */ "/path/to/client_cert.cert");

Without the comments it isn't obvious what does what. The new API is self-documenting and can be shared across TLS clients if there's a need.

auto policy = TLSPolicy::defaultClientPolicy();
policy->setAllowBrokenChain(true) // allow self signed
   .setHostname("the.hostname.com")
   .setCertPath("/path/to/client_cert.cert")
   .setKeyPath("/path/to/client_cert.key");
client.enableSSL(std::move(policy));
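To show why a policy object handles the TOFU case better than a pile of boolean arguments, here is a small self-contained sketch. `SketchPolicy`, its fields and `makeTofuPolicy` are hypothetical names made up for illustration; they are not the real TLSPolicy definition. The idea is that domain, expiry and chain checks become separate switches, and each setter returns a reference so calls can be chained.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified policy object. Field and setter names are
// assumptions for illustration, not the real TLSPolicy definition.
class SketchPolicy
{
  public:
    static std::shared_ptr<SketchPolicy> defaultClientPolicy()
    {
        return std::make_shared<SketchPolicy>();
    }

    // Each setter returns *this so calls can be chained.
    SketchPolicy &setHostname(std::string h)
    {
        hostname_ = std::move(h);
        return *this;
    }
    SketchPolicy &setValidateDomain(bool v) { validateDomain_ = v; return *this; }
    SketchPolicy &setValidateDate(bool v) { validateDate_ = v; return *this; }
    SketchPolicy &setValidateChain(bool v) { validateChain_ = v; return *this; }

    const std::string &hostname() const { return hostname_; }
    bool validateDomain() const { return validateDomain_; }
    bool validateDate() const { return validateDate_; }
    bool validateChain() const { return validateChain_; }

  private:
    std::string hostname_;
    bool validateDomain_ = true;  // strict by default
    bool validateDate_ = true;
    bool validateChain_ = true;
};

// TOFU-style client: still check the domain and expiry, but don't require
// the certificate to chain up to a trusted CA.
inline std::shared_ptr<SketchPolicy> makeTofuPolicy(const std::string &host)
{
    auto policy = SketchPolicy::defaultClientPolicy();
    policy->setHostname(host).setValidateChain(false);
    return policy;
}
```

With everything defaulting to strict, a caller only spells out the checks it wants to relax.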

Fixing bugs that we didn't care about

Like I said above, Trantor's TLS support was hacked into the library. One of the ways this manifested is that closing a connection via `conn->shutdown()` simply closed the TCP connection. It did not do any TLS-specific closing action. By RFC, the party actively closing the connection is required to send a close notification and then close the TCP stream. I assume this is to prevent attackers from abusing the now orphaned connection from a router upstream. This wasn't a problem for us because the main user of Trantor is Drogon, and Drogon only closes the TCP connection on timeout; it's almost always the peer closing, not us. Funny enough, I found this when I browsed Lupa and saw its statistics on capsules not sending TLS close, along with the capsule list. (My Gemini server also runs on Trantor.)

Lupa - Statistics on the Gemini space

Furthermore, the old code didn't handle TLS notifications at all. It just ignored them and closed the connection on an `SSL_write` error, which OpenSSL helpfully raises. Now it actually handles close and notification messages. Not much, but it should lead to less code running upon error.

The other bug is more serious, but maybe harmless enough as no one complained. Trantor uses a send queue to track unsent data in case the async socket is full. But the `conn->shutdown()` API didn't care about that. It always shut down the socket immediately, also shutting down TLS if that's running. Somehow OpenSSL doesn't care enough to report meaningful errors. But Botan does care and throws. Boom! This is why early failure is important. I'm glad I found this before someone complained and I had to debug it without any error message guiding me.
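The general shape of the fix is to defer the shutdown until the send queue has drained, and only then send the TLS close notification followed by the TCP shutdown. The sketch below models that ordering with a toy connection. `SketchConnection` and its members are hypothetical and heavily simplified; it is not Trantor's actual connection class, and the `wire` log merely stands in for the socket.

```cpp
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy connection. `wire` records, in order, what would have
// reached the socket.
class SketchConnection
{
  public:
    std::vector<std::string> wire;

    void send(std::string data)
    {
        if (closing_)
            return;  // no new data once shutdown() was requested
        sendQueue_.push_back(std::move(data));
    }

    // Called by the event loop whenever the socket can take more data.
    void onWritable()
    {
        if (!sendQueue_.empty())
        {
            wire.push_back(sendQueue_.front());
            sendQueue_.pop_front();
        }
        if (closing_ && sendQueue_.empty())
            finishShutdown();
    }

    void shutdown()
    {
        closing_ = true;
        if (sendQueue_.empty())
            finishShutdown();
        // Otherwise onWritable() finishes the job after the last chunk.
    }

    bool closed() const { return closed_; }

  private:
    void finishShutdown()
    {
        if (closed_)
            return;
        wire.push_back("<close_notify>");  // TLS-level goodbye first
        wire.push_back("<tcp shutdown>");  // then close the TCP stream
        closed_ = true;
    }

    std::deque<std::string> sendQueue_;
    bool closing_ = false;
    bool closed_ = false;
};
```

Calling `shutdown()` with two chunks still queued leaves the connection open until two `onWritable()` calls have flushed them; only then do the close notification and TCP shutdown go out.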

No more weak ciphers

We should have done this ages ago. By default OpenSSL enables all cipher suites allowed by TLS, including the weak ones. That includes 3DES, MD5, etc. Yikes. I understand that's for compatibility. Just... no, no, no, nope! I will not allow that. From now on we force at least medium strength in the OpenSSL backend. And Botan straight up only has strong ciphers.

Weak ciphers can die in hell!

ALPN and Server side SNI

This is mostly to fit personal needs. I want ALPN so I can implement HTTP/2 for Drogon. OpenSSL's ALPN API is... ugly. It gives you a byte array and you have to parse it yourself. It's easy to get wrong and the lifetime is a mess. I'm glad Botan has a much better API.

| size | ...... data ...... | size | ...... data ...... |
|  2   |       h2           | 8    |  http/1.1          |
        ^
        To select "h2" I have to return a pointer to the first byte of "h2"
        and set the length to 2. Otherwise I leak memory or free before end.
        Dude, what the hell?
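To make the layout above concrete, here is a minimal sketch of walking such a length-prefixed list and returning a pointer and length into the same buffer, which is the shape of result a select callback has to produce. `ProtoView` and `findProtocol` are hypothetical helper names, not part of OpenSSL or Trantor. The returned view owns nothing, so it is only valid while the input buffer is alive.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// View into the caller's buffer. It owns nothing.
struct ProtoView
{
    const unsigned char *data;
    std::size_t len;
};

// Walk a list of <1-byte length><bytes> entries and return a view of the
// first entry equal to `wanted`, or {nullptr, 0} if there is none.
inline ProtoView findProtocol(const unsigned char *list,
                              std::size_t listLen,
                              const std::string &wanted)
{
    std::size_t pos = 0;
    while (pos < listLen)
    {
        std::size_t n = list[pos++];  // one length byte
        if (n == 0 || n > listLen - pos)
            break;  // malformed list: stop instead of reading out of bounds
        if (n == wanted.size() &&
            std::memcmp(list + pos, wanted.data(), n) == 0)
            return ProtoView{list + pos, n};  // points at the first byte
        pos += n;
    }
    return ProtoView{nullptr, 0};
}
```

For the list in the diagram, asking for "h2" yields a pointer one byte into the buffer with length 2, exactly the pair described above.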

I am also thinking of using ALPN to notify Gemini servers about the intent of clients. Gemini lacks a User-Agent header to tell the server what the client is. Or any headers for that matter. ALPN allows me to sneak in some information. Currently there's no way for my search engine, TLGS, to notify Gemini servers that I'm crawling. IP addresses can change or be distributed, and client certificates may trigger unwanted behavior like account registration. As Gemini doesn't demand ALPN, I suspect most hosts won't look at it. But that's out of the scope of this post.

SNI was added just because I can. Drogon doesn't have syntax to specify which domain to serve, so it's kind of useless. But it's there.

The new CSPRNG

Trantor did not provide any cryptographic interface, nor is it intended to provide one. I only added basic hashing and RNG functions to reduce complexity for our users. No longer exclusively supporting OpenSSL would force users to detect which TLS backend is used and choose the corresponding API. That's no good. Users should be able to do anything no matter how their dependencies are compiled. Therefore the added wrapper. However, Trantor also supports building without any TLS support, for embedded systems and whatnot. Providing working hashes is easy, but a CSPRNG is complicated.

We used to (in Drogon) use `/dev/urandom`, `getentropy` or other OS-specific random APIs when no TLS backend is available. But that can lead to entropy starvation. Embedded systems and some VMs have very limited entropy, and calls to `getentropy` may block. But WebSocket, by RFC, requires a stream of secure random bytes to generate masking keys. This ended up bottlenecking the entire application. That's not to mention that the default `/dev/urandom` is slow.

After some web searching and looking through debates on the subject, I ended up on Dan Kaminsky's DEF CON talk and the design in it, combined with Theo de Raadt's talk on arc4random. View them yourselves if you're interested. I'll try to summarize the design here. Hopefully I won't strawman it.
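For context on why WebSocket leans on the RNG so much: every frame a client sends is XORed with a fresh 4-byte masking key. Below is a minimal sketch of the masking step as RFC 6455 describes it. `MaskKey` and `applyMask` are hypothetical names for this example; in real code the key would come from the secure RNG rather than being passed in as a constant.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

using MaskKey = std::array<std::uint8_t, 4>;

// XOR every payload byte with key[i % 4]. Applying the same key twice
// restores the original payload.
inline std::string applyMask(const std::string &payload, const MaskKey &key)
{
    std::string out = payload;
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        out[i] = static_cast<char>(static_cast<std::uint8_t>(out[i]) ^
                                   key[i % 4]);
    }
    return out;
}
```

A new key per frame means a client sending many small frames asks for random bytes constantly, which is where a blocking or slow entropy source starts to hurt.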

DEF CON 22 - Dan Kaminsky - Secure Random by Default

Hackfest 2014: Theo de Raadt presented "arc4random - randomization for all occasions"

This RNG is secure as guessing the internal state means breaking BLAKE2b. Even if the state is leaked, rewinding requires breaking a cryptographic hash, and fast-forwarding demands predicting the exact nanosecond of the next call and the size of the next output, which should be impossible given the usual network noise. There are features I wanted to support, but they felt over the top and unnecessary. I'll just leave it here.

The pseudo code looks like this:

struct RngState
{
    Hash256 secret;        // refreshed from the OS every 1024 calls
    Hash256 prev;          // previous output block, chained into the next hash
    int64_t time;          // cycle counter plus a random offset
    uint64_t counter = 0;  // incremented for every output block
};

void secureRandomBytes(void *out, size_t len)
{
    static RngState state;
    static uint32_t useCount = 0;  // track when to reseed
    if (useCount++ % 1024 == 0)
    {
        // Get some entropy from the OS. prev and counter keep some of the
        // old state.
        systemRandomBytes(state.secret.data(), state.secret.size());
    }

    // static so we don't use up system entropy. In all major runtimes
    // random_device is a system entropy source.
    static int64_t shiftedTime = std::random_device()();
    // Use the CPU cycle counter to force bits to be different even if all
    // else is predicted.
    state.time = rdtsc() + shiftedTime;

    for (size_t i = 0; i < len; i += sizeof(Hash256))
    {
        auto hash = blake2b(state);
        memcpy((char *)out + i, hash.data(), std::min(len - i, sizeof(Hash256)));
        state.counter++;
        state.prev = hash;  // feed the output back into the state
    }
}

A quick performance test shows this CSPRNG produces 57 MiB/s on a single ARM Cortex-A72 core at 2GHz. Quite good considering I'm using the reference implementation of BLAKE2b. This number falls way behind OpenSSL's 1306 MiB/s, but is better than Botan's `AutoSeeded_RNG` at 18.4 MiB/s. Actually, I only realized upon digging into this performance difference that my RNG is better than OpenSSL's. OpenSSL uses `/dev/urandom`, PID, UID and time (in seconds) to seed the RNG and SHA1 as the hash. I use the same system entropy source, but with a nanosecond timestamp and BLAKE2b.

Note that in the following benchmark, I didn't remove the overhead initializing the RNG. So the 1st test is always slower.

❯ lscpu | head -n 9
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          16
On-line CPU(s) list:             0-15
Vendor ID:                       ARM
Model name:                      Cortex-A72
Model:                           3
Thread(s) per core:              1

❯ ./csprng_perf_test_botan
Benchmarking secureRandomBytes()
TLS backend: Botan
--------------------------------
size: 16 KiB, time: 4 ms
size: 64 KiB, time: 3 ms
size: 16384 KiB, time: 866 ms
size: 65536 KiB, time: 3461 ms

❯ ./csprng_perf_test_openssl
Benchmarking secureRandomBytes()
TLS backend: OpenSSL
--------------------------------
size: 16 KiB, time: 3 ms
size: 64 KiB, time: 0 ms
size: 16384 KiB, time: 12 ms
size: 65536 KiB, time: 48 ms

❯ ./csprng_perf_test_internal
Benchmarking secureRandomBytes()
TLS backend: None
--------------------------------
size: 16 KiB, time: 0 ms
size: 64 KiB, time: 2 ms
size: 16384 KiB, time: 287 ms
size: 65536 KiB, time: 1150 ms
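To turn the lines above into throughput, divide the size in MiB by the time in seconds. A tiny helper for that arithmetic (`throughputMiBps` is a name made up for this example):

```cpp
// Convert one benchmark line (size in KiB, time in ms) into MiB/s.
inline double throughputMiBps(double sizeKiB, double timeMs)
{
    return (sizeKiB / 1024.0) / (timeMs / 1000.0);
}
```

For the 65536 KiB runs this gives about 18.5 MiB/s for Botan (64 / 3.461), about 1333 MiB/s for OpenSSL (64 / 0.048) and about 55.7 MiB/s for the internal RNG (64 / 1.150). The benchmark only reports whole milliseconds, so the fast results in particular are coarse.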

As a side note, BLAKE2b is the fastest hash that Trantor has built in. SHA3 is 6x slower and SHA256 is 4x slower. All of them are non-vectorized, portable implementations. Also avoid SHA3 when you can. It's secure, but damn it's slow.

=======================

That's mostly the new capabilities I added to Trantor. Now working on HTTP/2 support. Dang that thing is complicated.