💾 Archived View for ftrv.se › 1 captured on 2022-06-11 at 20:51:11. Gemini links have been rewritten to link to archived content

Serialization format for your new webscale microservice architecture in the cloud.

Wanna look like you think different? Why use all these XMLs and JSONs yet again? That's too mainstream, let's invent our own, like no one else does!

Escaping and human-readability

You know what sucks about these language-independent data formats? They suck. They are not really human-readable, they are hard to serialize data with, and it's not that easy to write correct parsers for them, either. Also, too mainstream.

Let's assume there are two computers -- `A` and `B`. They only know what types of data they are exchanging and what data to expect from each other. Humans debugging the software running on these two computers need to be able to read what's being transferred back and forth, without having to use any tools (for pretty-printing or parsing).

(De)serialization on the software side should be as simple and as fast as possible. It should be straightforward to write a parser/serializer in `awk`, to filter lines with `grep`, and to replace values with `sed`.

The solution

<data>  ::= <line> | <line> <data>
<line>  ::= <key> [\t] <value> [\n]
<key>   ::= [^\t\n]+
<value> ::= [^\n]*

Only UTF-8 is allowed.

And that's it! As simple as possible. No escaping bullshit. One line, one key, one value! Highly readable!
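To show how little there is to parse, here is a minimal sketch in C of reading one line. The name `split_line` is made up for this example, and it assumes the line already sits in a writable buffer; it is not the parser linked further down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the format itself.
 * Splits one line in place at the FIRST tab: the key is everything before
 * it, the value is everything after it (later tabs stay in the value).
 * A trailing '\n' is cut off. Returns 0 on success, -1 if the line has no
 * tab or the key is empty. */
int split_line(char *line, char **key, char **value)
{
    assert(line != NULL && key != NULL && value != NULL);

    line[strcspn(line, "\n")] = '\0';   /* drop the line terminator */

    char *tab = strchr(line, '\t');
    if (tab == NULL || tab == line)
        return -1;

    *tab = '\0';        /* terminate the key */
    *key = line;
    *value = tab + 1;   /* may be an empty string */
    return 0;
}
```

Given the line `what\t<I'm\tbored>\n`, the key comes out as `what` and the value as `<I'm\tbored>`, second tab included. No unescaping step, because there is nothing to unescape.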

If you really want newlines in your value, use lists instead.

You can have lists like `[1, 2, 3]` or `["<I'm\tbored>", "\"Привет!\"", "No."]`. Just repeat the key, one line per element, and collect the values when parsing:

numbers	1
numbers	2
numbers	3

what	<I'm	bored>
what	"Привет!"
what	No.
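Since the list lives in the parsing layer and not in the format, collecting one is just a loop over the lines. A rough sketch in C, assuming the whole message is in one writable, NUL-terminated buffer; `collect_list` is a hypothetical name, and lines without a tab are silently skipped:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: gathers the values of every line whose key equals
 * `key`, in order of appearance. `data` is modified in place ('\t' and '\n'
 * are overwritten with '\0'); `out` receives pointers into it.
 * Returns the number of values stored (at most `max`). */
size_t collect_list(char *data, const char *key, char **out, size_t max)
{
    assert(data != NULL && key != NULL && out != NULL);

    size_t n = 0;
    char *line = data;
    while (*line != '\0' && n < max) {
        char *end = line + strcspn(line, "\n");
        char *next = (*end == '\n') ? end + 1 : end;
        *end = '\0';                    /* terminate this line */

        char *tab = strchr(line, '\t'); /* first tab ends the key */
        if (tab != NULL) {
            *tab = '\0';
            if (strcmp(line, key) == 0)
                out[n++] = tab + 1;
        }
        line = next;
    }
    return n;
}
```

Calling it with the key `numbers` on the first example above stores three pointers, to `1`, `2` and `3`. Converting them to integers is the caller's business, since both computers already know what types to expect.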

I WANT NESTED OBJECTS!11

Just as with lists, instead of forcing "objects" on our format itself, let's put them on a different layer.

<key>              ::= [^\t\n]+

<nested-key>       ::= <nested-key-level> | <nested-key-level> [ ] <nested-key>
<nested-key-level> ::= [^ \t\n]+

So a nested key is a key in which the space character separates the levels of an object. Let's see what a list of objects could look like:

human	
human age	45
human name	John Snow
human	
human name	William Budd
human age	68
human	
human age	57
human name	Yoseph Thomas Clover

This data forms a list of objects:

[ Human{name: "John Snow", age: 45},
  Human{name: "William Budd", age: 68},
  Human{name: "Yoseph Thomas Clover", age: 57}
] :: [Human]

Each object starts with a `human\t` line here; it's like having `key="human"` and `value=""`.
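Turning those lines into structures could look roughly like the sketch below. The `struct human` layout and the name `parse_humans` are assumptions made for this example; it compares whole keys with `strcmp`, does no validation of the age, and ignores keys it does not know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct human {
    const char *name;   /* points into the parsed buffer */
    long age;
};

/* Hypothetical sketch: fills `out` with up to `max` humans from `data`.
 * A line with key "human" starts a new object; "human name" and
 * "human age" set fields of the most recent one. Unknown keys are ignored.
 * `data` is modified in place. Returns the number of objects stored. */
size_t parse_humans(char *data, struct human *out, size_t max)
{
    assert(data != NULL && out != NULL);

    size_t n = 0;
    char *line = data;
    while (*line != '\0') {
        char *end = line + strcspn(line, "\n");
        char *next = (*end == '\n') ? end + 1 : end;
        *end = '\0';

        char *tab = strchr(line, '\t');
        if (tab != NULL) {
            *tab = '\0';
            const char *value = tab + 1;

            if (strcmp(line, "human") == 0) {
                if (n == max)
                    break;              /* no room for another object */
                out[n].name = "";
                out[n].age = 0;
                n++;
            } else if (n > 0 && strcmp(line, "human name") == 0) {
                out[n - 1].name = value;
            } else if (n > 0 && strcmp(line, "human age") == 0) {
                out[n - 1].age = strtol(value, NULL, 10);
            }
        }
        line = next;
    }
    return n;
}
```

The order of `human name` and `human age` inside an object does not matter, which is why the example data above can mix them freely.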

But that format is not for "BigData®"

It's not. But you don't have "big data" either. If you are that concerned about the number of bytes being transferred, use compression.

How is it simple and fast?

(Simplified) parser in C.

If you care about the speed of comparing the key you've got against the ones you defined (to deserialize into a structure field), don't worry. `strcmp` is really fast.

(Simplified) serialization in C.
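For a feel of the writing side, here is a minimal sketch that formats one object into a caller-supplied buffer. It is not the serializer linked above; `struct human` and `format_human` are hypothetical names, and the only check it does is the one the format forces on you: a value must not contain a newline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct human {
    const char *name;
    long age;
};

/* Hypothetical sketch: writes one object as three lines into `buf`.
 * Returns the number of bytes written (without the terminating NUL),
 * or -1 if the name contains a newline or the buffer is too small. */
int format_human(char *buf, size_t size, const struct human *h)
{
    assert(buf != NULL && h != NULL && h->name != NULL);

    /* There is no escaping, so a value with a newline cannot be written. */
    if (strchr(h->name, '\n') != NULL)
        return -1;

    int len = snprintf(buf, size,
                       "human\t\nhuman name\t%s\nhuman age\t%ld\n",
                       h->name, h->age);
    if (len < 0 || (size_t)len >= size)
        return -1;      /* encoding error or truncated output */
    return len;
}
```

Serialization is one format string per object type. A name with a newline in it is rejected here; per the rule above, such a value would have to be sent as a list of lines instead.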

Conclusion

Wow.