Strongly typed languages give the programmer an awesome ability: certain bugs are found sooner, because the compiler raises an error when the program tries to use memory in an improper way - like treating strings as numbers, and so on. In general that's a great thing, as issues can be found before the program is even executed.
Let's take a look at the following piece of JSON:
```json
{ "hello": [ 1337, "hi", [1, 2, "oh noes"] ] }
```
How can we store such input in C++? The type of each stored value must be known at compile time, so our job is a little bit harder.
The first approach is the one that we have all seen at some point in our lives: store everything as a string.
```cpp
std::string value;
```
Storing everything as a string *kinda works*, but it's far from optimal, both in the space such structures occupy and in the need to parse them every time we want anything that is not a `std::string`. And this is just the tip of the iceberg once we start considering other potential problems (how do we determine the type of a value?).
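To make that cost a bit more concrete, here is a minimal sketch of what reading a number back could look like. The helper name `looksLikeInt` is made up for this example; it simply tries to parse the text, and every caller would have to repeat that.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: guesses whether a stored string is really a number.
// The type information was thrown away, so this parsing is redone on each access.
bool looksLikeInt(const std::string& value, int& out) {
    try {
        std::size_t used = 0;
        out = std::stoi(value, &used);
        return used == value.size();   // reject trailing junk such as "12abc"
    } catch (const std::invalid_argument&) {
        return false;                  // not a number at all
    } catch (const std::out_of_range&) {
        return false;                  // a number, but too big for int
    }
}
```

Note that even with such a helper, the string "1337" and the number 1337 from the input can no longer be told apart.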
We don't like this solution, so let's move on.
A fairly obvious idea is to create a simple class that holds all of those members, but knows which one is actually in use:
```cpp
#include <cassert>
#include <string>
#include <vector>

class MyData {
    enum class Type {NUM, STR, VEC};

public:
    MyData(int i_) : i(i_), type(Type::NUM) {}
    MyData(const std::string& s_) : s(s_), type(Type::STR) {}
    MyData(const std::vector<MyData>& v_) : v(v_), type(Type::VEC) {}

    // each getter asserts that the requested member is the one in use
    int& getInt() { assert(Type::NUM == type); return i; }
    std::string& getString() { assert(Type::STR == type); return s; }
    std::vector<MyData>& getVec() { assert(Type::VEC == type); return v; }

private:
    int i = 0;  // initialized so that copying a STR or VEC object never reads garbage
    std::string s;
    std::vector<MyData> v;
    Type type;
};
```
That's a lot of writing, and a lot of potential problems. Also, there's an overhead - every object stores three members, while only one can be used at any given time.
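A `union` looks like a `struct`, but instead of storing all of its members, it can store only one of them at a time.
```cpp
#include <string>
#include <vector>

union A {
    int i;
    std::string s;     // allowed only from C++11 onwards!
    std::vector<A> v;  // allowed only from C++11 onwards!
    // objects of this type cannot be created yet: with non-trivial members,
    // the union needs a user-written constructor and destructor
};
```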
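`union`s are very powerful constructs, but one needs to be careful while using them: all members share the same storage, the union itself does not remember which member is currently in use, and members with non-trivial constructors or destructors have to be created and destroyed by hand.
The shared storage can actually be pretty useful - imagine putting a `char` as one member, and a bit field[1] as another - modifying one would automatically "update" the other (as they are all one thing underneath) for free. Nice for low-level binary structures! Strictly speaking, standard C++ treats reading a member other than the one last written as undefined behavior, so this trick relies on compiler support (GCC documents it as an extension).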
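A small sketch of that trick could look like the one below. The names `StatusByte` and `allFlagsSet` are made up, and the code assumes a compiler that packs the eight one-bit fields into a single byte and tolerates reading the member that was not written last (GCC and Clang do, the C++ standard does not promise it).

```cpp
#include <cstdint>

// Eight one-bit flags laid over a single byte.
// Assumption: bit-field layout and reading the inactive member are
// compiler-specific; this is not portable standard C++.
union StatusByte {
    std::uint8_t raw;
    struct {
        std::uint8_t b0 : 1;
        std::uint8_t b1 : 1;
        std::uint8_t b2 : 1;
        std::uint8_t b3 : 1;
        std::uint8_t b4 : 1;
        std::uint8_t b5 : 1;
        std::uint8_t b6 : 1;
        std::uint8_t b7 : 1;
    } bits;
};

// Writes the whole byte, then reads it back flag by flag.
bool allFlagsSet(std::uint8_t raw) {
    StatusByte s;
    s.raw = raw;
    return s.bits.b0 && s.bits.b1 && s.bits.b2 && s.bits.b3 &&
           s.bits.b4 && s.bits.b5 && s.bits.b6 && s.bits.b7;
}
```

Which flag ends up in which bit of `raw` is up to the compiler, so a real binary format would need that checked for every target platform.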
So, in C++11, if we wanted to take care of freeing the memory and of type safety, we would need to do something like this:
```cpp
#include <cassert>
#include <new>
#include <string>
#include <vector>

struct A {
    enum class Type {NUM, STR, VEC};
    Type type;

    A(int i_) : type(Type::NUM), i(i_) {}
    A(const std::string& s_) : type(Type::STR) { new((void*)&s) std::string(s_); }
    A(const std::vector<A>& v_) : type(Type::VEC) { new((void*)&v) std::vector<A>(v_); }
    A(const A& a) : type(Type::NUM) { copyFrom(a); }

    A& operator=(const A& a) {
        if (this != &a) {  // self-assignment would copy from an already destroyed member
            destroy();
            copyFrom(a);
        }
        return *this;
    }

    ~A() { destroy(); }

    int& getInt() { assert(Type::NUM == type); return i; }
    std::string& getString() { assert(Type::STR == type); return s; }
    std::vector<A>& getVec() { assert(Type::VEC == type); return v; }

private:
    // ends the lifetime of the active member; afterwards the object counts as a NUM
    void destroy() {
        switch(type) {
            case Type::STR: s.~basic_string<char>(); break;
            case Type::VEC: v.~vector<A>(); break;
            case Type::NUM: break;
        }
        type = Type::NUM;
    }

    // expects that no string or vector member is currently alive
    void copyFrom(const A& a) {
        switch(a.type) {
            case Type::NUM: i = a.i; break;
            case Type::STR: new((void*)&s) std::string(a.s); break;
            case Type::VEC: new((void*)&v) std::vector<A>(a.v); break;
        }
        type = a.type;  // updated only once the copy has succeeded
    }

    union {
        int i;
        std::string s;
        std::vector<A> v;
    };
};
```
That's *far more* work than I'm willing to do here, and I'm still pretty sure that anyone who knows C++ better than me would be able to find more issues with the code than I've found. Is there an easier way?
C++17 gives us a nice, type-safe alternative to union that will take care of calling the destructor automatically, while forbidding us from getting the wrong type by accident (so, no bit field tricks for us here).
```cpp
std::variant<int, std::string> v;
```
This one will properly free its memory when it goes out of scope, and will throw an exception (`std::bad_variant_access`) when the program tries to obtain a value of a type that is not actually stored there. But while `std::get<T>()` and `std::get_if<T>()` are obvious, the real gem is `std::visit`, which lets the programmer use the visitor pattern[2] to obtain values without any explicit type checking, like this:
```cpp
#include <iostream>
#include <string>
#include <variant>

struct MyVisitor {
    std::string operator()(const int input) const { return std::to_string(input); }
    std::string operator()(const std::string& input) const { return input; }
};

void print(const std::variant<int, std::string>& v) {
    // std::visit calls the overload matching the alternative currently stored in v
    std::cout << std::visit(MyVisitor(), v) << std::endl;
}
```
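For completeness, the two "obvious" accessors could be used roughly like this. The function names `getThrowsOnWrongType` and `intOr` are made up for this sketch; only `std::get`, `std::get_if` and `std::bad_variant_access` come from the standard library.

```cpp
#include <string>
#include <variant>

using Value = std::variant<int, std::string>;

// std::get<T> throws std::bad_variant_access if T is not the stored alternative.
bool getThrowsOnWrongType(const Value& v) {
    try {
        (void)std::get<std::string>(v);
        return false;
    } catch (const std::bad_variant_access&) {
        return true;
    }
}

// std::get_if<T> takes a pointer to the variant and returns nullptr instead of throwing.
int intOr(const Value& v, int fallback) {
    if (const int* p = std::get_if<int>(&v)) {
        return *p;
    }
    return fallback;
}
```

`std::get_if` is the one to reach for when a wrong type is an expected case rather than an error.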
`std::variant` has no problem accepting the same type multiple times:
```cpp
std::variant<int, int> v;
```
How do we get the value we need? It turns out that `std::get` can be called not only with the type we would like to get, but also with the index of the alternative we would like to obtain. Not only that, but thanks to `std::in_place_index` we can tell which alternative should be initialized:
```cpp
std::variant<int, int> v1(std::in_place_index<0>, 12);
int a = std::get<0>(v1);     // fine, alternative 0 is the one in use
// int b = std::get<1>(v1);  // would throw

std::variant<int, int> v2(std::in_place_index<1>, 123);
// int c = std::get<0>(v2);  // would throw
int d = std::get<1>(v2);     // fine, alternative 1 is the one in use

// std::get<int>(v1);        // would prevent compilation, as it is ambiguous
```
This should be rather rarely needed, but can be useful in case of ultra generic code, where two seemingly different types boil down to the same type.
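One way such a situation could come up is a generic "value or error" wrapper where both template arguments happen to be the same type. The `Result` class below is hypothetical, written only to illustrate the point; it never asks the variant for a type, only for an index.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical generic wrapper: index 0 holds the success value, index 1 the error.
// It keeps working when T and E are the same type, because it never uses std::get<T>.
template <typename T, typename E>
class Result {
    using Storage = std::variant<T, E>;

public:
    static Result success(T value) {
        return Result(Storage(std::in_place_index<0>, std::move(value)));
    }
    static Result failure(E error) {
        return Result(Storage(std::in_place_index<1>, std::move(error)));
    }

    bool ok() const { return data_.index() == 0; }
    const T& value() const { return std::get<0>(data_); }  // throws if this is a failure
    const E& error() const { return std::get<1>(data_); }  // throws if this is a success

private:
    explicit Result(Storage data) : data_(std::move(data)) {}
    Storage data_;
};
```

With `Result<std::string, std::string>`, a call like `success("done")` and a call like `failure("oops")` end up in different alternatives even though both store a `std::string`.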
`std::variant` cannot be created with `void` as one of its types. Therefore `std::monostate`[3] has been introduced - a type which can have only one value, so can be nicely used to represent an empty state.
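A minimal sketch of that empty state, assuming a made-up `Celsius` type that has no default constructor: a default-constructed variant holds its first alternative, so putting `std::monostate` first also makes the whole variant default-constructible.

```cpp
#include <variant>

// Hypothetical type without a default constructor.
struct Celsius {
    explicit Celsius(double d) : degrees(d) {}
    double degrees;
};

// std::variant<Celsius> alone could not be default-constructed;
// with std::monostate in front, a fresh Reading simply starts out "empty".
using Reading = std::variant<std::monostate, Celsius>;

bool hasValue(const Reading& r) {
    return !std::holds_alternative<std::monostate>(r);
}
```

A `Reading r;` therefore starts empty, and `r = Celsius(21.5);` switches it to the real value.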
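What if the new value cannot be created while it is being set? If the old value has already been destroyed and the new one cannot be constructed (for example, its constructor throws an exception), then the variant becomes *valueless by exception*. That means `valueless_by_exception()` returns `true`, `index()` returns `std::variant_npos`, and `std::get` and `std::visit` throw `std::bad_variant_access`.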
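Here is a rough sketch of how that state could be provoked and observed. The `Fragile` type and the `breakVariant` function are made up for this example, and the standard only says the variant *might* end up without a value after a failed `emplace`, so the code records what happened instead of assuming it.

```cpp
#include <stdexcept>
#include <variant>

// Hypothetical type whose construction from int always fails.
// The user-written copy constructor keeps it from being trivially copyable,
// which makes it harder for a library to avoid the valueless state.
struct Fragile {
    Fragile() = default;
    explicit Fragile(int) { throw std::runtime_error("construction failed"); }
    Fragile(const Fragile&) {}
};

struct Observed {
    bool valueless;
    bool indexIsNpos;
    bool visitThrew;
};

Observed breakVariant() {
    std::variant<int, Fragile> v = 12;
    try {
        v.emplace<Fragile>(42);  // the new Fragile throws while being constructed
    } catch (const std::runtime_error&) {
        // the variant may now hold no value at all
    }

    Observed o{v.valueless_by_exception(), v.index() == std::variant_npos, false};
    try {
        std::visit([](const auto&) {}, v);
    } catch (const std::bad_variant_access&) {
        o.visitThrew = true;  // visiting a valueless variant throws
    }
    return o;
}
```

Whatever the library decides to do, the three observations always agree with each other: either the variant kept a value and behaves normally, or all three report the valueless state.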
The example for `union` was overly complicated because the union was *recursive*: one of its members was a container of the very type that holds the union. Since all memory management issues are now solved for us automatically, the only thing we need to do is wrap our variant in a structure and use a pointer to that (still incomplete) structure as one of the variant's types, as the variant itself cannot accept incomplete types:
```cpp
#include <memory>
#include <string>
#include <variant>

struct RecursiveVariant;

struct RecursiveVariant {
    using Value = std::variant<int, std::string, std::unique_ptr<RecursiveVariant>>;
    Value value;
};
```
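To tie this back to the JSON from the beginning, here is a sketch of how a recursive variant could hold and print such a list. It swaps the `std::unique_ptr` for a `std::vector` of nodes, which assumes C++17 (where `std::vector` may be declared with an incomplete element type); the names `Node`, `Renderer` and `render` are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// A value is a number, a string, or a list of further values.
// Assumption: C++17, so std::vector<Node> may be named while Node is still incomplete.
struct Node {
    using List = std::vector<Node>;
    std::variant<int, std::string, List> value;
};

// Visitor that renders a node as JSON-like text (no string escaping, to keep it short).
struct Renderer {
    std::string operator()(int number) const { return std::to_string(number); }
    std::string operator()(const std::string& text) const { return '"' + text + '"'; }
    std::string operator()(const Node::List& items) const {
        std::string out = "[";
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (i != 0) out += ",";
            out += std::visit(*this, items[i].value);  // recurse into the child node
        }
        return out + "]";
    }
};

std::string render(const Node& node) {
    return std::visit(Renderer{}, node.value);
}
```

Nodes can be built with plain braces, for example `Node{1337}`, `Node{std::string("hi")}` or `Node{Node::List{Node{1}, Node{2}}}`, and no hand-written destructor or copy constructor is needed anywhere.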
It's worth noting that `std::variant` is essentially a nicer version of the much older Boost.Variant[4], so if you cannot use `std::variant`, you can still get Boost and use the variant from there - they're pretty similar. The biggest difference is how they behave when a new value cannot be set: Boost.Variant makes a never-empty guarantee, so it always holds a value of one of its types, while `std::variant` can become valueless by exception.
=> https://en.wikipedia.org/wiki/Bit_field 1
=> https://en.wikipedia.org/wiki/Visitor_pattern 2
=> http://en.cppreference.com/w/cpp/utility/variant/monostate 3
=> https://www.boost.org/doc/libs/1_67_0/doc/html/variant.html 4
=> https://www.boost.org/doc/libs/1_67_0/doc/html/variant/tutorial.html#variant.tutorial.recursive 5
=> https://www.youtube.com/watch?v=k3O4EKX4z1c 6
=> http://en.cppreference.com/w/cpp/utility/variant 7