💾 Archived View for hoseki.iensu.me › posts › handling-optionals-in-rust-macro-rules.gmi captured on 2023-03-20 at 17:24:33. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
I couldn't bear the amount of repetitive code in [one of my projects] dealing with the definition, identification and representation of token types in the scanner of an interpreter. All of the token string representations were defined as string constants in one place, and I had created an enumeration of the token types themselves, with one function for parsing a string into a token type and another for getting the string representation of a token type. This problem seemed like a perfect fit for [a Rust macro], but as we'll see it was a bit more involved to implement than I initially thought. However, once I grokked how to use macro recursion over multiple match arms, the solution turned out to be quite simple and elegant.
Here is how the code looked initially:
```rust
const AND: &str = "and";
const BANG: &str = "!";
const BANG_EQUAL: &str = "!=";
// ...

enum TokenType {
    AND,
    BANG,
    BANG_EQUAL,
    // ...
    NUMBER,
    STRING,
}

impl TokenType {
    pub fn as_str(&self) -> Option<&'static str> {
        match self {
            Self::AND => Some(AND),
            Self::BANG => Some(BANG),
            Self::BANG_EQUAL => Some(BANG_EQUAL),
            // ...
            _ => None
        }
    }

    pub fn from_str(s: &str) -> Option<TokenType> {
        match s {
            AND => Some(Self::AND),
            BANG => Some(Self::BANG),
            BANG_EQUAL => Some(Self::BANG_EQUAL),
            // ...
            _ => None,
        }
    }
}
```
At the time of writing there are about 40 tokens, and each additional token would require edits in four (!) different places just for the TokenType definition: the constant, the enum variant, and one match arm in each of the two functions. I wanted to get the code down to something like the snippet below, with the macro generating the function implementations and removing the need to declare string constants.
```rust
string_enum! {
    TokenType {
        AND = "and",
        BANG = "!",
        BANG_EQUAL = "!=",
        // ...
        NUMBER,
        STRING
    }
}
```
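To make the target concrete, here's a hand-written sketch of roughly what that invocation should expand to, trimmed to four tokens. The `#[allow]` and `#[derive]` attributes are assumptions added so the sketch compiles cleanly and can be compared with `assert_eq!`; they aren't part of the original code.

```rust
// Hand-written sketch of the intended expansion (four tokens only).
// The attributes below are assumptions added for this example.
#[allow(non_camel_case_types)]
#[derive(Debug, PartialEq)]
pub enum TokenType {
    AND,
    BANG_EQUAL,
    NUMBER,
    STRING,
}

impl TokenType {
    /// The fixed source text of a token, if it has one.
    pub fn as_str(&self) -> Option<&'static str> {
        match self {
            Self::AND => Some("and"),
            Self::BANG_EQUAL => Some("!="),
            // No fixed representation: the lexeme is read from the source elsewhere.
            Self::NUMBER => None,
            Self::STRING => None,
        }
    }

    /// Only tokens with a fixed representation get a match arm here.
    pub fn from_str(s: &str) -> Option<TokenType> {
        match s {
            "and" => Some(Self::AND),
            "!=" => Some(Self::BANG_EQUAL),
            _ => None,
        }
    }
}
```

Note the asymmetry: `as_str` needs an arm for every item, while `from_str` only needs arms for items that have a representation.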
Having set the vision, I thought I'd have a working macro within a few minutes. It sure looked simple enough, but the devil is in the details, as always. The specific devil in this case was that not every `TokenType` had a predefined string representation (e.g. `STRING` and `NUMBER`), and I wanted `as_str()` to return `None` for those, since retrieving their source string was handled elsewhere in the code. My first attempt resulted in two issues:
```rust
macro_rules! string_enum {
    // Macro input
    (
        $name:ident {   // the name of the enum
            $(
                // Each item identifier optionally followed by `= "foo"` string
                $item:ident $(= $repr:expr)?
            ),*
        }
    // Macro output
    ) => {
        // Declare the enum and add each item
        pub enum $name {
            $( $item ),+
        }

        impl $name {
            pub fn from_str(s: &str) -> Option<$name> {
                match s {
                    // Add match arms for each representation
                    $($($repr)? => Some(Self::$item),)*   // <- ISSUE 1
                    _ => None,
                }
            }

            pub fn as_str(&self) -> Option<&'static str> {
                match self {
                    // Add match arms for each item
                    $(Self::$item => $($repr)?,)*   // <- ISSUE 2
                    _ => None,
                }
            }
        }
    };
}
```
```rust
pub fn from_str(s: &str) -> Option<$name> {
    match s {
        $($($repr)? => Some(Self::$item),)*   // <- ISSUE 1
        _ => None,
    }
}
```
In `from_str(s: &str)` I wanted to generate a match arm mapping each `$repr` to `Some(Self::$item)`, but as written, the code generated a match arm for each `$item` definition, not for each `$repr` definition. Since the string representation of an item is optional, items without one produced match arms with a missing left-hand side, so the code did not compile.
In `macro_rules!` definitions, each nested `$(...)` repetition grouping on the matcher (left-hand) side introduces one additional level of repetition nesting on the transcription (right-hand) side. Looking at the matcher I wrote above, we can see that the `$repr` grouping is nested within the `$item` repetition grouping:
```rust
(
    $name:ident {
        $(                                 // <- Start first level repetition
            $item:ident $(= $repr:expr)?   // <- Second level repetition
        ),*                                // <- End first level repetition
    }
) => {
    // ...
}
```
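To see the depth rule in isolation, here's a small hypothetical macro, `repr_table!` (not from the project), that reuses the same matcher. The depth-1 group is emitted once per item, while the nested `$(...)?` group is emitted only for items that actually have a `$repr`.

```rust
// Hypothetical `repr_table!` macro reusing the matcher shape from the post:
// `$item` is captured at repetition depth 1, `$repr` at depth 2.
macro_rules! repr_table {
    ($($item:ident $(= $repr:expr)?),*) => {{
        let mut rows: Vec<(&str, Vec<&str>)> = Vec::new();
        $(
            // Depth 1: this group is emitted once per `$item`.
            #[allow(unused_mut)]
            let mut reprs: Vec<&str> = Vec::new();
            // Depth 2: this push is emitted once per `$repr`, i.e. zero or one times.
            $( reprs.push($repr); )?
            rows.push((stringify!($item), reprs));
        )*
        rows
    }};
}
```

Calling `repr_table!(AND = "and", NUMBER, BANG_EQUAL = "!=")` yields one row per item, with an empty `Vec` for `NUMBER`.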
Bringing the whole pattern match arm definition to the same repetition level as the `$repr` token solved the issue:
```rust
// $($($repr)? => Some(Self::$item),)*   // <- ISSUE 1
$($($repr => Some(Self::$item),)?)*      // <- SOLVED!
```
It's a bit hard to see, but `)?` has been moved so that it encloses the whole match arm and not just `$repr`, which puts the whole match arm at the same repetition depth as `$repr`. This means we get exactly as many match arms as we have `$repr` definitions. Problem solved!
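A trimmed-down, hypothetical `enum_from_str!` macro that emits only the enum and the corrected `from_str` makes it easy to check the fix on its own. One assumption of this sketch: it matches `$repr` as a `literal` fragment rather than `expr`, since only literals make sense as match patterns here.

```rust
// Hypothetical `enum_from_str!`: only the enum and the corrected `from_str`.
// Assumption: `$repr` is matched as a `literal` (the post uses `expr`).
macro_rules! enum_from_str {
    ($name:ident { $($item:ident $(= $repr:literal)?),* }) => {
        #[allow(non_camel_case_types)]
        #[derive(Debug, PartialEq)]
        pub enum $name {
            $($item),*
        }

        impl $name {
            pub fn from_str(s: &str) -> Option<$name> {
                match s {
                    // `)?` closes over the whole arm, so an arm is emitted once
                    // per `$repr`; items without one produce no arm at all.
                    $($($repr => Some(Self::$item),)?)*
                    _ => None,
                }
            }
        }
    };
}

// NUMBER has no representation, so it gets no `from_str` arm.
enum_from_str! { Tok { AND = "and", BANG = "!", NUMBER } }
```

The catch-all `_ => None` arm is what handles `NUMBER` here: its lexeme can't be looked up by fixed text anyway.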
```rust
pub fn as_str(&self) -> Option<&'static str> {
    match self {
        $(Self::$item => $($repr)?,)*   // <- ISSUE 2
        _ => None,
    }
}
```
The second issue was in `as_str(&self)` and had to do with the right-hand side of the pattern match arm. This was a bit trickier to find a solution for (and it was actually what triggered me to write this post in the first place).
The goal was for the right-hand side of each generated match arm to be either `Some($repr)` if there was a `$repr` definition for that `$item`, or `None` otherwise. It turns out there is no operator that lets you choose between outputs on the transcription side of a macro definition, and fiddling with the repetition depth wouldn't help here, since we want to generate an arm for every `$item` regardless of whether it has a `$repr`.
As often happens, the solution was recursion: a `macro_rules!` declaration can have multiple match arms, and these arms can be used to recursively invoke the macro with different inputs. Furthermore, prefixing a match arm with a unique literal token gives you finer-grained control over which arm matches, which lets you create "helper matchers":
```rust
macro_rules! recursive_macro {
    ($x:expr) => {
        recursive_macro!(foobar $x)
    };
    (foobar $x:expr) => {
        println!("{:?}", $x);
    };
}
```
Above, `foobar` controls which match arm is triggered when `recursive_macro!` invokes itself. Following this pattern, the condition can be encoded as a pair of match arms that act as a "helper function" of sorts: one takes an expression and expands to `Some(expression)`, the other takes no arguments and expands to `None`. Using `@option` as the prefix makes it feel even more like a helper function.
```rust
macro_rules! recursive_macro {
    ($($x:expr)?) => {
        recursive_macro!(@option $($x)?)
    };
    (@option $x:expr) => {
        Some($x)
    };
    (@option) => {
        None
    }
}
```
The `?` repetition operator matches zero or one occurrence of the enclosed pattern, so with this `@option` helper I was able to take an optional expression and expand it into an `Option` value.
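The same helper also works when it's invoked once per item inside a repetition, which is exactly what `as_str` needs. Here's a sketch with a hypothetical `item_options!` macro that pairs each item name with `Some(repr)` or `None`:

```rust
// Hypothetical `item_options!` macro: the `@option` helper arms,
// invoked once per item inside a depth-1 repetition.
macro_rules! item_options {
    // Entry point: every item produces exactly one tuple, with or without a `$repr`.
    ($($item:ident $(= $repr:expr)?),*) => {
        [$((stringify!($item), item_options!(@option $($repr)?))),*]
    };
    // Helper arm: a representation was given.
    (@option $repr:expr) => {
        Some($repr)
    };
    // Helper arm: no representation.
    (@option) => {
        None
    };
}
```

The recursive calls never reach the entry arm by accident: `@` can't start an identifier, so that arm fails to match and the helper arms are tried next.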
Putting it all together, [the macro] ended up looking something like this:
```rust
// `#[macro_export]` makes the path-based recursive call below resolve
#[macro_export]
macro_rules! string_enum {
    (
        $name:ident {
            $($type:ident $(= $repr:expr)?),*
        }
    ) => {
        pub enum $name {
            $($type),+
        }

        impl $name {
            pub fn as_str(&self) -> Option<&'static str> {
                match self {
                    // One arm per item; the `@option` helper picks Some or None
                    $(Self::$type => $crate::string_enum! { @option $($repr)? },)+
                }
            }

            pub fn from_str(s: &str) -> Option<$name> {
                match s {
                    // One arm per `$repr` only
                    $($($repr => Some(Self::$type),)?)*
                    _ => None,
                }
            }
        }
    };
    (@option $repr:expr) => {
        Some($repr)
    };
    (@option) => {
        None
    };
}
```
With this macro I was able to write my `TokenType` definitions as below, saving over 100 lines of code and simplifying the process of adding new tokens.
```rust
string_enum! {
    TokenType {
        AND = "and",
        BANG = "!",
        BANG_EQUAL = "!=",
        // ...
        NUMBER,
        STRING
    }
}
```
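For a quick check of the round trip between `as_str` and `from_str`, here's a self-contained sketch of the finished macro together with the `TokenType` invocation. It deviates from the version above in a few assumed ways: `$repr` is a `literal` fragment, a trailing comma is accepted, the enum derives a few traits for comparisons, and the recursive call is unqualified, which is fine within a single crate (an exported macro would use `$crate::`).

```rust
// Self-contained sketch of the final macro. Assumed deviations from the post:
// `$repr:literal`, optional trailing comma, derives, unqualified recursion.
macro_rules! string_enum {
    ($name:ident { $($variant:ident $(= $repr:literal)?),* $(,)? }) => {
        #[allow(non_camel_case_types)]
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum $name {
            $($variant),*
        }

        impl $name {
            /// `Some(repr)` for items declared with `= "..."`, `None` otherwise.
            pub fn as_str(&self) -> Option<&'static str> {
                match self {
                    // One arm per item; the `@option` helper chooses Some or None.
                    $(Self::$variant => string_enum!(@option $($repr)?),)*
                }
            }

            /// Only items with a representation get a match arm.
            pub fn from_str(s: &str) -> Option<$name> {
                match s {
                    $($($repr => Some(Self::$variant),)?)*
                    _ => None,
                }
            }
        }
    };
    (@option $repr:literal) => {
        Some($repr)
    };
    (@option) => {
        None
    };
}

string_enum! {
    TokenType {
        AND = "and",
        BANG = "!",
        BANG_EQUAL = "!=",
        NUMBER,
        STRING,
    }
}
```

With this, `TokenType::from_str("!=")` returns `Some(TokenType::BANG_EQUAL)` and `TokenType::NUMBER.as_str()` returns `None`.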
I find the final macro definition quite readable, which makes it fairly easy to picture the resulting code. I can definitely see the helper function pattern coming in handy in the future. There's also the [tt-muncher pattern] for even more complex scenarios, but when I start reaching for that, I might be better off writing a [procedural macro] instead...
イェンス - 2022-11-08