💾 Archived View for d.moonfire.us › blog › 2024 › 06 › 29 › identifiers-update captured on 2024-09-29 at 01:00:54. Gemini links have been rewritten to link to archived content
Last month, I did a post about snowflake and UUIDv7 identifiers[1]. I was pretty happy with it, but then I was playing around with Clew[2], a recent, smaller web search engine, and decided to look up my identifiers post just to check it out.
1: /blog/2024/05/30/identifiers/
Right underneath mine was a post by Unkey[3] on the same topic. It has some good observations that I didn't think about when I wrote mine, so I decided to expand on mine to include the ideas I like.
3: https://www.unkey.com/blog/uuid-ux
2024-05-30 Generating and Using Identifiers
2024-06-29 Generating and Using Identifiers (Part 2)
Yes, but I didn't think about it. Both UUIDs (including my current crush on v7) and my preferred Crockford encoding have the same problem: because of the dashes, double-clicking to copy selects only one group instead of the whole identifier.
I still feel that removing the separators turns it into a long series of numbers that is difficult to parse and elide in a consistent manner, so just removing the separators isn't an option. However, underscores don't have the same behavior (and the Unkey post uses them later for the prefix discussion).
I don't normally think about underscores since they conflict with underlines for links, but in this case, the identifier should be treated as a whole, so favoring the double-click and single word behavior is a major benefit.
This isn't from the Unkey post, but with underscores, I think groups of four are excessive. I don't know why, but the lower bar somehow makes the breaks more obvious. Plus, how often do you need to hand transcribe the numbers or read them out loud?
I think splitting the identifiers into groups of eight would be sufficient, as long as they are consistently eight instead of UUID's default formatting of 8-4-4-4-12, which is difficult for me.
I think groups of eight (from the right) are still workable, have a natural break point for eliding (“rv1y5c52” in the above example), and don't need quite as much space.
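A minimal sketch of the grouping described above, in Python. It assumes a lowercase Crockford base32 alphabet and a 128-bit value padded to 26 digits; the function names `crockford` and `group` are made up for illustration and do not come from any library.

```python
import uuid

# Crockford base32 alphabet, lowercased (it omits i, l, o, and u).
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def crockford(value: int, width: int = 26) -> str:
    """Encode a non-negative integer, zero-padded on the left to `width` digits."""
    digits = []
    while value:
        value, remainder = divmod(value, 32)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits)).rjust(width, "0")


def group(text: str, size: int = 8, sep: str = "_") -> str:
    """Split `text` into groups of `size`, counted from the right."""
    head = len(text) % size  # leftover characters form the short first group
    parts = [text[:head]] if head else []
    parts += [text[i:i + size] for i in range(head, len(text), size)]
    return sep.join(parts)


# A random UUID stands in for a real identifier here.
print(group(crockford(uuid.uuid4().int)))
```

Counting from the right means the short group, if any, ends up at the front, so the last group is always a full eight characters.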
I can't believe I forgot about prefixes. When GitHub and GitLab both started adding prefixes to their API keys, I thought it was a great idea. Then, when I went to Gitea and later to Forgejo, seeing just a bunch of hex characters for an API key felt subtly “wrong” to me.
As the Unkey post pointed out, a prefix gives purpose. A bunch of numbers is one thing, but knowing the purpose of the key helps identify it as a secret (e.g., something not to check into code) and possibly gives a way of preventing usage.
Naturally, since we want to treat the identifier as a single word, the prefix should be separated from the rest of the key by an underscore. It also solves a problem: when the first character of the identifier is a digit, it is an invalid code identifier. Not that someone is going to say:
const 01_hz613s22_k8nr6w6g_rv1y5c52 = getIdentifier();
However, it fits better into the “idea” of code to treat it as one. So, starting the prefix with a non-digit would make this more useful:
While thinking about it, I considered making a distinction between the identifier and the code with a double underscore.
Originally, I didn't think it made sense because we don't really want to parse the identifier to pull out the prefix from the code itself. That said, there is one situation where we need to be able to parse: eliding.
Just breaking off the last group would result in stripping off the contextual prefix. We want to keep that information even when shortened, which means we have to have a mechanical way of identifying the prefix to determine how to break it apart.
The double underscore also means that it remains a single word for purposes of selection.
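As a rough sketch of that mechanical split, assuming the prefix is joined to the grouped identifier with a double underscore, the Python below adds and removes a prefix. The helper names `with_prefix` and `split_prefix` and the prefix "a" are hypothetical, and the validation rules are one possible choice rather than a specification.

```python
PREFIX_SEP = "__"


def with_prefix(prefix: str, body: str) -> str:
    """Join a contextual prefix to a grouped identifier."""
    if not prefix or prefix[0].isdigit():
        raise ValueError("prefix must start with a non-digit")
    if PREFIX_SEP in prefix or prefix.endswith("_"):
        # Either case would make the split point ambiguous.
        raise ValueError("prefix must not contain '__' or end with '_'")
    return f"{prefix}{PREFIX_SEP}{body}"


def split_prefix(identifier: str) -> tuple[str, str]:
    """Return (prefix, body); the prefix is empty when there is none."""
    prefix, sep, body = identifier.partition(PREFIX_SEP)
    if not sep:
        return "", identifier
    return prefix, body


full = with_prefix("a", "01_hz613s22_k8nr6w6g_rv1y5c52")
print(full)                 # a__01_hz613s22_k8nr6w6g_rv1y5c52
print(split_prefix(full))   # ('a', '01_hz613s22_k8nr6w6g_rv1y5c52')
print(full.isidentifier())  # True
```

Because the groups themselves only ever use a single underscore, the first double underscore is an unambiguous split point.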
Having multiple words does make sense to me, so “bedor_player” seems reasonable to have since it can be arbitrarily long. That said, the identifiers already produce a “globally unique” value, so the prefix also doesn't have to be global. That means we just need enough scope information for the producer of the identifier, but not a full scope like:
Good examples would be:
In this case, shorter is better.
That led into the question about suffixes. For example:
The main reason I don't think suffixes make sense is that I believe most people look at the beginning and the end of variables in most cases. And, having a textual prefix and suffix adds more complexity since both have to be looked at to understand the scope. Having only a prefix means the contextual information is always on the same side of the code.
Eliding (abbreviating) is important when dealing with large identifiers. Git allows you to use an arbitrary number of characters when shortening a hash to identify a commit, but I remember trying to figure out if six or eight was the best for some purpose. With the grouping above, there is a natural break at eight and sixteen.
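A small Python sketch of eliding at those natural breaks, assuming the double-underscore prefix and underscore-separated groups discussed earlier. The function name `elide` and its `groups` parameter are illustrative choices, not an existing API.

```python
def elide(identifier: str, groups: int = 1) -> str:
    """Keep the prefix (if any) plus the last `groups` underscore groups."""
    if groups < 1:
        raise ValueError("groups must be at least 1")
    prefix, sep, body = identifier.partition("__")
    if not sep:  # no prefix present, so the whole string is the body
        prefix, body = "", identifier
    tail = "_".join(body.split("_")[-groups:])
    return f"{prefix}__{tail}" if prefix else tail


print(elide("a__01_hz613s22_k8nr6w6g_rv1y5c52"))     # a__rv1y5c52
print(elide("a__01_hz613s22_k8nr6w6g_rv1y5c52", 2))  # a__k8nr6w6g_rv1y5c52
```

Asking for more groups than exist simply returns the whole identifier.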
I realized this really is only a concern if there are smaller numbers where most of the identifier is zero, such as `00000000-0000-0000-0000-000000000000`, which has a Crockford encoding of `0`.
However, for most UUIDv7 identifiers, this isn't possible because the first 48 bits are a timestamp and won't ever be zero (again) after that epoch millisecond.
But there is a known problem with eliding: there is a higher chance of collisions.
In these cases, there needs to be a check for duplicates as part of the code. That is the cost of being able to elide: additional complexity when selecting.
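One way that duplicate check could look, sketched in Python. It assumes elided identifiers keep the double-underscore prefix and one or more trailing groups; `resolve`, `AmbiguousIdentifier`, and the sample identifiers (apart from the one used in this post) are invented for the example.

```python
class AmbiguousIdentifier(LookupError):
    """Raised when an elided identifier matches more than one full identifier."""


def _split(identifier: str) -> tuple[str, str]:
    prefix, sep, body = identifier.partition("__")
    return (prefix, body) if sep else ("", identifier)


def resolve(elided: str, known: list[str]) -> str:
    """Expand an elided identifier to the single full identifier it matches."""
    prefix, tail = _split(elided)
    found = []
    for full in known:
        full_prefix, body = _split(full)
        if prefix and prefix != full_prefix:
            continue  # a different kind of identifier
        if body == tail or body.endswith("_" + tail):
            found.append(full)
    if not found:
        raise KeyError(elided)
    if len(found) > 1:
        raise AmbiguousIdentifier(f"{elided} matches {len(found)} identifiers")
    return found[0]


known = [
    "a__01_hz613s22_k8nr6w6g_rv1y5c52",
    "a__01_hz613s99_aaaaaaaa_rv1y5c52",  # made-up collision on the last group
]
print(resolve("a__k8nr6w6g_rv1y5c52", known))
```

Here `resolve("a__rv1y5c52", known)` would raise `AmbiguousIdentifier`, which is the signal to ask the user for one more group.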
I should mention that eliding is a “human” thing, not a protocol thing. We wouldn't send an elided identifier as part of an HTTP header or an authorization key. We need eliding for showing a table of all known identifiers so someone can see additional information or delete a key. That said, in a grid, you probably don't need the prefix, but the double underscore makes it possible to keep it just in case you mix your keys in the same grid.
This isn't a formal specification, just a current summary of my wandering thoughts.
I'm still happy having a framework for this, though I could see extending one of my favorite C# strong typing libraries, StronglyTypedId[4], to support the format and prefixes. I mean, it would be nice to be able to say:
4: https://github.com/andrewlock/StronglyTypedId
using System;
using StronglyTypedIds;

[StronglyTypedId(Template.Crockford, Prefix="a")]
public partial struct ApiId { }

var id = new ApiId(Guid7.NewGuid());
Console.WriteLine(id); // a__01_hz613s22_k8nr6w6g_rv1y5c52
Console.WriteLine(id.Elide(1)); // a__rv1y5c52
Hopefully, I've worked out most of the kinks I've found. I might come back, or I might formalize this into libraries. Either way, I think it is a workable pattern for the difficulties I've experienced in the past.