________________________________________________________________________________
I find using set, TLA, or other mathematical notation cumbersome to explain these things. Would a visualization would make it more approachable?
I know Raft is easier to understand, at least to me, probably due in part to its visual explanation that's front and center:
EDIT: Turns out it's harder to find a paxos visualization. Here's one:
https://www.scs.stanford.edu/17au-cs244b/labs/projects/vimbe...
https://jivimberg.io/paxos-playground/src/main/html/
Shameless plug:
http://imnaseer.net/paxos-from-the-ground-up.html
I worked on an explanation of Paxos where we start with a simple but incorrect implementation of the protocol. The bug is then fixed and the protocol refined. Which leads to another bug. Interestingly, after fixing 6 or 7 bugs we arrive at the actual working implementation of Paxos. The reader, having walked the path of arriving at the protocol (hopefully) understands the nuances better than just reading the description of the protocol from the get-go.
Bonus: The explanation uses simulated visual runs as well, eg.
http://imnaseer.net/paxos-from-the-ground-up.html?section=3&...
Raft is understandable because that was one of it's key design goals.
Set / TLA / etc tends to fail when a system is only probabilistically stable, doesn't it?
(Not strictly about paxos; more of a general observation.)
nope. it’s not about probabilities. if it is it’s incorrectly modeled
There are probabilistic systems.
This is a really good post. I liked seeing the TLA+ code here. There's a pretty serious ambiguity in the Paxos Made Simple paper, which makes it very easy to follow the paper and still end up with a broken algorithm. From Lamports publications page (
https://lamport.azurewebsites.net/pubs/pubs.html#paxos-simpl...
):
In 2015, Michael Dearderuff of Amazon informed me that one sentence in this paper is ambiguous, and interpreting it the wrong way leads to an incorrect algorithm. Dearderuff found that a number of Paxos implementations on Github implemented this incorrect algorithm. Apparently, the implementors did not bother to read the precise description of the algorithm in [122]. I am not going to remove this ambiguity or reveal where it is. Prose is not the way to precisely describe algorithms. Do not try to implement the algorithm from this paper. Use [122] instead.
Lamport's core point here is using something like TLA+ to describe algorithms like this is essential. Writing is just too prone to ambiguities and misunderstandings, and the formalism really helps. The downside, obviously, is that TLA+ can look impenetrable if you're not familiar with it. While it's not hard to learn (the mathematical underpinnings are quite simple), the initial curve is quite steep.
It's not that simpler written descriptions are bad, just that it's hard to make them exact enough to communicate the algorithm perfectly. On the other hand, learning from the mathematical notation is pretty hard. I don't know how to fix this problem.
I've recently noticed that there's an uptick in the number of Paxos related posts. Any particular reason why the algorithm has gained more popularity recently?
I think it's same reason why Designing Data Intensive Applications is in the top most popular books on safaribooks for quite some time now.
Many people are grinding for job interviews and many companies now copy FAANG and have a "systems design" round, Paxos/Raft is one of the key topics there, thus it's discovered by more and more people.
I suspect Rust and Jepsen have a hand to play in its growing popularity: personally, I'm much more comfortable implementing or using techniques, algorithms, and patterns in Rust that I would never have risked in a higher level language, let alone in C/C++. The risk calculus just never panned out in their favor because of maintenance and availability of talent concerns. The Rust libraries for Raft and Paxos are much more approachable than my past experience with other implementations and the language itself provides a higher level of confidence. Jepsen, likewise, makes testing distributed systems an approachable problem.
Case in point: I'm currently experimenting with Raft and Paxos with the intent of putting it into production for a service that streams live updates to subscribers over websockets with time sensitive business logic that must only be executed once per object on a timer. Before Rust, I would have just slung the whole problem over to whichever infrastructure team was a glutton for punishment so that they could set up Consol/kafka/debezium/whatever and set a reminder three months from now to revisit the issue.
I don't know if there is any specific reason with Paxos, but HN front-page topics always tend to be somewhat thematically-related, so no real surprise here.
One of those topics which hits the right balance of sufficiently useful, interesting and understandable without too much training.
Distributed consensus is quite hot - like blockchain tech. This is another aspect of that?
Baader-Meinhof phenomenon?
I was quietly thinking that as well these past few days. You see, my last submission on HN hit front page and was on the matter of consensus protocols. After that either my awarness on the topic increased or the frequency of postings about it did.
I think it's because distributed consensus is a pattern which is growing in use exponentially as people wake up to all the disadvantages of huge centralized services.
https://blog.the-pans.com/understanding-paxos/
It’s much easier to understand Paxos as a read-modify-write transaction. I wrote to Lamport about it. Here’s part of Lamport’s reply
It’s nice that you found an apparently new way to explain Paxos, but don’t expect everyone to find it helpful. “Grasping” or “intuitively understanding” an algorithm really means giving the reader a warm fuzzy feeling, and different people get warm fuzzy feelings from different things. What really matters is not warm fuzzy feelings, but the ability to rigorously prove that a correct algorithm is correct, and to find the error in an incorrect one.
That's awesome! Thanks for the new framing of the algorithm.
Lamport's reply seems unnecessarily dismissive. It's true that different framings and analogy are "weaker" than rigorous understanding. But they are also pathways to it! Digging into something complex is hard, and having a mental model of which pieces interact together and why is one way to make it approachable enough to actually get into the proofs and the rigor.
For me, Paxos alternates between 'very obvious' and 'there definitely is some problem in this'. Thinking about distributed system is hard.
Lose and loose are very different words and it drives me so crazy when people pick loose instead of lose that I have a difficult time finishing reading articles, emails or anything really that gets it wrong.
Not a native speaker, and have giant problems with spelling in my mother tongue, so I confuse words in writing all the time. Thing vs think is the worst.
That's why there's "fix typo" link at the end :-)
Typo: "loose" where "lose" was intended.