Replication Blues

A large tech corporation is moving to OpenLDAP. They spent months and months evaluating and testing. They finally go into prodction and right away replication goes into "replication storms". The directory is running OK but the servers are way too busy handling these odd and erroneous replication events in the thousands a second.

So the "A Team" jumps on it. They set up test clusters and hammer the exact same code-base for days, trying to reproduce it. Can't. After a while a lot of theories prove false. All the occasional spurts of hope are dashed when double-checks show analysys errors. Finally, a seemingly related fix to a newer release is tried. No reproduction. Just a best shot to get the customer stable. So far so good.

An exhausted A Team. A cautiously happy customer. An even more cautiously happy us.

Replication Blues was published on 2022-02-21