πΎ Archived View for hanicef.me βΊ document βΊ b-plus-tree captured on 2022-07-16 at 13:30:23. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
(C) Gustaf AlhΓ€ll, released under CC BY 4.0
Original document can be found at hanicef.me
License can be found at creativecommons.org
The B+ tree algorithm is an algorithm used to search, insert and delete large amounts of data efficiently. It's commonly used in databases and file systems to improve lookup performance.
B+ trees stores data in so-called nodes, where each node either point to other nodes (called an internal node) or points to the stored data (called a leaf node). A typical B+ tree can be visualized like this:
βββ¬ββββ¬ββ¬ββββ¬ββ β β 5 β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ β β β βββββββββββββββββββ β βββββββββββββββββββ β β β βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β β 4 β βββββ β 5 β β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β
Here, each box is a node. Nodes contains values and keys that delimit them, where the keys are the squares that contains a number and the values are the squares that has an arrow pointing away from it. The upper box is an internal node; these types of nodes point to other nodes, internal or leaf, and each key acts as a delimiter to limit off the range of keys that exist at a specific value. The lower boxes are all leaf nodes, and these point to actual values. Additionally, the last value in leaf nodes acts as a reference to the next leaf node in the sequence (called the linked list, where each entry is a link), which allows iterating a range of values efficiently even if they span multiple nodes.
All B+ trees also have a "root node", which is the node on the top of the tree. This node is designated as the top of the tree, where all operations on the tree starts from. The root node starts off as a leaf node when the tree is created, but is replaced by an internal node once enough values have been inserted into the tree. It's also important to keep in mind that the root node can change as the tree changes, as the root node can run out of space and need a replacement.
A benefit to B+ trees is that all types of nodes have the same data structure, which simplifies implementation and allows reusing code for both node types. However, this also requires extra care so internal nodes aren't treated as leaf nodes and vice versa.
Let's say we want to find the key 4 in the example tree above. We first start at the root node, which is the upper node. Since this is an internal node, we are looking for a range and not an exact value. Looking at the tree, we can see we have two keys, 5 and 9. The ranges that it falls inbetween must satisfy a β€ k < b, where a is the lower value, b is the upper value and k is the key we are looking for. Since we're searching for 4, we must match a β€ 4 < b, and the only possibility for that is a β€ 4 < 5, or just 4 < 5, as there is no value before 5. This means that the key we are looking for is in the leftmost value.
Following this reference leads us to the leftmost leaf node. At leaf nodes, we are looking for an exact value, and looking at the tree we see that the rightmost key in that node is 4. As the corresponding values are on the left of the key, we go down the value between 2 and 4.
This process visually looks like this (bold lines indicate which path was taken):
βββ¬ββββ¬ββ¬ββββ¬ββ β β 5 β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ β β β βββββββββββββββββββ β βββββββββββββββββββ βΜ¬ β β βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β β 4 β βββββ β 5 β β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β βΜ¬ β β β β β β
Inserting values is a bit tricky. To avoid putting all values in the same node (which would defeat the purpose of the algorithm), a node should be split into two if the amount of values in it exceeds a specific size. This size limit is called the "degree", and is generally set to be the block size of the storage device to avoid wasting space between nodes. For the sake of illustration, though, the degree we will be using is 4.
Let's say we want to insert the key 6 into the into the example tree above. First, we start at the root node, then we make our way down the tree just like when we were searching. In this case, we are looking for 6, and 5 β€ 6 < 9, so the correct path to go down through is the middle one.
Now we are at the middle leaf node. Since this is a leaf node, we intend to insert a value here, and we need to make sure that the keys remain in order. The keys in this node are 5, 7 and 8, and the only place to put the new key in is between the first and second slot, as 5 < 6 < 7 < 8, thus keeping the keys ordered. This gives us this tree:
βββ¬ββββ¬ββ¬ββββ¬ββ β β 5 β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ β β β βββββββββββββββββββ β βββββββββββββββββββ β β β βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β β 4 β βββββ β 5 β β 6 β β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β β
As we've inserted the key 6 into the middle leaf node, it now has 5 values. If the degree of the tree wasn't exceeded, we would've been done here, and the the insertion could end. However, our degree is 4, which means that the middle leaf node is too large, and need to be split. Each node should be split evenly while maintaining the order of the keys. To do this, we have to split the node between the 6th and 7th key, giving us two nodes, one with the keys 5 and 6, and one with the keys 7 and 8. Finally, to not break iteration, the first of the two nodes have to be linked to the next. This gives us a tree like this:
βββ¬ββββ¬ββ¬ββββ¬ββ β β 5 β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ β β β βββββββββββββββββββ β βββββββββββββββββββββββββ β β β βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β β 4 β βββββ β 5 β β 6 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β β
Finally, all we need to do in insert a new key in the internal node that points to the new leaf node. To do this, we take the first key in the new node, which is 7, and insert it into the internal node above. As the nodes always need to remain in order, we need to make sure that the key is inserted before the smallest key that is larger than the one we are inserting. In this case, this means that the key should be inserted before 9, as 5 is smaller than 7, making it the wrong place to insert it. The value is then inserted between those two keys and points to the new node, giving us the final tree:
βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ β β 5 β β 7 β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ β β β βββββββββββββββββββ βββββββββββββββββββ β βββββββ β β β β β βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β β 4 β βββββ β 5 β β 6 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β β
We are now done with the insertion. However, to illustrate how the root node is split, let's insert another key, 3, into the tree. We start off as previously, making our way down to a leaf node from the root node. In this case, 3 < 5, so we will make our way down the leftmost value, leading us to the leftmost leaf node. Now, we insert as before, keeping the order of the keys the same. This time, it gets inserted between 2 and 4, as 1 < 2 < 3 < 4. This gives us this tree:
βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ β β 5 β β 7 β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ β β β βββββββββββββββββββ βββββββββββββββββββ β βββββββ β β β β β βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β β 3 β β 4 β βββββ β 5 β β 6 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β β β
Since we now have 4 values in the leaf node, which exceeds the degree, we need to split it. As before, we split it in the middle, giving us two nodes, one with 1 and 2, and one with 3 and 4. Then we move up one step and insert the first key of the last leaf node. In this case, that's 3, so we insert it before the smallest value that is larger than 3. Since all keys are larger than 3, we insert it at the beginning of the node, along with the value following it. This gives us a tree like this:
βββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ¬ββββ¬ββ β β 3 β β 5 β β 7 β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ΄ββββ΄ββ βββββββββββββββββββ β β β βββββββββββββββββββββββββ β βββββββββββββ β βββββββββββββ β β β β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β βββββ β 3 β β 4 β βββββ β 5 β β 6 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β β β
This is where it gets interesting. Since the internal node above now exceeds the degree, it should be split. However, since this is the root node, there's no parent node that we can insert the delimiting key into. To solve this problem, we first split the internal node, as usual, then we create a new internal node that contains only one key: the last key of the second node.
Note than unlike when we're splitting leaf nodes, we should also remove the key that we're moving up to the parent node from the child. While keeping that key is not inherently harmful for the algorithm itself, it's pointless to keep is as the value below it would be impossible to reach due to the parent node delimiting it. This gives us a tree that looks like this:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββ ββββββ β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ β β 3 β β 5 β β β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ βββββββββββββββββββ β β β βββββββββββββββββββββββββ β βββββββββββββ β βββββββββββββ β β β β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β βββββ β 3 β β 4 β βββββ β 5 β β 6 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β β β
Continuing on the tree above, let's say we want to remove 5 from the tree. Starting off at the root node at the top, we first make our way down the tree as always. In this case, 5 < 7, so we go down the left value. Unlike other cases, however, we've now reached an internal node, which means we need to keep going down the tree. In this case, 5 β€ 5, so we need to go down the last value in the node. This finally brings us to the leaf node containing 5 and 6. Now, all we need to do is to remove the key 5 and it's corresponding value on the left, giving us this tree:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββ ββββββ β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ β β 3 β β 5 β β β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ βββββββββββββββββββ β β β βββββββββββββββββββββββββ β βββββββββββββ β βββββββββββββ β β β β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β βββββ β 3 β β 4 β βββββ β 6 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β β
At this point, we're done, but to make things more interesting, let's assume that we also want to remove the key 6. Since we are already at the leaf node that contains 6, we can just remove the key 6 and it's corresponding value to the left. When that's done, we are left with an empty node; all it contains is a link to the next value in the node:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββ ββββββ β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ β β 3 β β 5 β β β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ ββββββββββββββββ β β β ββββββββββββββββββββββ β ββββββββββ β ββββββββββ β β β β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β βββββ β 3 β β 4 β βββββ βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β
In this case, keeping the node around is wasteful, so it's best to just erase the node completely and re-linking the previous value to the next node. To do this, we first save the reference to the next node in the linked list, then we proceed with removing the value. This gives us this tree:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββ ββββββ β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ β β 3 β β 5 β β β β 9 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ βββββββββββββ β β β β β β β β βββββββββββββ β β β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β βββββ β 3 β β 4 β ββ? β β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β
Notice that one of the values in one of the internal nodes as well as the linked list in the previous leaf node is now pointing towards a question mark. This question mark represents the erased node, which makes the values now point to nothing (this is called a "dangling reference"). To fix this, we need to update the values so none of them are pointing towards the deleted node. To start, we first make our way up to the parent node, then we remove the value that is now dangling, as well as one of the adjacent keys. Since there's only one adjacent key to that value, which is 5, we remove that key. This gives us this tree:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββββ βββββββ β β βββ¬ββββ¬ββ βββ¬ββββ¬ββ β β 3 β β β β 9 β β βββ΄ββββ΄ββ βββ΄ββββ΄ββ βββββββββββββ β β β β β β βββββββββββββ β β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β βββββ β 3 β β 4 β ββ? β β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β
All we need to do now is to fix the previous leaf node, which still points to the dangling node. To do this, we go down the value right before the now-deleted value, bringing us to the node containing 3 and 4. This node is the one containing the dangling pointer, so all we need to do is to update the link so it points to the next leaf node, which we can easily do here as we saved the reference to the next node. This gives us this tree:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββββ βββββββ β β βββ¬ββββ¬ββ βββ¬ββββ¬ββ β β 3 β β β β 9 β β βββ΄ββββ΄ββ βββ΄ββββ΄ββ βββββββββββββ β β β β β β βββββββββββββ β β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 1 β β 2 β βββββ β 3 β β 4 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β β β
Now, to illustrate how deletion on internal nodes work, let's remove the value 1 and 2 from the tree. When removing multiple values, it's often faster to start with the lowest value and go up, as it allows using the link between leaf nodes. Because of this, we'll first look up 1, and since 1 < 7, the first branch is the one to go down. This leads us to the internal node containing 3, and since 1 < 3, we go down the first value again.
At this point, we've reached the leaf node containing 1 and 2. These are the two values that we want to remove, so we do exactly that. This leaves us with an empty node, so just as before, we remove that node. Now the tree looks like this:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββββ βββββββ β β βββ¬ββββ¬ββ βββ¬ββββ¬ββ β β 3 β β β β 9 β β βββ΄ββββ΄ββ βββ΄ββββ΄ββ β β β β ? β β βββββββββββββ β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 3 β β 4 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β
Then, we move up to the parent node. Since the value pointing to the child node is now dangling, we need to remove it as well as one of the adjacent keys. The only adjacent key is 3, which is on the right. This gives us this tree:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββββ βββββββ β β βββ βββ¬ββββ¬ββ β β β β 9 β β βββ βββ΄ββββ΄ββ β β β β β βββββββββββββ β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 3 β β 4 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β
Since there's only one value left in the internal node, there's no point in keeping the node around anymore, and can therefore be removed to reduce the tree's complexity. However, to avoid losing values, we have to replace the parent's value with the single remaining value of the current node. This gives us this tree:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β βββββββ βββββββ β β β βββ¬ββββ¬ββ β β β 9 β β β βββ΄ββββ΄ββ β β β β β βββββββββββββ β β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 3 β β 4 β βββββ β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β β β
An interesting thing to note, however, is that we don't have to update any links. This is because there is no node before the one that was deleted. As a result, no reference was broken and needed to be updated.
Finally, to illustrate how deletion on root nodes work, let's say we want to remove the keys 3 and 4 from the tree. As always, we first search for the smallest value, which is 3, starting from the root node. 3 < 7, so we go down the first value, leading us to the leaf node containing 3 and 4. As always, we remove the keys 3 and 4 and it's corresponding value on the left. Since this node is now empty, we can then remove the entire node, giving us this tree:
βββ¬ββββ¬ββ β β 7 β β βββ΄ββββ΄ββ β β β βββββββ β β β βββ¬ββββ¬ββ β β β 9 β β ? βββ΄ββββ΄ββ β β β βββββββββββββ β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β
Then, we can proceed back up to the root node to remove the value that is now a dangling reference and one of the adjacent keys. The only adjacent key is 7, so we remove that key, giving us this tree:
βββ β β βββ β β β βββ¬ββββ¬ββ β β 9 β β βββ΄ββββ΄ββ β β βββββββ βββββββ β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β
In this case, we have a root node that only have one value left. As before, we can remove empty internal nodes to reduce the tree's complexity, but now there is a catch: there is no parent node that we can link the child node to. This can easily be fixed, though, by simply moving the root node one step down after removing the current root node. Doing so finally leaves us with this tree:
βββ¬ββββ¬ββ β β 9 β β βββ΄ββββ΄ββ β β βββββββ βββββββ β β βββ¬ββββ¬ββ¬ββββ¬ββ βββ¬ββββ¬ββ¬βββββ¬ββ¬βββββ¬ββ β β 7 β β 8 β βββββ β 9 β β 10 β β 12 β β βββ΄ββββ΄ββ΄ββββ΄ββ βββ΄ββββ΄ββ΄βββββ΄ββ΄βββββ΄ββ β β β β β
Now, the internal node containing the key 9 is the new root node.
B+ trees strength lies in it's ability to not just search efficiently but also insert and remove keys efficiently, too. Because of this, B+ trees are often used for managing and indexing huge amounts of data, making it suitable for relational databases and file systems, among other things.
It's time complexity is divided into two components, one for navigating between nodes (lowercase n) and one for searching in a node (uppercase N). The complexity for each of these are:
f(n) = O(log(n)) f(N) (best) = O(1) f(N) (worst) = O(N)
Because of the divided complexity, the performance of the B+ tree can vary based on the amount of keys and the degree of the tree. Therefore, there is no optimal configuration that suits all purposes.
One catch to be aware of when working with B+ trees are the risks of a tree becoming unbalanced. A tree can become unbalanced when there are a lot of very small nodes, meaning that the tree spends a lot of time making it's way down internal nodes which slows down operations. This commonly occurs when values are deleted unevenly, and is often impossible to predict in practice.
Therefore, it's sometimes necessary to rebalance a tree, but this is often a very costly operation. The easiest way of rebalancing a tree is to create a new one and inserting all keys and values in order, from the smallest to the largest, then replacing the new tree with the old one. However, this is often a slow and resource-consuming process, especially on large trees, so different methods are sometimes preferred. Another way of rebalancing a tree is by merging adjacent nodes by moving keys and values to the adjacent nodes. That way, empty nodes can then be deleted and will therefore reduce unnecessary nodes. This method is less resource-consuming, but might not result in an optimal tree and is also much harder to implement in practice.
When implementing a B+ tree, it's important to keep in mind that there are a few things that can vary in implementations. One common difference is if the degree limits the amount of keys or values. In this case, we limited it by values, but sometimes it might be more convenient to limit it by keys, which is basically the amount of values - 1.
Another strategy that is used for B+ trees is to merge adjacent nodes when deleting values, to quickly reduce tree complexity. However, doing this is much harder than just deleting empty nodes, as care must be taken to make sure leaf nodes arent merged with internal nodes while the tree also remains in order. Therefore, this is not covered in this document, but I do encourage you to do your research on this if you want to, as while it might be challenging, it's also rewarding if you manage to pull it off.