The subtleties of correct B+Tree implementation

I discussed earlier that B+Trees are a gnarly beast to implement correctly. On the face of it, it is a actually unusual assertion, as a result of they’re a fairly easy knowledge construction. What’s so advanced in regards to the implementation? You’ve got a set measurement web page, you add to it till it’s full, you then break up the web page, and you’re performed. What’s the trouble?
Right here is a straightforward state of affairs for web page splits, the next web page is totally full. We can’t match one other entry there:
Now, if we attempt to add one other merchandise to the tree, we’ll want to separate the web page, and the consequence might be one thing like this (we add an entry with a key: customers/050):
How did we break up the web page? The code for that is de facto easy:
As you possibly can see, for the reason that knowledge is sorted, we are able to merely take the final half of the entries from the supply, copy them to the brand new web page and name it a day. That is easy, efficient, and can normally work simply fantastic. The important thing phrase right here is normally.
Given a B+Tree that makes use of variable measurement keys, with a web page measurement of 4KB and a most measurement of 1 KB for the keys. On the face of it, this seems like a fairly good setup. If we break up the web page, we are able to make sure that we’ll have sufficient house to accommodate any legitimate key, proper? Nicely, simply so long as the info distribution is smart. It typically doesn’t. Let’s discuss a concrete state of affairs, we could? We retailer within the B+Tree an inventory of public keys.
This seems just like the picture beneath, the place we have now a single web page with 16 entries and three,938 bytes in use, and 158 bytes which can be free. Check out the info for a second, and also you’ll discover some fascinating patterns.
The info is split into two distinct varieties, EdDSA keys and RSA keys. As a result of they’re prefixed with their sort, all of the EdDSA keys are first on the web page, and the RSA keys are final. There’s a massive measurement distinction between the 2 forms of keys. And that seems to be an actual downside for us.
Take into account what is going to occur once we need to insert a brand new key to this web page. We nonetheless have room to a couple extra EdDSA keys, in order that isn’t actually that fascinating, however what occurs once we need to insert a brand new RSA key? There may be not sufficient room right here, so we break up the web page. Utilizing the algorithm above, we get the next tree construction publish break up:
Bear in mind, we have to add an RSA key, so we are actually going to go to the underside proper web page and attempt to add the worth. However there’s not sufficient room so as to add a bit greater than 512 bytes to the web page, is there?
What occurs subsequent is determined by the precise implementation. It’s attainable that you simply’ll get an error, or one other break up, or the tree will try and proceed and do one thing fully weird.
The important thing right here (pun supposed) is that despite the fact that the scenario seems comparatively easy, a wonderfully affordable selection can cover a fairly refined bug for a really very long time. It’s only once you hit the precise problematic circumstances that you simply’ll run into issues.
This has been a reasonably easy downside, however there are numerous such edge circumstances that could be hiding within the weeds of B+Tree implementations. that is without doubt one of the causes that working with manufacturing knowledge is such an enormous situation. Actual world knowledge is messy, it has unpredictable patterns and stuff that you simply’ll doubtless by no means consider. Additionally it is one of the best ways I’ve discovered to smoke out these particulars.