Performant database tree traversal with Rails


2023-07-12 12:41:11

We recently solved an interesting performance problem in our Rails API when performing a tree traversal.

We have some code that figures out a 3-way merge for database schemas. To do this, we need to calculate a merge base between two schemas (like git!).

Our Rails API keeps track of changes to each PlanetScale database schema. We call each of these changes a "schema snapshot," similar to a git commit that stores the state of the schema at a specific point in time. Each snapshot can have one or two parents. When merging branches, we perform a breadth-first search on the history of each change until we find the common ancestor between both branches. This is the merge base.

A diagram showing the merge base with two arrows coming out of it: one for alter table teams and another for alter table users. The alter table users node has an arrow coming out of it for create table payments.

Finding the merge base involves checking each snapshot in the history of both branches.

Usually, this is fast because most databases only have a couple of changes.

Others, though, would have thousands. This is where we would run into performance issues, since we were doing a database query for every step as we traversed the tree. Though each query was fast, the network time alone added up quickly.

select * from schema_snapshots where id = 20
select * from schema_snapshots where id = 21
select * from schema_snapshots where id = 22
select * from schema_snapshots where id = 23
select * from schema_snapshots where id = 24
-- *thousands more queries*

This is a classic "N+1" performance problem. For each step in the process we trigger another query.

Usually, fixing these in Rails is quite simple. There is an includes method for preloading the data we need. Unfortunately, in this case, due to the data structure, the normal preloading techniques don't work. We had to invent something new.

In-memory cache

First, we started by creating an in-memory cache for storing the snapshots. This is a short-term cache that only exists to find the merge base. In-memory is important here because our main performance concern was caused by the sheer number of network requests.

class SnapshotMergeBaseCache
  attr_reader :store, :hits, :misses

  def initialize
    @store = {}
    @misses = 0
    @hits = 0
  end

  def get(id)
    if @store[id]
      @hits += 1
      return @store[id]
    end

    @misses += 1
    nil
  end

  def add(node)
    @store[node.id] = node
  end
end

This gives us a place to store all of our snapshots in memory. It uses a simple hash where the id is the key, and the value is the snapshot data.

The add method puts a snapshot into the cache. The get method is for retrieving it. The get method also keeps stats on the hits/misses. We used these stats to understand how well it worked once in production.

Preloading the cache

Now that we have the cache, the next step is bulk-loading it with snapshots. Conveniently, the snapshot history is fairly predictable when finding the merge base. We can preload the X most recent snapshots for each branch and drastically reduce the number of trips back to the database.

cache = SnapshotMergeBaseCache.new

from_branch.recent_snapshots.limit(FROM_PRELOAD_COUNT).each do |snap|
  cache.add(snap)
end

into_branch.recent_snapshots.limit(INTO_PRELOAD_COUNT).each do |snap|
  cache.add(snap)
end

Now, when running our breadth-first search, we can use cache.get(id) to find the next node. It hits the cache most of the time, avoiding the network request and solving our performance problem.
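To make the traversal concrete, here is a minimal, self-contained sketch of a cached breadth-first merge-base search. The Snapshot struct and find_merge_base are hypothetical stand-ins for our real models, not the production code:

```ruby
# Hypothetical sketch of a cached breadth-first merge-base search.
# Snapshot and find_merge_base are illustrative names, not the real models.
Snapshot = Struct.new(:id, :parent_ids)

def find_merge_base(cache, from_id, into_id)
  seen_from = {}
  seen_into = {}
  # Alternate between both branches, one step at a time.
  queue = [[from_id, seen_from], [into_id, seen_into]]

  until queue.empty?
    id, seen = queue.shift
    next if id.nil? || seen[id]
    seen[id] = true

    # The first snapshot reached from both branches is the merge base.
    return id if seen_from[id] && seen_into[id]

    node = cache.get(id) # the real code falls back to the database on a miss
    node&.parent_ids&.each { |pid| queue << [pid, seen] }
  end

  nil
end
```

In this sketch the cache is assumed to hold every snapshot; the real implementation falls back to a database query on a miss.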

Rolling out & testing

Making changes like this can be tricky. There is often a wide gap between what you expect to happen and the reality of production.


First, we needed to make sure it was accurate. We ran a few tests where we calculated the merge base using both the old and new methods for thousands of databases. This made us confident the new code was returning the correct results.

We then used feature flags to test rolling out the new code path and recorded data on how it performed. The hits and misses data proved useful for fine-tuning the number of snapshots we preloaded. After a couple of iterations, we released it to 100% of our customers.

Adding an in-memory cache is just one way of solving this problem. It worked out best for us due to the high number of snapshots we needed to traverse for some databases. It was also easy to layer this solution onto our existing code without many major changes, which reduced the risk when rolling it out.

Database recursive CTE

One option for solving this is letting the database do the work. This can be done with a recursive common table expression. With this, the database can follow the pointer to each record until it finds the common merge base.
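As a rough sketch, a recursive CTE that walks the history of one snapshot might look like the following. The parent_id column and starting id are hypothetical; the real query would also need to handle snapshots with two parents and walk both branches:

```sql
-- hypothetical sketch: walk the ancestry of snapshot 24
-- assumes a single parent_id column; real snapshots can have two parents
with recursive ancestry as (
  select id, parent_id from schema_snapshots where id = 24
  union all
  select s.id, s.parent_id
  from schema_snapshots s
  join ancestry a on s.id = a.parent_id
)
select id from ancestry;
```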

Materialized path

The materialized path method is a way to represent a graph in a SQL database. It stores the relationship history in a single column, such as 20/19/15/10/5/3/1. By doing this, you can then look at the history of two nodes and find their common parent.

This is a great option that works well for tree structures with a known limit. In our case, needing to store thousands of relationships in a single column made this infeasible.
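For illustration, finding the common parent from two materialized paths only takes a couple of array operations. This is a sketch assuming paths are stored newest-first, as in the 20/19/15/10/5/3/1 example above:

```ruby
# Sketch: given two materialized paths (newest id first, e.g. "20/19/15/10/5/3/1"),
# the common parent is the first ancestor that appears in both histories.
def common_parent(path_a, path_b)
  ids_b = path_b.split("/")
  path_a.split("/").find { |id| ids_b.include?(id) }
end
```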
