Intro: Just One Single History

Josh combines the advantages of monorepos with those of multirepos by leveraging a blazingly fast,
incremental, and reversible implementation of git history filtering.
Traditionally, history filtering has been viewed as an expensive operation that should only be
done to fix issues with a repository, such as purging big binary files or removing
accidentally committed secrets, or as part of a migration to a different repository structure, like
switching from multirepo to monorepo (or vice versa).
The implementation shipped with git (git filter-branch) is only usable as a once-in-a-lifetime
last resort for anything but tiny repositories.
Faster versions of history filtering have been implemented, such as
git-filter-repo or the
BFG repo cleaner. Those, while much faster, are
designed for doing occasional, destructive maintenance tasks, usually with the idea already in mind
that once the filtering is complete the old history should be discarded.
The idea behind josh started with two questions:
- What if history filtering could be so fast that it can be part of a normal, everyday workflow,
  running on every single push and fetch without the user even noticing?
- What if history filtering was a non-destructive, reversible operation?
Under those two premises a filter operation stops being a maintenance task. It seamlessly relates
histories between repos, which can be used by developers and CI systems interchangeably in whatever
way is most suitable to the task at hand.
How is this possible?
Filtering history is a highly predictable task: the set of filters that tend to be used for any
given repository is limited, and the input to the filter (a git branch) only gets modified in
an incremental way. Thus, by keeping a persistent cache between filter runs, the work needed to
re-run a filter on a new commit (and its history) becomes proportional to the number of changes
since the last run; the work to filter no longer depends on the total length of the history.
Additionally, most filters also do not depend on the size of the trees.
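The caching argument above can be illustrated with a toy model. This is only a sketch and not josh's actual implementation: the CachedFilter class, the dictionary-based history layout, the "f-" id scheme, and the keep_src filter are all hypothetical names made up for the example. It shows that once a history has been filtered, adding one commit costs one unit of work rather than a pass over the whole history.

```python
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, str]                        # path -> file content
History = Dict[str, Tuple[List[str], Tree]]  # commit id -> (parent ids, tree)


class CachedFilter:
    """Toy history filter that remembers every commit it has already filtered."""

    def __init__(self, tree_filter: Callable[[Tree], Tree]) -> None:
        self.tree_filter = tree_filter
        self.cache: Dict[str, str] = {}  # original id -> filtered id, kept between runs
        self.filtered: History = {}      # the rewritten history
        self.commits_processed = 0       # counts the actual filtering work

    def run(self, history: History, head: str) -> str:
        """Filter `head` and any ancestors that are not in the cache yet."""
        stack = [head]
        while stack:
            commit = stack[-1]
            if commit in self.cache:
                stack.pop()
                continue
            parents, tree = history[commit]
            missing = [p for p in parents if p not in self.cache]
            if missing:
                # Parents must be filtered first; revisit this commit afterwards.
                stack.extend(missing)
                continue
            new_id = "f-" + commit  # hypothetical id scheme for the sketch
            self.filtered[new_id] = (
                [self.cache[p] for p in parents],
                self.tree_filter(tree),
            )
            self.cache[commit] = new_id
            self.commits_processed += 1
            stack.pop()
        return self.cache[head]


def keep_src(tree: Tree) -> Tree:
    """Hypothetical filter: keep only files below src/."""
    return {path: data for path, data in tree.items() if path.startswith("src/")}


# Build a linear history of 1000 commits and filter it once.
history: History = {}
for i in range(1000):
    parents = [f"c{i - 1}"] if i else []
    history[f"c{i}"] = (parents, {"src/main.rs": str(i), "docs/notes": "n"})

flt = CachedFilter(keep_src)
flt.run(history, "c999")
first_run = flt.commits_processed

# Add one commit and filter again: only the new commit is processed.
history["c1000"] = (["c999"], {"src/main.rs": "1000", "docs/notes": "n"})
flt.run(history, "c1000")
second_run = flt.commits_processed - first_run

print(first_run, second_run)  # prints: 1000 1
```

In this sketch the cache lives in memory; the text describes a cache that persists between runs, which would require storing the mapping on disk.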
What has long been known to be true for performing merges also applies to history filtering: the
more often it is done, the less work it takes each time.
To guarantee filters are reversible we have to restrict the kind of filter that can be used; it is
not possible to write arbitrary filters using a scripting language, as is allowed in other tools.
To still be able to cover a wide range of use cases we have introduced a domain-specific language to
express more complex filters as a combination of simpler ones. Apart from guaranteeing
reversibility, the use of a DSL also enables pre-optimization of filter expressions to minimize both
the amount of work to be done to execute the filter and the on-disk size of the persistent
cache.
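To make the idea of reversible, composable filters more concrete, here is a small sketch. It does not use josh's real filter language or API: Subdir, Prefix, Chain, and simplify are hypothetical names, and trees are modeled as plain dictionaries. Each filter has an apply step and an unapply step that maps an edited view back onto the original tree, and a chain is reversed by undoing its parts in the opposite order. The simplify function hints at the kind of pre-optimization a restricted DSL allows.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Tree = Dict[str, str]  # path -> file content


@dataclass(frozen=True)
class Subdir:
    """Hypothetical filter: keep only files under `path/` and strip that prefix."""
    path: str

    def apply(self, tree: Tree) -> Tree:
        prefix = self.path + "/"
        return {p[len(prefix):]: c for p, c in tree.items() if p.startswith(prefix)}

    def unapply(self, filtered: Tree, original: Tree) -> Tree:
        prefix = self.path + "/"
        # Files outside the subdirectory stay untouched; files inside are replaced.
        result = {p: c for p, c in original.items() if not p.startswith(prefix)}
        result.update({prefix + p: c for p, c in filtered.items()})
        return result


@dataclass(frozen=True)
class Prefix:
    """Hypothetical filter: move every file below `path/`."""
    path: str

    def apply(self, tree: Tree) -> Tree:
        return {f"{self.path}/{p}": c for p, c in tree.items()}

    def unapply(self, filtered: Tree, original: Tree) -> Tree:
        prefix = self.path + "/"
        return {p[len(prefix):]: c for p, c in filtered.items() if p.startswith(prefix)}


@dataclass(frozen=True)
class Chain:
    """Apply several filters in sequence."""
    filters: Tuple

    def apply(self, tree: Tree) -> Tree:
        for f in self.filters:
            tree = f.apply(tree)
        return tree

    def unapply(self, filtered: Tree, original: Tree) -> Tree:
        # Recompute the input each filter saw on the way forward ...
        inputs = []
        tree = original
        for f in self.filters:
            inputs.append(tree)
            tree = f.apply(tree)
        # ... then undo the filters in reverse order.
        for f, seen in zip(reversed(self.filters), reversed(inputs)):
            filtered = f.unapply(filtered, seen)
        return filtered


def simplify(chain: Chain) -> Chain:
    """Toy pre-optimization: merge adjacent Subdir filters into one."""
    out = []
    for f in chain.filters:
        if out and isinstance(out[-1], Subdir) and isinstance(f, Subdir):
            out[-1] = Subdir(f"{out[-1].path}/{f.path}")
        else:
            out.append(f)
    return Chain(tuple(out))


original = {"libs/a/x.txt": "1", "libs/b/y.txt": "2", "README": "r"}
view_filter = Chain((Subdir("libs/a"), Prefix("vendor")))

view = view_filter.apply(original)          # the filtered view of the tree
edited = dict(view)
edited["vendor/x.txt"] = "1-edited"         # change made in the filtered view
edited["vendor/new.txt"] = "n"
restored = view_filter.unapply(edited, original)  # mapped back to the full tree
```

Because only a fixed set of filter types exists in this sketch, every chain has a well-defined inverse, which would not hold for arbitrary scripted filters.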
From Linus Torvalds' 2007 talk at Google about git:
Audience:
Can you have just a part of files pulled out of a repository, not the entire repository?
Linus:
You can export things as tarballs, you can export things as individual files, you can rewrite the
whole history to say “I want a new version of that repository that only contains that part”, you
can do that, it is a fairly expensive operation, it's something you would do for example when you
import an old repository into a one huge git repository and then you can split it later on to be
multiple smaller ones, you can do it, what I am trying to say is that you should generally try to
avoid it. It's not that git can not handle huge projects, git would not perform as well as it would
otherwise. And you will have issues that you wish you didn't have.
So I am skipping this issue and going back to the performance issue. One of the things I want to
say about performance is that a lot of people seem to think that performance is about doing the
same thing, just doing it faster, and that is not true.
That is not what performance is all about. If you can do something really fast, really well, people
will start using it differently.