Scaling Rust builds with Bazel
Published: 2023-03-20
Last updated: 2023-03-20
As of March 2023, the Internet Computer repository contains about 600 thousand lines of Rust code.
Last year, we started using Bazel as our primary build system, and we couldn't have been happier with the change.
This article explains the motivation behind this move and the details of the migration process.
Many Rust newcomers, especially those with a C++ background, swear by cargo.
Rust tooling is fantastic for beginners, but we became dissatisfied with cargo as the project size increased.
Most of our complaints fall into two categories.
Cargo is not a build system
Cargo is the Rust package manager.
Cargo downloads your Rust package's dependencies, compiles your packages, makes distributable packages, and uploads them to crates.io.
Let's acknowledge the elephant in the room: cargo is not a build system; it is a tool for building and distributing Rust packages.
It can build Rust code for a specific platform with a given set of features in a single invocation.
Cargo chose simplicity and ease of use over generality and scalability; it doesn't track dependencies well or support arbitrary build graphs.
These trade-offs make cargo easy to pick up but impose severe limitations in a complex project.
There are workarounds, such as xtask, but they will only get you so far.
Let's consider an example of what many of our tests have to do:
- Build a sandbox binary for executing WebAssembly.
- Build a WebAssembly program.
- Post-process the WebAssembly program (strip some custom sections and compress the result, for example).
- Build and execute a test binary that launches the sandbox binary, sends the WebAssembly program to the sandbox, and interacts with the program.
This simple scenario requires invoking cargo three times with different arguments and correctly post-processing and threading the build artifacts.
There is no way to express such a test using cargo alone; another build system must orchestrate the test execution.
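For contrast, the same pipeline maps naturally onto explicit Bazel targets. The sketch below is a simplified illustration, not our actual configuration: the target and file names are made up, the wasm build configuration is elided, and the post-processing tool //tools:strip_sections is hypothetical; rust_binary and rust_test come from rules_rust.

```python
# BUILD.bazel -- hypothetical wiring of the multi-step test described above.
load("@rules_rust//rust:defs.bzl", "rust_binary", "rust_test")

# Step 1: the sandbox binary that executes WebAssembly.
rust_binary(
    name = "wasm_sandbox",
    srcs = ["sandbox/main.rs"],
)

# Step 2: the WebAssembly program (built for a wasm32 target via a
# platform transition or the --platforms flag; configuration omitted here).
rust_binary(
    name = "test_program_wasm",
    srcs = ["program/main.rs"],
)

# Step 3: post-process the WebAssembly (strip custom sections, compress).
genrule(
    name = "test_program_processed",
    srcs = [":test_program_wasm"],
    outs = ["test_program.wasm.gz"],
    cmd = "$(location //tools:strip_sections) $(location :test_program_wasm) | gzip -c > $@",
    tools = ["//tools:strip_sections"],  # hypothetical post-processing tool
)

# Step 4: the test that launches the sandbox and feeds it the processed program.
rust_test(
    name = "sandbox_integration_test",
    srcs = ["tests/integration.rs"],
    data = [
        ":wasm_sandbox",
        ":test_program_processed",
    ],
)
```

Because the whole graph is visible to the build system, only the pieces whose inputs actually changed get rebuilt and rerun.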
Poor caching and dependency tracking
Like the infamous Make, cargo relies on file modification timestamps for incremental builds.
Updating code comments or switching git branches can invalidate cargo's cache, causing long rebuilds.
The sccache tool can improve cache hits, but we saw no improvement from using it on our continuous integration (CI) servers.
Cargo's dependency tracking facilities are relatively simplistic.
For example, we can tell cargo to rerun build.rs if some input files or environment variables change.
However, cargo has no idea which files or other resources tests might be accessing, so it must be conservative with caching.
As a result, we often build much more than we need to, and sometimes our builds fail with confusing errors that go away after cargo clean.
Over the project's life, we used various tools to mitigate cargo's limitations, with mixed success.
The nix days
When we started the Rust implementation in mid-2019, we relied on nix to build all our software and set up the development environment in a cross-platform way (we develop both on macOS and Linux).
As our code base grew, we started to feel nix's limitations.
The unit of caching in nix is a derivation.
If we wanted to take full advantage of nix's caching capabilities, we would have to "nixify" all our external dependencies and internal Rust packages (one derivation per Rust package).
After a long fight with build reproducibility issues, our superb dev-infra team implemented fine-grained caching using the cargo2nix project.
Unfortunately, most developers in the team were uncomfortable with nix.
It became a constant source of confusion and lost developer productivity.
Since nix has a steep learning curve, only a few nix wizards could understand and modify the build rules.
This nix-alienation bifurcated our build environment: the CI servers built the code with nix-build, and developers built the code by entering the nix-shell and invoking cargo.
The iceberg
The final blow to the nix story came around late 2020, close to the network launch.
Our security team chose Ubuntu as the deployment target and insisted that production binaries link against the regularly updated system libraries (libc, libc++, openssl, etc.) the deployment platform provides.
This setup is hard to achieve in nix without compromising correctness (we considered using patchelf, but it is a bad idea in general: libc++ from nix packages can be incompatible with the one installed on the deployment platform).
Furthermore, the infrastructure team got several new members unfamiliar with nix and decided to switch to a more familiar technology, Docker containers.
The team implemented a new build system that runs cargo builds inside a docker container with the versions of dynamic libraries matching those in the production environment.
The new system grew organically and eventually evolved into a hot mess of a hundred GitLab YAML configuration files calling shell and Python scripts in the correct order.
These scripts used known filesystem locations and environment variables to pass the build artifacts around.
Most integration tests ended up as shell scripts expecting some inputs that the CI pipeline produces.
The new Docker-based build system lost the granular caching capabilities of nix-build.
The infra team tried to build a custom caching system but eventually abandoned the project.
Cache invalidation is a hard problem indeed.
With the new system, the chasm between the CI and development environments deepened further because the nix-shell didn't go anywhere.
The developers continued to use nix-shell for everyday development.
It's hard to pinpoint the exact reason.
I attribute that to the fact that entering the nix-shell is less invasive than entering a docker container, and nix-shell doesn't require running in a virtual machine on macOS (Rust compile times are slow).
Also, the infra team was so busy rewriting the build system that improving the everyday developer experience was out of reach.
I call this setup an "iceberg": on the surface, a developer needed only nix and cargo to work on the code, but in practice, that was only 10% of the story.
Since most tests required a CI environment, developers had to create merge requests to check whether their code worked beyond the basic unit tests.
The CI didn't know developers were interested in running a specific test and executed the entire test suite, wasting scarce computing resources and slowing down the development cycle.
The tests accumulated over time, the load on the CI system grew, and eventually, the builds became unbearably slow and flaky.
It was time for another change.
Enter Bazel
Among the couple dozen build systems I have worked with, Bazel is the only one that made sense to me (it might also well be that I never learned to do anything without involving protocol buffers).
One of my favorite features of Bazel is how explicit and intuitive it is for everyday use.
Bazel is like a good videogame: it's easy to learn and hard to master.
It's easy to define and wire build targets (that's what most engineers do), but adding new build rules requires some expertise.
Every engineer at Google can write correct build files without knowing much about Blaze (Google's internal variant of Bazel).
The build files are verbose, bordering on plain boring, but that's a good thing.
They tell the reader precisely what the module's artifacts and dependencies are.
Bazel offers many features, but we mostly cared about the following:
- Extensibility. Bazel is extensible enough to cover all our use cases. It gracefully handled everything we threw at it: Linux and macOS binaries, WebAssembly programs, OS images, Docker containers, Motoko programs, TLA+ specifications, etc. The best part is: we can also mix and match these artifacts in any way we like.
- Aggressive caching. The sandboxing feature ensures that build actions don't use undeclared dependencies, making it much safer to cache build artifacts and, most importantly for us, test results.
- Remote caching. We use the cache from our CI system to speed up developer builds.
- Distributed builds. Bazel can distribute tasks across multiple machines to finish builds even faster.
- Visibility control. Bazel allows package authors to mark some packages as internal to prevent other teams from importing the code. Controlling dependency graphs is crucial for fast builds (see the sketch after this list).
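To illustrate the visibility point, a BUILD file can restrict which packages may depend on a target; the package path and target name below are made up:

```python
# BUILD.bazel for a package that only //rs/consensus and its subpackages may use.
load("@rules_rust//rust:defs.bzl", "rust_library")

rust_library(
    name = "consensus_internals",
    srcs = glob(["src/**/*.rs"]),
    visibility = ["//rs/consensus:__subpackages__"],
)
```

A target outside that subtree that tries to depend on this library fails immediately at analysis time.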
Even more importantly, Bazel unifies our development and CI environments.
All our tests are Bazel tests now, meaning that every developer can run any test locally.
At its heart, our CI job is bazel test --config=ci //...
One nice feature of our Bazel setup is that we can configure the versions of our external dependencies in a single file.
Ironically, cargo developers implemented support for workspace dependency inheritance a few weeks after we finished the migration.
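One way to get that single-file setup with rules_rust is its crate_universe machinery; the snippet below is a minimal sketch under the assumption of a crates_repository named crate_index, not a copy of our actual configuration:

```python
# WORKSPACE (or a .bzl file loaded from it): external crate versions in one place.
load("@rules_rust//crate_universe:defs.bzl", "crate", "crates_repository")

crates_repository(
    name = "crate_index",
    cargo_lockfile = "//:Cargo.lock",
    lockfile = "//:cargo-bazel-lock.json",
    packages = {
        "serde": crate.spec(version = "1.0", features = ["derive"]),
        "tokio": crate.spec(version = "1", features = ["full"]),
    },
)
```

Build targets then refer to external crates as @crate_index//:serde, and bumping a version touches exactly one file.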
You are such a naïve academic. I asked you how to do it, and you told me what I should do. I know what I need to do. I just don't know how to do it.
The idea of migrating the build system came from a few engineers (read: Xooglers) who were tired of fighting with long build times and poor tooling.
To our surprise, several volunteers expressed interest in joining the rebellion at its earliest stage.
We needed a plan for executing the change and getting the management's buy-in.
The first rule of large codebases is to introduce significant changes gradually. This section describes our migration strategy, which took several months.
Build a prototype
We started the migration by building a prototype.
We created a sample repository that mimicked the features of our code base that we expected to bring the most trouble, such as generating Protocol Buffer types using the prost library, compiling Rust to WebAssembly and native code in a single invocation, and setting up rust-analyzer support.
Once we knew that the most complicated problems we face had a solution at a small scale, we presented the case to the management, explained the final vision and how many people and how much time we needed, and got a green light.
Now the real work began.
Dig a tunnel from the middle
Our CI was a multi-stage process that treated cargo as a black box producing binaries from the source code.
There were two major work streams in our mission to reduce build times:
- Replace the spaghetti of YAML files and scripts that used cargo as a black box with neat Bazel targets with explicit dependencies. This change would bring clarity and confidence to our CI routines and enable developers to access the build artifacts without a full CI run.
- Use Bazel to build binaries from Rust code directly, bypassing cargo. This change would significantly improve our cache hit rate and allow us to avoid running expensive tests on every change.
These work streams require different skill sets, and we wanted to start working on them in parallel.
To unblock the first workstream, we created a simple Bazel rule, cargo_build, that treated cargo as a black box and produced binaries for deployment and tests (a rough sketch of the idea follows below).
This way, our infrastructure experts could figure out how to build OS images with Bazel, while our Rust experts could proceed with the Rust code "bazelification".
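We never published the cargo_build rule, but its spirit can be approximated with a genrule that shells out to cargo and exposes the resulting binary as an ordinary Bazel artifact. The snippet below is a simplified, hypothetical sketch (the binary name and paths are made up): such an action needs network access or vendored crates, so it is a stopgap rather than a real Bazel integration.

```python
# Simplified sketch: treat cargo as a black box inside a single Bazel action.
genrule(
    name = "replica_via_cargo",
    srcs = glob(["rs/**"]) + ["Cargo.toml", "Cargo.lock"],
    outs = ["replica"],
    cmd = """
        export CARGO_TARGET_DIR=$$(mktemp -d)
        cargo build --release --manifest-path $(location Cargo.toml) --bin replica
        cp $$CARGO_TARGET_DIR/release/replica $@
    """,
    # Opt out of sandboxing and allow network access for crates.io downloads.
    tags = [
        "no-sandbox",
        "requires-network",
    ],
)
```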
Run CI early
We added the bazel test //... job to our CI pipeline as soon as we had the first BUILD file in our repository.
The extra job slightly increased the CI wait time but ensured that packages converted to Bazel wouldn't degrade over time.
As a side benefit, developers started to experience Bazel-related CI failures during their code refactorings.
They actively learned to modify BUILD files and gradually became accustomed to the new world.
One package at a time
The goal of the second workstream was converting a few hundred Rust packages to the new build rules.
We started from the core packages at the bottom of the stack that needed special treatment, and then project volunteers bazelified a few packages at a time whenever they had a free time slot.
Two little tricks helped us with this tedious task:
- Automation. The infra team invested a few days in a script that converted a Cargo.toml file to a 90% complete BUILD file matching our guidelines (see the sketch after this list for the flavor of its output). Many packages required manual treatment, and the generated BUILD file was far from optimal, but the script sped up the conversion process significantly.
- Progress visualization. One team member wrote a utility visualizing the migration progress by inspecting the cargo dependency graph and searching for packages with and without BUILD files. This little tool had a great effect on our morale.
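For a taste of what such a generated file might look like, here is an illustrative example; the crate and dependency names are made up, and the real script also had to deal with features, build scripts, and proc-macro crates:

```python
# BUILD.bazel derived from a package's Cargo.toml (illustrative only).
load("@rules_rust//rust:defs.bzl", "rust_library")

rust_library(
    name = "ic_types",
    srcs = glob(["src/**/*.rs"]),
    crate_name = "ic_types",
    proc_macro_deps = ["@crate_index//:serde_derive"],
    deps = [
        "//rs/crypto/sha2",      # internal dependency (path made up)
        "@crate_index//:serde",  # external dependency via crate_universe
    ],
    visibility = ["//visibility:public"],
)
```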
Eventually, we could build and test every piece of our Rust code with Bazel. We then switched the OS build from the cargo_build bootstrapping to the binaries built from the source using Bazel rules.
Ensure test parity
The last piece of the puzzle was ensuring test parity.
Cargo discovers tests automagically, while Bazel BUILD files require explicit targets for each type of test (crate tests, doc tests, integration tests).
The infra team wrote another little utility that analyzed the outputs of the cargo and Bazel build pipelines and compared the lists of executed tests, ensuring that the volunteers accounted for every test during the migration and that developers didn't forget to update BUILD files when they added new tests.
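Concretely, a single cargo package tends to fan out into several explicit test targets. The sketch below shows one plausible layout under assumed names; rust_test and rust_doc_test come from rules_rust:

```python
load("@rules_rust//rust:defs.bzl", "rust_doc_test", "rust_library", "rust_test")

rust_library(
    name = "ledger",
    srcs = glob(["src/**/*.rs"]),
)

# Crate tests: the #[cfg(test)] modules inside the library itself.
rust_test(
    name = "ledger_unit_tests",
    crate = ":ledger",
)

# Doc tests extracted from the crate's documentation comments.
rust_doc_test(
    name = "ledger_doc_tests",
    crate = ":ledger",
)

# An integration test living in the tests/ directory.
rust_test(
    name = "ledger_integration_test",
    srcs = ["tests/transfer.rs"],
    deps = [":ledger"],
)
```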
Rough edges
Bazel solves most of our needs regarding building artifacts, but we have yet to replicate a few cargo features related to developer flow.
Because of these issues, we still keep cargo files around. Luckily, this doesn't affect our CI times much because the only check we need is that cargo check --tests --benches succeeds.
The Bazel migration project was a definitive success.
I thank our talented infra team and all the volunteers who contributed to the project.
Special thanks go to the developers and maintainers of the rules_rust Bazel plugin, who unblocked us many times during the migration, especially Andre Uebel and Daniel Wagner-Hall, and to Alex Kladov for taking the time to share his rust-analyzer expertise.
You can discuss this article on Reddit.