Is coding in Rust as dangerous as in C++?
A practical comparison of build and test speed between C++ and Rust.
Written by strager on
C++ is notorious for its slow build times. “My code’s compiling” is a
meme in the programming world, and C++ keeps this joke alive.
Projects like Google Chromium take
an hour to build
on brand new hardware and
6 hours to build
on older hardware. There are tons of
documented tweaks
to make builds faster, and
error-prone shortcuts
to compile less stuff. Even with thousands of dollars of cloud
computing power, Chromium build times are still on the order of half
a dozen minutes. That is completely unacceptable to me. How can people
work like this every day?
I’ve heard the same thing about Rust: build times are a huge problem.
But is it really a problem in Rust, or is this anti-Rust
propaganda? How does it compare to C++’s build time problem?
I deeply care about build speed and runtime performance. Fast build-test
cycles make me a productive, happy programmer, and I bend over backwards
to make my software fast so my customers are happy too. So I decided to
see for myself whether Rust build times were as bad as they claim. Here
is the plan:
- Find an open source C++ project.
- Isolate part of the project into its own mini project.
- Rewrite the C++ code line-by-line into Rust.
- Optimize the build for both the C++ project and the Rust project.
- Compare compile+test times between the two projects.
My hypotheses (educated guesses, not conclusions):
- The Rust port will have slightly fewer lines of code than the C++
version. Most functions and methods need to be declared twice in C++ (once in
the header, and once in the implementation file). This isn’t needed
in Rust, reducing the line count.
- For full builds, C++ will take longer to compile than Rust (i.e.
Rust wins). This is because of C++’s
#include
feature and C++
templates, which need to be compiled once per .cpp file. This
compilation is done in parallel, but parallelism is imperfect.
- For incremental builds, Rust will take longer to compile than
C++ (i.e. C++ wins). This is because Rust compiles one crate at a time, rather than
one file at a time like in C++, so Rust has to look at more code
after each small change.
What do you think? I polled my audience to get their opinion:
42% of people think that C++ will win the race.
35% of people agree with me that “it depends™”.
And 17% of people think Rust will prove us all wrong.
Check out the
optimizing Rust build times section
if you just want to make your Rust project build faster.
Check out the
C++ vs Rust build times section if
you just want the C++ vs Rust comparisons.
Let’s get started!
Making the C++ and Rust test subjects
Finding a project
If I’m going to spend a month rewriting code, what code should I port? I
chose several criteria:
- Few or no third-party dependencies. (Standard library is okay.)
- Works on Linux and macOS. (I don’t care much about build times on
Windows.)
- Extensive test suite. (Without one, I wouldn’t know if my Rust code
was correct.)
- A little bit of everything: FFI; pointers; standard and custom
containers; utility classes and functions; I/O; concurrency; generics;
macros; SIMD; inheritance
The choice is easy: port the project I’ve been working on for the past
couple of years! I’ll port the JavaScript lexer in the
quick-lint-js project.
Trimming the C++ code
The C++ portion of quick-lint-js contains over 100k
SLOC. I’m not going to port that much code to Rust; that would
take me half a year! Let’s instead focus on just the JavaScript lexer.
This pulls in other parts of the project:
- Diagnostic system
- Translation system (used for diagnostics)
- Various memory allocators and containers (e.g. bump allocator;
SIMD-friendly string)
- Various utility functions (e.g. UTF-8 decoder; SIMD intrinsic
wrappers)
- Test helper code (e.g. custom assertion macros)
- C API
Sadly, this subset doesn’t include any concurrency or I/O. This
means I can’t test the compile time overhead of Rust’s
async
/await
. But that’s a small part of
quick-lint-js, so I’m not too concerned.
I started the project by copying all of the C++ code, then deleting code I
knew was not relevant to the lexer, such as the parser and LSP server. I
actually ended up deleting too much code and had to add some back in. I
kept trimming and trimming until I couldn’t trim no more. Throughout the
process, I kept the C++ tests passing.
After stripping the quick-lint-js code down to the lexer (and everything
the lexer needs), I end up with about 17k SLOC of C++:

| | C++ SLOC |
|---|---|
| src | 9.3k |
| test | 7.3k |
| subtotal | 16.6k |
| dep: Google Test | 69.7k |
The rewrite
How am I going to rewrite thousands of lines of messy C++ code? One file
at a time. Here’s the process:
- Find a nice module to convert.
- Copy-paste the code and tests, search-replace to fix some syntax, then
keep running cargo test until the build and tests pass.
- If it turns out I needed another module first, go to step 2 for that
needed module, then come back to this module.
- If I’m not done converting everything, go to step 1.
There’s one major difference between the Rust and C++ projects which
might affect build times. In C++, the diagnostics system is implemented
with a lot of code generation, macros, and constexpr
. In
the Rust port, I use code generation, proc macros, normal macros, and a
dash of const
. I’ve heard claims that proc macros are
slow, and other claims that proc macros are only slow because they’re
usually poorly written. I hope I did a good job with my proc macros.
The Rust project turns out to be slightly larger than the C++ project:
17.1k SLOC of Rust compared to 16.6k SLOC of C++:

| | C++ SLOC | Rust SLOC | C++ vs Rust SLOC |
|---|---|---|---|
| src | 9.3k | 9.5k | +0.2k (+1.6%) |
| test | 7.3k | 7.6k | +0.3k (+4.3%) |
| subtotal | 16.6k | 17.1k | +0.4k (+2.7%) |
| dep: Google Test | 69.7k | | |
| dep: autocfg | | 0.6k | |
| dep: lazy_static | | 0.4k | |
| dep: libc | | 88.6k | |
| dep: memoffset | | 0.6k | |
Optimizing the Rust build
I care a lot about build times. Therefore, I had already optimized build
times for the C++ project (before trimming it down). I need to put in a
similar amount of effort into optimizing build times for the Rust
project.
Let’s try these things which might improve Rust build times:
Faster linker
My first step is to profile the build. Let’s first profile using the
-Zself-profile rustc flag. In my project, this flag outputs two different files. In one of the
files, the run_linker
phase stands out:

| Item | Self time | % of total time |
|---|---|---|
| run_linker | 129.20ms | 60.326 |
| LLVM_module_codegen_emit_obj | 23.58ms | 11.009 |
| LLVM_passes | 13.63ms | 6.365 |
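For reference, the profiling setup can be reproduced with a Cargo config along these lines (a sketch; -Zself-profile requires a nightly toolchain, and the output directory name here is an assumption):

```toml
# .cargo/config.toml — sketch: ask rustc to emit self-profiling data
# (nightly only). Each rustc invocation writes .mm_profdata files,
# which can be summarized with the measureme project's tools.
[build]
rustflags = ["-Zself-profile=target/self-profile"]
```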
In the past, I successfully improved C++ build times by switching to the
Mold linker. Let’s try it
with my Rust project:
Shame; the improvement, if any, is barely noticeable.
That was Linux. macOS also has alternatives to the default linker: lld
and zld. Let’s try those:
On macOS, I also see little to no improvement by switching away from the
default linker. I suspect that the default linkers on Linux and macOS
are doing a good enough job with my small project. The optimized linkers
(Mold, lld, zld) shine for big projects.
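For anyone wanting to try the same thing, opting into Mold per project looks roughly like this (a sketch; assumes mold is installed and Clang is available as the linker driver on x86_64 Linux):

```toml
# .cargo/config.toml — sketch: link with Mold on x86_64 Linux
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```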
Cranelift backend
Let’s look at the
-Zself-profile
profiles again. In another file, the
LLVM_module_codegen_emit_obj
and
LLVM_passes
phases stood out:

| Item | Self time | % of total time |
|---|---|---|
| LLVM_module_codegen_emit_obj | 171.83ms | 24.274 |
| typeck | 57.50ms | 8.123 |
| eval_to_allocation_raw | 54.56ms | 7.708 |
| LLVM_passes | 50.03ms | 7.068 |
| codegen_module | 40.58ms | 5.733 |
| mir_borrowck | 36.94ms | 5.218 |

I heard talk about alternative rustc backends to LLVM, namely Cranelift.
If I build with the
rustc Cranelift backend, -Zself-profile looks promising:

| Item | Self time | % of total time |
|---|---|---|
| define function | 69.21ms | 12.307 |
| typeck | 57.94ms | 10.303 |
| eval_to_allocation_raw | 55.77ms | 9.917 |
| mir_borrowck | 37.44ms | 6.657 |

Unfortunately, actual build times are worse with Cranelift than with
LLVM:
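If you want to try Cranelift yourself, the setup has changed since this experiment; on recent nightlies it can be sketched like this (the component name and Cargo feature reflect my understanding of current tooling, not what was used here):

```toml
# Cargo.toml — sketch: opt the dev profile into the Cranelift backend.
# Requires nightly and: rustup component add rustc-codegen-cranelift-preview
cargo-features = ["codegen-backend"]

[profile.dev]
codegen-backend = "cranelift"
```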
Compiler and linker flags
Compilers have a bunch of knobs to speed up builds (or slow them down).
Let’s try a bunch:
- -Zshare-generics=y (rustc) (Nightly only)
- -Clink-args=-Wl,-s (rustc)
- debug = false (Cargo)
- debug-assertions = false (Cargo)
- incremental = true and incremental = false (Cargo)
- overflow-checks = false (Cargo)
- panic="abort" (Cargo)
- lib.doctest = false (Cargo)
- lib.test = false (Cargo)
Note: quick, -Zshare-generics=y is the same as
quick, incremental=true but with the
-Zshare-generics=y flag enabled. Other bars exclude
-Zshare-generics=y because that flag is not stable (thus
requires the nightly Rust compiler).
Most of these knobs are documented elsewhere, but I haven’t seen anyone
mention linking with -s. -s strips debug information,
including debug information from the statically-linked Rust standard library.
This means the linker needs to do less work, reducing link times.
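Put together, the Cargo-side knobs from the list above look something like this (a sketch of the kind of profile experimented with, not a recommended configuration; panic = "abort" in particular conflicts with the default test harness):

```toml
# Cargo.toml — sketch of build-speed knobs tried above
[profile.dev]
debug = false              # skip emitting debug info
debug-assertions = false   # skip debug_assert! checks
incremental = true         # also benchmarked with false
overflow-checks = false    # skip integer overflow checks

[lib]
test = false               # no unit test harness for the lib target
doctest = false            # skip doc tests

# .cargo/config.toml — pass -s to the linker to strip debug info:
# [build]
# rustflags = ["-Clink-args=-Wl,-s"]
```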
Workspace and test layouts
Rust and Cargo have some flexibility in how you place your files on
disk. For this project, there are three reasonable layouts:
In theory, if you split your code into multiple crates, Cargo can
parallelize rustc invocations. Because I have a 32-thread CPU on my
Linux machine, and a 10-thread CPU on my macOS machine, I expect
unlocking parallelization to reduce build times.
For a given crate, there are also multiple places for your tests in a
Rust project:
Because of dependency cycles, I couldn’t benchmark the
tests inside src files layout. But I did benchmark the other
layouts in some combinations:
The workspace configurations (with either separate test
executables (many test exes) or one merged test executable (1 test exe)) seem to be the all-around winner. Let’s stick with the
workspace; many test exes configuration from here onward.
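As a sketch, the winning workspace; many test exes layout looks roughly like this (crate and file names are hypothetical, not the actual quick-lint-js names):

```toml
# Cargo.toml at the workspace root — sketch.
# On-disk layout (hypothetical names):
#   Cargo.toml                 <- this file
#   lexer/Cargo.toml           <- library crate
#   lexer/src/lib.rs
#   lexer/tests/test_lex.rs    <- each tests/*.rs file becomes
#   lexer/tests/test_utf_8.rs  <-   its own test executable
[workspace]
members = ["lexer"]
```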
Minimizing dependency features
Many crates support optional features. Sometimes, optional features are
enabled by default. Let’s see what features are enabled using the
cargo tree command:
The libc
crate has a feature called std
. Let’s
turn it off, test it, and see if build times improve:
Build times are no better. Maybe the std
feature
doesn’t actually do anything meaningful? Oh well. On to the next tweak.
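The inspection and the tweak can be sketched as follows (feature listing via cargo tree, then overriding libc’s defaults; the version number is illustrative):

```toml
# Inspect which features each dependency enables:
#   cargo tree --edges features
#
# Cargo.toml — sketch: disable libc's default features (including std)
[dependencies]
libc = { version = "0.2", default-features = false }
```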
cargo-nextest
cargo-nextest is a tool which claims to
be “up to 60% faster than cargo test.” My Rust code base is
44% tests, so maybe cargo-nextest is just what I need. Let’s try it and
compare build+test times:
On my Linux machine, cargo-nextest either doesn’t help or makes things
worse. The output does look pretty, though…
How about on macOS?
cargo-nextest does slightly speed up builds+tests on my MacBook Pro. I
wonder why the speedup is OS-dependent. Perhaps it’s actually
hardware-dependent?
From here on, on macOS I will use cargo-nextest, but on Linux I will
not.
Custom-built toolchain with PGO
For C++ builds, I found that building the compiler myself with
profile-guided optimizations (PGO, also known as
FDO) gave
significant performance wins. Let’s try PGO with the Rust toolchain.
Let’s also try
LLVM BOLT
to further optimize rustc. And -Ctarget-cpu=native
as well.
Compared to C++ compilers, it looks like the Rust toolchain published
via rustup is already well-optimized. PGO+BOLT gave us less than a 10%
performance boost. But a perf win is a perf win, so let’s use this
faster toolchain in the fight versus C++.
When I first tried building a custom Rust toolchain, it was slower than
Nightly by about 2%. I struggled for days to at least reach parity,
tweaking all sorts of knobs in Rust’s config.toml
, and
cross-checking Rust’s CI build scripts with my own. As I was putting the
finishing touches on this article, I decided to
rustup update
, git pull
, and re-build the
toolchain from scratch. Then my custom toolchain was faster! I guess
this was what I needed; perhaps I was accidentally on the wrong commit
in the Rust repo.
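For the curious, the PGO build of rustc itself is driven by Rust’s bootstrap config; a heavily simplified sketch follows (option names reflect my understanding of the bootstrap’s config.toml and may differ between versions; check config.example.toml in the Rust repo):

```toml
# config.toml for building rustc — PGO sketch, two-pass build.
# Pass 1: instrument rustc so it writes profile data when run.
[rust]
profile-generate = "/tmp/rustc-pgo"

# ...then compile a representative workload with the instrumented
# rustc, merge the profiles with llvm-profdata, and rebuild with:
# [rust]
# profile-use = "/tmp/rustc-pgo/merged.profdata"
```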
Optimizing the C++ build
When working on the original C++ project, quick-lint-js, I already
optimized build times using common techniques, such as using
PCH, disabling exceptions and
RTTI, tweaking build
flags, removing unnecessary #include
s, moving code out of
headers, and extern
-ing template instantiations. But there
are several C++ compilers and linkers to choose from. Let’s compare them
and choose the best before I compare C++ with Rust:
On Linux, GCC is a clear outlier. Clang fares much better. My
custom-built Clang (which is built with PGO and BOLT, like my custom
Rust toolchain) really improves build times compared to Ubuntu’s Clang.
libstdc++ builds slightly faster on average than libc++. Let’s use my
custom Clang with libstdc++ in my C++ vs Rust comparison.
On macOS, the Clang toolchain which comes with Xcode seems to be
better-optimized than the Clang toolchain from LLVM’s website. I’ll use
the Xcode Clang for my C++ vs Rust comparison.
C++20 modules
My C++ code uses #include
. But what about
import
, introduced in C++20? Aren’t C++20 modules supposed
to make compilation super fast?
I tried to use C++20 modules for this project. As of
, CMake support for modules on Linux is so
experimental that even
‘hello world’ doesn’t work.
Maybe 2023 will be the year of C++20 modules. As someone who cares a lot
about build times, I really hope so! But for now, I will pit Rust
against classic C++ #include
s.
C++ vs Rust build times
I ported the C++ project to Rust and optimized the Rust build times as
much as I could. Which one compiles faster: C++ or Rust?
Unfortunately, the answer is: it depends!
On my Linux machine, Rust builds are sometimes faster than C++ builds,
but sometimes slower or the same speed. In the
incremental lex benchmark, which modifies the largest src file,
Clang was faster than rustc. But for the other incremental benchmarks,
rustc came out on top.
On my macOS machine, however, the story is very different. C++ builds
are usually much faster than Rust builds. In the
incremental test-utf-8 benchmark, which modifies a medium-sized
test file, rustc compiled slightly faster than Clang. But for the other
incremental benchmarks, and for the full build benchmark, Clang clearly
came out on top.
Scaling beyond 17k SLOC
I benchmarked a 17k SLOC project. But that was a small project.
How do build times compare for a larger project of, say, 100k SLOC or
more?
To test how well the C++ and Rust compilers scale, I took the biggest
module (the lexer) and copy-pasted its code and tests, making 8, 16, and
24 copies.
Because my benchmarks also include the time it takes to run tests, I
expect times to increase linearly, even with instant build times.

| | C++ SLOC | | Rust SLOC | |
|---|---|---|---|---|
| 1x | 16.6k | | 17.1k | |
| 8x | 52.3k | (+215%) | 43.7k | (+156%) |
| 16x | 93.1k | (+460%) | 74.0k | (+334%) |
| 24x | 133.8k | (+705%) | 104.4k | (+512%) |
Both Rust and Clang scaled linearly, which is nice to see.
For C++, changing a header file (incremental diag-types) led to
the biggest change in build time, as expected. Build time scaled with a
low factor for the other incremental benchmarks, mostly thanks to the
Mold linker.
I am disappointed with how poorly Rust’s build scales, even with the
incremental test-utf-8 benchmark which shouldn’t be affected that
much by adding unrelated files. This test uses the
workspace; many test exes crate layout, which means test-utf-8
should get its own executable which should compile independently.
Conclusion
Are compilation times a problem with Rust? Yes. There
are some tips and tricks to speed up builds, but I didn’t find the
magical order-of-magnitude improvements which would make me happy
developing in Rust.
Are build times as bad with Rust as with C++? Yes. And
for bigger projects, development compile times are worse with Rust than
with C++, at least with my code style.
Looking at my hypotheses, I was wrong on all counts:
- The Rust port had more lines than the C++ version, not fewer.
- For full builds, compared to Rust, C++ builds took about the same
amount of time (17k SLOC) or took less time (100k+ SLOC), not longer.
- For incremental builds, compared to C++, Rust builds were sometimes
shorter and sometimes longer (17k SLOC) or much longer (100k+ SLOC),
not always longer.
Am I sad? Yes. During the porting process, I’ve learned to like some
aspects of Rust. For example, proc macros would let me replace three
different code generators, simplifying the build pipeline and making
life easier for new contributors. I don’t miss header files at all. And
I appreciate Rust’s tooling (especially Cargo, rustup, and miri).
I decided not to port the rest of quick-lint-js to Rust.
But… if build times improve significantly, I might change my
mind! (Unless I become enchanted by
Zig first.)
Appendix
Source code
Source code for
the trimmed C++ project, the Rust port (including different project
layouts), code generation scripts, and benchmarking scripts.
GPL-3.0-or-later.
Linux machine
- name: strapurp
- CPU: AMD Ryzen 9 5950X (PBO; stock clocks) (32 threads) (x86_64)
- RAM: G.SKILL F4-4000C19-16GTZR 2×16 GiB (overclocked to 3800 MT/s)
- OS: Linux Mint 21.1
- Kernel: Linux strapurp 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Linux performance governor: schedutil
- CMake: version 3.19.1
- Ninja: version 1.10.2
- GCC: version 12.1.0-2ubuntu1~22.04
- Clang (Ubuntu): version 14.0.0-1ubuntu1
- Clang (custom): version 15.0.6 (Rust fork; commit 3dfd4d93fa013e1c0578d3ceac5c8f4ebba4b6ec)
- libstdc++ for Clang: version 11.3.0-1ubuntu1~22.04
- Rust Stable: 1.66.0 (69f9c33d7 2022-12-12)
- Rust Nightly: version 1.68.0-nightly (c7572670a 2023-01-03)
- Rust (custom): version 1.68.0-dev (c7572670a 2023-01-03)
- Mold: version 0.9.3 (ec3319b37f653dccfa4d1a859a5c687565ab722d)
- binutils: version 2.38
macOS machine
- name: strammer
- CPU: Apple M1 Max (10 threads) (AArch64)
- RAM: Apple 64 GiB
- OS: macOS Monterey 12.6
- CMake: version 3.19.1
- Ninja: version 1.10.2
- Xcode Clang: Apple clang version 14.0.0 (clang-1400.0.29.202) (Xcode 14.2)
- Clang 15: version 15.0.6 (LLVM.org website)
- Rust Stable: 1.66.0 (69f9c33d7 2022-12-12)
- Rust Nightly: version 1.68.0-nightly (c7572670a 2023-01-03)
- Rust (custom): version 1.68.0-dev (c7572670a 2023-01-03)
- lld: version 15.0.6
- zld: commit d50a975a5fe6576ba0fd2863897c6d016eaeac41
Benchmarks
- build+test w/ deps
  - C++: `cmake -S build -B . -G Ninja && ninja -C build quick-lint-js-test && build/test/quick-lint-js-test` timed
  - Rust: `cargo fetch` untimed, then `cargo test` timed
- build+test w/o deps
  - C++: `cmake -S build -B . -G Ninja && ninja -C build gmock gmock_main gtest` untimed, then `ninja -C build quick-lint-js-test && build/test/quick-lint-js-test` timed
  - Rust: `cargo build --package lazy_static --package libc --package memoffset` untimed, then `cargo test` timed
- incremental diag-types
  - C++: build+test untimed, then modify diagnostic-types.h, then `ninja -C build quick-lint-js-test && build/test/quick-lint-js-test`
  - Rust: build+test untimed, then modify diagnostic_types.rs, then `cargo test`
- incremental lex
  - Like incremental diag-types, but with lex.cpp/lex.rs
- incremental test-utf-8
  - Like incremental diag-types, but with test-utf-8.cpp/test_utf_8.rs
For each executed benchmark, 12 samples were taken. The first two were
discarded. Bars show the average of the last 10 samples. Error bars show
the minimum and maximum sample.