Iterating on Testing in Rust

With the release of Rust 1.70, there was some surprise and frustration that unstable test features now require nightly, like all other unstable features in Rust. One of the features most affected is --format json which has been in limbo for 5 years. This drew attention to a feeling I've had: the testing story in Rust has been stagnating.

I have been gathering my thoughts on this for the last 3 months and recently had some downtime between tasks so I've started to look further into this.

The tl;dr is to think of this as finding the right abstractions to stabilize parts of cargo_test_support and cargo nextest.
Testing today

Running cargo test will build and run all test binaries:

$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running /home/epage/src/cargo/tests/testsuite/main.rs

running 1 test
test some_case ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Test binaries are created from a package's [lib] and for each .rs file in tests/.
awesomeness-rs/
  Cargo.toml
  src/            # whitebox/unit tests go here
    lib.rs
    submodule.rs
    submodule/
      tests.rs
  tests/          # blackbox/integration tests go here
    is_awesome.rs
A test is as simple as adding the #[test] attribute and doing something that panics:

#[test]
fn some_case() {
    assert_eq!(1, 2);
}

And that's basically it. There are a couple more details (doc tests, other attributes) but testing is relatively simple in Rust.
Strengths

Before anything else, we should acknowledge the strengths of the current story around testing; what we should make sure we preserve. Scouring forums, some points I saw called out include:
- Low friction for getting tests up and running
- Being the standard way to test makes it easy to jump between projects
- High value-to-ceremony ratio
- Only running tests in parallel puts pressure on tests being scalable
Problems

For some background, when you run cargo test, the logic is split between two key pieces:
- the cargo test command, which enumerates, builds, and runs test binaries, pretty much only caring about their exit code
- libtest, which is linked into each test binary and parses flags, enumerates tests, runs them, and prints out a report.
Conditional ignores

libtest is static. If you #[ignore] a test, that's it. You can make a test conditional on a platform or the presence of feature flags, like #[cfg_attr(windows, ignore)]. However, you can't ignore tests based on runtime conditions.

In cargo, we have tests that require installed software. The naive approach is to return early:
#[test]
fn simple_hg() {
    if !has_command("hg") {
        return;
    }
    // ...
}
But that gives developers a misleading view of their test coverage:

$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running /home/epage/src/cargo/tests/testsuite/main.rs

running 1 test
test new::simple_hg ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
In cargo, we've worked around this by providing a custom test macro that at compile time checks whether hg is installed and adds an #[ignore] attribute:

#[cargo_test(requires_hg)]
fn simple_hg() {
    // ...
}

$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 16.49s
     Running /home/epage/src/cargo/tests/testsuite/main.rs

running 1 test
test init::simple_hg::case ... ignored, hg not installed

test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.10s
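The probe behind this is simple; here is a sketch of the kind of check such a macro can run at expansion time (has_command here is a stand-in for illustration, not the actual cargo_test_support implementation):

```rust
// Sketch: returns true if `name` can be spawned. A proc macro can run this
// while expanding and, on failure, emit #[ignore = "hg not installed"] on
// the generated test.
fn has_command(name: &str) -> bool {
    std::process::Command::new(name)
        .arg("--version")
        .output()
        .is_ok() // Err means spawning failed, e.g. the command is not on PATH
}
```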
Having to wrap #[test] isn't ideal and requires you to bake every runtime condition into your macro. This also doesn't compose with other features. Cargo is also unlikely to be able to recognize that it needs to recompile tests when the conditions change.

See also rust-lang/rust#68007.
Test generation

Data-driven tests are an easy way to cover a lot of cases (granted, property testing is even better). The most trivial way of doing this is just looping over your cases, like this code from toml_edit:
#[test]
fn integers() {
    let cases = [
        ("+99", 99),
        ("42", 42),
        ("0", 0),
        ("-17", -17),
        ("1_2_3_4_5", 1_2_3_4_5),
        ("0xF", 15),
        ("0o0_755", 493),
        ("0b1_0_1", 5),
        (&std::i64::MIN.to_string()[..], std::i64::MIN),
        (&std::i64::MAX.to_string()[..], std::i64::MAX),
    ];
    for &(input, expected) in &cases {
        let parsed = integer.parse(new_input(input));
        assert_eq!(parsed, Ok(expected));
    }
}
However,
- You don't know which input was being processed on failure (without extra steps)
- Any debug output from prior iterations will flood the display when analyzing a failure
- It's "fail-fast": a broken case prevents other cases from running, requiring careful ordering to ensure the more general case is first
- You don't get the bigger picture of what is and isn't working by seeing all of the failures at once
- You can't select a specific case to run / debug
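One common workaround is a bespoke macro_rules! that stamps out a #[test] per data point; a sketch (parse_integer here is a hypothetical stand-in for the parser under test):

```rust
// Hypothetical parser under test; stands in for toml_edit's `integer`.
fn parse_integer(s: &str) -> Result<i64, std::num::ParseIntError> {
    s.trim_start_matches('+').replace('_', "").parse()
}

// Generate one named #[test] per case so a failure identifies its input
// and individual cases can be filtered or run alone.
macro_rules! integer_cases {
    ($($name:ident: $input:expr => $expected:expr,)*) => {
        $(
            #[test]
            fn $name() {
                assert_eq!(parse_integer($input), Ok($expected));
            }
        )*
    };
}

integer_cases! {
    positive_sign: "+99" => 99,
    plain: "42" => 42,
    negative: "-17" => -17,
    underscores: "1_2_3_4_5" => 12345,
}
```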
Some projects will create bespoke macros so that you get a #[test] per data point. When this happens frequently enough across projects, people will write their own libraries to automate this, including:

Or libtest-mimic for the choose-your-own-adventure route.
For example, with trybuild, you create a dedicated test binary with a test function and everything is then delegated to trybuild:

#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}

And you get this output:
Alternatively, some testing libraries replace libtest as the test harness:

# Cargo.toml ...
[[bench]]
name = "my_benchmark"
harness = false
use std::iter;

use criterion::criterion_group;
use criterion::criterion_main;
use criterion::BenchmarkId;
use criterion::Criterion;
use criterion::Throughput;

fn from_elem(c: &mut Criterion) {
    static KB: usize = 1024;

    let mut group = c.benchmark_group("from_elem");
    for size in [KB, 2 * KB, 4 * KB, 8 * KB, 16 * KB].iter() {
        group.throughput(Throughput::Bytes(*size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), size, |b, &size| {
            b.iter(|| iter::repeat(0u8).take(size).collect::<Vec<_>>());
        });
    }
    group.finish();
}

criterion_group!(benches, from_elem);
criterion_main!(benches);
Custom harnesses are a second-class experience:
- They require their own test binary, distinct from other tests
- There are varying levels of support or extensions for ways to interact with them
- The cost of writing your own is high
Test Initialization and Cleanup

When talking about this, people often think of the classic JUnit setup with its own downsides:

public class JUnitTestCase extends TestCase {
    @Override
    protected void setUp() throws Exception {
        // ...
    }

    public void testSomeSituation() {
        // ...
    }

    @Override
    protected void tearDown() throws Exception {
        // ...
    }
}
In Rust, we generally solve this with RAII:

fn cargo_add_lockfile_updated() {
    let scratch = tempfile::tempdir().unwrap();
    // ...
}

This has its own limitations, like some teardown errors being ignored. I've had bugs masked by this on Windows, requiring manual cleanup to catch them:

fn cargo_add_lockfile_updated() {
    let scratch = tempfile::tempdir().unwrap();
    // ...
    scratch.close().unwrap();
}
Sometimes generic libraries like tempfile aren't sufficient. Inside cargo, we intentionally leak the temp directories, only cleaning them up on the next run so people can debug failures. This is also provided by #[cargo_test]. However, we've repeatedly hit CI storage limits and it would be a huge help if the fixture tracked the size of these directories, much like tracking test times.

Cargo also has a lot of fixture initialization coupled to the directory managed by #[cargo_test], requiring a package to buy into the whole system just to use a small portion of it.
Sometimes a fixture is expensive to create and you want to be able to share it. For example in cargo, we sometimes put multiple "tests" in the same function to share the fixture, running into similar problems as we do with the lack of test generation.

The counter-argument could be made that we just aren't composing things right. That's likely the case but I feel this organic growth is the natural result of not having better supporting tools and needing to prioritize our own development. Having composable fixtures would go a long way towards making test code more reusable.
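As a rough illustration of the RAII-fixture direction in today's Rust (names here are hypothetical; a real fixture system would report teardown failures rather than swallowing them):

```rust
use std::fs;
use std::path::PathBuf;

// A minimal scratch-directory fixture: setup in `new`, teardown in `Drop`.
// Composition falls out of ordinary struct embedding: a larger fixture can
// hold a ScratchDir and reuse its setup/teardown.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(name: &str) -> std::io::Result<Self> {
        let path = std::env::temp_dir().join(name);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Errors are swallowed here -- the teardown-error limitation
        // mentioned earlier.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```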
Take for instance pytest. In a previous part of my career, I made a Python API for hardware that interacted with the CAN bus. This had to be tested at the system level and required access to hardware. With pytest, I could specify that a test required a can_in_interface resource. can_in_interface is a fixture that would be initialized from the command-line and would skip all dependent tests if not specified.
def pytest_addoption(parser):
    parser.addoption(
        "--can-in-interface", default="None",
        action="store",
        help="The CAN interface to use with the tests")

@pytest.fixture
def can_in_interface(request):
    interface = request.config.getoption("--can-in-interface")
    if interface.lower() == "none":
        pytest.skip("Test requires a CAN board")
    return interface

def test_wait_for_intf_communicating(can_in_interface):
    # ...
We have crates like rstest, but they're like #[cargo_test] and build on top of libtest. We can't extend the command-line, have fixtures skip tests, etc.
Scaling up development

As we've worked around limitations, we've lost the strength of transferability. These features are also not composable; if there isn't a custom test harness that works for your case, you have to build everything up from scratch. libtest-mimic reduces the work to build from scratch but it still requires you to do this for each scenario, rather than having a way to compose testing logic. This takes a toll on these projects until they say enough is enough and build something custom.
Friction between cargo test and libtest

So far I've mostly talked about test writing. There are also problems with test running and reporting.

I think cargo nextest has helped highlight gaps in the current workflow. However, cargo nextest is working within the limitations of the current system. For example, what would normally be attributes on the test function in other languages' test libraries, you have to specify in a separate config file.

cargo nextest also does process isolation for tests. While it has benefits, I'm concerned about what we would lose by making this the default workflow. For example, you can't run cargo nextest on cargo today because of shared state between tests, namely the creation of short identifiers for temp directories which allows us to have a stable set of directories to use and clean up from. Process isolation also gets in the way of trying to support shared fixtures in the future.

Going back over our backlog, the problems we have related to cargo test and libtest's interactions include:
Solution

To avoid optimizing for a local maxima, I want to focus on the ideal case and then step back to what's practical.

My ideal scenario

Earlier, I made a reference to pytest. That has been the best model for testing I've used so far. It provides a shared, composable convention for extending testing capabilities that I feel would help in the scenarios I mapped out.
Runtime-conditional ignores:

@pytest.mark.skipif(not has_command("hg"), reason="requires `hg` CLI")
def test_simple_hg():
    pass

Case generation:

@pytest.mark.parametrize("sample_rs", trybuild.find("tests/ui/*.rs"))
def ui(sample_rs):
    trybuild.verify(sample_rs)

Initialization and cleanup:

def cargo_add_lockfile_updated(tmpdir):
    # ...
As for the UX, we can shift some of the responsibilities from libtest to cargo test if we formalize their relationship. Currently, cargo test hands off all responsibilities and makes no assumptions about command-line arguments, output formats, what's safe to run in parallel, etc.

Of course, this isn't all that easy or else it would have already been done. For libtest, it's difficult to get feedback on unstable features, which is one reason things have remained in limbo for so long. This also extends to stabilizing the json output that would allow tighter integration between cargo test and libtest. A naive approach to tighter integration would also be a breaking change as it changes expectations for how custom test harnesses and even individual tests are run.
Prototype

I started prototyping the libtest side of my ideal scenario while waiting for some FCPs to close out. My idea was to start here, rather than on cargo test, as this would let me explore what the json output should look like before working to stabilize it for cargo test to use. I even went a step further and implemented all the other output formats (pretty, terse, and junit) on top of the structures used for json output, helping to further refine its design and make sure it's sufficient for cargo test to create the desired UX.

This prototype is still fairly early; we don't even have full parity with libtest-mimic. As a result, none of the crates have been published yet.

The basis for the design is "what if this could replace the original libtest?". Even if this doesn't become the basis for libtest, my hope is that core parts of the code can be shared to ensure consistent behavior, like the serde types and the CLI.
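As a sketch of what such shared structures might look like, here is one way an event stream could drive multiple output formats (the names and fields are made up for this illustration, not the prototype's actual types):

```rust
// Illustrative events a test binary could emit, e.g. as json lines for
// cargo test to consume; field names are invented for this sketch.
#[derive(Debug, PartialEq)]
enum Event {
    SuiteStarted { test_count: usize },
    TestStarted { name: String },
    TestOk { name: String },
    TestFailed { name: String, stdout: String },
    SuiteFinished { passed: usize, failed: usize },
}

// Other formats ("pretty", terse, junit) can then be renderers over the
// same events, which is how consistency across formats falls out.
fn render_terse(event: &Event) -> Option<&'static str> {
    match event {
        Event::TestOk { .. } => Some("."),
        Event::TestFailed { .. } => Some("F"),
        _ => None,
    }
}
```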
So this is why I made yet another argument parser. I enjoy clap and generally recommend it to people (which is why I've taken on maintaining it). When someone needs something more lightweight, I usually point them to lexopt due to the simplicity of its design.

But. When designing this prototype, I wanted to design in support for users to extend the command-line like you can in pytest. This means it needs to be pluggable. If this were exposed in libtest's API, then it can't break compatibility. The easiest way to do that is to have as simple of an API as possible. Clap's API is too big. I was concerned even about the amount of policy in lexopt. I don't know if lexarg will go anywhere but it allowed me to get an idea of how a perma-1.0 test library could possibly have an extensible CLI.
Libs team meeting

After presenting on this at RustNL 2023 (video, slides), I had the opportunity to attend an in-person libs team meeting to discuss parts of this with them (along with OsStr).

Question: how much of this work will make it into libtest?

I went in with the audacious goal of "everything", initially working on extension points to allow out-of-tree experiments on a "pytest"-like API and then slowly pulling pieces of that into libtest. We also discussed experimenting with unstable features by publishing libtest to crates.io where we can break compatibility until we are satisfied and then add the #[stable] attribute to the version shipped with rust. This idea isn't new.

The biggest concern was with the compatibility surface that would need to be maintained, so instead we went the opposite direction. Our goal is to make custom test harnesses first-class and shift the focus towards them rather than extending libtest. Hopefully, we can consolidate down to just a couple of frameworks, with libtest providing the baseline API, reducing the chance that test writing between projects is too disjoint. I also hope we can share code between these projects to improve consistency and make it easier to evolve expectations.
Question: where do we draw the line for these custom test harnesses?

Today, you either get libtest with test enumeration, a main, etc, or you have to build it all up from scratch. Previously, eRFC #2318 laid out a plan for rust to still own the #[test] macro and test enumeration but be accessible from custom test harnesses. For the cases I've enumerated and my gut feeling from prototyping, my suspicion is that I'll want to allow custom #[test] macros so my test harness can control what code gets generated and can have nested attributes (like #[test(exclusive)]) rather than repeating our existing pattern of separate macros (e.g. #[ignore]).

To get parity with libtest, we'll need stable support for:
- Test enumeration (see below)
- Disabling of the existing #[test] macro (no plans yet)
- Custom preludes to pull in the #[test] macro from a dependency (no plans yet)
- Pulling in main from a dependency (no plans yet)
- Capturing of println (see below)

Plus a low-ceremony way to opt in to all of this (like rust-lang/cargo#6945).

We didn't cover everything but we made enough progress to feel good about this plan.
Question: how do we do test enumeration?

I had hoped that inventory or linkme could be used for this but there was concern about supporting these across platforms without hiccups, including from dtolnay who is the maintainer of both. Instead, we are looking at introducing a new language feature to replace libtest's use of internal compiler features. This would most likely be a #[distributed_slice] though we didn't hash out further details in the meeting.
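To get a feel for the shape of the problem, here is a std-only model of enumerated tests as a slice of (name, function) pairs; a #[distributed_slice] would let each module or crate append entries instead of requiring this central, hand-maintained list (all names here are illustrative):

```rust
type TestFn = fn() -> Result<(), String>;

fn some_case() -> Result<(), String> {
    if 1 + 1 == 2 { Ok(()) } else { Err("math is broken".into()) }
}

// Today this list must be written by hand (or assembled via linker tricks
// like linkme); a #[distributed_slice] language feature would collect
// entries declared anywhere into one slice.
static TESTS: &[(&str, TestFn)] = &[("some_case", some_case)];

// A harness then just iterates the slice; returns the number of failures.
fn run_all() -> usize {
    let mut failed = 0;
    for (name, test) in TESTS {
        match test() {
            Ok(()) => println!("test {name} ... ok"),
            Err(msg) => {
                failed += 1;
                println!("test {name} ... FAILED: {msg}");
            }
        }
    }
    failed
}
```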
Question: how do we capture println?

When a test fails, it's nice that the output is captured and reported back from println, dbg, and panic messages (like from assert). How does it work though? I'm sorry you asked.

At the start of a test, a buffer is passed to set_output_capture which puts it into thread-local storage. When you call println, print, eprintln, or eprint, they call _print and _eprint which call print_to, writing to the buffer. At the end of the test, set_output_capture is called again to restore printing to stdout / stderr.
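In spirit (this is a simplified model, not the real std internals), the mechanism is a thread-local indirection like this:

```rust
use std::cell::RefCell;

thread_local! {
    // When Some, output is redirected into the buffer instead of stdout,
    // mirroring what set_output_capture does for the print macros.
    static CAPTURE: RefCell<Option<Vec<u8>>> = RefCell::new(None);
}

// What the println!-style macros effectively call: write to the capture
// buffer if one is installed, otherwise to the real stdout.
fn print_to(s: &str) {
    CAPTURE.with(|c| match c.borrow_mut().as_mut() {
        Some(buf) => buf.extend_from_slice(s.as_bytes()),
        None => print!("{s}"),
    });
}

// Install a fresh buffer before the test; take it back out afterwards.
fn start_capture() {
    CAPTURE.with(|c| *c.borrow_mut() = Some(Vec::new()));
}

fn end_capture() -> Vec<u8> {
    CAPTURE.with(|c| c.borrow_mut().take().unwrap_or_default())
}
```

Because the indirection lives in thread-local storage and only the print macros consult it, it is easy to see why direct writes and spawned threads escape capture.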
This means:
- If you write directly to std::io::{stdout,stderr}, use libc, or have C code using libc, it will not be captured
- If the test launches a thread, its output will not be captured (as far as I can tell)
- This API is only available on nightly; libtest has special access to it on stable while custom test harnesses do not
Previously, this whole process was more reusable but more complex and there's hesitance to go back in that direction.

For this to be stabilized, it also needs to be more general. It needs to cover std::io::stdout / std::io::stderr and likely libc. It needs to be more resilient, capturing across all threads or being inherited when new threads are spawned. Then there's async, where it's not about what thread you are running on. Implicit contexts for stdout and stderr would cover most needs but I doubt we'll get that any time soon, if ever (no matter how much I love the idea of it).

We could work around this by running each test in its own process but that comes with its own downsides, as already mentioned.

So we don't have a plan that can meet these needs yet. See also rust-lang/rust#90785.
Next Steps

Now is where you can help.

This, unfortunately, isn't my highest priority because I don't want to leave a trail of incomplete projects. I've previously committed to MSRV, [lints], cargo script, and keeping up with my crates. Even if this was my highest priority, this is too much for one person and is spread across rust language design, the rust compiler, the standard library, and cargo. This will also take time to go through the RFC process for each part, so the sooner we start on these, the better.

But I don't think that means we should give up.
For anyone who would like to help out, the parts I see that are unblocked include:
- Preparing a #[distributed_slice] Pre-RFC and moving that forward so we have test enumeration
- Finishing up what can be done on the prototype for further json output feedback
- Designing a low-ceremony way to opt in to all of this (like rust-lang/cargo#6945)
- Sketching out ideas for how we could disable the existing #[test] macro
- Researching where custom preludes are at and seeing what might be able to move forward so we can pull in the #[test] macro
- Similarly, researching the possibility of pulling in main from a dependency

These are roughly in priority order based on a mix of:
- The time it'll take before it's usable
- How confident I am that it can be solved
- The payoff from solving it, whether because it's more generally useful or a higher pain point to not have (e.g. distributed_slice on both counts)

Alternatively, if your interests align with one of my higher priorities, I welcome the help with them so more of us can focus on this problem.