
Iterating on Testing in Rust

2023-06-17 03:02:14

With the release of Rust 1.70, there was some surprise and frustration that unstable test features now require nightly, like all other unstable features in Rust. One of the features most affected is --format json, which has been in limbo for 5 years. This drew attention to a feeling I've had: the testing story in Rust has been stagnating. I have been gathering my thoughts on this for the last 3 months and recently had some downtime between tasks, so I've started to look further into this.

The tl;dr is to think of this as finding the right abstractions to stabilize parts of cargo_test_support and cargo nextest.

Testing today

Running cargo test will build and run all test binaries:

$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running /home/epage/src/cargo/tests/testsuite/main.rs

running 1 test
test some_case ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Test binaries are created from a package's [lib] and for each .rs file in tests/.

awesomeness-rs/
   Cargo.toml
   src/          # whitebox/unit tests go here
     lib.rs
     submodule.rs
     submodule/
       tests.rs
   tests/        # blackbox/integration tests go here
     is_awesome.rs

A test is as simple as adding the #[test] attribute and doing something that panics:

#[test]
fn some_case() {
    assert_eq!(1, 2);
}

And that's basically it. There are a couple more details (doc tests, other attributes) but testing is relatively simple in Rust.

Strengths

Before anything else, we should acknowledge the strengths of the current story around testing; what we should make sure we preserve. Scouring forums, some points I saw called out include:

  • Low friction for getting tests up and running
  • Being the standard way to test makes it easy to jump between projects
  • High value-to-ceremony ratio
  • Only running tests in parallel puts pressure on tests being scalable

Problems

For some background, when you run cargo test, the logic is split between two key pieces:

  • the cargo test command, which enumerates, builds, and runs test binaries, pretty much only caring about their exit code
  • libtest, which is linked into each test binary and parses flags, enumerates tests, runs them, and prints out a report

Conditional ignores

libtest is static. If you #[ignore] a test, that's it. You can make a test conditional on a platform or the presence of feature flags, like #[cfg_attr(windows, ignore)]. However, you can't ignore tests based on runtime conditions.

In cargo, we have tests that require installed software. The naive approach is to return early:

#[test]
fn simple_hg() {
    if !has_command("hg") {
        return;
    }

    // ...
}

But that gives developers a misleading view of their test coverage:

$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 0.62s
     Running /home/epage/src/cargo/tests/testsuite/main.rs

running 1 test
test new::simple_hg ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

In cargo, we've worked around this by providing a custom test macro that checks at compile time whether hg is installed and adds an #[ignore] attribute:

#[cargo_test(requires_hg)]
fn simple_hg() {
    // ...
}

$ cargo test
   Compiling cargo v0.72.0
    Finished test [unoptimized + debuginfo] target(s) in 16.49s
     Running /home/epage/src/cargo/tests/testsuite/main.rs

running 1 test
test init::simple_hg::case ... ignored, hg not installed

test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.10s

Having to wrap #[test] isn't ideal and requires you to bake every runtime condition into your macro. This also doesn't compose with other features.

Cargo is also unlikely to be able to recognize that it needs to recompile tests when the conditions change.

See also rust-lang/rust#68007.

Test generation

Data driven tests are an easy way to cover a lot of cases (granted, property testing is even better). The most trivial way of doing this is just looping over your cases, like this code from toml_edit:

#[test]
fn integers() {
    let cases = [
        ("+99", 99),
        ("42", 42),
        ("0", 0),
        ("-17", -17),
        ("1_2_3_4_5", 1_2_3_4_5),
        ("0xF", 15),
        ("0o0_755", 493),
        ("0b1_0_1", 5),
        (&std::i64::MIN.to_string()[..], std::i64::MIN),
        (&std::i64::MAX.to_string()[..], std::i64::MAX),
    ];
    for &(input, expected) in &cases {
        let parsed = integer.parse(new_input(input));
        assert_eq!(parsed, Ok(expected));
    }
}

However,

  • You don't know which input was being processed on failure (without extra steps)
    • Any debug output from prior iterations will flood the display when analyzing a failure
  • It's "fail-fast": a broken case prevents other cases from running, requiring careful ordering to ensure the more general case comes first
  • You don't get the bigger picture of what is and isn't working by seeing all of the failures at once
  • You can't select a specific case to run / debug

Some projects will create bespoke macros so that you get a #[test] per data point (a sketch of this approach follows below). When this happens often enough across projects, people write their own libraries to automate it. Or there's libtest-mimic for the choose-your-own-adventure route.
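
To make the bespoke-macro approach concrete, here is a hypothetical macro_rules! sketch (reusing the integer parser and new_input helper from the toml_edit snippet above); each invocation expands to its own #[test], so every case is reported and filterable independently:

macro_rules! integer_case {
    ($name:ident, $input:expr, $expected:expr) => {
        // Each expansion is a standalone #[test] that libtest can
        // report, filter, and run independently of the other cases.
        #[test]
        fn $name() {
            let parsed = integer.parse(new_input($input));
            assert_eq!(parsed, Ok($expected));
        }
    };
}

integer_case!(positive_sign, "+99", 99);
integer_case!(hex, "0xF", 15);
integer_case!(octal, "0o0_755", 493);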

For example, with trybuild, you create a dedicated test binary with a test function and everything is then delegated to trybuild:

#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}

And you get this output:

[screenshot: trybuild output]

Alternatively, some testing libraries replace libtest as the test harness:

# Cargo.toml ...

[[bench]]
name = "my_benchmark"
harness = false

use std::iter;

use criterion::criterion_group;
use criterion::criterion_main;
use criterion::BenchmarkId;
use criterion::Criterion;
use criterion::Throughput;

fn from_elem(c: &mut Criterion) {
    static KB: usize = 1024;

    let mut group = c.benchmark_group("from_elem");
    for size in [KB, 2 * KB, 4 * KB, 8 * KB, 16 * KB].iter() {
        group.throughput(Throughput::Bytes(*size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), size, |b, &size| {
            b.iter(|| iter::repeat(0u8).take(size).collect::<Vec<_>>());
        });
    }
    group.finish();
}

criterion_group!(benches, from_elem);
criterion_main!(benches);

Custom harnesses are a second-class experience:

  • They require their own test binary, distinct from other tests
  • Tooling has varying levels of support or extensions for interacting with them
  • The cost of writing your own is high
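
To give a feel for that cost, here is a minimal sketch of a custom harness built on libtest-mimic (the test names and bodies are made up; the binary also needs a [[test]] entry with harness = false in Cargo.toml):

use libtest_mimic::{Arguments, Trial};

fn main() {
    // Parse the libtest-style command-line flags (filters, --list, etc.).
    let args = Arguments::from_args();

    // Enumerate the tests ourselves; libtest would normally do this for us.
    let tests = vec![
        Trial::test("check_foo", || Ok(())),
        Trial::test("check_bar", || Ok(())),
    ];

    // Run with libtest-like reporting and exit with the right status code.
    libtest_mimic::run(&args, tests).exit();
}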

Test Initialization and Cleanup

When talking about this, people often think of the classic JUnit setup, with its own downsides:

public class JUnitTestCase extends TestCase {
    @Override
    protected void setUp() throws Exception {
        // ...
    }

    public void testSomeSituation() {
        // ...
    }

    @Override
    protected void tearDown() throws Exception {
        // ...
    }
}

In Rust, we generally solve this with RAII:

fn cargo_add_lockfile_updated() {
    let scratch = tempfile::TempDir::new().unwrap();

    // ...
}

This has its own limitations, like some teardown errors being ignored. I've had bugs masked by this on Windows, requiring manual cleanup to catch them:

fn cargo_add_lockfile_updated() {
    let scratch = tempfile::TempDir::new().unwrap();

    // ...

    scratch.close().unwrap();
}

Sometimes generic libraries like tempfile aren't sufficient. Inside cargo, we intentionally leak the temp directories, only cleaning them up on the next run so people can debug failures (a sketch of this strategy follows below). This is also provided by #[cargo_test]. However, we've repeatedly hit CI storage limits and it would be a big help if the fixture tracked the size of these directories, much like tracking test times.
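
A minimal sketch of that leak-and-clean-on-next-run strategy (not cargo's actual implementation; the root path and helper names here are made up):

use std::path::PathBuf;

// A stable, predictable location so failures can be inspected later.
fn scratch_root() -> PathBuf {
    std::env::temp_dir().join("my-crate-test-scratch")
}

// Called once at the start of a test run.
fn init_scratch() -> PathBuf {
    let root = scratch_root();
    // Remove whatever the *previous* run left behind...
    let _ = std::fs::remove_dir_all(&root);
    // ...then recreate it. Directories created under it during this run
    // are deliberately leaked so a failing test's state survives.
    std::fs::create_dir_all(&root).unwrap();
    root
}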

Cargo also has a lot of fixture initialization coupled to the directory managed by #[cargo_test], requiring a package to buy in to the whole system just to use a small portion of it.

Sometimes a fixture is expensive to create and you want to be able to share it. For example in cargo, we sometimes put multiple "tests" in the same function to share the fixture (sketched below), running into similar problems as we do with the lack of test generation.
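
Roughly, that pattern looks like this (a made-up example):

#[test]
fn lockfile_behavior() {
    // Expensive fixture, built once and shared by every "test" below.
    let scratch = tempfile::TempDir::new().unwrap();

    // "test" 1: adding a dependency updates the lockfile
    // ...

    // "test" 2: removing a dependency updates the lockfile
    // ...

    // A failure in "test" 1 hides "test" 2, and neither can be run,
    // filtered, or reported on its own.
    drop(scratch);
}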

The counter-argument could be made that we just aren't composing things right. That's likely the case but I feel this organic growth is the natural result of not having better supporting tools and needing to prioritize our own development.

Having composable fixtures would go a long way towards making test code more reusable. Take for instance pytest. In a previous part of my career, I made a Python API for hardware that interacted with the CAN bus. This had to be tested at the system level and required access to hardware. With pytest, I could specify that a test required a can_in_interface resource. can_in_interface is a fixture that could be initialized from the command-line and would skip all dependent tests if not specified.

def pytest_addoption(parser):
    parser.addoption(
        "--can-in-interface", default="None",
        action="store",
        help="The CAN interface to use with the tests")

@pytest.fixture
def can_in_interface(request):
    interface = request.config.getoption("--can-in-interface")
    if interface.lower() == "none":
        pytest.skip("Test requires a CAN board")
    return interface

def test_wait_for_intf_communicating(can_in_interface):
    # ...

We have crates like rstest, but they're like #[cargo_test] and build on top of libtest (see the sketch below). We can't extend the command-line, have fixtures skip tests, etc.
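
For reference, fixtures in rstest look roughly like this (the fixture and test here are made up):

use rstest::{fixture, rstest};

// The fixture is resolved by name and passed in as an argument.
#[fixture]
fn scratch() -> tempfile::TempDir {
    tempfile::TempDir::new().unwrap()
}

#[rstest]
fn cargo_add_lockfile_updated(scratch: tempfile::TempDir) {
    // ...
    drop(scratch);
}

Because this expands to an ordinary #[test], the fixture has no way to consult command-line flags or skip its dependent tests at runtime.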

Scaling up development

As we've worked around limitations, we've lost the strength of transferability. These solutions are also not composable; if there isn't a custom test harness that works for your case, you have to build everything up from scratch. libtest-mimic reduces the work to build from scratch but it still requires you to do this for each situation, rather than having a way to compose testing logic. This takes a toll on these projects until they say enough is enough and build something custom.

Friction between cargo test and libtest

So far I've mostly talked about test writing. There are also problems with test running and reporting. I think cargo nextest has helped highlight gaps in the current workflow. However, cargo nextest is working within the limitations of the current system. For example, what would normally be attributes on the test function in other languages' test libraries has to be specified in a separate config file. cargo nextest also does process isolation for tests. While that has benefits, I'm concerned about what we'd lose by making it the default workflow. For example, you can't run cargo nextest on cargo today because of shared state between tests, specifically the creation of short identifiers for temp directories, which allows us to have a stable set of directories to use and clean up. Process isolation also gets in the way of trying to support shared fixtures in the future.

Going back over our backlog, we have a number of open issues related to how cargo test and libtest interact.

Solution

To avoid optimizing for a local maximum, I want to focus on the ideal case and then step back to what's practical.

My ideal scenario

Earlier, I made a reference to pytest. That has been the best model for testing I've used so far. It provides a shared, composable convention for extending testing capabilities that I feel would help in the scenarios I mapped out.

Runtime-conditional ignores:

@pytest.mark.skipif(not has_command("hg"), reason="requires `hg` CLI")
def test_simple_hg():
    pass

Case generation

@pytest.mark.parametrize("sample_rs", trybuild.discover("tests/ui/*.rs"))
def ui(sample_rs):
    trybuild.verify(sample_rs)

Initialization and cleanup

def cargo_add_lockfile_updated(tmpdir):
    # ...

As for the UX, we can shift some of the responsibilities from libtest to cargo test if we formalize their relationship. Currently, cargo test hands off all responsibility and makes no assumptions about command-line arguments, output formats, what's safe to run in parallel, etc.

Of course, this isn't all that easy or else it would have already been done. For libtest, it's difficult to get feedback on unstable features, which is one reason things have remained in limbo for so long. This also extends to stabilizing the json output to allow tighter integration between cargo test and libtest. A naive approach to tighter integration would also be a breaking change, as it changes expectations for custom test harnesses and even how individual tests are run.
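
For context, today's unstable json output is a stream of newline-delimited events, roughly like this (abbreviated; the exact fields are part of what needs stabilizing):

$ cargo test -- -Z unstable-options --format json
{ "type": "suite", "event": "started", "test_count": 1 }
{ "type": "test", "event": "started", "name": "some_case" }
{ "type": "test", "name": "some_case", "event": "ok" }
{ "type": "suite", "event": "ok", "passed": 1, "failed": 0, ... }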

Prototype

I started prototyping the libtest side of my ideal scenario while waiting for some FCPs to close out. My thought was to start here, rather than on cargo test, as this would let me explore what the json output should look like before working to stabilize it for cargo test to use. I even went a step further and implemented all the other output formats (pretty, terse, and junit) on top of the structures used for json output, helping to further refine its design and making sure it's sufficient for cargo test to create the desired UX.

This prototype is still fairly early; we don't even have full parity with libtest-mimic. As a result, none of the crates have been published yet.

The premise for the design is "what if this could replace the original libtest?". Even if this doesn't become the basis for libtest, my hope is that core parts of the code can be shared to ensure consistent behavior, like the serde types and the CLI.

So this is why I made yet another argument parser. I enjoy clap and generally recommend it to people (which is why I've taken on maintaining it). When someone needs something more lightweight, I often point them to lexopt due to the simplicity of its design.

But.

When designing this prototype, I wanted to design in support for users to extend the command-line like you can in pytest. This means it needs to be pluggable. If this were exposed in libtest's API, then it can never break compatibility. The easiest way to do that is to have as simple an API as possible. Clap's API is too big. I was concerned even about the amount of API surface in lexopt. I don't know if lexarg will go anywhere but it allowed me to get an idea of how a perma-1.0 test library could have an extensible CLI.
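
As a rough sketch of the kind of extensibility I mean, here is how a harness might accept a flag contributed by an extension, in the spirit of the pytest example above; this uses lexopt for illustration and is not lexarg's actual API:

use lexopt::prelude::*;

fn main() -> Result<(), lexopt::Error> {
    let mut filter: Option<String> = None;
    // A flag contributed by a hypothetical harness extension,
    // like pytest_addoption above.
    let mut can_interface: Option<String> = None;

    let mut parser = lexopt::Parser::from_env();
    while let Some(arg) = parser.next()? {
        match arg {
            Long("can-in-interface") => {
                can_interface = Some(parser.value()?.string()?);
            }
            // Positional arguments act as test name filters.
            Value(name) => filter = Some(name.string()?),
            _ => return Err(arg.unexpected()),
        }
    }

    // ... enumerate and run tests, skipping any whose fixtures need
    // the missing interface ...
    let _ = (filter, can_interface);
    Ok(())
}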

Libs team meeting

After presenting on this at RustNL 2023 (video, slides), I had the opportunity to attend an in-person libs team meeting to discuss parts of this with them (along with OsStr).

Question: how much of this work will make it into libtest?

I went in with the audacious goal of "everything", initially working on extension points to allow out-of-tree experiments on a "pytest"-like API and then slowly pulling pieces of that into libtest.

We also discussed experimenting with unstable features by publishing libtest to crates.io, where we can break compatibility until we're satisfied, and then adding the #[stable] attribute to the version shipped with rust. This idea isn't new.

The biggest concern was with the compatibility surface that needs to be maintained, so instead we went the other direction. Our goal is to make custom test harnesses first-class and shift the focus towards them rather than extending libtest. Hopefully, we can consolidate down to just a few frameworks, with libtest providing the baseline API, reducing the chance that test writing between projects is too disjoint. I also hope we can share code between these projects to improve consistency and make it easier to evolve to meet expectations.

Question: where do we draw the line for these custom test harnesses?

Today, you either get libtest with test enumeration, a main, etc., or you have to build it all up from scratch. Previously, eRFC #2318 laid out a plan for rust to still own the #[test] macro and test enumeration but make them accessible from custom test harnesses. For the cases I've enumerated and from my gut feeling when prototyping, my suspicion is that I'll want to allow custom #[test] macros so my test harness can control what code gets generated and can have nested attributes (like #[test(exclusive)]) rather than repeating our existing pattern of separate macros (e.g. #[ignore]).

To get parity with libtest, we'll need stable support for

  • Test enumeration (see below)
  • Disabling the existing #[test] macro (no plans yet)
  • Custom preludes to pull in a #[test] macro from a dependency (no plans yet)
  • Pulling in main from a dependency (no plans yet)
  • Capturing of println (see below)

Plus a low-ceremony way to opt in to all of this (like rust-lang/cargo#6945).

We didn't cover everything but we made enough progress to feel comfortable with this plan.

Question: how do we do test enumeration?

I had hoped that inventory or linkme could be used for this but there was concern about supporting these across platforms without hiccups, including from dtolnay, who is the maintainer of both.

Instead, we're looking at introducing a new language feature to replace libtest's use of internal compiler features. This would most likely be a #[distributed_slice], though we didn't hash out further details in the meeting.
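
To illustrate the concept with the existing linkme crate (the language feature would presumably look similar), test registration and enumeration could work like this:

use linkme::distributed_slice;

// Elements added to this slice from any linked crate are gathered
// into a single array at link time.
#[distributed_slice]
pub static TESTS: [fn()] = [..];

fn some_case() {
    assert_eq!(1 + 1, 2);
}

// Register the test; a custom #[test] macro could expand to this.
#[distributed_slice(TESTS)]
static SOME_CASE: fn() = some_case;

fn main() {
    // The harness enumerates whatever was registered.
    for test in TESTS.iter() {
        test();
    }
}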

Question: how do we capture println?

When a test fails, it's nice that the output is captured and reported back, from println, dbg, and panic messages (like from assert).

How does it work though? I'm sorry you asked. At the start of a test, a buffer is passed to set_output_capture, which puts it into thread-local storage. When you call println, print, eprintln, or eprint, they call _print and _eprint, which call print_to, writing to the buffer. At the end of the test, set_output_capture is called again to restore printing to stdout / stderr.
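
In code, the mechanism looks roughly like this (a sketch using the nightly-only internal_output_capture feature):

#![feature(internal_output_capture)]

use std::sync::{Arc, Mutex};

fn main() {
    // Install a capture buffer for this thread.
    let sink = Arc::new(Mutex::new(Vec::<u8>::new()));
    std::io::set_output_capture(Some(sink.clone()));

    // Routed into `sink` instead of the real stdout.
    println!("captured, not printed");

    // Restore normal printing.
    std::io::set_output_capture(None);

    let captured = String::from_utf8(sink.lock().unwrap().clone()).unwrap();
    assert!(captured.contains("captured, not printed"));
}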

This means

  • If you write directly to std::io::{stdout,stderr}, use libc, or have C code using libc, it will not be captured
  • If the test launches a thread, its output will not be captured (as far as I can tell)
  • This API is only available on nightly; libtest gets special access to it on stable while custom test harnesses don't

Previously, this whole process was more reusable but more complex and there's hesitance to go back in that direction. For this to be stabilized, it also needs to be more general: it needs to cover std::io::stdout / std::io::stderr and likely libc. It needs to be more resilient, capturing across all threads or being inherited when new threads are spawned. Then there's async, where it's not about what thread you're running on. Implicit contexts for stdout and stderr would cover most needs but I doubt we'll get that any time soon, if ever (no matter how much I love the idea of it).

We could work around this by running each test in its own process but that comes with its own downsides, as already mentioned.

So we don't have a plan that can meet these needs yet.

See also rust-lang/rust#90785.

Next Steps

Now is where you can help.

This, unfortunately, isn't my highest priority because I don't want to leave a trail of incomplete projects. I've previously committed to MSRV, [lints], cargo script, and keeping up with my crates. Even if this was my highest priority, it is too much for one person and is spread across rust language design, the rust compiler, the standard library, and cargo. It will also take time to go through the RFC process for each part, so the sooner we start on these, the better.

But I don't think that means we should give up.

For anyone who would like to help out, the parts I see that are unblocked include:

  1. Preparing a #[distributed_slice] Pre-RFC and moving that forward so we have test enumeration
  2. Finishing up what can be done on the prototype for further json output feedback
  3. Designing a low-ceremony way to opt in to all of this (like rust-lang/cargo#6945)
  4. Sketching out ideas for how we could disable the existing #[test] macro
  5. Researching where custom preludes are at and seeing what might be able to move forward so we can pull in the #[test] macro
  6. Similarly, researching the possibility of pulling in main from a dependency

These are roughly in priority order based on a mix of

  • The time it will take before it's usable
  • How confident I am that it can be solved
  • The payoff from solving it, whether because it's more generally useful or a bigger pain point to be without (e.g. distributed_slice on both counts)

Alternatively, if your interests align with one of my higher priorities, I welcome help with those so more of us can focus on this problem.

Discuss on reddit or mastodon.
