I’m pleased to announce the release of Criterion.rs v0.3, available today. Version 0.3 provides a number of new features, including preliminary support for plugging in custom measurements (e.g. hardware timers or POSIX CPU time), hooks to start and stop profilers, a new BenchmarkGroup struct that provides more flexibility than the older Benchmark and ParameterizedBenchmark structs, and an implementation of a #[criterion] custom-test-framework macro for those on Nightly.
What is Criterion.rs?
Criterion.rs is a statistics-driven benchmarking library for Rust. It provides precise measurements of changes in the performance of benchmarked code, and gives strong statistical confidence that apparent performance changes are real and not simply noise. Clear output, a simple API and reasonable defaults make it easy to use even for developers without a background in statistics. Unlike the benchmarking harness provided by Rust, Criterion.rs can be used with stable versions of the compiler.
If you aren’t already using Criterion.rs for your benchmarks, check out the Getting Started guide or go right to the GitHub repo.
New Features
These are only some of the improvements made to Criterion.rs in v0.3.0 - for a more complete list, see the CHANGELOG.
Custom Measurements
Criterion.rs now has basic support for plugging in custom measurements to replace the default wall-clock time measurement. This has been a highly requested feature during the lifetime of 0.2.0, so I look forward to seeing all the neat things people use it for.
Profiler Hooks
Some profiling tools require the programmer to instrument their code with calls to start and stop the profiler. Criterion.rs now provides hooks for benchmark authors to plug in their preferred profiler so that it can be used in --profile-time mode, without having to constantly recompile the benchmarks.
Added the BenchmarkGroup Type
The older Benchmark and ParameterizedBenchmark structs were used to group together related benchmarks so that Criterion.rs could generate summaries of the measurements comparing different functions on different inputs. Unfortunately, they could be very limiting. It was not possible to change the benchmark configuration based on the input or the function being tested (for example, to reduce the sample count on long-running benchmarks over large inputs while keeping the higher sample count for smaller inputs). It was also awkward to benchmark over multi-dimensional input ranges, and the structs gave the programmer little control over the benchmark IDs, among other limitations.
After some re-thinking of the problem, I realized that a much simpler, more flexible design was possible, so I built BenchmarkGroup. The older structs still exist and still work, but will be deprecated sometime during the lifetime of 0.3.0 and removed in 0.4.0.
Examples:
#[macro_use] extern crate criterion;
use self::criterion::*;
use std::time::Duration;

fn bench_simple(c: &mut Criterion) {
    let mut group = c.benchmark_group("My Group");

    // Now we can perform benchmarks with this group
    group.bench_function("Bench 1", |b| b.iter(|| 1 ));
    group.bench_function("Bench 2", |b| b.iter(|| 2 ));

    // It's recommended to call group.finish() explicitly at the end, but if you don't it will
    // be called automatically when the group is dropped.
    group.finish();
}

fn bench_nested(c: &mut Criterion) {
    let mut group = c.benchmark_group("My Second Group");

    // We can override the configuration on a per-group level
    group.measurement_time(Duration::from_secs(1));

    // We can also use loops to define multiple benchmarks, even over multiple dimensions.
    for x in 0..3 {
        for y in 0..3 {
            let point = (x, y);
            let parameter_string = format!("{} * {}", x, y);
            group.bench_with_input(BenchmarkId::new("Multiply", parameter_string), &point,
                |b, (p_x, p_y)| b.iter(|| p_x * p_y));
        }
    }
    group.finish();
}

fn bench_throughput(c: &mut Criterion) {
    let mut group = c.benchmark_group("Summation");

    for size in [1024, 2048, 4096].iter() {
        // Generate input of an appropriate size: `size` copies of 1u64.
        let input = vec![1u64; *size];

        // We can use the throughput function to tell Criterion.rs how large the input is
        // so it can calculate the overall throughput of the function. If we wanted, we could
        // even change the benchmark configuration for different inputs (e.g. to reduce the
        // number of samples for extremely large and slow inputs) or even different functions.
        // Throughput values are u64 as of 0.3.
        group.throughput(Throughput::Elements(*size as u64));

        group.bench_with_input(BenchmarkId::new("sum", *size), &input,
            |b, i| b.iter(|| i.iter().sum::<u64>()));
        group.bench_with_input(BenchmarkId::new("fold", *size), &input,
            |b, i| b.iter(|| i.iter().fold(0u64, |a, b| a + b)));
    }
    group.finish();
}

criterion_group!(benches, bench_simple, bench_nested, bench_throughput);
criterion_main!(benches);
Custom Test Framework
Nightly-compiler users can now add a dependency on criterion_macro and use #[criterion] to mark their benchmarks instead of using the criterion_group!/criterion_main! macros.
Examples:
#![feature(custom_test_frameworks)]
#![test_runner(criterion::runner)]

use criterion::{Criterion, black_box};
use criterion_macro::criterion;

fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn custom_criterion() -> Criterion {
    Criterion::default()
        .sample_size(50)
}

#[criterion]
fn bench_simple(c: &mut Criterion) {
    c.bench_function("Fibonacci-Simple", |b| b.iter(|| fibonacci(black_box(10))));
}

#[criterion(custom_criterion())]
fn bench_custom(c: &mut Criterion) {
    c.bench_function("Fibonacci-Custom", |b| b.iter(|| fibonacci(black_box(20))));
}
Breaking Changes
Unfortunately, some breaking changes were necessary to implement these new features.
The format of the raw.csv file has changed
Some additional columns were added to include throughput information. Also, sample_time_nanos has been split into sample_measured_value and unit to accommodate custom measurements.
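For scripts that read raw.csv, one way to cope with the change is to look columns up by header name rather than by position, so that added columns do not shift anything. The following is only a minimal sketch: the column names sample_measured_value and unit come from the description above, while the helper name measured_values, the naive comma splitting, and the example rows mentioned afterwards are assumptions for illustration.

```rust
/// Extracts (value, unit) pairs from CSV text, locating the two columns
/// by header name. Hypothetical helper, not part of Criterion.rs.
/// Uses naive comma splitting, so it assumes no quoted fields containing commas.
fn measured_values(csv: &str) -> Option<Vec<(f64, String)>> {
    let mut lines = csv.lines();

    // The first line is assumed to be the header row.
    let header: Vec<&str> = lines.next()?.split(',').map(str::trim).collect();
    let value_idx = header.iter().position(|h| *h == "sample_measured_value")?;
    let unit_idx = header.iter().position(|h| *h == "unit")?;

    let mut out = Vec::new();
    for line in lines.filter(|l| !l.trim().is_empty()) {
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        let value = fields.get(value_idx)?.parse::<f64>().ok()?;
        let unit = fields.get(unit_idx)?.to_string();
        out.push((value, unit));
    }
    Some(out)
}
```

Given made-up input such as a header of `group,sample_measured_value,unit` followed by a row `g,1500.0,ns`, the function returns one pair holding 1500.0 and "ns". It returns None for a file that still has only the old sample_time_nanos column, which gives the caller a clear signal to fall back to the old format.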
External Program Benchmarks have been removed
This feature was never used enough to justify the maintenance burden, so it was deprecated in 0.2.6 and removed in 0.3.0. With some extra effort on the part of the benchmark author, the new iter_custom timing loop can be used to implement external program benchmarks.
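As a rough sketch of what that extra effort might look like: iter_custom passes the closure an iteration count and expects the total measured value back, so the timing routine can be written as an ordinary function. The names time_iterations and run_once and the ./my-program path are hypothetical, and the commented-out wiring assumes the default wall-clock measurement, where the returned value is a Duration.

```rust
use std::time::{Duration, Instant};

/// Calls `run_once` exactly `iters` times and returns the total elapsed
/// wall-clock time. `run_once` stands in for whatever drives the external
/// program (hypothetical; e.g. spawning it or sending it a request).
fn time_iterations<F: FnMut()>(iters: u64, mut run_once: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        run_once();
    }
    start.elapsed()
}

// Assumed wiring inside a benchmark function (needs the criterion crate):
//
// c.bench_function("external", |b| {
//     b.iter_custom(|iters| time_iterations(iters, || {
//         // Hypothetical program path, for illustration only.
//         std::process::Command::new("./my-program").status().unwrap();
//     }))
// });
```

Spawning a process per iteration includes process start-up cost in every sample. A benchmark that wants to exclude it would need to start the program once and measure only the exchange with it, which is where most of the extra effort goes.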
Throughput has been expanded to u64
Throughputs previously contained a u32 value representing the number of bytes or elements processed by an iteration of the benchmark. This has been expanded to u64 to allow for extremely large iterations.
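To make the old limit concrete: a u32 tops out at 4,294,967,295, which is just under 4 GiB when counting bytes. The small check below illustrates this with plain arithmetic; the helper name and the 8 GiB figure are invented for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical input size: 8 GiB processed per iteration.
const EIGHT_GIB: u64 = 8 * 1024 * 1024 * 1024;

/// Returns true if `count` bytes or elements could have been stored in a u32.
fn fits_in_u32(count: u64) -> bool {
    u32::try_from(count).is_ok()
}
```

Here fits_in_u32(EIGHT_GIB) is false, so a benchmark processing that much data per iteration could not express its throughput with the old u32 field without truncation.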
Thank You
Thank you to the many folks who have contributed pull requests, ideas, and suggestions to Criterion.rs over the last few years.
Also, thank you to everyone who uses Criterion.rs for their benchmarks.