Criterion.rs v0.3 - Custom Measurements, Profiling Hooks, Custom Test Framework, API Changes


I’m pleased to announce the release of Criterion.rs v0.3, available today. Version 0.3 provides a number of new features, including preliminary support for plugging in custom measurements (e.g. hardware timers or POSIX CPU time), hooks to start and stop profilers, a new BenchmarkGroup struct that provides more flexibility than the older Benchmark and ParameterizedBenchmark structs, and an implementation of a #[criterion] custom-test-framework macro for those on Nightly.

What is Criterion.rs?

Criterion.rs is a statistics-driven benchmarking library for Rust. It provides precise measurements of changes in the performance of benchmarked code, and gives strong statistical confidence that apparent performance changes are real and not simply noise. Clear output, a simple API and reasonable defaults make it easy to use even for developers without a background in statistics. Unlike the benchmarking harness provided by Rust, Criterion.rs can be used with stable versions of the compiler.

If you aren’t already using Criterion.rs for your benchmarks, check out the Getting Started guide or go right to the GitHub repo.

New Features

These are only some of the improvements made to Criterion.rs in v0.3.0 - for a more complete list, see the CHANGELOG.

Custom Measurements

Criterion.rs now has basic support for plugging in custom measurements to replace the default wall-clock time measurement. This has been a highly requested feature during the lifetime of 0.2.0, so I look forward to seeing all the neat things people use it for.
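To make the idea concrete, here is a minimal standard-library-only sketch of what "pluggable measurement" means: the harness asks the measurement to capture a starting state, runs the routine, and asks it to turn that state into a value. The names below (Measurement, WallTime, EventCounter, measure) are illustrative assumptions, not the Criterion.rs API; the real trait in the criterion::measurement module has its own, larger set of methods.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for a pluggable measurement.
/// (Illustration only; not the trait Criterion.rs actually exposes.)
trait Measurement {
    type Intermediate; // state captured when measuring starts
    type Value;        // measured quantity once measuring ends
    fn start(&self) -> Self::Intermediate;
    fn end(&self, started: Self::Intermediate) -> Self::Value;
    fn to_f64(&self, value: &Self::Value) -> f64;
    fn unit(&self) -> &'static str;
}

/// Wall-clock time, the kind of measurement used by default.
struct WallTime;

impl Measurement for WallTime {
    type Intermediate = Instant;
    type Value = Duration;
    fn start(&self) -> Instant { Instant::now() }
    fn end(&self, started: Instant) -> Duration { started.elapsed() }
    fn to_f64(&self, value: &Duration) -> f64 { value.as_nanos() as f64 }
    fn unit(&self) -> &'static str { "ns" }
}

/// Counts events recorded in a shared counter instead of measuring time.
struct EventCounter<'a> {
    counter: &'a Cell<u64>,
}

impl<'a> Measurement for EventCounter<'a> {
    type Intermediate = u64;
    type Value = u64;
    fn start(&self) -> u64 { self.counter.get() }
    fn end(&self, started: u64) -> u64 { self.counter.get() - started }
    fn to_f64(&self, value: &u64) -> f64 { *value as f64 }
    fn unit(&self) -> &'static str { "events" }
}

/// Runs `routine` `iters` times (assumed > 0) and returns the average
/// measured value per iteration together with its unit.
fn measure<M: Measurement, F: FnMut()>(m: &M, iters: u64, mut routine: F) -> (f64, &'static str) {
    let started = m.start();
    for _ in 0..iters {
        routine();
    }
    let value = m.end(started);
    (m.to_f64(&value) / iters as f64, m.unit())
}
```

Because the harness only talks to the trait, swapping WallTime for EventCounter changes what is reported without touching the benchmarked routine.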

Profiler Hooks

Some profiling tools require the programmer to instrument their code with calls to start and stop the profiler. Criterion.rs now provides hooks for benchmark authors to plug in their preferred profiler so that it can be used in --profile-time mode, without having to constantly recompile the benchmarks.
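The shape of such a hook can be sketched with the standard library alone: the harness calls a start function before running the benchmark routine and a stop function afterwards, so the profiler only sees the code of interest. Everything here (the Profiler trait as written, RecordingProfiler, run_profiled) is a hypothetical illustration; consult the Criterion.rs documentation for the actual trait and how to register it.

```rust
use std::path::Path;

/// Hypothetical hook interface: called around the profiled run.
trait Profiler {
    fn start_profiling(&mut self, benchmark_id: &str, output_dir: &Path);
    fn stop_profiling(&mut self, benchmark_id: &str, output_dir: &Path);
}

/// Toy profiler that only records when it was started and stopped.
/// A real one would call into the profiling tool here.
#[derive(Default)]
struct RecordingProfiler {
    events: Vec<String>,
}

impl Profiler for RecordingProfiler {
    fn start_profiling(&mut self, benchmark_id: &str, _output_dir: &Path) {
        self.events.push(format!("start {}", benchmark_id));
    }
    fn stop_profiling(&mut self, benchmark_id: &str, _output_dir: &Path) {
        self.events.push(format!("stop {}", benchmark_id));
    }
}

/// Runs the routine `iters` times between the start and stop hooks,
/// without collecting any statistics.
fn run_profiled<P: Profiler, F: FnMut()>(
    profiler: &mut P,
    benchmark_id: &str,
    output_dir: &Path,
    iters: u64,
    mut routine: F,
) {
    profiler.start_profiling(benchmark_id, output_dir);
    for _ in 0..iters {
        routine();
    }
    profiler.stop_profiling(benchmark_id, output_dir);
}
```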

Added the BenchmarkGroup Type

The older Benchmark and ParameterizedBenchmark structs were used to group together related benchmarks so that Criterion.rs could generate summaries of the measurements comparing different functions on different inputs. Unfortunately, they could be very limiting. It was not possible to change the benchmark configuration based on the input or the function being tested (for example, to reduce the sample count on long-running benchmarks over large inputs while keeping the higher sample count for smaller inputs). It was also awkward to benchmark over multi-dimensional input ranges, and the structs gave the programmer little control over the benchmark IDs.

After some rethinking of the problem, I realized that a much simpler, more flexible design was possible, so I built BenchmarkGroup. The older structs still exist and still work, but will be deprecated sometime during the lifetime of 0.3.0 and removed in 0.4.0.

Examples:

#[macro_use]
extern crate criterion;

use self::criterion::*;
use std::time::Duration;

fn bench_simple(c: &mut Criterion) {
    let mut group = c.benchmark_group("My Group");

    // Now we can perform benchmarks with this group
    group.bench_function("Bench 1", |b| b.iter(|| 1));
    group.bench_function("Bench 2", |b| b.iter(|| 2));

    // It's recommended to call group.finish() explicitly at the end, but if you don't it will
    // be called automatically when the group is dropped.
    group.finish();
}

fn bench_nested(c: &mut Criterion) {
    let mut group = c.benchmark_group("My Second Group");

    // We can override the configuration on a per-group level
    group.measurement_time(Duration::from_secs(1));

    // We can also use loops to define multiple benchmarks, even over multiple dimensions.
    for x in 0..3 {
        for y in 0..3 {
            let point = (x, y);
            let parameter_string = format!("{} * {}", x, y);
            group.bench_with_input(BenchmarkId::new("Multiply", parameter_string), &point,
                |b, (p_x, p_y)| b.iter(|| p_x * p_y));
        }
    }

    group.finish();
}

fn bench_throughput(c: &mut Criterion) {
    let mut group = c.benchmark_group("Summation");

    for size in [1024, 2048, 4096].iter() {
        // Generate input of an appropriate size: a vector of `size` ones.
        let input = vec![1u64; *size];

        // We can use the throughput function to tell Criterion.rs how large the input is
        // so it can calculate the overall throughput of the function. If we wanted, we could
        // even change the benchmark configuration for different inputs (e.g. to reduce the
        // number of samples for extremely large and slow inputs) or even different functions.
        // Throughput values are u64 as of 0.3.
        group.throughput(Throughput::Elements(*size as u64));

        group.bench_with_input(BenchmarkId::new("sum", *size), &input,
            |b, i| b.iter(|| i.iter().sum::<u64>()));
        group.bench_with_input(BenchmarkId::new("fold", *size), &input,
            |b, i| b.iter(|| i.iter().fold(0u64, |a, b| a + b)));
    }

    group.finish();
}

criterion_group!(benches, bench_simple, bench_nested, bench_throughput);
criterion_main!(benches);

Custom Test Framework

Nightly-compiler users can now add a dependency on criterion_macro and use #[criterion] to mark their benchmarks instead of using the criterion_group!/criterion_main! macros.

Examples:

#![feature(custom_test_frameworks)]
#![test_runner(criterion::runner)]

use criterion::{Criterion, black_box};
use criterion_macro::criterion;

fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn custom_criterion() -> Criterion {
    Criterion::default()
        .sample_size(50)
}

#[criterion]
fn bench_simple(c: &mut Criterion) {
    c.bench_function("Fibonacci-Simple", |b| b.iter(|| fibonacci(black_box(10))));
}

#[criterion(custom_criterion())]
fn bench_custom(c: &mut Criterion) {
    c.bench_function("Fibonacci-Custom", |b| b.iter(|| fibonacci(black_box(20))));
}

Breaking Changes

Unfortunately, some breaking changes were necessary to implement these new features.

The format of the raw.csv file has changed

Some additional columns were added to include throughput information. Also, sample_time_nanos has been split into sample_measured_value and unit to accommodate custom measurements.
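If you have tooling that reads raw.csv, one way to cope with layout changes like this is to look columns up by header name rather than by position. The following is a minimal sketch under stated assumptions: it relies only on the two column names mentioned above (sample_measured_value and unit), uses a naive comma split that does not handle quoted fields, and the helper name read_samples is hypothetical.

```rust
/// Extracts (measured value, unit) pairs from the text of a raw.csv file.
/// Columns are located by header name, so extra or reordered columns are fine.
/// Assumption: no field contains a quoted comma.
fn read_samples(csv: &str) -> Result<Vec<(f64, String)>, String> {
    let mut lines = csv.lines();
    let header: Vec<&str> = lines
        .next()
        .ok_or("empty file")?
        .split(',')
        .map(str::trim)
        .collect();

    // Find the index of a named column, or report which one is missing.
    let column = |name: &str| {
        header
            .iter()
            .position(|h| *h == name)
            .ok_or_else(|| format!("missing column `{}`", name))
    };
    let value_idx = column("sample_measured_value")?;
    let unit_idx = column("unit")?;

    let mut samples = Vec::new();
    for line in lines.filter(|l| !l.trim().is_empty()) {
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        let raw = fields
            .get(value_idx)
            .ok_or_else(|| format!("short row: {}", line))?;
        let unit = fields
            .get(unit_idx)
            .ok_or_else(|| format!("short row: {}", line))?;
        let value: f64 = raw
            .parse()
            .map_err(|e| format!("bad value `{}`: {}", raw, e))?;
        samples.push((value, unit.to_string()));
    }
    Ok(samples)
}
```

A file written by 0.2, which still has sample_time_nanos, is rejected with a "missing column" error instead of being silently misread.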

External Program Benchmarks have been removed

This feature was never used enough to justify the maintenance burden, so it was deprecated in 0.2.6 and removed in 0.3.0. With some extra effort on the part of the benchmark author, the new iter_custom timing loop can be used to implement external program benchmarks.
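As a rough sketch of that extra effort: iter_custom hands the benchmark an iteration count and expects the total measured time back, so the timing itself can be done by a plain function like the one below. The helper name time_external and its error handling are assumptions made for illustration, and the measured time includes process start-up overhead.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Runs `program` with `args` `iters` times, one process per iteration,
/// and returns the total wall-clock time spent waiting for them.
/// Hypothetical helper; output of the child process is discarded.
fn time_external(program: &str, args: &[&str], iters: u64) -> io::Result<Duration> {
    let mut total = Duration::new(0, 0);
    for _ in 0..iters {
        let start = Instant::now();
        let status = Command::new(program)
            .args(args)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()?; // fails if the program cannot be started
        total += start.elapsed();
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{} exited with {}", program, status),
            ));
        }
    }
    Ok(total)
}
```

Inside a benchmark this would be called from the closure passed to iter_custom, along the lines of b.iter_custom(|iters| time_external("my-program", &[], iters).expect("benchmark program failed")), where my-program is a placeholder for your own executable.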

Throughput has been expanded to u64

Throughputs previously contained a u32 value representing the number of bytes or elements processed by an iteration of the benchmark. This has been expanded to u64 to allow for extremely large iterations.
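A quick standard-library illustration of why the wider type matters (the helper names are made up for this example): an iteration that processes 8 GiB cannot be described by a u32 byte count, but fits comfortably in a u64.

```rust
/// Returns true when `count` could have been stored in the old 32-bit field.
fn fits_in_u32(count: u64) -> bool {
    count <= u64::from(u32::MAX)
}

/// Bytes per second, given the bytes processed by one iteration and the
/// time that iteration took in nanoseconds.
fn bytes_per_second(bytes_per_iter: u64, nanos_per_iter: f64) -> f64 {
    bytes_per_iter as f64 * 1e9 / nanos_per_iter
}
```

For existing benchmarks the migration is usually just a matter of changing the cast, e.g. from `as u32` to `as u64`, where the value is passed to Throughput::Bytes or Throughput::Elements.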

Thank You

Thank you to the many folks who have contributed pull requests, ideas and suggestions to Criterion.rs over the last few years.

Also, thank you to everyone who uses Criterion.rs for their benchmarks.