Benchmarks and Metrics#

Inheritance diagram of datapipe_testbench.benchmark, datapipe_testbench.auto_benchmark

Benchmarks#

Defines what a Benchmark is.

class Benchmark[source]#

Bases: object

Defines what a “scientist” should define for a given benchmark.

A benchmark is defined by two functions:

  1. generate_metrics(), which will be used to transform event data into one or more Metrics stored in a MetricsStore corresponding to a single set of processed events.

  2. compare_to_reference(), which compares one or more MetricsStores to a reference MetricsStore. This function is run after there are multiple MetricsStores available that have been processed with generate_metrics.
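Put together, a user-defined benchmark might look like the following sketch. The base class here is a minimal stand-in so the example is self-contained, and EnergySpectrumBenchmark, the dict-based stores, and the key names are illustrative assumptions, not the real datapipe_testbench API.

```python
from abc import ABC, abstractmethod

class Benchmark(ABC):  # minimal stand-in for datapipe_testbench.benchmark.Benchmark
    @abstractmethod
    def generate_metrics(self, metric_store): ...

    @abstractmethod
    def compare_to_reference(self, metric_store_list, result_store): ...

class EnergySpectrumBenchmark(Benchmark):  # hypothetical benchmark
    def generate_metrics(self, metric_store):
        # Transform one set of events into a stored metric (here: a plain sum).
        metric_store["energy_total"] = sum(metric_store["events"])

    def compare_to_reference(self, metric_store_list, result_store):
        # The first store is the reference; compare the rest against it.
        ref, *others = metric_store_list
        result_store["delta"] = [
            s["energy_total"] - ref["energy_total"] for s in others
        ]

# Plain dicts stand in for MetricsStore / ResultStore.
stores = [{"events": [1, 2, 3]}, {"events": [1, 2, 4]}]
bench = EnergySpectrumBenchmark()
for store in stores:
    bench.generate_metrics(store)

results = {}
bench.compare_to_reference(stores, results)  # results["delta"] == [1]
```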

check_input_dataset(input_dataset: InputDataset)[source]#

Ensure required inputs exist, or raise MissingInputError.

abstractmethod compare_to_reference(metric_store_list: list[MetricsStore], result_store: ResultStore) dict | None[source]#

Perform the comparison study on metrics associated with a set of experiments.

This function takes a list of MetricsStores that have been previously filled by datapipe_testbench.benchmark.Benchmark.generate_metrics(), generates plots and comparison results, and writes the output to a ResultStore. If there is more than one MetricsStore to compare, the first one is considered the _reference_.

Parameters:
metric_store_list: list[MetricsStore]

The list of MetricsStores to compare, the first of which is the _reference_.

result_store: ResultStore

Where the results of the comparison study are stored.

abstractmethod generate_metrics(metric_store: MetricsStore) dict | None[source]#

Produce metrics for this benchmark for later comparison.

Called once per benchmark per MetricsStore, i.e. for one set of input events; defines how to transform the events into one or more metrics stored in a MetricsStore that can later be compared.

Parameters:
metric_store: MetricsStore

Where to store the metrics generated by this Benchmark. It must be initialized with a datapipe_testbench.inputdataset.InputDataset containing the inputs to use for the transformation into metrics.

abstractmethod make_report(result_store: ResultStore)[source]#

Make a report for the benchmark using the results of the comparisons.

The result_store must already contain the outputs of compare_to_reference().

Parameters:
result_store: ResultStore

Contains the input comparisons and is where the report will be stored.

missing_outputs(metric_store: MetricsStore)[source]#

Return list of missing outputs.

property name#

Return friendly name of this benchmark.

abstractmethod outputs() dict[str, Path][source]#

Return mapping of key to Path of all outputs that this Benchmark generates.

abstract property required_inputs: set[str]#

Return the set of required keys in the InputDataset that this Benchmark needs.

These are checked before generating metrics.

class BenchmarkResult(benchmark_name: str, reference_dataset_name: str, test_dataset_name: str, status: ComparisonStatus, comment: str, plots: list[FigureBase] | None)[source]#

Bases: object

The output of a benchmark is one or more of these.

class ComparisonStatus(*values)[source]#

Bases: str, Enum

Status of a benchmark check.

FAILED = '3'#

benchmark failed

OTHER = '4'#

other failure, or couldn’t complete benchmark

PASSED = '1'#

benchmark passed

WARNING = '2'#

benchmark passed, but only barely, should check

exception MissingInputError[source]#

Bases: RuntimeError

Raised when required inputs are missing.
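A plausible sketch of how check_input_dataset() and this exception interact, assuming an InputDataset exposes its keys like a mapping; the stand-in class, the free function, and the key names are illustrative, not the library's actual implementation.

```python
class MissingInputError(RuntimeError):  # stand-in for the real exception
    """Raised when required inputs are missing."""

def check_input_dataset(required_inputs: set, input_dataset: dict) -> None:
    # Raise if any required key is absent from the dataset.
    missing = required_inputs - input_dataset.keys()
    if missing:
        raise MissingInputError(f"missing required inputs: {sorted(missing)}")

check_input_dataset({"events"}, {"events": [], "pointing": []})  # passes silently
try:
    check_input_dataset({"events", "simulation_config"}, {"events": []})
except MissingInputError as err:
    print(err)  # missing required inputs: ['simulation_config']
```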

Metrics#

ND-histogram metric and data model location definition.

class Metric(axis_list: list[AxesMixin], unit: Unit | str = '', label: str | None = None, output_store: Storable | None = None, other_attributes: dict | None = None, data=None, hist: Hist | None = None)[source]#

Bases: Storable

ND-Histogram that can be serialized to ASDF.

Stores an ND-histogram based on the scikit-hep Hist package, filling bins by accumulation.

This initializer should not be overridden in subclasses! If you need a specialized constructor, define a @classmethod that does what is needed and calls this generic one to return a properly constructed Metric. Otherwise serialization will not work.

Parameters:
axis_list: list[axis.AxesMixin]

List of hist.axis objects. Each can define a metadata keyword (str) that will be used as the axis label. If the axis has a unit, set it in the “unit” key of the axis.metadata dict.

label: str | None

If None, an automatic label will be generated.

unit: Unit | str

Unit of the _data_ stored in the metric. Note that you should set _axis_ units by passing them as metadata to the axes in the axis_list, i.e. RegularAxis(..., metadata=dict(unit="deg"))

output_store: Storable | None

If specified, the name of this metric will be updated to reflect the output name. This is used e.g. by AutoBenchmark for nicer bookkeeping and to avoid name collisions.

other_attributes: dict[str,Any] | None

Other attributes of this metric that should be stored along with it.

data: np.array | None

If specified, will set the internal histogram to this value. Must be the same shape as the axes specify.

hist: Hist | None

If set, use this for the internal histogram rather than constructing one. Must have the same axis definition as specified.

property axes#

Wrapper forwarding to the underlying _storage.

category_dimensions() list[int][source]#

Return indices of non-continuous dimensions.

abstractmethod compare(others: list)[source]#

Compare several instances of the class.

Uses self as the reference when comparing with others. Plots the spectra on top of each other and calculates the Wasserstein distance.

Parameters:
others: list[Metric]

List containing other instances of this class.

Returns:
ComparisonResult | None

Results of comparison of this reference to the others list.
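For one-dimensional, normalized histograms with unit-spaced bins, the Wasserstein distance reduces to the summed absolute difference of the cumulative sums. A minimal numpy sketch of that reduction (not the library's implementation):

```python
import numpy as np

def wasserstein_1d(p: np.ndarray, q: np.ndarray) -> float:
    """1-D Wasserstein distance between two normalized histograms with
    unit-spaced bins: the sum of |CDF_p - CDF_q| over the bins."""
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

# A point mass at bin 0 vs. one at bin 2 is two bin widths apart:
wasserstein_1d(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))  # 2.0
```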

compatible(other)[source]#

Compare objects and ensure number and name of axis match.

Parameters:
other: Metric

Other instance to compare self to.

Returns:
True if the objects are compatible.

Raises:
IndexError
fill(*columns)[source]#

Fill defined axis.

Parameters:
*columns

Values to fill the axes with, one column per axis.

get_identifier()[source]#

Auto generation of a unique identifier for the metric.

Use axis order and name to return a tuple that needs to be unique and allows identification of the current metric across multiple metric stores.

classmethod load(init)[source]#

Load class instance from file.

Parameters:
init: dict | str

Dict or string to load the instance from.

normalize_to_probability(axis_index: int = 1, threshold_frac=0.005, fill_value=nan) Self[source]#

Normalize the data such that the integral over axis axis_index is 1.0.

Parameters:
axis_index: int

The axis along which to normalize.

threshold_frac: float

If the integral along the axis is below this fraction of the total integral, mask off this row as having too low statistics to plot. This prevents low-stats values from saturating the color scale.

fill_value: float

Value to replace low-stats entries with.

Returns:
Self

A new Metric normalized along the specified axis.
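The normalization and masking logic can be sketched in plain numpy operating on a bare array. This is an approximation of the documented behavior, not the actual implementation, which returns a new Metric:

```python
import numpy as np

def normalize_to_probability(arr: np.ndarray, axis_index: int = 1,
                             threshold_frac: float = 0.005,
                             fill_value: float = np.nan) -> np.ndarray:
    """Normalize so each slice along axis_index integrates to 1.0,
    masking slices whose integral is below threshold_frac of the total."""
    totals = arr.sum(axis=axis_index, keepdims=True)
    # Divide only where the slice integral is nonzero; elsewhere keep fill_value.
    out = np.divide(arr, totals,
                    out=np.full_like(arr, fill_value, dtype=float),
                    where=totals > 0)
    # Slices with too few counts are masked entirely.
    low_stats = totals < threshold_frac * arr.sum()
    return np.where(low_stats, fill_value, out)
```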

plot(full=False, *args, **kwargs)[source]#

Redirect plot calls to underlying histogram.

plot_compare_1d(others: Self | list[Self] = None, fig: None | FigureBase = None, legend=True, show_xlabel=True, layout: str | None = 'constrained', sharex: Axes | None = None, sharey: Axes | None = None, sharey_diff: Axes | None = None, **kwargs) dict[str, Axes][source]#

Generate a subfigure with comparisons and residuals.

The residuals are the relative differences, \(\Delta_\mathrm{rel} \equiv \frac{(v_i - v_\mathrm{ref})}{v_\mathrm{ref}}\)
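For example, with illustrative numbers:

```python
import numpy as np

v_ref = np.array([10.0, 20.0, 40.0])  # reference values
v_i = np.array([11.0, 19.0, 40.0])    # values being compared
delta_rel = (v_i - v_ref) / v_ref     # relative residuals: [0.1, -0.05, 0.0]
```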

Parameters:
others: Metric | list[Metric] | None

List of metrics to compare to the reference.

fig: matplotlib.Figure | None

Figure or SubFigure to add the axes to, or None to use the current figure.

legend: bool

Show legend on plot

layout: str | None

If fig is not passed in, use this matplotlib layout engine.

sharex: Axes | None

If specified, share all x-axes with this axis.

sharey: Axes | None

If specified, share the value’s y-axis with this axis.

sharey_diff: Axes | None

If specified, share the diff’s y-axis with this axis.

**kwargs:

Any other args are passed to the plot() functions.

Returns:
dict[str, Axes]

{"val": value_axis, "diff": relative_difference_axis}

project(*columns)[source]#

Project underlying histogram.

Parameters:
*columns

Names or indices of the axes to project onto.

save(output_file: Path)[source]#

Save metric to file.

Parameters:
output_file: Path

Output filename.

abstractmethod classmethod setup(*args, **kwargs) Self[source]#

Create a new Metric with specialized parameters.

Overridden by subclasses to properly setup the Metric the first time with user-specified and metric-specific parameters. This should be called instead of the default constructor to set up the Metric correctly. Afterward, it will be loaded from a file using the default Metric constructor. This is because all Metrics must share the same __init__ method for serialization to work.
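The pattern looks roughly like this; the subclass name, the axes, and the parameters are illustrative, and the base class is a stripped-down stand-in for the real Metric:

```python
class Metric:  # stand-in with the shared, serialization-friendly __init__
    def __init__(self, axis_list, label=None):
        self.axis_list = axis_list
        self.label = label

class AngularResolutionMetric(Metric):  # hypothetical subclass
    @classmethod
    def setup(cls, n_bins: int):
        # Build the metric-specific axes here, then defer to the generic
        # __init__ so that load()/save() keep working for this subclass.
        axes = [("offset_deg", n_bins)]
        return cls(axis_list=axes, label="angular resolution")

m = AngularResolutionMetric.setup(n_bins=20)  # m.label == "angular resolution"
```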

split_by_category() dict[Self][source]#

Turn a Metric with categories into a dict of Metrics by category.

stack(*columns)[source]#

Stack underlying histogram.

Parameters:
*columns

to_numpy(flow=False)[source]#

Return numpy representation of underlying histogram.