Benchmarks and Metrics#

Inheritance diagram of datapipe_testbench.benchmark, datapipe_testbench.auto_benchmark

Benchmarks#

Defines what is a Benchmark.

class Benchmark[source]#

Bases: object

Defines what a “scientist” should define for a given benchmark.

A benchmark is defined by two functions:

generate_metrics(), which will be used to transform event data into one or more Metric stored in a MetricsStore corresponding to a single set of processed events.
compare_to_reference(), which compares one or more MetricsStores to a reference MetricsStore. This function is run after there are multiple MetricsStores available that have been processed with generate_metrics.

check_input_dataset(input_dataset: InputDataset)[source]#: Ensure required inputs exist, or raise MissingInputError.

abstractmethod compare_to_reference(metric_store_list: list[MetricsStore], result_store: ResultStore) → dict | None[source]#

Perform the comparison study on metrics associated with a set of experiments.

This function takes a list of MetricStores that have been previously filled by datapipe_testbench.benchmark.Benchmark.generate_metrics(), generates plots and comparison results, and writes the output to a ResultsStore. If there are more than one MetricStore to compare, the first one is considered the _reference_.

Parameters:

metric_store_list: list[MetricsStore]: The list of MetricStores to compare, the first of which is the _reference_ .
result_store: ResultStore: WHere the results of the comparison study are stored.

abstractmethod generate_metrics(metric_store: MetricsStore) → dict | None[source]#

Produce metrics for this benchmark later comparison.

Called once per benchmark per MetricsStore, i.e. for one set of input events, defines how to transform the event into to a metrics stored in a MetricsStore that can later be compared.

Parameters:

metric_store: MetricsStore: Where to store the metrics generated by this Benchmark. It must be initialized with an datapipe_testbench.inputdataset.InputDataset containing the inputs to use for the transformation into metrics.

abstractmethod make_report(result_store: ResultStore)[source]#

Make a report for the benchmark using the results of the comparisons.

The result_store must already contain the outputs of compare_to_reference().

Parameters:

result_storeResultStore: Contains the input comparisons and is where the report will be stored.

missing_outputs(metric_store: MetricsStore)[source]#: Return list of missing outputs.

property name#: Return friendly name of this benchmark.

abstractmethod outputs() → dict[str, Path][source]#: Return mapping of key to Path of all outputs that this Benchmark generates.

abstract property required_inputs: set[str]#

Return the set of required keys in the InputDataset that this Benchmark needs.

These are checked before generating metrics

class BenchmarkResult(benchmark_name: str, reference_dataset_name: str, test_dataset_name: str, status: ComparisonStatus, comment: str, plots: list[FigureBase] | None)[source]#

Bases: object

Output of a benchmark is one or more of these.

class ComparisonStatus(*values)[source]#

Bases: str, Enum

Status of a benchmark check.

FAILED = '3'#: benchmark failed

OTHER = '4'#: other failure, or couldn’t complete benchmark

PASSED = '1'#: benchmark passed

WARNING = '2'#: benchmark passed, but only barely, should check

exception MissingInputError[source]#

Bases: RuntimeError

Raised when required inputs are missing.

Metrics#

Ndhistogram metric and datamodel location definition.

Bases: Storable

ND-Histogram that can be serialized to ASDF.

Class stores ND-histogram based on scikit-hep Hist package to create bins by accumulation.

This initializer should not be overridden in subclasses! If you need a specialized constructor, define a @classmethod that does what is needed and calls this generic one to return a properly constructed Metric. Otherwise serialization will not work.

Parameters:

axis_listlist[axis.AxesMixin]: List of hist.axis objects. Each can define a metadata keyword (str) that will be used as the. If the axis has a unit, set it in the “unit” key in the axis.metadata dict
labelstr | None: if none, an automatic label will be generated.
unit: u.Unit | None:: Unit of the _data_ stored in the metric. Note that you should set _axis_ units by passing them as metadata to the axes in the axis_list, i.e. RegularAxis(..., metadata=dict(unit="deg"))
output_store: ResultStore | None: If specified, the name of this metric will be updated to reflect the output name. This is used e.g. by AutoBenchmark for nicer bookkeeping and to avoid name collisions.
other_attributes: dict[str,Any] | None: Other attributes of this metric that should be stored along with it.
data: np.array | None: If specified, will set the internal histogram to this value.. Must be the same shape as the axes specify.
hist: Hist | None: If set, use this for the internal histogram rather than constructing one. Must have the same axis definition as specified .

property axes#: Wrapper to transfer to _storage.

category_dimensions() → list[int][source]#: Return indices of non-continuous dimensions.

abstractmethod compare(others: list)[source]#

Compare several instances of the class.

Uses self as reference when comparing with others. Plots the spectra on top of each other, and calculates the wasserstein distance.

Parameters:

otherslist[Metric]: List containing other instances of this class.

Returns:

ComparisonResult | None: Results of comparison of this reference to the others list.

compatible(other)[source]#

Compare objects and ensure number and name of axis match.

Parameters:

other
Metric other: Other instance to compare self to.

Returns:

True if the objects are compatible.

Raises:

IndexError.

fill(*columns)[source]#

Fill defined axis.

Parameters:

*columns

get_identifier()[source]#

Auto generation of a unique identifier for the metric.

Use axis order and name to return a tuple that need to be unique and allow identification of the current metric throughout multiple metric stores.

classmethod load(init)[source]#

Load class instance from file.

Parameters:

cls
init: Dict or string

normalize_to_probability(axis_index: int = 1, threshold_frac=0.005, fill_value=nan) → Self[source]#

Normalize the data such that the integral over axis axis_index 1.0.

Parameters:

arrnp.ndarray: The input array.
axis_indexint: The axis along which to normalize.
threshold_frac: float: if the integral along the axis is below this fraction of the total integral, mask off this row as having too low stats to plot. This prevents low-stats values form saturating the color scale
fill_value:: value to replace low-stats entries with

Returns:

np.ndarray: A new Metric normalized along the specified axis.

plot(full=False, *args, **kwargs)[source]#: Redirect plot calls to underlying histogram.

Generate a subfigure with comparisons and residuals.

The residuals are the relative differences, \(\Delta_\mathrm{rel} \equiv \frac{(v_i - v_\mathrm{ref})}{v_\mathrm{ref}}\)

Parameters:

others: Metric | list[Metric] | None: List of metrics to compare to reference
fig: matplotlib.Figure | None: Figure or SubFigure to add the axes to, or none to use current.
legend: bool: Show legend on plot
layout: str | None: If fig not passed, in, use this maptlotlib layout engine.
sharex: Axes | None: If specified, share all x-axes with this axis.
sharey: Axes | None: If specified share the value’s y-axis with this axis
sharey_diff: Axes | None: If specified share the diff’s y-axis with this axis.
**kwargs:: Any other args are passed to the plot() functions.

Returns:

dict[matplotlib.Axes]:: “val”: value_axis, “diff”: relative_difference_axis

project(*columns)[source]#

Project underlying histogram.

Parameters:

*columns

save(output_file: Path)[source]#

Save metric to file.

Parameters:

output_filePath: Output filename.

abstractmethod classmethod setup(*args, **kwargs) → Self[source]#

Create a new Metric with specialized parameters.

Overridden by subclasses to properly setup the Metric the first time with user-specified and metric-specific parameters. This should be called instead of the default constructor to set up the Metric correctly. Afterward, it will be loaded from a file using the default Metric constructor. This is because all Metrics must share the same __init__ method for serialization to work.

split_by_category() → dict[Self][source]#: Turn a Metric with categories into a dict of Metrics by category.

stack(*columns)[source]#

Stack underlying histogram.

Parameters:

*columns

to_numpy(flow=False)[source]#: Return numpy representation of underlying histogram.

Benchmarks and Metrics#

Benchmarks#

Metrics#

This Page