evaluation.evaltypes
Metric
class Metric()
M = TypeVar("M", bound=Metric)
Base class for all metrics.
The Metric abstract base class also serves as the bound for the M type variable.
Methods
calculate
def calculate()
Calculate the metric value.
description
def description()
Description of the metric, implemented by each subclass.
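As a minimal sketch, a concrete metric subclasses Metric and implements both abstract methods. The argument list for calculate is not shown in this reference, so the (output, expected_output) signature below is an assumption, and WordCountMetric is a hypothetical example:

```python
# A hedged sketch of the Metric contract; the calculate() signature
# is assumed, since this reference shows it without parameters.
from evaluation.evaltypes import Metric

class WordCountMetric(Metric):
    def calculate(self, output, expected_output):
        # Assumed signature: score the pipeline output against the
        # expected output; here, the absolute difference in word count.
        return abs(len(str(output).split()) - len(str(expected_output).split()))

    def description(self):
        return "Absolute difference in word count between output and expected output."
```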
TestPipeline
class TestPipeline()
P = TypeVar("P", bound=TestPipeline)
Base class for pipeline runs.
The TestPipeline abstract base class also serves as the bound for the P type variable.
Methods
run
def run()
Run the pipeline.
drop
def drop()
Drop the pipeline.
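A minimal sketch of a TestPipeline subclass follows. Both run() and drop() are documented here without parameters, so the input_data argument is an assumption, and IdentityPipeline is a hypothetical example:

```python
# A hedged sketch of the TestPipeline contract.
from evaluation.evaltypes import TestPipeline

class IdentityPipeline(TestPipeline):
    def run(self, input_data):
        # Assumed signature: a real pipeline would transform the input.
        return input_data

    def drop(self):
        # Release any resources the pipeline holds; nothing to do here.
        pass
```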
PipelineTest
class PipelineTest(
name: str
pipeline: P
metrics: list[M]
)
Base class for pipeline tests.
Methods
__init__
def __init__(
name: str
pipeline: P
metrics: list[M]
)
run_pipeline
def run_pipeline()
Run the underlying pipeline.
evaluate
def evaluate()
Evaluate the pipeline's output against the expected output using the test's metrics.
metric_descriptions
def metric_descriptions()
Return the descriptions of the metrics attached to this test.
drop_pipeline
def drop_pipeline()
Drop the underlying pipeline.
SingleResultMetric
class SingleResultMetric()
Metric for evaluating pipelines that return a single result.
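As a hedged sketch, a concrete single-result metric might look like the following. ExactMatchMetric is a hypothetical example, and the calculate() signature is assumed (it is not shown in this reference):

```python
# A minimal sketch of a SingleResultMetric subclass.
from evaluation.evaltypes import SingleResultMetric

class ExactMatchMetric(SingleResultMetric):
    def calculate(self, output, expected_output):
        # 1 if the single result matches the expected output exactly,
        # 0 otherwise.
        return int(output == expected_output)

    def description(self):
        return "Exact match between the pipeline output and the expected output."
```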
InformationRetrievalMetric
class InformationRetrievalMetric()
Metric for evaluating information retrieval pipelines.
SingleResultPipeline
class SingleResultPipeline()
Base class for pipelines returning a single result.
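A hedged sketch of a SingleResultPipeline subclass follows. UppercasePipeline is a hypothetical example; run() is documented above without parameters, so the input_data argument is an assumption mirroring run_pipeline(input_data) below:

```python
# A minimal sketch of a pipeline that returns a single result.
from evaluation.evaltypes import SingleResultPipeline

class UppercasePipeline(SingleResultPipeline):
    def run(self, input_data):
        # Produce a single result for the given input.
        return str(input_data).upper()

    def drop(self):
        # No external resources to release in this toy pipeline.
        pass
```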
SingleResultPipelineTest
class SingleResultPipelineTest(
name: str
pipeline: SingleResultPipeline
metrics: list[SingleResultMetric]
)
Pipeline test for pipelines that return a single result.
Methods
__init__
def __init__(
name: str
pipeline: SingleResultPipeline
metrics: list[SingleResultMetric]
)
run_pipeline
def run_pipeline(
input_data
)
Run the pipeline with the given input data.
Parameters
input_data
The input data for the pipeline.
Returns
The result of running the pipeline on the input data.
evaluate
def evaluate(
input_data
expected_output
)
Evaluate the pipeline by running it on the input data and comparing the result to the expected output using all metrics.
Parameters
input_data
The input data for the pipeline.
expected_output
The expected output to compare against.
Returns
A dictionary mapping metric names to their calculated values.
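Putting the pieces together, a single-result test can be run end to end. This sketch reuses the hypothetical UppercasePipeline and ExactMatchMetric from above; the exact keys of the returned dictionary are an assumption, since only "metric names" is documented:

```python
# A hedged sketch of constructing and running a SingleResultPipelineTest.
from evaluation.evaltypes import SingleResultPipelineTest

test = SingleResultPipelineTest(
    name="uppercase-exact-match",
    pipeline=UppercasePipeline(),
    metrics=[ExactMatchMetric()],
)

# evaluate() runs the pipeline on input_data and scores the result
# against expected_output with every metric, returning a dictionary
# mapping metric names to their calculated values.
scores = test.evaluate(input_data="hello", expected_output="HELLO")
```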
EvalDataLoader
class EvalDataLoader(
file_path: str
)
Provides an abstract base class for loading data for an EvaluationFramework. The methods are left abstract to be implemented as required for different pipeline evaluations.
Methods
__init__
def __init__(
file_path: str
)
Initialises the EvalDataLoader.
Parameters
file_path: str
A path to the file to be loaded.
input_data
def input_data()
An EvaluationFramework requires its EvalDataLoader to provide input_data; subclasses must implement it.
expected_output
def expected_output()
An EvaluationFramework requires its EvalDataLoader to provide expected_output; subclasses must implement it.
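A minimal sketch of a concrete loader follows. The JSON layout and the JsonEvalDataLoader name are assumptions for illustration; only the file_path constructor argument and the input_data / expected_output members come from this reference, which documents them as zero-argument callables:

```python
# A hedged sketch of an EvalDataLoader that reads a JSON file of
# {"input": ..., "expected": ...} records (an assumed file layout).
import json

from evaluation.evaltypes import EvalDataLoader

class JsonEvalDataLoader(EvalDataLoader):
    def __init__(self, file_path: str):
        super().__init__(file_path)
        with open(file_path) as f:
            self._records = json.load(f)

    def input_data(self):
        return [r["input"] for r in self._records]

    def expected_output(self):
        return [r["expected"] for r in self._records]
```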
EvaluationFramework
class EvaluationFramework(
name: str
pipeline_tests: list[PipelineTest]
dataset: EvalDataLoader
description: str
results_path: str
)
This class provides a container for running multiple pipeline tests. It loads the data from an EvalDataLoader, runs the specified pipeline tests, and saves the output to a .json file.
Methods
__init__
def __init__(
name: str
pipeline_tests: list[PipelineTest]
dataset: EvalDataLoader
description: str
results_path: str
)
Initialises the EvaluationFramework.
Parameters
name: str
The name of the evaluation experiment, as stored in the output file.
pipeline_tests: list[PipelineTest]
A list of pipeline tests to run for an evaluation.
dataset: EvalDataLoader
An EvalDataLoader for the data used by the pipeline tests.
description: str
A description of the experiment for the output file.
results_path: str
A path pointing to the file used for results storage.
run_evaluations
def run_evaluations()
Runs the pipeline tests, storing the results labelled by the name of each pipeline test, then saves them to the results file.
_save_evaluations
def _save_evaluations()
If there is a file at the results_path, loads the JSON and rewrites it with the current experiment appended. Otherwise, creates a new output file.
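An end-to-end sketch using the hypothetical classes above. Only the constructor arguments and run_evaluations() come from this reference; the file names and test contents are invented for illustration:

```python
# A hedged sketch of wiring an EvaluationFramework together.
from evaluation.evaltypes import EvaluationFramework

framework = EvaluationFramework(
    name="uppercase-eval",
    pipeline_tests=[test],  # the SingleResultPipelineTest built earlier
    dataset=JsonEvalDataLoader("eval_data.json"),
    description="Checks that the pipeline uppercases its input.",
    results_path="results/uppercase-eval.json",
)

# Runs every pipeline test, labels the results by test name, and
# appends the experiment to the JSON file at results_path.
framework.run_evaluations()
```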