evaluation.evaltypes
Metric
class Metric()
M = TypeVar("M", bound=Metric)
Base class for all metrics.
The Metric abstract base class also serves as the bound for the M type variable.
Methods
calculate
def calculate()
Calculate the metric value.
description
def description()
Description of the metric, implemented by each subclass.
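As a minimal sketch, a concrete metric subclasses Metric and implements both abstract methods. The argument list for calculate is not shown in this reference, so the (output, expected_output) signature below is an assumption, and WordCountMetric is a hypothetical example:

```python
# A hedged sketch of the Metric contract; the calculate() signature
# is assumed, since this reference shows it without parameters.
from evaluation.evaltypes import Metric

class WordCountMetric(Metric):
    def calculate(self, output, expected_output):
        # Assumed signature: score the pipeline output against the
        # expected output; here, the absolute difference in word count.
        return abs(len(str(output).split()) - len(str(expected_output).split()))

    def description(self):
        return "Absolute difference in word count between output and expected output."
```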
TestPipeline
class TestPipeline()
P = TypeVar("P", bound=TestPipeline)
Base class for pipeline runs.
The TestPipeline abstract base class also serves as the bound for the P type variable.
Methods
run
def run()
Run the pipeline.
drop
def drop()
Drop the pipeline.
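A minimal sketch of a TestPipeline subclass follows. Both run() and drop() are documented here without parameters, so the input_data argument is an assumption, and IdentityPipeline is a hypothetical example:

```python
# A hedged sketch of the TestPipeline contract.
from evaluation.evaltypes import TestPipeline

class IdentityPipeline(TestPipeline):
    def run(self, input_data):
        # Assumed signature: a real pipeline would transform the input.
        return input_data

    def drop(self):
        # Release any resources the pipeline holds; nothing to do here.
        pass
```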
PipelineTest
class PipelineTest(
name: str
pipeline: P
metrics: list[M]
)
Base class for pipeline tests.
Methods
__init__
def __init__(
name: str
pipeline: P
metrics: list[M]
)
run_pipeline
def run_pipeline()
Run the underlying pipeline.
evaluate
def evaluate()
Evaluate the pipeline's output against the expected output using the test's metrics.
metric_descriptions
def metric_descriptions()
Return the descriptions of the metrics attached to this test.
drop_pipeline
def drop_pipeline()
Drop the underlying pipeline.
SingleResultMetric
class SingleResultMetric()
Metric for evaluating pipelines that return a single result.
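As a hedged sketch, a concrete single-result metric might look like the following. ExactMatchMetric is a hypothetical example, and the calculate() signature is assumed (it is not shown in this reference):

```python
# A minimal sketch of a SingleResultMetric subclass.
from evaluation.evaltypes import SingleResultMetric

class ExactMatchMetric(SingleResultMetric):
    def calculate(self, output, expected_output):
        # 1 if the single result matches the expected output exactly,
        # 0 otherwise.
        return int(output == expected_output)

    def description(self):
        return "Exact match between the pipeline output and the expected output."
```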
InformationRetrievalMetric
class InformationRetrievalMetric()
Metric for evaluating information retrieval pipelines.
SingleResultPipeline
class SingleResultPipeline()
Base class for pipelines returning a single result.
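A hedged sketch of a SingleResultPipeline subclass follows. UppercasePipeline is a hypothetical example; run() is documented above without parameters, so the input_data argument is an assumption mirroring run_pipeline(input_data) below:

```python
# A minimal sketch of a pipeline that returns a single result.
from evaluation.evaltypes import SingleResultPipeline

class UppercasePipeline(SingleResultPipeline):
    def run(self, input_data):
        # Produce a single result for the given input.
        return str(input_data).upper()

    def drop(self):
        # No external resources to release in this toy pipeline.
        pass
```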
SingleResultPipelineTest
class SingleResultPipelineTest(
name: str
pipeline: SingleResultPipeline
metrics: list[SingleResultMetric]
)
Pipeline test for pipelines that return a single result.
Methods
__init__
def __init__(
name: str
pipeline: SingleResultPipeline
metrics: list[SingleResultMetric]
)
run_pipeline
def run_pipeline(
input_data
)
Run the pipeline with the given input data.
Parameters
input_data
The input data for the pipeline.
Returns
The result of running the pipeline on the input data.
evaluate
def evaluate(
input_data
expected_output
)
Evaluate the pipeline by running it on the input data and comparing the result to the expected output using all metrics.
Parameters
input_data
The input data for the pipeline.
expected_output
The expected output to compare against.
Returns
A dictionary mapping metric names to their calculated values.
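Putting the pieces together, a single-result test can be run end to end. This sketch reuses the hypothetical UppercasePipeline and ExactMatchMetric from above; the exact keys of the returned dictionary are an assumption, since only "metric names" is documented:

```python
# A hedged sketch of constructing and running a SingleResultPipelineTest.
from evaluation.evaltypes import SingleResultPipelineTest

test = SingleResultPipelineTest(
    name="uppercase-exact-match",
    pipeline=UppercasePipeline(),
    metrics=[ExactMatchMetric()],
)

# evaluate() runs the pipeline on input_data and scores the result
# against expected_output with every metric, returning a dictionary
# mapping metric names to their calculated values.
scores = test.evaluate(input_data="hello", expected_output="HELLO")
```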
EvalDataLoader
class EvalDataLoader(
file_path: str
)
Provides an abstract base class for loading data for an EvaluationFramework. The methods are left abstract to be implemented as required for different pipeline evaluations.
Methods
__init__
def __init__(
file_path: str
)
Initialises the EvalDataLoader.
Parameters
file_path: str
A path to the file to be loaded.
input_data
def input_data()
An EvaluationFramework requires its EvalDataLoader to provide input_data; subclasses must implement it.
expected_output
def expected_output()
An EvaluationFramework requires its EvalDataLoader to provide expected_output; subclasses must implement it.
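A minimal sketch of a concrete loader follows. The JSON layout and the JsonEvalDataLoader name are assumptions for illustration; only the file_path constructor argument and the input_data / expected_output members come from this reference, which documents them as zero-argument callables:

```python
# A hedged sketch of an EvalDataLoader that reads a JSON file of
# {"input": ..., "expected": ...} records (an assumed file layout).
import json

from evaluation.evaltypes import EvalDataLoader

class JsonEvalDataLoader(EvalDataLoader):
    def __init__(self, file_path: str):
        super().__init__(file_path)
        with open(file_path) as f:
            self._records = json.load(f)

    def input_data(self):
        return [r["input"] for r in self._records]

    def expected_output(self):
        return [r["expected"] for r in self._records]
```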
EvaluationFramework
class EvaluationFramework(
name: str
pipeline_tests: list[PipelineTest]
dataset: EvalDataLoader
description: str
results_path: str
)
This class provides a container for running multiple pipeline tests. It loads the data from an EvalDataLoader, runs the specified pipeline tests, and saves the output to a .json file.
Methods
__init__
def __init__(
name: str
pipeline_tests: list[PipelineTest]
dataset: EvalDataLoader
description: str
results_path: str
)
Initialises the EvaluationFramework.
Parameters
name: str
The name of the evaluation experiment, as stored in the output file.
pipeline_tests: list[PipelineTest]
A list of pipeline tests to run for an evaluation.
dataset: EvalDataLoader
An EvalDataLoader for the data used by the pipeline tests.
description: str
A description of the experiment for the output file.
results_path: str
A path pointing to the file used for results storage.
run_evaluations
def run_evaluations()
Runs the pipeline tests, storing the results labelled by the name of each pipeline test, then saves them to the results file.
_save_evaluations
def _save_evaluations()
If there is a file at the results_path, loads the JSON and rewrites it with the current experiment appended. Otherwise, creates a new output file.
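An end-to-end sketch using the hypothetical classes above. Only the constructor arguments and run_evaluations() come from this reference; the file names and test contents are invented for illustration:

```python
# A hedged sketch of wiring an EvaluationFramework together.
from evaluation.evaltypes import EvaluationFramework

framework = EvaluationFramework(
    name="uppercase-eval",
    pipeline_tests=[test],  # the SingleResultPipelineTest built earlier
    dataset=JsonEvalDataLoader("eval_data.json"),
    description="Checks that the pipeline uppercases its input.",
    results_path="results/uppercase-eval.json",
)

# Runs every pipeline test, labels the results by test name, and
# appends the experiment to the JSON file at results_path.
framework.run_evaluations()
```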