evaluate_file
Lettuce’s evaluation framework can be run end-to-end; this script provides an example. The JSON output from running the script can be found in the repo.
Process
Load the example dataset
The file found in evaluation/datasets/example.csv is loaded using a SingleInputCSVforLLM.
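As a rough sketch, loading might look like the following; the import path and the positional path argument are assumptions, since only the class name and the dataset path appear in this guide.

```python
# Import path is an assumption; only the class name is given above
from evaluation.evaltypes import SingleInputCSVforLLM

# Load the example dataset; passing the path positionally is an assumption
dataset = SingleInputCSVforLLM("evaluation/datasets/example.csv")
```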
Define a prompt
In this example, every LLM pipeline uses the same prompt, but you can test alternative prompts.
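Concretely, the shared prompt could be a plain template string like the sketch below; the task wording and the {source_term} placeholder are illustrative, not Lettuce's actual prompt.

```python
# An illustrative shared prompt template; the placeholder name and
# wording are assumptions, not the prompt used in the repo
prompt = (
    "You are an assistant mapping informal source terms to standard names.\n"
    "Source term: {source_term}\n"
    "Reply with the single best standard name and nothing else."
)
```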
Define pipelines
A list of different test pipelines using LLMs is created.
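A hedged sketch of this step: the LLMPipeline class name, its import path, its parameters, and the model identifiers are all assumptions used for illustration.

```python
from evaluation.pipelines import LLMPipeline  # assumed import path and class

# One pipeline per model, all sharing the prompt defined above;
# the model names are placeholders
models = ["llama-3.1-8b", "gemma-7b"]
pipelines = [LLMPipeline(model=name, prompt=prompt) for name in models]
```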
Define tests
A list of PipelineTests is created from the list of pipelines.
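The wrapping step might look like this sketch; PipelineTest comes from the guide, but its constructor signature, the import paths, and the ExactMatch metric are assumptions.

```python
from evaluation.evaltypes import PipelineTest  # assumed import path
from evaluation.metrics import ExactMatch     # hypothetical metric class

# One PipelineTest per pipeline; the keyword names are assumptions
tests = [
    PipelineTest(name=f"pipeline_{i}", pipeline=p, metrics=[ExactMatch()])
    for i, p in enumerate(pipelines)
]
```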
Main function evaluates the pipelines
The main function creates an EvaluationFramework. Its run_evaluations() method then runs the data through the pipelines, calculates the metrics, and saves the results as a JSON file.
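Putting it together, the main function could look like the sketch below; EvaluationFramework and run_evaluations() appear in this guide, but every constructor argument, including the output path, is an assumption.

```python
from evaluation.evaltypes import EvaluationFramework  # assumed import path

def main():
    # Constructor arguments are assumptions beyond the class name itself
    framework = EvaluationFramework(
        name="example_evaluation",
        pipeline_tests=tests,
        dataset=dataset,
        results_path="evaluation/results/example.json",  # hypothetical output file
    )
    # run_evaluations() feeds the dataset through each pipeline, computes
    # the metrics, and writes the results to the JSON file
    framework.run_evaluations()

if __name__ == "__main__":
    main()
```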