components.models
This module provides functionality to load the user-specified model as part of the components.pipeline.LLMPipeline.
It supports both local model inference (via llama.cpp or Ollama) and remote inference (via OpenAI’s API).
The module manages LLM initialisation and selection to power the source term standardisation pipeline, with support for:
- Local model inference using quantized GGUF models
- Remote inference using OpenAI models
- Automatic model downloading from Hugging Face Hub
- Ollama server integration for local model serving
Functions
get_local_weights
def get_local_weights(
path_to_weights: os.PathLike | str | None,
temperature: float,
logger: logging.Logger,
verbose: bool
) -> LlamaCppGenerator:
Load a local GGUF model weights file and return a LlamaCppGenerator object.
Parameters
Parameter | Type | Description |
---|---|---|
path_to_weights | os.PathLike | str | None | The full path to the local GGUF model weights file (e.g., “/path/to/llama-2-7b-chat.Q4_0.gguf”). |
temperature | float | The temperature for model generation |
logger | logging.Logger | Logger instance for tracking progress and errors. |
verbose | bool | If true, the generator logs information about loading weights and generation |
Returns
LlamaCppGenerator
A loaded LlamaCppGenerator object ready for inference.
Raises
FileNotFoundError
If the specified path_to_weights does not exist or is not a file.
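A minimal usage sketch, assuming this module is importable as components.models; both the import path and the weights path shown here are illustrative and should be adapted to your installation:

```python
import logging

from components.models import get_local_weights  # assumed import path

logger = logging.getLogger(__name__)

# Path below is illustrative; point it at a real GGUF weights file on disk
generator = get_local_weights(
    path_to_weights="/path/to/llama-2-7b-chat.Q4_0.gguf",
    temperature=0.7,
    logger=logger,
    verbose=True,
)
```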
download_model_from_huggingface
def download_model_from_huggingface(
model_name: str,
temperature: float,
logger: logging.Logger,
verbose: bool,
fallback_model: str = "llama-3.1-8b",
n_ctx: int = 1024,
n_batch: int = 32,
max_tokens: int = 128
) -> LlamaCppGenerator:
Load GGUF model weights from a Hugging Face repository.
Parameters
Parameter | Type | Description |
---|---|---|
model_name | str | The name of a model with repository details in the local_models dictionary |
temperature | float | The temperature for model generation |
logger | logging.Logger | Logger instance for tracking progress and errors. |
verbose | bool | If true, the generator logs information about loading weights and generation |
fallback_model | str | Model to load instead if the specified model_name is not in the local_models dictionary. Defaults to llama-3.1-8b |
n_ctx | int | Context size for the model. Defaults to 1024 |
n_batch | int | Number of tokens sent to the model in each batch. Defaults to 32 |
max_tokens | int | Maximum tokens to generate. Defaults to 128. |
Returns
LlamaCppGenerator
A loaded LlamaCppGenerator object ready for inference.
Raises
ValueError
If the model fails to download or initialize.
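A usage sketch, assuming the module is importable as components.models; the model name "llama-3.1-8b" is taken from the implemented models table further down this page:

```python
import logging

from components.models import download_model_from_huggingface  # assumed import path

logger = logging.getLogger(__name__)

# Downloads the quantized GGUF weights from Hugging Face Hub on first use,
# falling back to "llama-3.1-8b" if the name is not registered in local_models
generator = download_model_from_huggingface(
    model_name="llama-3.1-8b",
    temperature=0.7,
    logger=logger,
    verbose=False,
    n_ctx=1024,
    n_batch=32,
    max_tokens=128,
)
```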
connect_to_openai
def connect_to_openai(
model_name: str,
temperature: float,
logger: logging.Logger,
) -> OpenAIGenerator:
Connect to OpenAI API and return an OpenAIGenerator object.
Parameters
Parameter | Type | Description |
---|---|---|
model_name | str | The name of the OpenAI model (e.g., “gpt-4”, “gpt-3.5-turbo”) |
temperature | float | The temperature for model generation |
logger | logging.Logger | Logger instance for tracking progress and errors. |
Returns
OpenAIGenerator
A configured OpenAIGenerator object for API-based inference.
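A short usage sketch, assuming the module is importable as components.models and that an OpenAI API key is available to the process (Haystack’s OpenAIGenerator typically reads it from the OPENAI_API_KEY environment variable):

```python
import logging

from components.models import connect_to_openai  # assumed import path

logger = logging.getLogger(__name__)

# Requires a valid OpenAI API key in the environment (e.g. OPENAI_API_KEY)
generator = connect_to_openai(
    model_name="gpt-4",
    temperature=0.7,
    logger=logger,
)
```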
connect_to_ollama
def connect_to_ollama(
model_name: str,
url: str,
temperature: float,
logger: logging.Logger,
max_tokens: int = 128,
) -> OllamaGenerator:
Connect to an Ollama server and return an OllamaGenerator object.
Parameters
Parameter | Type | Description |
---|---|---|
model_name | str | The name of the Ollama model to use |
url | str | The URL of the Ollama server |
temperature | float | The temperature for model generation |
logger | logging.Logger | Logger instance for tracking progress and errors. |
max_tokens | int | Maximum number of tokens to generate. Defaults to 128. |
Returns
OllamaGenerator
A configured OllamaGenerator object for Ollama server-based inference.
Raises
Exception
If connection to the Ollama server fails or the model is not available.
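A usage sketch, assuming components.models as the import path and an Ollama server already running at its conventional local address (http://localhost:11434); the model name is illustrative and must already have been pulled on that server:

```python
import logging

from components.models import connect_to_ollama  # assumed import path

logger = logging.getLogger(__name__)

# The URL below is Ollama's conventional default; adjust it for your server
generator = connect_to_ollama(
    model_name="llama3.1",          # must already be available on the Ollama server
    url="http://localhost:11434",
    temperature=0.7,
    logger=logger,
    max_tokens=128,
)
```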
get_model
def get_model(
model: LLMModel,
logger: logging.Logger,
inference_type: InferenceType,
url: str,
temperature: float = 0.7,
path_to_local_weights: os.PathLike[Any] | str | None = None,
verbose: bool = False,
) -> OpenAIGenerator | LlamaCppGenerator | OllamaGenerator:
Get an interface for interacting with an LLM.
If a path to a .gguf model file is provided via path_to_local_weights, the model is loaded locally using a LlamaCppGenerator. In this case, no remote download or API requests will take place.
If no local path is provided, the function uses Haystack Generators to provide an interface to a model based on the inference_type:
- OPEN_AI: Creates an interface to a remote OpenAI model
- OLLAMA: Creates an interface to an Ollama server
- Other types: Uses a LlamaCppGenerator to start a llama.cpp model and provide an interface
Parameters
Parameter | Type | Description |
---|---|---|
model | LLMModel | An enum representing the desired model. The enum’s value should match one of the registered model names (e.g., "llama-3.1-8b" or "gpt-4" ). |
logger | logging.Logger | Logger instance for tracking progress and errors. |
inference_type | InferenceType | Whether to use Llama.cpp, Ollama, or the OpenAI API for inference |
url | str | The URL for the Ollama server (only used when inference_type is OLLAMA) |
temperature | float | Controls the randomness of the output. Higher values (e.g., 1.0) make output more diverse, while lower values (e.g., 0.2) make it more deterministic. Defaults to 0.7. |
path_to_local_weights | os.PathLike[Any] | str | None | Path to a local .gguf model weights file. If provided, the function skips remote model loading and uses this local file. If not provided, the function will attempt to load the model from Hugging Face or connect to OpenAI. |
verbose | bool | If true, the generator logs information about loading weights and generation. Defaults to False |
Returns
OpenAIGenerator | LlamaCppGenerator | OllamaGenerator
A LLM text generation interface compatible with Haystack’s component framework.
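A sketch of selecting a model through get_model, assuming components.models as the import path; the LLMModel and InferenceType enums are shown with hypothetical member names and import locations, so adjust them to the actual definitions in this codebase. Haystack generators typically expose a run(prompt) method returning a dictionary with a "replies" list, which is how the returned object would normally be invoked:

```python
import logging

from components.models import get_model  # assumed import path
# Hypothetical import location and member names for the enums; adapt to the real definitions
from components.models import InferenceType, LLMModel

logger = logging.getLogger(__name__)

# Local llama.cpp inference with the recommended model; no local weights path is given,
# so the quantized weights are fetched from Hugging Face Hub on first use
llm = get_model(
    model=LLMModel.LLAMA_3_1_8B,              # hypothetical member; its value should be "llama-3.1-8b"
    logger=logger,
    inference_type=InferenceType.LLAMA_CPP,   # hypothetical member; non-OpenAI/Ollama types use llama.cpp
    url="http://localhost:11434",             # only consulted when inference_type is OLLAMA
    temperature=0.7,
)

# Haystack-style invocation (assumed): generators return a dict with a "replies" list
result = llm.run("Standardise the source term 'paracetamol 500mg tablets'")
print(result["replies"][0])
```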
Implemented models
Model name | Summary |
---|---|
llama-3.1-8b | (Recommended) Meta’s Llama 3.1 with 8 billion parameters, quantized to 4 bits |
llama-2-7b-chat | Meta’s Llama 2 with 7 billion parameters, quantized to 4 bits |
llama-3-8b | Meta’s Llama 3 with 8 billion parameters, quantized to 4 bits |
llama-3-70b | Meta’s Llama 3 with 70 billion parameters, quantized to 4 bits |
gemma-7b | Google’s Gemma with 7 billion parameters, quantized to 4 bits |
llama-3.2-3b | Meta’s Llama 3.2 with 3 billion parameters, quantized to 6 bits |
mistral-7b | Mistral at 7 billion parameters, quantized to 4 bits |
kuchiki-l2-7b | A merge of several models at 7 billion parameters, quantized to 4 bits |
tinyllama-1.1b-chat | Llama 2 extensively pre-trained, with 1.1 billion parameters, quantized to 4 bits |
biomistral-7b | Mistral at 7 billion parameters, pre-trained on biomedical data, quantized to 4 bits |
qwen2.5-3b-instruct | Alibaba’s Qwen 2.5 at 3 billion parameters, quantized to 5 bits |
airoboros-3b | Llama 2 pre-trained on the airoboros 3.0 dataset at 3 billion parameters, quantized to 4 bits |
medicine-chat | Llama 2 pre-trained on medical data, quantized to 4 bits |
medicine-llm-13b | Llama pre-trained on medical data at 13 billion parameters, quantized to 3 bits |
med-llama-3-8b-v1 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 5 bits |
med-llama-3-8b-v2 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 4 bits |
med-llama-3-8b-v3 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits |
med-llama-3-8b-v4 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits |
If you would like to add a model, raise an issue.