components.models
This module provides functionality to load the user-specified model as part of `components.pipeline.LLMPipeline`.
It supports both local model inference via llama.cpp and remote inference via OpenAI’s API.
The module manages LLM initialisation and selection to power the drug name standardisation pipeline, with support for:
- Local model inference using quantized GGUF models
- Remote inference using OpenAI models
- Automatic model downloading from Hugging Face Hub
Functions
get_model
```python
def get_model(
    model: LLMModel,
    logger: Logger,
    temperature: float = 0.7,
    path_to_local_weights: os.PathLike[Any] | str | None = None,
) -> OpenAIGenerator | LlamaCppGenerator
```
Get an interface for interacting with an LLM.
If a path to a `.gguf` model file is provided via `path_to_local_weights`, the model is loaded locally using a `LlamaCppGenerator`. In this case, no remote download or API requests will take place.
If no local path is provided, the function uses Haystack Generators to provide an interface to a model. If the `model` is a GPT model, the interface is to a remote OpenAI model. Otherwise, a `LlamaCppGenerator` is used to start a llama.cpp model and provide an interface.
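A minimal usage sketch (assuming `get_model` and `LLMModel` are importable from `components.models`; the enum member names and the weights path below are illustrative, not taken from the source):

```python
import logging

from components.models import LLMModel, get_model

logger = logging.getLogger(__name__)

# Local inference: load quantized GGUF weights from disk; no download or API calls.
local_llm = get_model(
    LLMModel.LLAMA_3_1_8B,  # hypothetical enum member name
    logger,
    temperature=0.2,
    path_to_local_weights="weights/llama-3.1-8b-q4.gguf",  # illustrative path
)

# Remote inference: a GPT model is served through OpenAI's API.
remote_llm = get_model(LLMModel.GPT_4, logger)  # hypothetical enum member name
```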
Parameters
Parameter | Type | Description |
---|---|---|
model | LLMModel | An enum representing the desired model. The enum's value should match one of the registered model names (e.g., "llama-3.1-8b" or "gpt-4"). |
logger | logging.Logger | Logger instance for tracking progress and errors. |
temperature | float | Controls the randomness of the output. Higher values (e.g., 1.0) make output more diverse, while lower values (e.g., 0.2) make it more deterministic. Defaults to 0.7. |
path_to_local_weights | os.PathLike, str, or None | Path to a local .gguf model weights file. If provided, the function skips remote model loading and uses this local file. If not provided, the function will attempt to load the model from Hugging Face or connect to OpenAI. |
Returns
`OpenAIGenerator` or `LlamaCppGenerator`
An LLM text generation interface compatible with Haystack's component framework.
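Either return type is a standard Haystack component, so the generator can be dropped into a pipeline regardless of whether inference is local or remote. A minimal sketch using Haystack's 2.x pipeline API (the prompt and enum member name are illustrative):

```python
from haystack import Pipeline

generator = get_model(LLMModel.LLAMA_3_1_8B, logger)  # hypothetical enum member name

pipeline = Pipeline()
pipeline.add_component("llm", generator)

# Both OpenAIGenerator and LlamaCppGenerator accept a `prompt` input
# and return their generations under "replies".
result = pipeline.run({"llm": {"prompt": "Standardise the drug name: paracetamol 500 mg tabs"}})
print(result["llm"]["replies"][0])
```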
Implemented models
Model name | Summary |
---|---|
llama-3.1-8b | **Recommended.** Meta's Llama 3.1 with 8 billion parameters, quantized to 4 bits |
llama-2-7b-chat | Meta’s Llama 2 with 7 billion parameters, quantized to 4 bits |
llama-3-8b | Meta’s Llama 3 with 8 billion parameters, quantized to 4 bits |
llama-3-70b | Meta’s Llama 3 with 70 billion parameters, quantized to 4 bits |
gemma-7b | Google’s Gemma with 7 billion parameters, quantized to 4 bits |
llama-3.2-3b | Meta’s Llama 3.2 with 3 billion parameters, quantized to 6 bits |
mistral-7b | Mistral at 7 billion parameters, quantized to 4 bits |
kuchiki-l2-7b | A merge of several models at 7 billion parameters, quantized to 4 bits |
tinyllama-1.1b-chat | A 1.1-billion-parameter model based on Llama 2, extensively pre-trained, quantized to 4 bits |
biomistral-7b | Mistral at 7 billion parameters, pre-trained on biomedical data, quantized to 4 bits |
qwen2.5-3b-instruct | Alibaba’s Qwen 2.5 at 3 billion parameters, quantized to 5 bits |
airoboros-3b | Llama 2 pre-trained on the airoboros 3.0 dataset at 3 billion parameters, quantized to 4 bits |
medicine-chat | Llama 2 pre-trained on medical data, quantized to 4 bits |
medicine-llm-13b | Llama pre-trained on medical data at 13 billion parameters, quantized to 4 bits |
med-llama-3-8b-v1 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 5 bits |
med-llama-3-8b-v2 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 4 bits |
med-llama-3-8b-v3 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits |
med-llama-3-8b-v4 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits |
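Each name in the table is the registered value of an `LLMModel` member, so (assuming `LLMModel` is a standard Python `Enum` whose values are these strings) a name from the table can be mapped to its member by value:

```python
model = LLMModel("biomistral-7b")  # look up the enum member by its registered name
generator = get_model(model, logger, temperature=0.2)
```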
If you would like to add a model, please raise an issue.