components.models
This module provides functionality to load the user-specified model as part of the components.pipeline.LLMPipeline.
It supports both local model inference (via llama.cpp or Ollama) and remote inference (via OpenAI’s API).
The module manages LLM initialisation and selection to power the source term standardisation pipeline, with support for:
- Local model inference using quantized GGUF models
- Remote inference using OpenAI models
- Automatic model downloading from Hugging Face Hub
- Ollama server integration for local model serving
Functions
get_local_weights
def get_local_weights(
path_to_weights: os.PathLike | str | None,
temperature: float,
logger: logging.Logger,
verbose: bool
) -> LlamaCppGenerator:
Load a local GGUF model weights file and return a LlamaCppGenerator object.
Parameters
Parameter | Type | Description |
---|---|---|
path_to_weights | os.PathLike | str | None | The full path to the local GGUF model weights file (e.g., “/path/to/llama-2-7b-chat.Q4_0.gguf”). |
temperature | float | The temperature for model generation |
logger | logging.Logger | Logger instance for tracking progress and errors. |
verbose | bool | If true, the generator logs information about loading weights and generation |
Returns
LlamaCppGenerator
A loaded LlamaCppGenerator object ready for inference.
Raises
FileNotFoundError
If the specified path_to_weights does not exist or is not a file.
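A minimal usage sketch, assuming this module is importable as components.models; both the import path and the weights path shown here are illustrative and should be adapted to your installation:

```python
import logging

from components.models import get_local_weights  # assumed import path

logger = logging.getLogger(__name__)

# Path below is illustrative; point it at a real GGUF weights file on disk
generator = get_local_weights(
    path_to_weights="/path/to/llama-2-7b-chat.Q4_0.gguf",
    temperature=0.7,
    logger=logger,
    verbose=True,
)
```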
download_model_from_huggingface
def download_model_from_huggingface(
model_name: str,
temperature: float,
logger: logging.Logger,
verbose: bool,
fallback_model: str = "llama-3.1-8b",
n_ctx: int = 1024,
n_batch: int = 32,
max_tokens: int = 128
) -> LlamaCppGenerator:
Load GGUF model weights from a Hugging Face repository.
Parameters
Parameter | Type | Description |
---|---|---|
model_name | str | The name of a model with repository details in the local_models dictionary |
temperature | float | The temperature for model generation |
logger | logging.Logger | Logger instance for tracking progress and errors. |
verbose | bool | If true, the generator logs information about loading weights and generation |
fallback_model | str | Model to load instead if the specified model_name is not in the local_models dictionary. Defaults to llama-3.1-8b |
n_ctx | int | Context size for the model. Defaults to 1024 |
n_batch | int | Number of tokens sent to the model in each batch. Defaults to 32 |
max_tokens | int | Maximum tokens to generate. Defaults to 128. |
Returns
LlamaCppGenerator
A loaded LlamaCppGenerator object ready for inference.
Raises
ValueError
If the model fails to download or initialize.
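A usage sketch, assuming the module is importable as components.models; the model name "llama-3.1-8b" is taken from the implemented models table further down this page:

```python
import logging

from components.models import download_model_from_huggingface  # assumed import path

logger = logging.getLogger(__name__)

# Downloads the quantized GGUF weights from Hugging Face Hub on first use,
# falling back to "llama-3.1-8b" if the name is not registered in local_models
generator = download_model_from_huggingface(
    model_name="llama-3.1-8b",
    temperature=0.7,
    logger=logger,
    verbose=False,
    n_ctx=1024,
    n_batch=32,
    max_tokens=128,
)
```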
connect_to_openai
def connect_to_openai(
model_name: str,
temperature: float,
logger: logging.Logger,
) -> OpenAIGenerator:
Connect to OpenAI API and return an OpenAIGenerator object.
Parameters
Parameter | Type | Description |
---|---|---|
model_name | str | The name of the OpenAI model (e.g., “gpt-4”, “gpt-3.5-turbo”) |
temperature | float | The temperature for model generation |
logger | logging.Logger | Logger instance for tracking progress and errors. |
Returns
OpenAIGenerator
A configured OpenAIGenerator object for API-based inference.
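A short usage sketch, assuming the module is importable as components.models and that an OpenAI API key is available to the process (Haystack’s OpenAIGenerator typically reads it from the OPENAI_API_KEY environment variable):

```python
import logging

from components.models import connect_to_openai  # assumed import path

logger = logging.getLogger(__name__)

# Requires a valid OpenAI API key in the environment (e.g. OPENAI_API_KEY)
generator = connect_to_openai(
    model_name="gpt-4",
    temperature=0.7,
    logger=logger,
)
```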
connect_to_ollama
def connect_to_ollama(
model_name: str,
url: str,
temperature: float,
logger: logging.Logger,
max_tokens: int = 128,
) -> OllamaGenerator:
Connect to an Ollama server and return an OllamaGenerator object.
Parameters
Parameter | Type | Description |
---|---|---|
model_name | str | The name of the Ollama model to use |
url | str | The URL of the Ollama server |
temperature | float | The temperature for model generation |
logger | logging.Logger | Logger instance for tracking progress and errors. |
max_tokens | int | Maximum number of tokens to generate. Defaults to 128. |
Returns
OllamaGenerator
A configured OllamaGenerator object for Ollama server-based inference.
Raises
Exception
If connection to the Ollama server fails or the model is not available.
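A usage sketch, assuming components.models as the import path and an Ollama server already running at its conventional local address (http://localhost:11434); the model name is illustrative and must already have been pulled on that server:

```python
import logging

from components.models import connect_to_ollama  # assumed import path

logger = logging.getLogger(__name__)

# The URL below is Ollama's conventional default; adjust it for your server
generator = connect_to_ollama(
    model_name="llama3.1",          # must already be available on the Ollama server
    url="http://localhost:11434",
    temperature=0.7,
    logger=logger,
    max_tokens=128,
)
```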
get_model
def get_model(
model: LLMModel,
logger: logging.Logger,
inference_type: InferenceType,
url: str,
temperature: float = 0.7,
path_to_local_weights: os.PathLike[Any] | str | None = None,
verbose: bool = False,
) -> OpenAIGenerator | LlamaCppGenerator | OllamaGenerator:
Get an interface for interacting with an LLM.
If a path to a .gguf model file is provided via path_to_local_weights, the model is loaded locally using a LlamaCppGenerator. In this case, no remote download or API requests will take place.
If no local path is provided, the function uses Haystack Generators to provide an interface to a model based on the inference_type:
- OPEN_AI: Creates an interface to a remote OpenAI model
- OLLAMA: Creates an interface to an Ollama server
- Other types: Uses a LlamaCppGenerator to start a llama.cpp model and provide an interface
Parameters
Parameter | Type | Description |
---|---|---|
model | LLMModel | An enum representing the desired model. The enum’s value should match one of the registered model names (e.g., "llama-3.1-8b" or "gpt-4" ). |
logger | logging.Logger | Logger instance for tracking progress and errors. |
inference_type | InferenceType | Whether to use Llama.cpp, Ollama, or the OpenAI API for inference |
url | str | The URL for the Ollama server (only used when inference_type is OLLAMA) |
temperature | float | Controls the randomness of the output. Higher values (e.g., 1.0) make output more diverse, while lower values (e.g., 0.2) make it more deterministic. Defaults to 0.7. |
path_to_local_weights | os.PathLike[Any] | str | None | Path to a local .gguf model weights file. If provided, the function skips remote model loading and uses this local file. If not provided, the function will attempt to load the model from Hugging Face or connect to OpenAI. |
verbose | bool | If true, the generator logs information about loading weights and generation. Defaults to False |
Returns
OpenAIGenerator | LlamaCppGenerator | OllamaGenerator
A LLM text generation interface compatible with Haystack’s component framework.
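A sketch of selecting a model through get_model, assuming components.models as the import path; the LLMModel and InferenceType enums are shown with hypothetical member names and import locations, so adjust them to the actual definitions in this codebase. Haystack generators typically expose a run(prompt) method returning a dictionary with a "replies" list, which is how the returned object would normally be invoked:

```python
import logging

from components.models import get_model  # assumed import path
# Hypothetical import location and member names for the enums; adapt to the real definitions
from components.models import InferenceType, LLMModel

logger = logging.getLogger(__name__)

# Local llama.cpp inference with the recommended model; no local weights path is given,
# so the quantized weights are fetched from Hugging Face Hub on first use
llm = get_model(
    model=LLMModel.LLAMA_3_1_8B,              # hypothetical member; its value should be "llama-3.1-8b"
    logger=logger,
    inference_type=InferenceType.LLAMA_CPP,   # hypothetical member; non-OpenAI/Ollama types use llama.cpp
    url="http://localhost:11434",             # only consulted when inference_type is OLLAMA
    temperature=0.7,
)

# Haystack-style invocation (assumed): generators return a dict with a "replies" list
result = llm.run("Standardise the source term 'paracetamol 500mg tablets'")
print(result["replies"][0])
```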
Implemented models
Model name | Summary |
---|---|
llama-3.1-8b | (Recommended) Meta’s Llama 3.1 with 8 billion parameters, quantized to 4 bits |
llama-2-7b-chat | Meta’s Llama 2 with 7 billion parameters, quantized to 4 bits |
llama-3-8b | Meta’s Llama 3 with 8 billion parameters, quantized to 4 bits |
llama-3-70b | Meta’s Llama 3 with 70 billion parameters, quantized to 4 bits |
gemma-7b | Google’s Gemma with 7 billion parameters, quantized to 4 bits |
llama-3.2-3b | Meta’s Llama 3.2 with 3 billion parameters, quantized to 6 bits |
mistral-7b | Mistral at 7 billion parameters, quantized to 4 bits |
kuchiki-l2-7b | A merge of several models at 7 billion parameters, quantized to 4 bits |
tinyllama-1.1b-chat | Llama 2 extensively pre-trained, with 1.1 billion parameters, quantized to 4 bits |
biomistral-7b | Mistral at 7 billion parameters, pre-trained on biomedical data, quantized to 4 bits |
qwen2.5-3b-instruct | Alibaba’s Qwen 2.5 at 3 billion parameters, quantized to 5 bits |
airoboros-3b | Llama 2 pre-trained on the airoboros 3.0 dataset at 3 billion parameters, quantized to 4 bits |
medicine-chat | Llama 2 pre-trained on medical data, quantized to 4 bits |
medicine-llm-13b | Llama pre-trained on medical data at 13 billion parameters, quantized to 3 bits |
med-llama-3-8b-v1 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 5 bits |
med-llama-3-8b-v2 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 4 bits |
med-llama-3-8b-v3 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits |
med-llama-3-8b-v4 | Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits |
If you would like to add a model, raise an issue.