components.models

This module provides functionality to load the user-specified model as part of the components.pipeline.LLMPipeline. It supports both local model inference, via llama.cpp or Ollama, and remote inference via OpenAI’s API.

The module manages LLM initialisation and selection to power the source term standardisation pipeline, with support for:

  • Local model inference using quantized GGUF models
  • Remote inference using OpenAI models
  • Automatic model downloading from Hugging Face Hub
  • Ollama server integration for local model serving

Functions

get_local_weights

def get_local_weights(
    path_to_weights: os.PathLike | str | None, 
    temperature: float, 
    logger: logging.Logger,
    verbose: bool
) -> LlamaCppGenerator:

Load a local GGUF model weights file and return a LlamaCppGenerator object.

Parameters

  • path_to_weights (os.PathLike | str | None): The full path to the local GGUF model weights file (e.g., “/path/to/llama-2-7b-chat.Q4_0.gguf”).
  • temperature (float): The temperature for model generation.
  • logger (logging.Logger): Logger instance for tracking progress and errors.
  • verbose (bool): If True, the generator logs information about loading weights and generation.

Returns

LlamaCppGenerator

A loaded LlamaCppGenerator object ready for inference.

Raises

FileNotFoundError

If the specified path_to_weights does not exist or is not a file.
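
Example

A minimal usage sketch (the import path follows the module name above; the weights path is a placeholder, not a real file):

import logging

from components.models import get_local_weights

logger = logging.getLogger(__name__)

# Load quantized GGUF weights that are already on disk.
# Raises FileNotFoundError if the path does not point to an existing file.
generator = get_local_weights(
    path_to_weights="/path/to/llama-2-7b-chat.Q4_0.gguf",  # placeholder path
    temperature=0.7,
    logger=logger,
    verbose=True,
)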

download_model_from_huggingface

def download_model_from_huggingface(
    model_name: str, 
    temperature: float, 
    logger: logging.Logger, 
    verbose: bool,
    fallback_model: str = "llama-3.1-8b",
    n_ctx: int = 1024,
    n_batch: int = 32,
    max_tokens: int = 128 
) -> LlamaCppGenerator:

Load GGUF model weights from a Hugging Face repository.

Parameters

  • model_name (str): The name of a model with repository details in the local_models dictionary.
  • temperature (float): The temperature for model generation.
  • logger (logging.Logger): Logger instance for tracking progress and errors.
  • verbose (bool): If True, the generator logs information about loading weights and generation.
  • fallback_model (str): The model to load instead if model_name is not in the local_models dictionary. Defaults to “llama-3.1-8b”.
  • n_ctx (int): Context size for the model. Defaults to 1024.
  • n_batch (int): Number of tokens sent to the model in each batch. Defaults to 32.
  • max_tokens (int): Maximum number of tokens to generate. Defaults to 128.

Returns

LlamaCppGenerator

A loaded LlamaCppGenerator object ready for inference.

Raises

ValueError

If the model fails to download or initialize.
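
Example

A minimal sketch, assuming one of the names listed under “Implemented models” below; the first call downloads the GGUF weights from the Hugging Face Hub and reuses the cached copy afterwards:

import logging

from components.models import download_model_from_huggingface

logger = logging.getLogger(__name__)

# An unrecognised model_name falls back to "llama-3.1-8b".
generator = download_model_from_huggingface(
    model_name="llama-3.1-8b",
    temperature=0.7,
    logger=logger,
    verbose=False,
    n_ctx=2048,       # widen the context window from the 1024-token default
    max_tokens=128,
)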

connect_to_openai

def connect_to_openai(
    model_name: str, 
    temperature: float, 
    logger: logging.Logger,
) -> OpenAIGenerator:

Connect to OpenAI API and return an OpenAIGenerator object.

Parameters

  • model_name (str): The name of the OpenAI model (e.g., “gpt-4”, “gpt-3.5-turbo”).
  • temperature (float): The temperature for model generation.
  • logger (logging.Logger): Logger instance for tracking progress and errors.

Returns

OpenAIGenerator

A configured OpenAIGenerator object for API-based inference.
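
Example

A minimal sketch; it assumes your OpenAI API key is available to the underlying client (typically via the OPENAI_API_KEY environment variable):

import logging

from components.models import connect_to_openai

logger = logging.getLogger(__name__)

generator = connect_to_openai(
    model_name="gpt-4",
    temperature=0.7,
    logger=logger,
)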

connect_to_ollama

def connect_to_ollama(
    model_name: str,
    url: str,
    temperature: float,
    logger: logging.Logger,
    max_tokens: int = 128,
) -> OllamaGenerator:

Connect to an Ollama server and return an OllamaGenerator object.

Parameters

  • model_name (str): The name of the Ollama model to use.
  • url (str): The URL of the Ollama server.
  • temperature (float): The temperature for model generation.
  • logger (logging.Logger): Logger instance for tracking progress and errors.
  • max_tokens (int): Maximum number of tokens to generate. Defaults to 128.

Returns

OllamaGenerator

A configured OllamaGenerator object for Ollama server-based inference.

Raises

Exception

If connection to the Ollama server fails or the model is not available.
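
Example

A minimal sketch, assuming an Ollama server is already running locally and the model has been pulled on that server (the model name here is a placeholder and must match one the server knows about):

import logging

from components.models import connect_to_ollama

logger = logging.getLogger(__name__)

generator = connect_to_ollama(
    model_name="llama3.1",            # placeholder; must be available on the server
    url="http://localhost:11434",     # Ollama's default local address
    temperature=0.7,
    logger=logger,
    max_tokens=128,
)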

get_model

def get_model(
    model: LLMModel, 
    logger: logging.Logger, 
    inference_type: InferenceType,
    url: str,
    temperature: float = 0.7, 
    path_to_local_weights: os.PathLike[Any] | str | None = None,
    verbose: bool = False,
) -> OpenAIGenerator | LlamaCppGenerator | OllamaGenerator:

Get an interface for interacting with an LLM.

If a path to a .gguf model file is provided via path_to_local_weights, the model is loaded locally using a LlamaCppGenerator. In this case, no remote download or API requests will take place.

If no local path is provided, the function uses Haystack Generators to provide an interface to a model based on the inference_type:

  • OPEN_AI: Creates an interface to a remote OpenAI model
  • OLLAMA: Creates an interface to an Ollama server
  • Other types: Uses a LlamaCppGenerator to start a llama.cpp model and provide an interface

Parameters

  • model (LLMModel): An enum representing the desired model. The enum’s value should match one of the registered model names (e.g., "llama-3.1-8b" or "gpt-4").
  • logger (logging.Logger): Logger instance for tracking progress and errors.
  • inference_type (InferenceType): Whether to use llama.cpp, Ollama, or the OpenAI API for inference.
  • url (str): The URL for the Ollama server (only used when inference_type is OLLAMA).
  • temperature (float): Controls the randomness of the output. Higher values (e.g., 1.0) make output more diverse, while lower values (e.g., 0.2) make it more deterministic. Defaults to 0.7.
  • path_to_local_weights (os.PathLike[Any] | str | None): Path to a local .gguf model weights file. If provided, the function skips remote model loading and uses this local file. If not provided, the function will attempt to download the model from Hugging Face, connect to OpenAI, or connect to an Ollama server, according to inference_type.
  • verbose (bool): If True, the generator logs information about loading weights and generation. Defaults to False.

Returns

OpenAIGenerator | LlamaCppGenerator | OllamaGenerator

A LLM text generation interface compatible with Haystack’s component framework.
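
Example

A minimal sketch covering two common routes. The LLMModel and InferenceType member names shown here (LLAMA_3_1_8B, LLAMA_CPP, GPT_4) and the import location of the enums are assumptions; check your installation for the exact spellings. The result key follows Haystack’s generator convention:

import logging

from components.models import get_model
from components.models import InferenceType, LLMModel  # assumed import location

logger = logging.getLogger(__name__)

# Route 1: local llama.cpp inference from weights already on disk.
# No download or API request takes place.
local_generator = get_model(
    model=LLMModel.LLAMA_3_1_8B,             # assumed member for "llama-3.1-8b"
    logger=logger,
    inference_type=InferenceType.LLAMA_CPP,  # assumed member name
    url="",                                  # only used when inference_type is OLLAMA
    path_to_local_weights="/path/to/llama-3.1-8b.Q4_0.gguf",  # placeholder path
)

# Route 2: remote inference via the OpenAI API.
openai_generator = get_model(
    model=LLMModel.GPT_4,                    # assumed member for "gpt-4"
    logger=logger,
    inference_type=InferenceType.OPEN_AI,
    url="",                                  # unused for OpenAI inference
    temperature=0.2,
)
reply = openai_generator.run("Standardise the term 'paracetamol'.")["replies"][0]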

Implemented models

  • llama-3.1-8b (recommended): Meta’s Llama 3.1 with 8 billion parameters, quantized to 4 bits
  • llama-2-7b-chat: Meta’s Llama 2 with 7 billion parameters, quantized to 4 bits
  • llama-3-8b: Meta’s Llama 3 with 8 billion parameters, quantized to 4 bits
  • llama-3-70b: Meta’s Llama 3 with 70 billion parameters, quantized to 4 bits
  • gemma-7b: Google’s Gemma with 7 billion parameters, quantized to 4 bits
  • llama-3.2-3b: Meta’s Llama 3.2 with 3 billion parameters, quantized to 6 bits
  • mistral-7b: Mistral at 7 billion parameters, quantized to 4 bits
  • kuchiki-l2-7b: A merge of several models at 7 billion parameters, quantized to 4 bits
  • tinyllama-1.1b-chat: Llama 2 extensively pre-trained, with 1.1 billion parameters, quantized to 4 bits
  • biomistral-7b: Mistral at 7 billion parameters, pre-trained on biomedical data, quantized to 4 bits
  • qwen2.5-3b-instruct: Alibaba’s Qwen 2.5 at 3 billion parameters, quantized to 5 bits
  • airoboros-3b: Llama 2 pre-trained on the airoboros 3.0 dataset at 3 billion parameters, quantized to 4 bits
  • medicine-chat: Llama 2 pre-trained on medical data, quantized to 4 bits
  • medicine-llm-13b: Llama pre-trained on medical data at 13 billion parameters, quantized to 3 bits
  • med-llama-3-8b-v1: Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 5 bits
  • med-llama-3-8b-v2: Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 4 bits
  • med-llama-3-8b-v3: Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits
  • med-llama-3-8b-v4: Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits

If you would like to add a model, raise an issue.