components.models

This module provides functionality to load the user-specified model as part of the components.pipeline.LLMPipeline. It supports both local model inference via llama.cpp and remote inference via OpenAI’s API.

The module manages LLM initialisation and selection to power the drug name standardisation pipeline, with support for:

  • Local model inference using quantized GGUF models
  • Remote inference using OpenAI models
  • Automatic model downloading from Hugging Face Hub

Functions

get_model

def get_model(
	model: LLMModel,
	logger: logging.Logger,
	temperature: float = 0.7,
	path_to_local_weights: os.PathLike[Any] | str | None = None,
) -> OpenAIGenerator | LlamaCppGenerator

Get an interface for interacting with an LLM.

If a path to a .gguf model file is provided via path_to_local_weights, the model is loaded locally using a LlamaCppGenerator. In this case, no remote downloads or API requests take place.

If no local path is provided, the function uses Haystack Generators to provide an interface to the model. If the model is a GPT model, the interface is to a remote OpenAI model. Otherwise, a LlamaCppGenerator is used to start a llama.cpp model and provide an interface, downloading the quantized weights from Hugging Face Hub if needed.
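As a hedged illustration of this selection logic (the LLMModel import path and enum member names below are assumptions, not part of this module’s documented API), a GPT enum member routes to OpenAI while a local .gguf path forces llama.cpp inference:

import logging

from components.models import get_model
from options.pipeline_options import LLMModel  # assumed import path for the enum

logger = logging.getLogger(__name__)

# Remote inference: a GPT model routes to an OpenAIGenerator
# (enum member names here are assumptions; requires an OpenAI API key).
remote_llm = get_model(model=LLMModel.GPT_4, logger=logger, temperature=0.2)

# Local inference: providing a .gguf path skips any download or API call
# and returns a LlamaCppGenerator backed by llama.cpp.
local_llm = get_model(
    model=LLMModel.LLAMA_3_1_8B,
    logger=logger,
    path_to_local_weights="weights/llama-3.1-8b-q4_k_m.gguf",  # hypothetical local path
)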

Parameters

  • model (LLMModel): An enum representing the desired model. The enum’s value should match one of the registered model names (e.g., "llama-3.1-8b" or "gpt-4").
  • logger (logging.Logger): Logger instance for tracking progress and errors.
  • temperature (float): Controls the randomness of the output. Higher values (e.g., 1.0) make output more diverse, while lower values (e.g., 0.2) make it more deterministic. Defaults to 0.7.
  • path_to_local_weights (os.PathLike, str, or None): Path to a local .gguf model weights file. If provided, the function skips remote model loading and uses this local file. If not provided, the function will attempt to load the model from Hugging Face Hub or connect to OpenAI.

Returns

OpenAIGenerator or LlamaCppGenerator

An LLM text generation interface compatible with Haystack’s component framework.
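Whichever branch is taken, the returned object can be invoked like any other Haystack generator component. A minimal sketch, assuming the standard Haystack run() interface (both generator types return a dict containing a "replies" list) and the same assumed LLMModel import as above:

import logging

from components.models import get_model
from options.pipeline_options import LLMModel  # assumed import path for the enum

llm = get_model(model=LLMModel.LLAMA_3_1_8B, logger=logging.getLogger(__name__))

# A LlamaCppGenerator loads its weights in warm_up(); a remote OpenAIGenerator
# has nothing to load, so guard the call.
if hasattr(llm, "warm_up"):
    llm.warm_up()

# Both generator types expose run() and return a dict with a "replies" list.
result = llm.run("What is the generic name for Tylenol?")
print(result["replies"][0])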

Implemented models

  • llama-3.1-8b (recommended): Meta’s Llama 3.1 with 8 billion parameters, quantized to 4 bits
  • llama-2-7b-chat: Meta’s Llama 2 with 7 billion parameters, quantized to 4 bits
  • llama-3-8b: Meta’s Llama 3 with 8 billion parameters, quantized to 4 bits
  • llama-3-70b: Meta’s Llama 3 with 70 billion parameters, quantized to 4 bits
  • gemma-7b: Google’s Gemma with 7 billion parameters, quantized to 4 bits
  • llama-3.2-3b: Meta’s Llama 3.2 with 3 billion parameters, quantized to 6 bits
  • mistral-7b: Mistral at 7 billion parameters, quantized to 4 bits
  • kuchiki-l2-7b: A merge of several models at 7 billion parameters, quantized to 4 bits
  • tinyllama-1.1b-chat: Llama 2 extensively pre-trained, with 1.1 billion parameters, quantized to 4 bits
  • biomistral-7b: Mistral at 7 billion parameters, pre-trained on biomedical data, quantized to 4 bits
  • qwen2.5-3b-instruct: Alibaba’s Qwen 2.5 at 3 billion parameters, quantized to 5 bits
  • airoboros-3b: Llama 2 pre-trained on the airoboros 3.0 dataset, at 3 billion parameters, quantized to 4 bits
  • medicine-chat: Llama 2 pre-trained on medical data, quantized to 4 bits
  • medicine-llm-13b: Llama pre-trained on medical data, at 13 billion parameters, quantized to 4 bits
  • med-llama-3-8b-v1: Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 5 bits
  • med-llama-3-8b-v2: Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 4 bits
  • med-llama-3-8b-v3: Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits
  • med-llama-3-8b-v4: Llama 3 at 8 billion parameters, pre-trained on medical data, quantized to 3 bits
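Because the enum’s value matches the registered model name (see get_model above), a name from this list can be looked up by value. A minimal sketch, assuming LLMModel is a plain value-based Enum at the import path used above:

from options.pipeline_options import LLMModel  # assumed import path for the enum

# Look up the enum member by its registered name string; this assumes LLMModel
# is a value-based Enum whose values are the names listed above.
model = LLMModel("llama-3.1-8b")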

If you would like to add a model, please raise an issue.