components.embeddings
EmbeddingModelName
class EmbeddingModelName()
This class enumerates the embedding models we have the download details for.
Model | Version | Dimensions | Summary |
---|---|---|---|
Bidirectional Gated Encoder | Small | 384 | Efficient sentence embeddings for semantic similarity tasks. |
Sentence-BERT | MiniLM | 384 | Compact, optimized for sentence embeddings and semantic tasks. |
Generalizable T5 Retrieval | Base | 768 | Dual encoder for scalable, general-purpose semantic search. |
Generalizable T5 Retrieval | Large | 1024 | Enhanced version of GTR-T5 Base, ideal for large-scale tasks. |
Embedding Models for Search | Base | 768 | Dense multilingual embeddings for semantic search and retrieval. |
Embedding Models for Search | Large | 1024 | Larger model offering improved cross-domain retrieval performance. |
DistilBERT | Base Uncased | 768 | Smaller, faster BERT variant retaining high performance. |
DistilUSE | Base Multilingual | 512 | Efficient multilingual embeddings for cross-lingual tasks. |
Contriever | Contriever | 768 | Unsupervised dense retrieval model for zero-shot semantic search. |
EmbeddingModelInfo
class EmbeddingModelInfo()
A simple class to hold the information for embeddings models
EmbeddingModel
class EmbeddingModel()
A class to match the name of an embeddings model with the details required to download and use it.
get_embedding_model
def get_embedding_model(
name: EmbeddingModelName
)
Collects the details of an embedding model when given its name
Parameters
name: EmbeddingModelName
The name of an embedding model we have the details for
Returns
EmbeddingModel
An EmbeddingModel object containing the name and the details used
Embeddings
class Embeddings(
embeddings_path: str
force_rebuild: bool
embed_vocab: str[List]
model_name: EmbeddingModelName
search_kwargs: dict
)
This class allows the building or loading of a vector database of concept names. This database can then be used for vector search.
Methods
__init__
method __init__(
embeddings_path: str
force_rebuild: bool
embed_vocab: str[List]
model_name: EmbeddingModelName
search_kwargs: dict
)
Initialises the connection to an embeddings database
Parameters
embeddings_path: str
A path for the embeddings database. If one is not found,
it will be built, which takes a long time. This is built
from concepts fetched from the OMOP database.
force_rebuild: bool
If true, the embeddings database will be rebuilt.
embed_vocab: List[str]
A list of OMOP vocabulary_ids. If the embeddings database is
built, these will be the vocabularies used in the OMOP query.
model: EmbeddingModel
The model used to create embeddings.
search_kwargs: dict
kwargs for vector search.
_build_embeddings
def _build_embeddings()
Build a vector database of embeddings
_load_embeddings
def _load_embeddings()
If available, load a vector database of concept embeddings
get_embedder
def get_embedder()
Get an embedder for queries in LLM pipelines
Returns
FastembedTextEmbedder
get_retriever
def get_retriever()
Get a retriever for LLM pipelines
Returns
QdrantEmbeddingRetriever
search
def search(
query: str[List]
)
Search the attached vector database with a list of informal medications
Parameters
query: List[str]
A list of informal medication names
Returns
List[List[Dict[str, Any]]]
For each medication in the query, the result of searching the vector database