Command Line Interface
The lettuce
CLI provides command-line access to all lettuce
components, allowing you to convert informal drug names to standardized OMOP concepts through a simple terminal interface.
Usage
Navigate to the /lettuce
directory
Run the CLI with uv
uv run --env-file .env lettuce-cli --informal_names "betnovate scalp application"
Examples
The following examples demonstrate common usage patterns for the Lettuce CLI.
Multiple informal names
uv run --env-file .env lettuce-cli \
--informal_names "betnovate scalp application" "metformin 500mg"
With vector search and LLM inference
uv run --env-file .env lettuce-cli \
--informal_names "betnovate scalp application" \
--vector_search \
--use_llm
Using a specific medical LLM
uv run --env-file .env lettuce-cli \
--informal_names "betnovate scalp application" \
--use_llm \
--llm_model MED_LLAMA_3_8B_V4
Arguments
--informal_names
(Required)
Provide one or more strings for Lettuce to infer concepts for. You can supply multiple names in a single command:
--informal_names "betnovate scalp application" "metformin 500mg"
--vector_search
, --no-vector_search
Default: --vector_search
(enabled)
Choose whether to perform vector search on supplied informal names.
Vector search requires a embeddings vector table to exist in your omop database.
--use_llm
, --no-use_llm
Default: --use_llm
(enabled)
Choose whether to have an LLM infer an OMOP concept for supplied informal names.
LLM usage will download the model weights, which requires ~5 GB disk space.
--temperature
Default: 0.0
Controls LLM generation randomness. A value of 0.0 produces deterministic results, while higher values (e.g., 0.9) produce more varied outputs.
--vocabulary_id
Default: None
The OMOP vocabularies to query in the OMOP-CDM database. You can specify vocabularies in two ways:
# As separate arguments
--vocabulary_id "RxNorm" "RxNorm Extension"
# OR as a comma-separated list
--vocabulary_id "RxNorm,RxNorm Extension"
Common vocabularies include:
- RxNorm
- RxNorm Extension
- SNOMED
- ICD10
- LOINC
--embed-vocab
Default: None
Vocabulary IDs for embedding filtering. You can specify vocabularies in two ways:
# As separate arguments
--embed-vocab "RxNorm" "SNOMED"
# OR as a comma-separated list
--embed-vocab "RxNorm,SNOMED"
--standard-concept
Default: False
Whether to filter output by the standard_concept field of the concept table.
--concept_ancestor
, --no-concept_ancestor
Default: --no-concept_ancestor
(disabled)
Controls whether to query the concept_ancestor table for hierarchical relationships.
--concept_relationship
, --no-concept_relationship
Default: --no-concept_relationship
(disabled)
Controls whether to query the concept_relationship table.
--max_separation_ancestor
Default: 1
The maximum levels of separation to return ancestors from the concept_ancestor query.
--max_separation_descendants
Default: 1
The maximum levels of separation to return descendants from the concept_ancestor query.
--concept_synonym
, --no-concept_synonym
Default: --no-concept_synonym
(disabled)
Controls whether to return concept synonyms from the OMOP-CDM query.
--search_threshold
Default: 80
(equivalent to 0.8)
The fuzzy match threshold to use when filtering OMOP concept names. Values range from 0 to 100, where 100 requires an exact match.
--embedding-top-k
Default: 5
Number of suggestions to return from vector search for RAG (Retrieval-Augmented Generation).
--embedding-top-k
Default: 5
Number of suggestions to return from vector search for RAG (Retrieval-Augmented Generation).
--llm_model
Default: LLAMA_3_1_8B
The LLM used to infer concepts from an informal name. Available options:
General-purpose models:
- LLAMA_2_7B
- LLAMA_3_8B
- LLAMA_3_70B
- GEMMA_7BL
- LLAMA_3_1_8B
- LLAMA_3_2_3B
- MISTRAL_7B
- KUCHIKI_L2_7B
- TINYLLAMA_1_1B_CHAT
- QWEN2_5_3B_INSTRUCT
- AIROBOROS_3B
Medical-specialized models:
- BIOMISTRAL_7B
- MEDICINE_CHAT
- MEDICINE_LLM_13B
- MED_LLAMA_3_8B_V1
- MED_LLAMA_3_8B_V2
- MED_LLAMA_3_8B_V3
- MED_LLAMA_3_8B_V4
--embedding_model
Default: BGESMALL
The model used to generate embeddings for vector search. Available options:
- BGESMALL
- MINILM
- GTR_T5_BASE
- GTR_T5_LARGE
- E5_BASE
- E5_LARGE
- DISTILBERT_BASE_UNCASED
- DISTILUSE_BASE_MULTILINGUAL
- CONTRIEVER
The model used to generate query embeddings and used to build the vector database must be the same or the vector search will be nonsense.
Output Format
The CLI prints results to the console in a structured JSON format (under revision). Each query produces a comprehensive result object with vector search results, LLM matching, and OMOP concept data:
[
{
"query": "acetaminophen",
"Vector Search Results": [
{"content": "Acetaminophen", "score": 5.960464815046862e-07},
// Additional vector search results omitted for brevity
],
"llm_answer": "Acetaminophen",
"OMOP fuzzy threshold": 80,
"OMOP matches": {
"search_term": "Acetaminophen",
"CONCEPT": [
{
"concept_name": "acetaminophen",
"concept_id": 1125315,
"vocabulary_id": "RxNorm",
"concept_code": "161",
"concept_name_similarity_score": 100.0,
"CONCEPT_SYNONYM": [],
"CONCEPT_ANCESTOR": [],
"CONCEPT_RELATIONSHIP": []
},
{
"concept_name": "Acetaminophen Jr",
"concept_id": 19052416,
"vocabulary_id": "RxNorm",
"concept_code": "214962",
"concept_name_similarity_score": 89.65,
"CONCEPT_SYNONYM": [],
"CONCEPT_ANCESTOR": [],
"CONCEPT_RELATIONSHIP": []
}
// Additional concept matches omitted for brevity
]
}
},
{
"query": "codeine",
"Vector Search Results": [
{"content": "Codeine", "score": 4.76837158203125e-07},
// Additional vector search results omitted for brevity
],
"llm_answer": "Codeine",
"OMOP fuzzy threshold": 80,
"OMOP matches": {
"search_term": "Codeine",
"CONCEPT": [
{
"concept_name": "codeine",
"concept_id": 1201620,
"vocabulary_id": "RxNorm",
"concept_code": "2670",
"concept_name_similarity_score": 100.0,
"CONCEPT_SYNONYM": [],
"CONCEPT_ANCESTOR": [],
"CONCEPT_RELATIONSHIP": []
}
]
}
}
]