Database Setup for Lettuce
Lettuce requires a PostgreSQL database with the OMOP CDM (Common Data Model) schema, along with specific extensions for full-text search and vector similarity capabilities.
Prerequisites
Required PostgreSQL Extensions
- Full-text search: Built-in PostgreSQL capability for text searching
- pgvector: Extension for vector similarity searches
Required OMOP Tables
Lettuce expects the following tables from the OMOP CDM v5.0 vocabulary:
Table | Purpose | Key Columns |
---|---|---|
concept | Core vocabulary table containing all medical concepts | concept_id , concept_name , vocabulary_id , concept_code , standard_concept |
concept_synonym | Alternative names for concepts | concept_id , concept_synonym_name , language_concept_id |
concept_relationship | Relationships between concepts | concept_id_1 , concept_id_2 , relationship_id , valid_start_date , valid_end_date |
concept_ancestor | Hierarchical relationships between concepts | ancestor_concept_id , descendant_concept_id , min_levels_of_separation , max_levels_of_separation |
Custom Table for Vector Search
To enable semantic similarity search, you’ll need to generate vector embeddings for concepts using a supported model defined in components.embeddings
.
The embedding model used to generate your vector table must match the model specified in your Lettuce configuration. If they differ, semantic search will return inaccurate or empty results.
Here are some common models and their expected embedding dimensions:
- BGE-small: 384
- GTR-T5-base: 768
Setup Options
You can use either an existing OMOP-compliant database, or set up a containerized instance for development and testing.
Option 1: Using an Existing OMOP Database
If you already have access to a PostgreSQL instance with the required OMOP tables and extensions, you can proceed directly to Next Steps.
Option 2: Set Up a Containerized Database with omop-ts
Use the omop-lite
repository to quickly spin up a local OMOP-compatible database.
Clone the omop-lite
repository
Clone the repository to your local machine:
git clone git@github.com:Health-Informatics-UoN/omop-lite.git
cd omop-lite
Prepare data directory
Create a data
directory at the root level of the repo:
mkdir -p ./data
Copy your OMOP .csv
files (e.g., CONCEPT.csv
, CONCEPT_SYNONYM.csv
) into this folder.
For testing, minimal example files are provided in omop_lite/synthetic
.
Build and start the omop-ts
docker container
Start the containerized OMOP database using Docker Compose:
docker compose -f compose-omop-ts.yml up -d
This will launch a PostgreSQL instance with the necessary schema and extensions pre-configured.
Verify
Verify that the database was properly populated by running the following command:
docker exec omop-ts-omop-ts-1 psql -U postgres -c "SELECT * FROM cdm.concept LIMIT 5;"
You should see output showing the first 5 rows from the concept table, confirming the database is set up correctly.
Next Steps
Once your database is running, configure Lettuce to connect by specifying the connection details in your .env
file, as shown in the Quickstart guide.
If you’re using the containerised omop-ts
setup, the corresponding .env
file would look like:
DB_HOST="localhost"
DB_USER="postgres"
DB_PASSWORD="password"
DB_NAME="omop"
DB_PORT="5432"
DB_SCHEMA="cdm"
DB_VECTABLE="embeddings"
DB_VECSIZE=384