Database Setup for Lettuce
Lettuce requires a PostgreSQL database with the OMOP CDM (Common Data Model) schema, along with specific extensions for full-text search and vector similarity capabilities.
Prerequisites
Required PostgreSQL Extensions
- Full-text search: Built-in PostgreSQL capability for text searching
- pgvector: Extension for vector similarity searches
Required OMOP Tables
Lettuce expects the following tables from the OMOP CDM v5.0:
Table | Purpose | Key Columns |
---|---|---|
concept | Core vocabulary table containing all medical concepts | concept_id , concept_name , vocabulary_id , concept_code , standard_concept |
concept_synonym | Alternative names for concepts | concept_id , concept_synonym_name , language_concept_id |
concept_relationship | Relationships between concepts | concept_id_1 , concept_id_2 , relationship_id , valid_start_date , valid_end_date |
concept_ancestor | Hierarchical relationships between concepts | ancestor_concept_id , descendant_concept_id , min_levels_of_separation , max_levels_of_separation |
Custom Table for Vector Search
To enable semantic similarity search, you’ll need to generate vector embeddings for concepts using a supported model defined in components.embeddings
.
The embedding model used to generate your vector table must match the model specified in your Lettuce configuration. If they differ, semantic search will return inaccurate results or throw an error.
Setup Options
You can use either an existing OMOP-compliant database, or set up a containerized instance for development and testing.
Option 1: Using an Existing OMOP Database
If you have an existing Postgres instance running an OMOP-CDM database, it needs some modifications to serve as a database for Lettuce.
Full-text search
The only way the text search features can work with a reasonable latency is to materialise a column for tsvector
s and index on that column, otherwise the database has to effectively compute the column each time the query is run.
Lettuce assumes this column is present.
To create the column:
ALTER TABLE {SCHEMA_NAME}.concept
ADD COLUMN concept_name_tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', concept_name)) STORED;
Then the index:
CREATE INDEX idx_concept_fts ON {SCHEMA_NAME}.concept USING GIN (concept_name_tsv);
Embeddings
Lettuce uses the pgvector postgres extension for semantic search. To install pgvector, follow their installation instructions
Then:
CREATE EXTENSION IF NOT EXISTS vector;
If you have a parquet file containing embeddings, you can copy them to a table in your cdm schema for Lettuce to read from. Example code for this can be found in omop-lite
Code for reading the concept table from an existing database, then encoding the concepts is a work in progress. It works, but is highly experimental.
Option 2: Set Up a Containerized Database with omop-ts
Use the omop-lite
repository to quickly spin up a local OMOP-CDM database.
Clone the omop-lite
repository
Clone the repository to your local machine:
git clone git@github.com:Health-Informatics-UoN/omop-lite.git
cd omop-lite
Prepare data directory
Create a data
directory at the root level of the repo:
mkdir -p ./data
Copy your OMOP .csv
files (e.g., CONCEPT.csv
, CONCEPT_SYNONYM.csv
) into this folder.
For testing, minimal example files are provided in omop_lite/synthetic
.
Prepare embeddings
Building omop-lite assumes you have pre-computed embeddings stored in a parquet file in omop-lite/text-search/embeddings.parquet
.
An example file is provided in the omop-lite repository.
This parquet file holds a table with concept IDs in the first column, and embedding vectors in the third.
If you wish to create embeddings online, the embeddings table can be dropped and rebuilt. If using omop-lite, still run with the text-search profile.
Build and start the omop-ts
docker container
Start the containerized OMOP database using Docker Compose:
docker compose --profile text-search up -d
This will launch a PostgreSQL instance with the necessary schema and extensions pre-configured.
Verify
Verify that the database was properly populated by running the following command:
docker exec omop-ts-omop-ts-1 psql -U postgres -c "SELECT * FROM cdm.concept LIMIT 5;"
You should see output showing the first 5 rows from the concept table, confirming the database is set up correctly.
Next Steps
Once your database is running, configure Lettuce to connect by specifying the connection details in your .env
file, as shown in the Quickstart guide.