Database Setup for Lettuce

Lettuce requires a PostgreSQL database with the OMOP CDM (Common Data Model) schema, along with specific extensions for full-text search and vector similarity capabilities.

Prerequisites

Required PostgreSQL Extensions

Full-text search: Built-in PostgreSQL capability for text searching
pgvector: Extension for vector similarity searches
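
One way to confirm the extension requirement is to compare the extension names PostgreSQL reports with what Lettuce needs. The sketch below assumes you have already fetched those names, for example with `SELECT extname FROM pg_extension;`. The helper `missing_extensions` is hypothetical and not part of Lettuce. pgvector registers itself under the extension name `vector`, and full-text search is built in, so it needs no entry.

```python
# Hypothetical helper for illustration; not part of Lettuce.
# pgvector is registered in PostgreSQL under the extension name "vector".
REQUIRED_EXTENSIONS = {"vector"}


def missing_extensions(installed):
    """Return the required extensions absent from `installed`, sorted by name."""
    return sorted(REQUIRED_EXTENSIONS - set(installed))


# Example input: names as returned by `SELECT extname FROM pg_extension;`
print(missing_extensions(["plpgsql"]))            # pgvector not installed yet
print(missing_extensions(["plpgsql", "vector"]))  # nothing missing
```

If `vector` is reported missing, running `CREATE EXTENSION IF NOT EXISTS vector;` in the target database enables it, provided the pgvector package is installed on the server.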

Required OMOP Tables

Lettuce expects the following tables from the OMOP CDM v5.0 vocabulary:

Table	Purpose	Key Columns
`concept`	Core vocabulary table containing all medical concepts	`concept_id`, `concept_name`, `vocabulary_id`, `concept_code`, `standard_concept`
`concept_synonym`	Alternative names for concepts	`concept_id`, `concept_synonym_name`, `language_concept_id`
`concept_relationship`	Relationships between concepts	`concept_id_1`, `concept_id_2`, `relationship_id`, `valid_start_date`, `valid_end_date`
`concept_ancestor`	Hierarchical relationships between concepts	`ancestor_concept_id`, `descendant_concept_id`, `min_levels_of_separation`, `max_levels_of_separation`
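
The table above can also be expressed as a simple check against your database's column catalog. This is a minimal sketch, assuming you have queried `(table_name, column_name)` pairs from `information_schema.columns` for your OMOP schema. The function name `find_missing_columns` is an illustrative assumption, not something Lettuce provides.

```python
# Hypothetical schema check for illustration; not part of Lettuce.
# Table and column names are copied from the table above.
REQUIRED_COLUMNS = {
    "concept": {
        "concept_id", "concept_name", "vocabulary_id",
        "concept_code", "standard_concept",
    },
    "concept_synonym": {
        "concept_id", "concept_synonym_name", "language_concept_id",
    },
    "concept_relationship": {
        "concept_id_1", "concept_id_2", "relationship_id",
        "valid_start_date", "valid_end_date",
    },
    "concept_ancestor": {
        "ancestor_concept_id", "descendant_concept_id",
        "min_levels_of_separation", "max_levels_of_separation",
    },
}


def find_missing_columns(rows):
    """Map each required table to the key columns not found in `rows`.

    `rows` is an iterable of (table_name, column_name) pairs, such as the
    result of querying information_schema.columns for one schema.
    """
    present = {}
    for table, column in rows:
        present.setdefault(table.lower(), set()).add(column.lower())

    missing = {}
    for table, columns in REQUIRED_COLUMNS.items():
        absent = columns - present.get(table, set())
        if absent:
            missing[table] = sorted(absent)
    return missing
```

An empty dictionary means every key column listed above was found. A table that is absent altogether shows up with all of its key columns listed.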

Custom Table for Vector Search

To enable semantic similarity search, you need a table of vector embeddings for concepts, generated with a supported model defined in components.embeddings.

The embedding model used to generate your vector table must match the model specified in your Lettuce configuration. If they differ, semantic search will return inaccurate or empty results.

Here are some common models and their expected embedding dimensions:

BGE-small: 384
GTR-T5-base: 768
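
A mismatch between the model and the configured vector size is easy to catch before running any searches. The sketch below checks a configured size (such as the `DB_VECSIZE` value in your .env file) against the dimensions listed above. The dictionary keys are illustrative labels and may not match the identifiers used in components.embeddings, and `check_vector_size` is a hypothetical helper.

```python
# Hypothetical check for illustration; not part of Lettuce.
# Dimensions are the ones listed above. The keys are illustrative labels,
# not necessarily the identifiers used in components.embeddings.
EXPECTED_DIMENSIONS = {
    "BGE-small": 384,
    "GTR-T5-base": 768,
}


def check_vector_size(model_name, configured_size):
    """Return the expected dimension, or raise if the configuration disagrees."""
    if model_name not in EXPECTED_DIMENSIONS:
        raise KeyError(f"No known dimension for model {model_name!r}")
    expected = EXPECTED_DIMENSIONS[model_name]
    if expected != configured_size:
        raise ValueError(
            f"{model_name} produces {expected}-dimensional vectors, "
            f"but the configured size is {configured_size}"
        )
    return expected


check_vector_size("BGE-small", 384)  # passes silently
```

Calling `check_vector_size("GTR-T5-base", 384)` raises a `ValueError`, which is the situation the warning above describes.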

Setup Options

You can use either an existing OMOP-compliant database, or set up a containerized instance for development and testing.

Option 1: Using an Existing OMOP Database

If you already have access to a PostgreSQL instance with the required OMOP tables and extensions, you can proceed directly to Next Steps.

Option 2: Set Up a Containerized Database with `omop-ts`

Use the omop-lite repository to quickly spin up a local OMOP-compatible database.

Clone the `omop-lite` repository

Clone the repository to your local machine:

git clone git@github.com:Health-Informatics-UoN/omop-lite.git
cd omop-lite

Prepare data directory

Create a data directory at the root level of the repo:

mkdir -p ./data

Copy your OMOP .csv files (e.g., CONCEPT.csv, CONCEPT_SYNONYM.csv) into this folder.

For testing, minimal example files are provided in omop_lite/synthetic.

Build and start the `omop-ts` Docker container

Start the containerized OMOP database using Docker Compose:

docker compose -f compose-omop-ts.yml up -d

This will launch a PostgreSQL instance with the necessary schema and extensions pre-configured.

Verify

Verify that the database was properly populated by running the following command:

docker exec omop-ts-omop-ts-1 psql -U postgres -c "SELECT * FROM cdm.concept LIMIT 5;"

You should see output showing the first 5 rows from the concept table, confirming the database is set up correctly.

Next Steps

Once your database is running, configure Lettuce to connect by specifying the connection details in your .env file, as shown in the Quickstart guide.

If you’re using the containerized omop-ts setup, the corresponding .env file looks like this:

DB_HOST="localhost"
DB_USER="postgres"
DB_PASSWORD="password"
DB_NAME="omop"
DB_PORT="5432"
DB_SCHEMA="cdm"
DB_VECTABLE="embeddings"
DB_VECSIZE=384
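
To sanity-check these values outside Lettuce, you can assemble them into a standard PostgreSQL connection URI and try it with any client. This is a minimal sketch: `parse_env` and `connection_uri` are hypothetical helpers, and how Lettuce itself reads the .env file is not shown here. The schema, vector table, and vector size are not part of the URI.

```python
from urllib.parse import quote

# The same values as the .env example above.
ENV_TEXT = '''
DB_HOST="localhost"
DB_USER="postgres"
DB_PASSWORD="password"
DB_NAME="omop"
DB_PORT="5432"
DB_SCHEMA="cdm"
DB_VECTABLE="embeddings"
DB_VECSIZE=384
'''


def parse_env(text):
    """Parse simple KEY=VALUE lines, dropping surrounding double quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def connection_uri(env):
    """Build a postgresql:// URI, percent-encoding the user and password."""
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )


env = parse_env(ENV_TEXT)
print(connection_uri(env))
# prints postgresql://postgres:password@localhost:5432/omop
```

The resulting URI can be passed to `psql` or any PostgreSQL driver to confirm that the host, port, credentials, and database name are correct.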
