Database Quickstart

Database Setup for Lettuce

Lettuce requires a PostgreSQL database with the OMOP CDM (Common Data Model) schema, along with specific extensions for full-text search and vector similarity capabilities.

Prerequisites

Required PostgreSQL Extensions

  • Full-text search: Built-in PostgreSQL capability for text searching
  • pgvector: Extension for vector similarity searches

Required OMOP Tables

Lettuce expects the following tables from the OMOP CDM v5.0 vocabulary:

TablePurposeKey Columns
conceptCore vocabulary table containing all medical conceptsconcept_id, concept_name, vocabulary_id, concept_code, standard_concept
concept_synonymAlternative names for conceptsconcept_id, concept_synonym_name, language_concept_id
concept_relationshipRelationships between conceptsconcept_id_1, concept_id_2, relationship_id, valid_start_date, valid_end_date
concept_ancestorHierarchical relationships between conceptsancestor_concept_id, descendant_concept_id, min_levels_of_separation, max_levels_of_separation

To enable semantic similarity search, you’ll need to generate vector embeddings for concepts using a supported model defined in components.embeddings.

The embedding model used to generate your vector table must match the model specified in your Lettuce configuration. If they differ, semantic search will return inaccurate or empty results.

Here are some common models and their expected embedding dimensions:

  • BGE-small: 384
  • GTR-T5-base: 768

Setup Options

You can use either an existing OMOP-compliant database, or set up a containerized instance for development and testing.


Option 1: Using an Existing OMOP Database

If you already have access to a PostgreSQL instance with the required OMOP tables and extensions, you can proceed directly to Next Steps.

Option 2: Set Up a Containerized Database with omop-ts

Use the omop-lite repository to quickly spin up a local OMOP-compatible database.

Clone the omop-lite repository

Clone the repository to your local machine:

git clone git@github.com:Health-Informatics-UoN/omop-lite.git
cd omop-lite

Prepare data directory

Create a data directory at the root level of the repo:

mkdir -p ./data

Copy your OMOP .csv files (e.g., CONCEPT.csv, CONCEPT_SYNONYM.csv) into this folder.

For testing, minimal example files are provided in omop_lite/synthetic.

Build and start the omop-ts docker container

Start the containerized OMOP database using Docker Compose:

docker compose -f compose-omop-ts.yml up -d 

This will launch a PostgreSQL instance with the necessary schema and extensions pre-configured.

Verify

Verify that the database was properly populated by running the following command:

docker exec omop-ts-omop-ts-1 psql -U postgres -c "SELECT * FROM cdm.concept LIMIT 5;"

You should see output showing the first 5 rows from the concept table, confirming the database is set up correctly.

Next Steps

Once your database is running, configure Lettuce to connect by specifying the connection details in your .env file, as shown in the Quickstart guide.

If you’re using the containerised omop-ts setup, the corresponding .env file would look like:

DB_HOST="localhost"
DB_USER="postgres"
DB_PASSWORD="password"
DB_NAME="omop"
DB_PORT="5432"
DB_SCHEMA="cdm"
DB_VECTABLE="embeddings"
DB_VECSIZE=384