Database Quickstart

Database Setup for Lettuce

Lettuce requires a PostgreSQL database with the OMOP CDM (Common Data Model) schema, along with specific extensions for full-text search and vector similarity capabilities.

Prerequisites

Required PostgreSQL Extensions

  • Full-text search: Built-in PostgreSQL capability for text searching
  • pgvector: Extension for vector similarity searches

Required OMOP Tables

Lettuce expects the following tables from the OMOP CDM v5.0:

TablePurposeKey Columns
conceptCore vocabulary table containing all medical conceptsconcept_id, concept_name, vocabulary_id, concept_code, standard_concept
concept_synonymAlternative names for conceptsconcept_id, concept_synonym_name, language_concept_id
concept_relationshipRelationships between conceptsconcept_id_1, concept_id_2, relationship_id, valid_start_date, valid_end_date
concept_ancestorHierarchical relationships between conceptsancestor_concept_id, descendant_concept_id, min_levels_of_separation, max_levels_of_separation

To enable semantic similarity search, you’ll need to generate vector embeddings for concepts using a supported model defined in components.embeddings.

The embedding model used to generate your vector table must match the model specified in your Lettuce configuration. If they differ, semantic search will return inaccurate results or throw an error.

Setup Options

You can use either an existing OMOP-compliant database, or set up a containerized instance for development and testing.


Option 1: Using an Existing OMOP Database

If you have an existing Postgres instance running an OMOP-CDM database, it needs some modifications to serve as a database for Lettuce.

The only way the text search features can work with a reasonable latency is to materialise a column for tsvectors and index on that column, otherwise the database has to effectively compute the column each time the query is run. Lettuce assumes this column is present. To create the column:

ALTER TABLE {SCHEMA_NAME}.concept
ADD COLUMN concept_name_tsv tsvector
GENERATED ALWAYS AS (to_tsvector('english', concept_name)) STORED;

Then the index:

CREATE INDEX idx_concept_fts ON {SCHEMA_NAME}.concept USING GIN (concept_name_tsv);

Embeddings

Lettuce uses the pgvector postgres extension for semantic search. To install pgvector, follow their installation instructions

Then:

CREATE EXTENSION IF NOT EXISTS vector;

If you have a parquet file containing embeddings, you can copy them to a table in your cdm schema for Lettuce to read from. Example code for this can be found in omop-lite

💡

Code for reading the concept table from an existing database, then encoding the concepts is a work in progress. It works, but is highly experimental.

Option 2: Set Up a Containerized Database with omop-ts

Use the omop-lite repository to quickly spin up a local OMOP-CDM database.

Clone the omop-lite repository

Clone the repository to your local machine:

git clone git@github.com:Health-Informatics-UoN/omop-lite.git
cd omop-lite

Prepare data directory

Create a data directory at the root level of the repo:

mkdir -p ./data

Copy your OMOP .csv files (e.g., CONCEPT.csv, CONCEPT_SYNONYM.csv) into this folder.

For testing, minimal example files are provided in omop_lite/synthetic.

Prepare embeddings

Building omop-lite assumes you have pre-computed embeddings stored in a parquet file in omop-lite/text-search/embeddings.parquet. An example file is provided in the omop-lite repository. This parquet file holds a table with concept IDs in the first column, and embedding vectors in the third.

If you wish to create embeddings online, the embeddings table can be dropped and rebuilt. If using omop-lite, still run with the text-search profile.

Build and start the omop-ts docker container

Start the containerized OMOP database using Docker Compose:

docker compose --profile text-search up -d 

This will launch a PostgreSQL instance with the necessary schema and extensions pre-configured.

Verify

Verify that the database was properly populated by running the following command:

docker exec omop-ts-omop-ts-1 psql -U postgres -c "SELECT * FROM cdm.concept LIMIT 5;"

You should see output showing the first 5 rows from the concept table, confirming the database is set up correctly.

Next Steps

Once your database is running, configure Lettuce to connect by specifying the connection details in your .env file, as shown in the Quickstart guide.