Getting started with Bunny

This page will guide you through getting Bunny running locally.

Start by cloning the Hutch tools repository.

Prerequisites

  • Bunny runs on Python version >= 3.9 and < 3.11
  • Dependencies are managed with poetry, which needs to be installed if you run Bunny outside a container.
  • Bunny needs to query a database
    • A remote OMOP-CDM database running
    • Or a tarball containing a pg_dump of an OMOP-CDM database
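
As a quick check of the Python prerequisite, a small helper like this (purely illustrative, not part of Bunny) can confirm your interpreter falls in the supported range:

```python
import sys

def version_supported(major: int, minor: int) -> bool:
    # Bunny supports Python >= 3.9 and < 3.11 (see the prerequisites above).
    return (3, 9) <= (major, minor) < (3, 11)

# Check the interpreter currently running.
print(version_supported(*sys.version_info[:2]))
```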

OMOP-CDM setup

Before Bunny can get up and running, it needs a database to query. If you don’t have a real OMOP-CDM, you can run a mock database.

Mock database setup

Start a containerized database

The compose.yml in the root of the Hutch tools repository can start a database container with

docker compose up db

This will initialise a Postgres instance in the container.

These instructions assume you have a pg_dump of a pre-existing OMOP-CDM PostgreSQL database; the examples below call it hutch_db.tar.

Copy the data

Navigate to the folder containing hutch_db.tar and copy it into the container with:

docker cp hutch_db.tar hutch-bunny-dev-db-1:/

Start running bash in your container

docker exec -it hutch-bunny-dev-db-1 bash

Use pg_restore to load the data into the database

pg_restore --dbname=postgres --host=localhost --port=5432 --username=postgres --password hutch_db.tar

If prompted, provide “postgres” as the password.

You can then exit the container with Ctrl+D or by typing exit.

Environment configuration

To run Bunny locally, your environment needs to have some variables configured.

There are various ways to load environment variables, but a convenient way for this purpose is to use a plugin for poetry that loads environment variables from a .env file.

Create a file called .env in app/bunny

The file should contain the variables required to connect to a database and Relay.

For example, if using the containerized Relay and mock database:

DATASOURCE_DB_USERNAME=postgres
DATASOURCE_DB_PASSWORD=postgres
DATASOURCE_DB_DATABASE=postgres
DATASOURCE_DB_DRIVERNAME=postgresql
DATASOURCE_DB_SCHEMA=public
DATASOURCE_DB_PORT=5432
DATASOURCE_DB_HOST=localhost
TASK_API_BASE_URL=http://localhost:8080
TASK_API_USERNAME=username
TASK_API_PASSWORD=password
LOW_NUMBER_SUPPRESSION_THRESHOLD=
ROUNDING_TARGET=
POLLING_INTERVAL=5

If you are querying a remote database, the variables prefixed with DATASOURCE must be configured to point at it. If you set environment variables by another method, use the same variable names and values as in the example above.
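
To illustrate how these variables come together, here is a hedged sketch of reading the DATASOURCE variables and assembling a connection URL. This is not Bunny's actual implementation; the class and helper names are hypothetical, and the defaults mirror the mock-database example above.

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str = "") -> str:
    # Read a single variable from the environment, with a fallback.
    return os.environ.get(name, default)

@dataclass
class DatasourceSettings:
    # Defaults mirror the .env example for the mock database.
    drivername: str = field(default_factory=lambda: _env("DATASOURCE_DB_DRIVERNAME", "postgresql"))
    username: str = field(default_factory=lambda: _env("DATASOURCE_DB_USERNAME", "postgres"))
    password: str = field(default_factory=lambda: _env("DATASOURCE_DB_PASSWORD", "postgres"))
    host: str = field(default_factory=lambda: _env("DATASOURCE_DB_HOST", "localhost"))
    port: str = field(default_factory=lambda: _env("DATASOURCE_DB_PORT", "5432"))
    database: str = field(default_factory=lambda: _env("DATASOURCE_DB_DATABASE", "postgres"))

    def url(self) -> str:
        # Assemble a SQLAlchemy-style connection URL from the parts.
        return (
            f"{self.drivername}://{self.username}:{self.password}"
            f"@{self.host}:{self.port}/{self.database}"
        )

print(DatasourceSettings().url())
```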

Add the plugin

poetry self add poetry-plugin-dotenv

Installing dependencies

The first time you run Bunny outside a container, you will need to install its dependencies by running

poetry install

in app/bunny

Running the Bunny daemon

Bunny has a daemon which polls Relay for jobs, so a Relay instance must be running.

Start the Relay container

The compose used to start up the database also contains an implementation of Relay.

docker compose up relay -d

Run the Bunny daemon

The Bunny daemon can then be run using poetry, which ensures the dependencies and environment variables are loaded.

poetry run bunny-daemon

You should then see a message in your console like this:

INFO - 12-Nov-24 12:36:24 - Setting up database connection...
INFO - 12-Nov-24 12:36:24 - Looking for job...
INFO - 12-Nov-24 12:36:29 - Job received. Resolving...
INFO - 12-Nov-24 12:36:29 - Processing query...
INFO - 12-Nov-24 12:36:30 - Solved availability query
INFO - 12-Nov-24 12:36:30 - Job resolved.
INFO - 12-Nov-24 12:36:35 - Looking for job...
INFO - 12-Nov-24 12:36:40 - Looking for job...
INFO - 12-Nov-24 12:36:45 - Looking for job...

Bunny establishes a connection to your OMOP-CDM database, then polls Relay for a job. When it receives a job, it processes the query, queries the database, and sends the result back to Relay. You won’t see the results here, but if you see the messages, then it’s successfully contacting both the database and Relay.
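
The poll-and-resolve cycle described above can be sketched roughly as follows. This is a simplified illustration, not Bunny's actual code; fetch_job, resolve, and send_result are hypothetical stand-ins for talking to Relay and the database.

```python
import time

def poll_loop(fetch_job, resolve, send_result, interval=5, max_polls=None):
    """Simplified sketch of a Relay polling loop.

    fetch_job   -- asks Relay for a job, returns None if there is none
    resolve     -- runs the query against the OMOP-CDM database
    send_result -- posts the result back to Relay
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        job = fetch_job()
        if job is not None:
            send_result(resolve(job))
        time.sleep(interval)
        polls += 1

# Example with stubbed-out functions: one job, then empty polls.
jobs = [{"query": "availability"}]
results = []
poll_loop(
    fetch_job=lambda: jobs.pop() if jobs else None,
    resolve=lambda job: {"count": 42, **job},
    send_result=results.append,
    interval=0,
    max_polls=3,
)
print(results)
```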

🎉
Congratulations on your first Bunny query!

Running bunny-cli

To run Bunny without Relay, the command-line interface can be used. The CLI needs an input file matching the expected JSON schema (see the sample below).

How to run

You can use the CLI from the same image as the daemon, by overriding the entrypoint.

You will need to have made the input file available to the docker container, for example by mounting a volume.

It is possible to pass Docker arguments to docker run before the image, and arguments to the Bunny CLI after the image.

Here’s an example with docker run:

docker run \
-v <path/to/rquest-query.json>:/rquest-query.json \
--entrypoint poetry \
ghcr.io/health-informatics-uon/hutch/bunny:<TAG> \
run bunny-cli --body /rquest-query.json

Sample input files

availability.json
{
  "task_id": "job-2023-01-13-14:20:38-project",
  "project": "project_id",
  "owner": "user1",
  "cohort": {
    "groups": [
      {
        "rules": [
          {
            "varname": "OMOP",
            "varcat": "Person",
            "type": "TEXT",
            "oper": "=",
            "value": "8507"
          }
        ],
        "rules_oper": "AND"
      }
    ],
    "groups_oper": "OR"
  },
  "collection": "collection_id",
  "protocol_version": "v2",
  "char_salt": "salt",
  "uuid": "unique_id"
}
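
To generate an input file like the sample above programmatically, a short illustrative Python snippet could write one out for use with --body. The values here are the same placeholders as in the sample.

```python
import json

# Mirror of the availability.json sample above; all values are placeholders.
query = {
    "task_id": "job-2023-01-13-14:20:38-project",
    "project": "project_id",
    "owner": "user1",
    "cohort": {
        "groups": [
            {
                "rules": [
                    {
                        "varname": "OMOP",
                        "varcat": "Person",
                        "type": "TEXT",
                        "oper": "=",
                        "value": "8507",
                    }
                ],
                "rules_oper": "AND",
            }
        ],
        "groups_oper": "OR",
    },
    "collection": "collection_id",
    "protocol_version": "v2",
    "char_salt": "salt",
    "uuid": "unique_id",
}

# Write the query to a file that can be passed to bunny-cli.
with open("rquest-query.json", "w") as f:
    json.dump(query, f, indent=2)
```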