Getting started with Bunny
This page will guide you through getting Bunny running locally.
Start by cloning the Hutch tools repository.
Prerequisites
- Bunny runs on Python version >= 3.9 and < 3.11
- Dependencies are managed with poetry, which needs to be installed if you run Bunny outside a container.
- Bunny needs an OMOP-CDM database to query, either:
  - a running remote OMOP-CDM database, or
  - a tarball containing a pg_dump of an OMOP-CDM database
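If you are unsure which interpreter Poetry will pick up, a small stdlib-only check can confirm it falls inside the supported range. This is a hypothetical helper script, not part of Bunny; is_supported is an illustrative name, and the bounds are copied from the list above.

```python
import sys

# Supported range from the prerequisites: >= 3.9 and < 3.11.
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 11)


def is_supported(version: tuple) -> bool:
    """Return True if a (major, minor, ...) version tuple is in the supported range."""
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = sys.version_info
    status = "supported" if is_supported(current) else "NOT supported"
    print(f"Python {current.major}.{current.minor} is {status} by Bunny")
```

Run it with the interpreter you intend to use, for example through poetry run python so that Poetry's environment is the one being checked.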
OMOP-CDM setup
Before Bunny can get up and running, it needs a database to query. If you don’t have a real OMOP-CDM database, you can run a mock database.
Mock database setup
Start a containerized database
The compose.yml in the root of the Hutch tools repository can start a database container with:
docker compose up db
This will initialise a Postgres instance in the container.
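Before copying data in, you may want to confirm the container is accepting connections. The sketch below only tests whether something is listening on a TCP port; it does not prove that Postgres has finished initialising or that any data is loaded. It assumes the compose file publishes Postgres on localhost:5432, matching the DATASOURCE_DB_HOST and DATASOURCE_DB_PORT values used later on this page; port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    # Assumed host and port, taken from the example .env on this page.
    print("Postgres port reachable:", port_is_open("localhost", 5432))
```

The same check works for Relay once it is running, using the port from TASK_API_BASE_URL (8080 in the example .env).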
These instructions assume you have a pg_dump of a pre-existing OMOP-CDM PostgreSQL database, saved as a tar archive called hutch_db.tar, as in the examples below.
Copy the data
Navigate to the folder containing hutch_db.tar and copy it into the container (if your container is named differently, find its name with docker ps):
docker cp hutch_db.tar hutch-bunny-dev-db-1:/
Open a bash shell in the container:
docker exec -it hutch-bunny-dev-db-1 bash
Use pg_restore to load the data into the database
pg_restore --dbname=postgres --host=localhost --port=5432 --username=postgres --password hutch_db.tar
When prompted, enter “postgres” as the password (the --password flag always makes pg_restore prompt).
You can then leave the container with Ctrl+D or exit.
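The copy and restore steps can also be run without an interactive shell. The sketch below builds the equivalent docker cp and docker exec commands; instead of --password, which makes pg_restore always prompt, it passes the password through the PGPASSWORD environment variable, which libpq reads. The container name and password are the example values from this page, restore_commands and run_all are hypothetical helpers, and the script only prints the commands unless you call run_all.

```python
import os
import shlex
import subprocess

# Example values from this page; adjust them to your setup.
CONTAINER = "hutch-bunny-dev-db-1"
DUMP_FILE = "hutch_db.tar"


def restore_commands(container: str, dump_file: str) -> list:
    """Build the docker commands that copy a pg_dump tarball into the container
    and restore it with pg_restore, without an interactive shell."""
    # docker cp into "/" keeps the file name, so the dump ends up at /<basename>.
    name_in_container = "/" + os.path.basename(dump_file)
    copy_cmd = ["docker", "cp", dump_file, f"{container}:/"]
    restore_cmd = [
        "docker", "exec",
        "-e", "PGPASSWORD=postgres",  # read by libpq, so pg_restore does not prompt
        container,
        "pg_restore",
        "--dbname=postgres",
        "--host=localhost",
        "--port=5432",
        "--username=postgres",
        name_in_container,
    ]
    return [copy_cmd, restore_cmd]


def run_all(commands: list) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands. Call run_all(restore_commands(...)) to execute them.
    for cmd in restore_commands(CONTAINER, DUMP_FILE):
        print(shlex.join(cmd))
```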
Environment configuration
To run Bunny locally, your environment needs to have some variables configured.
There are various ways to load environment variables, but a convenient way for this purpose is to use a plugin for poetry that loads environment variables from a .env file.
Create a file called .env in app/bunny
The file should contain the variables required to connect to a database and Relay.
For example, if using the containerized Relay and mock database:
DATASOURCE_DB_USERNAME=postgres
DATASOURCE_DB_PASSWORD=postgres
DATASOURCE_DB_DATABASE=postgres
DATASOURCE_DB_DRIVERNAME=postgresql
DATASOURCE_DB_SCHEMA=public
DATASOURCE_DB_PORT=5432
DATASOURCE_DB_HOST=localhost
TASK_API_BASE_URL=http://localhost:8080
TASK_API_USERNAME=username
TASK_API_PASSWORD=password
LOW_NUMBER_SUPPRESSION_THRESHOLD=
ROUNDING_TARGET=
POLLING_INTERVAL=5
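A missing or misspelt variable is an easy mistake to make. The sketch below parses a .env file in the simple KEY=VALUE form shown above and reports which keys are absent or empty. It is a hypothetical checker, not how Bunny or the dotenv plugin reads the file: it ignores quoting and export prefixes, and treating every variable that has a value in the example as required (and the two left blank as optional) is an assumption.

```python
from pathlib import Path

# Assumed required set: every variable given a value in the example .env.
REQUIRED_KEYS = {
    "DATASOURCE_DB_USERNAME", "DATASOURCE_DB_PASSWORD", "DATASOURCE_DB_DATABASE",
    "DATASOURCE_DB_DRIVERNAME", "DATASOURCE_DB_SCHEMA", "DATASOURCE_DB_PORT",
    "DATASOURCE_DB_HOST", "TASK_API_BASE_URL", "TASK_API_USERNAME",
    "TASK_API_PASSWORD", "POLLING_INTERVAL",
}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments.
    Quoting, escapes and `export` prefixes are not handled."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> set:
    """Return required keys that are absent or have an empty value."""
    return {key for key in REQUIRED_KEYS if not values.get(key)}


if __name__ == "__main__":
    env_file = Path(".env")  # run from app/bunny
    if env_file.exists():
        problems = missing_keys(parse_env(env_file.read_text()))
        print("Missing or empty:", ", ".join(sorted(problems)) or "none")
    else:
        print("No .env file found in", Path.cwd())
```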
If you are querying a remote database, set the DATASOURCE_* variables to match its connection details.
If you use another method to set your environment variables, set the same variables as in the example above.
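When pointing Bunny at a remote database, it can help to check the DATASOURCE_DB_* values with another client first. The DRIVERNAME variable suggests they are combined into a URL of the form driver://user:password@host:port/database (the format SQLAlchemy uses, and which psql accepts for postgresql:// URLs); whether Bunny builds its connection string exactly this way is an assumption. database_url is a hypothetical helper.

```python
from urllib.parse import quote


def database_url(env: dict) -> str:
    """Build driver://user:password@host:port/database from DATASOURCE_DB_* values.

    The username and password are percent-encoded so characters such as
    '@' or ':' cannot break the URL. DATASOURCE_DB_SCHEMA is not part of it.
    """
    user = quote(env["DATASOURCE_DB_USERNAME"], safe="")
    password = quote(env["DATASOURCE_DB_PASSWORD"], safe="")
    return (
        f"{env['DATASOURCE_DB_DRIVERNAME']}://{user}:{password}"
        f"@{env['DATASOURCE_DB_HOST']}:{env['DATASOURCE_DB_PORT']}"
        f"/{env['DATASOURCE_DB_DATABASE']}"
    )


if __name__ == "__main__":
    # Values from the example .env for the containerized mock database.
    example = {
        "DATASOURCE_DB_USERNAME": "postgres",
        "DATASOURCE_DB_PASSWORD": "postgres",
        "DATASOURCE_DB_DRIVERNAME": "postgresql",
        "DATASOURCE_DB_HOST": "localhost",
        "DATASOURCE_DB_PORT": "5432",
        "DATASOURCE_DB_DATABASE": "postgres",
    }
    print(database_url(example))
```

With the variables exported in your shell and DATASOURCE_DB_DRIVERNAME set to postgresql, database_url(dict(os.environ)) gives a URL you can pass to psql to test the connection independently of Bunny.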
Add the plugin
poetry self add poetry-plugin-dotenv
Installing dependencies
The first time you run Bunny outside a container, you will need to install its dependencies by running the following in app/bunny:
poetry install
Running the Bunny daemon
Bunny's daemon polls Relay for jobs, so it needs a running Relay instance.
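Conceptually, the daemon is a polling loop. The sketch below shows the general shape of one, with POLLING_INTERVAL controlling the sleep between polls; it is not Bunny's implementation. fetch_job, solve and submit are hypothetical callables standing in for asking Relay for a job, querying the database, and sending the result back.

```python
import time
from typing import Callable, Optional


def poll(fetch_job: Callable[[], Optional[dict]],
         solve: Callable[[dict], dict],
         submit: Callable[[dict], None],
         interval: float,
         max_iterations: Optional[int] = None) -> int:
    """Poll for jobs until max_iterations polls have run (forever if None).

    Returns the number of jobs resolved.
    """
    resolved = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        job = fetch_job()            # ask the task API for work; None means no job
        if job is not None:
            submit(solve(job))       # run the query, then send the result back
            resolved += 1
        time.sleep(interval)         # wait POLLING_INTERVAL seconds before polling again
    return resolved
```

Passing max_iterations makes the loop finite, which is handy for testing; a long-running daemon would leave it as None and take interval from POLLING_INTERVAL, e.g. float(os.environ["POLLING_INTERVAL"]).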
Start the Relay container
The compose.yml used to start the database also defines a Relay service:
docker compose up relay -d
Run the Bunny daemon
The Bunny daemon can then be run using Poetry, which ensures the dependencies and environment variables are loaded:
poetry run bunny-daemon
You should then see a message in your console like this:
INFO - 12-Nov-24 12:36:24 - Setting up database connection...
INFO - 12-Nov-24 12:36:24 - Looking for job...
INFO - 12-Nov-24 12:36:29 - Job received. Resolving...
INFO - 12-Nov-24 12:36:29 - Processing query...
INFO - 12-Nov-24 12:36:30 - Solved availability query
INFO - 12-Nov-24 12:36:30 - Job resolved.
INFO - 12-Nov-24 12:36:35 - Looking for job...
INFO - 12-Nov-24 12:36:40 - Looking for job...
INFO - 12-Nov-24 12:36:45 - Looking for job...
Bunny establishes a connection to your OMOP-CDM database, then polls Relay for a job. When it receives a job, it processes the query, queries the database, and sends the result back to Relay. You won’t see the results here, but if you see the messages, then it’s successfully contacting both the database and Relay.
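If you want to check the daemon's output from a script (for example in CI), the log lines can be parsed. The pattern below is inferred from the sample output above, not from Bunny's logging configuration, and %b in the date format assumes an English locale. parse_line and poll_gaps are hypothetical helpers. In the sample, the idle gaps between Looking for job... lines are 5 seconds, matching POLLING_INTERVAL=5 in the example .env.

```python
import re
from datetime import datetime

# Pattern inferred from the sample output, e.g.
# "INFO - 12-Nov-24 12:36:24 - Looking for job..."
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) - "
    r"(?P<stamp>\d{2}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}) - "
    r"(?P<message>.*)$"
)


def parse_line(line: str):
    """Return (level, timestamp, message), or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # %b (abbreviated month name) assumes an English locale.
    stamp = datetime.strptime(match["stamp"], "%d-%b-%y %H:%M:%S")
    return match["level"], stamp, match["message"]


def poll_gaps(lines) -> list:
    """Seconds between consecutive 'Looking for job...' lines."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    times = [stamp for _, stamp, message in parsed if message == "Looking for job..."]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```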
Running bunny-cli
To run Bunny without Relay, use the command-line interface. It needs an input file containing a query in the expected JSON schema (example below).
How to run
You can use the CLI from the same image as the daemon by overriding the entrypoint.
You will need to have made the input file available to the docker container, for example by mounting a volume.
You can pass Docker arguments to docker run before the image name, and arguments to the Bunny CLI after it.
Here’s an example with docker run:
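docker run \
  -v <path/to/rquest-query.json>:/rquest-query.json \
  --entrypoint poetry \
  ghcr.io/health-informatics-uon/hutch/bunny:<TAG> \
  run bunny-cli --body /rquest-query.json
Docker requires the container side of a -v mount to be an absolute path; use an absolute path for the host file as well.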
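Because Docker requires an absolute path on the container side of a -v mount, and relative host paths are easy to get wrong, it can help to build the command programmatically. The sketch below resolves the host file to an absolute path and mounts it at /rquest-query.json. cli_command is a hypothetical helper, <TAG> is left as the placeholder from the example, and the script only prints the command.

```python
import shlex
from pathlib import Path

# Replace <TAG> with the image tag you want to run.
IMAGE = "ghcr.io/health-informatics-uon/hutch/bunny:<TAG>"


def cli_command(query_file: str, image: str = IMAGE,
                container_path: str = "/rquest-query.json") -> list:
    """Build a docker run command that mounts query_file and runs bunny-cli on it."""
    host_path = Path(query_file).resolve()  # absolute path on the host
    return [
        "docker", "run",
        "-v", f"{host_path}:{container_path}",  # container side must be absolute
        "--entrypoint", "poetry",
        image,
        "run", "bunny-cli", "--body", container_path,
    ]


if __name__ == "__main__":
    print(shlex.join(cli_command("rquest-query.json")))
```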
Sample input file
{
  "task_id": "job-2023-01-13-14:20:38-project",
  "project": "project_id",
  "owner": "user1",
  "cohort": {
    "groups": [
      {
        "rules": [
          {
            "varname": "OMOP",
            "varcat": "Person",
            "type": "TEXT",
            "oper": "=",
            "value": "8507"
          }
        ],
        "rules_oper": "AND"
      }
    ],
    "groups_oper": "OR"
  },
  "collection": "collection_id",
  "protocol_version": "v2",
  "char_salt": "salt",
  "uuid": "unique_id"
}
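To sanity-check an input file before running the CLI, you can load it and list its rules. The sketch below uses only the field names that appear in the sample; which fields Bunny actually requires is an assumption, and summarise_query and missing_fields are hypothetical helpers. In the sample, 8507 is the standard OMOP concept ID for male gender.

```python
import json

# Top-level fields present in the sample; treating them all as required is an assumption.
TOP_LEVEL_KEYS = ("task_id", "project", "owner", "cohort", "collection",
                  "protocol_version", "char_salt", "uuid")


def missing_fields(query: dict) -> list:
    """Return the sample's top-level fields that are absent from `query`."""
    return [key for key in TOP_LEVEL_KEYS if key not in query]


def summarise_query(text: str) -> list:
    """Return one readable line per rule in a query shaped like the sample."""
    query = json.loads(text)
    lines = []
    for number, group in enumerate(query["cohort"]["groups"], start=1):
        for rule in group["rules"]:
            lines.append(
                f"group {number} ({group['rules_oper']}): "
                f"{rule['varcat']} {rule['varname']} {rule['oper']} {rule['value']}"
            )
    return lines
```

For example, summarise_query(Path("rquest-query.json").read_text()) on the sample file returns a single line describing its one rule.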