Aggregating data for contingency tables
This tutorial can be run as a Jupyter notebook from the 5s-TES notebooks repository
Federated analysis on contingency tables is relatively simple. Counts are easy to federate: each TRE calculates their local count for some group, then these are aggregated by adding the counts together. Each cell of a contingency table is a count, so the table can be federated by requesting these counts, and then statistical analyses can be performed on the aggregate.
The example data were produced by running the Custom Image wizard using the following parameters:
| Field | value |
|---|---|
| Docker image | ghcr.io/health-informatics-uon/five-safes-tes-analytics-dev:sha-9ac04bc |
| Workdir | /app |
| Commands | --user-query=SELECT g.concept_name AS gender_name, r.concept_name AS race_name\nFROM public.person p\nJOIN public.concept g ON p.gender_concept_id = g.concept_id\nJOIN public.concept r ON p.race_concept_id = r.concept_id\nWHERE p.race_concept_id IN (8515, 8516, 8527) --analysis=contingency_table --output-filename=/outputs/output --output-format=json |
The UI should look like this:

Expand for example JSON
{
"id": "504",
"name": "test chi-sq",
"description": "Federated analysis task",
"inputs": null,
"outputs": [
{
"name": "Query Results",
"description": "Results from the requested query execution",
"url": "s3://",
"path": "/outputs",
"type": "DIRECTORY"
}
],
"resources": null,
"executors": [
{
"image": "ghcr.io/health-informatics-uon/five-safes-tes-analytics-dev:sha-9ac04bc",
"command": [
"--user-query=SELECT g.concept_name AS gender_name, r.concept_name AS race_name\nFROM public.person p\nJOIN public.concept g ON p.gender_concept_id = g.concept_id\nJOIN public.concept r ON p.race_concept_id = r.concept_id\nWHERE p.race_concept_id IN (8515, 8516, 8527)",
"--analysis=contingency_table",
"--output-filename=/outputs/output",
"--output-format=json"
],
"workdir": "/app",
"stdin": null,
"stdout": null,
"stderr": null,
"env": {}
}
],
"volumes": null,
"tags": {
"Project": "NottinghamDemo",
"tres": "Nottingham TRE 01|Nottingham TRE 02"
},
"logs": null,
"creation_time": null
}
import pandas as pd
from contingency_table_utils import read_contingency_table_from_json, aggregate_tables
from scipy.stats import chi2_contingency
The json produced by this analysis can be read into tables using the contingency_table_utils module supplied.
tre1 = read_contingency_table_from_json(\"data/tre1.json\")
tre2 = read_contingency_table_from_json(\"data/tre2.json\")
tre1.data
| gender_name | race_name | n |
|---|---|---|
| FEMALE | Asian | 29 |
| FEMALE | Black or African American | 44 |
| FEMALE | White | 411 |
| MALE | Asian | 41 |
| MALE | Black or African American | 38 |
| MALE | White | 433 |
The data aren't very interesting, as they simply report how many men and women there are of three ethnicities in the synthetic datasets, but they serve to show how contingency tables can be assembled.
aggregate_tables checks that your tables have the same variables, and sums the counts if they do.
aggregate = aggregate_tables([tre1, tre2])
The contingency_table property organises this data into the format for statistical analyses.
aggregate.contingency_table
| FEMALE | MALE | |
|---|---|---|
| Asian | 1011 | 982 |
| Black or African American | 1080 | 1022 |
| White | 1426 | 1441 |
This format can be used for scipy.stats contingency table functions.
chisq = chi2_contingency(aggregate.contingency_table)
print(f\"The p-value for the chi-squared test is {chisq.pvalue:.3f}\")
The p-value for the chi-squared test is 0.508
Phew, the synthetic data haven't got any surprising imbalances.
This tutorial should show how you can perform federated analyses based on contingency tables of count data. The key requirement for writing your own analyses is writing a SQL query that, like
SELECT g.concept_name AS gender_name, r.concept_name AS race_name
FROM public.person p
JOIN public.concept g ON p.gender_concept_id = g.concept_id
JOIN public.concept r ON p.race_concept_id = r.concept_id;
produces a table of two categorical columns, e.g.
| gender_name | race_name |
|---|---|
| FEMALE | Asian |
| FEMALE | Black or African American |
| FEMALE | White |
| MALE | White |
| FEMALE | Native Hawaiian or Other Pacific Islander |
| MALE | Native Hawaiian or Other Pacific Islander |
| ... | ... |