feat: initial commit
cbrinson-rise8 committed Nov 21, 2024
1 parent 3182485 commit 267ef0c
Showing 13 changed files with 1,567 additions and 0 deletions.
22 changes: 22 additions & 0 deletions compose.yml
@@ -76,3 +76,25 @@ services:
```yaml
    depends_on:
      api:
        condition: service_healthy

  algo-test-runner:
    build:
      context: tests/algorithm
      dockerfile: Dockerfile.algo
    env_file:
      - tests/algorithm/algo.env
    environment:
      DB_URI: "postgresql+psycopg2://postgres:pw@db:5432/postgres"
      API_URL: "http://api:8080"
    volumes:
      - ./tests/algorithm/scripts:/app/scripts
      - ./tests/algorithm/data:/app/data
      - ./tests/algorithm/results:/app/results
      - ./tests/algorithm/configurations:/app/configurations
    depends_on:
      db:
        condition: service_healthy
      api:
        condition: service_healthy
    profiles:
      - algo-test
```
12 changes: 12 additions & 0 deletions tests/algorithm/Dockerfile.algo
@@ -0,0 +1,12 @@
```dockerfile
# Use the official Python 3.12 slim image as the base
FROM python:3.12-slim

# Set the working directory
WORKDIR /app

# Copy the scripts and data directories into the image
COPY scripts /app/scripts
COPY data /app/data

# Install Python dependencies
RUN pip install --no-cache-dir pandas requests
```
80 changes: 80 additions & 0 deletions tests/algorithm/README.md
@@ -0,0 +1,80 @@
# Record Linkage Algorithm Testing

This directory contains a project for testing the effectiveness of the RecordLinker algorithm.

## Prerequisites

Before getting started, ensure you have the following installed:

- [Docker](https://docs.docker.com/engine/install/)
- [Docker Compose](https://docs.docker.com/compose/install/)

## Directory Structure

- `/`: Contains the `algo.env` file and the `Dockerfile.algo` used to build the test-runner image
- `configurations/`: Contains the configuration file for the algorithm tests
- `data/`: Contains the data `.csv` files used for the algorithm tests (seed file and test file)
- `results/`: Contains the results of the algorithm tests
- `scripts/`: Contains the scripts to run the algorithm tests

## Setup

1. Build the Docker images:

```bash
docker compose --profile algo-test build
```

2. Configure the environment variables:

`tests/algorithm/algo.env`

Set the test parameters described under [Algorithm Test Parameters](#algorithm-test-parameters).

3. Edit the algorithm configuration file:

`tests/algorithm/configurations/algorithm_configuration.json`

Adjust the passes, blocking keys, evaluators, and thresholds in this file to tune the algorithm.
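Thresholds can also be adjusted programmatically between runs. The following is a minimal sketch, assuming the JSON layout shown in `algorithm_configuration.json` (a `passes` list whose entries carry `kwargs.thresholds`); `set_threshold` and `update_config_file` are hypothetical helpers, not part of RecordLinker.

```python
import json
from pathlib import Path


def set_threshold(config: dict, feature: str, value: float) -> dict:
    """Hypothetical helper: set `feature`'s threshold in every pass of the config."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {value}")
    for algorithm_pass in config.get("passes", []):
        # Create kwargs/thresholds if a pass does not define them yet.
        thresholds = algorithm_pass.setdefault("kwargs", {}).setdefault("thresholds", {})
        thresholds[feature] = value
    return config


def update_config_file(path: str, feature: str, value: float) -> None:
    """Load the JSON config, update one threshold, and write it back in place."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    set_threshold(config, feature, value)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
```

For example, `update_config_file("tests/algorithm/configurations/algorithm_configuration.json", "FIRST_NAME", 0.85)` lowers the `FIRST_NAME` threshold in both passes. Note that rewriting the file normalizes its indentation to two spaces.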

## Running Algorithm Tests

1. Run the tests

```bash
docker compose --profile algo-test run --rm algo-test-runner python scripts/run_test.py
```

2. Analyze the results

The results of the algorithm tests will be available in the `results/` directory.

Each results file is a CSV file containing the result information for each test case.
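To inspect a run quickly, a sketch like the following loads the most recently written CSV in `results/` using only the standard library. It makes no assumptions about the column layout, which this commit does not document; `latest_results` is a hypothetical helper.

```python
import csv
from pathlib import Path


def latest_results(results_dir: str = "tests/algorithm/results") -> list[dict[str, str]]:
    """Return the rows of the most recently modified CSV file in `results_dir`."""
    csv_files = sorted(Path(results_dir).glob("*.csv"), key=lambda p: p.stat().st_mtime)
    if not csv_files:
        raise FileNotFoundError(f"no CSV files found in {results_dir}")
    # DictReader keys each row by the header line, whatever columns it contains.
    with csv_files[-1].open(newline="") as handle:
        return list(csv.DictReader(handle))
```

Usage: `rows = latest_results()` followed by `print(len(rows), list(rows[0]) if rows else [])` shows how many test cases were recorded and which columns the results file provides.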

## Environment Variables

1. `env_file`: Attributes that should be tuned for your particular algorithm test are located in the `algo.env` file.

2. `environment`: Attributes that will likely remain static across all algorithm tests (`DB_URI` and `API_URL`) are set directly in the `compose.yml` file.
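Docker Compose gives values under `environment` precedence over values loaded from `env_file`, so a `DB_URI` added to `algo.env` would be ignored in favor of the one in `compose.yml`. The sketch below models that precedence with a deliberately simplified `KEY=VALUE` parser; real Compose parsing also handles interpolation and escapes, and both helper names are hypothetical.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Simplified KEY=VALUE parser: skips blanks/comments and strips matching quotes."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Remove one pair of surrounding single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def effective_environment(env_file_text: str, environment: dict[str, str]) -> dict[str, str]:
    """Merge both sources; keys under `environment` override those from `env_file`."""
    return {**parse_env_file(env_file_text), **environment}
```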

### Algorithm Test Parameters

The following environment variables can be tuned in the `algo.env` file:

- `SEED_FILE`: The file containing person data to seed the MPI with
- `TEST_FILE`: The file containing patient data to test the algorithm with
- `ALGORITHM_CONFIGURATION`: The file containing the algorithm configuration JSON
- `ALGORITHM_NAME`: The name of the algorithm to use: either the `label` defined in your `ALGORITHM_CONFIGURATION` file, or one of the built-in `dibbs-basic` or `dibbs-enhanced` algorithms
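Inside the container these parameters arrive as ordinary environment variables, so a test script can read them with `os.environ`. The sketch below is illustrative only and is not the contents of `scripts/run_test.py`; `TestSettings`, `load_settings`, and `check_algorithm_name` are hypothetical. It also checks that a custom `ALGORITHM_NAME` equals the `label` in the configuration file, which is how `algo.env` and `algorithm_configuration.json` pair up in this commit (`test-config`).

```python
import json
import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path

# Built-in algorithm names listed in this README.
BUILT_IN_ALGORITHMS = {"dibbs-basic", "dibbs-enhanced"}


@dataclass(frozen=True)
class TestSettings:
    """Hypothetical container for the variables set by algo.env and compose.yml."""

    seed_file: Path
    test_file: Path
    algorithm_configuration: Path
    algorithm_name: str
    db_uri: str
    api_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> TestSettings:
    """Read the test parameters; a missing variable raises KeyError."""
    return TestSettings(
        seed_file=Path(env["SEED_FILE"]),
        test_file=Path(env["TEST_FILE"]),
        algorithm_configuration=Path(env["ALGORITHM_CONFIGURATION"]),
        algorithm_name=env["ALGORITHM_NAME"],
        db_uri=env["DB_URI"],
        api_url=env["API_URL"],
    )


def check_algorithm_name(settings: TestSettings) -> None:
    """For a custom algorithm, require ALGORITHM_NAME to equal the config's `label`."""
    if settings.algorithm_name in BUILT_IN_ALGORITHMS:
        return  # built-in algorithms need no local configuration file
    config = json.loads(settings.algorithm_configuration.read_text())
    if config.get("label") != settings.algorithm_name:
        raise ValueError(
            f"ALGORITHM_NAME={settings.algorithm_name!r} does not match "
            f"label {config.get('label')!r} in {settings.algorithm_configuration}"
        )
```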


## Cleanup

After you've finished running algorithm tests and analyzing the results, you can stop and remove the Docker containers by running:
```bash
docker compose --profile algo-test down
```
## Output
4 changes: 4 additions & 0 deletions tests/algorithm/algo.env
@@ -0,0 +1,4 @@
```sh
SEED_FILE="data/seed_data.csv"
TEST_FILE="data/test_data.csv"
ALGORITHM_CONFIGURATION="configurations/algorithm_configuration.json"
ALGORITHM_NAME="test-config"
```
67 changes: 67 additions & 0 deletions tests/algorithm/configurations/algorithm_configuration.json
@@ -0,0 +1,67 @@
```json
{
  "label": "test-config",
  "description": "test algorithm configuration",
  "is_default": false,
  "include_multiple_matches": true,
  "belongingness_ratio": [0.75, 0.9],
  "passes": [
    {
      "blocking_keys": [
        "BIRTHDATE",
        "SEX"
      ],
      "evaluators": [
        {
          "feature": "FIRST_NAME",
          "func": "func:recordlinker.linking.matchers.feature_match_fuzzy_string"
        },
        {
          "feature": "LAST_NAME",
          "func": "func:recordlinker.linking.matchers.feature_match_exact"
        }
      ],
      "rule": "func:recordlinker.linking.matchers.eval_perfect_match",
      "cluster_ratio": 0.9,
      "kwargs": {
        "thresholds": {
          "FIRST_NAME": 0.9,
          "LAST_NAME": 0.9,
          "BIRTHDATE": 0.95,
          "ADDRESS": 0.9,
          "CITY": 0.92,
          "ZIP": 0.95
        }
      }
    },
    {
      "blocking_keys": [
        "ZIP",
        "FIRST_NAME",
        "LAST_NAME",
        "SEX"
      ],
      "evaluators": [
        {
          "feature": "ADDRESS",
          "func": "func:recordlinker.linking.matchers.feature_match_fuzzy_string"
        },
        {
          "feature": "BIRTHDATE",
          "func": "func:recordlinker.linking.matchers.feature_match_exact"
        }
      ],
      "rule": "func:recordlinker.linking.matchers.eval_perfect_match",
      "cluster_ratio": 0.9,
      "kwargs": {
        "thresholds": {
          "FIRST_NAME": 0.9,
          "LAST_NAME": 0.9,
          "BIRTHDATE": 0.95,
          "ADDRESS": 0.9,
          "CITY": 0.92,
          "ZIP": 0.95
        }
      }
    }
  ]
}
```
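Conceptually, each pass uses its `blocking_keys` to narrow the candidate set before its `evaluators` run: only records that agree with the incoming record on every blocking key are compared. The sketch below illustrates that general blocking technique on plain dictionaries. It is a simplified illustration, not RecordLinker's implementation (which queries the MPI database and may derive normalized or truncated blocking values rather than using raw field values), and the helper names and sample records are hypothetical.

```python
from typing import Optional

# Blocking keys for the two passes in algorithm_configuration.json.
PASS_1_KEYS = ["BIRTHDATE", "SEX"]
PASS_2_KEYS = ["ZIP", "FIRST_NAME", "LAST_NAME", "SEX"]


def blocking_value(record: dict, keys: list[str]) -> Optional[tuple]:
    """Tuple of the record's values for `keys`, or None if any value is missing."""
    values = tuple(record.get(key) for key in keys)
    return None if any(v in (None, "") for v in values) else values


def block_candidates(incoming: dict, existing: list[dict], keys: list[str]) -> list[dict]:
    """Existing records that agree with `incoming` on every blocking key."""
    target = blocking_value(incoming, keys)
    if target is None:
        # Simplifying assumption: a record missing a blocking field yields no candidates.
        return []
    return [record for record in existing if blocking_value(record, keys) == target]


# Hypothetical sample records standing in for the MPI.
mpi = [
    {"FIRST_NAME": "John", "LAST_NAME": "Shepard", "BIRTHDATE": "2053-11-07", "SEX": "M"},
    {"FIRST_NAME": "Jon", "LAST_NAME": "Shepard", "BIRTHDATE": "2053-11-07", "SEX": "M"},
    {"FIRST_NAME": "Jane", "LAST_NAME": "Doe", "BIRTHDATE": "1990-01-01", "SEX": "F"},
]
incoming = {"FIRST_NAME": "Johnny", "LAST_NAME": "Shepard", "BIRTHDATE": "2053-11-07", "SEX": "M"}

print(len(block_candidates(incoming, mpi, PASS_1_KEYS)))  # 2
print(len(block_candidates(incoming, mpi, PASS_2_KEYS)))  # 0: incoming has no ZIP
```

Only the candidates that survive blocking would then be scored by the pass's evaluators (fuzzy `FIRST_NAME` and exact `LAST_NAME` in the first pass), which keeps the number of comparisons far below comparing every pair of records.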