generated from CDCgov/template
Commit 267ef0c (1 parent: 3182485)
Showing 13 changed files with 1,567 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Use the official Python 3.12 slim image as the base | ||
FROM python:3.12-slim | ||
|
||
# Set the working directory | ||
WORKDIR /app | ||
|
||
# Copy the scripts and data directories into the image | ||
COPY scripts /app/scripts | ||
COPY data /app/data | ||
|
||
# Install Python dependencies | ||
RUN pip install --no-cache-dir pandas requests |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
# Record Linkage Algorithm Testing | ||
|
||
This repository contains a project to test the effectiveness of the RecordLinker algorithm. | ||
|
||
## Prerequisites | ||
|
||
Before getting started, ensure you have the following installed: | ||
|
||
- [Docker](https://docs.docker.com/engine/install/) | ||
- [Docker Compose](https://docs.docker.com/compose/install/) | ||
|
||
## Directory Structure | ||
|
||
- `/`: Contains the `.env` file and the `Dockerfile` used to build the image | ||
- `configurations/`: Contains the configuration file for the algorithm tests | ||
- `data/`: Contains the data `.csv` files used for the algorithm tests (seed file and test file) | ||
- `results/`: Contains the results of the algorithm tests | ||
- `scripts/`: Contains the scripts to run the algorithm tests | ||
|
||
## Setup | ||
|
||
1. Build the Docker images: | ||
|
||
```bash | ||
docker compose --profile algo-test build | ||
``` | ||
|
||
2. Configure environment variables | ||
|
||
`tests/algorithm/algo.env` | ||
|
||
Edit the environment variables in the file | ||
|
||
3. Edit the algorithm configuration file | ||
|
||
`tests/algorithm/configurations/algorithm_configuration.json` | ||
|
||
Edit the configuration file to tune the algorithm parameters | ||
|
||
## Running Algorithm Tests | ||
|
||
1. Run the tests | ||
|
||
```bash | ||
docker compose --profile algo-test run --rm algo-test-runner python scripts/run_test.py | ||
``` | ||
|
||
2. Analyze the results | ||
|
||
The results of the algorithm tests will be available in the `results/` directory. | ||
|
||
The results are written to a CSV file containing the result information for each test case. | ||
|
||
## Environment Variables | ||
|
||
1. `env_file`: The attributes that should be tuned for your particular algorithm test are located in the `algo.env` file. | ||
|
||
2. `environment`: The attributes that should likely remain static for all algorithm tests are located directly in the `compose.yml` file. | ||
|
||
### Algorithm Test Parameters | ||
|
||
The following environment variables can be tuned in the `algo.env` file: | ||
|
||
- `SEED_FILE`: The file containing person data to seed the MPI with | ||
- `TEST_FILE`: The file containing patient data to test the algorithm with | ||
- `ALGORITHM_CONFIGURATION`: The file containing the algorithm configuration JSON | ||
- `ALGORITHM_NAME`: The name of the algorithm to use (either the `label` of your `ALGORITHM_CONFIGURATION` or one of the built-in `dibbs-basic` or `dibbs-enhanced` algorithms) | ||
|
||
|
||
## Cleanup | ||
|
||
After you've finished running algorithm tests and analyzing the results, you can stop and remove the Docker containers by running: | ||
```bash | ||
docker compose --profile algo-test down | ||
``` | ||
## Output | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
SEED_FILE="data/seed_data.csv" | ||
TEST_FILE="data/test_data.csv" | ||
ALGORITHM_CONFIGURATION="configurations/algorithm_configuration.json" | ||
ALGORITHM_NAME="test-config" |
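For reviewers who want to sanity-check an env file like the one above before starting the containers, the following is a minimal sketch in Python. It assumes the file uses plain `KEY="value"` lines; the helper names `parse_env_text` and `missing_keys` are hypothetical and are not part of this commit or of Docker Compose, which does its own parsing.

```python
# Variable names taken from the env file shown in this commit.
REQUIRED_KEYS = ("SEED_FILE", "TEST_FILE", "ALGORITHM_CONFIGURATION", "ALGORITHM_NAME")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments.

    This is an illustrative parser, not a full reimplementation of the
    env-file syntax that Docker Compose accepts.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_keys(settings: dict[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


# Contents copied from the env file in this commit.
EXAMPLE = '''
SEED_FILE="data/seed_data.csv"
TEST_FILE="data/test_data.csv"
ALGORITHM_CONFIGURATION="configurations/algorithm_configuration.json"
ALGORITHM_NAME="test-config"
'''

settings = parse_env_text(EXAMPLE)
print(settings["ALGORITHM_NAME"])  # test-config
print(missing_keys(settings))      # []
```

In practice the text would come from reading the env file on disk instead of the inline `EXAMPLE` string.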
67 changes: 67 additions & 0 deletions
tests/algorithm/configurations/algorithm_configuration.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
{ | ||
"label": "test-config", | ||
"description": "test algorithm configuration", | ||
"is_default": false, | ||
"include_multiple_matches": true, | ||
"belongingness_ratio": [0.75, 0.9], | ||
"passes": [ | ||
{ | ||
"blocking_keys": [ | ||
"BIRTHDATE", | ||
"SEX" | ||
], | ||
"evaluators": [ | ||
{ | ||
"feature": "FIRST_NAME", | ||
"func": "func:recordlinker.linking.matchers.feature_match_fuzzy_string" | ||
}, | ||
{ | ||
"feature": "LAST_NAME", | ||
"func": "func:recordlinker.linking.matchers.feature_match_exact" | ||
} | ||
], | ||
"rule": "func:recordlinker.linking.matchers.eval_perfect_match", | ||
"cluster_ratio": 0.9, | ||
"kwargs": { | ||
"thresholds": { | ||
"FIRST_NAME": 0.9, | ||
"LAST_NAME": 0.9, | ||
"BIRTHDATE": 0.95, | ||
"ADDRESS": 0.9, | ||
"CITY": 0.92, | ||
"ZIP": 0.95 | ||
} | ||
} | ||
}, | ||
{ | ||
"blocking_keys": [ | ||
"ZIP", | ||
"FIRST_NAME", | ||
"LAST_NAME", | ||
"SEX" | ||
], | ||
"evaluators": [ | ||
{ | ||
"feature": "ADDRESS", | ||
"func": "func:recordlinker.linking.matchers.feature_match_fuzzy_string" | ||
}, | ||
{ | ||
"feature": "BIRTHDATE", | ||
"func": "func:recordlinker.linking.matchers.feature_match_exact" | ||
} | ||
], | ||
"rule": "func:recordlinker.linking.matchers.eval_perfect_match", | ||
"cluster_ratio": 0.9, | ||
"kwargs": { | ||
"thresholds": { | ||
"FIRST_NAME": 0.9, | ||
"LAST_NAME": 0.9, | ||
"BIRTHDATE": 0.95, | ||
"ADDRESS": 0.9, | ||
"CITY": 0.92, | ||
"ZIP": 0.95 | ||
} | ||
} | ||
} | ||
] | ||
} |
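A configuration of this shape can be checked for obvious mistakes before a test run. The sketch below is only an illustration: the function `check_config` is hypothetical, and the rules it applies (a two-element `belongingness_ratio` ordered low to high within 0 to 1, non-empty `blocking_keys` and `evaluators`, thresholds within 0 to 1) are assumptions inferred from the values in the file, not RecordLinker's own validation. The embedded JSON is a trimmed copy of the first pass.

```python
import json

# Trimmed copy of the configuration above (first pass only, fewer thresholds).
CONFIG_TEXT = """
{
  "label": "test-config",
  "belongingness_ratio": [0.75, 0.9],
  "passes": [
    {
      "blocking_keys": ["BIRTHDATE", "SEX"],
      "evaluators": [
        {"feature": "FIRST_NAME",
         "func": "func:recordlinker.linking.matchers.feature_match_fuzzy_string"},
        {"feature": "LAST_NAME",
         "func": "func:recordlinker.linking.matchers.feature_match_exact"}
      ],
      "rule": "func:recordlinker.linking.matchers.eval_perfect_match",
      "cluster_ratio": 0.9,
      "kwargs": {"thresholds": {"FIRST_NAME": 0.9, "LAST_NAME": 0.9}}
    }
  ]
}
"""


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means none were found."""
    problems = []

    # Assumed rule: [low, high] with 0 <= low <= high <= 1.
    ratio = config.get("belongingness_ratio", [])
    if len(ratio) != 2 or not (0 <= ratio[0] <= ratio[1] <= 1):
        problems.append("belongingness_ratio must be [low, high] within 0..1")

    for index, step in enumerate(config.get("passes", [])):
        if not step.get("blocking_keys"):
            problems.append(f"pass {index}: blocking_keys is empty")
        if not step.get("evaluators"):
            problems.append(f"pass {index}: evaluators is empty")
        thresholds = step.get("kwargs", {}).get("thresholds", {})
        for feature, value in thresholds.items():
            if not 0 <= value <= 1:
                problems.append(f"pass {index}: threshold for {feature} is outside 0..1")

    return problems


config = json.loads(CONFIG_TEXT)
problems = check_config(config)
print(problems or "no problems found")
```

Returning a list of messages instead of raising on the first failure lets a reviewer see every issue in one run.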