This repo contains the Jupyter Notebooks used to obtain the results presented in the paper "Roughness of Molecular Property Landscapes and Its Impact on Modellability".
In addition to python>=3.7 and common scientific libraries (e.g., scipy, numpy, pandas, matplotlib), the following packages are required to run the notebooks:
rogi
scikit-learn
PyTDC (with all dependencies; note that some are not automatically installed by pip, e.g., requests and networkx)
rdkit
Full details of the Python environment used are in the environment.yml file.
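Before opening the notebooks, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch, not part of the repo: the helper name `missing_packages` is hypothetical, and the import names (`sklearn` for scikit-learn, `tdc` for PyTDC) are assumptions based on each package's usual convention.

```python
from importlib.util import find_spec

# Distribution name (as listed above) -> top-level import name.
# The import names are assumptions based on each package's usual convention.
REQUIRED = {
    "rogi": "rogi",
    "scikit-learn": "sklearn",
    "PyTDC": "tdc",
    "rdkit": "rdkit",
    "requests": "requests",
    "networkx": "networkx",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose import name cannot be found."""
    return [dist for dist, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that each package can be located; it does not check versions. For an exact match to the environment used, refer to environment.yml.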
This is a description of the folders and files to help you navigate the repo.
data is generated by PyTDC and contains the downloaded TDC datasets
oracle is generated by PyTDC and contains a pickle file used by the package
chembl_datasets contains the ChEMBL datasets provided in the SI of the paper "Exposing the limitations of molecular machine learning with activity cliffs"
plots contains the plots for all results in the paper
landscapes contains 2D and 3D visualizations of the property landscapes
toy-examples: results related to the analytical function tests (Figures 1--3).
regression: results for all three sets of datasets (ZINC+GuacaMol, TDC, ChEMBL) related to all regression tasks, for ROGI, RMODI, and SARI (Figure 4, Table 1, and related SI Figures).
compute_sari: computes SARI scores for all datasets (the output of this notebook is the file regression_sari_scores.csv, which is used by the regression notebook for plotting).
classification: results related to all classification tasks, for ROGI and MODI (Figure 5, Table 2, and related SI Figures).
binarized_regression: results for the additional classification tasks based on the binarization of the regression datasets (SI Figure 11).
convergence: results testing the convergence of ROGI with datasets of increasing size (SI Figures 13--16).
landscape_viz: generates 2D and 3D visualizations of the property landscapes.
regression_results.pkl: pickle file storing the results obtained in the notebook regression.ipynb. In the regression notebook, there is a cell to load this pickle file rather than re-running all experiments.
classification_results.pkl: pickle file storing the results obtained in the notebook classification.ipynb.
binarized_regression_results.pkl: pickle file storing the results obtained in the notebook binarized_regression.ipynb.
convergence_results.pkl: pickle file storing the results obtained in the notebook convergence.ipynb.
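The load-instead-of-recompute pattern mentioned for the results files can be sketched as follows. This is an illustration only, not the code in the notebooks: the helper `load_cached_results` and its `recompute` argument are hypothetical, and the structure of the stored results objects is not described here, so the sketch treats them as opaque Python objects.

```python
import pickle
from pathlib import Path


def load_cached_results(path, recompute=None):
    """Load results from a pickle file if it exists.

    If the file is missing and a `recompute` callable is given, call it,
    cache its return value at `path`, and return it.
    """
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    if recompute is None:
        raise FileNotFoundError(f"{path} not found and no recompute function given")
    results = recompute()
    with path.open("wb") as f:
        pickle.dump(results, f)
    return results
```

For example, `results = load_cached_results("regression_results.pkl")` would load the stored regression results when run from the repo root. Pickle files can execute arbitrary code when loaded, so only load files from a source you trust.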