This GitHub repo includes mario-py and mario-R. MARIO is a Python package for matching and integrating multi-modal single-cell data with partially overlapping features. The method is specifically tailored toward proteomic datasets. For a detailed description of the algorithm, including the core methodology, mathematical ingredients, applications to various biological samples, and extensive benchmarking, please refer to the paper.
This work was led by Shuxiao Chen from Zongming Lab @Upenn and Bokai Zhu from Nolan lab @Stanford.
For ease of use, we suggest building a conda virtual environment with python = 3.8.
conda create -n mario python=3.8
To install MARIO, use pip (package name pyMARIO):
python -m pip install pyMARIO
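After installing, it can help to confirm that the package is visible from the active environment before running anything heavier. The snippet below is a minimal sketch using only the Python standard library; the helper name is_installed is hypothetical and not part of MARIO, and the module name mario is taken from the import statement used in this README.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if is_installed("mario"):
    print("mario is importable")
else:
    print("mario not found; check that the conda environment is active")
```

If the second message appears, the usual cause is that pip installed into a different environment than the one currently active.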
To use MARIO in Python:
from mario.match import pipelined_mario
final_matching_lst, embedding_lst = pipelined_mario(data_lst=[df1, df2])
Here df1 and df2 are the two dataframes to match and integrate, with rows as cells and columns as features. For shared features, the column names must be identical across dataframes. The input list can contain more than two dataframes, as MARIO accommodates matching and integration of multiple datasets.
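Because shared features are identified by identical column names, it can be useful to check which columns the input dataframes have in common before calling pipelined_mario. The following is a minimal sketch with pandas; the helper shared_features and the toy marker names and values are hypothetical illustrations and not part of MARIO.

```python
import pandas as pd


def shared_features(data_lst):
    """Return the column names present in every dataframe, in the order of the first."""
    common = set(data_lst[0].columns)
    for df in data_lst[1:]:
        common &= set(df.columns)
    return [col for col in data_lst[0].columns if col in common]


# Toy inputs: rows are cells, columns are features (made-up values).
df1 = pd.DataFrame(
    {"CD3": [1.2, 0.1, 0.8], "CD4": [0.9, 0.0, 0.7], "CD19": [0.0, 1.5, 0.1]},
    index=["cell_a1", "cell_a2", "cell_a3"],
)
df2 = pd.DataFrame(
    {"CD3": [1.0, 0.2], "CD4": [0.8, 0.1], "CD8": [0.0, 1.1]},
    index=["cell_b1", "cell_b2"],
)

shared = shared_features([df1, df2])
print(shared)  # ['CD3', 'CD4']
```

An empty result usually points to a naming mismatch (for example different capitalization of the same marker), which is worth fixing before matching.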
The result contains a matching list (matching) and an embedding list (integration). For detailed usage, please refer to the Full tutorial section.
Similarly, to use MARIO in R (with the reticulate package):
library(reticulate)
myenvs <- conda_list() # list the available conda environments
envname <- myenvs$name[12] # select the environment created for MARIO; 12 is an example index, use the row of your own environment
use_condaenv(envname, required = TRUE)
mario.match <- import("mario.match") # import main mario-py module
pipelined_res = mario.match$pipelined_mario(data_lst=list(df1, df2))
Here the result also contains the matching list and the embedding list.
For step-by-step tutorials on how to use MARIO, with fine-tuned parameters for optimal results and full functionality, please refer to the documents provided here:
Python - Jupyter notebook: Match and Integration of Human Bonemarrow datasets
Python - Jupyter notebook: Match and Integration of multiple Xspecies datasets
R - Rmarkdown: Match and Integration of Human Bonemarrow datasets
MARIO is released under the Academic Software License Agreement; please use it accordingly.