variables are collinear #41
Actually, these relative abundances are correct; I forgot that they are scaled to sum to 1e6. I think this is real collinearity, caused by the presence of synthetic clades (e.g. "Bacteria") where a parent node has only one child, or has two children but one is dominant. Two TODOs:
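To illustrate the kind of collinearity described above, here is a minimal base-R sketch. The abundance values and the column names (`parent_clade`, `child_clade`, `other_clade`) are made up for illustration and do not come from the package's test data. When a parent clade has a single child, the two columns are identical, so the feature matrix loses rank.

```r
# Toy abundances (made-up values) for five samples.
child <- c(200, 500, 300, 700, 100)
other <- c(50, 10, 40, 20, 90)

# The parent has exactly one child, so its abundance equals the child's.
abund <- cbind(
  parent_clade = child,
  child_clade  = child,
  other_clade  = other
)

# A numerical rank lower than the number of columns means the columns
# are linearly dependent, i.e. collinear.
rank_abund   <- qr(abund)$rank
is_collinear <- rank_abund < ncol(abund)

# Columns that exactly duplicate an earlier column.
redundant <- colnames(abund)[as.vector(duplicated(abund, MARGIN = 2))]
```

The case of two children with one dominant would presumably give near-identical rather than identical columns, which an exact-duplicate check like this would not catch; a correlation threshold might be needed there.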
It seems like you are solving this with the |
I also want to clarify points 1 & 3: if the terminal nodes of the input data are a mix of strain, species, and genus, for example, what would the recommendation be?
One of the features of lefse-conda is determining which clades are differentially abundant. I think the abundance of synthetic clades (e.g., Bacteria) is necessary for the cladogram if the internal nodes are of interest. Otherwise, the cladogram could just depict which taxa in the terminal nodes are differentially abundant. If the taxonomy is included in the rowData instead of the rownames, and we restrict (or recommend restricting) the input to terminal nodes at the same taxonomic level, we could use the mia package:
library(mia)
data("GlobalPatterns")
l <- splitByRanks(GlobalPatterns)
l$features <- GlobalPatterns
mergedSE <- mergeSEs(l, collapse.cols = TRUE)
## Instead of merging, the lefse analysis could be run at all taxonomic levels individually
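For the case where the taxonomy is still in the rownames, restricting the input to terminal nodes could be sketched in base R as below. This assumes pipe-delimited lineage strings (e.g. `k__Bacteria|p__Firmicutes`); `is_terminal()` is a hypothetical helper written for this example, not a function from lefser or mia.

```r
# Hypothetical helper: a name is terminal if no other name extends it,
# i.e. it is not an ancestor (prefix plus separator) of another entry.
is_terminal <- function(taxa, sep = "|") {
  vapply(
    taxa,
    function(x) !any(startsWith(taxa, paste0(x, sep))),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Made-up lineage strings for illustration.
taxa <- c(
  "k__Bacteria",
  "k__Bacteria|p__Firmicutes",
  "k__Bacteria|p__Firmicutes|c__Bacilli",
  "k__Bacteria|p__Proteobacteria"
)

terminal_taxa <- taxa[is_terminal(taxa)]
```

Note that the terminal nodes kept here can sit at different ranks (a class and a phylum in this toy input), so this alone does not address the mixed-rank question raised above.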
This is a new warning being generated by the tests, and I don't think there should actually be collinear variables here, so I'm not sure what's happening.
Created on 2024-06-15 with reprex v2.1.0