uniprotdb.index file is showing as generic file #887

vineethvintu · 2024-09-17T20:06:38Z

Expected Behavior

I ran the following command:
mmseqs expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124

I created the UniProt database (uniprotdb) with mmseqs createdb, so the uniprotdb.index file was created along with it.
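MMseqs2 commands address a database by the base name shared by all of its files (here uniprotdb, with uniprotdb.index, uniprotdb.dbtype and so on next to it), not by one of those component files. A minimal POSIX shell sketch that strips a trailing .index or .dbtype so the path names the database itself; the helper name db_base is hypothetical and it only handles those two suffixes:

```shell
# Hypothetical helper: turn a path to a component file of an MMseqs2 database
# (e.g. uniprotdb.index or uniprotdb.dbtype) into the database base name.
# Only the .index and .dbtype suffixes are handled; other paths pass through.
db_base() {
  case "$1" in
    *.index)  printf '%s\n' "${1%.index}" ;;
    *.dbtype) printf '%s\n' "${1%.dbtype}" ;;
    *)        printf '%s\n' "$1" ;;
  esac
}
```

For example, db_base ./uniprot/uniprotdb.index prints ./uniprot/uniprotdb, which is the form ./base/qdb already uses in the command above.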

Current Behavior

After running expandaln, I get an error saying that uniprotdb.index has the Generic type:
Input database "./uniprot/uniprotdb.index" has the wrong type (Generic)
Allowed input:

Index
Nucleotide
Profile
Aminoacid

Steps to Reproduce (for bugs)

MMSEQS="$1"
QUERY="$2"
BASE="$4"
DB1="$5"
DB2="$6"
DB3="$7"
USE_ENV="$8"
USE_TEMPLATES="$9"
FILTER="${10}"
TAXONOMY="${11}"
M8OUT="${12}"
EXPAND_EVAL=inf
ALIGN_EVAL=10
DIFF=3000
QSC=-20.0
MAX_ACCEPT=1000000
if [ "${FILTER}" = "1" ]; then
# 0.1 was not used in benchmarks due to POSIX shell bug in line above
EXPAND_EVAL=0.1
ALIGN_EVAL=10
QSC=0.8
MAX_ACCEPT=100000
fi
export MMSEQS_CALL_DEPTH=1
SEARCH_PARAM="--num-iterations 3 --db-load-mode 2 -a --k-score 'seq:96,prof:80' -e 0.1 --max-seqs 10000"
FILTER_PARAM="--filter-min-enable 1000 --diff ${DIFF} --qid 0.0,0.2,0.4,0.6,0.8,1.0 --qsc 0 --max-seq-id 0.95"
EXPAND_PARAM="--expansion-mode 0 -e ${EXPAND_EVAL} --expand-filter-clusters ${FILTER} --max-seq-id 0.95"
mkdir -p "${BASE}"
"${MMSEQS}" createdb "${QUERY}" "${BASE}/qdb"
"${MMSEQS}" search "${BASE}/qdb" "${DB1}" "${BASE}/res" "${BASE}/tmp1" $SEARCH_PARAM
"${MMSEQS}" mvdb "${BASE}/tmp1/latest/profile_1" "${BASE}/prof_res"
"${MMSEQS}" lndb "${BASE}/qdb_h" "${BASE}/prof_res_h"
mmseqs expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124

I got stuck at the command above.

Next, I am going to run:
"${MMSEQS}" align "${BASE}/prof_res" "${DB1}.idx" "${BASE}/res_exp" "${BASE}/res_exp_realign" --db-load-mode 2 -e ${ALIGN_EVAL} --max-accept ${MAX_ACCEPT} --alt-ali 10 -a
"${MMSEQS}" filterresult "${BASE}/qdb" "${DB1}.idx" "${BASE}/res_exp_realign" "${BASE}/res_exp_realign_filter" --db-load-mode 2 --qid 0 --qsc $QSC --diff 0 --max-seq-id 1.0 --filter-min-enable 100
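Both follow-up steps address the target as "${DB1}.idx", presumably the base name of the index written by createindex, rather than a .index file. A sketch of the stuck expandaln step written in the same style, assuming DB1 holds the base name that was given to createdb and that createindex has been run on it. The function name run_expandaln and the DRY_RUN switch are hypothetical, added only so the command line can be inspected before it runs. This only changes the paths; it does not establish whether a plain createdb database is a suitable expansion input (fourth argument) for expandaln.

```shell
# Sketch only. Assumptions: BASE and DB1 are set as in the script above,
# DB1 is a base name such as ./uniprot/uniprotdb (no .index suffix), and
# "mmseqs createindex ${DB1} tmp" has already written ${DB1}.idx.
run_expandaln() {
  set -- "${MMSEQS:-mmseqs}" expandaln \
    "${BASE}/qdb" "${DB1}.idx" "${BASE}/res" "${DB1}.idx" "${BASE}/res_exp" \
    --db-load-mode 2 --expansion-mode 0 -e inf \
    --expand-filter-clusters 0 --max-seq-id 0.95
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # print the command line instead of executing it
  else
    "$@"                 # run mmseqs with the arguments exactly as listed
  fi
}
```

With DRY_RUN=1 the function only prints the command; unset it to execute.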

MMseqs Output (for bugs)

$ time mmseqs expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124
expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124

MMseqs Version: GITDIR-NOTFOUND
Expansion mode 0
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Max sequence length 65535
Score bias 0
Compositional bias 1
Compositional bias 1
E-value threshold inf
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Pseudo count mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Expand filter clusters 0
Use filter only at N seqs 0
Maximum seq. id. threshold 0.95
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Preload mode 2
Compressed 0
Threads 124
Verbosity 3

Input database "./uniprot/uniprotdb.index" has the wrong type (Generic)
Allowed input:

Index
Nucleotide
Profile
Aminoacid

Context

I am trying to get the MMseqs2 output in MSA format so that I can feed it to AlphaFold to predict the protein structure.

Your Environment


Git commit used (The string after "MMseqs Version:" when you execute MMseqs without any parameters):
MMseqs2 (Many against Many sequence searching) is an open-source software suite for very fast,
parallelized protein sequence searches and clustering of huge protein sequence data sets.

Please cite: M. Steinegger and J. Soding. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, doi:10.1038/nbt.3988 (2017).

MMseqs2 Version: GITDIR-NOTFOUND
© Martin Steinegger

usage: mmseqs <command> [<args>]

Easy workflows for plain text input/output
easy-search Sensitive homology search
easy-cluster Slower, sensitive clustering
easy-linclust Fast linear time cluster, less sensitive clustering
easy-taxonomy Taxonomic classification
easy-rbh Find reciprocal best hit

Main workflows for database input/output
search Sensitive homology search
map Map nearly identical sequences
rbh Reciprocal best hit search
linclust Fast, less sensitive clustering
cluster Slower, sensitive clustering
clusterupdate Update previous clustering with new sequences
taxonomy Taxonomic classification

Input database creation
databases List and download databases
createdb Convert FASTA/Q file(s) to a sequence DB
createindex Store precomputed index on disk to reduce search overhead
convertmsa Convert Stockholm/PFAM MSA file to a MSA DB
msa2profile Convert a MSA DB to a profile DB

Format conversion for downstream processing
convertalis Convert alignment DB to BLAST-tab, SAM or custom format
createtsv Convert result DB to tab-separated flat file
convert2fasta Convert sequence DB to FASTA format
taxonomyreport Create a taxonomy report in Kraken or Krona format

An extended list of all modules can be obtained by calling 'mmseqs -h'.

Bash completion for modules and parameters can be installed by adding "source MMSEQS_HOME/util/bash-completion.sh" to your "$HOME/.bash_profile".
Include the location of the MMseqs2 binary in your "$PATH" environment variable.

Which MMseqs version was used (Statically-compiled, self-compiled, Homebrew, etc.):
$ which mmseqs
~/MMseqs2-71dd32ec43e3ac4dabf111bbc4b124f1c66a85f1/build/bin/mmseqs
For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
Operating system and version:
macOS 15

vineethvintu · 2024-09-17T21:05:21Z

Update: I have now followed this part of the MMseqs2 documentation:
For the next step, an index file of the targetDB is computed for a fast read-in. It is recommended
to compute the index if the targetDB is reused for several searches. If only few searches against this
database will be done, this step should be skipped.
mmseqs createindex targetDB tmp
This call will create a targetDB.idx file. It is just possible to have one index per database.
Then generate a directory for temporary files. MMseqs2 can produce a high IO on the file system.
It is recommended to create this temporary folder on a local drive.
After running it, I got the following files:
tmp
uniprotdb
uniprotdb.dbtype
uniprotdb_h
uniprotdb_h.dbtype
uniprotdb_h.index
uniprotdb.idx.0
uniprotdb.idx.1
uniprotdb.idx.2
uniprotdb.idx.3
uniprotdb.idx.4
uniprotdb.idx.dbtype
uniprotdb.idx.index
uniprotdb.index
uniprotdb.lookup
uniprotdb.source

So now I am confused: which idx file should I pass?

mmseqs expandaln ./base/qdb ./uniprot/uniprotdb.index ./base/res ./uniprot/uniprotdb.index ./base/res_exp --db-load-mode 2 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 --threads 124
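Going by the listing, createindex appears to have written a second database named uniprotdb.idx (data in uniprotdb.idx.0 to uniprotdb.idx.4, plus uniprotdb.idx.index and uniprotdb.idx.dbtype), and the error message lists Index among the allowed input types. A hypothetical helper that picks that index by its base name when its .dbtype file exists, and otherwise falls back to the plain database, assuming both are addressed by base name and never by a .index or .N file:

```shell
# Hypothetical helper: print the path to pass to mmseqs for a target database.
# Prefers the precomputed index "<db>.idx" written by createindex, otherwise
# the database itself; both are detected through their .dbtype files.
pick_target() {
  if [ -f "$1.idx.dbtype" ]; then
    printf '%s\n' "$1.idx"
  elif [ -f "$1.dbtype" ]; then
    printf '%s\n' "$1"
  else
    echo "pick_target: neither $1.idx.dbtype nor $1.dbtype exists" >&2
    return 1
  fi
}
```

Usage sketch: TARGET=$(pick_target ./uniprot/uniprotdb) and then pass "$TARGET" in place of ./uniprot/uniprotdb.index in the expandaln command above.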
