Error: createbintaxonomy failed with malloc error #871

Open
zrqiao opened this issue Aug 11, 2024 · 6 comments

Comments

@zrqiao

zrqiao commented Aug 11, 2024

Expected Behavior

Taxonomy database created based on a seqdb created from UniProt sequences

Current Behavior

The program crashes with a core dump and reports Error: createbintaxonomy failed.

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

mmseqs createdb "uniprot_2024_03.fasta" seqdb

then

mmseqs createtaxdb seqdb tmp 

We attempted to vary --tax-db-mode, --tax-mapping-mode, and --threads parameters but observed the same behavior. Any help would be highly appreciated.

We are able to reproduce this issue with a minimal database containing 1000 sequences.
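A minimal sketch of how such a 1000-sequence subset could be cut from the full FASTA file for reproduction. The helper name fasta_head and the output file names are hypothetical and not part of MMseqs2; the sketch only assumes that each record starts with a ">" header line.

```shell
# Print the first N records of a FASTA file.
# A record is a '>' header line plus the sequence lines that follow it.
# fasta_head is a hypothetical helper, not an MMseqs2 command.
fasta_head() {
    n="$1"
    fasta="$2"
    # Count header lines; stop as soon as record N+1 begins.
    awk -v n="$n" '/^>/ { seen++ } seen > n { exit } { print }' "$fasta"
}

# Example usage (file names are placeholders):
#   fasta_head 1000 uniprot_2024_03.fasta > subset.fasta
#   mmseqs createdb subset.fasta seqdb_small
#   mmseqs createtaxdb seqdb_small tmp
```

Stopping at the first header past N avoids reading the rest of a large UniProt FASTA file.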

MMseqs Output (for bugs)

> mmseqs createtaxdb seqdb tmp 
createtaxdb seqdb tmp 

MMseqs Version:         15.6f452
NCBI tax dump directory
Taxonomy mapping file  
Taxonomy mapping mode   0
Taxonomy db mode        1
Threads                 48
Verbosity               3

Loading nodes file ... Done, got 2601214 nodes
Loading merged file ... Done, added 79743 merged nodes.
Loading names file ... Done
mmseqs: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)
Error: createbintaxonomy failed

Context

We are trying to create a custom taxonomy database for MSA, such that the resulting .a3m files contain taxonomy information.

Is a taxonomy database already available for download for uniprot_2024_03 or similar releases?

Your Environment

Linux 64-bit, 256G memory
MMseqs Version: 15.6f452

@ahof1704

Having the same issue. Any insights on how to address this, please?

@piehld

piehld commented Aug 12, 2024

I was also getting this error earlier, and interestingly only on Linux (it seemed to work fine on macOS for me).

Strangely, it has been working again for me since about 10 minutes ago, despite my setup being exactly the same as when I was getting the error.

MMseqs2 Version: 45111b641859ed0ddd875b94d6fd1aef1a675b7e

@milot-mirdita
Member

Does this happen with the databases download of UniProt, or only if you call createtaxdb manually?

The databases workflow goes through a separate branch to extract taxonomic information from UniProt-based databases and should not be affected.

@zrqiao
Author

zrqiao commented Aug 13, 2024

> Does this happen with the databases download of the uniprot or only if you call createtaxdb manually?
>
> databases goes through a separate branch to extract taxonomic information from uniprot based databases and should not be affected.

Thanks for supporting us! This happens when calling createtaxdb manually.

Would you please elaborate on what databases download entails in this context?

To zoom out a bit: is there a feasible mmseqs2 command to generate .a3m files with correct UniRef100 taxonomy identifiers without going through this custom database setup procedure?

@milot-mirdita
Member

mmseqs databases UniProtKB uniprot tmp

should download the latest uniprot and set it up correctly for use with MMseqs2 including taxonomy information.

@zrqiao
Author

zrqiao commented Aug 16, 2024

> mmseqs databases UniProtKB uniprot tmp
>
> should download the latest uniprot and set it up correctly for use with MMseqs2 including taxonomy information.

Thanks for this. We ran this command and obtained the main database files, including uniprot_h, uniprot.index, etc. However, we still need some help understanding the next steps for assigning taxonomy IDs to alignments. Following sokrypton/ColabFold#216, here is what we tried:

mmseqs convertalis test/qdb uniprot test/res_exp test/res_exp_realign.m8 --format-output query,target,taxid,taxname,taxlineage,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,cigar

and it raised the following error:

Loading NCBI taxonomy
names.dmp, nodes.dmp, merged.dmp from NCBI taxdump could not be found!

Is there something that we are missing here? Thanks!
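One way to narrow this down might be to check which taxonomy side files actually sit next to the database prefix passed to convertalis. The sketch below is only a diagnostic aid under stated assumptions: the helper name check_taxdb_files is hypothetical, and the suffix list (_mapping, _taxonomy, _nodes.dmp, _names.dmp, _merged.dmp) is an assumption about the files MMseqs2 writes for a taxonomy-annotated database. It may differ between versions, and not every file is necessarily required.

```shell
# Report which taxonomy side files exist next to a database prefix.
# check_taxdb_files is a hypothetical helper, not an MMseqs2 command.
# ASSUMPTION: the suffix list reflects the usual layout of an MMseqs2
# taxonomy-annotated database and may vary between versions.
check_taxdb_files() {
    db="$1"
    for suffix in _mapping _taxonomy _nodes.dmp _names.dmp _merged.dmp; do
        if [ -e "${db}${suffix}" ]; then
            echo "found    ${db}${suffix}"
        else
            echo "missing  ${db}${suffix}"
        fi
    done
}

# Example usage, with the prefix used in the convertalis call above:
#   check_taxdb_files uniprot
```

If every file is reported missing, it may also be worth confirming that the prefix given to convertalis is the same path that mmseqs databases wrote to.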


4 participants