diamond

Accelerated BLAST compatible local sequence aligner.

GPL-3.0 License

diamond - DIAMOND v2.1.9 Latest Release

Published by bbuchfink 9 months ago

  • Corrected the prefix of the query length field for the SAM format.
  • Added the size modifiers 'T', 'M' and 'K' for the --memory-limit/-M option.
  • Added the option --mutual-cover to cluster sequences by mutual coverage percentage of the cluster representative and member sequence.
  • Added the option --symmetric for computing greedy vertex cover with symmetric edges.
  • Fixed an issue that caused the --approx-id option and the approx_pident output field not to work correctly when using the --anchored-swipe option.
  • Added the option --no-reassign to prevent reassignment to closest representative for the greedy vertex cover and clustering workflows.
  • Added the option --connected-component-depth to activate clustering of connected components at a given maximum depth for the greedy vertex cover and the clustering workflows.
  • Fixed a compiler error for Clang v17.
  • Improved search performance when searching with mutual coverage threshold by filtering for sequence length ratio.
  • Added the sensitivity mode --shapes-30x10 with sensitivity approximately equivalent to --mid-sensitive.
  • Added the options --round-coverage and --round-approx-id to set per round cutoffs for cascaded clustering.
  • The CMake switch -DKEEP_TARGET_ID is now obsolete and the corresponding function is always available.
  • Added the option --include-lineage to the taxonomic classification format to include taxonomic lineage in the output.
  • Added native support for the ARM NEON instruction set (contributed by @althonos).
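
The mutual coverage criterion and the length-ratio filter mentioned in this release can be illustrated with a small sketch. This is not DIAMOND's implementation: the function names, the 1-based inclusive coordinates, and the reading of "mutual coverage" as "both the representative and the member must reach the threshold" are assumptions made for illustration. The length-ratio check additionally assumes that an alignment spans roughly the same number of residues on both sequences, so a pair whose lengths differ too much can be skipped before aligning.

```python
def coverage_percent(start, end, length):
    """Percent of a sequence covered by the 1-based inclusive range [start, end]."""
    return 100.0 * (end - start + 1) / length


def passes_mutual_cover(rep_range, rep_len, mem_range, mem_len, threshold):
    """True if both representative and member coverage reach `threshold` percent."""
    rep_cov = coverage_percent(rep_range[0], rep_range[1], rep_len)
    mem_cov = coverage_percent(mem_range[0], mem_range[1], mem_len)
    return min(rep_cov, mem_cov) >= threshold


def length_ratio_can_pass(len_a, len_b, threshold):
    """Cheap pre-filter (hypothetical): the shorter/longer length ratio bounds
    the coverage the longer sequence can reach under the stated assumption."""
    shorter, longer = sorted((len_a, len_b))
    return 100.0 * shorter / longer >= threshold
```

With a threshold of 80, a pair of lengths 100 and 200 fails the pre-filter and would not need to be aligned at all in this sketch.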
diamond - DIAMOND v2.1.8

Published by bbuchfink over 1 year ago

  • Fixed an issue that could cause reduced performance when running in query-indexed mode.
  • Added support for the JSON output format (option -f json-flat).
  • Added the option --sam-query-len to output query length in SAM format.
diamond - DIAMOND v2.1.7

Published by bbuchfink over 1 year ago

  • Fixed a bug that caused taxonomy names not to be loaded correctly for the makedb workflow.
  • Fixed a bug that caused a crash when using the --target-indexed option.
  • Fixed an error when using the --tmpdir option for the makedb workflow.
  • Added a warning message when sequence accessions are shortened due to parsing rules for the makedb workflow.
  • Added the option --no-parse-seqids to disable parsing of sequence accessions.
  • Changed the command line help to print options separated by command.
  • Fixed an issue that the --ignore-warnings option could not be used for the makedb workflow.
diamond - DIAMOND v2.1.6

Published by bbuchfink over 1 year ago

  • Fixed compatibility issues on older systems without support for AVX2.
  • Fixed linker errors when compiled with -DX86=OFF.
  • Fixed a compiler error on macOS systems.
  • Fixed a bug that could cause missing tags in the XML output format and unaligned queries not to be reported correctly.
  • Fixed a bug that caused the PAF output format not to work correctly.
diamond - DIAMOND v2.1.5

Published by bbuchfink over 1 year ago

  • Disabled the use of frequency based seed masking when using the linear-time search feature with respect to the targets.
  • Fixed a bug that caused a "Database file is not a BLAST database" error message for the prepdb workflow.
  • Fixed a bug that caused a segmentation fault when using BLAST databases.
  • Added line numbers for error messages when reading taxonomy mapping files.
  • Fixed a bug that could cause a crash when using the greedy-vertex-cover workflow without the --out and --centroid-out options.
  • Fixed a bug that caused the greedy-vertex-cover workflow to only produce a trivial clustering.
  • Fixed a bug that caused the last codon of the -2 reading frame to be translated incorrectly.
  • Reduced the memory use of the clustering workflow.
  • Updated the bundled NCBI toolkit to the latest version.
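
To make the reading-frame fix above concrete, here is a minimal six-frame translation sketch. It is not DIAMOND's code; the function names are hypothetical, and it assumes the common convention that frame -k reads the reverse complement starting at offset k-1, using the standard genetic code. The point of interest is the loop bound, which must include the final complete codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna, frame):
    """Translate `dna` in frame +1..+3 or -1..-3 (assumed convention, see above)."""
    strand = dna.upper() if frame > 0 else reverse_complement(dna.upper())
    offset = abs(frame) - 1
    last_start = len(strand) - 3  # start index of the last possible full codon
    # Unknown codons (e.g. containing N) are translated as X.
    return "".join(
        CODON_TABLE.get(strand[i:i + 3], "X")
        for i in range(offset, last_start + 1, 3)
    )
```

For the 10-letter input `ATGGCCAAGT`, frame -2 ends exactly on the last base of the reverse strand, which makes it a convenient test case for off-by-one errors in the final codon.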
diamond - DIAMOND v2.1.4

Published by bbuchfink over 1 year ago

  • Leading spaces are now trimmed and tabulator characters escaped as \t in sequence titles, and a warning message is produced.
  • Blank sequence titles are now replaced by N/A, and a warning message is produced.
  • Fixed a bug that could cause a Traceback error in certain cases.
  • Fixed a bug that caused the qlen and score output fields not to be reported correctly for the realign workflow.
  • Added an error message when using unsupported output fields for the realign workflow.
  • Fixed an issue that could cause a "Missing fields in input line" error when clustering.
  • Optimized the performance of the linclust workflow.
  • Reduced the memory use of the clustering workflow.
  • Fixed a bug that prevented standard input from being used as the query.
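
The sequence title handling described in this release can be sketched as follows. This is an illustrative approximation only: the function name, the warning texts, and the decision to treat a title as blank when nothing remains after trimming are assumptions, not DIAMOND's actual behavior.

```python
def sanitize_title(title):
    """Return (cleaned_title, warnings) following the rules in the notes above."""
    warnings = []
    # Trim leading spaces and escape tab characters as the two-character text \t.
    cleaned = title.lstrip(" ").replace("\t", "\\t")
    if cleaned != title:
        warnings.append("title modified")
    # Replace blank titles by the placeholder N/A (assumed definition of blank).
    if not cleaned:
        cleaned = "N/A"
        warnings.append("blank title replaced")
    return cleaned, warnings
```

Escaping tabs matters because the tabular output format uses tab as the field separator, so a raw tab inside a title would shift the columns.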
diamond - DIAMOND v2.1.3

Published by bbuchfink over 1 year ago

  • Fixed compiler errors for GCC 4.8.
  • Fixed a GCC compiler error.
  • Fixed a segfault occurring when compiled using GCC 12 on ARM64 systems.
  • Fixed an issue that caused missing support for AVX2.
diamond - DIAMOND v2.1.2

Published by bbuchfink over 1 year ago

  • The iterated search mode (option --iterate) now uses a linear-time feature as the first search round.
  • Added the linclust command to cluster using only a single linear-time search round.
  • Fixed compiler errors on macOS.
  • Fixed a bug that caused invalid alignment traceback output for the DAA view workflow.
  • Added the merge-daa workflow to merge DAA files.
  • Fixed an error when using the --max-target-seqs/-k option for the DAA view workflow.
  • Removed AVX2 support from the Windows release binary to ensure compatibility with older systems.
  • Permitted the --ignore-warnings option for the cluster and deepclust workflows.
  • Clustering workflows now use unlinked temporary files for database blocks.
  • Fixed a bug that could cause invalid results when using a clustering step with linearization as the final round in combination with database processing in multiple super blocks.
  • The --lin-stage1 option can now be used without compilation using the -DEXTRA=ON cmake option.
  • Added the option to append the _lin suffix to sensitivity keywords for the --iterate option to activate the linear-time feature.
  • Added the option --linsearch to activate the linear-time feature for the search workflows.
  • Fixed a bug that caused the ppos and positive output fields not to work for the realign workflow.
  • Fixed an issue that caused motif masking not to work when compiled with link time optimization.
diamond - DIAMOND v2.1.1

Published by bbuchfink over 1 year ago

  • Fixed compilation errors on non-x86 systems and for the clang compiler.
  • Fixed an error message when running the recluster workflow.
  • Fixed a bug that could cause an invalid varint encoding error when using the DAA format.
  • Fixed a bug that could cause corrupted DAA output.
  • Fixed a bug that caused an error in the view workflow.
  • Adjusted the hit culling heuristic of the frameshift alignment mode to be less aggressive.
diamond - DIAMOND v2.1.0

Published by bbuchfink over 1 year ago

  • Added the cluster workflow to cluster protein sequences.
  • Added the realign workflow to generate clustering output.
  • Added the recluster workflow to correct errors in clusterings.
  • Added the reassign workflow to reassign cluster members to their closest centroid.
  • Added the option -M/--memory-limit to set a memory limit for clustering workflows.
  • Added the --approx-id option to filter alignments by approximate sequence identity and to set an approximate sequence identity threshold for clustering.
  • Added the --member-cover option to set the coverage threshold of the cluster member sequence.
  • Added the --cluster-steps option to set steps for cascaded clustering.
  • Added the --clusters option to specify the clustering input file.
  • The blastx mode will now mask any open reading frame below the minimum required length as specified by --min-orf.
  • The blastx mode will only count unmasked letters towards the block size.
  • Fixed a bug that caused an error when using the global ranking mode.
  • Added the fast mode as the first round in iterative searches.
  • Fixed a bug that caused the program not to function on systems without support for SSE4.1.
  • Improved multi-threaded load balancing of gapped extension computations.
  • Improved performance of seed extension stage when HSP filter settings are used.
  • Added the option --soft-masking with possible values 0 and tantan to permit soft-masking using the tantan algorithm.
  • Fixed a bug that could cause an inflate error in multiprocessing mode.
  • Added the option --swipe to compute full Smith Waterman alignments of all queries against all targets.
  • Added the sensitivity mode --faster.
  • Added the output fields approx_pident and corrected_bitscore to the tabular format.
  • Added the --lin-stage1 option to linearize comparisons in the seeding stage by only considering hits against the longest query sequence for identical seeds (only supported when compiled with -DEXTRA=ON).
  • Added the --kmer-ranking option to rank sequences when --lin-stage1 is used (only supported when compiled with -DKEEP_TARGET_ID=ON).
  • Added the option --no-block-size-limit to deactivate upper limits for the block size when the --memory-limit option is used.
  • Added the greedy-vertex-cover workflow to compute clustering based on alignments.
  • Added the --edge-format option to set edge format for greedy vertex cover.
  • Added the --edges option to set input file for greedy vertex cover.
  • Added the --centroid-out option to output centroid sequences for greedy vertex cover.
  • Added the --unaligned-targets option to generate an output file of unaligned targets.
  • Fixed an issue that caused compilation to fail with the Intel Compiler.
  • Fixed an issue that could cause a segmentation fault in rare cases.
  • The --header option can now be used with the parameter simple to enable simple headers for the tabular format, or without a parameter to enable headers for the clustering format.
  • Added the option --mp-self to optimize self-alignment in multiprocessing mode.
  • Added the option --query-or-subject-cover to report alignments if the query or the subject cover (or both) are above the given threshold.
  • Removed support for the --comp-based-stats 2 option (now equivalent to --comp-based-stats 3).
  • Removed hit culling in case of overlapping target ranges in frameshift alignment mode.
  • Added the option --anchored-swipe to activate anchored SWIPE extension.
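
As a rough illustration of what a greedy vertex cover clustering does, the sketch below repeatedly picks the unassigned sequence that covers the most still-unassigned sequences and makes it a centroid. This is a textbook-style greedy heuristic written for illustration, not DIAMOND's implementation; the function name, the edge representation as `(centroid_candidate, member)` pairs, and the alphabetical tie-breaking are all assumptions.

```python
from collections import defaultdict


def greedy_vertex_cover(edges):
    """Cluster vertices given directed edges (candidate centroid -> member).

    Returns a dict mapping each chosen centroid to its sorted member list
    (the centroid itself included).
    """
    neighbors = defaultdict(set)
    vertices = set()
    for centroid, member in edges:
        neighbors[centroid].add(member)
        vertices.update((centroid, member))

    unassigned = set(vertices)
    clusters = {}
    while unassigned:
        # Pick the vertex covering the most unassigned vertices;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(unassigned), key=lambda v: len(neighbors[v] & unassigned))
        members = (neighbors[best] & unassigned) | {best}
        clusters[best] = sorted(members)
        unassigned -= members
    return clusters


example_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]
example_clusters = greedy_vertex_cover(example_edges)
```

In this toy input, `A` is chosen first because it covers two other sequences, and `D` then becomes the centroid of the remaining pair.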
diamond - DIAMOND v2.0.15

Published by bbuchfink over 2 years ago

  • Fixed a bug (present since v2.0.12) that caused the diamond view workflow to report a zero bit score for all alignments.
diamond - DIAMOND v2.0.14

Published by bbuchfink almost 3 years ago

  • Fixed a compiler error on Linux systems that do not define _SC_LEVEL3_CACHE_SIZE.
  • Fixed an error when using --unal 1 with the cigar output field.
  • Fixed an illegal instruction error on systems that did not support AVX2.
  • Fixed a bug (present since v2.0.12) that could cause an error or suboptimal alignments when HSP filter settings were used.
diamond - DIAMOND v2.0.13

Published by bbuchfink almost 3 years ago

  • Fixed a bug that caused invalid bit scores in frameshift alignment mode.
diamond - DIAMOND v2.0.12

Published by bbuchfink about 3 years ago

  • Fixed an error when using HSP filter settings together with a BLAST database.
  • Optimized the performance of alignment traceback.
  • A non-default setting of --max-hsps will now recompute a full-matrix Smith Waterman alignment with the ranges of the known HSPs masked in the target.
  • A non-default setting for --max-hsps can now be used together with --ext full.
  • The sensitivity levels used for iterated searches can now be manually set by using a space-separated list after the --iterate option.
  • Seeds are masked based on complexity instead of frequency by default.
  • Added the option --seed-cut to set a complexity cutoff for indexed seeds.
  • Added the option --freq-masking to enable masking seeds based on frequency.
  • The fast, default, mid-sensitive and sensitive modes will by default softmask a fixed set of highly abundant sequence motifs.
  • Added the option --motif-masking (0,1) to enable or disable motif masking.
  • Added the option --masking seg to enable SEG masking of target sequences (BLAST default) instead of tantan masking.
  • Fixed a bug that caused the full_sseq output field to contain invalid information or to produce an error when using a BLAST database.
  • Changed composition based statistics to use BLOSUM62 background frequencies.
  • Fixed the zstd dependency in the Dockerfile.
  • Added support for gap letters in BLAST databases.
  • Fixed a bug that caused the --custom-matrix option not to function correctly.
  • Changed the overlap for merging adjoining bands to >0.0.
  • HSP filtering in the chaining stage is now more moderate.
diamond - DIAMOND v2.0.11

Published by bbuchfink over 3 years ago

  • Fixed a bug that could cause invalid output when using --masking 0 combined with the global ranking mode.
  • Enabled lazy repeat masking in the query-indexed and contiguous seed modes when using global ranking.
  • Added detection of cache size to auto-enable query-indexed mode.
diamond - DIAMOND v2.0.10

Published by bbuchfink over 3 years ago

  • Using BLAST databases now requires a preprocessing step using the command prepdb. The command line is: diamond prepdb -d /path/to/database. This call runs quickly and will write some small auxiliary files into the database directory.
  • Improved performance of searching small query files.
  • Added the "iterative" search mode (option --iterate) to search the query dataset with increasing sensitivity, only searching queries at the target sensitivity that do not produce a significant alignment at a lower sensitivity search. For example, using --sensitive --iterate will first search the query file at default sensitivity, and then search again in --sensitive mode all query sequences that failed to align in the first round.
  • Added the "global ranking" mode (option -g) to set a limit on the number of Smith Waterman extensions per query, with the target sequences ranked by their ungapped extension scores.
  • Added the --fast sensitivity mode that is faster and less sensitive than the default mode.
  • Reduced the time for loading target sequences from BLAST databases.
  • Added the contiguous-seed mode (option --algo ctg) to improve performance for small query files.
  • Added support for using --comp-based-stats (3,4) in combination with --ext full.
  • Fixed a bug that could cause a Traceback error when using --comp-based-stats (3,4) in rare cases.
  • Changed the full_sseq output field to always contain unmasked sequences.
  • Fixed an issue that could cause target output order to be nondeterministic in case of identically scoring hits.
  • Added support for reading zstd-compressed input files (auto-detected) and writing zstd-compressed output files (option --compress zstd) (requires compilation using cmake -DWITH_ZSTD=ON).
  • Compilation with BLAST database support requires the zstd library.
  • Added an error message when reading protein sequences from FASTA files that only contain DNA letters (can be disabled using --ignore-warnings).
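
The control flow of the iterative search mode described above can be sketched in a few lines. This is a conceptual outline, not DIAMOND's code: `iterated_search`, the `search` callback, and the toy data are hypothetical, and "significant alignment" is reduced here to "the search returned at least one hit".

```python
def iterated_search(queries, modes, search):
    """Search with increasing sensitivity, re-searching only unaligned queries.

    `search(queries, mode)` is a caller-supplied function (hypothetical here)
    returning a dict that maps each aligned query to its list of hits.
    """
    hits = {}
    remaining = list(queries)
    for mode in modes:
        if not remaining:
            break  # every query already aligned at a lower sensitivity
        found = search(remaining, mode)
        for query in remaining:
            if found.get(query):
                hits[query] = (mode, found[query])
        remaining = [q for q in remaining if q not in hits]
    return hits, remaining


def toy_search(queries, mode):
    """Stand-in for a real search: fixed sets of alignable queries per mode."""
    alignable = {"default": {"q1"}, "sensitive": {"q1", "q2"}}
    return {q: ["hit"] for q in queries if q in alignable[mode]}


hits, unaligned = iterated_search(["q1", "q2", "q3"], ["default", "sensitive"], toy_search)
```

Only `q2` and `q3` reach the second, more expensive round in this example, which is where the time saving of the approach comes from.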
diamond - DIAMOND v2.0.9

Published by bbuchfink over 3 years ago

  • Reduced the memory use of database building with taxonomy mapping.
  • Removed the limitation of sequence accession length.
  • Fixed a bug that could prevent BLAST databases from working correctly.
  • Added support for using BLAST alias databases (created by blastdb_aliastool).
  • Reduced the memory use of the seed hit sorting stage.
  • Improved the consistency of results when running in query-indexed mode (--algo 1).
  • Added the option --skip-missing-seqids to ignore cases of missing sequences in the database when using the --seqidlist option.
  • The --min-orf parameter now defaults to 1 in frameshift alignment mode.
  • Added support for using BLAST databases to the Docker container.
diamond - DIAMOND v2.0.8

Published by bbuchfink over 3 years ago

  • Added support for directly using BLAST database files instead of the Diamond-formatted .dmnd database files. This feature is not yet available through all release channels. It can currently be accessed by downloading the GitHub release version or by compiling from source. Taxonomy features are not yet supported for BLAST databases.
  • Added the option --seqidlist to filter the database by sequence accession (only supported for BLAST databases).
  • Fixed a bug that caused the --dbsize option not to function correctly.
  • Added the command makeidx and the option --target-indexed that provide an optimisation specialized for small databases (<10 Mb). (see: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics#small-database-optimization)
  • Added the option --mp-recover to recover aborted runs in multiprocessing mode.
diamond - DIAMOND v2.0.7

Published by bbuchfink over 3 years ago

  • Added support for computing full-matrix instead of banded Smith Waterman extensions (command line option --ext full).
  • Added support for the new prot.accession2taxid.FULL.gz taxonomy mapping file from NCBI.
  • Added the option --gapped-filter-evalue to set the e-value threshold of the gapped filter heuristic.
  • The scores of the mask letter are now set according to BLAST rules when a compositionally adjusted matrix is used.
  • Changed formatting of e-values to print two decimals instead of one.
  • Added the output field qseq_translated to print the translation of the aligned part of the query sequence.
  • Added support for providing two input files to --query/-q when running alignment in blastx mode.
  • Added the output field full_qseq_mate to print the sequence of the query's mate (enabled when using two query files in blastx mode).
  • Fixed a bug that could cause a crash in blastx mode for very long queries.
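
The e-value formatting change in this release amounts to one extra digit of precision in the mantissa. The snippet below shows the difference using Python's scientific notation formatting; the helper name is hypothetical, and the assumption that DIAMOND's output matches this exact notation (for example the exponent width) is not confirmed by the release notes.

```python
def format_evalue(evalue, decimals=2):
    """Format an e-value in scientific notation with `decimals` mantissa decimals."""
    return f"{evalue:.{decimals}e}"


old_style = format_evalue(1.234e-45, decimals=1)  # one decimal, as before the change
new_style = format_evalue(1.234e-45)              # two decimals, as after the change
```

Downstream parsers that read e-values as floating point numbers are unaffected by this change; only scripts that compare the text verbatim would notice it.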
diamond - DIAMOND v2.0.6

Published by bbuchfink almost 4 years ago

  • Changed the computation of expected values to use the method described in Park, Y., Sheetlin, S., Ma, N. et al. New finite-size correction for local alignment score distributions. BMC Res Notes 5, 286 (2012).
  • Enabled the use of a custom scoring matrix without having to specify the statistical parameters (option --custom-matrix).
  • Added support for compositional matrix adjust as described in Yi-Kuo Yu, Stephen F. Altschul, The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions, Bioinformatics, Volume 21, Issue 7, 1 April 2005, Pages 902–911. Three additional modes have been added that can be enabled by setting --comp-based-stats (2,3,4) (the feature is not enabled by default and does not support translated searches at the moment).
  • Fixed a bug that could cause incorrect alignment coordinates, gap counts and sequence identities to be reported by diamond view.
  • Targets are sorted by bit score instead of e-value in the alignment output when the --top parameter is used.
  • Disabled support of custom scoring matrices for the DAA format.
  • Fixed a bug that caused the use of a custom scoring matrix not to function correctly.
  • Fixed an issue that caused the portable binary not to function on systems that did not support AVX.
  • Added the option --no-unlink to prevent unlinking of temporary files.
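
The change to target ordering under --top can be pictured with the following sketch. It assumes that --top keeps alignments whose bit score lies within a given percentage of the best bit score for the query, which is how the option is commonly described but is not stated in these notes; the function name and the dictionary layout of a hit are hypothetical.

```python
def top_hits(hits, top_percent):
    """Keep hits within `top_percent` of the best bit score, best score first."""
    if not hits:
        return []
    best = max(hit["bitscore"] for hit in hits)
    cutoff = best * (1 - top_percent / 100.0)
    kept = [hit for hit in hits if hit["bitscore"] >= cutoff]
    # Sort by bit score (descending) rather than by e-value.
    return sorted(kept, key=lambda hit: hit["bitscore"], reverse=True)


example_hits = [
    {"target": "t3", "bitscore": 80.0},
    {"target": "t2", "bitscore": 95.0},
    {"target": "t1", "bitscore": 100.0},
]
selected = [hit["target"] for hit in top_hits(example_hits, 10)]
```

Sorting by the same quantity that defines the cutoff keeps the reported order consistent with the filter.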