seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

OTHER License


seqan3 - SeqAn 3.4.0-rc.1 Latest Release

Published by eseiler 9 months ago

This is the first release candidate for SeqAn 3.4.0

SeqAn 3.4.0 supports GCC 11, 12, and 13, and newly supports clang 17.

Changelog

seqan3 - SeqAn 3.3.0

Published by eseiler about 1 year ago

seqan3 - SeqAn 3.3.0-rc.2

Published by eseiler about 1 year ago

This is the second release candidate for SeqAn 3.3.0

Changelog

seqan3 - SeqAn 3.3.0-rc.1

Published by eseiler over 1 year ago

This is the first release candidate for SeqAn 3.3.0

Changelog

seqan3 - SeqAn 3.2.0

Published by eseiler over 2 years ago


We are excited to announce the SeqAn 3.2.0 release, which comes with a major compiler update: we dropped support for GCC 7, 8, and 9, and added support for GCC 12.

As a consequence, we also dropped C++17 support and are fully C++20 compatible.

Furthermore, we could drop the range-v3 dependency completely, so be sure to delete the submodule after updating to SeqAn 3.2.0!

While this message presents the essential changes of the 3.2.0 release, you can also find a comprehensive list of the changes in our changelog.

Get to know SeqAn3 with our tutorials.

🎉 New Features

We added GCC 12 support!

Alphabet

  • seqan3::cigar can now be assigned from std::string_view, which is much faster (no allocations) than the former assignment from seqan3::small_string.
  • The new view seqan3::views::char_strictly_to behaves like seqan3::views::char_to, but throws on invalid input.

I/O

  • The new option seqan3::sequence_file_option::fasta_ignore_blanks_before_id lets you keep blanks before IDs when reading FASTA files.
    This ensures a "perfect round trip" when reading and writing FASTA files.
    E.g., fasta_ignore_blanks_before_id = true (default): >    some_id will only store "some_id" as ID.
    E.g., fasta_ignore_blanks_before_id = false: >    some_id will store "    some_id" as ID.

Search

  • Improved performance of seqan3::counting_vector::operator+= by 25%.

Utility

  • Added seqan3::list_traits::repeat.

🛠️ Notable API changes

seqan3::views::to is no longer a view but conforms to C++23

We replaced seqan3::views::to (implemented via range-v3) with seqan3::ranges::to (implemented in SeqAn3). seqan3::ranges::to provides a subset of C++23's std::ranges::to and will be replaced with the STL-equivalent in a future version.
Since it is not a view anymore, it cannot be properly deprecated. Please keep this in mind if you encounter errors with seqan3::views::to.

Example:

auto vec = std::views::iota(0, 10) | seqan3::views::to<std::vector>; // Before
auto vec = std::views::iota(0, 10) | seqan3::ranges::to<std::vector>(); // After

Removed several headers in seqan3/std/

All headers in seqan3/std/ except charconv and new have been deprecated, since their STL-equivalents are available in GCC >= 10. Please use the equivalent std includes.

Example:

// Before
#include <seqan3/std/bit>
#include <seqan3/std/charconv>
// After
#include <bit>
#include <seqan3/std/charconv>

Removed namespace std::cpp20

The namespace std::cpp20 has been deprecated, since the standard library implementations in GCC >= 10 (e.g., of std::back_inserter) are now C++20-conformant. Please use the std:: namespace.

Example:

std::ranges::copy(range, std::cpp20::back_inserter(other_range)); // Before
std::ranges::copy(range, std::back_inserter(other_range)); // After

🐛 Notable bug fixes

Core

  • Added missing implementations for AVX512.

IO

  • FASTA files containing IDs starting with >, e.g., > >MyID, are now parsed correctly.

Search

  • Relaxed kmer_hash_view::iterator difference requirement.
  • Relaxed seqan3::views::minimiser requirements to be C++20-compatible.
  • Relaxed seqan3::views::kmer_hash requirements to be C++20-compatible.

Utility

  • seqan3::views::single_pass_input cannot propagate the std::ranges::output_range property because it cannot satisfy the following requirement:
    *it++ = value;
    // must be the same as
    *it = value; ++it;
    // but it actually would be the same as
    ++it; *it = value;
    
  • Fixed signature of seqan3::detail::store_sse4. This might have affected some public API.
  • Relaxed seqan3::views::to_simd requirements to be C++20-compatible.

🔌 External dependencies

  • Support for GCC 7, 8, and 9 has been removed.
  • SeqAn 3.2.0 is known to compile with GCC 10.3, 11.3, and 12.1. Future versions might work, but were not yet available at the time of this release.
  • Other compilers, e.g., clang and MSVC, are known to be incompatible with SeqAn 3.2.0.
  • We removed range-v3, and now require cereal 1.3.2 as well as sdsl-lite 3.0.1.
  • We use doxygen 1.9.4 to build our documentation.

seqan3 - SeqAn 3.2.0-rc.1

Published by eseiler over 2 years ago

This is the first release candidate for SeqAn 3.2.0

You can find a list of changes in our changelog.

Notable API-changes (not final)

  • We removed the std::cpp20 namespace. We use std instead, since all supported compilers provide the C++20 versions of the entities we used with std::cpp20.
  • We replaced seqan3::views::to with seqan3::range::to. As planned for C++23, it is no longer a view, but a function object. seqan3::views::to<std::vector> changes to seqan3::range::to<std::vector>().

Dependencies

  • cereal bumped to 1.3.2
  • sdsl-lite bumped to 3.0.1
  • range-v3 removed
  • For documentation: doxygen bumped to 1.9.4

seqan3 - SeqAn 3.1.0

Published by eseiler almost 3 years ago


We are thrilled to announce the SeqAn 3.1.0 release with the first stable SeqAn module: Alphabet

The Alphabet module is the foundation of SeqAn, thoroughly used and tested in other modules as well as in external projects (e.g., raptor). We are confident in releasing it, with few exceptions, as the first stable module of SeqAn3.
Check out notes about our API stability or the API documentation for more details.

In the near future, we strive to have the other modules follow suit.

While this message presents the essential changes of 3.1.0, you can also find a comprehensive list of the changes in our changelog.

Get to know SeqAn3 with our tutorials.

🔒 API Stability

With few exceptions, the Alphabet module of SeqAn3 is now stable. Additionally, the following entities are marked stable in this release:

Entities that 3.0.3 only deprecated have now been removed in 3.1.0!

If you are upgrading from a version older than 3.0.3, we strongly recommend first upgrading to 3.0.3, and afterwards to 3.1.0. Version 3.0.3 contains deprecation notices that will help you transition with ease.

🛠️ Notable API changes

There are no API changes upgrading from 3.0.3. See the 3.0.3 release message for an overview of previous changes.

🐛 Notable bug fixes

🔌 External dependencies

  • SeqAn 3.1.0 is known to compile with GCC 7.5, 8.4, 9.4, 10.3, and 11.2. Future versions might work, but were not yet available at the time of this release.
  • Other compilers, e.g., GCC 12 (currently in development), clang, and MSVC, are known to be incompatible with SeqAn 3.1.0.
  • We support range-v3 versions ≥ 0.11.0 and < 0.12.0, sdsl-lite 3.0.0, and cereal 1.3.0.
  • We use doxygen 1.9.2 to build our documentation.

seqan3 - SeqAn 3.1.0-rc.1

Published by eseiler about 3 years ago

This is the first release candidate for SeqAn 3.1.0

You can find a list of changes in our changelog.

seqan3 - SeqAn 3.0.3

Published by eseiler over 3 years ago


We are proud to announce the last 3.0.x release of the SeqAn library.

This is a rather big release for us, encompassing 923 commits and changing 1750 files. This sums up to over 75,000 insertions and more than 50,000 deletions.
Despite this huge diff, we have good reasons to believe that upgrading from 3.0.2 will be extremely smooth and easy.

You can find a comprehensive list of the changes in our changelog.

Note that 3.1.0 will be the first API-stable release; interfaces in this release might still change.

Our final push to 3.1.0

For 3.0.3, we heavily focused on making the final push towards 3.1.0 and took our time to really define the scope of our Sequence Analysis Library.

That means the next release (3.1.0) will be the first stable one. Exciting, right? 🥳

📖 Module structure

We now utilise the following module structure.

Sequence Analysis Modules

  • #include <seqan3/alphabet/*.hpp>:
    Contains alphabet related entities, like seqan3::dna4.
  • #include <seqan3/alignment/*.hpp>:
    Contains sequence alignment related entities, like seqan3::align_pairwise.
  • #include <seqan3/io/*.hpp>:
    Contains I/O related entities, like seqan3::sequence_file_input.
  • #include <seqan3/search/*.hpp>:
    Contains search related entities, like seqan3::search.

We also moved containers and views from the former range module into the related Sequence Analysis Modules (e.g., seqan3::bitpacked_sequence and seqan3::translate are now part of the alphabet module).

All Sequence Analysis Modules will have strong API stability guarantees. So far, we have finished declaring the stability in our documentation for alphabets, containers, and ranges. Sequence alignment and I/O will follow in the next release.

General Purpose Modules

  • #include <seqan3/argument_parser/*.hpp>:
    Our argument parser library that helps to write apps.
  • #include <seqan3/range/*.hpp> (split since 3.0.3):
    We moved most entities of this module into the other modules: alphabet, alignment, io, search, or utility. We also deprecated some entities that no longer fit our scope.
  • #include <seqan3/core/*.hpp>:
    Our internal module for entities that are shared across our sequence analysis modules and are needed to implement our own algorithms and data structures.
  • #include <seqan3/utility/*.hpp>:
    utility contains entities that are completely unrelated to biological problems. We think of these as candidates for separate libraries.

All General Purpose Modules will have no API Stability guarantees.

🔒 API Stability

We are especially thrilled to announce 3.0.3, because this release should be the first one that just compiles™ your app when upgrading from 3.0.2 to 3.0.3. Admittedly, you will encounter scattered deprecation notices, but all of those messages should point you to an upgrade path. Unless you treat warnings as errors, your app will still compile even when encountering deprecation notices.

Furthermore, we see this release as a test run of our way to handle API Stability in the upcoming era of stable releases. We are happy to hear your feedback, so please let us know whether it worked for you and if something could be improved. We believe we found a good system for hinting changes between releases to our users.

But how do we know that we did not miss anything? Our idea is simple: compile our current SeqAn version (3.0.3 in this case) against the tests of our previous release (3.0.2). If everything compiles and the previous tests pass, we most likely did not break our API.

🎉 Notable new features

  • A new Phred Quality Score alphabet seqan3::phred94 that represents the full Phred Score range (Sanger format) and is used for PacBio Phred scores of HiFi reads.

  • A new seqan3::literals namespace, which can be used in lieu of importing individual literal operators.

    #include <seqan3/alphabet/nucleotide/dna4.hpp>
    
    int main()
    {
        using namespace seqan3::literals; // Still works: using seqan3::operator""_dna4;
    
        seqan3::dna4 adenine = 'A'_dna4;
    }
    
  • Records of I/O files have member functions.

    #include <seqan3/core/debug_stream.hpp>
    #include <seqan3/io/sequence_file/input.hpp>
    
    int main()
    {
        seqan3::sequence_file_input fin{"my.fastq"};
    
        for (auto && record: fin)
        {
            seqan3::debug_stream << "id: " << record.id() << '\n';
            seqan3::debug_stream << "sequence: " << record.sequence() << '\n';
            seqan3::debug_stream << "base_qualities: " << record.base_qualities() << '\n';
        }
    }
    

:trollface: Notable breaking changes (API)

  • Starting with 3.0.3, seqan3::seqan3_version is a number and equivalent to the SEQAN3_VERSION macro. Consequently, seqan3::seqan3_version_cstring is now the C string (char const *) that was called seqan3::seqan3_version in the previous release (where it was a std::string).
  • The meanings of seqan3::alphabet_variant::{is_alternative, holds_alternative} have been swapped.

🛠️ Notable API changes

  • Several entities have been renamed; a short, incomplete excerpt:
    • seqan3::phred68legacy to seqan3::phred68solexa
    • seqan3::sam_dna16 to seqan3::dna16sam
    • seqan3::bitcompressed_vector to seqan3::bitpacked_sequence
    • seqan3::alignment_file_* to seqan3::sam_file_*
    • multiple UPPER_CASE to lower_case names, mostly in enums
  • Accessing an I/O file record by seqan3::get, e.g., seqan3::get<seqan3::field::id>(record), has been deprecated in favour of the new member accessors, e.g., record.id().

🐛 Notable bug fixes

  • A couple of fixes in the argument parser.
  • Fixed a nasty issue when combining seqan3::views::kmer_hash and std::views::reverse.
  • Fixed an issue with compressing .gz files with the BGZF compression algorithm.
  • Various fixes to our SAM/BAM file implementation.

🔌 External dependencies

  • SeqAn 3.0.3 is known to compile with GCC 7.5, 8.4, 9.3, 10.3, and 11.1. Future versions (such as GCC 11.2 and 12) might work, but were not yet available at the time of this release.
  • We support range-v3 versions ≥ 0.11.0 and < 0.12.0.
  • We use doxygen 1.9.1 to build our documentation.

seqan3 - SeqAn 3.0.3-rc.1

Published by eseiler over 3 years ago

This is the first release candidate for SeqAn 3.0.3

You can find a list of changes in our changelog.

seqan3 - SeqAn 3.0.2

Published by marehr about 4 years ago


Despite all circumstances, we are excited to present a new update of our SeqAn library. We present some great new features and also a lot of usability improvements. Among other things, this release fully complies with the final C++20 standard.

⚠️ In this release, we harmonised the algorithm configurations for a better user experience. This, much like 2020, will break a lot of code. But rest assured that the changes are easy to apply and worth every bit. 😄

You can find a comprehensive list of the changes in our changelog.

Note that 3.1.0 will be the first API-stable release; interfaces in this release might still change.

🎉 Notable new features

  • We added the seqan3::views::minimiser and seqan3::views::minimiser_hash views to compute the minimum in a shifted window and apply hashing, respectively.

  • The seqan3::search_cfg::hit configuration can now be set dynamically.

    #include <vector>
    
    #include <seqan3/alphabet/nucleotide/dna4.hpp>
    #include <seqan3/core/debug_stream.hpp>
    #include <seqan3/search/configuration/max_error.hpp>
    #include <seqan3/search/configuration/hit.hpp>
    #include <seqan3/search/fm_index/fm_index.hpp>
    #include <seqan3/search/search.hpp>
    
    int main()
    {
        using seqan3::operator""_dna4;
    
        std::vector<seqan3::dna4_vector> text{"CGCTGTCTGAAGGATGAGTGTCAGCCAGTGTA"_dna4,
                                            "ACCCGATGAGCTACCCAGTAGTCGAACTG"_dna4,
                                            "GGCCAGACAACCCGGCGCTAATGCACTCA"_dna4};
        seqan3::dna4_vector query{"GCT"_dna4};
        seqan3::fm_index index{text};
    
        // Use the dynamic hit configuration to set hit_all_best mode.
        seqan3::configuration search_config = seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} |
                                            seqan3::search_cfg::hit{seqan3::search_cfg::hit_all_best{}};
    
        seqan3::debug_stream << "All single best hits:\n";
        for (auto && hit : search(query, index, search_config)) // Find all best hits:
            seqan3::debug_stream << hit << '\n';
    
        // Change the hit configuration to the strata mode with a stratum of 1.
        using seqan3::get;
        get<seqan3::search_cfg::hit>(search_config).hit_variant = seqan3::search_cfg::hit_strata{1};
    
        seqan3::debug_stream << "\nAll x+1 hits:\n";
        for (auto && hit : search(query, index, search_config)) // Find all strata hits.
            seqan3::debug_stream << hit << '\n';
    }
    
  • The return type of the search algorithm was adapted to use a lazy result range over the found hits during the search and is now independent of the used FM-index type.

    #include <vector>
    
    #include <seqan3/alphabet/nucleotide/dna4.hpp>
    #include <seqan3/core/debug_stream.hpp>
    #include <seqan3/search/configuration/max_error.hpp>
    #include <seqan3/search/configuration/hit.hpp>
    #include <seqan3/search/search.hpp>
    #include <seqan3/search/fm_index/fm_index.hpp>
    
    int main()
    {
        using seqan3::operator""_dna4;
    
        std::vector<seqan3::dna4_vector> text{"CGCTGTCTGAAGGATGAGTGTCAGCCAGTGTA"_dna4,
                                            "ACCCGATGAGCTACCCAGTAGTCGAACTG"_dna4,
                                            "GGCCAGACAACCCGGCGCTAATGCACTCA"_dna4};
        seqan3::dna4_vector query{"GCT"_dna4};
    
        seqan3::configuration const search_config = seqan3::search_cfg::max_error_total{seqan3::search_cfg::error_count{1}} |
                                                    seqan3::search_cfg::hit_all_best{};
    
        // The result range offers a unified interface over the found hits, independent of the index's text layout.
        seqan3::debug_stream << "Search in text collection:\n";
        seqan3::fm_index index_collection{text};
        for (auto && hit : search(query, index_collection, search_config)) // Over a text collection.
            seqan3::debug_stream << hit << '\n';
    
        seqan3::debug_stream << "\nSearch in single text:\n";
        seqan3::fm_index index_single{text[0]};
        for (auto && hit : search(query, index_single, search_config)) // Over a single text.
            seqan3::debug_stream << hit << '\n';
    }
    
  • We added a data structure called interleaved Bloom filter, which can answer set-membership queries efficiently.

  • The pairwise alignment can now be configured with a user-defined callback, which is called for every computed alignment
    result instead of returning a lazy range over the alignment results.

    #include <mutex>
    #include <utility>
    #include <vector>
    
    #include <seqan3/alignment/configuration/align_config_edit.hpp>
    #include <seqan3/alignment/configuration/align_config_method.hpp>
    #include <seqan3/alignment/configuration/align_config_on_result.hpp>
    #include <seqan3/alignment/configuration/align_config_parallel.hpp>
    #include <seqan3/alignment/pairwise/align_pairwise.hpp>
    #include <seqan3/alphabet/nucleotide/dna4.hpp>
    #include <seqan3/core/debug_stream.hpp>
    
    int main()
    {
    
        // Generate some sequences.
        using seqan3::operator""_dna4;
        using sequence_pair_t = std::pair<seqan3::dna4_vector, seqan3::dna4_vector>;
        std::vector<sequence_pair_t> sequences{100, {"AGTGCTACG"_dna4, "ACGTGCGACTAG"_dna4}};
    
        std::mutex write_to_debug_stream{}; // Need mutex to synchronise the output.
    
        // Use edit distance with 4 threads.
        auto const alignment_config = seqan3::align_cfg::method_global{} |
                                      seqan3::align_cfg::edit_scheme |
                                      seqan3::align_cfg::parallel{4} |
                                      seqan3::align_cfg::on_result{[&] (auto && result)
                                                    {
                                                        std::lock_guard sync{write_to_debug_stream}; // critical section
                                                        seqan3::debug_stream << result << '\n';
                                                    }};
    
        // Compute the alignments in parallel, and output them unordered using the callback (order is not deterministic).
        seqan3::align_pairwise(sequences, alignment_config);  // seqan3::align_pairwise is now declared void.
    }
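A minimiser is the smallest value in a window of w consecutive values (in practice, usually k-mer hashes). The plain C++20 sketch below computes window minima with a monotonic deque, so each element is pushed and popped at most once (linear time). The helper window_minimisers, the leftmost-wins tie-breaking, and the rule of emitting a value only when the position of the minimum changes are assumptions for illustration; seqan3::views::minimiser is the library's actual implementation, and its exact emission rules are described in its API documentation.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper (not SeqAn's implementation): minimum of each window of
// `window_size` (> 0) consecutive values, emitted only when its position changes.
std::vector<int> window_minimisers(std::vector<int> const & values, std::size_t window_size)
{
    std::vector<int> result;
    std::deque<std::size_t> candidates;       // indices; their values never decrease front to back
    std::size_t last_emitted = values.size(); // sentinel: nothing emitted yet

    for (std::size_t i = 0; i < values.size(); ++i)
    {
        // Drop candidates that can never be a minimum again (ties keep the leftmost).
        while (!candidates.empty() && values[candidates.back()] > values[i])
            candidates.pop_back();
        candidates.push_back(i);

        // Drop the front candidate once it has left the current window.
        if (candidates.front() + window_size <= i)
            candidates.pop_front();

        // Report once the first full window is reached and the minimum has moved.
        if (i + 1 >= window_size && candidates.front() != last_emitted)
        {
            last_emitted = candidates.front();
            result.push_back(values[last_emitted]);
        }
    }
    return result;
}
```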
    

:trollface: Notable API changes

  • The alignment and search configurations have been refactored and improved.
  • Some type traits and concepts complying with the C++20 standard have been added to the seqan3/std module.

🐛 Notable bug fixes

  • FM-index based search now produces correct results when using quality sequences.
  • The parallel search was fixed. So no time for ☕ here, sorry.
  • Fixed an issue with spawning too many threads in parallel pairwise alignment.
  • Various fixes to make our views and ranges comply with the C++20 standard.

🔌 External dependencies

  • SeqAn 3.0.2 is known to compile with GCC 7.5, 8.4, 9.3, and 10.2. Future versions (e.g., GCC 10.3 and 11) might work,
    but were not yet available at the time of this release.
  • We now support range-v3 versions >= 0.11.0 and < 0.12.0, increasing the previous requirement of >= 0.10.0 and < 0.11.0.

Note: We changed our naming scheme of our source package from seqan-[VERSION]-with-submodules.tar.gz to seqan3-[VERSION]-Source.tar.xz. Please use the new package seqan3-[VERSION]-Source.tar.xz.

seqan3 - SeqAn 3.0.1

Published by smehringer over 4 years ago


We are excited to present a new update of our SeqAn library. This release has been in the making for roughly half a year now and we are proud to present some great new features and also a lot of improvements with respect to runtime and usability. You can find a comprehensive list of the changes in our changelog.

Note that 3.1.0 will be the first API-stable release; interfaces in this release might still change.

🎉 Notable new features

  • We added support for type-erased semialphabets, which allow you to manage semialphabets with the same alphabet size in one container. This can have a big effect on your compile time, in case you don't drink as much ☕ as we do.

  • We added parallel support for the alignment algorithm. You can now configure the number of threads you want to use for the alignment computation.

    #include <iostream>
    #include <tuple>
    
    #include <seqan3/alphabet/nucleotide/dna4.hpp>
    #include <seqan3/alignment/pairwise/all.hpp>
    
    int main()
    {
        using seqan3::operator""_dna4;
    
        auto sequence1{"ACCA"_dna4};
        auto sequence2{"ATTA"_dna4};
    
        seqan3::configuration alignment_config = seqan3::align_cfg::edit |
                                                 seqan3::align_cfg::parallel{4};
    
        for (auto const & res : seqan3::align_pairwise(std::tie(sequence1, sequence2),
                                                       alignment_config))
            std::cout << "Score: " << res.score() << '\n';
    }
    
  • One to command them all: Our argument parser now supports subcommands, such as git pull. A How-to will guide you through setting this up for your tool.

  • The performance of the I/O was improved to allow faster file reading. Further, we added support for reading and writing the CIGAR string through alignment files.

  • We added several new ranges and views. Most notably, the seqan3::views::kmer_hash view efficiently transforms a sequence into a range of k-mer hashes. Another very practical view is seqan3::views::to, which converts a view into a container. We also added seqan3::dynamic_bitset, a dynamic version of std::bitset.

  • Memory consumption of the (bidirectional) FM-Index for text collections was reduced by 10%.

:trollface: Notable API changes

As much as we'd like to reduce inconsistencies between releases, we are sometimes forced to change an interface either to improve usability or to follow changes made by the ISO C++ committee.

  • All our concepts are named in the snake_case style (e.g. seqan3::WritableAlphabet -> seqan3::writable_alphabet)!
  • The directory seqan3/range/view has been renamed to seqan3/range/views.
  • The namespace seqan3::view has been renamed to seqan3::views.
  • The CMake variable SEQAN3_VERSION_STRING defined by find_package(SEQAN3) was renamed to SEQAN3_VERSION.

You can find a comprehensive list of the changes in our changelog.

🐛 Notable bug fixes

  • Copying and moving the seqan3::fm_index and seqan3::bi_fm_index now work properly.
  • The translation table for nucleotide to amino acid translation was corrected.
  • The amino acid score matrices were corrected.

🔌 External dependencies

  • We now support range-v3 versions >= 0.10.0 and < 0.11.0, increasing the previous requirement of >= 0.5.0 and < 0.6.0.
  • We now support cereal version 1.3.0, increasing the previous requirement of 1.2.2.

Note: We changed our naming scheme of our source package from seqan-[VERSION]-with-submodules.tar.gz to seqan3-[VERSION]-Source.tar.xz. Please use the new package seqan3-[VERSION]-Source.tar.xz.

seqan3 - SeqAn 3.0.0 "Escala"

Published by h-2 over 5 years ago

This is the initial release of SeqAn3. It is an entirely new library so there is no changelog that covers the differences to SeqAn2.

Please see the release announcement:
https://www.seqan.de/announcing-seqan3/

See the porting guide for some help on porting:
http://docs.seqan.de/seqan/3-master-user/howto_porting.html

Note that 3.1.0 will be the first API-stable release; interfaces in this release might still change.


Note: We changed our naming scheme of our source package from seqan-[VERSION]-with-submodules.tar.gz to seqan3-[VERSION]-Source.tar.xz. Please use the new package seqan3-[VERSION]-Source.tar.xz.