rasusa

Randomly subsample sequencing reads or alignments

MIT License

Downloads: 8.9K
Stars: 206
Committers: 3


rasusa - v2.1.0 (latest release)

Published by mbhall88 2 months ago

2.1.0 (2024-08-19)

Features

  • [aln] add program (@PG) entry to header (0123e54)
  • log seed used when --seed not passed (6e1f37d)
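For context on the @PG change: in the SAM format, an @PG header line records a program that processed the file, using tab-separated TAG:VALUE fields such as ID (required, unique within the header), PN, PP, VN and CL. The Python sketch below appends such a line to a list of header lines. It is not rasusa's implementation: `add_pg_entry` is a hypothetical helper, and both the numeric suffix used on an ID collision and chaining PP to the last existing @PG line are assumptions made for illustration.

```python
from typing import List


def add_pg_entry(header: List[str], name: str, version: str, cmdline: str) -> List[str]:
    """Return a copy of `header` with a new @PG line appended (hypothetical helper)."""
    # Collect the ID values of any existing @PG lines.
    existing_ids = [
        field[3:]
        for line in header
        if line.startswith("@PG\t")
        for field in line.split("\t")[1:]
        if field.startswith("ID:")
    ]
    # IDs must be unique, so add a numeric suffix on collision (assumed scheme).
    prog_id, n = name, 1
    while prog_id in existing_ids:
        prog_id = f"{name}.{n}"
        n += 1
    fields = ["@PG", f"ID:{prog_id}", f"PN:{name}"]
    # Assumption: the previous program in the chain is the last @PG line.
    if existing_ids:
        fields.append(f"PP:{existing_ids[-1]}")
    fields += [f"VN:{version}", f"CL:{cmdline}"]
    return header + ["\t".join(fields)]


# Toy header: an upstream aligner has already added its own @PG line.
header = ["@HD\tVN:1.6\tSO:coordinate", "@PG\tID:minimap2\tPN:minimap2"]
print(add_pg_entry(header, "rasusa", "2.1.0", "rasusa aln ...")[-1])
```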
rasusa - v2.0.0

Published by mbhall88 6 months ago

2.0.0 (2024-05-03)

⚠ BREAKING CHANGES

  • paired reads require --output once for each file

Bug Fixes

  • paired reads require --output once for each file (1427a0b)
rasusa - v1.0.0

Published by mbhall88 6 months ago

1.0.0 (2024-04-29)

⚠ BREAKING CHANGES

  • move fastq functionality to reads subcommand

Features

  • add cite command to get citation (db17612)
  • add subcommand aln to subsample alignments (b92979a)
  • move fastq functionality to reads subcommand (f48d47b)

Bug Fixes

  • deal with chromosomes with no alignments (14aa15e)
rasusa - v0.8.0

Published by mbhall88 10 months ago

0.8.0 (2024-01-03)

Features

  • add logging message with coverage of input before downsampling (79445fc)
  • support zstd (cfa50f8)
  • use the default compression level for the compression output type (cfa50f8)

Bug Fixes

  • update logging so colour is not sent to file (bc62c3f)
rasusa - 0.7.1

Published by github-actions[bot] over 1 year ago

Added

  • Install script and support for more binary target triples

Changed

  • Updated needletail dependency due to a dependency deprecation
rasusa - 0.7.0

Published by github-actions[bot] over 2 years ago

Added

  • Fraction (--frac) and number (--num) options. This allows users to replicate the
    functionality of seqtk sample [#34]
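Both options boil down to choosing a fixed number of reads at random. A minimal Python sketch, assuming a fraction is turned into a read count by rounding frac × total (the note does not say how rasusa converts it) and that reads are drawn without replacement; `reads_to_keep` is a hypothetical name. A per-read coin flip would instead keep only approximately frac × total reads.

```python
import random
from typing import List, Optional


def reads_to_keep(total_reads: int, frac: Optional[float] = None,
                  num: Optional[int] = None, seed: Optional[int] = None) -> List[int]:
    """Pick read indices to keep, given exactly one of `frac` or `num` (sketch)."""
    if (frac is None) == (num is None):
        raise ValueError("pass exactly one of frac or num")
    if frac is not None:
        if not 0.0 <= frac <= 1.0:
            raise ValueError("frac must be between 0 and 1")
        # Assumption: round to the nearest read count (Python rounds halves to even).
        n = round(frac * total_reads)
    else:
        if num < 0:
            raise ValueError("num must be non-negative")
        # Asking for more reads than exist simply keeps them all.
        n = min(num, total_reads)
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return sorted(rng.sample(range(total_reads), n))


print(reads_to_keep(10, frac=0.3, seed=42))
```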
rasusa - 0.6.1

Published by github-actions[bot] over 2 years ago

Added

  • Warning if the actual coverage of the file(s) is less than the requested coverage
    [#36]
  • JOSS manuscript
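The coverage check behind this warning is simple arithmetic: actual coverage is the total number of bases in the input divided by the genome size. A minimal Python sketch; `check_coverage`, the logger name and the message wording are hypothetical and are not rasusa's actual log output.

```python
import logging
from typing import Iterable, Tuple

logger = logging.getLogger("subsample")  # hypothetical logger name


def check_coverage(read_lengths: Iterable[int], genome_size: int,
                   requested: float) -> Tuple[float, bool]:
    """Return (actual coverage, whether a warning was issued)."""
    actual = sum(read_lengths) / genome_size
    too_low = actual < requested
    if too_low:
        logger.warning("requested coverage %.2fx exceeds actual coverage %.2fx",
                       requested, actual)
    return actual, too_low


# 10 reads of 100 bp over a 500 bp genome give 2x coverage.
print(check_coverage([100] * 10, 500, requested=5.0))  # (2.0, True)
```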

Changed

  • Use rasusa as the entry command for docker container [#35]
rasusa - 0.6.0

Published by github-actions[bot] about 3 years ago

Added

  • --bases option to allow for manually setting the target number of bases to keep
    [#30]
  • --genome-size can now take a FASTA/Q index file, and the sum of the lengths of all
    reference sequences will be used as the genome size [#31]
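A FASTA/Q index (.fai) is tab-separated text whose second column is each sequence's length, so the genome size is the sum of that column. The Python sketch below also shows one plausible precedence for --bases, where an explicit base count replaces coverage × genome size. Both function names are hypothetical, the index lines are toy data, and the precedence is an assumption rather than documented behaviour.

```python
from typing import Iterable, Optional


def genome_size_from_fai(lines: Iterable[str]) -> int:
    """Sum the sequence lengths (column 2) of a .fai index."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        total += int(line.split("\t")[1])
    return total


def target_bases(coverage: Optional[float] = None, genome_size: Optional[int] = None,
                 bases: Optional[int] = None) -> int:
    """Hypothetical precedence: an explicit bases value wins."""
    if bases is not None:
        return bases
    if coverage is None or genome_size is None:
        raise ValueError("need either bases, or coverage and genome size")
    return int(coverage * genome_size)


fai = ["chr1\t4600000\t6\t60\t61", "plasmid\t50000\t4676675\t60\t61"]
size = genome_size_from_fai(fai)
print(size, target_bases(coverage=30, genome_size=size))
```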
rasusa - 0.5.0

Published by github-actions[bot] about 3 years ago

Added

  • Support for LZMA, bzip2, and gzip output compression (thanks to niffler). The format is either inferred from the file extension or set manually via the -O option.
  • Option to specify the compression level for the output via -l
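Inferring the output compression from the file name can be sketched as a lookup on the final extension, with an explicit choice taking priority. In this Python sketch the mapping, the format names and `output_compression` are hypothetical; they are not niffler's detection logic or rasusa's -O values.

```python
from pathlib import Path
from typing import Optional

# Illustrative mapping only; niffler's own detection may differ.
EXTENSION_TO_FORMAT = {".gz": "gzip", ".bz2": "bzip2", ".xz": "lzma"}


def output_compression(path: str, override: Optional[str] = None) -> str:
    """An explicit choice (like -O) wins; otherwise infer from the extension."""
    if override is not None:
        return override
    return EXTENSION_TO_FORMAT.get(Path(path).suffix.lower(), "uncompressed")


print(output_compression("sub.fq.gz"))        # gzip
print(output_compression("sub.fq"))           # uncompressed
print(output_compression("sub.fq", "bzip2"))  # bzip2
```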

Changed

  • Use a Vec<bool> instead of a HashSet to store the indices of reads to keep. This gives a nice little speedup (see #28). A big thank you to @natir
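The idea behind the Vec<bool> change: with one flag per read, checking whether read i is kept is a direct index lookup rather than a hash, and the input can be filtered in a single ordered pass. A Python sketch of the same layout, with a list of bools standing in for Vec<bool>; it illustrates the data structure only, not the Rust speedup, and the function names are hypothetical.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def build_keep_mask(total_reads: int, keep_indices: Iterable[int]) -> List[bool]:
    """One flag per read, True if that read should be written out."""
    mask = [False] * total_reads
    for i in keep_indices:
        mask[i] = True
    return mask


def filter_reads(reads: Iterable[T], mask: Sequence[bool]) -> List[T]:
    # Single pass: read i is kept iff mask[i] is True; no hashing involved.
    return [read for read, keep in zip(reads, mask) if keep]


mask = build_keep_mask(5, [3, 0])
print(mask)                                                # [True, False, False, True, False]
print(filter_reads(["r0", "r1", "r2", "r3", "r4"], mask))  # ['r0', 'r3']
```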

Fixed

  • Restore compression of output files [#27]
rasusa - 0.4.2

Published by github-actions[bot] about 3 years ago

Fixed

  • I had stupidly forgotten to merge the fix for #22 onto master
rasusa - 0.4.1

Published by github-actions[bot] about 3 years ago

Fixes

  • Releasing cross-compiled binaries didn't work for version 0.4.0
  • Docker image is now correctly built
rasusa - 0.4.0

Published by github-actions[bot] about 3 years ago

Changed

  • Switched error handling from snafu and failure to anyhow and thiserror, following the procedure outlined in this excellent blog post.
  • Switched FASTA/Q parsing to use needletail
    instead of rust-bio. See the benchmark for the improvement in runtimes.
  • Changed the way Illumina paired reads are subsampled. Previously, we assumed
    that both reads of a pair were the same length as the R1 read. We are now
    more careful and look at each read's length individually [#22]
  • Moved container hosting to quay.io
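The paired-read change (#22) comes down to how bases are counted per pair. A minimal Python sketch, assuming pairs are visited in random order and kept until the target number of bases is reached; `select_pairs` is hypothetical. The key line adds each mate's own length instead of twice the R1 length: with 100 bp R1 and 50 bp R2 reads and a 400 bp target, this keeps 3 pairs, whereas counting 2 × R1 would stop after 2.

```python
import random
from typing import List, Optional, Sequence


def select_pairs(r1_lengths: Sequence[int], r2_lengths: Sequence[int],
                 target_bases: int, seed: Optional[int] = None) -> List[int]:
    """Return sorted indices of read pairs to keep (sketch)."""
    if len(r1_lengths) != len(r2_lengths):
        raise ValueError("R1 and R2 must contain the same number of reads")
    order = list(range(len(r1_lengths)))
    random.Random(seed).shuffle(order)  # assumed: random visiting order
    kept: List[int] = []
    total = 0
    for i in order:
        if total >= target_bases:
            break
        kept.append(i)
        # Count each mate's own length, not 2 * len(R1).
        total += r1_lengths[i] + r2_lengths[i]
    return sorted(kept)


print(select_pairs([100] * 10, [50] * 10, target_bases=400, seed=7))
```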
rasusa - 0.3.0

Published by mbhall88 over 4 years ago

0.3.0

Version 0.3.0 may give different results from previous versions. If so,
the differences will likely be a handful of extra reads (possibly none).
The reason for this is that --coverage is now treated as a float.
Previously, we immediately rounded coverage down to the nearest integer.
This matters because the number of reads to keep is based on the target
total number of bases, which is coverage * genome size. So if coverage is
10.7 and genome size is 100, our target number of bases would previously
have been 1000, whereas now it is 1070.
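The worked example above as a small Python sketch. `target_bases_pre_0_3` and `target_bases` are hypothetical names, and truncating the product to a whole number of bases is an assumption, since the note only gives the formula coverage * genome size.

```python
import math


def target_bases_pre_0_3(coverage: float, genome_size: int) -> int:
    # Before 0.3.0: coverage was rounded down to an integer first.
    return math.floor(coverage) * genome_size


def target_bases(coverage: float, genome_size: int) -> int:
    # From 0.3.0: coverage stays fractional (assumed truncation of the product).
    return int(coverage * genome_size)


print(target_bases_pre_0_3(10.7, 100))  # 1000
print(target_bases(10.7, 100))          # 1070
```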

Changed

  • --coverage is now treated as an f32 instead of being converted
    immediately to an integer #19.
  • Updated rust-bio to version 0.31.0. This means rasusa now handles
    wrapped fastq files.
  • Preallocate fastx records instead of using an iterator. Gives a marginal
    speedup.
  • Added bash to the docker image b47a8b75943098bdd845b7758cf2eab01ef5a3d8
rasusa - 0.2.0

Published by mbhall88 over 4 years ago

New

  • Support paired Illumina #15
rasusa - 0.1.0 (https://github.com/mbhall88/rasusa/releases/tag/0.1.0)

Published by mbhall88 almost 5 years ago