pyrodigal

Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes. Now with SIMD!

GPL-3.0 License

Downloads: 37.8K
Stars: 139
Committers: 4

pyrodigal - v2.0.4

Published by github-actions[bot] almost 2 years ago

Fixed

  • GC% computation and RBS scoring for reverse strand nodes close to the contig edge (#27).
pyrodigal - v2.0.3

Published by github-actions[bot] almost 2 years ago

Fixed

  • OrfFinder(mask=True) ignoring the minimum mask size when masking regions (#26).

Changed

  • Use cibuildwheel for building wheel distributions.

Added

  • Wheels for macOS Aarch64 platforms.
pyrodigal - v2.0.2

Published by github-actions[bot] almost 2 years ago

Fixed

  • Syntax issue in Cython files failing build on Bioconda runner.
pyrodigal - v2.0.1

Published by github-actions[bot] almost 2 years ago

Fixed

  • Syntax issue in Cython files failing build on some environments.
pyrodigal - v2.0.0

Published by github-actions[bot] almost 2 years ago

Added

  • MMX implementation of the SIMD prefilter.
  • Proper GFF headers and metadata section to GFF output.
  • Sequence.gc_frame_plot method to compute the max GC frame profile from Python.
  • metagenomic_bin property to TrainingInfo to support recovering the object corresponding to a pre-trained model.
  • meta attribute to Genes to store whether genes were predicted in single or in meta mode.
  • pyrodigal.PRODIGAL_VERSION constant storing the wrapped Prodigal version.
  • pyrodigal.MIN_SINGLE_GENOME and pyrodigal.IDEAL_SINGLE_GENOME constants storing the minimum and recommended sequence sizes for training.

Changed

  • Make all write methods of Genes objects require a sequence_id argument instead of using the internal sequence number.
  • Rewrite SIMD prefilter using a generic template with C macros.
  • Make Mask record coordinates in start-inclusive end-exclusive mode to follow Python conventions.
  • Make connection scoring tests only score some randomly selected node pairs for faster runs.
  • Rewrite tests to use importlib.resources for managing test data.

Removed

  • from_bytes and from_string constructors of Sequence objects.

Fixed

  • Duplicate extraction of start codons located on contig edges inside Nodes._extract (#21).
  • Pickling and unpickling of TrainingInfo objects corresponding to pre-trained models.
  • Implementation of calc_most_gc_frame being inconsistent with the Prodigal implementation.
  • Implementation of the maximum search in score_connection_forward_start not following the (weird?) behaviour from Prodigal (#21).
  • Gene identifier being used instead of the sequence identifier in the GFF output (#18).
  • Out of bound access to sequence data in Sequence._shine_dalgarno_mm and Sequence._shine_dalgarno_exact.
pyrodigal - v2.0.0-rc.4

Published by github-actions[bot] almost 2 years ago

Changed

  • Make Mask record coordinates in start-inclusive end-exclusive mode to follow Python conventions.
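
The start-inclusive, end-exclusive convention mentioned above can be illustrated with a minimal sketch. The `find_masked_regions` helper and its `min_size` parameter are hypothetical names invented for this illustration and are not part of the pyrodigal API; the sketch only shows why half-open coordinates pair naturally with Python slicing.

```python
import re


def find_masked_regions(sequence, min_size=1):
    """Return (begin, end) pairs for runs of 'N', with `end` exclusive.

    Hypothetical helper for illustration only, not pyrodigal code.
    """
    regions = []
    for match in re.finditer(r"N+", sequence.upper()):
        # re.Match.span() already returns half-open coordinates.
        begin, end = match.span()
        if end - begin >= min_size:
            regions.append((begin, end))
    return regions


sequence = "ACGTNNNNACGTNNAC"
for begin, end in find_masked_regions(sequence, min_size=3):
    # With half-open coordinates the region length is simply end - begin,
    # and slicing with the same pair recovers exactly the masked bases.
    print(begin, end, end - begin, sequence[begin:end])
```

With this convention no `+ 1` or `- 1` adjustment is needed when computing lengths or slicing, which is presumably what "follow Python conventions" refers to.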

Removed

  • from_bytes and from_string constructors of Sequence objects.
pyrodigal - v2.0.0-rc.3

Published by github-actions[bot] about 2 years ago

Added

  • Sequence.gc_frame_plot method to compute the max GC frame profile from Python.

Changed

  • Rewrite tests to use importlib.resources for managing test data.
  • Make connection scoring tests only score some randomly selected node pairs for faster runs.

Fixed

  • Duplicate extraction of start codons located on contig edges inside Nodes._extract.
pyrodigal - v2.0.0-rc.2

Published by github-actions[bot] about 2 years ago

Added

  • metagenomic_bin property to TrainingInfo to support recovering the object corresponding to a pre-trained model.
  • meta attribute to Genes to store whether genes were predicted in single or in meta mode.

Fixed

  • Pickling and unpickling of TrainingInfo objects corresponding to pre-trained models.
  • Implementation of calc_most_gc_frame being inconsistent with the Prodigal implementation.
  • Implementation of the maximum search in score_connection_forward_start not following the (weird?) behaviour from Prodigal.
pyrodigal - v2.0.0-rc.1

Published by github-actions[bot] about 2 years ago

Added

  • MMX implementation of the SIMD prefilter.
  • Proper GFF headers and metadata section to GFF output.

Fixed

  • Out of bound access to sequence data in Sequence._shine_dalgarno_mm and Sequence._shine_dalgarno_exact.
  • Gene identifier being used instead of the sequence identifier in the GFF output (#18).

Changed

  • Rewrite SIMD prefilter using a generic template with C macros.
  • Make all write methods of Genes objects require a sequence_id argument instead of using the internal sequence number.
pyrodigal - v1.1.2

Published by github-actions[bot] about 2 years ago

Changed

  • Use the vbicq Arm intrinsic in the NEON implementation to combine vandq and vmvnq.

Fixed

  • Prevent direct instantiation of Node and Gene objects from Python code.
  • Configuration of platform-specific NEON flags in setup.py not being applied to the linker.
pyrodigal - v1.1.1

Published by althonos over 2 years ago

Fixed

  • Some cpu_features source files not being included in source distribution.
pyrodigal - v1.1.0

Published by github-actions[bot] over 2 years ago

Changed

  • OrfFinder.train can now be given more than one sequence argument to train on contigs from an unclosed genome.
  • Updated cpu_features to v0.7.0 and added hardware detection of NEON features on Linux Aarch64 platforms.
pyrodigal - v1.0.2

Published by github-actions[bot] over 2 years ago

Fixed

  • Detection of Arm64 platform in setup.py (#16).
pyrodigal - v1.0.1

Published by github-actions[bot] over 2 years ago

Changed

  • pyrodigal.cli now concatenates training sequences the same way as Prodigal does.
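
As a rough illustration of what concatenating training sequences involves, the sketch below joins contigs with a `TTAATTAATTAA` linker. This linker is stated here as an assumption about Prodigal's behaviour rather than something these release notes confirm. The helper names (`join_contigs`, `frames_with_stop`, `reverse_complement`) are hypothetical and are not pyrodigal functions.

```python
# Assumed linker: it reads as a stop codon in every frame on both strands.
LINKER = "TTAATTAATTAA"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse-complement an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def frames_with_stop(seq):
    """Return the set of forward frames (0, 1, 2) containing a stop codon."""
    frames = set()
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if any(codon in STOP_CODONS for codon in codons):
            frames.add(frame)
    return frames


def join_contigs(contigs, linker=LINKER):
    """Merge contigs into one training sequence separated by the linker."""
    return linker.join(contigs)


merged = join_contigs(["ACGT", "GGCC"])
print(merged)
print(frames_with_stop(LINKER), reverse_complement(LINKER) == LINKER)
```

Because the linker is its own reverse complement and contains a stop codon in each forward frame, no open reading frame can span two contigs in the merged sequence.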
pyrodigal - v1.0.0

Published by github-actions[bot] over 2 years ago

Stable version, to be published in the Journal of Open Source Software.

Added

  • pickle protocol implementation for Nodes, TrainingInfo, OrfFinder, Sequence, Masks and Genes objects.
  • Buffer protocol implementation for Sequence, allowing access to raw digits.
  • __eq__ and __repr__ magic methods to Mask objects.

Changed

  • Optimized code used for region masking to avoid searching for the same mask repeatedly.
  • TRANSLATION_TABLES and METAGENOMIC_BINS are now exposed as constants in the top pyrodigal module.
  • Refactored connection scoring into different functions based on the type (start/stop) and strand (direct/reverse) of the node being scored.
  • Changed the growth factor for dynamic arrays to be the same as the one used in CPython list buffers.
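
For context on the growth factor mentioned in the last item, the sketch below reproduces the over-allocation formula used by list resizing in recent CPython versions (3.9 onward, as far as I am aware; older versions used a slightly different expression). Whether pyrodigal uses exactly this formula is an assumption, and `grown_capacity` is a hypothetical name for this illustration.

```python
def grown_capacity(new_size):
    """Capacity to allocate for `new_size` items, CPython-list style.

    Grows by roughly 1/8 of the requested size plus a small constant,
    rounded down to a multiple of 4.
    """
    return (new_size + (new_size >> 3) + 6) & ~3


capacity = 0
history = []
for size in range(1, 65):
    # Reallocate only when the requested size exceeds the current capacity.
    if size > capacity:
        capacity = grown_capacity(size)
        history.append(capacity)

print(history)
```

The mild growth factor (about 1.125) wastes less memory than doubling, and appends remain amortized constant time because the capacity still grows geometrically.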
pyrodigal - v0.7.3

Published by github-actions[bot] over 2 years ago

Added

  • Gene.score property to get the gene score as reported in the score data string.

Fixed

  • OrfFinder.find_genes not producing consistent results across runs in meta mode (#13).
  • OrfFinder.find_genes returning Nodes with incomplete score information.
pyrodigal - v0.7.2

Published by github-actions[bot] over 2 years ago

Changed

  • Improve performance of mer_ndx and score_connection using dedicated implementations with better branch prediction.
  • Mark arguments as const in C code where possible.

Fixed

  • Signatures of Cython classes not displaying properly because of the embedsignature directive.
  • _sequence.h functions not being inlined as expected.
pyrodigal - v0.7.1

Published by github-actions[bot] over 2 years ago

Changed

  • Rewrite internal Sequence code using inlined functions to increase performance when the strand is known.

Fixed

  • Nodes.copy potentially failing on empty collections after trying to allocate 0 bytes.
  • TestGenes.test_write_scores failing on some machines because of float rounding issues.
  • Gene.translate ignoring the unknown_residue argument value and always using "X".
  • Memory leak in Pyrodigal.train caused by memory not being freed after building the GC frame plot.
pyrodigal - v0.7.0

Published by github-actions[bot] over 2 years ago

Added

  • Support for setting a custom minimum gene length in pyrodigal.OrfFinder.
  • Genes.write_scores method to write the node scores to a file.
  • Gene.__repr__ and Node.__repr__ methods to display some useful attributes.
  • Sequence.__str__ method to get back a nucleotide string from a Sequence object.

Changed

  • Use a more compact data structure to store Gene data.

Fixed

  • Nodes._calc_orf_gc reading nucleotides after the sequence end when computing GC content for edge nodes.

Removed

  • pyrodigal.Pyrodigal class (use pyrodigal.OrfFinder instead).
  • pyrodigal.Predictions class (functionality merged into pyrodigal.Genes).
pyrodigal - 0.6.4

Published by althonos almost 3 years ago

Added

  • load and dump methods to TrainingInfo for storing and loading a raw training info structure.
  • Support for creating an OrfFinder pre-configured with a training info.
  • -t and -n flags to the CLI.