thapbi-pict

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool

MIT License

thapbi-pict - THAPBI PICT v0.13.1

Published by peterjc about 2 years ago

Released on PyPI on 2022-09-21:

https://pypi.org/project/thapbi-pict/0.13.1/

Dramatically reduces the time taken to import large FASTA files into the database, at the expense of higher memory usage.

Updated the default database with latest genus-level NCBI results, added a new Plasmopara sequence (OP326699.1), and multiple additional accessions to an existing Phytopythium sequence.

Caps the --cpu argument at the number of CPUs available to the process under Linux, or at the total number of CPUs if that information is not available.
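As an illustration of this capping logic, here is a minimal Python sketch. It assumes the standard library's `os.sched_getaffinity` (Linux only) with a fallback to `os.cpu_count()`; the helper names `available_cpus` and `cap_cpu_argument` are hypothetical and not taken from THAPBI PICT itself.

```python
import os


def available_cpus() -> int:
    """Return the number of CPUs this process may use (hypothetical helper).

    On Linux, os.sched_getaffinity(0) lists the CPUs the process is allowed
    to run on, which may be fewer than the machine total (e.g. under a batch
    scheduler). Elsewhere, fall back to os.cpu_count(), which can be None.
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity does not exist on this platform (e.g. Windows, macOS)
        return os.cpu_count() or 1


def cap_cpu_argument(requested: int) -> int:
    """Clamp a user-supplied --cpu value to the available CPUs (hypothetical)."""
    return min(requested, available_cpus())
```

For example, requesting 64 threads on a job limited to 8 CPUs would give `cap_cpu_argument(64) == 8`.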

The pipeline and prepare-reads commands now accept .fq or .fq.gz extensions, in addition to .fastq or .fastq.gz, for FASTQ input files.
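A minimal sketch of such an extension check follows. The `FASTQ_EXTENSIONS` tuple and both helper functions are hypothetical, not THAPBI PICT's own code; note that `str.endswith` accepts a tuple of suffixes.

```python
# Hypothetical list of accepted FASTQ filename extensions
FASTQ_EXTENSIONS = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def is_fastq(filename: str) -> bool:
    """True if the filename ends with any accepted FASTQ extension."""
    return filename.endswith(FASTQ_EXTENSIONS)


def strip_fastq_extension(filename: str) -> str:
    """Remove the FASTQ extension, e.g. to derive a sample name stem."""
    for ext in FASTQ_EXTENSIONS:
        if filename.endswith(ext):
            return filename[: -len(ext)]
    raise ValueError(f"Not a recognised FASTQ filename: {filename}")
```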

thapbi-pict - THAPBI PICT v0.13.0

Published by peterjc about 2 years ago

Released on PyPI on 2022-09-14:

https://pypi.org/project/thapbi-pict/0.13.0/

The main change is faster distance-based classifiers, through more efficient use of the RapidFuzz library. This requires RapidFuzz version 2.4.0 or later.
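To illustrate what a distance-based classifier does, here is a simplified pure-Python sketch. It assigns a query to the closest reference within a maximum Levenshtein edit distance and abandons a comparison as soon as the cutoff is exceeded. RapidFuzz provides optimised equivalents (for example, the `score_cutoff` argument of `rapidfuzz.distance.Levenshtein.distance`). The functions here are hypothetical, and ties simply keep the first match, which is a simplification.

```python
from typing import Dict, Optional, Tuple


def bounded_levenshtein(a: str, b: str, max_dist: int) -> Optional[int]:
    """Edit distance between a and b, or None if it exceeds max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return None  # the length difference alone exceeds the cutoff
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,  # deletion
                current[j - 1] + 1,  # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        if min(current) > max_dist:
            return None  # every alignment path already exceeds the cutoff
        previous = current
    return previous[-1] if previous[-1] <= max_dist else None


def classify(query: str, references: Dict[str, str],
             max_dist: int = 1) -> Tuple[Optional[str], Optional[int]]:
    """Return (taxon, distance) of the closest reference within max_dist."""
    best_taxon, best_dist = None, None
    for sequence, taxon in references.items():
        # Tighten the cutoff once a match has been found
        cutoff = max_dist if best_dist is None else best_dist
        dist = bounded_levenshtein(query, sequence, cutoff)
        if dist is not None and (best_dist is None or dist < best_dist):
            best_taxon, best_dist = taxon, dist
            if dist == 0:
                break  # a perfect match cannot be beaten
    return best_taxon, best_dist
```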

This release drops the human-readable plain-text sample report, which was not useful with large datasets, and now always includes the threshold columns (previously hidden when their values were the same for all samples).

Finally, importing sequences into a database is now faster, and the NCBI taxonomy used in the default DB has been updated, adding taxids for recently added species.

thapbi-pict - THAPBI PICT v0.12.9

Published by peterjc about 2 years ago

Released on PyPI on 2022-08-19:

https://pypi.org/project/thapbi-pict/0.12.9/

Changes to the default Phytophthora-centric ITS1 database, with further refinement of the left-trimming for the bulk import of NCBI search results, and the addition of type strains from Jung et al. (2022) to the curated Phytophthora set.

thapbi-pict - THAPBI PICT v0.12.8

Published by peterjc about 2 years ago

Released on PyPI on 2022-08-08:

https://pypi.org/project/thapbi-pict/0.12.8/

Now treats NCBI taxonomy 'equivalent name' entries as synonyms, resulting in minor changes to the non-Phytophthora entries in the default ITS1 database (including 11 cases of Pythium now recorded as Phytopythium).
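The NCBI taxonomy dump file `names.dmp` records a name class for each name, with fields separated by `\t|\t` and each line ending in `\t|`. Here is a minimal sketch that treats 'equivalent name' like 'synonym' when parsing it. The function name, taxid, and species names are made up for illustration and are not from THAPBI PICT or the real taxonomy.

```python
from typing import Dict, Iterable, Tuple

# Name classes mapped back to the taxid as synonyms
SYNONYM_CLASSES = {"synonym", "equivalent name"}


def parse_names_dmp(lines: Iterable[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Return (taxid -> scientific name, synonym name -> taxid)."""
    scientific: Dict[int, str] = {}
    synonyms: Dict[str, int] = {}
    for line in lines:
        # Fields: tax_id, name_txt, unique name, name class
        taxid, name, _unique, name_class = line.rstrip("\t|\n").split("\t|\t")[:4]
        if name_class == "scientific name":
            scientific[int(taxid)] = name
        elif name_class in SYNONYM_CLASSES:
            synonyms[name] = int(taxid)
        # Other classes (e.g. common names) are ignored in this sketch
    return scientific, synonyms
```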

thapbi-pict - THAPBI PICT v0.12.7

Published by peterjc over 2 years ago

Released on PyPI on 2022-07-26:

https://pypi.org/project/thapbi-pict/0.12.7/

Fixes the missing NCBI taxid in the genus-only fallback classifier output (when using the NCBI taxonomy), which previously reported zero.

Also includes minor updates to the genus-only NCBI imports in the default database, adding another ten non-Phytophthora ITS1 sequences which had been excluded in error while resolving conflicting genus information. These are generally associated with species which have changed genus.

thapbi-pict - THAPBI PICT v0.12.6

Published by peterjc over 2 years ago

Released on PyPI on 2022-07-25:

https://pypi.org/project/thapbi-pict/0.12.6/

Reworked how we import NCBI sequences (at genus level) into the default Phytophthora-centric database. We now import published sequences missing (part of) the conserved 32bp leader at the start of the marker, using the extended version where this is observed in our own environmental data in at least 5 samples with a total abundance of at least 1000 reads. Currently this adds 59 unique sequences, around half of which are Phytophthora. We have also relaxed the stringency of the right primer matching, but now require the primer to be present. This added over 100 non-Phytophthora entries, mostly Globisporangium.
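The acceptance rule described above can be sketched as follows. This assumes, hypothetically, that an extended sequence is one which ends with the published sequence and adds at most the 32bp leader, and that per-sample read counts are available. The helper names are illustrative only and do not reflect THAPBI PICT's actual implementation.

```python
from typing import Dict, Iterable, List


def candidate_extensions(published: str, observed: Iterable[str],
                         leader_len: int = 32) -> List[str]:
    """Observed sequences which extend the published one at the start only."""
    return [
        seq for seq in observed
        if seq != published
        and seq.endswith(published)
        and len(seq) - len(published) <= leader_len
    ]


def accept_extension(sample_counts: Dict[str, int], min_samples: int = 5,
                     min_total: int = 1000) -> bool:
    """sample_counts maps sample name -> reads for one extended sequence."""
    present = [count for count in sample_counts.values() if count > 0]
    return len(present) >= min_samples and sum(present) >= min_total
```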

Internally, we now track the history of the default database as a FASTA export (including genus/species and NCBI taxid), rather than a raw SQL dump. This is much easier to interpret by eye, and has fewer spurious changes (e.g. renumbering of entries from taxonomy additions). However, this does require rebuilding the database from the source files (which are already under version control).

thapbi-pict - THAPBI PICT v0.12.5

Published by peterjc over 2 years ago

Released on PyPI on 2022-07-08:

https://pypi.org/project/thapbi-pict/0.12.5/

Now records synonym entries for NCBI taxonomy identifiers for sub-nodes (e.g. varietas nodes for parent species, or clade nodes for parent genus), in addition to their names, which improves importing references into the database via the taxid.

The existing FASTA reference import support for the ObiTools format now prioritises any species information already in the database for the NCBI taxid over that in the FASTA file. This is particularly helpful when there have been changes (e.g. the genus of a species has been changed) since the FASTA file was created.

Added a new FASTA reference importing convention which uses the NCBI taxid only (which therefore requires pre-loading the taxonomy).

thapbi-pict - THAPBI PICT v0.12.4

Published by peterjc over 2 years ago

Released on PyPI on 2022-07-07:

https://pypi.org/project/thapbi-pict/0.12.4/

This is a minor release to support RapidFuzz v2.0.0 or later, used only for the edit-graph functionality.

thapbi-pict - THAPBI PICT v0.12.3

Published by peterjc over 2 years ago

Released on PyPI on 2022-07-06:

https://pypi.org/project/thapbi-pict/0.12.3/

This release focused on additions to the default database, updating the NCBI taxonomy and refreshing the bulk import of genus-only entries from an ITS1 search on the NCBI.

Also includes some documentation updates, and fixed an integer overflow bug when outputting a distance matrix.

thapbi-pict - THAPBI PICT v0.12.2

Published by peterjc over 2 years ago

Released on PyPI on 2022-06-15:

https://pypi.org/project/thapbi-pict/0.12.2/

This release focused on additions to the default database, with additional curated sequences for Phytophthora panamensis, sp. Kunnunara, transitoria, variabilis, and 13 candidate species from Catala et al. (2018). Additionally, the right-trimming of the entry for Phytophthora rhizophorae was corrected.

thapbi-pict - THAPBI PICT v0.12.1

Published by peterjc over 2 years ago

Released on PyPI on 2022-05-18:

https://pypi.org/project/thapbi-pict/0.12.1/

Fixes a regression in sample reports where unsequenced samples were missing a blank value field. This was only partially addressed in v0.11.6, and would block the pooling script from completing.

thapbi-pict - THAPBI PICT v0.12.0

Published by peterjc over 2 years ago

Released on PyPI on 2022-04-19:

https://pypi.org/project/thapbi-pict/0.12.0/

Can now use synthetic spike-in controls to automatically raise the fractional abundance threshold. For example, if 99% of a control is recognised as synthetic spike-in sequence, then the fractional abundance threshold can be raised to 1%.
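The arithmetic in this example can be sketched for a single control. The non-spike-in fraction of the control's reads becomes a candidate threshold, which is only ever used to raise the existing one. The function name and the default fraction used in the usage note are illustrative assumptions, not THAPBI PICT's actual code or defaults.

```python
def raised_threshold(default_fraction: float, control_total: int,
                     control_spike_in: int) -> float:
    """Fractional abundance threshold raised using one spike-in control."""
    if control_total == 0:
        return default_fraction  # an empty control gives no information
    # Reads in the control which are NOT spike-ins are treated as noise
    non_spike_fraction = (control_total - control_spike_in) / control_total
    return max(default_fraction, non_spike_fraction)
```

With an illustrative default of 0.001, a control of 10000 reads of which 9900 are spike-ins gives `raised_threshold(0.001, 10000, 9900)` of 0.01, the 1% in the example above.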

The sample reports have been updated to include the number of singletons and the number of accepted unique sequences, in addition to the total number of accepted reads. This required updates to the metadata stored in the intermediate files.

Computation of the edit-graph has been optimised, significantly reducing its runtime. Additionally, a regression in the XGMML output was fixed.

Dependencies, tests, and examples have been updated to use FLASH v1.2.11 and Cutadapt v4.0. This new Cutadapt release gives minor changes in our accepted read counts compared to Cutadapt v3.7.

thapbi-pict - THAPBI PICT v0.11.6

Published by peterjc over 2 years ago

Released on PyPI on 2022-03-09:

https://pypi.org/project/thapbi-pict/0.11.6/

Fixed a regression building summary reports when some samples have not been sequenced.

thapbi-pict - THAPBI PICT v0.11.5

Published by peterjc over 2 years ago

Released on PyPI on 2022-02-18:

https://pypi.org/project/thapbi-pict/0.11.5/

Reporting enhancements when using spike-in (synthetic) controls.

thapbi-pict - THAPBI PICT v0.6.1

Published by peterjc over 2 years ago

This is a belated release on GitHub, with v0.6.1 published on PyPI on 2020-01-08:

https://pypi.org/project/thapbi-pict/0.6.1/

The marker sequences in the default curated Phytophthora ITS1 database were extended to include the leading, normally conserved, 32bp region, which had previously been discarded. This point release requires Python 3.6 or later, as it adopts newer Python language features. The later v0.6.x releases focused on improved reporting.

thapbi-pict - THAPBI PICT v0.11.4

Published by peterjc over 2 years ago

Released on PyPI on 2022-02-08:

https://pypi.org/project/thapbi-pict/0.11.4/

Updates the default DB with a further six Phytophthora species.

thapbi-pict - THAPBI PICT v0.11.3

Published by peterjc over 2 years ago

Released on PyPI on 2022-02-01:

https://pypi.org/project/thapbi-pict/0.11.3/

Fixes the dynamic k-mer threshold for detecting synthetic spike-in control sequences.
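These notes do not describe the threshold itself. Purely as a hypothetical illustration of k-mer based spike-in detection, this sketch counts the k-mers a read shares with each known spike-in and requires that count to be a fixed fraction of the read's own k-mer count. That length-dependent "dynamic" rule is an assumption, as are the function names and the `k` and `min_fraction` defaults.

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def match_spike_in(seq: str, spike_ins: Dict[str, str], k: int = 17,
                   min_fraction: float = 0.5) -> Optional[str]:
    """Name of the best-matching spike-in, or None (hypothetical rule)."""
    query = kmers(seq, k)
    # Assumed dynamic threshold: scales with the query's own k-mer count
    threshold = min_fraction * len(query)
    best_name, best_shared = None, 0
    for name, spike_seq in spike_ins.items():
        shared = len(query & kmers(spike_seq, k))
        if shared >= threshold and shared > best_shared:
            best_name, best_shared = name, shared
    return best_name
```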

thapbi-pict - THAPBI PICT v0.11.2

Published by peterjc almost 3 years ago

Released on PyPI on 2022-01-20:

https://pypi.org/project/thapbi-pict/0.11.2/

Small fixes for use on Windows, and automated continuous integration testing on Windows using AppVeyor.

thapbi-pict - THAPBI PICT v0.11.1

Published by peterjc almost 3 years ago

Released on PyPI on 2022-01-18:

https://pypi.org/project/thapbi-pict/0.11.1/

Switched from the python-Levenshtein library to the RapidFuzz library for Levenshtein edit-distance, which is more easily installed on Windows, and should also be faster.

thapbi-pict - THAPBI PICT v0.11.0

Published by peterjc almost 3 years ago

Released on PyPI on 2022-01-13:

https://pypi.org/project/thapbi-pict/0.11.0/

When used with multiple markers the pipeline now also produces combined reports by pooling the predictions from each marker.
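One simple way to pool per-marker predictions is to take, for each sample, the union of taxa predicted by any marker. The sketch below assumes that rule and uses hypothetical function and marker names; the actual combined reports may present the per-marker results differently.

```python
from collections import defaultdict
from typing import Dict, Set


def pool_predictions(per_marker: Dict[str, Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """per_marker maps marker -> sample -> predicted taxa; returns sample -> union."""
    pooled: Dict[str, Set[str]] = defaultdict(set)
    for predictions in per_marker.values():
        for sample, taxa in predictions.items():
            # A sample with no predictions still gets an (empty) entry
            pooled[sample].update(taxa)
    return dict(pooled)
```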