thapbi-pict

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool

MIT License

thapbi-pict - THAPBI PICT v0.10.6

Published by peterjc almost 3 years ago

Released on PyPI on 2022-01-12:

https://pypi.org/project/thapbi-pict/0.10.6/

This fixes a slow-down in v0.10.0 when using a small database, most noticeable when exploring a large new dataset with a minimal (often ad-hoc) database. It restores the earlier approach of building a cloud of all 1bp edits of the DB entries in memory to speed up comparison to the sample sequences (as long as the DB is not too large, as this becomes memory intensive).
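As a rough illustration of the "cloud of 1bp edits" idea, here is a minimal sketch. The function names (one_bp_edits, build_cloud) and the DNA-only alphabet are assumptions made for this example, not the tool's actual code. Every DB sequence is expanded into all of its single-base substitutions, insertions and deletions, so a sample sequence can be matched with one dictionary lookup.

```python
from typing import Dict, Iterable, Set

ALPHABET = "ACGT"  # assumption: unambiguous DNA only


def one_bp_edits(seq: str) -> Set[str]:
    """Return all sequences one substitution, insertion or deletion away from seq."""
    variants = set()
    for i in range(len(seq)):
        # Deletion of the base at position i
        variants.add(seq[:i] + seq[i + 1:])
        for base in ALPHABET:
            if base != seq[i]:
                # Substitution at position i
                variants.add(seq[:i] + base + seq[i + 1:])
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            # Insertion before position i (or at the end)
            variants.add(seq[:i] + base + seq[i:])
    return variants


def build_cloud(db_seqs: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each DB sequence and each of its 1bp variants back to the DB entries."""
    cloud: Dict[str, Set[str]] = {}
    for ref in db_seqs:
        for variant in one_bp_edits(ref) | {ref}:
            cloud.setdefault(variant, set()).add(ref)
    return cloud


cloud = build_cloud(["ACGT", "ACGA"])
# A sample sequence is then matched with a single lookup:
matches = cloud.get("ACG", set())
```

The cloud holds roughly eight variants per base of every DB entry, which is why this only pays off while the database stays small.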

thapbi-pict - THAPBI PICT v0.10.5

Published by peterjc almost 3 years ago

Released on PyPI on 2021-12-23:

https://pypi.org/project/thapbi-pict/0.10.5/

Default for -f / --abundance-fraction is now 0.001, meaning 0.1%. The percentage based abundance threshold was previously off by default.
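To make the 0.1% figure concrete, the sketch below applies a fractional threshold to per-sequence read counts. It assumes the fraction is taken of the sample's total read count and that it is combined with an absolute minimum count by taking the larger of the two. Both points, and the function names, are assumptions for illustration only.

```python
from typing import Dict


def effective_threshold(total_reads: int, abundance: int, abundance_fraction: float) -> float:
    """Larger of the absolute count and the fraction of the sample total (assumed rule)."""
    return max(float(abundance), total_reads * abundance_fraction)


def filter_counts(
    counts: Dict[str, int], abundance: int = 0, abundance_fraction: float = 0.001
) -> Dict[str, int]:
    """Keep only sequences at or above the effective threshold for this sample."""
    threshold = effective_threshold(sum(counts.values()), abundance, abundance_fraction)
    return {seq: n for seq, n in counts.items() if n >= threshold}


# 100,000 reads in total, so 0.1% is about 100 reads
counts = {"a": 99000, "b": 900, "c": 50, "d": 50}
kept = filter_counts(counts)
```

With a deep sample the fractional threshold dominates, while for a shallow sample an absolute minimum would take over.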

thapbi-pict - THAPBI PICT v0.10.4

Published by peterjc almost 3 years ago

Released on PyPI on 2021-11-24:

https://pypi.org/project/thapbi-pict/0.10.4/

Updates to default curated DB, including newer NCBI taxonomy.

thapbi-pict - THAPBI PICT v0.10.3

Published by peterjc almost 3 years ago

Released on PyPI on 2021-11-19:

https://pypi.org/project/thapbi-pict/0.10.3/

New -f / --abundance-fraction setting, off by default. Particularly useful for experiments where the sequence depth varies dramatically between samples.

thapbi-pict - THAPBI PICT v0.10.2

Published by peterjc almost 3 years ago

Released on PyPI on 2021-11-05:

https://pypi.org/project/thapbi-pict/0.10.2/

Simplifies how the NCBI taxonomy is loaded. Also, unless in lax mode, when importing curated FASTA files any sequences whose species is not in our DB are now retained, but at genus level only.

The default database was rebuilt with these changes, the October 2021 NCBI taxonomy, and some additional newly curated entries.

thapbi-pict - THAPBI PICT v0.10.1

Published by peterjc over 3 years ago

Released on PyPI on 2021-07-28:

https://pypi.org/project/thapbi-pict/0.10.1/

Fixes the classifier code to run under SQLAlchemy v1.3. The previous release was accidentally using a new alias only available in SQLAlchemy v1.4 onwards.

thapbi-pict - THAPBI PICT v0.10.0

Published by peterjc over 3 years ago

Released on PyPI on 2021-07-28:

https://pypi.org/project/thapbi-pict/0.10.0/

This update changes the database schema in a backward compatibility breaking way in order to support multiple primer amplicons, which will be separated based on the primers when calling cutadapt.

It also reworks the distance based classifiers to reduce their memory usage, taking advantage of the changes in v0.9.9 to make a non-redundant FASTA file so that the classifier does not have to repeatedly re-classify the same sequences. This makes it possible to use larger databases.

This has required adding -k / --marker to some of the commands. There were therefore additional minor changes to the command line options, including renaming -k / --spike to -y / --synthetic, and dropping -k as a shorthand for --known. Also the -o / --output and -r / --report arguments were combined into a single output folder or stem setting.

thapbi-pict - THAPBI PICT v0.9.9

Published by peterjc over 3 years ago

Released on PyPI on 2021-07-08:

https://pypi.org/project/thapbi-pict/0.9.9/

Dropped the SWARM based classifiers. This allowed optimising the pipeline to run the classifier on a non-redundant FASTA file containing all the observed sequences, with associated changes to the summary report code etc. to match.
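The benefit of classifying a non-redundant set can be sketched as follows. This is an illustration only: the data layout (a dict of per-sample read counts) and the classify callback are hypothetical stand-ins, not the pipeline's real interfaces. Each distinct sequence is classified once, then the result is mapped back onto every sample where it was seen.

```python
from typing import Callable, Dict


def classify_samples(
    samples: Dict[str, Dict[str, int]], classify: Callable[[str], str]
) -> Dict[str, Dict[str, int]]:
    """Classify each unique sequence once, then tally reads per label per sample."""
    unique = set()
    for counts in samples.values():
        unique.update(counts)
    # One classifier call per distinct sequence, however many samples contain it
    calls = {seq: classify(seq) for seq in sorted(unique)}
    report: Dict[str, Dict[str, int]] = {}
    for name, counts in samples.items():
        totals: Dict[str, int] = {}
        for seq, n in counts.items():
            label = calls[seq]
            totals[label] = totals.get(label, 0) + n
        report[name] = totals
    return report


seen = []  # records every classifier call, to show the saving


def toy_classifier(seq: str) -> str:
    seen.append(seq)
    return "sp_" + seq[0]


samples = {"s1": {"AAA": 10, "CCC": 5}, "s2": {"AAA": 7}}
report = classify_samples(samples, toy_classifier)
```

Here "AAA" occurs in both samples but is classified only once.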

Also optimised the memory footprint of the load-tax command to more than halve the peak RAM requirement in building our default database.

thapbi-pict - THAPBI PICT v0.9.8

Published by peterjc over 3 years ago

Released on PyPI on 2021-06-17:

https://pypi.org/project/thapbi-pict/0.9.8/

Dropped edit-graph from the pipeline. It was rarely required, and computationally expensive.

Require full length primers in merged reads. Avoids some problematic reads, but does discard a small number of usable sequences.

Fixed an issue with the pooling script when some entries were flagged as pending.

thapbi-pict - THAPBI PICT v0.9.7

Published by peterjc over 3 years ago

Released on PyPI on 2021-06-04:

https://pypi.org/project/thapbi-pict/0.9.7/

Now supports the USEARCH SINTAX and OBITools FASTA conventions in the import command.

Dropped support for in-situ intermediate FASTA output. Mixing raw data and intermediate is not a good idea, and this complicated future plans.

Dropped the prepare-reads command option -p / --primers for reporting reads which failed primer matching. Again, this complicated future plans.

Updates to the test suite.

thapbi-pict - THAPBI PICT v0.9.6

Published by peterjc over 3 years ago

Released on PyPI on 2021-05-21:

https://pypi.org/project/thapbi-pict/0.9.6/

Updated the default database, focused on curation to remove uninformative entries:

  • Now using the May 2021 NCBI taxonomy, which moved Phytophthora versiformis out of the unclassified Phytophthora (so we now accept it). However, the curated sequence is shared with Phytophthora castanetorum & Phytophthora quercina.
  • Narrowed the genus level NCBI import to the Peronosporales & Pythiales only, having not seen any matches outside these groups.
  • Left-primer trimmed the NCBI import before looking for the 32bp leader. This removes a leading G in many sequences where the ~32bp leader started TT rather than TTT after the left primer site.
  • Limited the import to sequences up to 450bp only, which dropped some untrimmed sequences lacking a match to the right primer, and the curated Nothophytophthora caduca entry as too long.
  • Ignoring GQ149496/JF916542 as probably not Phytophthora.

Also added a simple distance matrix output format to the edit-graph command.

thapbi-pict - THAPBI PICT v0.9.5

Published by peterjc over 3 years ago

Released on PyPI on 2021-05-10:

https://pypi.org/project/thapbi-pict/0.9.5/

Simplified to just one import command, taking pre-trimmed FASTA input. This means the end user must apply any primer-trimming to their reference set before importing the trimmed sequences into a database.

Also dropped an unused field in the database, which originally held any pre-trimmed sequence. This version will still work with databases created with older versions, but not the other way round.

thapbi-pict - THAPBI PICT v0.9.4

Published by peterjc over 3 years ago

Released on PyPI on 2021-05-05:

https://pypi.org/project/thapbi-pict/0.9.4/

Dropped unused metadata fields from the database schema. This version will still work with databases created with older versions, but not the other way round.

Fixed output of GML format edit graphs.

thapbi-pict - THAPBI PICT v0.9.3

Published by peterjc over 3 years ago

Released on PyPI on 2021-05-04:

https://pypi.org/project/thapbi-pict/0.9.3/

Replaced the simple HMM previously used for spike-in control detection. This is now done via synthetic controls in the database and k-mer counting.

This allows us to drop the dependency on hmmer3, making use on Windows significantly easier in principle.
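The k-mer counting approach to spotting synthetic spike-in sequences can be sketched along these lines. The k-mer size, the minimum number of shared k-mers, and the function names are all hypothetical values chosen for the example. The tool's real parameters and logic are not given in these notes.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def looks_synthetic(
    seq: str, synthetic_controls: Iterable[str], k: int = 8, min_shared: int = 3
) -> bool:
    """Flag seq if enough of its k-mer positions match any synthetic control."""
    control_kmers: Set[str] = set()
    for control in synthetic_controls:
        control_kmers |= kmers(control, k)
    # Count positions in seq whose k-mer is also found in a control
    shared = sum(
        1 for i in range(len(seq) - k + 1) if seq[i:i + k] in control_kmers
    )
    return shared >= min_shared


controls = ["ACGTTGCAAC"]  # toy synthetic control
hit = looks_synthetic("ACGTTGCA", controls, k=4)
miss = looks_synthetic("GGGGGGGG", controls, k=4)
```

Unlike an HMM search, this needs nothing beyond the Python standard library.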

thapbi-pict - THAPBI PICT v0.9.2

Published by peterjc over 3 years ago

Released on PyPI on 2021-04-28:

https://pypi.org/project/thapbi-pict/0.9.2/

Improved test coverage of automatically raising the minimum abundance threshold based on negative controls.
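As an illustration of what raising the threshold from negative controls could look like, the sketch below uses one plausible rule: take the highest single-sequence count seen in any negative control, and use it if it exceeds the default. The rule itself and the name raised_threshold are assumptions for this example, not taken from the tool.

```python
from typing import Dict, Iterable


def raised_threshold(
    default_threshold: int, negative_controls: Iterable[Dict[str, int]]
) -> int:
    """Return the default, or the worst control count if that is higher (assumed rule)."""
    worst = 0
    for counts in negative_controls:
        if counts:
            worst = max(worst, max(counts.values()))
    return max(default_threshold, worst)


# One control is clean enough, the other has a sequence seen 136 times
controls = [{"A": 5}, {"B": 136, "C": 2}]
threshold = raised_threshold(100, controls)
```

Whether a count equal to the raised threshold is then kept or discarded is a separate design choice, left open here.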

Fixed an obscure problem using relative versions of absolute paths.

thapbi-pict - THAPBI PICT v0.9.1

Published by peterjc over 3 years ago

Released on PyPI on 2021-04-20:

https://pypi.org/project/thapbi-pict/0.9.1/

Can now specify the encoding of the metadata TSV file (e.g. "latin1" or "macintosh", for when this does not match the system default).
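Why the encoding matters can be shown with a minimal sketch of reading a TSV file using an explicit encoding. The helper load_metadata is hypothetical and uses only the Python standard library. It is not the tool's own loader.

```python
import csv
from typing import List, Optional


def load_metadata(path: str, encoding: Optional[str] = None) -> List[List[str]]:
    """Read a tab-separated file into rows, decoding with the given encoding.

    With encoding=None, Python falls back to the system default, which is
    where mismatches with files from other platforms tend to show up.
    """
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

For example, a file saved as Latin-1 with accented site names would be read with load_metadata("metadata.tsv", encoding="latin1"), where the filename is a placeholder.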

Adds explicit warnings when using a low abundance threshold.

Also fixed some oversights in the manifest which had left some files out of the recent tar-ball source releases.

thapbi-pict - THAPBI PICT v0.9.0

Published by peterjc over 3 years ago

Released on PyPI on 2021-04-19:

https://pypi.org/project/thapbi-pict/0.9.0/

Dropped use of Trimmomatic, which made the read preparation step slightly faster, and yields slightly higher read counts.

thapbi-pict - THAPBI PICT v0.8.4

Published by peterjc over 3 years ago

Released on PyPI on 2021-04-13:

https://pypi.org/project/thapbi-pict/0.8.4/

Sped up re-running the classifier by delaying method setup until, and only if, it is actually required.

Includes recent work adding the abundance threshold to the reports (unless constant), and the addition of a Python script for pooling the sample report.

thapbi-pict - THAPBI PICT v0.8.1

Published by peterjc over 3 years ago

Released on PyPI on 2021-04-09:

https://pypi.org/project/thapbi-pict/0.8.1/

Simplified the intermediate classifier TSV file by dropping the species list embedded in the header. The assess command now requires a database to provide the list of possible species. There is no change to use of the pipeline command.

thapbi-pict - THAPBI PICT v0.8.0

Published by peterjc over 3 years ago

Released on PyPI on 2021-04-06:

https://pypi.org/project/thapbi-pict/0.8.0/

Revised genus/species columns in sample report. Dropped the genus sub-totals, and instead show 'Genus (unknown species)' as needed. Changed the column sort order to move unknowns to the end. Added a human readable comma separated classification string as a new column.

Shortened (uncertain/ambiguous) to (*) in text report.

Added scripts/ folder with a few helper Python scripts.