Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool
MIT License
Published by peterjc almost 3 years ago
Released on PyPI on 2022-01-12:
https://pypi.org/project/thapbi-pict/0.10.6/
This fixes a slow-down in v0.10.0 when using a small database, most noticeable when exploring a large new dataset with a minimal (often ad hoc) database. It restores the earlier approach of building a cloud of all 1bp edits of the DB entries in memory to speed up comparison with the sample sequences. This is only done when the DB is not too large, as the cloud becomes memory intensive.
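The idea of a cloud of 1bp edits can be illustrated with a minimal Python sketch. This is not the tool's own code: the function names are hypothetical, and it assumes that a 1bp edit means a single substitution, insertion or deletion over the alphabet ACGT.

```python
def one_bp_variants(seq, alphabet="ACGT"):
    """Return all sequences exactly one edit away from seq (illustrative only)."""
    variants = set()
    for i in range(len(seq)):
        # Deletion of the base at position i
        variants.add(seq[:i] + seq[i + 1:])
        # Substitution of the base at position i
        for base in alphabet:
            if base != seq[i]:
                variants.add(seq[:i] + base + seq[i + 1:])
    for i in range(len(seq) + 1):
        # Insertion of a base before position i
        for base in alphabet:
            variants.add(seq[:i] + base + seq[i:])
    return variants


def build_cloud(db):
    """Map each DB sequence and each of its 1bp variants to the matching species.

    db is a hypothetical dict of {sequence: species name}.
    """
    cloud = {}
    for seq, species in db.items():
        cloud.setdefault(seq, set()).add(species)
        for variant in one_bp_variants(seq):
            cloud.setdefault(variant, set()).add(species)
    return cloud


cloud = build_cloud({"ACGT": "Species one"})
print(cloud.get("ACGA"))  # one substitution away from the DB entry
print(cloud.get("TTTT"))  # more than one edit away, so not in the cloud
```

Each sample sequence then needs only a dictionary lookup. The cost is memory: under these assumptions a DB entry of length L contributes at most 8L + 4 variants, which is why the approach only suits smaller databases.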
Published by peterjc almost 3 years ago
Released on PyPI on 2021-12-23:
https://pypi.org/project/thapbi-pict/0.10.5/
Default for -f / --abundance-fraction is now 0.001, meaning 0.1%. The percentage based abundance threshold was previously off by default.
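As a rough illustration of a fractional abundance threshold, here is a minimal Python sketch. It assumes the fraction is taken of each sample's total read count and is combined with an absolute minimum by requiring both; the function name and the absolute value of 100 are illustrative assumptions, not taken from these release notes.

```python
def filter_by_abundance(counts, fraction=0.001, absolute=100):
    """Keep sequences that pass both thresholds (illustrative sketch).

    counts is a hypothetical dict of {sequence: read count} for one sample.
    """
    total = sum(counts.values())
    return {
        seq: n
        for seq, n in counts.items()
        if n >= absolute and n >= total * fraction
    }


# Deep sample: 0.1% of 1,000,000 reads is about 1,000, so 500 reads is dropped.
deep = {"seq1": 900_000, "seq2": 99_500, "seq3": 500}
print(sorted(filter_by_abundance(deep)))

# Shallow sample: the absolute minimum dominates, so 500 reads is kept.
shallow = {"seq1": 9_000, "seq3": 500, "seq4": 50}
print(sorted(filter_by_abundance(shallow)))
```

The same sequence count can therefore pass in a shallow sample and fail in a deep one, which is the behaviour wanted when sequencing depth varies dramatically between samples.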
Published by peterjc almost 3 years ago
Released on PyPI on 2021-11-24:
https://pypi.org/project/thapbi-pict/0.10.4/
Updates to default curated DB, including newer NCBI taxonomy.
Published by peterjc almost 3 years ago
Released on PyPI on 2021-11-19:
https://pypi.org/project/thapbi-pict/0.10.3/
New -f / --abundance-fraction setting, off by default. Particularly useful for experiments where the sequence depth varies dramatically between samples.
Published by peterjc almost 3 years ago
Released on PyPI on 2021-11-05:
https://pypi.org/project/thapbi-pict/0.10.2/
Simplifies how the NCBI taxonomy is loaded. Also, unless in lax mode, sequences in imported curated FASTA files whose species is not in our DB are now retained, but at genus level only.
The default database was rebuilt with these changes, the October 2021 NCBI taxonomy, and some additional newly curated entries.
Published by peterjc over 3 years ago
Released on PyPI on 2021-07-28:
https://pypi.org/project/thapbi-pict/0.10.0/
Fixes the classifier code to run under SQLAlchemy v1.3. The previous release was accidentally using a new alias only available in SQLAlchemy v1.4 onwards.
Published by peterjc over 3 years ago
Released on PyPI on 2021-07-28:
https://pypi.org/project/thapbi-pict/0.10.0/
This update changes the database schema in a backward compatibility breaking way in order to support multiple primer amplicons, which will be separated based on the primers when calling cutadapt.
It also reworks the distance based classifiers to reduce their memory usage, taking advantage of the changes in v0.9.9 to make a non-redundant FASTA file so that the classifier does not have to repeatedly re-classify the same sequences. This makes it possible to use larger databases.
This has required adding -k / --marker to some of the commands. There were therefore additional minor changes to the command line options, including renaming -k / --spike to -y / --synthetic, and dropping -k as a shorthand for --known. Also the -o / --output and -r / --report arguments were combined into a single output folder or stem setting.
Published by peterjc over 3 years ago
Released on PyPI on 2021-07-08:
https://pypi.org/project/thapbi-pict/0.9.9/
Dropped the SWARM based classifiers. This allowed optimising the pipeline to run the classifier on a non-redundant FASTA file containing all the observed sequences - with associated changes to the summary report code etc to match.
Also optimised the memory footprint of the load-tax command to more than halve the peak RAM requirement in building our default database.
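The benefit of classifying a non-redundant set of sequences can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the pipeline's code: the function names and the in-memory dictionaries are hypothetical stand-ins for the FASTA and TSV files the tool actually works with.

```python
from collections import Counter


def pool_unique(samples):
    """Combine per-sample counts into one non-redundant table of sequences.

    samples is a hypothetical dict of {sample name: {sequence: read count}}.
    """
    totals = Counter()
    for counts in samples.values():
        totals.update(counts)
    return totals


def classify_samples(samples, classify):
    """Classify each unique sequence once, then share the result across samples."""
    calls = {seq: classify(seq) for seq in pool_unique(samples)}
    return {
        sample: {seq: calls[seq] for seq in counts}
        for sample, counts in samples.items()
    }


samples = {
    "S1": {"ACGT": 120, "ACGA": 30},
    "S2": {"ACGT": 400},
}
# Stand-in classifier for the example; a real one would consult the database.
results = classify_samples(samples, lambda seq: "hypothetical species")
print(results)
```

Here the sequence ACGT appears in both samples but is classified only once, so the classifier cost scales with the number of unique sequences rather than with the number of samples.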
Published by peterjc over 3 years ago
Released on PyPI on 2021-06-17:
https://pypi.org/project/thapbi-pict/0.9.8/
Dropped edit-graph in pipeline. Rarely required, and computationally expensive.
Require full length primers in merged reads. Avoids some problematic reads, but does discard a small number of usable sequences.
Fixed an issue with the pooling script when some entries were flagged as pending.
Published by peterjc over 3 years ago
Released on PyPI on 2021-06-04:
https://pypi.org/project/thapbi-pict/0.9.7/
Now supports the USEARCH SINTAX and OBITools FASTA conventions in the import command.
Dropped support for in-situ intermediate FASTA output. Mixing raw data and intermediate is not a good idea, and this complicated future plans.
Dropped the prepare-reads command option -p / --primers for reporting reads which failed primer matching. Again, this complicated future plans.
Updates to the test suite.
Published by peterjc over 3 years ago
Released on PyPI on 2021-05-21:
https://pypi.org/project/thapbi-pict/0.9.6/
Updated the default database, focused on curation to remove uninformative entries:
Also added a simple distance matrix output format to the edit-graph command.
Published by peterjc over 3 years ago
Released on PyPI on 2021-05-10:
https://pypi.org/project/thapbi-pict/0.9.5/
Simplified to just one import command, taking pre-trimmed FASTA input. This means the end user must apply any primer-trimming to their reference set before importing the trimmed sequences into a database.
Also dropped an unused field in the database, which was originally any pre-trimmed sequence. This version will still work with databases created with older versions, but not the other way round.
Published by peterjc over 3 years ago
Released on PyPI on 2021-05-05:
https://pypi.org/project/thapbi-pict/0.9.4/
Dropped unused metadata fields from the database schema. This version will still work with databases created with older versions, but not the other way round.
Fixed output of GML format edit graphs.
Published by peterjc over 3 years ago
Released on PyPI on 2021-05-04:
https://pypi.org/project/thapbi-pict/0.9.3/
Replaced the simple HMM previously used for spike-in control detection. This is now done via synthetic controls in the database and k-mer counting.
This allows us to drop the dependency on hmmer3, making use on Windows significantly easier in principle.
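One way k-mer counting can flag reads resembling a synthetic control is sketched below in Python. The release notes do not describe the actual scoring, so the function names, the k-mer length and the idea of comparing the shared fraction against a cut-off are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(read, control_kmers, k):
    """Fraction of the read's distinct k-mers also found in the controls."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & control_kmers) / len(read_kmers)


# Hypothetical synthetic control sequence, indexed once up front.
control = "ACGTTGCAAGGCTTAC"
control_kmers = kmers(control, 4)

# A read with a single mismatch still shares most of its k-mers.
print(shared_fraction("ACGTTGCATGGCTTAC", control_kmers, 4))
# An unrelated read shares none.
print(shared_fraction("GGGGGGGGGG", control_kmers, 4))
```

Unlike an HMM search, this needs only set operations from the standard library, which fits with dropping the external hmmer3 dependency.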
Published by peterjc over 3 years ago
Released on PyPI on 2021-04-28:
https://pypi.org/project/thapbi-pict/0.9.2/
Improved test coverage of automatically raising the minimum abundance threshold based on negative controls.
Fixed an obscure problem using relative versions of absolute paths.
Published by peterjc over 3 years ago
Released on PyPI on 2021-04-20:
https://pypi.org/project/thapbi-pict/0.9.1/
Can now specify the encoding of the metadata TSV file (e.g. "latin1" or "macintosh", for when this does not match the system default).
Adds explicit warnings when using a low abundance threshold.
Also fixed some oversights in the manifest which had left some files out of the recent tar-ball source releases.
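Why the metadata encoding matters can be shown with a short Python sketch using only the standard library. The function name and the example file contents are hypothetical; the point is simply that the same bytes decode differently depending on the encoding passed to open.

```python
import csv


def load_metadata(path, encoding="latin1"):
    """Read a tab-separated metadata file using an explicit text encoding."""
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

For example, a site name such as "Café" saved by a spreadsheet as latin1 would fail to decode, or decode incorrectly, if read with a UTF-8 system default, whereas passing encoding="latin1" recovers it as intended.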
Published by peterjc over 3 years ago
Released on PyPI on 2021-04-19:
https://pypi.org/project/thapbi-pict/0.9.0/
Dropped use of Trimmomatic, which makes the read preparation step slightly faster and yields slightly higher read counts.
Published by peterjc over 3 years ago
Released on PyPI on 2021-04-13:
https://pypi.org/project/thapbi-pict/0.8.4/
Sped up re-running the classifier by delaying method setup until, and only if, it is actually required.
Includes recent work adding the abundance threshold to the reports (unless constant), and the addition of a Python script for pooling the sample report.
Published by peterjc over 3 years ago
Released on PyPI on 2021-04-09:
https://pypi.org/project/thapbi-pict/0.8.1/
Simplified the intermediate classifier TSV file by dropping the species list embedded in the header. The assess command now requires a database to provide the list of possible species. There is no change to use of the pipeline command.
Published by peterjc over 3 years ago
Released on PyPI on 2021-04-06:
https://pypi.org/project/thapbi-pict/0.8.0/
Revised genus/species columns in sample report. Dropped the genus sub-totals, and instead show 'Genus (unknown species)' as needed. Changed the column sort order to move unknowns to the end. Added a human readable comma separated classification string as a new column.
Shortened (uncertain/ambiguous) to (*) in text report.
Added scripts/ folder with a few helper Python scripts.