geNomad: Identification of mobile genetic elements
OTHER License
--min-number-genes parameter to the summary module. This parameter allows users to set the minimum number of genes a sequence must encode to be considered for classification as a plasmid or virus. The default value is 1. When --conservative is used, this parameter is set to 1. When --relaxed is used, this parameter is set to 0. This filter has no effect if the annotate module is not executed.
;). As a result, for genomes that could not be assigned to the family level (the most specific taxonomic rank), there will be trailing semicolons at the end of the lineage string.
annotate module is not executed.
--min-plasmid-marker-enrichment to 0.1.
Published by apcamargo 7 months ago
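The gene-count filter described in the --min-number-genes entry can be sketched as a single predicate. This is a minimal illustration, not geNomad's implementation. The function name passes_gene_count_filter and the preset dictionary are hypothetical. Only the threshold values (1 by default, 1 with --conservative, 0 with --relaxed) and the "no effect without annotate" rule come from the release note.

```python
# Threshold values reported in the release note; the mapping itself is a
# hypothetical convenience, not a geNomad data structure.
MIN_NUMBER_GENES_BY_PRESET = {"default": 1, "conservative": 1, "relaxed": 0}


def passes_gene_count_filter(
    n_genes: int,
    min_number_genes: int = 1,
    annotate_executed: bool = True,
) -> bool:
    """Return True if a sequence encodes at least min_number_genes genes.

    Hypothetical helper illustrating the --min-number-genes filter.
    """
    # Mirrors the note: the filter has no effect if annotate was not run.
    if not annotate_executed:
        return True
    return n_genes >= min_number_genes


# A sequence with no predicted genes is dropped by default but kept when
# the "relaxed" threshold of 0 is used.
print(passes_gene_count_filter(0))
print(passes_gene_count_filter(0, MIN_NUMBER_GENES_BY_PRESET["relaxed"]))
```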
keras version to below 3.0. This prevents errors due to incompatibility with keras >=3.0, such as the shape parameter not accepting an integer as input.
Published by apcamargo 8 months ago
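Code that relies on the keras < 3.0 pin could also guard against an incompatible installation at runtime. The sketch below is an assumption and not part of geNomad. keras_is_compatible and check_installed_keras are hypothetical helpers, and they inspect only the major version number.

```python
from importlib.metadata import PackageNotFoundError, version


def keras_is_compatible(keras_version: str) -> bool:
    """Return True for Keras releases below 3.0 (hypothetical check)."""
    major = int(keras_version.split(".")[0])
    return major < 3


def check_installed_keras() -> None:
    """Raise if the installed Keras distribution is 3.0 or newer."""
    # Query package metadata without importing Keras itself.
    try:
        installed = version("keras")
    except PackageNotFoundError:
        return  # Keras is not installed; nothing to check.
    if not keras_is_compatible(installed):
        raise RuntimeError(
            f"keras {installed} is installed, but this code expects keras < 3.0"
        )
```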
CUDA_VISIBLE_DEVICES environment variable to -1 in nn_classification. This fixes a bug where the nn_classification module would fail to run when a GPU was available and the input had a single sequence.
Published by apcamargo 11 months ago
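In Python, setting the variable might look like the sketch below. It assumes a TensorFlow-backed Keras model, which the keras notes above suggest. The variable is set before the framework is imported, because CUDA reads CUDA_VISIBLE_DEVICES when it initializes. force_cpu_inference is a hypothetical name, not a geNomad function.

```python
import os


def force_cpu_inference() -> None:
    """Hide all CUDA GPUs so that the neural network runs on the CPU."""
    # "-1" matches no valid device index, so CUDA exposes no GPUs at all.
    # Set this before importing TensorFlow/Keras, since CUDA reads the
    # variable when it initializes.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu_inference()
# import tensorflow as tf  # imported afterwards, TensorFlow sees no GPU
```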
find-proviruses module can now properly add integrases to gene tables and extend boundaries using integrase coordinates.
read_fasta.
current_contig at the beginning of _append_aragorn_tsv.
Published by apcamargo 11 months ago
pyrodigal-gv version to 0.3.1. This fixes a bug introduced in 0.3.0 that led to the identification of RBS motifs not reported by Prodigal.
CCGGGG RBS motif from the list of motifs.
Published by apcamargo 11 months ago
CCGGGG RBS motif to the list of motifs.
*) at the end of protein sequences.
pyrodigal-gv version to 0.2.0.
Published by apcamargo about 1 year ago
prodigal-gv with pyrodigal-gv.
Published by apcamargo about 1 year ago
mmseqs search command has been replaced by a two-step alignment workflow. In the first alignment step, --alignment-mode 1 and --max-rejected are used, while the second step uses --alignment-mode 2 and -c 0.2. This change reduces the number of alignments that are rejected for not meeting the minimum coverage cutoff and mitigates the issue where the annotation results change when the input sequence order is altered.
--min-ungapped-score parameter of mmseqs prefilter was increased from 20 to 25.
--max-rejected parameter of the first mmseqs align step was increased from 225 to 280.
Published by apcamargo about 1 year ago
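The two-step workflow can be sketched as a prefilter followed by two mmseqs align calls. This is an illustration built on assumptions. The database names (queryDB, targetDB, prefDB, aln1DB, aln2DB) and the helper name are hypothetical. The choice to feed the first step's results into the second step is also assumed. Only the flags and values come from the release notes. The commands are built as argument lists, for example for subprocess.run, and are not executed here.

```python
from typing import List


def build_two_step_commands(
    query_db: str = "queryDB",
    target_db: str = "targetDB",
    pref_db: str = "prefDB",
    aln1_db: str = "aln1DB",
    aln2_db: str = "aln2DB",
) -> List[List[str]]:
    """Return hypothetical MMseqs2 argument lists: prefilter + two aligns."""
    prefilter = [
        "mmseqs", "prefilter", query_db, target_db, pref_db,
        "--min-ungapped-score", "25",
    ]
    # Step 1: --alignment-mode 1 with a larger --max-rejected budget.
    align_step1 = [
        "mmseqs", "align", query_db, target_db, pref_db, aln1_db,
        "--alignment-mode", "1",
        "--max-rejected", "280",
    ]
    # Step 2: --alignment-mode 2 with the coverage cutoff -c 0.2.
    # Assumption: this step realigns the hits kept by step 1.
    align_step2 = [
        "mmseqs", "align", query_db, target_db, aln1_db, aln2_db,
        "--alignment-mode", "2",
        "-c", "0.2",
    ]
    return [prefilter, align_step1, align_step2]


for cmd in build_two_step_commands():
    # Each list could be passed to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```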
np.warnings with warnings to add compatibility with numpy >= 1.24.
Published by apcamargo about 1 year ago
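This change amounts to calling the standard-library warnings module directly instead of going through NumPy's namespace. A minimal sketch follows. It assumes the goal is to silence NumPy's divide-by-zero RuntimeWarning inside one block, and the function name is hypothetical.

```python
import warnings

import numpy as np


def log_ignoring_zero(values: np.ndarray) -> np.ndarray:
    """Natural log that silences NumPy's divide-by-zero RuntimeWarning."""
    # Use the stdlib warnings module rather than np.warnings, which the
    # release note replaced for compatibility with numpy >= 1.24.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.log(values)


print(log_ignoring_zero(np.array([1.0, 0.0])))
```

Because catch_warnings restores the previous filters on exit, the suppression does not leak into the rest of the program.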
numba (>=0.57) and numpy (>=1.21) version requirements.
casefold for sequence comparison within the Sequence class.
Sequence class that return an instance of Sequence.
console.status to log the deletion of the .tar.gz file during the execution of download-database.
--conservative-taxonomy parameter. This increases the number of viral genomes assigned to a family when executing geNomad with default parameters.
--conservative and --relaxed (e.g. --min_score → --min-score).
Published by apcamargo over 1 year ago
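Case-insensitive comparison with str.casefold can be sketched with a small stand-in class. This is not geNomad's Sequence class. The SequenceSketch name and its fields are hypothetical. Comparing only the sequence, and ignoring the accession, is also an assumption. __hash__ uses the same casefolded form as __eq__ so that objects that compare equal also hash equally (an older release note mentions adding a __hash__ method to Sequence).

```python
class SequenceSketch:
    """Hypothetical stand-in for a nucleotide sequence record."""

    def __init__(self, accession: str, seq: str) -> None:
        self.accession = accession
        self.seq = seq

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SequenceSketch):
            return NotImplemented
        # casefold() makes "acgt" and "ACGT" compare equal.
        # Assumption: only the sequence is compared, not the accession.
        return self.seq.casefold() == other.seq.casefold()

    def __hash__(self) -> int:
        # Hash the same normalized form used by __eq__.
        return hash(self.seq.casefold())


a = SequenceSketch("seq1", "ACGTN")
b = SequenceSketch("seq2", "acgtn")
print(a == b, len({a, b}))
```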
nn-classification.
README.md to the database version 1.3.0.
mmseqs convertalis output the whole sequence header instead of gene accessions. This prevents parsing conflicts with geNomad's other components in cases where MMseqs2 uses its built-in special parsers for specific header formats (e.g. RefSeq).
Published by apcamargo over 1 year ago
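When the full header is written out, the gene identifier has to be recovered downstream. A minimal sketch follows. It assumes the identifier is the first whitespace-delimited token of the header, which is the usual FASTA convention and matches Prodigal-style gene headers ("ID # start # end # strand # attributes"). gene_id_from_header is a hypothetical helper, not geNomad code. Parsing the token directly keeps a RefSeq-style identifier, such as the hypothetical ref|NC_000913.3|_1, intact.

```python
def gene_id_from_header(header: str) -> str:
    """Return the first whitespace-delimited token of a FASTA header."""
    # Tolerate a leading ">" in case the header was copied from a FASTA file.
    return header.lstrip(">").split(maxsplit=1)[0]


print(gene_id_from_header("contig_1_1 # 3 # 1202 # 1 # ID=1_1;partial=00"))
print(gene_id_from_header(">ref|NC_000913.3|_1 # 337 # 2799 # 1"))
```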
--threads parameter to the nn-classification module, which allows controlling the number of threads used to classify sequences with the neural network model.
summary module description.
Published by apcamargo over 1 year ago
--min-score parameter was modified to remove the following sentence: "By default, the sequence is classified as virus/plasmid if its virus/plasmid score is higher than its chromosome score, regardless of the value".
--max-seqs 1000000 --min-ungapped-score 20 --max-rejected 225. As a result, changing --splits won't affect the search results anymore. Thanks @milot-mirdita.
Published by apcamargo over 1 year ago
README.md.
--min-plasmid-hallmarks-short-seqs and --min-virus-hallmarks-short-seqs parameters. These options allow filtering out short sequences (less than 2,500 bp) that don't encode a minimum number of hallmark genes. By default, short sequences need to encode at least one hallmark to be classified as a virus or a plasmid.
--conservative and --relaxed presets that control post-classification filters. The --conservative option makes those filters even more aggressive, resulting in more restricted sets of plasmids and viruses that contain only sequences whose classification is strongly supported. The --relaxed preset disables all post-classification filters.
--min-score from 0.0 to 0.7.
README.md to version 1.4.0. This includes mentions of the --conservative and --relaxed flags and a warning about how changes in --splits can affect geNomad's output.
Published by apcamargo almost 2 years ago
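The short-sequence hallmark filter can be expressed as a single predicate. This is a minimal sketch that uses the defaults stated in the note: sequences shorter than 2,500 bp need at least one hallmark gene. The function and constant names are hypothetical. In geNomad the threshold is set separately for plasmids and viruses through the two parameters above.

```python
SHORT_SEQUENCE_CUTOFF_BP = 2_500  # "short" means fewer than 2,500 bp


def passes_short_seq_hallmark_filter(
    length_bp: int, n_hallmarks: int, min_hallmarks_short_seqs: int = 1
) -> bool:
    """Return True unless a short sequence lacks enough hallmark genes."""
    if length_bp >= SHORT_SEQUENCE_CUTOFF_BP:
        # Sequences of 2,500 bp or more are not affected by this filter.
        return True
    return n_hallmarks >= min_hallmarks_short_seqs


print(passes_short_seq_hallmark_filter(1_800, 0))   # short, no hallmark
print(passes_short_seq_hallmark_filter(10_000, 0))  # long, filter not applied
```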
score-calibration that happened when find-proviruses was executed but no provirus was detected. The module now checks if proviruses were detected (using utils.check_provirus_execution) before counting the total number of sequences.
Published by apcamargo almost 2 years ago
numpy <1.24. Fixes #7, which occurs because numba doesn't support numpy >=1.24 yet.
Published by apcamargo almost 2 years ago
find-proviruses was executed when counting the number of sequences in the score-calibration module (thanks Spencer Diamond for pointing this error out!).
Published by apcamargo almost 2 years ago
No terminal repeats, as Linear can be misleading.
click.rich_click.MAX_WIDTH to None.
--sensitivity to 4.0.
README.md to version 1.3.0.
prog_name in click.version_option.
Published by apcamargo almost 2 years ago
README.md.
--min-plasmid-marker-enrichment, --min-virus-marker-enrichment, --min-plasmid-hallmarks, and --min-virus-hallmarks parameters: "This option will be ignored if the annotation module was not executed".
score_batch_correction. This will shrink the effect of calibration when the empirical composition distribution is very skewed.
--min-score in the README.md example to 0.7.
Published by apcamargo about 2 years ago
Sequence class: add support for str in __eq__.
Sequence class: add a __hash__ method.
marker-classification module.
_plasmid_summary.tsv and _virus_summary.tsv files.
--min-plasmid-marker-enrichment and --min-virus-marker-enrichment to 0 as default. This will alter the results when using default parameters.
_plasmid_summary.tsv. Requires geNomad database v1.1.
Sequence class: simplify has_dtr return statement.
Sequence class: make __repr__ more friendly for long sequences.
Sequence class: rename the id property to accession.
_provirus_aragorn.tsv.
.ubj format.
xgboost >=1.6.
_taxonomy.tsv and _virus_summary.tsv will use Viruses as the highest rank, instead of root.
_plasmid_summary.tsv and _virus_summary.tsv.
fraction to 0.5 in taxopy.find_majority_vote.
summary_execution_info.
DatabaseDownloader.get_version where only the major version was compared.
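The bug in the last entry, where only the major version was compared, can be illustrated with a small sketch. It assumes plain dotted numeric versions such as 1.3.0. parse_version and database_is_outdated are hypothetical names, not the DatabaseDownloader API. Comparing whole tuples checks every component in order, so 1.1.0 is correctly detected as older than 1.3.0 even though both have major version 1.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as "1.3.0" into (1, 3, 0)."""
    return tuple(int(part) for part in version.strip().split("."))


def database_is_outdated(local: str, latest: str) -> bool:
    """Return True if the local version is older than the latest one."""
    # Tuple comparison checks major, then minor, then patch. Comparing
    # only parse_version(...)[0] would miss 1.1.0 -> 1.3.0 updates.
    return parse_version(local) < parse_version(latest)


print(database_is_outdated("1.1.0", "1.3.0"))
print(database_is_outdated("1.3.0", "1.3.0"))
```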