DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD-3-CLAUSE License
In this release:
call_variants that caused the step to freeze in cases where there were no examples. This bug was observed and reported in https://github.com/google/deepvariant/issues/764, https://github.com/google/deepvariant/issues/769, https://github.com/google/deepsomatic/issues/8.
libssw library from 1.2.4 to 1.2.5.
Published by kishwarshafin 12 months ago
postprocess_variants which reduces 48 minutes to 30 minutes for Illumina WGS and 56 minutes to 33 minutes with PacBio.
We are sincerely grateful to
postprocess_variants.
Published by pichuan over 1 year ago
--model_type ONT_R104 is a new option. Starting from v1.5, DeepVariant natively supports ONT R10.4 simplex and duplex data.
--enable_joint_realignment and --p_error.
Published by pichuan over 2 years ago
insert_size). This reduces errors by 4-10% for Illumina WGS and WES model. Thanks @lucasbrambrink for implementing this feature.
postprocess_variants step by 10-30%. Thanks @moshewagner for optimizing the code.
Published by pichuan almost 3 years ago
call_variants speed for PacBio models (both DeepVariant and DeepTrio) by reducing the default window width from 221 to 199, without tradeoff on accuracy. Thanks to @lucasbrambrink for conducting the experiments to find a better window width for PacBio.
--normalize_reads in make_examples, which normalizes Indel candidates at the reads level. This flag is useful to reduce rare cases where an indel variant is not left-normalized. This feature is mainly relevant to joint calling of large cohorts, or cases where read mappings have been surjected from one reference to another. It is currently set to False by default. To enable it, add --normalize_reads=true directly to the make_examples binary. If you’re using the run_deepvariant one-step approach, add --make_examples_extra_args="normalize_reads=true". Currently we don’t recommend turning this flag on for long reads due to potential runtime increase.
--aux_fields_to_keep flag to the make_examples step, and set the default to only the auxiliary fields that DeepVariant currently uses. This reduces memory use for input BAM files that have large auxiliary fields that aren’t used in variant calling. Thanks to @williamrowell and @rhallPB for reporting this issue.
make_examples as well as call_variants to address the issue reported in https://github.com/google/deepvariant/issues/491.
Published by pichuan about 3 years ago
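The --normalize_reads instructions above can be sketched as a small command builder. This is only an illustration: the input paths and model type are placeholders, the helper names are hypothetical, and the comma-separated key=value form for several extra arguments is an assumption about how run_deepvariant parses --make_examples_extra_args. Nothing is executed; the command is only assembled and printed.

```python
import shlex


def format_extra_args(extra_args):
    """Join flag/value pairs as key=value, separated by commas (assumed format)."""
    return ",".join(f"{name}={value}" for name, value in extra_args.items())


def run_deepvariant_command(base_args, make_examples_extra_args=None):
    """Return the argument list for a one-step run; nothing is executed here."""
    command = ["/opt/deepvariant/bin/run_deepvariant", *base_args]
    if make_examples_extra_args:
        command.append(
            "--make_examples_extra_args="
            + format_extra_args(make_examples_extra_args)
        )
    return command


# Placeholder inputs: these paths and the model type are illustrative only.
base_args = [
    "--model_type=WGS",
    "--ref=/input/ref.fasta",
    "--reads=/input/sample.bam",
    "--output_vcf=/output/sample.vcf.gz",
]
command = run_deepvariant_command(base_args, {"normalize_reads": "true"})
print(shlex.join(command))
```

Building the command as a list and joining it only for display keeps shell quoting out of the picture until the command is actually run.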
The DeepVariant v1.2 release contains the following major improvements:
make_examples better modularizes common components between DeepVariant, DeepTrio, and potential future applications. This enables DeepTrio to inherit improvements such as --add_hp_channel (introduced to the DeepVariant PacBio model in v1.1; see blog), improving DeepTrio’s PacBio accuracy.
Additional detail for improvements in DeepVariant v1.2:
Improvements for training:
Improvements for make_examples:
For more details on flags, run /opt/deepvariant/bin/make_examples --help.
--split_skip_reads flag: if True, make_examples will split reads with large SKIP cigar operations into individual reads. Resulting read parts that are less than 15 bp are filtered out.
--emit_realigned_reads=true --realigner_diagnostics=/output/realigned_reads for make_examples. You will still need to run samtools index to get the index file, but no longer need to sort the BAM.
Improvements for the one-step run_deepvariant:
For more details on flags, run /opt/deepvariant/bin/run_deepvariant --help.
--runtime_report which enables runtime report output to --logging_dir. This makes it easier for users to get the runtime by region report for make_examples.
--dry_run flag is now added for printing out all commands to be executed, without running them. This is mentioned in the Quick Start section.
Published by akolesnikov almost 4 years ago
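The --split_skip_reads behavior described above can be pictured with a short sketch. This is not DeepVariant's implementation: the function name, the (op, length) CIGAR representation, and the min_skip_len threshold for what counts as a "large" skip are all assumptions made for illustration. Only the 15 bp minimum part length comes from the notes, and measuring it in read bases is also an assumption.

```python
# CIGAR operations that consume bases of the read sequence.
READ_CONSUMING_OPS = {"M", "I", "S", "=", "X"}


def split_on_large_skips(cigar, min_skip_len=100, min_part_len=15):
    """Split a read's CIGAR at large N (skip) operations.

    cigar: list of (op, length) tuples, e.g. [("M", 50), ("N", 2000), ("M", 40)].
    min_skip_len: hypothetical cutoff for a "large" skip.
    Returns one CIGAR list per retained read part.
    """
    parts, current = [], []
    for op, length in cigar:
        if op == "N" and length >= min_skip_len:
            # Close the current part; the skip itself is dropped.
            parts.append(current)
            current = []
        else:
            current.append((op, length))
    parts.append(current)

    def read_bases(part):
        return sum(length for op, length in part if op in READ_CONSUMING_OPS)

    # Discard parts shorter than the minimum length.
    return [part for part in parts if read_bases(part) >= min_part_len]


example = [("M", 50), ("N", 2000), ("M", 10), ("N", 500), ("M", 40)]
split_parts = split_on_large_skips(example)
```

In this example the 10-base middle part falls below the 15 bp minimum, so only the 50-base and 40-base parts are kept.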
The v1.1 release introduces DeepTrio, which uses a model specifically trained to call a mother-father-child trio or parent-child duo. DeepTrio has superior accuracy compared to DeepVariant. Pre-trained models are available for Illumina WGS, Illumina exome, and PacBio HiFi.
In addition, DeepVariant v1.1 contains the following improvements:
--add_hp_channel which is enabled by default for PacBio.
New optional flags to increase speed:
A team at Intel has adapted DeepVariant to use the OpenVINO toolkit, which further accelerates TensorFlow applications. This speeds up the call_variants stage by ~25% for any model when run in CPU mode on an Intel machine. DeepVariant runs with OpenVINO have the same accuracy and are nearly identical to runs without. Runs with OpenVINO are fully reproducible on OpenVINO.
To use OpenVINO, add the following flag to the DeepVariant command:
--call_variants_extra_args "use_openvino=true"
We thank Intel for their contribution, and acknowledge the extensive work their team put in, captured in https://github.com/google/deepvariant/pull/363.
Published by pichuan about 4 years ago
DeepVariant v1.0 releases new features and accuracy improvements sufficiently substantial to indicate a major version of v1.0. Compared to DeepVariant v0.10, these changes reduce Illumina WGS errors by 24%, exome errors by 19%, and PacBio errors by 52%.
--alt_aligned_pileup. --alt_aligned_pileup=diff_channels is now default for DeepVariant PacBio model. This substantially improves INDEL accuracy for PacBio data.
--sort_by_haplotypes to optionally allow creating pileup images with reads sorted by haplotype. Haplotype sorting is based on the HP tag that must be present in input BAM, and --parse_sam_aux_fields needs to be set as well. This substantially improves INDEL accuracy for PacBio data.
--sort_by_haplotypes by phasing variants and the input reads. Accuracy metrics for both single pass calling and two-pass calling are shown. Users may choose whether to run a second time for higher accuracy.
--min_mapping_quality in make_examples.py changed from 10 to 5. This improves accuracy of all models (WGS, WES, and PACBIO).
--sequencing_type_image and --custom_pileup_image
--only_keep_pass flag to postprocess_variants.py to optionally only keep PASS calls in output VCF.
binarize function in modelling.py. (https://github.com/google/deepvariant/issues/286 fixed in https://github.com/google/deepvariant/commit/db87d77)
--regions when using run_deepvariant.py. (https://github.com/google/deepvariant/issues/305 fixed in https://github.com/google/deepvariant/commit/fbacd35)
--version to run_deepvariant.py. (https://github.com/google/deepvariant/issues/332 fixed in https://github.com/google/deepvariant/commit/f101492)
--sample_name flag to postprocess_variants.py and applied it in run_deepvariant.py as well. (https://github.com/google/deepvariant/issues/334 fixed in https://github.com/google/deepvariant/commit/a81d629)
Published by pichuan over 4 years ago
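The effect of --only_keep_pass mentioned above can be approximated on an existing VCF with a small filter. This is a sketch of the idea rather than the code in postprocess_variants.py; the function name is hypothetical and the records are made-up sample lines.

```python
def keep_pass_records(vcf_lines):
    """Yield header lines and the records whose FILTER column is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Header and meta-information lines are always kept.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the seventh tab-separated column of a VCF record.
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Made-up records for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
records = [
    "chr1\t100\t.\tA\tG\t45.2\tPASS\t.",
    "chr1\t200\t.\tC\tT\t0.1\tRefCall\t.",
]
kept = list(keep_pass_records([header, *records]))
```

Here only the header and the first record survive, since the second record carries a non-PASS filter value.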
ws_use_window_selector_model by default: This flag was turned on by default in v0.7.0. After the discussion in issue #272, we decided to turn this off to improve consistency and accuracy, at the trade-off of a 7% increase in runtime of the make_examples step.
--make_examples_extra_args "ws_use_window_selector_model=true" to save some runtime at the expense of accuracy.
Published by sgoe1 almost 5 years ago
Full release notes:
New documentation:
Changes to Docker images, code, and models:
Changes to flags:
--sample_name flag to run_deepvariant.py.
vsc_min_fraction_indels to 0.06 for Illumina data (WGS and WES mode) which increases sensitivity.
--reads to take multiple BAMs in a comma-separated list.
--ref for CRAM by default. (Set --use_ref_for_cram to true by default)
--realigner_diagnostics and --emit_realigned_reads flags in realigner.py.
Published by gunjanbaid over 5 years ago
With the v0.8.0 release, we introduce a new DeepVariant model for PacBio CCS data. This model can be run in the same manner as the Illumina WGS and WES models. For more details, see our manuscript with PacBio and our blog post.
This release also includes general improvements to DeepVariant and the Illumina WGS and WES models. These include:
use_ref_for_cram flag below.
New optional flags:
make_examples.py
use_ref_for_cram: --ref will be used as the reference instead. See CRAM support section for more details.
parse_sam_aux_fields and use_original_quality_scores:
min_base_quality:
min_mapping_quality:
call_variants.py
config_string:
num_mappers:
Published by akolesnikov almost 6 years ago
./. instead of 0/0. The threshold is configurable via --cnn_homref_call_min_gq flag in postprocess_variants.py. This improves downstream cohort merging performance based on our internal investigation in the "Improved non-human variant calling using species-specific DeepVariant models" blog.
Published by pichuan almost 6 years ago
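As a rough illustration of the ./. versus 0/0 threshold above, the sketch below assumes the rule is "a homozygous-reference call whose GQ is below the threshold is written as a no-call". That reading is an assumption, since the start of the note is missing here. The function name and the threshold value used in the example are hypothetical, not DeepVariant's defaults.

```python
def genotype_for_output(genotype, gq, min_homref_gq):
    """Return the GT string to write for a call.

    genotype: tuple of allele indices, e.g. (0, 0) or (0, 1).
    gq: genotype quality of the call.
    min_homref_gq: threshold below which hom-ref calls become no-calls
        (stands in for the value passed via --cnn_homref_call_min_gq).
    """
    if genotype == (0, 0) and gq < min_homref_gq:
        return "./."
    return "/".join(str(allele) for allele in genotype)


# Threshold of 20 is an arbitrary value chosen for the example.
low_confidence_ref = genotype_for_output((0, 0), gq=5, min_homref_gq=20)
confident_ref = genotype_for_output((0, 0), gq=30, min_homref_gq=20)
low_confidence_het = genotype_for_output((0, 1), gq=5, min_homref_gq=20)
```

Under this assumption only hom-ref calls are affected; a low-GQ heterozygous call keeps its genotype.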
batch_size in case the users need to change it for the call_variants step.
logging_interval_sec to control how often worker logs are written into Google Cloud Storage.
call_variants: only one call_variants is run on each machine for better performance. This improved the GPU cost and speed.
Published by pichuan about 6 years ago
This release includes numerous performance improvements that collectively reduce the runtime of DeepVariant by about 65%.
A few highlighted changes in this release:
call_variants runtime by more than 3x compared to v0.6.
make_examples which result in significant runtime improvements. For example, make_examples now runs more than 3 times faster in the WGS case study than v0.6.
-ws_use_window_selector_model which is now on by default.
Published by pichuan over 6 years ago
Published by pichuan over 6 years ago
This release has a new WGS model that has major accuracy improvement on PCR+ data. We also released a new WES model that has some minor accuracy improvement.
A few important changes in this release:
Published by cmclean over 6 years ago
This release is a bugfix release for gVCF creation. See https://github.com/google/deepvariant/issues/58 for details.
Published by cmclean over 6 years ago
This release fixes issue #27 and adds support for creating the MIN_DP field in gVCF records.
Published by pichuan over 6 years ago
Release two separate models for calling genome and exome sequencing data. Significant improvement of Indel F1 on exome data.
Provide capability to produce gVCF files as output from DeepVariant [doc]:
gVCF files are required as input for analyses that create a set of variants in a cohort of individuals, such as cohort merging or joint genotyping.
Training data:
All models are trained with a benchmarking-compatible strategy: That is, we never train on any data from the HG002 sample, or from chromosome 20 from any sample.
Whole genome sequencing model:
We used training data from both genome sequencing data as well as exome sequencing data.
In order to increase diversity of training data, we also used the downsample_fraction flag when making training examples.
Whole exome sequencing model:
We started from a trained WGS model as a checkpoint, then continued training only on the WES data above. We also used various downsample fractions for the training data.
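The benchmarking-compatible strategy described above amounts to a simple exclusion rule. The sketch below is one way to express it, not the project's training code: the function name is hypothetical, and accepting both "chr20" and "20" as contig names is an assumption about reference naming.

```python
HELD_OUT_SAMPLE = "HG002"
# Both naming styles are accepted here as an assumption about the reference.
HELD_OUT_CONTIGS = {"chr20", "20"}


def usable_for_training(sample, contig):
    """True if an example from this sample and contig may be used for training."""
    return sample != HELD_OUT_SAMPLE and contig not in HELD_OUT_CONTIGS


# Made-up (sample, contig) pairs for illustration.
candidates = [
    ("HG001", "chr1"),
    ("HG001", "chr20"),
    ("HG002", "chr1"),
]
training_set = [pair for pair in candidates if usable_for_training(*pair)]
```

Keeping the rule in one predicate makes it easy to apply the same holdout when generating examples and when evaluating.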
DeepVariant now provides deterministic output by rounding QUAL field to one digit past the decimal when writing to VCF.
Update the model input data representation from 7 channels to 6.
Add a post-processing step to variant calls to eliminate rare inconsistent haplotypes [description].
Expand the excluded contigs list to include common problematic contigs on GRCh38 [GitHub issue].
It is now possible to run DeepVariant workflows on GCP with pre-emptible GPUs.
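The QUAL rounding mentioned in the list above can be illustrated with a one-line formatter. This is a sketch of the general idea, with a hypothetical function name; it is not taken from DeepVariant's VCF writer.

```python
def format_qual(qual):
    """Format a QUAL value with one digit past the decimal point."""
    return f"{qual:.1f}"


# Two values that differ only by small floating-point noise.
first_run = format_qual(37.4000001)
second_run = format_qual(37.3999999)
```

Both values are written as 37.4, so tiny numerical differences between runs no longer show up in the output file. Values that straddle a rounding boundary could still differ, so rounding reduces this kind of noise rather than ruling it out entirely.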
Published by scott7z almost 7 years ago
This fixes a problem with htslib_gcp_oauth when network access is unavailable.