dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5' data
AGPL-3.0 License
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.4...v0.3.5
Published by zktuong 9 months ago
- Updates to `generate_network`.
- Updates to `vdj_pseudobulk` functions - @ktpolanski
- New column in `.data` (`extra`) to flag if a contig is considered extra.
- `VDJ` and `VJ` are now appended to the id to reduce ambiguity - need to check if it does it properly for cells with no clone ids. This also means that now clone ids can be created for orphan chains.
- `to_scirpy`/`from_scirpy` functions will now convert to the new scverse AIRR formats - @amoschoomy
- Updated ogrdb references in both the base package and the container.
- `setup_vdj_pseudobulk()` by @ktpolanski in https://github.com/zktuong/dandelion/pull/334
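As a loose illustration of the VDJ/VJ suffixing mentioned above, a clone id can be disambiguated by the contig's locus. Everything below (function name, locus set) is a hypothetical sketch, not dandelion's implementation:

```python
# Hypothetical sketch (not dandelion's actual code) of appending a chain-type
# suffix to clone ids so that VDJ (heavy/beta/delta) and VJ (light/alpha/gamma)
# contigs of the same clone remain distinguishable.

VDJ_LOCI = {"IGH", "TRB", "TRD"}  # loci that rearrange a D segment

def suffix_clone_id(clone_id: str, locus: str) -> str:
    """Append _VDJ or _VJ depending on the contig's locus."""
    suffix = "VDJ" if locus in VDJ_LOCI else "VJ"
    return f"{clone_id}_{suffix}"

print(suffix_clone_id("clone_1", "IGH"))  # clone_1_VDJ
print(suffix_clone_id("clone_1", "IGK"))  # clone_1_VJ
```

Because the suffix depends only on the locus, even an orphan chain (a cell with no paired contig) can receive an unambiguous id under this scheme.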
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.3...v0.3.4
Published by zktuong about 1 year ago
Updates to `tl.clone_overlap` and `pl.clone_overlap`.
Detailed notes:
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.2...v0.3.3
Published by zktuong over 1 year ago
Mainly to fix compatibility with dependencies.
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.1...v0.3.2
Published by zktuong over 1 year ago
Just to update PyPI - some bug fixes to accompany the revision.
Doesn't affect the container image (but I should add a tag on Sylabs to also call it 0.3.1, just to be consistent).
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.0...v0.3.1
Published by zktuong almost 2 years ago
This release adds a number of new features and minor restructuring to accompany Dandelion's manuscript (uploading soon). Kudos to @suochenqu and @ktpolanski!
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.4...v0.3.0
Published by zktuong over 2 years ago
The `Dandelion` object can now be sliced like an `AnnData` or pandas `DataFrame` object!
```python
>>> vdj[vdj.data['productive'] == 'T']
Dandelion class object with n_obs = 38 and n_contigs = 94
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

>>> vdj[vdj.metadata['productive_VDJ'] == 'T']
Dandelion class object with n_obs = 17 and n_contigs = 36
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

>>> vdj[vdj.metadata_names.isin(['cell1', 'cell2', 'cell3', 'cell4', 'cell5'])]
Dandelion class object with n_obs = 5 and n_contigs = 20
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'

>>> vdj[vdj.data_names.isin(['contig1','contig2','contig3','contig4','contig5'])]
Dandelion class object with n_obs = 2 and n_contigs = 5
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
```
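To picture how such synchronized slicing can work: contigs carry a `cell_id` linking them to cells, so subsetting one table implies subsetting the other. A stdlib-only sketch of that idea (illustrative; the real `Dandelion.__getitem__` operates on pandas DataFrames and does far more bookkeeping):

```python
# Toy illustration of slicing that keeps the contig table ("data") and the
# cell table ("metadata") in sync. Plain dicts stand in for the pandas
# DataFrames that the real Dandelion object uses.

contigs = [
    {"sequence_id": "contig1", "cell_id": "cell1", "productive": "T"},
    {"sequence_id": "contig2", "cell_id": "cell1", "productive": "F"},
    {"sequence_id": "contig3", "cell_id": "cell2", "productive": "T"},
]

def slice_by_cells(contigs, keep_cells):
    """Selecting cells keeps only contigs belonging to those cells."""
    return [row for row in contigs if row["cell_id"] in keep_cells]

def slice_by_contigs(contigs, keep_contigs):
    """Selecting contigs derives the surviving cells from what is left."""
    kept = [row for row in contigs if row["sequence_id"] in keep_contigs]
    cells = sorted({row["cell_id"] for row in kept})
    return kept, cells

kept, cells = slice_by_contigs(contigs, {"contig1", "contig2"})
print(len(kept), cells)  # 2 ['cell1']
```

This mirrors the behaviour in the last example above, where selecting 5 contigs leaves only the 2 cells that own them.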
Column slicing like `adata[:, adata.var.something]` is not supported - would it make sense, as it's not really row information in the data slot? The 'row' in `Dandelion` is `.data`, and it doesn't make sense for `.metadata` to be the 'row'.
- New function `ddl.pp.check_contigs`, as a way to just check if contigs are ambiguous rather than outright removing them. I envisage that this will eventually replace `simple` mode in `ddl.pp.filter_contigs` in the future.
- New column in `.data`: `ambiguous`, T/F to indicate whether a contig is considered ambiguous or not (different from cell-level ambiguous). `.metadata` and several other functions ignore any contigs marked as `T` to maintain the same behaviour.
- The difference between `ddl.pp.check_contigs` and `ddl.pp.filter_contigs` is that with `check_contigs` the onus is on the user to remove any 'bad' cells from the GEX data (illustrated in the tutorial), whereas this happens semi-automatically with `filter_contigs`.
- `ddl.update_metadata` now comes with a 'by_celltype' option: a new `retrieve_celltype` subfunction in the `Query` class breaks up the retrieval into the 3 major groups if `by_celltype = True`. This reduces `.obs` bloating.
- Removed `constant_status_VDJ`, `constant_status_VJ`, `productive_status_VDJ` and `productive_status_VJ`, as the metadata was getting bloated with the slight rework of the `Dandelion` metadata slot to account for the new B/abT/gdT columns.
- New `tl.productive_ratio` and `pl.productive_ratio`.
- New `tl.vj_usage_pca`: uses `scanpy.pp.pca` internally; plot with `scanpy.pl.pca`.
- `ddl.pp.filter_contigs`: removed the `filter_vj_chains` argument and replaced it with `filter_extra_vdj_chains` and `filter_extra_vj_chains` to hopefully enable more interpretable behaviour. Fixes #158.
- `ddl.pp.check_contigs`: `rearrangement_status_VDJ` and `rearrangement_status_VJ` (renamed from `rearrangement_VDJ_status` and `rearrangement_VJ_status`) now give a single value for whether a chimeric rearrangement occurred, e.g. TRDV pairing with TRAJ and TRAC as in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4267242/
- Fixed `ddl.tl.find_clones` crashing if more than 1 type of loci is found in the data. A `B`, `abT` or `gdT` prefix will be appended to BCR/TR-ab/TR-gd clones.
- New metadata column `chain_status` to summarise the reworked `locus_status` column: `ambiguous`, `Orphan VDJ`, `Single pair` etc., similar to `chain_pairing` in scirpy.
- `ddl.concat` now allows for custom suffix/prefix - only operates on `sequence_id`.
- Removed `edges` from the `Dandelion` class because this doesn't get used anywhere and it's also stored in the `networkx` graphs.
- Now uses `networkx` directly so that I don't have to keep changing the adjacency matrices from `pandas` to `networkx` back and forth.

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.2...v0.2.4
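The `chain_status` summary can be pictured as a small classifier over a cell's VDJ/VJ contig counts. The rules below are a guess inferred from the labels mentioned in the notes (`Single pair`, `Orphan VDJ`, etc.), not dandelion's actual logic:

```python
# Illustrative guess at the spirit of chain_status (NOT dandelion's actual
# rules): classify a cell from its counts of usable VDJ and VJ contigs,
# i.e. contigs not flagged as ambiguous by check_contigs.

def chain_status(n_vdj: int, n_vj: int) -> str:
    if n_vdj == 0 and n_vj == 0:
        return "No_contig"
    if n_vdj == 1 and n_vj == 1:
        return "Single pair"
    if n_vj == 0:
        return "Orphan VDJ"
    if n_vdj == 0:
        return "Orphan VJ"
    return "Extra pair"  # more contigs than a simple 1:1 pairing

print(chain_status(1, 1))  # Single pair
print(chain_status(1, 0))  # Orphan VDJ
print(chain_status(2, 1))  # Extra pair
```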
Published by zktuong over 2 years ago
Same as v0.2.2, but I seem to have messed up the upload to PyPI, so trying again.
- Overhauled `generate_network`. The `.distance` slot is removed and is now directly stored/converted from the `.graph` slot.
- New option `compute_layout: bool = True`. If the dataset is too large, `compute_layout` can be switched to `False`, in which case only the `networkx` graph is returned. The data can still be visualised later with scirpy's plotting method (see below).
- New option `layout_method: Literal['sfdp', 'mod_fr'] = 'sfdp'`. The new default uses the ultra-fast C++-implemented `sfdp_layout` algorithm in `graph-tool` to generate the final layout. `sfdp` stands for Scalable Force Directed Placement. Adjusting `gamma` alone doesn't really seem to do much.
- The old behaviour is available with `layout_method = 'mod_fr'`. Requires a separate installation of `graph-tool` via conda (not managed by pip) as it has several C++ dependencies.
- `generate_network` should be run last.
- `min_size` was doing the opposite previously and this is now fixed. #155
- `transfer` now includes information that `scirpy` can use to generate the plots: https://github.com/scverse/scirpy/issues/286
- Renamed `productive` to `productive_status`.
- Reworked `filter_contigs` and `initialise_metadata`. `Dandelion` should now initialise and read faster.
- `load_data` will rename `umi_count` to `duplicate_count`.
- `Query`: `Dandelion` will be ordered based on productive first, then followed by umi count (largest to smallest).
- `initialise_metadata`/`update_metadata`/`Dandelion`: removed columns in `.metadata` which were probably bloated and not used:
  - `vdj_status` and `vdj_status_summary` removed and replaced with `rearrangement_VDJ_status` and `rearrangement_VJ_status`.
  - `constant_status` and `constant_summary` removed and replaced with `constant_VDJ_status` and `constant_VJ_status`.
  - `productive` and `productive_summary` combined and replaced with `productive_status`.
  - `locus_status` and `locus_status_summary` combined and replaced with `locus_status`.
  - `isotype_summary` replaced with `isotype_status`.
- `unassigned` or `''` has been changed to the string `'None'` in `.metadata` - not `NoneType`, as there's quite a bit of text processing internally that gets messed up if swapped. `No_contig` will still be populated after transfer to `AnnData` to reflect cells with no TCR/BCR info.
- Deprecation warning for `read_h5`/`write_h5`. Use of `read_h5ddl`/`write_h5ddl` will be enforced in the next update.

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.1...v0.2.2
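The `read_h5`/`write_h5` deprecation follows the usual Python pattern: warn in the old entry point while delegating to the new one. A generic sketch (only the function names come from the notes; the bodies here are invented stand-ins):

```python
import warnings

def read_h5ddl(path: str) -> str:
    # stand-in for the real reader; just echoes the path here
    return f"loaded {path}"

def read_h5(path: str) -> str:
    """Deprecated alias kept for one release cycle before removal."""
    warnings.warn(
        "read_h5 is deprecated, use read_h5ddl instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return read_h5ddl(path)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = read_h5("demo.h5ddl")

print(result)  # loaded demo.h5ddl
print(caught[0].category.__name__)  # DeprecationWarning
```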
Published by zktuong over 2 years ago
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.0...v0.2.1
Published by zktuong over 2 years ago
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.1.12...v0.2.0
Published by zktuong over 2 years ago
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.1.11...v0.1.12
Published by zktuong almost 3 years ago
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.1.10...v0.1.11
Published by zktuong about 3 years ago
Fix minor bug in TCR preprocessing.
Fix documentation building script.
Add logging in singularity container.
Also testing an automated build workflow for creating the singularity container, which seems to be working:
https://github.com/zktuong/github-ci
Added support statement in readme.
Published by zktuong about 3 years ago
Fixed `Query` to return numerical queries properly.
Published by zktuong about 3 years ago
- Reworked `filter_contigs`/`FilterContigs`/`FilterContigLite`. Solves #92, where `duplicate_counts` were not adding up.
- `Query`: the `__init__` method preloads the required fields as a tree, and a separate `retrieve` method accesses the dictionaries much faster. Same method as above.
- `read_10x_vdj`: reworked `parse_annotation`, which was slowing everything down.
- `sanitize_data`: uses `airr.RearrangementSchema` to match the dtype to deal with missing values. Also sped up some steps to make the validation faster.
- New `write_airr` function that basically calls `airr.create_rearrangement`.
- `quantify_mutations` should hopefully return the right dtypes now.
- `filter_contigs` can now run without `anndata`.
- Updated `dandelion_preprocessing.py` to let it run `quantify_mutations` based on the args.
- Added `file_prefix`.
Published by zktuong about 3 years ago
- Reworked `retrieve_metadata`, which is now a new internal `Query` class.
- Allows `ig` or `tr` to be specified.
Published by zktuong about 3 years ago
Published by zktuong about 3 years ago
Added a series of functions to guess the best format for locus as the previous try-except statements were not working.
Added a few more tests - now code coverage is 70%!
Added `sanitize_dtype` function so that saving and conversion of format with R/h5/h5ad works.
Changed bool to str where possible so as not to interfere with saving in h5ad with h5py>=3.0.0.
Minor annotation update: using `Optional` instead of `Union[None, whatever]`.
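This annotation change is purely cosmetic, since `Optional[X]` is literally defined as `Union[X, None]`:

```python
from typing import Optional, Union

# Optional[X] is shorthand for Union[X, None], so the two spellings are
# interchangeable at runtime and for type checkers; argument order in a
# Union does not matter either.
print(Optional[str] == Union[str, None])     # True
print(Union[str, None] == Union[None, str])  # True
```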
Fix broken links in docs.
Bug fixes for compatibility with mouse data (#70) and TCR data (#81).
Some bug fixes to container script.
Updated container definition:
- Setting `SINGULARITY_ENVIRONMENT` first helps with #79.
- Added pytest suite within the container. Now I can quickly test as follows:
```shell
# devel and testing
sudo singularity build --notest sc-dandelion_dev.sif sc-dandelion_dev.def
sudo singularity test --writable-tmpfs sc-dandelion_dev.sif

# for release
sudo singularity build --notest sc-dandelion.sif sc-dandelion.def
sudo singularity test --writable-tmpfs sc-dandelion.sif
singularity sign sc-dandelion.sif
singularity verify sc-dandelion.sif
singularity push sc-dandelion.sif library://kt16/default/sc-dandelion
```
The test requires root access, otherwise it wouldn't be able to write into the container. Should look into whether sandbox/fakeroot modes work.
Interactive shell sessions of the container should now work with `singularity shell --writable-tmpfs sc-dandelion.sif`.
#62 Streamline update_metadata
#63 Fix update_metadata to work with concat.
#64 Allow retrieve to work both ways
#68 Native implementation of function to count mutation
#69 Rescue contigs that fail germline reconstruction?
Published by zktuong over 3 years ago
Multiple bug fixes.
Reworked `filter_contigs`:
- New simple mode in `filter_contigs` where it just checks for v/j/c gene call mis-match. Toggled with `simple = True`.
- Replaced the `rescue_igh` option with a `keep_highest_umi` option, which applies to all loci.
Updated the isotype dictionary to allow for mouse genes, which seems to address #70.
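The `simple = True` mode's v/j/c mis-match check can be pictured as testing whether a contig's gene calls agree on locus. A rough sketch (illustrative only; the real check handles multi-calls, missing values, etc.):

```python
# Illustrative sketch (not dandelion's code) of a "simple" v/j/c consistency
# check: flag a contig whose gene calls disagree on locus, taking the locus
# as the first three characters of each call (e.g. IGH, IGK, TRB).

def locus_mismatch(v_call: str, j_call: str, c_call: str) -> bool:
    loci = {call[:3] for call in (v_call, j_call, c_call) if call}
    return len(loci) > 1

print(locus_mismatch("IGHV1-2", "IGHJ4", "IGHM"))  # False - consistent IGH
print(locus_mismatch("IGHV1-2", "IGKJ1", "IGHM"))  # True  - IGH/IGK mismatch
```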
Added more tests.
Updated `ddl.pp.calculate_threshold` to reflect shazam's updated `distToNearest` functionality.
Added sanitization functions to check that data stored in dandelion is relatively compliant with airr-standards (barring missing columns from 10x's data or scirpy's transferred data).
Updated transfer of boolean columns to anndata to be stored as string rather than category during `filter_contigs`.
Updated h5py requirement to be >=3.1.0. 2.10.0 should still work though. This led to some updates to how AnnData was storing info from dandelion after filter_contigs. Should update the tutorial in the next version.
TODO before merging:
Ongoing:
Slight issue with integration with scirpy https://github.com/icbi-lab/scirpy/pull/283, but should be solved. Will edit tests when the PR is merged.
Need to create larger fixtures to get access to some steps within the functions.
Also need to test a mouse fixture. Maybe I should merge a large mouse fixture?