dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5' data
AGPL-3.0 License
Published by zktuong over 3 years ago
Added `locus` options to tools that use the `Dandelion` class.
Added the coverage bit, which required moving the test folder within dandelion.
Bug fix for singularity container and script.
Published by zktuong over 3 years ago
Updates
- Column names now follow how `scirpy` names the columns. The same goes for mentions of BCR, which is renamed to contig where appropriate, e.g. `filter_bcr` to `filter_contigs`.
- Added a `filename_prefix` option to better control the behaviour during the preprocessing step.
- `filter_contigs` has been partially reworked - it no longer requires a multi-core implementation, as the new implementation now runs faster without it.
- `umi_count` is now treated as a backup to `duplicate_count` if `duplicate_count` is modified by `filter_contigs`.
- Added a `locus` option to `filter_contigs` so that it can work with the new implementation of the data class.
- Added a `tr` pre-processing mode. Kudos to Krzysztof #80.

Bug Fix
- Made the `umi_count` vs `duplicate_count` behaviour in `filter_contigs` more consistent. `duplicate_count` is now the default column.

Ongoing
- Command killed due to excessive memory consumption involving `scirpy`. Also need to write a native 10x data parser to reduce reliance on `scirpy`.
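The `umi_count` fallback described above amounts to logic along these lines. This is a minimal pure-Python sketch of the idea, not dandelion's actual code; the dict-based contig record and the `resolve_count` name are illustrative assumptions:

```python
def resolve_count(contig):
    """Pick the count column for a contig record.

    duplicate_count is the default column; umi_count is kept as a
    backup for when duplicate_count has been modified (e.g. blanked
    out) during filter_contigs.
    """
    dup = contig.get("duplicate_count")
    if dup is not None:
        return dup
    return contig.get("umi_count")

# duplicate_count wins when both are present
resolve_count({"duplicate_count": 5, "umi_count": 3})
```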
Known issues
- If `~/.bash_rc` is present and contains conda initialization code, the default conda path will be appended to the front of the container's `$PATH`. This impacts users who install `igblast` and `blast` via conda and also want to use the container: the container will try to use the blast installation outside the container first, which may then lead to issues where it cannot find the database files.
- `--no-home` doesn't solve the issue completely, as `scanpy` requires a writeable `numba_cache_dir`. Currently the container will hopefully try to create a `$PWD/dandelion_cache` folder if this is an issue, but this needs more testing.

#54 Check `productive_only` option in `filter_bcr`
#62 Streamline `update_metadata`
#63 Fix `update_metadata` to work with concat.
#64 Allow `retrieve` to work both ways
#68 Native implementation of function to count mutation
#69 Rescue contigs that fail germline reconstruction?
#70 Check for compatibility with mouse data
Published by zktuong over 3 years ago
A new name for v0.1.1.post1 to fix pypi issue.
Published by zktuong over 3 years ago
Now uses `setuptools_scm` to pull the version and predict the next update version number:

```bash
git tag -a v0.1.1
git push --tags
```
filter_bcr
A minor edit, but decided to switch the fold-change cut-off back to 2 to fit with the original filtering strategy.
setting up container
Related to #51, to create a preprocessing wrapper, we will add an option to export the plots generated during pre-processing.
Simple addition of a boolean `save_plot` option to:
- `reassign_allele`
- `reassign_alleles`
- `assign_isotype`
- `assign_isotypes`
#51 Preprocessing wrapper for singularity container
#54 Check `productive_only` option in `filter_bcr`
#62 Streamline `update_metadata`
#63 Fix `update_metadata` to work with concat.
#64 Allow `retrieve` to work both ways
#68 Native implementation of function to count mutation
#69 Rescue contigs that fail germline reconstruction?
#70 Check for compatibility with mouse data
scirpy interoperability
Now fully works with scirpy's tool to transfer the data format.
singularity container
Singularity container recipe/image created.

```bash
ls
# database environment.yml ncbi-blast-2.10.1+ ncbi-igblast-1.15.0 sc-dandelion.def
singularity build --fakeroot sc-dandelion.sif sc-dandelion.def
singularity sign sc-dandelion.sif
singularity verify sc-dandelion.sif
singularity push sc-dandelion.sif library://kt16/default/sc-dandelion:latest
```

To download and use:

```bash
singularity pull library://kt16/default/sc-dandelion:latest
singularity shell sc-dandelion.sif
```
- **pre-processing plots saving**
Meant to be used primarily for the container, but now specifying `save_plot` in `reassign_alleles` and `assign_isotypes` will save the plots from pre-processing accordingly.
## Deprecated:
I should start using deprecation decorators, but only one major change: `reassign_alleles_` is no longer used/available.
Published by zktuong over 3 years ago
Github actions - tests
Fixed issue with tests failing. Mostly resolved.
find_clones and generate_network
Bugs were introduced in v0.028 by the rework of dandelion initialization: some clones were being excessively split (multiple '|' separators), the IgM/IgD catcher was skipping a few steps unnecessarily, and networks were not generating properly due to incorrect index referencing. This has now been corrected. Switched back to using squareform + pdist for calculating distances to simplify the code. May revisit in the future.
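The distance step can be illustrated in pure Python. Dandelion itself uses scipy's `pdist` + `squareform`; the toy junction sequences and the hamming metric below are illustrative assumptions, chosen only to show what a condensed-then-square distance computation produces:

```python
from itertools import combinations

def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def pairwise_distance_matrix(seqs):
    """Full symmetric distance matrix, mimicking pdist + squareform."""
    n = len(seqs)
    mat = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = hamming(seqs[i], seqs[j])
        mat[i][j] = mat[j][i] = d  # mirror across the diagonal
    return mat

# toy junction sequences (illustrative only)
junctions = ["TGTGCGAGAGAT", "TGTGCGAGAGAA", "TGTGCGAGCGAT"]
dist = pairwise_distance_matrix(junctions)
```

A clone-calling step can then threshold `dist` to group similar junctions.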
filter_bcr
Similar issue as above: contigs with multiple IgH were not flagged properly, which resulted in extremely noisy data. This is now fixed and the IgM|IgD catcher is actually working properly now! Also added a `productive_only` toggle to only retain BCRs that are determined to be productive, giving the user the flexibility to change this. Perhaps I should add the same toggle to all other functions so that the contigs can pass through and be flexibly used in subsequent steps?
update_metadata
Found some typos.
quantify_mutation
Fixed an issue where the original code wasn't allowing the R objects to be parsed properly when specifying non-NULL arguments.
miscellaneous
Updated code to prevent syntax warnings.
annotation
All main functions visible to the user are annotated. This can be improved further for clarity in the next version update.
plotting issues
There's an issue in `reassign_alleles` where plotting sometimes fails. I've added a try-except step to try and circumvent the error while I search for what it actually is.
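Conceptually, the try-except guard is a wrapper along these lines. This is a hypothetical sketch, not the actual code inside `reassign_alleles`; the `safe_plot` name is made up for illustration:

```python
def safe_plot(plot_fn, *args, **kwargs):
    """Run a plotting function without letting a failure stop preprocessing."""
    try:
        plot_fn(*args, **kwargs)
    except Exception as exc:  # deliberately broad while the root cause is unknown
        print(f"Plotting failed, continuing without the plot: {exc}")

# a failing plot no longer aborts the run
safe_plot(lambda: 1 / 0)
```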
metadata info
I think a column is needed to deal with multi-chain flagging properly. It's currently not ideal, as IgM/IgD cells can be flagged as multi, and calling them single isn't the right solution.
Published by zktuong almost 4 years ago
Prevent `transfer` from overwriting `anndata.obs` columns
This was causing issues if the column name already existed. Changed this to now respect the existing column in `anndata.obs` first.
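The respect-existing-columns behaviour boils down to something like the following sketch, using a plain dict in place of `anndata.obs`; `transfer_columns` is a hypothetical name, not dandelion's actual function:

```python
def transfer_columns(obs, new_cols):
    """Copy new columns into obs, but respect columns that already exist."""
    for key, values in new_cols.items():
        if key in obs:  # column already exists: leave it untouched
            continue
        obs[key] = values
    return obs

obs = {"leiden": [0, 1]}
out = transfer_columns(obs, {"leiden": [9, 9], "clone_id": ["A", "B"]})
# "leiden" keeps its original values; only "clone_id" is added
```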
Slight adjustment to calculation of gini indices
Added a single zero to the end of each sorted array if the array length is longer than 1, so that the Lorenz curve starts from 0. This effectively solves the problem where the gini indices were originally being returned as negative values. Also added some description in the relevant locations to explain why the tabulation for clone/node degree/centrality is different compared to cluster size.
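The zero-anchoring trick can be sketched as follows. This is a minimal illustration of the idea (add a zero element so the Lorenz curve is anchored at the origin, then compute the Gini index by trapezoidal area), not dandelion's actual implementation:

```python
def gini_index(values):
    """Gini index with the Lorenz curve anchored at 0.

    Adding a zero to the sorted values (for arrays longer than 1)
    forces the Lorenz curve to start at the origin, avoiding the
    spurious negative indices described above.
    """
    vals = sorted(values)
    if len(vals) > 1:
        vals = [0] + vals  # anchor the Lorenz curve at 0
    n = len(vals)
    total = sum(vals)
    if total == 0:
        return 0.0
    cum = 0.0        # cumulative share of the total (Lorenz curve height)
    area = 0.0       # area under the Lorenz curve
    prev = 0.0
    for v in vals:
        cum += v / total
        area += (prev + cum) / 2.0  # trapezoid between successive points
        prev = cum
    area /= n
    return 1.0 - 2.0 * area
```

With the anchor in place the index is always non-negative, e.g. a single-element array gives exactly 0.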
Fixed downsampling and metric options
Some parts were ignoring the options because I forgot to update them in the main function within functions.
Fix rpy2 dependency to be <3.3.5 until I find out what's wrong
Recently updated my Mac's R to version 4, and rpy2 >3.3.5 really didn't like it. Not sure if this will be fixed in 3.4?
Vertex size gini calculation
Finally had a crack at implementing this method. This proved to be a challenge, as a native implementation with networkx's node contraction tools was fine for sparse clones but really struggled in highly connected samples. The workaround is a simple counter, but having access to the graph would be ideal. Anyway, I will have to think about this more in the future, and about whether I should remake the clone size gini to use a similar implementation to reflect the network more. Currently this involves the reconstruction of networks, which is quite time consuming, especially if the sample is large.
cluster size gini calculation
This is an attempt to also perform the gini calculation after network contraction. However, it also revealed a problem: if the sample is not deeply sampled, the gini index will not be reflected appropriately. For now, an option is provided to choose whether or not to use the contracted network.
clone_diversity gini calculation for anndata
No longer possible, as the function will now absolutely require the network to be present first, i.e. it requires a Dandelion object.
Published by zktuong almost 4 years ago
Many bug fixes.
Overhauled preprocessing steps to allow for more flexibility with barcode naming, and for the reannotation strategy to be more in line with immcantation's recommendations.
New functions to plot clonal overlap as a circos plot, which requires separate installation of nxviz. Didn't put nxviz in the requirements/setup as there are a few conflicts with matplotlib and others during installation.
Published by zktuong about 4 years ago
Many bug fixes.
Smoothed out the initial preprocessing functions to allow for more flexible input options.
Allows the package to be reticulated (somewhat).
Published by zktuong about 4 years ago
Major bug fixes.
Updated to work with AnnData>=7.1.0
Sped up creation of networks and filtering of BCRs by allowing for parallelization.
Changed dependency on igraph to networkx.
Changed default behaviour for filter_bcr to drop contigs instead of filtering barcodes for those that only contain poor quality BCR data.
Added diversity estimation tools.
Published by zktuong over 4 years ago
First release to mark a stable version that contains the important functions for preprocessing, generation of networks, and integration with scanpy.