dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5' data
AGPL-3.0 License
Published by zktuong over 3 years ago
Added `locus` options to tools that use the `Dandelion` class.
Added the coverage bit, which required moving the test folder within dandelion.
Bug fix for singularity container and script.
Published by zktuong over 3 years ago
Updates
- Column names now follow how `scirpy` names the columns. The same goes for mentions of BCR, which is renamed to contig where appropriate, e.g. `filter_bcr` to `filter_contigs`.
- Added a `filename_prefix` option to better control the behaviour during the preprocessing step.
- `filter_contigs` has been partially reworked - it no longer requires a multi-core implementation, as the new implementation now runs faster without it.
- `umi_count` is now treated as a backup to `duplicate_count` if `duplicate_count` is modified by `filter_contigs`.
- Added a `locus` option to `filter_contigs` so that it can work with the new implementation of the data class.
- Added a `tr` pre-processing mode. Kudos to Krzysztof #80.

Bug Fix
- Made the `umi_count` vs `duplicate_count` behaviour in `filter_contigs` more consistent. `duplicate_count` is now the default column.

Ongoing
- Command killed due to excessive memory consumption involving `scirpy`. Also need to write a native 10x data parser to reduce reliance on `scirpy`.
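The `umi_count` fallback described above amounts to logic along these lines. This is a minimal pure-Python sketch of the idea, not dandelion's actual code; the dict-based contig record and the `resolve_count` name are illustrative assumptions:

```python
def resolve_count(contig):
    """Pick the count column for a contig record.

    duplicate_count is the default column; umi_count is kept as a
    backup for when duplicate_count has been modified (e.g. blanked
    out) during filter_contigs.
    """
    dup = contig.get("duplicate_count")
    if dup is not None:
        return dup
    return contig.get("umi_count")

# duplicate_count wins when both are present
resolve_count({"duplicate_count": 5, "umi_count": 3})
```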
Known issues
- If `~/.bash_rc` is present and contains conda initialization code, the default conda path will be appended to the front of the container's `$PATH`. This impacts users who install `igblast` and `blast` via conda and also want to use the container: the container will try to use the blast installation outside the container first, which may then lead to issues where it cannot find the database files.
- `--no-home` doesn't solve the issue completely, as `scanpy` requires a writeable `numba_cache_dir`. Currently the container will hopefully try to create a `$PWD/dandelion_cache` folder if this is an issue, but this needs more testing.

#54 Check `productive_only` option in `filter_bcr`
#62 Streamline `update_metadata`
#63 Fix `update_metadata` to work with concat.
#64 Allow `retrieve` to work both ways
#68 Native implementation of function to count mutation
#69 Rescue contigs that fail germline reconstruction?
#70 Check for compatibility with mouse data
Published by zktuong over 3 years ago
A new name for v0.1.1.post1 to fix pypi issue.
Published by zktuong over 3 years ago
Now uses `setuptools_scm` to pull the version and predict the next update version number:

```bash
git tag -a v0.1.1
git push --tags
```
filter_bcr
A minor edit, but decided to switch the fold-change cut-off back to 2 to fit with the original filtering strategy.
setting up container
Related to #51, to create a preprocessing wrapper, we will add an option to export the plots generated during pre-processing.
Simple addition of a boolean `save_plot` option to:
- `reassign_allele`
- `reassign_alleles`
- `assign_isotype`
- `assign_isotypes`
#51 Preprocessing wrapper for singularity container
#54 Check `productive_only` option in `filter_bcr`
#62 Streamline `update_metadata`
#63 Fix `update_metadata` to work with concat.
#64 Allow `retrieve` to work both ways
#68 Native implementation of function to count mutation
#69 Rescue contigs that fail germline reconstruction?
#70 Check for compatibility with mouse data
scirpy interoperability
Now fully works with scirpy's tool to transfer the data format.
singularity container
Singularity container recipe/image created.

```bash
ls
# database environment.yml ncbi-blast-2.10.1+ ncbi-igblast-1.15.0 sc-dandelion.def
singularity build --fakeroot sc-dandelion.sif sc-dandelion.def
singularity sign sc-dandelion.sif
singularity verify sc-dandelion.sif
singularity push sc-dandelion.sif library://kt16/default/sc-dandelion:latest
```

To download and use:

```bash
singularity pull library://kt16/default/sc-dandelion:latest
singularity shell sc-dandelion.sif
```
- **pre-processing plots saving**
Meant to be used primarily for the container, but now specifying `save_plot` in `reassign_alleles` and `assign_isotypes` will save the plots from pre-processing accordingly.
## Deprecated:
I should start using deprecation decorators, but only one major change: `reassign_alleles_` is no longer used/available.
Published by zktuong over 3 years ago
Github actions - tests
Fixed issue with tests failing. Mostly resolved.
find_clones and generate_network
Bugs were introduced in v0.028 by the rework of dandelion initialization: some clones were being excessively split (multiple '|' separators), the IgM/IgD catcher was skipping a few steps unnecessarily, and networks were not generating properly due to incorrect index referencing. This has now been corrected. Switched back to using squareform + pdist for calculating distances to simplify the code. May revisit in the future.
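The distance step can be illustrated in pure Python. Dandelion itself uses scipy's `pdist` + `squareform`; the toy junction sequences and the hamming metric below are illustrative assumptions, chosen only to show what a condensed-then-square distance computation produces:

```python
from itertools import combinations

def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def pairwise_distance_matrix(seqs):
    """Full symmetric distance matrix, mimicking pdist + squareform."""
    n = len(seqs)
    mat = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = hamming(seqs[i], seqs[j])
        mat[i][j] = mat[j][i] = d  # mirror across the diagonal
    return mat

# toy junction sequences (illustrative only)
junctions = ["TGTGCGAGAGAT", "TGTGCGAGAGAA", "TGTGCGAGCGAT"]
dist = pairwise_distance_matrix(junctions)
```

A clone-calling step can then threshold `dist` to group similar junctions.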
filter_bcr
Similar issue as above: contigs with multiple IgH were not flagged properly, which resulted in extremely noisy data. This is now fixed and the IgM|IgD catcher is actually working properly now! Also added a `productive_only` toggle to only retain BCRs that are determined to be productive, giving the user the flexibility to change this. Perhaps I should add the same toggle to all other functions so that the contigs can pass through and be flexibly used in subsequent steps?
update_metadata
Found some typos.
quantify_mutation
Fixed an issue where the original code wasn't allowing the R objects to be parsed properly when specifying non-NULL arguments.
miscellaneous
Updated code to prevent syntax warnings.
annotation
All main functions visible to the user are annotated. This can be improved further for clarity in the next version update.
plotting issues
There's an issue in `reassign_alleles` where plotting sometimes fails. I've added a try-except step to try and circumvent the error while I search for what it actually is.
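Conceptually, the try-except guard is a wrapper along these lines. This is a hypothetical sketch, not the actual code inside `reassign_alleles`; the `safe_plot` name is made up for illustration:

```python
def safe_plot(plot_fn, *args, **kwargs):
    """Run a plotting function without letting a failure stop preprocessing."""
    try:
        plot_fn(*args, **kwargs)
    except Exception as exc:  # deliberately broad while the root cause is unknown
        print(f"Plotting failed, continuing without the plot: {exc}")

# a failing plot no longer aborts the run
safe_plot(lambda: 1 / 0)
```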
metadata info
I think a column is needed to deal with multi-chain flagging properly. It's currently not ideal, as IgM/IgD cells can be flagged as multi, and calling them single isn't the right solution.
Published by zktuong almost 4 years ago
Prevent `transfer` from overwriting `anndata.obs` columns
This was causing issues if the column name already existed. Changed this to now respect the existing column in `anndata.obs` first.
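The respect-existing-columns behaviour boils down to something like the following sketch, using a plain dict in place of `anndata.obs`; `transfer_columns` is a hypothetical name, not dandelion's actual function:

```python
def transfer_columns(obs, new_cols):
    """Copy new columns into obs, but respect columns that already exist."""
    for key, values in new_cols.items():
        if key in obs:  # column already exists: leave it untouched
            continue
        obs[key] = values
    return obs

obs = {"leiden": [0, 1]}
out = transfer_columns(obs, {"leiden": [9, 9], "clone_id": ["A", "B"]})
# "leiden" keeps its original values; only "clone_id" is added
```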
Slight adjustment to calculation of gini indices
Added a single zero to the end of each sorted array if the array length is longer than 1, so that the Lorenz curve starts from 0. This effectively solves the problem where the gini indices were originally being returned as negative values. Also added some description in the relevant locations to explain why the tabulation for clone/node degree/centrality is different compared to cluster size.
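The zero-anchoring trick can be sketched as follows. This is a minimal illustration of the idea (add a zero element so the Lorenz curve is anchored at the origin, then compute the Gini index by trapezoidal area), not dandelion's actual implementation:

```python
def gini_index(values):
    """Gini index with the Lorenz curve anchored at 0.

    Adding a zero to the sorted values (for arrays longer than 1)
    forces the Lorenz curve to start at the origin, avoiding the
    spurious negative indices described above.
    """
    vals = sorted(values)
    if len(vals) > 1:
        vals = [0] + vals  # anchor the Lorenz curve at 0
    n = len(vals)
    total = sum(vals)
    if total == 0:
        return 0.0
    cum = 0.0        # cumulative share of the total (Lorenz curve height)
    area = 0.0       # area under the Lorenz curve
    prev = 0.0
    for v in vals:
        cum += v / total
        area += (prev + cum) / 2.0  # trapezoid between successive points
        prev = cum
    area /= n
    return 1.0 - 2.0 * area
```

With the anchor in place the index is always non-negative, e.g. a single-element array gives exactly 0.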
Fixed downsampling and metric options
Some parts were ignoring the options because I forgot to update them in the main function within functions.
Fix rpy2 dependency to be <3.3.5 until I find out what's wrong
Recently updated my Mac's R to version 4, and rpy2 >3.3.5 really didn't like it. Not sure if this will be fixed in 3.4?
Vertex size gini calculation
Finally had a crack at implementing this method. This proved to be a challenge, as a native implementation with networkx's node contraction tools was fine for sparse clones but really struggled in highly connected samples. The workaround is a simple counter, but having access to the graph would be ideal. Anyway, I will have to think about this more in the future, and about whether I should remake the clone size gini to use a similar implementation to reflect the network more. Currently this involves the reconstruction of networks, which is quite time consuming, especially if the sample is large.
cluster size gini calculation
This is an attempt to also perform the gini calculation after network contraction. However, it also revealed a problem: if the sample is not deeply sampled, the gini index will not be reflected appropriately. For now, an option is provided to choose whether or not to use the contracted network.
clone_diversity gini calculation for anndata
No longer possible, as the function will now absolutely require the network to be present first, i.e. it requires a Dandelion object.
Published by zktuong almost 4 years ago
Many bug fixes.
Overhauled preprocessing steps to allow for more flexibility with barcode naming, and for the reannotation strategy to be more in line with immcantation's recommendations.
New functions to plot clonal overlap as a circos plot, which requires separate installation of nxviz. Didn't put nxviz in the requirements/setup as there are a few conflicts with matplotlib and others during installation.
Published by zktuong about 4 years ago
Many bug fixes.
Smoothed out the initial preprocessing functions to allow for more flexible input options.
Allows the package to be reticulated (somewhat).
Published by zktuong about 4 years ago
Major bug fixes.
Updated to work with AnnData>=7.1.0
Sped up creation of networks and filtering of BCRs by allowing for parallelization.
Changed dependency on igraph to networkx.
Changed default behaviour for filter_bcr to drop contigs instead of filtering barcodes for those that only contain poor quality BCR data.
Added diversity estimation tools.
Published by zktuong over 4 years ago
First release to mark a stable version that contains the important functions for preprocessing, generation of networks, and integration with scanpy.