dandelion

dandelion - A single cell BCR/TCR V(D)J-seq analysis package for 10X Chromium 5' data

AGPL-3.0 License

Downloads: 905 · Stars: 85 · Committers: 5


dandelion - v0.3.5 Latest Release

Published by zktuong 9 months ago

What's Changed

New Contributors

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.4...v0.3.5

dandelion - v0.3.4

Published by zktuong 9 months ago

Summary

  • Speed up network generation in generate_network
  • Add soft filtering and normalisation to vdj_pseudobulk functions - @ktpolanski
  • Created a new column in .data (extra) to flag if contig is considered extra.
  • New clone ID definition that inserts the VDJ and VJ information into the ID to reduce ambiguity - still need to check that it behaves correctly for cells with no clone IDs. This also means that clone IDs can now be created for orphan chains.
  • New to_scirpy/from_scirpy functions that now convert to the new scverse AIRR formats - @amoschoomy
  • Container build is now simplified and uses mamba to manage all the dependencies.
  • New option to run preprocessing with ogrdb references in both the base package and the container.
  • New reference download function in the container folder to ensure the latest references are pulled for every new iteration of the container.
  • Deprecate support for python3.7 tests.
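
The new clone ID scheme above can be illustrated with a minimal sketch. The exact field order and separators dandelion uses are not shown in these notes, so the format below is hypothetical; the point is only that both the VDJ and VJ chains contribute to the ID, and that an orphan chain still yields one:

```python
def make_clone_id(v_vdj, j_vdj, junction_vdj, v_vj=None, j_vj=None, junction_vj=None):
    """Hypothetical sketch: embed both VDJ and VJ chain information in the
    clone ID so that two clones sharing only one chain get distinct IDs.
    Orphan chains (missing one side) still receive an ID from the chain
    that is present."""
    vdj_part = f"{v_vdj}_{j_vdj}_{len(junction_vdj)}" if v_vdj else "None"
    vj_part = f"{v_vj}_{j_vj}_{len(junction_vj)}" if v_vj else "None"
    return f"{vdj_part}|{vj_part}"

# paired cell
paired = make_clone_id("IGHV1-2", "IGHJ4", "CARDYW", "IGKV1-5", "IGKJ1", "CQQYNSW")
# orphan VDJ chain still gets a clone ID
orphan = make_clone_id("IGHV1-2", "IGHJ4", "CARDYW")
```
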

What's Changed

dependabot updates

New Contributors

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.3...v0.3.4

dandelion - v0.3.3

Published by zktuong about 1 year ago

What's Changed

  • Mainly updates and bug fixes to tl.clone_overlap and pl.clone_overlap.
  • Simplified pre-processing functions to call the command-line tools directly instead of running them within the code.

Detailed notes:

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.2...v0.3.3

dandelion - v0.3.2

Published by zktuong over 1 year ago

What's Changed

Mainly to fix compatibility with dependencies.

New Contributors

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.1...v0.3.2

dandelion - v0.3.1

Published by zktuong over 1 year ago

What's Changed

Just to update PyPI - some bug fixes to accompany the revision.
Doesn't affect the container image (but I should add a tag on Sylabs to also call it 0.3.1, just to be consistent).

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.3.0...v0.3.1

dandelion - v0.3.0

Published by zktuong almost 2 years ago

What's Changed

This release adds a number of new features and minor restructuring to accompany Dandelion's manuscript (uploading soon). Kudos to @suochenqu and @ktpolanski

  1. new data strategy to handle non-productive contigs, partial contigs and 'J multi-mappers'
  2. new V(D)J pseudotime trajectory inference!
  3. revamped tutorials and documents

Detailed PRs

New Contributors

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.4...v0.3.0

dandelion - v0.2.4

Published by zktuong over 2 years ago

What's Changed

New features

slicing functionality

  • the Dandelion object can now be sliced like an AnnData or pandas DataFrame!
    vdj[vdj.data['productive'] == 'T']
    Dandelion class object with n_obs = 38 and n_contigs = 94
        data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
        metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
    
    vdj[vdj.metadata['productive_VDJ'] == 'T']
    Dandelion class object with n_obs = 17 and n_contigs = 36
        data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
        metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
    
    vdj[vdj.metadata_names.isin(['cell1', 'cell2', 'cell3', 'cell4', 'cell5'])]
    Dandelion class object with n_obs = 5 and n_contigs = 20
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
    
    vdj[vdj.data_names.isin(['contig1','contig2','contig3','contig4','contig5'])]
    Dandelion class object with n_obs = 2 and n_contigs = 5
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
    metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
    
    • not sure implementing it like adata[:, adata.var.something] makes sense, as it's not really row information in the data slot
    • also, the base slot in Dandelion is .data, so it doesn't make sense for .metadata to be the 'row'
    • maybe https://github.com/scverse/scirpy/issues/327 can come up with a better strategy that I can adopt later on.

ddl.pp.check_contigs

  • Created a new function, ddl.pp.check_contigs, as a way to just flag ambiguous contigs rather than outright removing them. I envisage that this will eventually replace the simple mode in ddl.pp.filter_contigs.
    • new column in .data: ambiguous, T/F to indicate whether a contig is considered ambiguous (different from cell-level ambiguous).
    • the .metadata and several other functions ignore any contigs marked T, to maintain the same behaviour
    • The largest difference between ddl.pp.check_contigs and ddl.pp.filter_contigs is that with check_contigs the onus is on the user to remove any 'bad' cells from the GEX data (illustrated in the tutorial), whereas this happens semi-automatically with filter_contigs.
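
The flag-then-ignore pattern described above can be sketched in plain Python. The real ddl.pp.check_contigs logic is considerably more involved; the highest-UMI rule below is a simplifying assumption used only to illustrate how an ambiguous column lets downstream code skip contigs without deleting them:

```python
# Simplified sketch of the contig-level "ambiguous" flag.
contigs = [
    {"sequence_id": "c1", "cell_id": "cellA", "locus": "IGH", "umi": 10},
    {"sequence_id": "c2", "cell_id": "cellA", "locus": "IGH", "umi": 2},
    {"sequence_id": "c3", "cell_id": "cellA", "locus": "IGK", "umi": 8},
]

# Keep the highest-UMI contig per (cell, locus); flag the rest instead of
# removing them (assumed rule for illustration only).
best = {}
for c in contigs:
    key = (c["cell_id"], c["locus"])
    if key not in best or c["umi"] > best[key]["umi"]:
        best[key] = c
for c in contigs:
    c["ambiguous"] = c is not best[(c["cell_id"], c["locus"])]

# Downstream metadata construction simply ignores flagged contigs.
usable = [c["sequence_id"] for c in contigs if not c["ambiguous"]]
```
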

ddl.update_metadata now comes with a 'by_celltype' option

  • This brings a new feature - B cell, alpha-beta T cell and gamma-delta T cell associated columns for V, D, J, C and productive columns!
    • this is achieved through a new .retrieve_celltype subfunction in the Query class, which breaks up the retrieval into the 3 major groups if by_celltype = True.
    • No more need to guess which call belongs to which cell type, and it allows for easy slicing! This does cause a bit of .obs bloating.
    • Which leads to the removal of constant_status_VDJ, constant_status_VJ, productive_status_VDJ and productive_status_VJ, as the metadata was getting bloated with the slight rework of the Dandelion metadata slot to account for the new B/abT/gdT columns
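
The column-group split can be sketched as a routing rule from locus to cell-type group. The v_call_B_VDJ-style naming matches the metadata columns shown in the slicing examples above; the helper function itself is illustrative, not dandelion's API:

```python
# Sketch of the by_celltype split: route each contig's gene calls into
# B / abT / gdT column groups based on its locus.
LOCUS_GROUP = {
    "IGH": "B", "IGK": "B", "IGL": "B",
    "TRA": "abT", "TRB": "abT",
    "TRG": "gdT", "TRD": "gdT",
}
VDJ_LOCI = {"IGH", "TRB", "TRD"}  # heavy/beta/delta chains carry the D segment

def celltype_column(field, locus):
    """Build a metadata column name such as v_call_B_VDJ or j_call_abT_VJ."""
    group = LOCUS_GROUP[locus]
    chain = "VDJ" if locus in VDJ_LOCI else "VJ"
    return f"{field}_{group}_{chain}"
```
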

tl.productive_ratio

  • Calculates a cell-level representation of productive vs non-productive contigs.
    • Plotting is achieved through pl.productive_ratio
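
A stand-alone sketch of the cell-level summary in the spirit of tl.productive_ratio (the real function operates on a Dandelion object; the any-contig-productive rule here is an assumption for illustration):

```python
from collections import defaultdict

# (cell_id, productive) pairs standing in for contig rows.
contigs = [
    ("cellA", True), ("cellA", False),
    ("cellB", True),
    ("cellC", False),
]

per_cell = defaultdict(list)
for cell, productive in contigs:
    per_cell[cell].append(productive)

# A cell counts as productive if any of its contigs is productive.
n_productive = sum(any(v) for v in per_cell.values())
ratio = n_productive / len(per_cell)
```
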

tl.vj_usage_pca

  • Computes PCA on a cell-level representation of V/J gene usage across designated groupings
    • uses scanpy.pp.pca internally
    • Plotting can be achieved through scanpy.pl.pca
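
A sketch of the cell-level V/J usage representation that would then be handed to scanpy.pp.pca. The construction details (per-group counts normalised to frequencies) are an assumption for illustration:

```python
# V/J gene calls per cell, and a grouping (e.g. sample or cluster).
cells = {
    "cellA": ("TRBV1", "TRAJ2"),
    "cellB": ("TRBV1", "TRAJ3"),
    "cellC": ("TRBV2", "TRAJ2"),
}
groups = {"cellA": "g1", "cellB": "g1", "cellC": "g2"}

genes = sorted({g for pair in cells.values() for g in pair})
usage = {}
for cell, (v, j) in cells.items():
    grp = groups[cell]
    row = usage.setdefault(grp, dict.fromkeys(genes, 0))
    row[v] += 1
    row[j] += 1

# Normalise each group's counts to frequencies - these rows would form
# the input matrix for PCA.
freq = {
    grp: [row[g] / sum(row.values()) for g in genes]
    for grp, row in usage.items()
}
```
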

bug fixes

  • fix cell ordering issue https://github.com/scverse/scirpy/pull/347
  • small refactor of ddl.pp.filter_contigs
    • moved some of the repetitive loops into callable functions
    • deprecated the filter_vj_chains argument and replaced it with filter_extra_vdj_chains and filter_extra_vj_chains to hopefully enable more interpretable behaviour. fixes #158
    • the UMI adjustment step was buggy, but its behaviour is now consistent with how it functions in ddl.pp.check_contigs
  • rearrangement_status_VDJ and rearrangement_status_VJ (renamed from rearrangement_VDJ_status and rearrangement_VJ_status) now give a single value for whether a chimeric rearrangement occurred, e.g. TRDV pairing with TRAJ and TRAC as in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4267242/
  • fixed issues with progress bars getting out of hand
  • fixed issue with ddl.tl.find_clones crashing if more than one type of locus is found in the data.
    • now a B, abT and gdT prefix will be appended to BCR/TR-ab/TR-gd clones.
  • check_contigs, find_clones and define_clones were removing non-productive contigs even though there's no need to. This may cause issues with filter_contigs... but that's a problem for next time.
  • fixed issue with min_size in generate_network not behaving as intended; switched to using connected components to find which nodes to trim
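
The connected-components trimming can be sketched with a stdlib BFS (dandelion works on its networkx graph, so the data structures here are illustrative): find each component of the clonotype network and drop nodes sitting in components smaller than min_size.

```python
from collections import deque

edges = [("a", "b"), ("b", "c"), ("d", "e")]
nodes = {"a", "b", "c", "d", "e", "f"}  # "f" is a singleton
min_size = 2

# Build an undirected adjacency map.
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

kept, seen = set(), set()
for start in nodes:
    if start in seen:
        continue
    # BFS out one connected component.
    component, queue = set(), deque([start])
    seen.add(start)
    while queue:
        n = queue.popleft()
        component.add(n)
        for m in adj[n] - seen:
            seen.add(m)
            queue.append(m)
    # Only keep components that reach min_size.
    if len(component) >= min_size:
        kept |= component
```
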

other changes

  • new column chain_status, to summarise the reworked locus_status column.
    • Should contain values like ambiguous, Orphan VDJ, Single pair etc, similar to chain_pairing in scirpy.
  • Also fixed the ordering of metadata columns to make it more presentable, instead of just randomly slotting them into the data frame.
  • ddl.concat now allows for custom suffix/prefix - only operates on sequence_id
  • remove .edges from Dandelion class because this doesn't get used anywhere and it's also stored in the networkx graphs
  • minimum spanning tree construction is performed using networkx directly, so that I don't have to keep converting the adjacency matrices back and forth between pandas and networkx
  • clean up documentation slightly

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.2...v0.2.4

dandelion - v0.2.3

Published by zktuong over 2 years ago

Same as v0.2.2, but I seem to have messed up the upload to PyPI, so trying again.

What's Changed

Bug fixes and Improvements

  • Speed up generate_network
    • pair-wise hamming distance is calculated on per clone/clonotype only if more than 1 cell is assigned to a clone/clonotype
    • .distance slot is removed and is now directly stored/converted from the .graph slot.
    • new options:
      • compute_layout: bool = True. If dataset is too large, generate_layout can be switched to False in which case only the networkx graph is returned. The data can still be visualised later with scirpy's plotting method (see below).
      • layout_method: Literal['sfdp', 'mod_fr'] = 'sfdp'. New default uses the ultra-fast C++ implemented sfdp_layout algorithm in graph-tools to generate final layout. sfdp stands for Scalable Force Directed Placement.
        • Minor caveat is that the repulsion is not as good - when there are a lot of singleton nodes, they don't separate well unless you somehow work out which of the sfdp_layout parameters to tweak to produce an effective separation; changing gamma alone doesn't really seem to do much.
        • The original layout can still be generated by specifying layout_method = 'mod_fr'. Requires a separate installation of graph-tool via conda (not managed by pip) as it has several C++ dependencies.
        • pytest on macOS may also stall because of a different backend being called - this is solved by making the tests that call generate_network run last.
    • added steps to reduce memory hogging.
    • min_size was doing the opposite previously and this is now fixed. #155
  • Speed up transfer
    • Found a faster way to create the connectivity matrix.
    • this also now transfer a dictionary that scirpy can use to generate the plots https://github.com/scverse/scirpy/issues/286
    • Fix #153
      • rename productive to productive_status.
  • Fix #154
    • reorder the if-else statements.
  • Speed up filter_contigs
    • tree construction is simplified and replaced for-loops with dictionary updates.
  • Speed up initialise_metadata. Dandelion should now initialise and read faster.
    • Removed an unnecessary data sanitization step when loading data.
    • Now load_data will rename umi_count to duplicate_count
    • Speed up Query
      • tree construction is simplified and replaced for-loops with dictionary updates.
      • didn't need to use an airr validator as that slows things down.
  • data initialised by Dandelion will be ordered based on productive first, then followed by umi count (largest to smallest).
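
The per-clone distance speed-up above can be sketched in pure Python: pairwise hamming distances are only computed within clones containing more than one cell, so singleton clones cost nothing. The junction sequences and clone structure below are illustrative:

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length junctions."""
    return sum(x != y for x, y in zip(a, b))

# Junction sequences grouped by clone ID; singleton clones are skipped
# entirely, which is where the speed-up comes from.
clones = {
    "clone1": {"cellA": "CARDYW", "cellB": "CARDFW"},
    "clone2": {"cellC": "CQQYNSW"},  # singleton: no distances computed
}

distances = {}
for clone, members in clones.items():
    if len(members) < 2:
        continue
    for (c1, s1), (c2, s2) in combinations(members.items(), 2):
        distances[(clone, c1, c2)] = hamming(s1, s2)
```
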

Breaking Changes

  • initialise_metadata/update_metadata/Dandelion
    • For-loops to initialise the object have been vectorized, resulting in a minor speed upgrade
    • This results in the reduction of some columns in the .metadata which were probably bloated and not used.
      • vdj_status and vdj_status_summary removed and replaced with rearrangement_VDJ_status and rearrangement_VJ_status
      • constant_status and constant_summary removed and replaced with constant_VDJ_status and constant_VJ_status.
      • productive and productive_summary combined and replaced with productive_status.
      • locus_status and locus_status_summary combined and replaced with locus_status.
      • isotype_summary replaced with isotype_status.
  • where there was previously unassigned or '' in .metadata, this has been changed to the string 'None'.
    • Not changed to NoneType as there's quite a bit of internal text processing that gets messed up if swapped.
    • No_contig will still be populated after transfer to AnnData to reflect cells with no TCR/BCR info.
  • deprecate use of nxviz<0.7.4

Minor changes

  • Rename and deprecate read_h5/write_h5. Use of read_h5ddl/write_h5ddl will be enforced in the next update.

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.1...v0.2.2

dandelion - v0.2.2

Published by zktuong over 2 years ago

What's Changed

Release notes are identical to v0.2.3 above, which re-uploaded this same release after a failed PyPI upload.
Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.1...v0.2.2

dandelion - v0.2.1

Published by zktuong over 2 years ago

What's Changed

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.2.0...v0.2.1

dandelion - 0.2.0

Published by zktuong over 2 years ago

What's Changed

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.1.12...v0.2.0

dandelion - v0.1.12

Published by zktuong over 2 years ago

What's Changed

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.1.11...v0.1.12

dandelion - v0.1.11

Published by zktuong almost 3 years ago

What's Changed

Full Changelog: https://github.com/zktuong/dandelion/compare/v0.1.10...v0.1.11

dandelion - v0.1.10

Published by zktuong about 3 years ago

Fix minor bug in TCR preprocessing.
Fix documentation building script.
Add logging in singularity container.
Also testing an automated build for creating the singularity container, which seems to be working:
https://github.com/zktuong/github-ci
Added support statement in readme.

dandelion - v0.1.9

Published by zktuong about 3 years ago

  • bug fix for numeric dtypes being converted incorrectly during saving, resulting in empty columns.
  • Adjustment to Query to return numerical queries properly.
dandelion - v0.1.8

Published by zktuong about 3 years ago

  • Much required speed upgrade for the following newly added functions to be fully usable:
    • Refactored filter_contigs/FilterContigs/FilterContigLite. Solves #92
      • Reworked into a tree format where it iterates over the rows to form cells (~1k iter/s) and then iterates through the cells (~150 iter/s), compared to the previous 3-4 iter/s.
      • Fixed a small bug where the duplicate_counts were not adding up
    • Refactored Query.
      • The __init__ method now preloads the required fields as a tree, and a separate retrieve method accesses the dictionaries much faster. Same method as above.
    • read_10x_vdj
      • Refactored parse_annotation which was slowing everything down.
      • Similar method to above
    • sanitize_data
      • Use airr.RearrangementSchema to match the dtype to deal with missing values. Also speed up some steps to make the validation faster.
      • Also fixed a bug causing float columns to be unintentionally converted to integers; e.g. mu_freq columns should hopefully now return properly.
  • Added write_airr function that basically calls airr.create_rearrangement
  • Adjusted quantify_mutations to hopefully return the right dtypes now.
  • Added option to filter_contigs to run without anndata.
  • Bug fix to dandelion_preprocessing.py to let it run quantify_mutations based on the args.file_prefix.
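
The "tree format" refactor mentioned above can be sketched as a single grouping pass: contig rows are bucketed by cell_id once via cheap dict updates, after which the filtering logic only iterates cells instead of rescanning every row for every cell. The row fields are illustrative:

```python
from collections import defaultdict

rows = [
    {"cell_id": "cellA", "sequence_id": "c1"},
    {"cell_id": "cellA", "sequence_id": "c2"},
    {"cell_id": "cellB", "sequence_id": "c3"},
]

# One pass over the rows builds the cell -> contigs tree.
tree = defaultdict(list)
for row in rows:
    tree[row["cell_id"]].append(row)

# Per-cell logic now only touches each cell's own contigs,
# e.g. counting cells with multiple contigs.
n_multi = sum(len(contigs) > 1 for contigs in tree.values())
```
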
dandelion - v0.1.7

Published by zktuong about 3 years ago

  • Refactored the now-deprecated retrieve_metadata into a new internal Query class.
  • Bug fix for sanitization of Dandelion object.
  • No longer required to split the locus when initializing; Dandelion should now parse everything.
    • find_clones still requires locus of either ig or tr to be specified.
  • Largely addressed:
    • #62
    • #63
dandelion - v0.1.6

Published by zktuong about 3 years ago

  • bug fix when sanitizing dtypes and validating airr columns.
  • Switch from enforcing pd.Int64Dtype to just switch to np.float64 if there is missing info.
  • speed up sanitization steps.
dandelion - v0.1.5

Published by zktuong about 3 years ago

Updates and bug fixes

  • Added a series of functions to guess the best format for locus as the previous try-except statements were not working.

  • Added a few more tests - now code coverage is 70%!

  • Added sanitize_dtype function so that saving and conversion of format with R/h5/h5ad works.

  • Changed bool to str where possible so as not to interfere with saving in h5ad with h5py>=3.0.0.

  • Minor annotation update: Using Optional instead of Union[None, whatever].

  • Fix broken links in docs.

  • Bug fixes for compatibility with: mouse #70 and TCR data #81.

  • Some bug fixes to container script.

  • Updated container definition:

    • Now will try to build R version >= 4.
    • Added a test suite in the container so that it automatically runs pytest after building.
    • Trying out whether appending the paths to SINGULARITY_ENVIRONMENT first helps with #79.
  • Added pytest suite within the container. Now I can quickly test like as follows:

    # devel and testing
    sudo singularity build --notest sc-dandelion_dev.sif sc-dandelion_dev.def
    sudo singularity test --writable-tmpfs sc-dandelion_dev.sif
    
    # for release
    sudo singularity build --notest sc-dandelion.sif sc-dandelion.def
    sudo singularity test --writable-tmpfs sc-dandelion.sif
    singularity sign sc-dandelion.sif
    singularity verify sc-dandelion.sif
    singularity push sc-dandelion.sif library://kt16/default/sc-dandelion
    

    The test requires root access, otherwise it wouldn't be able to write into the container. Should look into whether sandbox/fakeroot modes work.

  • Interactive shell session of the container should now work with singularity shell --writable-tmpfs sc-dandelion.sif.

Still needs work

#62 Streamline update_metadata
#63 Fix update_metadata to work with concat.
#64 Allow retrieve to work both ways
#68 Native implementation of function to count mutation
#69 Rescue contigs that fail germline reconstruction?

dandelion - v0.1.4

Published by zktuong over 3 years ago

  • Multiple bug fixes.

  • Reworked filter_contigs

    • Added a 'lite' mode for filter_contigs where it just checks for v/j/c gene call mis-match. Toggled with simple = True.
    • Split the filtering between productive and non-productive contigs:
      • Replaced the rescue_igh option with a keep_highest_umi option which applies to all loci.
  • Updated the isotype dictionary to allow for mouse genes which seems to address #70.

  • Added more tests.

  • Updated ddl.pp.calculate_threshold to reflect the updated shazam's disToNearest functionalities.

  • Added sanitization functions to check that data stored in Dandelion is relatively compliant with AIRR standards (barring missing columns from 10x's data or scirpy's transferred data).

  • Updated transfer of boolean columns to anndata to be stored as string rather than category during filter_contigs.

  • Updated h5py requirement to be >=3.1.0. 2.10.0 should still work though. This led to some updates to how AnnData was storing info from dandelion after filter_contigs. Should update the tutorial in the next version.

TODO before merging:

  • complete 10x output parser
  • address #54

Ongoing:

  • Slight issue with integration with scirpy https://github.com/icbi-lab/scirpy/pull/283, but it should be solved. Will edit tests when the PR is merged.

  • Need to create larger fixtures to get access to some steps within the functions.

  • Also need to test a mouse fixture. Maybe I should merge a large mouse fixture?