BASC: barcoded single-cell data processing pipeline.

BASC is a Snakemake workflow for processing single-cell chromatin data that contain multiple barcodes per sequenced fragment. Most single-cell chromatin datasets contain a single barcode (the cell barcode) and can be handled by other processing pipelines such as cellranger-atac, or by simpler methods. This workflow is designed for experiments in which one or two additional barcodes are sequenced for each DNA fragment. These barcodes may encode information such as the data modality measured by the DNA fragment, the well of origin in droplet combinatorial indexing, or both.

Installation

The workflow is implemented using Snakemake and has several dependencies, which can be installed using conda (or mamba).

To get started, clone the git repository:

git clone git@github.com:timoast/basc.git

To install dependencies in a new conda environment:

# using conda
conda env create -f env.yaml

# using mamba
mamba env create -f env.yaml

Configuring the BASC pipeline

To configure BASC, create a config file containing the barcode combinations for each sample. See config.example for the list of required parameters in the config file. Alternatively, individual configuration parameters can be passed to snakemake on the command line, for example:

snakemake --config reference=/path/to/reference name=NTT

Create a tab-delimited file describing the barcode sequences that were used for each sample or assay. Example samples.tsv file:

sample_name	well	mark	i5_index	i7_index	sample_index
whole_cell	1	H3K27me3	GGATTGCT	AACAACAC	GGACTCCT,TAGGCATG
whole_cell	2	H3K27ac	GTGTGACC	ACGTATGG	GGACTCCT,TAGGCATG
nuclei	1	S2S5P	CGTCTATG	AACATTCC	CACATCGG,GGTTGGCA
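As a rough illustration of the file layout, the following Python sketch reads the example table and splits comma-separated barcode fields into lists. It is not part of BASC: the function name read_samples and the ACGT-only check are assumptions made for this example, and the pipeline's own parsing may differ.

```python
import csv
import io

# The example table from above, inlined so the sketch is self-contained.
EXAMPLE_TSV = (
    "sample_name\twell\tmark\ti5_index\ti7_index\tsample_index\n"
    "whole_cell\t1\tH3K27me3\tGGATTGCT\tAACAACAC\tGGACTCCT,TAGGCATG\n"
    "whole_cell\t2\tH3K27ac\tGTGTGACC\tACGTATGG\tGGACTCCT,TAGGCATG\n"
    "nuclei\t1\tS2S5P\tCGTCTATG\tAACATTCC\tCACATCGG,GGTTGGCA\n"
)

# Columns that hold one or more barcode sequences (taken from the header row).
BARCODE_COLUMNS = ("i5_index", "i7_index", "sample_index")


def read_samples(handle):
    """Parse a tab-delimited samples table into a list of dicts.

    Hypothetical helper: each barcode column is converted to a list of
    sequences, and sequences with characters other than A, C, G, T are
    rejected.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for column in BARCODE_COLUMNS:
            sequences = row[column].split(",")
            for seq in sequences:
                if not seq or set(seq) - set("ACGT"):
                    raise ValueError(f"invalid barcode {seq!r} in column {column}")
            row[column] = sequences
        rows.append(row)
    return rows


samples = read_samples(io.StringIO(EXAMPLE_TSV))
for s in samples:
    print(s["sample_name"], s["well"], s["mark"], s["sample_index"])
```

To check a real file before running the workflow, the same function can be given an open file handle instead of the inlined string, for example open("samples.tsv", newline="").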

Running the workflow

To execute the Snakemake workflow, first activate the conda environment containing all of the dependencies: conda activate basc. Next, run the workflow, specifying the config file with the --configfile option.

Setting the -n option performs a dry run, which shows the steps that would be executed without computing anything. Setting the -j option sets the number of cores the workflow may use.

snakemake --configfile /path/to/config -j 24

See the snakemake documentation for more information about running snakemake and all the available options.
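When the workflow is launched from a script, it can help to assemble the command in one place so that a dry run and a real run differ only in the -n flag. The Python sketch below only builds and prints the command; the helper name build_snakemake_command is hypothetical, and /path/to/config is the same placeholder used above.

```python
import shlex


def build_snakemake_command(configfile, cores=1, dry_run=False):
    """Return the snakemake invocation as an argument list (hypothetical helper)."""
    command = ["snakemake", "--configfile", configfile, "-j", str(cores)]
    if dry_run:
        # -n asks snakemake to list the planned steps without running them.
        command.append("-n")
    return command


dry = build_snakemake_command("/path/to/config", cores=24, dry_run=True)
print(shlex.join(dry))

# To actually launch the workflow, the list could be passed to
# subprocess.run(dry, check=True) from within the activated conda environment.
```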

Workflow steps

The BASC pipeline executes the following steps:

Generate barcode FASTA files from the config file
Demultiplex combinatorial barcodes using a custom Python script
Map reads to the genome using bwa-mem2
Sort and index the BAM file using samtools
Create a fragment file using sinto
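The first two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script BASC ships: the function names, the Hamming-distance matching, the one-mismatch tolerance, and the use of the i5 index as the label for each mark are all choices made for this example.

```python
def barcodes_to_fasta(barcodes):
    """Format a {name: sequence} mapping as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in barcodes.items())


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, barcodes, max_mismatches=1):
    """Return the name of the unique closest barcode, or None.

    A read is left unassigned when no barcode is within max_mismatches,
    or when two barcodes tie for the closest match.
    """
    hits = []
    for name, seq in barcodes.items():
        if len(seq) != len(observed):
            continue
        distance = hamming(observed, seq)
        if distance <= max_mismatches:
            hits.append((distance, name))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: two barcodes are equally close
    return hits[0][1]


# i5 indexes from the example samples.tsv, keyed by mark (illustrative only).
i5_barcodes = {
    "H3K27me3": "GGATTGCT",
    "H3K27ac": "GTGTGACC",
    "S2S5P": "CGTCTATG",
}

print(barcodes_to_fasta(i5_barcodes), end="")
print(assign_barcode("GGATTGCA", i5_barcodes))  # one mismatch from GGATTGCT
```

Rejecting ties keeps a read from being assigned to the wrong modality when a sequencing error makes it equally close to two barcodes; a stricter or looser tolerance is a matter of experimental design.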
