MIT License
I used conda and bioconda to manage software dependencies. To replicate the computing environment, you will need to complete the following 4 steps. Note that this is only guaranteed to work on a 64-bit Linux (linux-64) system.
conda install anaconda-client; anaconda download jdblischak/cardioqtl
conda env create -n cardioqtl --file environment.yaml
source activate cardioqtl
source deactivate cardioqtl
The code is freely available for reuse with attribution via the MIT license.
Snakefile - Implements the analysis pipeline
submit-snakemake.sh - Submits individual jobs produced in Snakefile to Slurm. If your cluster uses a different job scheduler, you'll need to edit this file and cluster.json.
scripts/ - R scripts called by Snakefile
scratch/ - Exploratory analyses written in R Markdown
data/counts-subread.txt - Gene counts after mapping to GRCh37 with Subjunc and summing counts per gene with featureCounts (Subread 1.5.0p3). Includes all genes in Ensembl release 75 (i.e. protein coding plus all other biotypes; see scripts/create-exons.R).
data/counts-clean.txt - Gene counts after removing samples 26302, 110232, and 160001 and removing genes with log2 cpm less than 0 (see scripts/clean-counts.R).
data/counts-normalized.txt - Gene counts after normalizing to N(0,1) within each sample followed by normalizing to N(0,1) within each gene. Used the R function qqnorm (see scripts/normalize-counts.R).
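
The cleaning and normalization steps behind data/counts-clean.txt and data/counts-normalized.txt can be illustrated with a small sketch. The actual implementation lives in the R scripts named above; the Python below is only an approximation under stated assumptions. In particular, the README does not say how log2 cpm is summarized across samples or whether a pseudocount is used, so the median and the 0.5 pseudocount here are assumptions, as are all function names and the toy sample and gene IDs. The rank-to-quantile mapping follows what R's qqnorm does (ppoints probabilities, ties kept in input order), but whether the scripts apply it to raw counts or to transformed values is not stated here.

```python
import math
from statistics import NormalDist, median

# Sample IDs the README lists as removed before analysis.
EXCLUDED_SAMPLES = {"26302", "110232", "160001"}


def clean_counts(samples, counts, min_log2_cpm=0.0, pseudocount=0.5):
    """Drop excluded samples, then drop genes with low log2 cpm.

    samples: list of sample IDs in column order.
    counts: dict mapping gene ID to a list of counts (one per sample).
    Assumptions of this sketch: the median across samples is the
    summary compared to the cutoff, and a pseudocount avoids log2(0).
    """
    keep = [i for i, s in enumerate(samples) if s not in EXCLUDED_SAMPLES]
    kept_samples = [samples[i] for i in keep]
    kept = {gene: [row[i] for i in keep] for gene, row in counts.items()}
    # Library size = total counts per remaining sample, over all genes.
    lib_sizes = [sum(row[j] for row in kept.values()) for j in range(len(keep))]
    cleaned = {}
    for gene, row in kept.items():
        log_cpm = [math.log2((c + pseudocount) / lib * 1e6)
                   for c, lib in zip(row, lib_sizes)]
        if median(log_cpm) >= min_log2_cpm:
            cleaned[gene] = row
    return kept_samples, cleaned


def ppoints(n):
    """Probability points, following the formula used by R's ppoints(n)."""
    a = 3 / 8 if n <= 10 else 1 / 2
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def to_standard_normal(values):
    """Replace each value by the N(0,1) quantile of its rank."""
    n = len(values)
    quantiles = [NormalDist().inv_cdf(p) for p in ppoints(n)]
    # sorted() is stable, so tied values keep their input order.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        out[idx] = quantiles[rank]
    return out


def normalize_counts(counts):
    """Normalize within each sample (column), then within each gene (row)."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    columns = [to_standard_normal([counts[g][j] for g in genes])
               for j in range(n_samples)]
    return {g: to_standard_normal([columns[j][i] for j in range(n_samples)])
            for i, g in enumerate(genes)}


# Toy input: IDs are hypothetical except the excluded sample 26302.
samples = ["s1", "26302", "s2", "s3"]
counts = {
    "geneA": [500, 9, 700, 300],
    "geneB": [0, 3, 0, 1],  # dropped: median log2 cpm is below 0
    "geneC": [999000, 1, 998900, 999100],
    "geneD": [400, 2, 500, 599],
}

kept_samples, cleaned = clean_counts(samples, counts)
normalized = normalize_counts(cleaned)
print(kept_samples)
print(sorted(cleaned))
print(normalized)
```

Because the second pass re-ranks each gene across samples, every gene ends up with the same set of quantile values, only in a different order. Ties are broken by input order, so a gene with identical values in all samples still receives distinct quantiles.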