Scripts to run several metagenomics assembly programs
Content
Overview
Dependencies
Installation
Usage
Overview
===========

MetAssemble is a pipeline that runs several metagenomic assembly strategies, combining Velvet, Meta-Velvet, Minimus2, Ray and Bambus2, on Illumina paired-end reads. The pipeline was originally developed to validate the performance of the individual strategies, but it can also be used to run the assembly strategies without validation.

The pipeline is written in GNU make and is not very user friendly, but if you are familiar with GNU make you shouldn't have too much trouble getting it to run. The only other metagenomics assembly pipeline that I am aware of is metAMOS, which seems to be an effort towards a more user-friendly approach, if that is what you are looking for.

One reason to use MetAssemble instead is that it allows you to schedule parts of the assembly pipeline with sbatch or qsub. Different steps in the assembly pipeline require different resources: Velvet, for instance, runs on only one node, whereas Ray runs across multiple nodes. MetAssemble allows you to specify resource usage per rule with gnu-make-job-scheduler. Furthermore, GNU make ensures that intermediate output files do not have to be recomputed after an error.
Dependencies
===============

Dependencies must be installed manually; there is currently no automated way to do this. You can, however, check whether the dependencies are met by running
bash test/dependencies/test_dependencies.sh
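If you only need a subset of the assemblies, a quick manual check of the relevant executables can be sketched as below. This is a minimal sketch: `missing_programs` is a hypothetical helper that is not part of MetAssemble, and the executable names in the usage comment are assumptions about what your installation puts on the PATH. The repository's own test script remains the authoritative check.

```shell
# Hypothetical helper (not part of MetAssemble): print each named program
# that cannot be found on PATH, one per line. Returns 1 if any are missing.
missing_programs() {
    status=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            status=1
        fi
    done
    return "$status"
}

# Example usage (executable names are assumptions; adjust to your install):
#   missing_programs velveth velvetg Ray
```

An empty output and a zero exit status mean that every program you listed was found.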
Do note that it is not necessary to install all programs if you only want to do a subset of the assemblies that MetAssemble covers. MetAssemble requires the following programs to perform all different assemblies:
Supported input:
Running the MetAssemble pipeline (scripts/Makefile) requires
The Makefile features four steps of the metagenomic assembly pipeline:
1. Read processing
2. Assembling contigs
3. Merging contigs
4. Scaffolding
Installation
===============
After installing all the dependencies, point the METASSEMBLE_DIR environment variable
to the root directory of this repository, e.g.:

export METASSEMBLE_DIR="$HOME/gitrepos/metassemble"

Use $HOME (or an unquoted ~) here, because the shell does not expand a ~ inside quotes.
You can then do a test run with

cd test && make test

which downloads a small data set from the HMP project and runs a subset of the
assembly strategies in the MetAssemble pipeline.
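
Before starting the test run, it can help to confirm that METASSEMBLE_DIR points where you expect. The following is a minimal sketch, not part of MetAssemble: `check_metassemble_dir` is a hypothetical helper, and the assumption that the repository root contains a `scripts` directory is taken from the paths mentioned in this README.

```shell
# Hypothetical helper (not part of MetAssemble): sanity-check the directory
# given as $1, or METASSEMBLE_DIR if no argument is passed.
check_metassemble_dir() {
    dir="${1:-${METASSEMBLE_DIR:-}}"
    if [ -z "$dir" ]; then
        echo "METASSEMBLE_DIR is not set" >&2
        return 1
    fi
    # A leading literal '~' means the value was quoted and never expanded.
    case "$dir" in
        "~"*)
            echo "METASSEMBLE_DIR starts with an unexpanded '~': $dir" >&2
            return 1
            ;;
    esac
    # Assumption: the repository root contains a scripts/ directory.
    if [ ! -d "$dir/scripts" ]; then
        echo "no scripts/ directory under $dir" >&2
        return 1
    fi
    return 0
}

# Example usage:
#   check_metassemble_dir && (cd "$METASSEMBLE_DIR/test" && make test)
```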
Usage
========

See the example in examples/chris-mock. It contains a Makefile and a Makefile-sbatch, which set some input parameters and then include scripts/metassemble.mk and scripts/metassemble-scheduler.mk respectively. Hopefully that is clear enough to help you understand how to run your own subset of the available assembly strategies.

If you want to change the resource usage per rule, change Makefile-sbatch accordingly. In the future I might add automatic computation of the resource usage. For assembly this is unfortunately still a problem, since resource usage depends on the complexity of your sample and not just the file size. The specified resource usage is for a library of ~1M and a mixed community of 60 bacteria and archaea.
To see which assemblies have been created:
make echoexisting
All assemblies, created or not:
make echoall
To create all:
make all
Only show commands:
make -n all
Only make velvet:
make velvet
Schedule rules with sbatch:
make -f Makefile-sbatch all
For more rules, see the scripts/parameters.mk file.
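
The dry-run and real invocations listed above can be combined so that a target is only built after make has shown what it would do. This is a minimal sketch under stated assumptions: `preview_then_run` is a hypothetical wrapper that is not part of MetAssemble, and it simply passes its arguments through to `make`.

```shell
# Hypothetical wrapper (not part of MetAssemble): print the commands make
# would run, then run them for real only if the dry run succeeds.
preview_then_run() {
    make -n "$@" && make "$@"
}

# Example usage, from a directory containing one of the example Makefiles:
#   preview_then_run velvet
#   preview_then_run -f Makefile-sbatch all
```

A failing dry run (for example, a missing input file with no rule to create it) stops the wrapper before any job is started or scheduled.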