SOAPdenovo-Trans

SOAPdenovo-Trans is a de novo transcriptome assembler designed specifically for RNA-Seq. We evaluated its performance on transcriptome datasets from rice and mouse.

GPL-3.0 License

Manual of SOAPdenovo-Trans

Introduction

SOAPdenovo-Trans is a de novo transcriptome assembler based on the SOAPdenovo framework and adapted to alternative splicing and to differing expression levels among transcripts. The assembler provides a more accurate, more complete, and faster way to construct full-length transcript sets.

System Requirements

SOAPdenovo-Trans is intended for transcript assembly. It runs on 64-bit Linux systems. For animal transcriptomes such as mouse, about 30-35 GB of memory is required.

Update Log

1.04 | 2014-04-22 15:00:00 +0800 (Tue, 22 Apr 2014) Fixes a number of 'segmentation fault' errors on different kinds of data. (Thanks to Chris Boursnell (Twitter: @chrisboursnell) for fixing the bugs.)

1.03 | 2013-07-19 12:00:00 +0800 (Fri, 19 Jul 2013) Adds RPKM calculation (Reads Per Kilobase of assembled transcript per Million mapped reads).
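
For readers unfamiliar with RPKM, the definition in the 1.03 entry can be written out as a small calculation. This is a minimal Python sketch of the standard formula only. The function and parameter names are hypothetical, and it is not taken from the SOAPdenovo-Trans source, which may count or round reads differently.

```python
def rpkm(reads_on_transcript: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / ((length_bp / 1e3) * (total_mapped_reads / 1e6))
         = reads * 1e9 / (length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return reads_on_transcript * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)
```

Normalizing by both transcript length and sequencing depth is what makes the value comparable between transcripts and between runs.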

Installation

  1. Download the pre-compiled binary for your platform, unpack it, and execute it directly.
  2. Or download the source code, unpack it to ${destination folder} in the same way, and compile it with GNU make by running "sh make.sh" in ${destination folder}. This generates the executable files "SOAPdenovo-Trans-31mer" and "SOAPdenovo-Trans-127mer".

How to use it

1. Configuration file

The configuration file for SOAPdenovo-Trans is mostly the same as the one for SOAPdenovo, except that there is no "rank" parameter. It tells the assembler where to find the read files and the relevant information about them. "example.config" demonstrates how to organize this information into a configuration file.

The configuration file has a section for global information, followed by multiple library sections. Currently, only "max_rd_len" is included in the global information section; any read longer than max_rd_len will be cut to this length. Information about each library, and about the sequencing data generated from it, should be placed in the corresponding library section. Each library section starts with the tag [LIB], followed by the items that describe that library.
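
The layout described above (one global section, then one or more [LIB] sections) can be illustrated with a short Python sketch that splits a configuration file into its parts and applies the max_rd_len cut to a read. Several things here are assumptions rather than facts from this manual: the item names avg_ins, q1 and q2 follow the SOAPdenovo convention, the file paths are placeholders, '#' is treated as a comment marker, and "cut" is taken to mean keeping the first max_rd_len bases. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

LibItems = List[Tuple[str, str]]  # a list of pairs, since an item name may repeat within a library


def parse_config(text: str) -> Tuple[Dict[str, str], List[LibItems]]:
    """Split a configuration file into global items and [LIB] sections."""
    global_items: Dict[str, str] = {}
    libraries: List[LibItems] = []
    current: Optional[LibItems] = None  # None while still in the global section
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        if line == "[LIB]":
            current = []
            libraries.append(current)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        if current is None:
            global_items[key.strip()] = value.strip()
        else:
            current.append((key.strip(), value.strip()))
    return global_items, libraries


def trim_read(seq: str, max_rd_len: int) -> str:
    """Cut a read to max_rd_len (assumption: the leading bases are kept)."""
    return seq[:max_rd_len]


# Item names and paths below are illustrative assumptions, not a copy of example.config.
EXAMPLE = """\
max_rd_len=50
[LIB]
avg_ins=200
q1=/path/to/reads_1.fq
q2=/path/to/reads_2.fq
"""

global_items, libraries = parse_config(EXAMPLE)
print(global_items, len(libraries))
print(trim_read("ACGTACGT", 5))
```

Library items are kept as ordered pairs rather than a dictionary so that repeated item names within one library are not silently overwritten.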

2. Get it started

Once the configuration file is available, the simplest way to run the assembler is:

Users can also choose to run the assembly process step by step:

NOTE: SOAPdenovo-Trans has two versions: SOAPdenovo-Trans-31mer and SOAPdenovo-Trans-127mer.
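
As a rough illustration of the step-by-step workflow, the Python sketch below builds the four commands in the order pregraph, contig, map, scaff without running them. It rests on assumptions that should be checked against the program's built-in help: that the binary suffix gives the largest supported k-mer size (31 or 127), and that the flags -s (configuration file), -K (k-mer size), -o (output prefix) and -g (graph prefix) behave as in SOAPdenovo. The helper names are hypothetical.

```python
from typing import List


def pick_binary(kmer: int) -> str:
    """Choose the executable for a k-mer size (assumption: suffix = maximum k)."""
    if kmer < 1:
        raise ValueError("k-mer size must be positive")
    if kmer <= 31:
        return "SOAPdenovo-Trans-31mer"
    if kmer <= 127:
        return "SOAPdenovo-Trans-127mer"
    raise ValueError("k-mer size above 127 is not covered by either binary")


def step_commands(config: str, prefix: str, kmer: int) -> List[List[str]]:
    """Return the four assembly steps as argument lists, in execution order."""
    binary = pick_binary(kmer)
    return [
        # Flags are assumptions carried over from SOAPdenovo; verify before use.
        [binary, "pregraph", "-s", config, "-K", str(kmer), "-o", prefix],
        [binary, "contig", "-g", prefix],
        [binary, "map", "-s", config, "-g", prefix],
        [binary, "scaff", "-g", prefix],
    ]


for command in step_commands("example.config", "out", 25):
    print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) so that a failing step stops the pipeline before the next one starts.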

3. Options:

4. Output files

These files are output as assembly results:

  *.contig     contig sequence file
  *.scafSeq    scaffold sequence file

There are some other files that provide useful information for advanced users, which are listed in Appendix B.
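
A small Python sketch of how the two result files might be located and summarized after a run. It assumes that "*" stands for the output prefix given to the assembler and that both files are in FASTA format, with one header line starting with ">" per sequence. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable


def result_files(prefix: str) -> Dict[str, Path]:
    """Map each main result to its expected path for a given output prefix."""
    return {
        "contig": Path(prefix + ".contig"),
        "scaffold": Path(prefix + ".scafSeq"),
    }


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count sequences by counting FASTA header lines (assumption: FASTA output)."""
    return sum(1 for line in lines if line.startswith(">"))


for name, path in result_files("out").items():
    if path.exists():  # only summarize files that are actually present
        with path.open() as handle:
            print(name, path, count_fasta_records(handle))
    else:
        print(name, path, "not found")
```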

5. Parameter adjustment

APPENDIX A: example.config

APPENDIX B:

1. Output files from the command "pregraph"

2. Output files from the command "contig"

3. Output files from the command "map"

4. Output files from the command "scaff"