Extract 3'UTR, 5'UTR, CDS, Promoter, Genes, Introns, Exons from GTF files
BSD-2-CLAUSE License
Extract 3'UTR, 5'UTR, CDS, Promoter, Genes from GTF files.
If you only need the final output, prebuilt files are hosted on riboraptor, organized by genome build and GTF version.
We recommend setting up a conda environment with Python >= 3 and <= 3.7, gffutils v0.9, and pybedtools:
conda create --name gencode_env python=3.7
conda activate gencode_env
conda install -c bioconda gffutils=0.9 pybedtools
The corresponding gzipped BED output files are in the data directory.
./create_regions_from_gencode.R <path_to_GFF/GTF> <path_to_output_dir>
This will create exons.bed, 3UTR.bed, 5UTR.bed, genes.bed, and cds.bed in <output_dir>.
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gff3.gz \
&& gunzip gencode.v25.annotation.gff3.gz
./create_regions_from_gencode.R gencode.v25.annotation.gff3 /path/to/GRCh37/annotation
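When checking the generated BED files against the source annotation, keep in mind that GTF/GFF3 coordinates are 1-based and inclusive, while BED coordinates are 0-based and half-open. The sketch below illustrates that conversion for a single annotation line using only the standard library. It is not the implementation used by create_regions_from_gencode; the function name `gtf_line_to_bed` and the choice of BED name and score columns are assumptions made for illustration.

```python
def gtf_line_to_bed(line, feature_type="exon"):
    """Convert one GTF/GFF3 line to a BED6 tuple.

    Returns None for comments, blank or malformed lines, and
    features whose type does not match `feature_type`.
    (Hypothetical helper, not part of this repository.)
    """
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None
    chrom, _source, feature, start, end, _score, strand = fields[:7]
    if feature != feature_type:
        return None
    # GTF/GFF3: 1-based, inclusive. BED: 0-based, half-open.
    # Only the start shifts by one; the end stays the same.
    return (chrom, int(start) - 1, int(end), feature, 0, strand)


example = 'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972.5";'
record = gtf_line_to_bed(example)
print("\t".join(str(value) for value in record))
```

The interval length is preserved by the conversion: the exon above spans 12227 - 11869 + 1 = 359 bases in GTF terms and 12227 - 11868 = 359 bases in BED terms.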
We use the GenePred format to simplify the process.
Download gtfToGenePred
Convert gtf to GenePred:
gtfToGenePred gencode.v25.annotation.gtf gencode.v25.annotation.genepred
Extract first exons:
python genepred_to_bed.py --first_exon gencode.v25.annotation.genepred
Extract last exons:
python genepred_to_bed.py --last_exon gencode.v25.annotation.genepred
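To make the idea behind first and last exon extraction concrete, here is a minimal sketch that picks the terminal exon from one genePred record. It assumes the standard genePred column layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and assumes "first" means first in transcription order, so the rightmost exon on the minus strand. The function name `terminal_exon` is hypothetical, and genepred_to_bed.py may differ in its details.

```python
def terminal_exon(genepred_line, which="first"):
    """Return the first or last exon of a genePred record as a BED6 tuple.

    "first" and "last" are interpreted in transcription order, which is
    an assumption of this sketch. (Hypothetical helper.)
    """
    if which not in ("first", "last"):
        raise ValueError("which must be 'first' or 'last'")
    fields = genepred_line.rstrip("\n").split("\t")
    name, chrom, strand = fields[0], fields[1], fields[2]
    # exonStarts and exonEnds are comma-separated with a trailing comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    exons = list(zip(starts, ends))  # ordered by genomic coordinate
    # On '+', the first exon is leftmost; on '-', it is rightmost.
    take_leftmost = (which == "first") == (strand == "+")
    start, end = exons[0] if take_leftmost else exons[-1]
    # genePred is already 0-based, half-open, so no coordinate shift is needed.
    return (chrom, start, end, name, 0, strand)


minus_record = "tx1\tchr1\t-\t100\t500\t150\t450\t3\t100,200,400,\t150,300,500,"
print(terminal_exon(minus_record, "first"))
print(terminal_exon(minus_record, "last"))
```

Because genePred intervals already use BED-style coordinates, the exon boundaries can be written out unchanged.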
This should be helpful:
or probably this: