Cython bindings and Python interface to trimAl, a tool for automated alignment trimming. Now with SIMD!
⚠️ This package is based on the release candidate of trimAl 2.0, and results may not be consistent across versions or with the trimAl 1.4 results.
PytrimAl is a Python module that provides bindings to trimAl using Cython. It implements a user-friendly, Pythonic interface to use one of the different trimming methods from trimAl and access results directly. It interacts with the trimAl internals, which has the following advantages:
The following features are available or considered for implementation:
PytrimAl is available for all modern Python versions (3.6+), with no external dependencies.
It can be installed directly from PyPI, which hosts some pre-built wheels for the x86-64 architecture (Linux/OSX) and the Aarch64 architecture (Linux only), as well as the code required to compile from source with Cython:
$ pip install pytrimal
Otherwise, pytrimal is also available as a Bioconda package:
$ conda install -c bioconda pytrimal
Let's load an Alignment from a file on the disk, and use the strictplus method to trim it, before printing the TrimmedAlignment as a Clustal block:
from pytrimal import Alignment, AutomaticTrimmer
ali = Alignment.load("pytrimal/tests/data/example.001.AA.clw")
trimmer = AutomaticTrimmer(method="strictplus")
trimmed = trimmer.trim(ali)
for name, seq in zip(trimmed.names, trimmed.sequences):
    print(name.decode().rjust(6), seq)
This should output the following:
   Sp8 GIVLVWLFPWNGLQIHMMGII
  Sp10 VIMLEWFFAWLGLEINMMVII
  Sp26 GLFLAAANAWLGLEINMMAQI
   Sp6 GIYLSWYLAWLGLEINMMAII
  Sp17 GFLLTWFQLWQGLDLNKMPVF
  Sp33 GLHMAWFQAWGGLEINKQAIL
You can then use the dump method to write the trimmed alignment to a file or file-like object. For instance, save the results in PIR format to a file named example.trimmed.pir:
trimmed.dump("example.trimmed.pir", format="pir")
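For intuition about what a trimmer does to the columns of an alignment, here is a minimal pure-Python sketch that drops columns containing too many gaps. It is not the strictplus method and not trimAl's code: the function name trim_gappy_columns and the max_gap_fraction parameter are hypothetical, chosen only for this illustration.

```python
from typing import List, Sequence


def trim_gappy_columns(
    sequences: Sequence[str], max_gap_fraction: float = 0.5
) -> List[str]:
    """Drop every column whose fraction of '-' characters exceeds the threshold.

    Hypothetical helper for illustration; trimAl's automated methods use
    their own statistics to select columns.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of the columns that are kept after trimming.
    kept = [
        col
        for col in range(length)
        if sum(seq[col] == "-" for seq in sequences) / len(sequences)
        <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in kept) for seq in sequences]


toy = ["MK-LV", "MKALV", "M--LV"]
print(trim_gappy_columns(toy))  # ['MKLV', 'MKLV', 'M-LV']
```

The third column is removed because two of its three residues are gaps, while the second column (one gap out of three) stays under the threshold and is kept.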
Trimmer objects are thread-safe, and the trim method is re-entrant. This means you can batch-process alignments in parallel using a ThreadPool with a single trimmer object:
import glob
import multiprocessing.pool
from pytrimal import Alignment, AutomaticTrimmer
trimmer = AutomaticTrimmer()
alignments = map(Alignment.load, glob.iglob("pytrimal/tests/data/*.fasta"))
with multiprocessing.pool.ThreadPool() as pool:
    trimmed_alignments = pool.map(trimmer.trim, alignments)
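The same pattern can be sketched with concurrent.futures.ThreadPoolExecutor from the standard library. The helper trim_in_parallel and the stand-in function drop_gaps below are hypothetical and only serve to make the example runnable without any alignment files; with PytrimAl you would pass trimmer.trim and an iterable of loaded Alignment objects instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

A = TypeVar("A")
T = TypeVar("T")


def trim_in_parallel(
    trim: Callable[[A], T],
    alignments: Iterable[A],
    workers: Optional[int] = None,
) -> List[T]:
    """Apply one shared `trim` callable to every item, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in the order of the inputs,
        # regardless of which worker thread finishes first.
        return list(executor.map(trim, alignments))


def drop_gaps(sequence: str) -> str:
    """Stand-in for a thread-safe trimming callable (assumption for this sketch)."""
    return sequence.replace("-", "")


print(trim_in_parallel(drop_gaps, ["A-C", "--G", "TT-"], workers=2))
# ['AC', 'G', 'TT']
```

Sharing one callable across all worker threads is only safe because it keeps no mutable state between calls, which is the property the thread-safety guarantee above provides for trimmer objects.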
Benchmarks were run on an i7-10710U CPU @ 1.10GHz, using a single core to time the computation of several statistics, on a variable number of sequences from example.014.AA.EggNOG.COG0591.fasta, an alignment of 3583 sequences and 7287 columns.
Each graph measures the computation time of a single trimAl statistic (see the Statistics page of the online documentation for more information). The None curve shows the time using the internal trimAl 2.0 code, the Generic curve shows a generic C implementation with some more optimizations, and the SSE curve shows the time spent using a dedicated class with SIMD implementations of the statistic computation.
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
Contributions are more than welcome! See CONTRIBUTING.md for more details.
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
This library is provided under the GNU General Public License v3.0.
trimAl is developed by the trimAl team and is distributed under the terms of the GPLv3 as well. See vendor/trimal/LICENSE for more information.
This project is in no way affiliated with, sponsored by, or otherwise endorsed by the trimAl authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.