Resolve and retrieve database cross-reference data used in bioinformatics databases
MIT License
A tool that resolves database cross references (dbxrefs). For each cross reference it can return a list of the locations the reference points to, in different formats such as HTML, XML, flat file, or JSON. It can also retrieve the data for some of the supported databases and convert it into JSON.
The intended audience for this tool is bioinformaticians who need to collect data for dbxrefs and postprocess it. Returning everything in JSON format reduces the need for normalization and special parsing of the data.
Prerequisites:
Supported databases:
Check out the repository::

    git clone [email protected]:SOaAS/dbxref.git

Set up a virtualenv for development and install the package in editable mode::

    # install in development environment
    virtualenv --python=python3 venv; source venv/bin/activate;
    pip install -e .
    # run tests
    python3 setup.py test
    # compile documentation
    python3 setup.py build_sphinx

Use the application::

    dbxref resolve GO:0097281

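The command line takes a dbxref as a single ``DB:ID`` string, while the library calls take dictionaries with ``db`` and ``id`` keys. A minimal sketch of converting one form into the other, assuming the first colon separates the database name from the identifier. ``parse_dbxref`` is a hypothetical helper for illustration and is not part of the dbxref package:

```python
def parse_dbxref(text):
    """Split a 'DB:ID' string into a dict with 'db' and 'id' keys.

    Hypothetical helper, not part of dbxref. Assumes the first colon
    separates the database name from the identifier.
    """
    db, sep, identifier = text.strip().partition(':')
    if not sep or not db or not identifier:
        raise ValueError('expected DB:ID, got %r' % text)
    return {'db': db, 'id': identifier}


print(parse_dbxref('GO:0097281'))
```

Splitting only on the first colon keeps identifiers that themselves contain a colon intact. Whether any supported database uses such identifiers is not stated here.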
Use it as a library::

    # resolve URLs for an entry
    from dbxref import resolver
    resolver.resolve([{'db': 'taxid', 'id': '12345'}])
    # => [{'dbxref': 'taxid:12345', 'locations': {
    #       'json': ['https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/tax-id/12345'],
    #       'xml_ncbi': ['https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=12345'],
    #       'xml': ['http://www.uniprot.org/taxonomy/12345.rdf'],
    #       'html': ['http://www.uniprot.org/taxonomy/12345']},
    #      'status': 'found'}]
    # retrieve an entry
    from dbxref import retriever
    retriever.retrieve([{'db': 'taxid', 'id': '12345'}])
    # => [{'geneticCodes': {'geneticCode': '11'}, 'scientificName': 'Bacillus virus GA1',
    #      'lineage': ['Viruses', 'Duplodnaviria', 'Heunggongvirae', 'Uroviricota', 'Caudoviricetes',
    #                  'Caudovirales', 'Podoviridae', 'Picovirinae', 'Salasvirus'],
    #      'id': 'taxid:12345', 'rank': 'species'}]

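Because both calls return plain lists of dictionaries, postprocessing needs nothing beyond ordinary Python. The sketch below works on a shortened copy of the resolver output shown above instead of calling the library, so it runs offline. ``urls_for_format`` is a hypothetical helper, and the assumption that unresolved entries carry a status other than ``'found'`` is inferred from the sample output rather than confirmed by the package:

```python
import json

# Shortened copy of the resolver output from the example above.
resolved = [{
    'dbxref': 'taxid:12345',
    'locations': {
        'json': ['https://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/tax-id/12345'],
        'html': ['http://www.uniprot.org/taxonomy/12345'],
    },
    'status': 'found',
}]


def urls_for_format(results, fmt):
    """Map each found dbxref to its list of URLs for one format.

    Hypothetical helper. Entries whose status is not 'found', or that
    have no location in the requested format, are skipped (assumption).
    """
    urls = {}
    for entry in results:
        if entry.get('status') != 'found':
            continue
        locations = entry.get('locations', {}).get(fmt)
        if locations:
            urls[entry['dbxref']] = locations
    return urls


html_urls = urls_for_format(resolved, 'html')
print(json.dumps(html_urls, indent=2))
```

The same pattern applies to the retriever output, where fields such as ``scientificName`` or ``lineage`` can be read directly from each dictionary.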