Unipressed

Comprehensive Python client for the Uniprot REST API

MIT License


{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "from rich.pretty import install\n", "# Use rich to pretty print outputs, but truncate nested objects\n", "install()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Unipressed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please visit the project website for more comprehensive documentation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "Unipressed (Uniprot REST) is an API client for the protein database Uniprot.\n", "It provides thoroughly typed and documented code to ensure your use of the library is easy, fast, and correct!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example\n", "Let's say we're interested in very long proteins that are encoded within a chloroplast, in any organism:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="font-weight: bold">{\n", " <span style="color: #008000; text-decoration-color: #008000">'primaryAccession': <span style="color: #008000; text-decoration-color: #008000">'A0A088CK67',\n", " <span style="color: #008000; text-decoration-color: #008000">'genes': <span style="font-weight: bold">[\n", " <span style="font-weight: bold">{\n", " <span style="color: #008000; text-decoration-color: #008000">'geneName': <span style="font-weight: bold">{\n", " <span style="color: #008000; text-decoration-color: #008000">'evidences': <span style="font-weight: bold">[{<span style="color: #008000; text-decoration-color: #008000">'evidenceCode': <span style="color: #008000; text-decoration-color: #008000">'ECO:0000313', <span style="color: #008000; text-decoration-color: #008000">'source': <span style="color: #008000; 
text-decoration-color: #008000">'EMBL', <span style="color: #008000; text-decoration-color: #008000">'id': <span style="color: #008000; text-decoration-color: #008000">'AID67672.1'<span style="font-weight: bold">}],\n", " <span style="color: #008000; text-decoration-color: #008000">'value': <span style="color: #008000; text-decoration-color: #008000">'ftsH'\n", " <span style="font-weight: bold">}\n", " <span style="font-weight: bold">}\n", " <span style="font-weight: bold">],\n", " <span style="color: #008000; text-decoration-color: #008000">'sequence': <span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: #008000">'length': <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">5242<span style="font-weight: bold">}\n", "<span style="font-weight: bold">}\n", "\n" ], "text/plain": [ "\n", "\u001b[1m{\u001b[0m\n", " \u001b[32m'primaryAccession'\u001b[0m: \u001b[32m'A0A088CK67'\u001b[0m,\n", " \u001b[32m'genes'\u001b[0m: \u001b[1m[\u001b[0m\n", " \u001b[1m{\u001b[0m\n", " \u001b[32m'geneName'\u001b[0m: \u001b[1m{\u001b[0m\n", " \u001b[32m'evidences'\u001b[0m: \u001b[1m[\u001b[0m\u001b[1m{\u001b[0m\u001b[32m'evidenceCode'\u001b[0m: \u001b[32m'ECO:0000313'\u001b[0m, \u001b[32m'source'\u001b[0m: \u001b[32m'EMBL'\u001b[0m, \u001b[32m'id'\u001b[0m: \u001b[32m'AID67672.1'\u001b[0m\u001b[1m}\u001b[0m\u001b[1m]\u001b[0m,\n", " \u001b[32m'value'\u001b[0m: \u001b[32m'ftsH'\u001b[0m\n", " \u001b[1m}\u001b[0m\n", " \u001b[1m}\u001b[0m\n", " \u001b[1m]\u001b[0m,\n", " \u001b[32m'sequence'\u001b[0m: \u001b[1m{\u001b[0m\u001b[32m'length'\u001b[0m: \u001b[1;36m5242\u001b[0m\u001b[1m}\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from unipressed import UniprotkbClient\n", "\n", "for record in UniprotkbClient.search(\n", "    query={\n", "        "and_": [\n", "            {"organelle": "chloroplast"},\n", "            {"length": (5000, "*")}\n", "        ]\n", "    },\n", "    fields=["length", "gene_names"]\n", 
").each_record():\n", "    display(record)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Advantages\n", "\n", "* Detailed type hints for autocompleting queries as you type\n", "* Autocompletion for return fields\n", "* Documentation for each field\n", "* Automatic results parsing, for json, tsv, list, and xml\n", "* Built-in pagination, so you don't have to handle any of that yourself!\n", "* Most of the API is automatically generated, ensuring very rapid updates whenever the API changes\n", "* Thoroughly tested, with 41 unit tests and counting!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Usage" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installation\n", "\n", "If you're using poetry:\n", "\n", "    poetry add unipressed\n", "\n", "Otherwise:\n", "\n", "    pip install unipressed\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dataset Clients\n", "\n", "The unipressed module exports a client object for each UniProt dataset:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from unipressed import UniprotkbClient, UniparcClient" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With one of these clients, you can search the dataset:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "install(max_depth=1)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="font-weight: bold">{\n", " <span style="color: #008000; text-decoration-color: #008000">'entryType': <span style="color: #008000; text-decoration-color: #008000">'UniProtKB reviewed (Swiss-Prot)',\n", " <span style="color: #008000; text-decoration-color: #008000">'primaryAccession': <span style="color: 
#008000; text-decoration-color: #008000">'Q96RW7',\n", " <span style="color: #008000; text-decoration-color: #008000">'secondaryAccessions': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'uniProtkbId': <span style="color: #008000; text-decoration-color: #008000">'HMCN1_HUMAN',\n", " <span style="color: #008000; text-decoration-color: #008000">'entryAudit': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'annotationScore': <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">5.0,\n", " <span style="color: #008000; text-decoration-color: #008000">'organism': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'proteinExistence': <span style="color: #008000; text-decoration-color: #008000">'1: Evidence at protein level',\n", " <span style="color: #008000; text-decoration-color: #008000">'proteinDescription': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'genes': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'comments': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'features': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'keywords': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'references': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'uniProtKBCrossReferences': <span 
style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'sequence': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'extraAttributes': <span style="color: #808000; text-decoration-color: #808000">...\n", "<span style="font-weight: bold">}\n", "\n" ], "text/plain": [ "\n", "\u001b[1m{\u001b[0m\n", " \u001b[32m'entryType'\u001b[0m: \u001b[32m'UniProtKB reviewed \u001b[0m\u001b[32m(\u001b[0m\u001b[32mSwiss-Prot\u001b[0m\u001b[32m)\u001b[0m\u001b[32m'\u001b[0m,\n", " \u001b[32m'primaryAccession'\u001b[0m: \u001b[32m'Q96RW7'\u001b[0m,\n", " \u001b[32m'secondaryAccessions'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'uniProtkbId'\u001b[0m: \u001b[32m'HMCN1_HUMAN'\u001b[0m,\n", " \u001b[32m'entryAudit'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'annotationScore'\u001b[0m: \u001b[1;36m5.0\u001b[0m,\n", " \u001b[32m'organism'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'proteinExistence'\u001b[0m: \u001b[32m'1: Evidence at protein level'\u001b[0m,\n", " \u001b[32m'proteinDescription'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'genes'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'comments'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'features'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'keywords'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'references'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'uniProtKBCrossReferences'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'sequence'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'extraAttributes'\u001b[0m: \u001b[33m...\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "records = UniprotkbClient.search({\n", " "length": (5000, 6000)\n", "}).each_record()\n", "\n", "# Show the first record\n", "next(records)" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "You can request a single record by ID:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="font-weight: bold">{\n", " <span style="color: #008000; text-decoration-color: #008000">'entryType': <span style="color: #008000; text-decoration-color: #008000">'UniProtKB reviewed (Swiss-Prot)',\n", " <span style="color: #008000; text-decoration-color: #008000">'primaryAccession': <span style="color: #008000; text-decoration-color: #008000">'Q96RW7',\n", " <span style="color: #008000; text-decoration-color: #008000">'secondaryAccessions': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'uniProtkbId': <span style="color: #008000; text-decoration-color: #008000">'HMCN1_HUMAN',\n", " <span style="color: #008000; text-decoration-color: #008000">'entryAudit': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'annotationScore': <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">5.0,\n", " <span style="color: #008000; text-decoration-color: #008000">'organism': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'proteinExistence': <span style="color: #008000; text-decoration-color: #008000">'1: Evidence at protein level',\n", " <span style="color: #008000; text-decoration-color: #008000">'proteinDescription': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'genes': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: 
#008000">'comments': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'features': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'keywords': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'references': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'uniProtKBCrossReferences': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'sequence': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'extraAttributes': <span style="color: #808000; text-decoration-color: #808000">...\n", "<span style="font-weight: bold">}\n", "\n" ], "text/plain": [ "\n", "\u001b[1m{\u001b[0m\n", " \u001b[32m'entryType'\u001b[0m: \u001b[32m'UniProtKB reviewed \u001b[0m\u001b[32m(\u001b[0m\u001b[32mSwiss-Prot\u001b[0m\u001b[32m)\u001b[0m\u001b[32m'\u001b[0m,\n", " \u001b[32m'primaryAccession'\u001b[0m: \u001b[32m'Q96RW7'\u001b[0m,\n", " \u001b[32m'secondaryAccessions'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'uniProtkbId'\u001b[0m: \u001b[32m'HMCN1_HUMAN'\u001b[0m,\n", " \u001b[32m'entryAudit'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'annotationScore'\u001b[0m: \u001b[1;36m5.0\u001b[0m,\n", " \u001b[32m'organism'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'proteinExistence'\u001b[0m: \u001b[32m'1: Evidence at protein level'\u001b[0m,\n", " \u001b[32m'proteinDescription'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'genes'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'comments'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'features'\u001b[0m: 
\u001b[33m...\u001b[0m,\n", " \u001b[32m'keywords'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'references'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'uniProtKBCrossReferences'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'sequence'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'extraAttributes'\u001b[0m: \u001b[33m...\u001b[0m\n", "\u001b[1m}\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "UniprotkbClient.fetch_one("Q96RW7")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also request multiple records:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "install(max_depth=2)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="font-weight: bold">[\n", " <span style="font-weight: bold">{\n", " <span style="color: #008000; text-decoration-color: #008000">'entryType': <span style="color: #008000; text-decoration-color: #008000">'UniProtKB reviewed (Swiss-Prot)',\n", " <span style="color: #008000; text-decoration-color: #008000">'primaryAccession': <span style="color: #008000; text-decoration-color: #008000">'A0A0C5B5G6',\n", " <span style="color: #008000; text-decoration-color: #008000">'uniProtkbId': <span style="color: #008000; text-decoration-color: #008000">'MOTSC_HUMAN',\n", " <span style="color: #008000; text-decoration-color: #008000">'entryAudit': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'annotationScore': <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">5.0,\n", " <span style="color: #008000; text-decoration-color: #008000">'organism': <span style="color: #808000; text-decoration-color: 
#808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'proteinExistence': <span style="color: #008000; text-decoration-color: #008000">'1: Evidence at protein level',\n", " <span style="color: #008000; text-decoration-color: #008000">'proteinDescription': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'genes': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'comments': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'features': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'geneLocations': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'keywords': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'references': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'uniProtKBCrossReferences': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'sequence': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'extraAttributes': <span style="color: #808000; text-decoration-color: #808000">...\n", " <span style="font-weight: bold">},\n", " <span style="font-weight: bold">{\n", " <span style="color: #008000; text-decoration-color: #008000">'entryType': <span style="color: #008000; text-decoration-color: #008000">'UniProtKB reviewed (Swiss-Prot)',\n", " <span style="color: #008000; text-decoration-color: 
#008000">'primaryAccession': <span style="color: #008000; text-decoration-color: #008000">'A0A1B0GTW7',\n", " <span style="color: #008000; text-decoration-color: #008000">'secondaryAccessions': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'uniProtkbId': <span style="color: #008000; text-decoration-color: #008000">'CIROP_HUMAN',\n", " <span style="color: #008000; text-decoration-color: #008000">'entryAudit': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'annotationScore': <span style="color: #008080; text-decoration-color: #008080; font-weight: bold">5.0,\n", " <span style="color: #008000; text-decoration-color: #008000">'organism': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'proteinExistence': <span style="color: #008000; text-decoration-color: #008000">'1: Evidence at protein level',\n", " <span style="color: #008000; text-decoration-color: #008000">'proteinDescription': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'genes': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'comments': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'features': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'keywords': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'references': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: 
#008000">'uniProtKBCrossReferences': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'sequence': <span style="color: #808000; text-decoration-color: #808000">...,\n", " <span style="color: #008000; text-decoration-color: #008000">'extraAttributes': <span style="color: #808000; text-decoration-color: #808000">...\n", " <span style="font-weight: bold">}\n", "<span style="font-weight: bold">]\n", "\n" ], "text/plain": [ "\n", "\u001b[1m[\u001b[0m\n", " \u001b[1m{\u001b[0m\n", " \u001b[32m'entryType'\u001b[0m: \u001b[32m'UniProtKB reviewed \u001b[0m\u001b[32m(\u001b[0m\u001b[32mSwiss-Prot\u001b[0m\u001b[32m)\u001b[0m\u001b[32m'\u001b[0m,\n", " \u001b[32m'primaryAccession'\u001b[0m: \u001b[32m'A0A0C5B5G6'\u001b[0m,\n", " \u001b[32m'uniProtkbId'\u001b[0m: \u001b[32m'MOTSC_HUMAN'\u001b[0m,\n", " \u001b[32m'entryAudit'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'annotationScore'\u001b[0m: \u001b[1;36m5.0\u001b[0m,\n", " \u001b[32m'organism'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'proteinExistence'\u001b[0m: \u001b[32m'1: Evidence at protein level'\u001b[0m,\n", " \u001b[32m'proteinDescription'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'genes'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'comments'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'features'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'geneLocations'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'keywords'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'references'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'uniProtKBCrossReferences'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'sequence'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'extraAttributes'\u001b[0m: \u001b[33m...\u001b[0m\n", " \u001b[1m}\u001b[0m,\n", " \u001b[1m{\u001b[0m\n", " \u001b[32m'entryType'\u001b[0m: \u001b[32m'UniProtKB reviewed 
\u001b[0m\u001b[32m(\u001b[0m\u001b[32mSwiss-Prot\u001b[0m\u001b[32m)\u001b[0m\u001b[32m'\u001b[0m,\n", " \u001b[32m'primaryAccession'\u001b[0m: \u001b[32m'A0A1B0GTW7'\u001b[0m,\n", " \u001b[32m'secondaryAccessions'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'uniProtkbId'\u001b[0m: \u001b[32m'CIROP_HUMAN'\u001b[0m,\n", " \u001b[32m'entryAudit'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'annotationScore'\u001b[0m: \u001b[1;36m5.0\u001b[0m,\n", " \u001b[32m'organism'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'proteinExistence'\u001b[0m: \u001b[32m'1: Evidence at protein level'\u001b[0m,\n", " \u001b[32m'proteinDescription'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'genes'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'comments'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'features'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'keywords'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'references'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'uniProtKBCrossReferences'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'sequence'\u001b[0m: \u001b[33m...\u001b[0m,\n", " \u001b[32m'extraAttributes'\u001b[0m: \u001b[33m...\u001b[0m\n", " \u001b[1m}\u001b[0m\n", "\u001b[1m]\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "UniprotkbClient.fetch_many(["A0A0C5B5G6", "A0A1B0GTW7"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ID Mapping\n", "\n", "Unipressed also provides one other unique client, which is designed for mapping identifiers. You provide the source and destination database (both of which will autocomplete in VS Code), and a list of identifiers for the source database." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="font-weight: bold">[\n", " <span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: #008000">'from': <span style="color: #008000; text-decoration-color: #008000">'A1L190', <span style="color: #008000; text-decoration-color: #008000">'to': <span style="color: #008000; text-decoration-color: #008000">'SYCE3'<span style="font-weight: bold">},\n", " <span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: #008000">'from': <span style="color: #008000; text-decoration-color: #008000">'A0PK11', <span style="color: #008000; text-decoration-color: #008000">'to': <span style="color: #008000; text-decoration-color: #008000">'CLRN2'<span style="font-weight: bold">},\n", " <span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: #008000">'from': <span style="color: #008000; text-decoration-color: #008000">'A0JP26', <span style="color: #008000; text-decoration-color: #008000">'to': <span style="color: #008000; text-decoration-color: #008000">'POTEB3'<span style="font-weight: bold">}\n", "<span style="font-weight: bold">]\n", "\n" ], "text/plain": [ "\n", "\u001b[1m[\u001b[0m\n", " \u001b[1m{\u001b[0m\u001b[32m'from'\u001b[0m: \u001b[32m'A1L190'\u001b[0m, \u001b[32m'to'\u001b[0m: \u001b[32m'SYCE3'\u001b[0m\u001b[1m}\u001b[0m,\n", " \u001b[1m{\u001b[0m\u001b[32m'from'\u001b[0m: \u001b[32m'A0PK11'\u001b[0m, \u001b[32m'to'\u001b[0m: \u001b[32m'CLRN2'\u001b[0m\u001b[1m}\u001b[0m,\n", " \u001b[1m{\u001b[0m\u001b[32m'from'\u001b[0m: \u001b[32m'A0JP26'\u001b[0m, \u001b[32m'to'\u001b[0m: \u001b[32m'POTEB3'\u001b[0m\u001b[1m}\u001b[0m\n", "\u001b[1m]\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from unipressed 
import IdMappingClient\n", "request = IdMappingClient.submit(\n", " source="UniProtKB_AC-ID", dest="Gene_Name", ids={"A1L190", "A0JP26", "A0PK11"}\n", ")\n", "list(request.each_result())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that, if you submit a large number of IDs, you might need to add a sleep() call between submitting the request and retrieving the results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Query Syntax\n", "\n", "The query syntax refers to the values you pass in to the query argument of the search() method.\n", "\n", "In general, you can't go wrong by following the type hints.\n", "I strongly recommend using something like pylance for Visual Studio Code, which will provide automatic completions and warn you when you have used the wrong syntax.\n", "\n", "If you already know how to use the Uniprot query language, you can always just input your queries as strings:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="color: #800080; text-decoration-color: #800080; font-weight: bold">Search<span style="font-weight: bold">(\n", " <span style="color: #808000; text-decoration-color: #808000">query=<span style="color: #008000; text-decoration-color: #008000">'(gene:BRCA*) AND (organism_id:10090)',\n", " <span style="color: #808000; text-decoration-color: #808000">dataset=<span style="color: #008000; text-decoration-color: #008000">'uniprotkb',\n", " <span style="color: #808000; text-decoration-color: #808000">format=<span style="color: #008000; text-decoration-color: #008000">'json',\n", " <span style="color: #808000; text-decoration-color: #808000">fields=<span style="color: #800080; text-decoration-color: #800080; font-style: italic">None,\n", " <span style="color: #808000; 
text-decoration-color: #808000">include_isoform=<span style="color: #00ff00; text-decoration-color: #00ff00; font-style: italic">True,\n", " <span style="color: #808000; text-decoration-color: #808000">size=<span style="color: #008080; text-decoration-color: #008080; font-weight: bold">500\n", "<span style="font-weight: bold">)\n", "\n" ], "text/plain": [ "\n", "\u001b[1;35mSearch\u001b[0m\u001b[1m(\u001b[0m\n", " \u001b[33mquery\u001b[0m=\u001b[32m'\u001b[0m\u001b[32m(\u001b[0m\u001b[32mgene:BRCA*\u001b[0m\u001b[32m)\u001b[0m\u001b[32m AND \u001b[0m\u001b[32m(\u001b[0m\u001b[32morganism_id:10090\u001b[0m\u001b[32m)\u001b[0m\u001b[32m'\u001b[0m,\n", " \u001b[33mdataset\u001b[0m=\u001b[32m'uniprotkb'\u001b[0m,\n", " \u001b[33mformat\u001b[0m=\u001b[32m'json'\u001b[0m,\n", " \u001b[33mfields\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", " \u001b[33minclude_isoform\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n", " \u001b[33msize\u001b[0m=\u001b[1;36m500\u001b[0m\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "UniprotkbClient.search(query="(gene:BRCA*) AND (organism_id:10090)")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, if you want some built-in query validation and code completion using Python's type system, then you can instead use a dictionary.\n", "The simplest query is a dictionary with a single key: " ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="color: #800080; text-decoration-color: #800080; font-weight: bold">Search<span style="font-weight: bold">(\n", " <span style="color: #808000; text-decoration-color: #808000">query=<span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: #008000">'family': <span style="color: #008000; 
text-decoration-color: #008000">'kinase'<span style="font-weight: bold">},\n", " <span style="color: #808000; text-decoration-color: #808000">dataset=<span style="color: #008000; text-decoration-color: #008000">'uniprotkb',\n", " <span style="color: #808000; text-decoration-color: #808000">format=<span style="color: #008000; text-decoration-color: #008000">'json',\n", " <span style="color: #808000; text-decoration-color: #808000">fields=<span style="color: #800080; text-decoration-color: #800080; font-style: italic">None,\n", " <span style="color: #808000; text-decoration-color: #808000">include_isoform=<span style="color: #00ff00; text-decoration-color: #00ff00; font-style: italic">True,\n", " <span style="color: #808000; text-decoration-color: #808000">size=<span style="color: #008080; text-decoration-color: #008080; font-weight: bold">500\n", "<span style="font-weight: bold">)\n", "\n" ], "text/plain": [ "\n", "\u001b[1;35mSearch\u001b[0m\u001b[1m(\u001b[0m\n", " \u001b[33mquery\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'family'\u001b[0m: \u001b[32m'kinase'\u001b[0m\u001b[1m}\u001b[0m,\n", " \u001b[33mdataset\u001b[0m=\u001b[32m'uniprotkb'\u001b[0m,\n", " \u001b[33mformat\u001b[0m=\u001b[32m'json'\u001b[0m,\n", " \u001b[33mfields\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", " \u001b[33minclude_isoform\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n", " \u001b[33msize\u001b[0m=\u001b[1;36m500\u001b[0m\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "UniprotkbClient.search(query={"family": "kinase"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can compile more complex queries using the and_, or_ and not_ keys.\n", "These first two operators take a list of query dictionaries: " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans 
Mono',consolas,'Courier New',monospace">\n", "<span style="color: #800080; text-decoration-color: #800080; font-weight: bold">Search<span style="font-weight: bold">(\n", " <span style="color: #808000; text-decoration-color: #808000">query=<span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: #008000">'and_': <span style="color: #808000; text-decoration-color: #808000">...<span style="font-weight: bold">},\n", " <span style="color: #808000; text-decoration-color: #808000">dataset=<span style="color: #008000; text-decoration-color: #008000">'uniprotkb',\n", " <span style="color: #808000; text-decoration-color: #808000">format=<span style="color: #008000; text-decoration-color: #008000">'json',\n", " <span style="color: #808000; text-decoration-color: #808000">fields=<span style="color: #800080; text-decoration-color: #800080; font-style: italic">None,\n", " <span style="color: #808000; text-decoration-color: #808000">include_isoform=<span style="color: #00ff00; text-decoration-color: #00ff00; font-style: italic">True,\n", " <span style="color: #808000; text-decoration-color: #808000">size=<span style="color: #008080; text-decoration-color: #008080; font-weight: bold">500\n", "<span style="font-weight: bold">)\n", "\n" ], "text/plain": [ "\n", "\u001b[1;35mSearch\u001b[0m\u001b[1m(\u001b[0m\n", " \u001b[33mquery\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'and_'\u001b[0m: \u001b[33m...\u001b[0m\u001b[1m}\u001b[0m,\n", " \u001b[33mdataset\u001b[0m=\u001b[32m'uniprotkb'\u001b[0m,\n", " \u001b[33mformat\u001b[0m=\u001b[32m'json'\u001b[0m,\n", " \u001b[33mfields\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", " \u001b[33minclude_isoform\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n", " \u001b[33msize\u001b[0m=\u001b[1;36m500\u001b[0m\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "UniprotkbClient.search(query={\n", "    "and_": [\n", "        {"family": "kinase"},\n", "        {"organism_id": "9606"},\n", "    ]\n", "})" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "Most "leaf" nodes of the query tree (i.e. those that aren't operators like and_) are strings, integers, or floats, which you enter as ordinary Python literals, as shown above.\n", "For string fields, you also have access to wildcards, namely the * character. \n", "For example, if you want every human protein belonging to a gene whose name starts with PRO, you could use:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="color: #800080; text-decoration-color: #800080; font-weight: bold">Search<span style="font-weight: bold">(\n", " <span style="color: #808000; text-decoration-color: #808000">query=<span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: #008000">'and_': <span style="color: #808000; text-decoration-color: #808000">...<span style="font-weight: bold">},\n", " <span style="color: #808000; text-decoration-color: #808000">dataset=<span style="color: #008000; text-decoration-color: #008000">'uniprotkb',\n", " <span style="color: #808000; text-decoration-color: #808000">format=<span style="color: #008000; text-decoration-color: #008000">'json',\n", " <span style="color: #808000; text-decoration-color: #808000">fields=<span style="color: #800080; text-decoration-color: #800080; font-style: italic">None,\n", " <span style="color: #808000; text-decoration-color: #808000">include_isoform=<span style="color: #00ff00; text-decoration-color: #00ff00; font-style: italic">True,\n", " <span style="color: #808000; text-decoration-color: #808000">size=<span style="color: #008080; text-decoration-color: #008080; font-weight: bold">500\n", "<span style="font-weight: bold">)\n", "\n" ], "text/plain": [ "\n", 
"\u001b[1;35mSearch\u001b[0m\u001b[1m(\u001b[0m\n", " \u001b[33mquery\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'and_'\u001b[0m: \u001b[33m...\u001b[0m\u001b[1m}\u001b[0m,\n", " \u001b[33mdataset\u001b[0m=\u001b[32m'uniprotkb'\u001b[0m,\n", " \u001b[33mformat\u001b[0m=\u001b[32m'json'\u001b[0m,\n", " \u001b[33mfields\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", " \u001b[33minclude_isoform\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n", " \u001b[33msize\u001b[0m=\u001b[1;36m500\u001b[0m\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "UniprotkbClient.search(query={\n", "    "and_": [\n", "        {"gene": "PRO*"},\n", "        {"organism_id": "9606"},\n", "    ]\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few query fields are ranges, which you input as a two-element tuple giving the start and end of the range.\n", "If you use the literal \"*\" for either element, that end of the range is left open.\n", "For example, this query returns any protein whose length is in the range $[5000, \infty)$:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="color: #800080; text-decoration-color: #800080; font-weight: bold">Search<span style="font-weight: bold">(\n", " <span style="color: #808000; text-decoration-color: #808000">query=<span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: #008000">'length': <span style="color: #808000; text-decoration-color: #808000">...<span style="font-weight: bold">},\n", " <span style="color: #808000; text-decoration-color: #808000">dataset=<span style="color: #008000; text-decoration-color: #008000">'uniprotkb',\n", " <span style="color: #808000; text-decoration-color: #808000">format=<span style="color: #008000; text-decoration-color: 
#008000">'json',\n", " <span style="color: #808000; text-decoration-color: #808000">fields=<span style="color: #800080; text-decoration-color: #800080; font-style: italic">None,\n", " <span style="color: #808000; text-decoration-color: #808000">include_isoform=<span style="color: #00ff00; text-decoration-color: #00ff00; font-style: italic">True,\n", " <span style="color: #808000; text-decoration-color: #808000">size=<span style="color: #008080; text-decoration-color: #008080; font-weight: bold">500\n", "<span style="font-weight: bold">)\n", "\n" ], "text/plain": [ "\n", "\u001b[1;35mSearch\u001b[0m\u001b[1m(\u001b[0m\n", " \u001b[33mquery\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'length'\u001b[0m: \u001b[33m...\u001b[0m\u001b[1m}\u001b[0m,\n", " \u001b[33mdataset\u001b[0m=\u001b[32m'uniprotkb'\u001b[0m,\n", " \u001b[33mformat\u001b[0m=\u001b[32m'json'\u001b[0m,\n", " \u001b[33mfields\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", " \u001b[33minclude_isoform\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n", " \u001b[33msize\u001b[0m=\u001b[1;36m500\u001b[0m\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "UniprotkbClient.search(query={"length": (5000, "*")})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, a few query fields take dates.\n", "You input these as Python datetime.date objects.\n", "For example, to find proteins added to UniProt since July 2022, we would do:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">\n", "<span style="color: #800080; text-decoration-color: #800080; font-weight: bold">Search<span style="font-weight: bold">(\n", " <span style="color: #808000; text-decoration-color: #808000">query=<span style="font-weight: bold">{<span style="color: #008000; text-decoration-color: 
#008000">'date_created': <span style="color: #808000; text-decoration-color: #808000">...<span style="font-weight: bold">},\n", " <span style="color: #808000; text-decoration-color: #808000">dataset=<span style="color: #008000; text-decoration-color: #008000">'uniprotkb',\n", " <span style="color: #808000; text-decoration-color: #808000">format=<span style="color: #008000; text-decoration-color: #008000">'json',\n", " <span style="color: #808000; text-decoration-color: #808000">fields=<span style="color: #800080; text-decoration-color: #800080; font-style: italic">None,\n", " <span style="color: #808000; text-decoration-color: #808000">include_isoform=<span style="color: #00ff00; text-decoration-color: #00ff00; font-style: italic">True,\n", " <span style="color: #808000; text-decoration-color: #808000">size=<span style="color: #008080; text-decoration-color: #008080; font-weight: bold">500\n", "<span style="font-weight: bold">)\n", "\n" ], "text/plain": [ "\n", "\u001b[1;35mSearch\u001b[0m\u001b[1m(\u001b[0m\n", " \u001b[33mquery\u001b[0m=\u001b[1m{\u001b[0m\u001b[32m'date_created'\u001b[0m: \u001b[33m...\u001b[0m\u001b[1m}\u001b[0m,\n", " \u001b[33mdataset\u001b[0m=\u001b[32m'uniprotkb'\u001b[0m,\n", " \u001b[33mformat\u001b[0m=\u001b[32m'json'\u001b[0m,\n", " \u001b[33mfields\u001b[0m=\u001b[3;35mNone\u001b[0m,\n", " \u001b[33minclude_isoform\u001b[0m=\u001b[3;92mTrue\u001b[0m,\n", " \u001b[33msize\u001b[0m=\u001b[1;36m500\u001b[0m\n", "\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from datetime import date\n", "\n", "UniprotkbClient.search(query={"date_created": (date(2022, 7, 1), "*")})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use with Visual Studio Code\n", "To get VS Code to offer suggestions, press the Trigger Suggest shortcut, which is usually bound to Ctrl + Space.\n", "In particular, code completion generally won't work until you open a string literal using a quotation mark.\n", "\n", 
"Secondly, to get live access to the documentation, you can either use the Show Hover shortcut, which is usually bound to Ctrl + K, Ctrl + I, or you can install the docs-view extension, which lets you view the docstrings in the sidebar without interfering with your code." ] } ], "metadata": { "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" }, "vscode": { "interpreter": { "hash": "c0137c43e910c15b77561d75e521e84649b6e722cf4653e9dfc9be3dea5b8876" } } }, "nbformat": 4, "nbformat_minor": 2 }