A Python library for working with Kasabi.com APIs
OTHER License
A Python library for working with Kasabi.com APIs.
This is a library for working with APIs provided by Kasabi
Homepage: http://github.com/iand/pytassium Pypi: http://pypi.python.org/pypi/pytassium
Install with easy_install:
>sudo easy_install pytassium
If you already have it installed then simply upgrade with:
>sudo easy_install --upgrade pytassium
pytassium requires the following (should be handled automatically with easy_install):
The basic pattern of usage is as follows:
import pytassium
import time
dataset = pytassium.Dataset('nasa','put-your-api-key-here')
# --------------------------
# Use the lookup API
# --------------------------
response, data = dataset.lookup('http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan')
if response.status in range(200,300):
# data now contains an rdflib.Graph
print data.serialize(format="turtle")
else:
print "Oh no! %d %s - %s" % (response.status, response.reason, body)
# --------------------------
# Use the sparql API
# --------------------------
response, data = dataset.select('select ?s where {?s a <http://xmlns.com/foaf/0.1/Person>} limit 10')
if response.status in range(200,300):
# data now contains a dictionary of results
print data
else:
print "Oh no! %d %s - %s" % (response.status, response.reason, body)
# --------------------------
# Use the attribution API
# --------------------------
response, data = dataset.attribution()
# assuming success, data now contains dictionary
print data['homepage']
# --------------------------
# Use the search API
# --------------------------
# search for 5 results matching apollo
response, data = dataset.search("apollo", 5)
for result in data['results']:
print "%s (score: %s)" % (result['title'], result['score'])
# facet on a search for alan, with the name and type fields
fields = ['name', 'type']
query = "alan"
response, data = dataset.facet(query, fields)
for facet in data['fields']:
print "Top %ss matching %s" % (facet['name'],query)
for term in facet['terms']:
print "%s (%s results)" % (term['value'], term['number'])
# --------------------------
# Use the reconciliation API
# --------------------------
# Reconcile one label
response, data = dataset.reconcile('Alan Shepard')
print "Best match is: %s" % data['result'][0]['id']
# Reconcile a list of labels
labels = ['Neil Armstrong','alan shepard']
response, data = dataset.reconcile(labels)
for i in range(0, len(labels)):
print "Best match for %s is: %s" % (labels[i], data['q%s'%i]['result'][0]['id'])
# Reconcile a label with specific parameters
response, data = dataset.reconcile('Apollo 11', limit=3, type='http://purl.org/net/schemas/space/Mission', type_strict ='any')
print "Best match is: %s" % data['result'][0]['id']
# Reconcile with a specific query
query = {
"query" : "Apollo 11",
"limit" : 3,
"type" : "http://purl.org/net/schemas/space/Mission",
"type_strict" : "any",
}
response, data = dataset.reconcile(query)
print "Best match is: %s" % data['result'][0]['id']
# --------------------------
# Use the update API
# --------------------------
dataset = pytassium.Dataset('my-writable-dataset','put-your-api-key-here')
# Store the contents of a turtle file
dataset.store_file('/tmp/mydata.ttl', media_type='text/turtle')
# Store data from a string
mytriples = "<http://example.com/foo> a <http://example.com/Cat> ."
dataset.store_data(mytriples, media_type='text/turtle')
# --------------------------
# Use the jobs API
# --------------------------
response, job_uri = dataset.schedule_reset()
print "Reset scheduled, URI is: %s" % job_uri
print "Waiting for reset to complete"
done = False
while not done:
response, data = dataset.job_status(job_uri)
if response.status in range(200,300):
if data['status'] == 'scheduled':
print "Reset has not started yet"
elif data['status'] == 'running':
print "Reset is in progress"
elif data['status'] == 'failed':
print "Reset has failed :("
done = True
elif data['status'] == 'succeeded':
print "Reset has completed :)"
done = True
if not done:
time.sleep(5)
The pytassium package comes with a command line utility. Use it from the command line like this:
pytassium
You'll be presented with a command prompt:
>>>
First you need to tell it which dataset you want to work with. The "use" command does this. You can supply the short name of the store or the full URI, it doesn't matter:
>>> use nasa
Using nasa
You also need to supply your API key:
>>> apikey yourapikey
You can also specify the dataset and apikey using the -d and -a command line options:
./pytassium -d nasa -a yourapikey
Alternatively you can specify the default apikey to use by setting the KASABI_API_KEY environment variable. In Linux:
export KASABI_API_KEY=yourapikey
pytassium -d nasa
The -a parameter will override the environment variable.
To stop using pytassium use the "exit" command:
>>> exit
pytassium has a number of commands that help with exploring a dataset. First up is "sample" which returns a sample of the subjects from the data:
>>> sample
0. http://data.kasabi.com/dataset/nasa/launchsite/russia
1. http://data.kasabi.com/dataset/nasa/mission/apollo-10/role/lunar-module-pilot
2. http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan
3. http://data.kasabi.com/dataset/nasa/launchsite/tyuratambaikonurcosmodrome
4. http://data.kasabi.com/dataset/nasa/launchsite/xichang
5. http://data.kasabi.com/dataset/nasa/discipline/resupply/refurbishment/repair
6. http://data.kasabi.com/dataset/nasa/mission/apollo-10
7. http://www.bbc.co.uk/programmes/b00lg2xb#programme
8. http://data.kasabi.com/dataset/nasa/mission/apollo-11
9. http://data.kasabi.com/dataset/nasa/mission/apollo-11/role/backup-commander
You'll see that each URI is numbered. You can quickly describe that URI by typing the describe command followed by its number:
>>> describe 1
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix space: <http://purl.org/net/schemas/space/> .
<http://data.kasabi.com/dataset/nasa/mission/apollo-10/role/lunar-module-pilot>
a <http://purl.org/net/schemas/space/MissionRole>;
rdfs:label "Apollo 10 Lunar Module Pilot";
space:actor <http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan>;
space:mission <http://data.kasabi.com/dataset/nasa/mission/apollo-10>;
space:role <http://data.kasabi.com/dataset/nasa/roles/lunar-module-pilot> .
The numbers are remembered between each listing of URIs, so describe 2 will still work.
You can also describe by URI:
>>> describe <http://www.bbc.co.uk/programmes/b00lg2xb#programme>
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix po: <http://purl.org/ontology/po/> .
<http://www.bbc.co.uk/programmes/b00lg2xb#programme> a <http://purl.org/ontology/po/Episode>;
dc:title "One Small Step";
po:short_synopsis "The story of Neil Armstrong and Buzz Aldrin's trip to the moon.";
foaf:primaryTopic <http://data.kasabi.com/dataset/nasa/mission/apollo-10>,
<http://data.kasabi.com/dataset/nasa/mission/apollo-11>,
<http://data.kasabi.com/dataset/nasa/person/edwineugenealdrinjr>,
<http://data.kasabi.com/dataset/nasa/person/neilaldenarmstrong>;
foaf:topic <http://data.kasabi.com/dataset/nasa/mission/apollo-10> .
You can get a count of the triples in a store:
>>> count
99448 triples
Or counts of various other types:
>>> count subjects
12357 subjects
>>> count classes
10 classes
>>> count properties
39 properties
You can also count occurrences of a class:
>>> count <http://xmlns.com/foaf/0.1/Person>
58 <http://xmlns.com/foaf/0.1/Person>
Or you can use the prefixed version (see below for more on prefixes)
>>> count foaf:Person
58 foaf:Person
The "show" command enables you to explore characteristics of the data:
>>> show classes
0. http://purl.org/net/schemas/space/MissionRole
1. http://purl.org/net/schemas/space/Mission
2. http://xmlns.com/foaf/0.1/Person
3. http://purl.org/net/schemas/space/Spacecraft
4. http://purl.org/net/schemas/space/Launch
5. http://xmlns.com/foaf/0.1/Image
6. http://purl.org/net/schemas/space/Discipline
7. http://purl.org/net/schemas/space/LaunchSite
8. http://purl.org/ontology/po/Episode
9. http://rdfs.org/ns/void#Dataset
>>> show properties
0. http://purl.org/net/schemas/space/place
1. http://www.w3.org/2000/01/rdf-schema#label
2. http://www.w3.org/1999/02/22-rdf-syntax-ns#type
3. http://purl.org/net/schemas/space/actor
4. http://purl.org/net/schemas/space/role
5. http://purl.org/net/schemas/space/mission
6. http://xmlns.com/foaf/0.1/name
7. http://purl.org/net/schemas/space/performed
8. http://www.w3.org/2002/07/owl#sameAs
9. http://purl.org/net/schemas/space/country
10. http://xmlns.com/foaf/0.1/isPrimaryTopicOf
11. http://purl.org/dc/elements/1.1/title
12. http://purl.org/net/schemas/space/missionRole
13. http://purl.org/ontology/po/short_synopsis
>>> show schemas
http://purl.org/net/schemas/space/
http://xmlns.com/foaf/0.1/
http://purl.org/ontology/po/
http://rdfs.org/ns/void#
http://www.w3.org/2000/01/rdf-schema#
http://www.w3.org/1999/02/22-rdf-syntax-ns#
http://www.w3.org/2002/07/owl#
http://purl.org/dc/elements/1.1/
http://purl.org/dc/terms/
>>> show topclasses
class | count
==============================================+======
http://purl.org/net/schemas/space/Spacecraft | 6692
http://purl.org/net/schemas/space/Launch | 5090
http://xmlns.com/foaf/0.1/Image | 303
http://purl.org/net/schemas/space/MissionRole | 142
http://xmlns.com/foaf/0.1/Person | 58
The "show void" command lists all void descriptions in the dataset, or describes it if there is only one:
>>> show void
@prefix dcterm: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
<http://data.kasabi.com/dataset/nasa/> a <http://rdfs.org/ns/void#Dataset>;
dcterm:description """
Conversion of various NASA datasets into RDF, starting with the spacecraft data from the NSSDC master catalog
""";
dcterm:source <http://history.nasa.gov/SP-4029/Apollo_00a_Cover.htm>,
<http://nssdc.gsfc.nasa.gov/nmc/>;
dcterm:title "NASA Space Flight & Astronaut data";
void:exampleResource <http://data.kasabi.com/dataset/nasa/mission/apollo-11>,
<http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan>,
<http://data.kasabi.com/dataset/nasa/person/neilaldenarmstrong>,
<http://data.kasabi.com/dataset/nasa/spacecraft/1957-001B>,
<http://data.kasabi.com/dataset/nasa/spacecraft/1969-059A>,
<http://data.kasabi.com/dataset/nasa/spacecraft/1977-084A>;
void:sparqlEndpoint <http://api.talis.com/stores/space/services/sparql>;
void:uriRegexPattern "http://data.kasabi.com/dataset/nasa/.+" .
The status and attribution commands provide more information about a dataset:
>>> status
Status: published
>>> attribution
Name: NASA
Homepage: http://beta.kasabi.com/dataset/nasa
Source: http://beta.kasabi.com
Logo: http://api.kasabi.com/images/kasabi-20-20.png
You can load data from a local file with the "store" command:
>>> store yourdata.nt
Uploading 'yourdata.nt'
The store command will automatically chunk ntriples files and load the pieces into the store. Note: this does not take account of blank nodes so don't use store on files over 2MB if they contain blank nodes
A future version will add support for the ingest service.
pytassium provides a "sparql" command to run a sparql query. It will attempt to format the results nicely.
>>> sparql select * where {?s a <http://xmlns.com/foaf/0.1/Person>} limit 5
s
===================================================================
http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan
http://data.kasabi.com/dataset/nasa/person/jamesarthurlovelljr
http://data.kasabi.com/dataset/nasa/person/richardfrancisgordonjr
http://data.kasabi.com/dataset/nasa/person/robertfranklynovermyer
http://data.kasabi.com/dataset/nasa/person/edgardeanmitchellusn/scd
The sparql command expands well known prefixes automatically:
>>> sparql select ?title where {?s a space:Mission; dc:title ?title } limit 5
title
=========
Apollo 10
Apollo 11
Apollo 12
Apollo 17
Apollo 15
You can use "show prefixes" to list the recognised prefixes:
>>> show prefixes
foaf: <http://xmlns.com/foaf/0.1/>
owl: <http://www.w3.org/2002/07/owl#>
xsd: <http://www.w3.org/2001/XMLSchema#>
bibo: <http://purl.org/ontology/bibo/>
You can add your own prefixes with the "prefix" command:
>>> prefix ex <http://example.com/foo/>
By default, when pytassium starts up it attempts to fetch a list of common prefixes from http://prefix.cc. This file is cached in the system temp directory for future use.
pytassium provides the "search" command for accessing a dataset's search API. All following parameters are assumed to be the search query:
>>> search apollo
0. Apollo 6 (score: 1.0)
1. ASTP-Apollo (score: 0.9938665)
2. Apollo 7 (score: 0.9717672)
3. Apollo 10 (score: 0.9620834)
4. Apollo 8 (score: 0.9620834)
>>> search apollo 13
0. Apollo 13 (score: 1.0)
1. Apollo 13 Lunar Module/ALSEP (score: 0.97858286)
2. Apollo 13 Command and Service Module (CSM) (score: 0.83995086)
3. Apollo 13 SIVB (score: 0.7720434)
4. Soyuz 13 (score: 0.71551764)
>>> search "apollo 13"
0. Apollo 13 (score: 1.0)
1. Apollo 13 Lunar Module/ALSEP (score: 0.9803725)
2. Apollo 13 Command and Service Module (CSM) (score: 0.84758013)
3. Apollo 13 SIVB (score: 0.49793136)
4. Apollo 13 Capsule Communicator (score: 0.41402295)
pytassium provides a reconcile command which invokes the dataset's reconciliation service.
>>> reconcile apollo
0. http://data.kasabi.com/dataset/nasa/spacecraft/1968-025A (score: 1.0)
1. http://data.kasabi.com/dataset/nasa/spacecraft/1975-066A (score: 0.9938665)
2. http://data.kasabi.com/dataset/nasa/spacecraft/1968-089A (score: 0.9717672)
3. http://data.kasabi.com/dataset/nasa/spacecraft/1969-043A (score: 0.9620834)
4. http://data.kasabi.com/dataset/nasa/spacecraft/1968-118A (score: 0.9620834)
Enclose multi word labels in quotes:
>>> reconcile "apollo 13"
0. http://data.kasabi.com/dataset/nasa/mission/apollo-13 (score: 1.0)
1. http://data.kasabi.com/dataset/nasa/spacecraft/1970-029C (score: 0.97858286)
2. http://data.kasabi.com/dataset/nasa/spacecraft/1970-029A (score: 0.83995086)
3. http://data.kasabi.com/dataset/nasa/spacecraft/1970-029B (score: 0.7720434)
4. http://data.kasabi.com/dataset/nasa/spacecraft/1973-103A (score: 0.71551764)
Specify a type:
>>> reconcile apollo space:Mission
0. http://data.kasabi.com/dataset/nasa/mission/apollo-13 (score: 1.0)
1. http://data.kasabi.com/dataset/nasa/mission/apollo-12 (score: 1.0)
2. http://data.kasabi.com/dataset/nasa/mission/apollo-14 (score: 1.0)
3. http://data.kasabi.com/dataset/nasa/mission/apollo-7 (score: 1.0)
4. http://data.kasabi.com/dataset/nasa/mission/apollo-10 (score: 1.0)
You can schedule a reset job on your dataset:
>>> reset
Scheduling reset job for immediate execution
Reset scheduled, URI is: http://api.kasabi.com/dataset/id-test-dataset/jobs/8777c36e-a904-4498-b837-bcc214a9216d
Reset has not started yet
Reset is in progress
Reset has completed
pytassium provides a -f command line options which specifies a filename containing commands to run. When pytassium is invoked with the -f option it reads the commands from the file, runs them and then terminates
./pytassium -f /tmp/myscript
You can save the history from an interactive session with the "save" command:
>>> save history /tmp/newscript
And execute the commands in any script with the "run" command:
>>> run /tmp/newscript
Any parameters supplied on the command line are assumed to a command for pytassium. It runs the command and then terminates:
pytassium -a yourapikey -d nasa show classes
0. http://purl.org/net/schemas/space/MissionRole
1. http://purl.org/net/schemas/space/Mission
2. http://xmlns.com/foaf/0.1/Person
Sparql queries will typically need to be enclosed in quotes:
pytassium -a yourapikey -d nasa sparql "select * where {?s a <http://xmlns.com/foaf/0.1/Person>}"
s
===================================================================
http://data.kasabi.com/dataset/nasa/person/eugeneandrewcernan
http://data.kasabi.com/dataset/nasa/person/jamesarthurlovelljr
http://data.kasabi.com/dataset/nasa/person/richardfrancisgordonjr
http://data.kasabi.com/dataset/nasa/person/robertfranklynovermyer
http://data.kasabi.com/dataset/nasa/person/edgardeanmitchellusn/scd
Multi-word reconciliations will need quotes to be doubled or escaped othewise the second word will be treated as the type:
pytassium -a yourapikey -d nasa reconcile "'apollo 13'" space:Mission
0. http://data.kasabi.com/dataset/nasa/mission/apollo-13 (score: 1.0)
A common pattern is to reset a dataset and load some fresh data into it:
pytassium -a yourapikey -d yourdataset reset
pytassium -a yourapikey -d yourdataset store yourdata.nt
The following APIs are not yet implemented:
If Python's not your thing, you may also be interested in:
This work is hereby released into the Public Domain.
To view a copy of the public domain dedication, visit http://creativecommons.org/licenses/publicdomain or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.