Wikipedia API wrapper for humans and elk. (en.wikipedia.org/w/api.php, get it?)
A MediaWiki API wrapper in Python for humans and elk.
Wapiti makes it simple for Python scripts to retrieve data from the Wikipedia API. No more worries about query limits, continue strings, or formatting. Just ask for data and get structured results.
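To make concrete what "continue strings" and query limits involve, here is a minimal sketch of the kind of continuation loop a script would otherwise run by hand. It is not Wapiti's implementation: `fetch_batch`, `fetch_all`, and the in-memory `FAKE_MEMBERS` list are hypothetical stand-ins, chosen so the example runs without network access.

```python
# Hypothetical in-memory data standing in for a paged API.
FAKE_MEMBERS = ['Page %d' % i for i in range(25)]


def fetch_batch(cont=0, batch_size=10):
    """Return up to `batch_size` items starting at `cont`, plus the next
    continue token (None once the data is exhausted)."""
    batch = FAKE_MEMBERS[cont:cont + batch_size]
    next_cont = cont + batch_size
    if next_cont >= len(FAKE_MEMBERS):
        next_cont = None
    return batch, next_cont


def fetch_all(limit):
    """Keep requesting batches until `limit` results are collected
    or the source runs out."""
    results = []
    cont = 0
    while cont is not None and len(results) < limit:
        batch, cont = fetch_batch(cont)
        results.extend(batch)
    # The last batch may overshoot, so trim to the requested limit.
    return results[:limit]


print(len(fetch_all(15)))
```

The caller only states how many results it wants; the batching and token bookkeeping stay inside `fetch_all`, which is the convenience the paragraph above describes.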
Let's get the members of Wikipedia's Category:Lists of superlatives. First, initialize a WapitiClient and change any settings. Next, run the operation get_category_articles_recursive on the category 'Lists of superlatives', with a limit of 10:
>>> import wapiti
>>> client = wapiti.WapitiClient('[email protected]')
>>> client.get_category_articles_recursive('Lists of superlatives', 10)
[PageInfo(title=u'The Fifty Worst Films of All Time', page_id=1820513, ns=0),
PageInfo(title=u"World's busiest city airport systems by passenger traffic", page_id=33167241, ns=0),
PageInfo(title=u'List of oldest Major League Baseball players', page_id=1947309, ns=0),
PageInfo(title=u'List of firsts in India', page_id=3752148, ns=0),
PageInfo(title=u'List of the first female holders of political offices in Europe', page_id=18904865, ns=0),
PageInfo(title=u'List of the busiest airports in the Republic of Ireland', page_id=26712480, ns=0),
PageInfo(title=u'List of longest bridges above water in India', page_id=32312925, ns=0),
PageInfo(title=u'List of the busiest airports in China', page_id=33396262, ns=0),
PageInfo(title=u'List of most common surnames in Asia', page_id=26810011, ns=0),
PageInfo(title=u'List of largest mosques', page_id=20897194, ns=0)]
This returns a list of PageInfo objects for the category's members.
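Once you have such a list, ordinary Python is enough to filter or index it. The sketch below assumes only what the output above shows, namely that each result exposes `title`, `page_id`, and `ns`. The `PageInfo` namedtuple here is a stand-in defined for the example, not the library's own class, so the snippet runs offline.

```python
from collections import namedtuple

# Stand-in for the result objects shown above; only the three fields
# visible in that output are assumed.
PageInfo = namedtuple('PageInfo', 'title page_id ns')

pages = [
    PageInfo(title=u'List of firsts in India', page_id=3752148, ns=0),
    PageInfo(title=u'List of largest mosques', page_id=20897194, ns=0),
    PageInfo(title=u'The Fifty Worst Films of All Time', page_id=1820513, ns=0),
]

# Keep only titles that start with "List of", sorted alphabetically.
list_titles = sorted(p.title for p in pages if p.title.startswith(u'List of'))

# Map page IDs to titles for quick lookup.
titles_by_id = dict((p.page_id, p.title) for p in pages)

print(list_titles)
```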
Operations usually take two positional arguments: the query_param (page, category, template, etc.) and limit (the maximum number of results).
get_random(limit): returns a list of PageIdentifiers for random pages.
get_category_articles(category, limit): returns a list of PageIdentifiers for the articles or talk pages in a category. If you are interested in pages beyond the main and talk namespaces, try get_category.
get_category_articles_recursive(category, limit): returns a list of PageInfos for the articles in a category and its subcategories. If you are interested in pages beyond the main and talk namespaces, try get_category_recursive.
get_transcludes(page, limit): returns a list of PageIdentifiers for the articles that embed (transclude) a page. For example, see the pages that embed Template:Infobox with client.get_transcludes('Infobox').
get_backlinks(page, limit): returns a list of PageIdentifiers for pages that internally link back to a page. For example, see the pages that link to 'Coffee' with client.get_backlinks('Coffee').
get_revision_infos(page, limit): returns a list of RevisionInfos for a page's revisions.
get_current_content(page, limit): returns a list of Revisions (including text content) for the page's most recent revisions.
Other operations are available: see wapiti/operations
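Because the operations share the same (query_param, limit) calling convention, they compose easily in small helpers. The sketch below is illustrative only: `link_report` and `StubClient` are hypothetical names, and the stub returns made-up strings rather than real result objects so that the example runs without network access.

```python
def link_report(client, page, limit=5):
    """Collect backlinks and transclusions for `page` using the two
    operations of those names described above."""
    return {
        'backlinks': client.get_backlinks(page, limit),
        'transcludes': client.get_transcludes(page, limit),
    }


class StubClient(object):
    """Hypothetical offline stand-in; it is not part of Wapiti."""

    def get_backlinks(self, page, limit):
        return ['Backlink %d to %s' % (i, page) for i in range(limit)]

    def get_transcludes(self, page, limit):
        return ['Transclusion %d of %s' % (i, page) for i in range(limit)]


report = link_report(StubClient(), 'Coffee', limit=2)
print(report['backlinks'])
```

Assuming a real WapitiClient accepts the arguments as listed above, it could be passed in place of `StubClient()`.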
Models describe the structure of result data. For the full list of models, see wapiti/operations/models.py.
A PageIdentifier describes the standard information available for a page.
A RevisionInfo describes the standard information for a revision, along with the PageIdentifier for its page.
A Revision includes the same data as RevisionInfo, plus full text content.