wtf_wikipedia

a pretty-committed wikipedia markup parser

MIT License

Downloads: 6.7K
Stars: 773
Committers: 70
wtf_wikipedia - 9.0.3

Published by spencermountain almost 3 years ago

9.0.3

  • [fix] - typescript error
  • [change] - update demos
wtf_wikipedia - 9.0.2

Published by spencermountain almost 3 years ago

  • bump deps, npm audit fixes
  • minor releases for all plugins

thank you @wvanderp

wtf_wikipedia - 9.0.1

Published by spencermountain about 3 years ago

  • [fix] - runtime error in cli (thanks maxlath!)
  • [fix] - linter fixes for regexes
  • update deps
wtf_wikipedia - 9.0.0

Published by spencermountain over 3 years ago

Tldr:

  • .templates() now returns Template objects, instead of json.
  • cool new http library for .fetch()
  • custom templates receive pre-parsed json
  • more development of plugins

detail:

  • [breaking] - .templates() now returns Template objects, like other methods (call .json())

  • [breaking] - change interpretation of reversed params in .fetch() method (thanks wouter!)

  • [breaking] - change params for custom templates

  • [breaking] - move .random() and .category() to plugin-api

  • [breaking] - always return an array for plural methods, even with number param, like .links(3)

  • [possibly-breaking] - cleanup null|undefined responses from methods

  • [possibly-breaking] - remove .dates() method (prev deprecated)

  • [possibly-breaking] - require node 10, ie > 11

  • [change] - normalize table rows

  • [change] - move wiktionary templates to wtf-plugin-wiktionary

  • [change] - Link.text() now returns page

  • [change] - improvements to 'soft' isDisambiguation detection

  • [change] - deprecate wtf-plugin-category (move to wtf-plugin-api)

  • [new] - api plugin

  • [new] - disambig plugin

  • [new] - person plugin

  • [new] - Table.get() method

  • [new] - set new infoboxes using .extend()

  • plugin-api 0.0.1

  • plugin-classify 1.0.0

  • plugin-disambig 0.0.1

  • plugin-image 0.3.0

  • plugin-person 0.2.0

  • plugin-summary 0.3.0

  • plugin-wikitext 1.1.0

  • plugin-wikinews 0.0.1

  • plugin-wikivoyage 0.0.1

  • plugin-wiktionary 0.0.1
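For code that relied on .templates() returning plain json, the 9.0.0 change can be handled roughly as below. This is a minimal sketch, not the library's own code: `templatesToJson` is a hypothetical helper name, `fakeDoc` is a stand-in so the snippet runs without the library, and `doc` is assumed to be a Document from version 9 or later.

```javascript
// Hypothetical helper, not part of wtf_wikipedia.
// Assumes `doc` is a Document (v9+), where .templates() returns
// Template objects that each expose a .json() method.
function templatesToJson(doc) {
  return doc.templates().map((template) => template.json())
}

// Stand-in with the same shape, so the sketch runs without the library.
const fakeDoc = {
  templates: () => [{ json: () => ({ template: 'main', page: 'Toronto' }) }],
}
console.log(templatesToJson(fakeDoc))
```

With the real library, `doc` would come from wtf(wikitext) or wtf.fetch(title) instead of the stand-in.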

wtf_wikipedia - 8.5.1

Published by spencermountain about 4 years ago

fix reference json encoding for mongodb

wtf_wikipedia - 8.5.0

Published by spencermountain about 4 years ago

  • fix for cross-domain 3rd-party wikis
  • improved support for fetching non-wikipedia domains
wtf_wikipedia - 8.4.0

Published by spencermountain over 4 years ago

8.4.0

  • new wikidata() method
  • new domain() method
  • support image urls from 3rd-party wikis
  • support for some html formatting tags #374
  • support for sub and sup templates
  • [fix] for link-parsing bug #375
wtf_wikipedia - 8.3.0

Published by spencermountain over 4 years ago

  • adds some wikivoyage templates
  • fix cli help options
  • change covid template again
wtf_wikipedia - 8.2.1

Published by spencermountain over 4 years ago

fix #260 and #348

wtf_wikipedia - 8.2.0

Published by spencermountain over 4 years ago

8.2.0

  • export http lib for plugin in .extend()
  • stop exporting (huge) mapfile in builds
  • deprecate .dates() from sentence class (didn't work)
  • stop ignoring ref-list template, keep otherwise empty ==References== sections
wtf_wikipedia - 8.1.2

Published by spencermountain over 4 years ago

track changes to covid templates

wtf_wikipedia - 8.1.1

Published by spencermountain over 4 years ago

bugfix for table parser

wtf_wikipedia - 8.1.0

Published by spencermountain over 4 years ago

8.1.0

  • [major] fix Link json object in .json() result
  • [major] fix inconsistent response for singular method aliases like .template('foo')
  • [major] change in rowspan behaviour to support covid table
  • support <noinclude>
  • add .url() and .language() methods
    • support setters on Link methods
    • add Link.href() method
    • support proper urls for interwiki links
  • replicate wikipedia behaviour for apostrophe-s after link
  • new plugins summary, classify, category, and i18n.
  • Link hrefs are not titlecased anymore by default
wtf_wikipedia - 8.0.0

Published by spencermountain over 4 years ago

8.0.0

  • [breaking] move .html(), .latex(), and .markdown() to their respective plugins
    • drop header/footer boilerplate from outputs
  • [breaking] .templates() and .links() return Template and Link objects, and not bare JSON (use .map(l=> l.json()))
  • [breaking] refactor inputs for .fetch()
    • no longer support 'enwikiquote' etc format as input
    • use 'wiki' instead of undocumented 'wikiUrl' param
    • no more automatic throttling/rate-limiting
  • [breaking] move Image.exists() method to plugin
  • [major] create separate client/server-side build formats (use native fetch/node lib)
  • [major] support deep (infinite) recursion in templates
  • [major] much-stronger i18n support
  • no-longer automatically titlecase links
  • support adding template parsers through plugins in .extend()
    • support array, number, and string shorthand for template parsers
  • deprecate .plaintext() in favour of .text()
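Two of the 8.0.0 changes above (Link objects instead of bare JSON, and .text() replacing .plaintext()) might be absorbed with small wrappers like these. This is a hedged sketch: `linksToJson` and `plainText` are hypothetical names, and `doc` is assumed to be a Document returned by wtf() or wtf.fetch().

```javascript
// Hypothetical helpers, not part of wtf_wikipedia.

// From 8.0.0 on, .links() returns Link objects, so convert them
// explicitly wherever plain JSON is still expected.
function linksToJson(doc) {
  return doc.links().map((link) => link.json())
}

// Prefer .text(); fall back to the deprecated .plaintext()
// when running against a version that does not have .text() yet.
function plainText(doc) {
  return typeof doc.text === 'function' ? doc.text() : doc.plaintext()
}
```

Feature-detecting .text() keeps the same calling code working on both sides of the deprecation.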
wtf_wikipedia - 7.2.10

Published by spencermountain over 5 years ago

7.1.0

  • some template fixes
  • add a 'number' field in sentence json, when it looks like a number
  • slight change in coordinate result format, support inline coordinate text
  • handle fetching a large list of titles in sequence
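Fetching a long list of titles in sequence generally means splitting it into batches and waiting for each batch before starting the next. The sketch below illustrates that idea only. It is not the library's implementation: `fetchInSequence`, `fetchBatch`, and the default batch size of 5 are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the library's implementation.
// `fetchBatch` is an assumed async function that takes an array of
// titles and resolves to an array of results, one per title.
async function fetchInSequence(titles, fetchBatch, batchSize = 5) {
  const results = []
  for (let i = 0; i < titles.length; i += batchSize) {
    const batch = titles.slice(i, i + batchSize)
    // await inside the loop, so only one batch is in flight at a time
    const docs = await fetchBatch(batch)
    results.push(...docs)
  }
  return results
}
```

Awaiting inside the loop trades speed for politeness: requests never overlap, which keeps the load on the remote wiki low.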

7.1.1

  • support population, weatherbox templates

7.2.0

  • improved date templates, bugfixes

7.2.9

  • few more sports templates
  • rowspan parsing fix
  • no-longer include package.json in builds
  • use full template-parser for image captions
  • support manually setting doc.title()

7.2.10

  • improved unicode support for sentence/paragraph splitting
  • supporting more formatting templates, like Mono
  • more flexible reference support in .json()
wtf_wikipedia - 7.0.0

Published by spencermountain almost 6 years ago

6.0.0 🚨

  • support .paragraphs()
  • ⚠️ major changes to output of .json(). cleaning-up redundant data.⚠️
    • remove top-level templates data (found in section) - resume it with {templates:true}
    • remove top-level coordinates data (found in templates) - resume it with {coordinates:true}
    • remove top-level citations data (found in section) - resume it with {citations:true}
  • return empty arrays in .json() again ¯_(:/)_ /¯
  • remove title on html output
  • change ambiguous options.title for sections to options.headers
  • support lists of 1
  • begin removing empty references section by default
  • begin support for rendering citations at the bottom of documents
  • begin first-class references-parsing as objects at paragraph-level
    • use this: .citations() --> .citations().map(c => c.json());
  • remove .wikitext() and .reparse() methods - keeping wikitext stateful caused too many issues
  • turn Image.file into a function
  • include interwiki() results in .links()
  • support follow_redirects option to fetch
  • hide object data in console.logs
  • move ALL image urls from upload.wikimedia.org/wikipedia/commons to wikipedia.org/wiki/Special:Redirect/file/ via #86
  • image captions are now Sentence objects
  • rename citation → reference internally, and in json output
  • remove references inside section titles
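The opt-in flags listed above for .json() can be combined in one call. A minimal sketch, assuming a Document from the 6.x line and using only the option names given in these notes. `fullJson` is a hypothetical helper name, and `stubDoc` is a stand-in so the snippet runs without the library.

```javascript
// Hypothetical helper, not part of wtf_wikipedia.
// Assumes `doc.json(options)` accepts the flags named in the 6.0.0 notes
// to restore the top-level data that is no longer included by default.
function fullJson(doc) {
  return doc.json({ templates: true, coordinates: true, citations: true })
}

// Stand-in that echoes which flags were requested.
const stubDoc = {
  json: (options) => ({ title: 'Toronto', requested: Object.keys(options) }),
}
console.log(fullJson(stubDoc))
```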

6.1.0

  • titlecase internal link destinations #192

6.2.0

  • support categories in redirects
  • add mongo-encoding from dumpster-dive

6.3.0

  • support way (+20%?) more templates.

7.0.0

  • change result-format in a lot of templates, for more consistency.
    • notably: reference format, see also, IPA, main
  • support colspan/rowspan in tables (a little!)
  • support implicit first-row headers for some tables
  • return templates even if they have no data
  • begin support for some well-used {{foo start}}...{{foo end}} templates
  • remove empty [] for some more section properties in .json() response
wtf_wikipedia - 5.3.1

Published by spencermountain about 6 years ago

last stable release before v6

from changelog:

5.1.0

  • improved support for gallery tag
  • more support for wiktionary grammar templates
  • tweak some regexes

5.2.0

  • make .json() results return proper json for tables

5.3.0

  • add infobox html back into html output (tentative)
  • redirect support in .json(), .html() output
  • remove empty [] properties in .json() results (saves disk space!)
  • keep # anchor data in .links()
  • show links default-on in latex output, like in md and html
  • render html/latex/json 'soft redirect', instead of blank pages

wtf_wikipedia - 5.0.0

Published by spencermountain about 6 years ago

3.0.0

  • move .parse() to main wtf() method
  • allow repeated processes without a pre-parse of the document
  • wtf.fetch() uses promises, and native fetch() method (when available)
  • allow per-section images, lists, tables + templates
  • section depth values now start at 0
  • infobox values now return sentence objects
  • latex output (thanks @niebert!)
  • refactor shell scripts to wtf_wikipedia Toronto --plaintext
  • use babel-preset-env cause it's new-new
  • update deps

3.1.0

  • improved .json() results
  • guess a page's title based on bold formatting in first sentence
  • make section.title a function

4.0.0

  • 🚨 non-api changing, but large result-format change
  • add .wikitext() method to Document, Section, Sentence (thanks @niebert)
  • move infobox, citation parser/data to Section class
  • .templates() are now an ordered array, instead of an object, and include infoboxes and citations
  • add (early) support for 'generic' key-value template parsing
  • normalize/lowercase template/infobox properties - add loose .get('key') method to Infobox class
  • mess-around with citation-template formatting
  • beginning to support unknown template forms
  • move date data from Sentence to Section object.
  • rollback of awkward+undocumented options param in parser (but keep options param for output methods)
  • add support for about a hundred new templates
  • templates, including citations, try to be flat-text, and no-longer return Sentence objects

4.1.0

  • remove repeated/redundant text in .links() results
  • don't automatically titlecase link srcs anymore

4.2.0

  • return a result or undefined for sentences.bolds(0), and the like

4.2.2

  • support dollar templates

4.5.0

  • support section(0).wikitext()
  • support inline {{marriage}} template
  • handle dangling semi-colons in first-sentence parentheses

4.6.0

  • <gallery> tag support in .images()
  • support pageids again in .fetch()
  • better disambiguation-page detection in english
  • remove wikitext from caption titles
  • support 3-level templates (whew!)

5.0.0

  • new Table class and List classes
  • improved table-parser - generate name col1 instead of col-0
  • support options.verbose_template for debugging
  • support recursive tables
wtf_wikipedia - 1.0.0

Published by spencermountain over 7 years ago

breaking change from 0.x: sections are now formatted as an array of objects, with depth information.

tables are parsed into an array of key-value pairs.

options object is removed.

all is refactored