MwParserFromScratch

A basic .NET library for parsing wikitext into an AST.

APACHE-2.0 License

MwParserFromScratch - v0.3.0-int.7 (latest release)

Published by CXuesong 2 months ago

  • HtmlTag.ToPlainTextCore now writes \n when converting <br /> or <hr /> into plain text.
MwParserFromScratch - v0.3.0-int.6

Published by CXuesong almost 3 years ago

  • Fixed CloneCore implementation for various Node classes. (#18 by @FaFre)
  • TemplateArgumentCollection.SetValue now accepts a string as the argument name.
  • Added shorthand Wikitext constructor overloads that accept a string (as PlainText) or InlineNode instances.
  • Dropped explicit support for .NET Framework. We plan to drop support for lower .NET Standard versions in the future in order to adopt newer C# language features.
MwParserFromScratch - v0.3.0-int.4

Published by CXuesong over 4 years ago

  • Bug fix: NullReferenceException when calling Node.ToPlainText on HtmlTags with empty content. (#15)
MwParserFromScratch - v0.3.0-int.3

Published by CXuesong over 4 years ago

  • Added support for parsing embedded image expressions. AST node type: WikiImageLink. (#13)
MwParserFromScratch - v0.3.0-int.2

Published by CXuesong over 4 years ago

  • Refactored Node.ToPlainText to accept an optional delegate for customizing how a Node is converted into plain text. (#15)
    • NodePlainTextOptions has been removed as a result.
  • By default, the following parser tags are not shown in ToPlainText output:
    • math
    • ref
    • templatedata
    • templatestyles
    • To change this default behavior, refer to #15.
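
To illustrate the delegate-based customization described above, here is a language-neutral sketch in Python. The node classes, to_plain_text, and its formatter callback are hypothetical stand-ins rather than the library's C# API; only the set of tags hidden by default comes from the notes above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

# Hypothetical, simplified node model (not MwParserFromScratch's real classes).
@dataclass
class ParserTag:
    name: str
    content: str

@dataclass
class Text:
    value: str

Node = Union[ParserTag, Text]

# Parser tags hidden by default, mirroring the release note.
HIDDEN_PARSER_TAGS = {"math", "ref", "templatedata", "templatestyles"}

def default_formatter(node: Node) -> Optional[str]:
    """Return the plain text for a node, or None to omit it from the output."""
    if isinstance(node, ParserTag):
        if node.name.lower() in HIDDEN_PARSER_TAGS:
            return None
        return node.content
    return node.value

def to_plain_text(nodes: List[Node],
                  formatter: Optional[Callable[[Node], Optional[str]]] = None) -> str:
    # Fall back to the default behavior when no delegate is supplied.
    fmt = formatter or default_formatter
    parts = [fmt(n) for n in nodes]
    return "".join(p for p in parts if p is not None)

def keep_math(node: Node) -> Optional[str]:
    # A custom delegate that overrides the default for <math> only.
    if isinstance(node, ParserTag) and node.name == "math":
        return node.content
    return default_formatter(node)

doc = [Text("E = mc"), ParserTag("math", "^2"), Text(" holds."), ParserTag("ref", "Einstein")]
print(to_plain_text(doc))             # E = mc holds.
print(to_plain_text(doc, keep_math))  # E = mc^2 holds.
```

Passing the callback per call, rather than an options object, lets each caller override only the node kinds it cares about and defer to the default for everything else.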
MwParserFromScratch - v0.3.0-int.1

Published by CXuesong over 4 years ago

  • All unbalanced HTML tags are now forcibly closed at the end of WIKITEXT. (#14)
    • Previously, such tags were left as unparsed PLAIN_TEXT, which could cause performance issues when parsing unbalanced tags (see #3).
  • Unbalanced <li> tags are now handled properly during parsing. (#14)
  • Unbalanced tags will have TagStyle set to TagStyle.NotClosed.
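
A minimal sketch of the forced-closing strategy, assuming a toy tag syntax with no attributes or self-closing tags. A single stack pass pairs tags, and anything still open at the end of the input is closed there and flagged. match_tags and TagStyle.NOT_CLOSED are hypothetical Python counterparts of the C# names above, not the library's implementation.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List

class TagStyle(Enum):
    NORMAL = "normal"
    NOT_CLOSED = "not_closed"  # hypothetical counterpart of TagStyle.NotClosed

@dataclass
class Tag:
    name: str
    start: int  # offset of the opening tag
    style: TagStyle = TagStyle.NORMAL

# Toy syntax: <name> or </name>, no attributes.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>")

def match_tags(text: str) -> List[Tag]:
    """Pair opening and closing tags; force-close whatever is still open at end of input."""
    stack: List[Tag] = []
    tags: List[Tag] = []
    for m in TAG_RE.finditer(text):
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            tag = Tag(name, m.start())
            stack.append(tag)
            tags.append(tag)
            continue
        # Close the nearest matching open tag; tags opened inside it stay unclosed.
        # A closing tag with no open counterpart is simply ignored.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i].name == name:
                for inner in stack[i + 1:]:
                    inner.style = TagStyle.NOT_CLOSED
                del stack[i:]
                break
    # End of input: remaining tags are implicitly closed here and flagged.
    for tag in stack:
        tag.style = TagStyle.NOT_CLOSED
    return tags

print([(t.name, t.style.name) for t in match_tags("<ul><li>one<li>two</ul>")])
# [('ul', 'NORMAL'), ('li', 'NOT_CLOSED'), ('li', 'NOT_CLOSED')]
```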
MwParserFromScratch - 0.2.1

Published by CXuesong over 6 years ago

  • Fixed #12: the parser now handles tags containing uppercase letters properly.
  • Preparation for wikitext table expression support:
    • Added TagAttributeCollection, which will later be used in TagNode as well as in TableContentNode.
    • Converted the InlineContainer abstract class into the IInlineContainer interface.
    • Added a preliminary Table class.
    • Table parsing itself is yet to be implemented.
MwParserFromScratch - 0.2.0

Published by CXuesong over 7 years ago

  • (#6) Implemented a rudimentary closing mark inference feature.
  • (#7) Migrated IWikitextSpanInfo from offset-based to line/column-based.
    • Renamed IWikitextSpanInfo to IWikitextLineInfo.
  • LIST_ITEM must now start at the beginning of a line.
    • For example, the * item in {{Template|* item}} will be parsed as plain text rather than as an unordered list.
  • WikitextParser is now thread-safe.
    • This holds as long as you do not call Parse while the content of WikitextParserOptions is being changed.
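
Line/column information can be derived from character offsets with one pass over the text plus a binary search. The sketch below assumes zero-based lines and columns and uses hypothetical helper names. It also shows how a column check expresses the beginning-of-line rule for list items, simplified here to the * and # markers.

```python
from bisect import bisect_right
from typing import List, Tuple

def line_starts(text: str) -> List[int]:
    """Offsets at which each line begins (line 0 starts at offset 0)."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def to_line_column(starts: List[int], offset: int) -> Tuple[int, int]:
    """Convert a character offset into a zero-based (line, column) pair."""
    line = bisect_right(starts, offset) - 1
    return line, offset - starts[line]

def starts_list_item(text: str, starts: List[int], offset: int) -> bool:
    """Simplified rule: a '*' or '#' only opens a list item at column 0."""
    _, column = to_line_column(starts, offset)
    return column == 0 and text[offset] in "*#"

text = "== Title ==\n* item\n{{T}}"
starts = line_starts(text)
print(to_line_column(starts, text.index("item")))  # (1, 2)
```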

The attached "debug" package is built in DEBUG mode, and may provide some extra runtime assertions (Debug.Assert).

MwParserFromScratch - 0.1.4

Published by CXuesong over 7 years ago

  • Fixed missing LineInfo on PlainText nodes in a Paragraph, especially when the paragraph content contains \n.
MwParserFromScratch - 0.1.3

Published by CXuesong over 7 years ago

  • WikitextParser.Parse() now supports CancellationToken.
  • Fixed a NullReferenceException when WikitextParser is instantiated with an IWikitextParserLogger.
MwParserFromScratch - 0.1.2

Published by CXuesong over 7 years ago

  • MagicTemplateNames replaces VariableNames in WikitextParserOptions.
    • You can specify whether a variable or parser function is case-sensitive.
  • Improved debugger view of NodeCollection.
  • Fixed a bug where parsed Node instances could carry multiple LineInfoAnnotation instances.
  • Fixed a bug where ParentNode was null for the Wikitext node holding HTML tag content.
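
One way to model per-name case sensitivity, sketched in Python under the assumption that each magic name carries a boolean flag. MagicNameTable is hypothetical, and the flags in the example are illustrative only.

```python
from typing import Dict

class MagicNameTable:
    """Hypothetical lookup table: each magic name carries its own case-sensitivity flag."""

    def __init__(self, names: Dict[str, bool]) -> None:
        # names maps a variable or parser-function name to True if it is case-sensitive.
        self._sensitive = {n for n, cs in names.items() if cs}
        self._insensitive = {n.casefold() for n, cs in names.items() if not cs}

    def is_magic(self, name: str) -> bool:
        name = name.strip()
        # Exact match for case-sensitive names, case-folded match for the rest.
        return name in self._sensitive or name.casefold() in self._insensitive

table = MagicNameTable({"PAGENAME": True, "#if": False})
print(table.is_magic("#IF"), table.is_magic("pagename"))  # True False
```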
MwParserFromScratch - 0.1.1

Published by CXuesong over 7 years ago

Maybe it's time to publish a real release.


  • The NuGet package now has two target platforms: .NET Framework 4.5 and .NET Standard 1.1.
  • Template can now distinguish parser functions and variables from normal template expressions.
    • You can now extract the first argument from {{#if:expr|yes|no}}, rather than getting a Template whose Name is #if:expr.
  • Fixed the missing LineInfo for ExternalLink & TagAttribute nodes.
  • NormalizeTitle & NormalizeTemplateArgumentName now accept null values and return null in such cases.
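
The parser-function split described above can be sketched as follows. split_template and TemplateCall are hypothetical. The sketch assumes the template body contains no nested templates or links (it splits naively on |), treats only #-prefixed names as parser functions, and takes the set of variable names as a parameter.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass
class TemplateCall:
    name: str
    arguments: List[str]
    is_magic: bool  # True for parser functions and variables

def split_template(body: str,
                   variables: FrozenSet[str] = frozenset({"PAGENAME"})) -> TemplateCall:
    """Split the inside of {{...}}; assumes no nested templates or links."""
    parts = body.split("|")
    head, rest = parts[0].strip(), parts[1:]
    if head.startswith("#") and ":" in head:
        # Parser function: the text after the first ':' becomes the first argument.
        name, first_arg = head.split(":", 1)
        return TemplateCall(name.strip(), [first_arg] + rest, True)
    return TemplateCall(head, rest, head in variables)

print(split_template("#if:expr|yes|no"))
# TemplateCall(name='#if', arguments=['expr', 'yes', 'no'], is_magic=True)
```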
MwParserFromScratch -

Published by CXuesong over 7 years ago

  • Added NodeCollection.AddFirst & InlineContainer.Prepend methods.
  • Bug fix: Incorrect implementation of INodeCollection.Remove.
  • Bug fix: incorrect determination of template argument names for positional arguments.
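
For context, MediaWiki numbers positional (unnamed) template arguments 1, 2, 3 and so on, independently of any named arguments between them. A hypothetical Python sketch of that rule, ignoring explicit numeric names such as 2=... and nested templates:

```python
from typing import List, Tuple

def name_arguments(args: List[str]) -> List[Tuple[str, str]]:
    """Assign names to template arguments; positional ones are numbered 1, 2, ...

    Named arguments (containing '=') do not consume a positional number.
    """
    named: List[Tuple[str, str]] = []
    position = 0
    for arg in args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            named.append((key.strip(), value))
        else:
            position += 1
            named.append((str(position), arg))
    return named

print(name_arguments(["a", "x=b", "c"]))  # [('1', 'a'), ('x', 'b'), ('2', 'c')]
```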
MwParserFromScratch -

Published by CXuesong over 7 years ago

  • NormalizeTitlePart now recognizes all whitespace characters (including \r and \n), not just spaces, so that leading and trailing line breaks are stripped during normalization.
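
A minimal sketch of whitespace-aware trimming in Python. normalize_title_part is a hypothetical helper, and collapsing inner whitespace runs into a single space is an extra assumption beyond what the note above states.

```python
import re

# Matches any run of whitespace, including \r, \n and \t.
_WHITESPACE_RUN = re.compile(r"\s+")

def normalize_title_part(part: str) -> str:
    """Hypothetical sketch: collapse whitespace runs to one space, then trim both ends."""
    return _WHITESPACE_RUN.sub(" ", part).strip()

print(repr(normalize_title_part("\r\n  Main   Page\t\n")))  # 'Main Page'
```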
MwParserFromScratch - v0.1-beta

Published by CXuesong over 7 years ago

Initial release.