pyparsing

Python library for creating PEG parsers

MIT License

Downloads
153.4M
Stars
2.1K
Committers
71


pyparsing - Pyparsing 3.1.2 Latest Release

Published by ptmcg 8 months ago

  • Support for Python 3.13.

  • Added ieee_float expression to pyparsing.common, which parses float values, plus "NaN", "Inf", "Infinity". PR submitted by Bob Peterson (#538).

  • Updated pep8 synonym wrappers for better type checking compatibility. PR submitted by Ricardo Coccioli (#507).

  • Fixed empty error message bug, PR submitted by InSync (#534). This should return pyparsing's exception messages to a former, more helpful form. If you have code that parses the exception messages returned by pyparsing, this may require some code changes.

  • Added unit tests to test for exception message contents, with enhancement to pyparsing.testing.assertRaisesParseException to accept an expected exception message.

  • Updated example select_parser.py to use PEP8 names and added Groups for better retrieval of parsed values from multiple SELECT clauses.

  • Added example email_address_parser.py, as suggested by John Byrd (#539).

  • Added example directx_x_file_parser.py to parse DirectX template definitions, and generate a Pyparsing parser from a template to parse .x files.

  • Some code refactoring to reduce code nesting, PRs submitted by InSync.

  • All internal string expressions using '%' string interpolation and str.format() converted to f-strings.

pyparsing - Pyparsing 3.1.1

Published by ptmcg about 1 year ago

  • Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! (Issue #502)

  • Fixed bug in bad exception messages raised by Forward expressions. PR submitted by Kyle Sunden, thanks for your patience and collaboration on this (#493).

  • Fixed regression in SkipTo, where ignored expressions were not checked when looking for the target expression. Reported by catcombo, Issue #500.

  • Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, thanks! (Issue #498)

  • Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)

pyparsing - Pyparsing 3.1.0

Published by ptmcg over 1 year ago

NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as ParserElement.parseString) will start to raise DeprecationWarnings. 3.2.0 should get released some time later in 2023. I currently plan to completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release until at least late 2023 if not 2024. So there is plenty of time to convert existing parsers to the new function names before the old functions are completely removed. (Big help from Devin J. Pohly in structuring the code to enable this peaceful transition.)

Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.

Version 3.1.0 - June, 2023

API CHANGES

  • A slight change has been implemented when unquoting a quoted string parsed using the QuotedString class. Formerly, when unquoting and processing whitespace markers such as \t and \n, these substitutions would occur first, and then any additional '\' escaping would be done on the resulting string. This would parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed in a single pass working left to right, so the quoted string "\\n" would get unquoted to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq, thanks!

  • Reworked delimited_list function into the new DelimitedList class. DelimitedList has the same constructor interface as delimited_list, and in this release, delimited_list changes from a function to a synonym for DelimitedList. delimited_list and the older delimitedList method will be deprecated in a future release, in favor of DelimitedList.

  • ParserElement.validate() is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release. In its place, developers should use debugging and analytical tools, such as ParserElement.set_debug() and ParserElement.create_diagram(). (Raised in Issue #444, thanks Andrea Micheli!)

NEW FEATURES AND ENHANCEMENTS

  • Optional(expr) may now be written as expr | ""

    This will make this code:

    "{" + Optional(Literal("A") | Literal("a")) + "}"
    

    writable as:

    "{" + (Literal("A") | Literal("a") | "") + "}"
    

    Some related changes implemented as part of this work:

    • Literal("") now internally generates an Empty() (and no longer raises an exception)
    • Empty is now a subclass of Literal

    Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.

  • Added new class method ParserElement.using_each, to simplify code that creates a sequence of Literals, Keywords, or other ParserElement subclasses.

    For instance, to define suppressible punctuation, you would previously write:

    LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
    

    You can now write:

    LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
    

    using_each will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:

    algebra_var = MatchFirst(
        Char.using_each(string.ascii_lowercase, as_keyword=True)
    )
    
  • Added new builtin python_quoted_string, which will match any form of single-line or multiline quoted strings defined in Python. (Inspired by discussion with Andreas Schörgenhumer in Issue #421.)

  • Extended expr[] notation for repetition of expr to accept a slice, where the slice's stop value indicates a stop_on expression:

    test = "BEGIN aaa bbb ccc END"
    BEGIN, END = Keyword.using_each("BEGIN END".split())
    body_word = Word(alphas)
    
    expr = BEGIN + Group(body_word[...:END]) + END
    # equivalent to
    # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
    
    print(expr.parse_string(test))
    

    Prints:

    ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
    
  • Added named field "url" to pyparsing.common.url, returning the entire parsed URL string.

  • Added bool embed argument to ParserElement.create_diagram(). When passed as True, the resulting diagram will omit the <DOCTYPE>, <HEAD>, and <BODY> tags so that it can be embedded in other HTML source. (Useful when embedding a call to create_diagram() in a PyScript HTML page.)

  • Added recurse argument to ParserElement.set_debug to set the debug flag on an expression and all of its sub-expressions. Requested by multimeric in Issue #399.

  • Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.

  • ParseResults now has a new method deepcopy(), in addition to the current copy() method. copy() only makes a shallow copy - any contained ParseResults are copied as references - changes in the copy will be seen as changes in the original. In many cases, a shallow copy is sufficient, but some applications require a deep copy. deepcopy() makes a deeper copy: any contained ParseResults or other mappings or containers are built with copies from the original, and do not get changed if the original is later changed. Addresses issue #463, reported by Bryn Pickering.

  • Added new class property identifier to all Unicode set classes in pyparsing.unicode, using the class's values for cls.identchars and cls.identbodychars. Now Unicode-aware parsers that formerly wrote:

    ppu = pyparsing.unicode
    ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
    

    can now write:

    ident = ppu.Greek.identifier
    # or
    # ident = ppu.Ελληνικά.identifier
    
  • Error messages from MatchFirst and Or expressions will try to give more details if one of the alternatives matches better than the others, but still fails. Question raised in Issue #464 by msdemlei, thanks!

BUG FIXES AND GENERAL CHANGES

  • Added support for Python 3.12.

  • Updated ci.yml permissions to limit default access to source - submitted by Joyce Brum of Google. Thanks so much!

  • Updated create_diagram() code to be compatible with railroad-diagrams package version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars), reported by Sam Morley-Short.

  • Fixed bug in NotAny, where parse actions on the negated expr were not being run. This could cause NotAny to incorrectly fail if the expr would normally match, but would fail to match if a condition used as a parse action returned False. Fixes Issue #482, raised by byaka, thank you!

  • Fixed create_diagram() to accept keyword args, to be passed through to the template.render() method to generate the output HTML (PR submitted by Aussie Schnore, good catch!)

  • Fixed bug in python_quoted_string regex.

  • Fixed bug in which the results name was not saved when a parse action returned an empty string for an expression that had a results name. That is:

    expr = Literal("X").add_parse_action(lambda tokens: "")("value")
    result = expr.parse_string("X")
    print(result["value"])
    

    would raise a KeyError. Now empty strings will be saved with the associated results name. Raised in Issue #470 by Nicco Kunzmann, thank you.

  • Fixed bug in SkipTo where ignore expressions were not properly handled while scanning for the target expression. Issue #475, reported by elkniwt, thanks (this bug has been there for a looooong time!).

  • Fixed bug in Word when max=2. Also added performance enhancement when specifying exact argument. Reported in issue #409 by panda-34, nice catch!

  • Word arguments are now validated if min and max are both given, that min <= max; raises ValueError if values are invalid.

  • Fixed bug in srange, when parsing escaped '/' and '\' inside a range set.

  • Fixed exception messages for some ParserElements with custom names, which instead showed their contained expression names.

  • Fixed bug in pyparsing.common.url, when input URL is not alone on an input line. Fixes Issue #459, reported by David Kennedy.

  • Multiple added and corrected type annotations. With much help from Stephen Rosen, thanks!

  • Some documentation and error message clarifications on pyparsing's keyword logic, cited by Basil Peace.

  • General docstring cleanup for Sphinx doc generation, PRs submitted by Devin J. Pohly. A dirty job, but someone has to do it - much appreciated!

EXAMPLE UPDATES

  • Added tag_emitter.py to examples. This example demonstrates how to insert tags into your parsed results that are not part of the original parsed text.

  • Added bf.py Brainf*ck parser/executor example. Illustrates using a pyparsing grammar to parse language syntax, and attach executable AST nodes to the parsed results.

  • invRegex.py example renamed to inv_regex.py and updated to PEP-8 variable and method naming. PR submitted by Ross J. Duff, thanks!

  • Removed examples sparser.py and pymicko.py, since each included its own GPL license in the header. Since this conflicts with pyparsing's MIT license, they were removed from the distribution to avoid confusion among those making use of them in their own projects.

  • Updated the lucene_grammar.py example (better support for '*' and '?' wildcards) and corrected the test cases - brought to my attention by Elijah Nicol, good catch!

pyparsing - Pyparsing 3.1.0b2

Published by ptmcg over 1 year ago

  • Updated create_diagram() code to be compatible with railroad-diagrams package version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars), reported by Sam Morley-Short.

  • Fixed bug in NotAny, where parse actions on the negated expr were not being run. This could cause NotAny to incorrectly fail if the expr would normally match, but would fail to match if a condition used as a parse action returned False. Fixes Issue #482, raised by byaka, thank you!

  • Fixed create_diagram() to accept keyword args, to be passed through to the template.render() method to generate the output HTML (PR submitted by Aussie Schnore, good catch!)

  • Fixed bug in python_quoted_string regex.

  • Added examples/bf.py Brainf*ck parser/executor example. Illustrates using a pyparsing grammar to parse language syntax, and attach executable AST nodes to the parsed results.

pyparsing - Pyparsing 3.1.0b1

Published by ptmcg over 1 year ago

  • Added support for Python 3.12.

  • API CHANGE: A slight change has been implemented when unquoting a quoted string parsed using the QuotedString class. Formerly, when unquoting and processing whitespace markers such as \t and \n, these substitutions would occur first, and then any additional '\' escaping would be done on the resulting string. This would parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed in a single pass working left to right, so the quoted string "\\n" would get unquoted to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq, thanks!

  • Added named field "url" to pyparsing.common.url, returning the entire parsed URL string.

  • Fixed bug in which the results name was not saved when a parse action returned an empty string for an expression that had a results name. That is:

    expr = Literal("X").add_parse_action(lambda tokens: "")("value")
    result = expr.parse_string("X")
    print(result["value"])
    

    would raise a KeyError. Now empty strings will be saved with the associated results name. Raised in Issue #470 by Nicco Kunzmann, thank you.

  • Fixed bug in SkipTo where ignore expressions were not properly handled while scanning for the target expression. Issue #475, reported by elkniwt, thanks (this bug has been there for a looooong time!).

  • Updated ci.yml permissions to limit default access to source - submitted by Joyce Brum of Google. Thanks so much!

  • Updated the lucene_grammar.py example (better support for '*' and '?' wildcards) and corrected the test cases - brought to my attention by Elijah Nicol, good catch!

pyparsing - Pyparsing 3.1.0a1

Published by ptmcg over 1 year ago

NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as ParserElement.parseString) will start to raise DeprecationWarnings. 3.2.0 should get released some time later in 2023. I currently plan to completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release until at least late 2023 if not 2024. So there is plenty of time to convert existing parsers to the new function names before the old functions are completely removed. (Big help from Devin J. Pohly in structuring the code to enable this peaceful transition.)

Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.

  • API ENHANCEMENT: Optional(expr) may now be written as expr | ""

    This will make this code:

    "{" + Optional(Literal("A") | Literal("a")) + "}"
    

    writable as:

    "{" + (Literal("A") | Literal("a") | "") + "}"
    

    Some related changes implemented as part of this work:

    • Literal("") now internally generates an Empty() (and no longer raises an exception)
    • Empty is now a subclass of Literal

    Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.

  • Added new class property identifier to all Unicode set classes in pyparsing.unicode, using the class's values for cls.identchars and cls.identbodychars. Now Unicode-aware parsers that formerly wrote:

    ppu = pyparsing.unicode
    ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
    

    can now write:

    ident = ppu.Greek.identifier
    # or
    # ident = ppu.Ελληνικά.identifier
    
  • Reworked delimited_list function into the new DelimitedList class. DelimitedList has the same constructor interface as delimited_list, and in this release, delimited_list changes from a function to a synonym for DelimitedList. delimited_list and the older delimitedList method will be deprecated in a future release, in favor of DelimitedList.

  • Added new class method ParserElement.using_each, to simplify code that creates a sequence of Literals, Keywords, or other ParserElement subclasses.

    For instance, to define suppressible punctuation, you would previously write:

    LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
    

    You can now write:

    LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
    

    using_each will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:

    algebra_var = MatchFirst(
        Char.using_each(string.ascii_lowercase, as_keyword=True)
    )
    
  • Added new builtin python_quoted_string, which will match any form of single-line or multiline quoted strings defined in Python. (Inspired by discussion with Andreas Schörgenhumer in Issue #421.)

  • Extended expr[] notation for repetition of expr to accept a slice, where the slice's stop value indicates a stop_on expression:

    test = "BEGIN aaa bbb ccc END"
    BEGIN, END = Keyword.using_each("BEGIN END".split())
    body_word = Word(alphas)
    
    expr = BEGIN + Group(body_word[:END]) + END
    # equivalent to
    # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
    
    print(expr.parse_string(test))
    

    Prints:

    ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
    
  • ParserElement.validate() is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release. In its place, developers should use debugging and analytical tools, such as ParserElement.set_debug() and ParserElement.create_diagram(). (Raised in Issue #444, thanks Andrea Micheli!)

  • Added bool embed argument to ParserElement.create_diagram(). When passed as True, the resulting diagram will omit the <DOCTYPE>, <HEAD>, and <BODY> tags so that it can be embedded in other HTML source. (Useful when embedding a call to create_diagram() in a PyScript HTML page.)

  • Added recurse argument to ParserElement.set_debug to set the debug flag on an expression and all of its sub-expressions. Requested by multimeric in Issue #399.

  • Added '·' (Unicode MIDDLE DOT) to the set of pp.unicode.Latin1.identbodychars.

  • Fixed bug in Word when max=2. Also added performance enhancement when specifying exact argument. Reported in issue #409 by panda-34, nice catch!

  • Word arguments are now validated if min and max are both given, that min <= max; raises ValueError if values are invalid.

  • Fixed bug in srange, when parsing escaped '/' and '\' inside a range set.

  • Fixed exception messages for some ParserElements with custom names, which instead showed their contained expression names.

  • Fixed bug in pyparsing.common.url, when input URL is not alone on an input line. Fixes Issue #459, reported by David Kennedy.

  • Multiple added and corrected type annotations. With much help from Stephen Rosen, thanks!

  • Some documentation and error message clarifications on pyparsing's keyword logic, cited by Basil Peace.

  • General docstring cleanup for Sphinx doc generation, PRs submitted by Devin J. Pohly. A dirty job, but someone has to do it - much appreciated!

  • invRegex.py example renamed to inv_regex.py and updated to PEP-8 variable and method naming. PR submitted by Ross J. Duff, thanks!

  • Removed examples sparser.py and pymicko.py, since each included its own GPL license in the header. Since this conflicts with pyparsing's MIT license, they were removed from the distribution to avoid confusion among those making use of them in their own projects.

pyparsing - pyparsing 3.0.9

Published by ptmcg over 2 years ago

  • Added Unicode set BasicMultilingualPlane (may also be referenced as BMP) representing the Basic Multilingual Plane (Unicode characters up to code point 65535). Can be used to parse most language characters, but omits emojis, wingdings, etc. Raised in discussion with Dave Tapley (issue #392).

  • To address mypy confusion of pyparsing.Optional and typing.Optional resulting in error: "_SpecialForm" not callable message reported in issue #365, fixed the import in exceptions.py. Nice sleuthing by Iwan Aucamp and Dominic Davis-Foster, thank you! (Removed definitions of OptionalType, DictType, and IterableType and replaced them with typing.Optional, typing.Dict, and typing.Iterable throughout.)

  • Fixed typo in jinja2 template for railroad diagrams, thanks for the catch Nioub (issue #388).

  • Removed use of deprecated pkg_resources package in railroad diagramming code (issue #391).

  • Updated bigquery_view_parser.py example to parse examples at https://cloud.google.com/bigquery/docs/reference/legacy-sql

pyparsing - pyparsing 3.0.8

Published by ptmcg over 2 years ago

Version 3.0.8 -

  • API CHANGE: modified pyproject.toml to require Python version 3.6.8 or later for pyparsing 3.x. Earlier minor versions of 3.6 fail in evaluating the version_info class (implemented using typing.NamedTuple). If you are using an earlier version of Python 3.6, you will need to use pyparsing 2.4.7.

  • Improved pyparsing import time by deferring regex pattern compiles. PR submitted by Anthony Sottile to fix issue #362, thanks!

  • Updated build to use flit, PR by Michał Górny, added BUILDING.md doc and removed old Windows build scripts - nice cleanup work!

  • More type-hinting added for all arithmetic and logical operator methods in ParserElement. PR from Kazantcev Andrey, thank you.

  • Fixed infix_notation's definitions of lpar and rpar, to accept parse expressions such that they do not get suppressed in the parsed results. PR submitted by Philippe Prados, nice work.

  • Fixed bug in railroad diagramming with expressions containing Combine elements. Reported by Jeremy White, thanks!

  • Added show_groups argument to create_diagram to highlight grouped elements with an unlabeled bounding box.

  • Added unicode_denormalizer.py to the examples as a demonstration of how Python's interpreter will accept Unicode characters in identifiers, but normalizes them back to ASCII so that identifiers print and 𝕡𝓻ᵢ𝓃𝘁 and 𝖕𝒓𝗂𝑛ᵗ are all equivalent.

  • Removed imports of deprecated sre_constants module for catching exceptions when compiling regular expressions. PR submitted by Serhiy Storchaka, thank you.

pyparsing - pyparsing 3.0.7

Published by ptmcg over 2 years ago

  • Fixed bug #345, in which delimitedList changed expressions in place using expr.streamline(). Reported by Kim Gräsman, thanks!

  • Fixed bug #346, when a string of word characters was passed to WordStart or WordEnd instead of just taking the default value. Originally posted as a question by Parag on StackOverflow, good catch!

  • Fixed bug #350, in which White expressions could fail to match due to unintended whitespace-skipping. Reported by Fu Hanxi, thank you!

  • Fixed bug #355, when a QuotedString is defined with characters in its quoteChar string containing regex-significant characters such as ., *, ?, [, ], etc.

  • Fixed bug in ParserElement.run_tests where comments would be displayed using with_line_numbers.

  • Added optional "min" and "max" arguments to delimited_list. PR submitted by Marius, thanks!

  • Added new API change note in whats_new_in_pyparsing_3_0_0, regarding a bug fix in the bool() behavior of ParseResults.

    Prior to pyparsing 3.0.x, the ParseResults class implementation of __bool__ would return False if the ParseResults item list was empty, even if it contained named results. In 3.0.0 and later, ParseResults will return True if either the item list is not empty or if the named results dict is not empty.

    # generate an empty ParseResults by parsing a blank string with
    # a ZeroOrMore
    result = Word(alphas)[...].parse_string("")
    print(result.as_list())
    print(result.as_dict())
    print(bool(result))
    
    # add a results name to the result
    result["name"] = "empty result"
    print(result.as_list())
    print(result.as_dict())
    print(bool(result))
    

    Prints:

    []
    {}
    False
    
    []
    {'name': 'empty result'}
    True
    

    In previous versions, the second call to bool() would return False.

  • Minor enhancement to Word generation of internal regular expression, to emit consecutive characters in range, such as "ab", as "ab", not "a-b".

  • Fixed character ranges for search terms using non-Western characters in booleansearchparser, PR submitted by tc-yu, nice work!

  • Additional type annotations on public methods.

pyparsing - pyparsing 3.0.6

Published by ptmcg almost 3 years ago

  • Added suppress_warning() method to individually suppress a warning on a specific ParserElement. Used to refactor original_text_for to preserve internal results names, which, while undocumented, had been adopted by some projects.

  • Fix bug when delimited_list was called with a str literal instead of a parse expression.

pyparsing - pyparsing 3.0.5

Published by ptmcg almost 3 years ago

  • Added return type annotations for col, line, and lineno.

  • Fixed bug in which the warn_ungrouped_named_tokens_in_collection warning was raised when assigning a results name to an original_text_for expression. (Issue #110, would raise warning in packaging.)

  • Fixed internal bug where ParserElement.streamline() would not return self if already streamlined.

  • Changed run_tests() output to default to not showing line and column numbers. If line numbering is desired, call with with_line_numbers=True. Also fixed minor bug where separating line was not included after a test failure.

pyparsing - pyparsing 3.0.4

Published by ptmcg almost 3 years ago

  • Fixed bug in which Dict classes did not correctly return tokens as nested ParseResults, reported by and fix identified by Bu Sun Kim, many thanks!!!

  • Documented API-changing side-effect of converting ParseResults to use __slots__ to pre-define instance attributes. This means that code written like this (which was allowed in pyparsing 2.4.7):

    result = Word(alphas).parseString("abc")
    result.xyz = 100
    

    now raises this Python exception:

    AttributeError: 'ParseResults' object has no attribute 'xyz'
    

    To add new attribute values to ParseResults object in 3.0.0 and later, you must assign them using indexed notation:

    result["xyz"] = 100
    

    You will still be able to access this new value as an attribute or as an indexed item.

  • Fixed bug in railroad diagramming where the vertical limit would count all expressions in a group, not just those that would create visible railroad elements.

pyparsing - pyparsing 3.0.3

Published by ptmcg almost 3 years ago

  • Fixed regex typo in one_of fix for as_keyword=True.

  • Fixed a whitespace-skipping bug, Issue #319, introduced as part of the revert of the LineStart changes. Reported by Marc-Alexandre Côté, thanks!

  • Added header column labeling > 100 in with_line_numbers - some input lines are longer than others.

pyparsing - pyparsing 3.0.2

Published by ptmcg almost 3 years ago

  • Reverted change in behavior with LineStart and StringStart, which changed the interpretation of when and how LineStart and StringStart should match when a line starts with spaces. In 3.0.0, the xxxStart expressions were not really treated like expressions in their own right, but as modifiers to the following expression when used like LineStart() + expr, so that if there were whitespace on the line before expr (which would match in versions prior to 3.0.0), the match would fail.

    3.0.0 implemented this by automatically promoting LineStart() + expr to AtLineStart(expr), which broke existing parsers that did not expect expr to necessarily be right at the start of the line, but only be the first token found on the line. This was reported as a regression in Issue #317.

    In 3.0.2, pyparsing reverts to the previous behavior, but will retain the new AtLineStart and AtStringStart expression classes, so that parsers can choose whichever behavior applies in their specific instance. Specifically:

    # matches expr if it is the first token on the line (allows for leading whitespace)
    LineStart() + expr
    
    # matches only if expr is found in column 1
    AtLineStart(expr)
    
  • Performance enhancement to one_of to always generate an internal Regex, even if caseless or as_keyword args are given as True (unless explicitly disabled by passing use_regex=False).

  • IndentedBlock class now works with recursive flag. By default, the results parsed by an IndentedBlock are grouped. This can be disabled by constructing the IndentedBlock with grouped=False.

pyparsing - pyparsing 3.0.1

Published by ptmcg almost 3 years ago

  • Fixed bug where Word(max=n) did not match word groups less than length 'n'. Thanks to Joachim Metz for catching this!

  • Fixed bug where ParseResults accidentally created recursive contents. Joachim Metz on this one also!

  • Fixed bug where warn_on_multiple_string_args_to_oneof warning is raised even when not enabled.

pyparsing - pyparsing 3.0.0

Published by ptmcg almost 3 years ago

Version 3.0.0 -

Version 3.0.0.final -

  • Added support for python -W warning option to call enable_all_warnings() at startup. Also detects setting of PYPARSINGENABLEALLWARNINGS environment variable to any non-blank value.

  • Fixed named results returned by url to match fields as they would be parsed using urllib.parse.urlparse.

  • Early response to with_line_numbers was positive, with some requested enhancements:
    . added a trailing "|" at the end of each line (to show presence of trailing spaces); can be customized using eol_mark argument
    . added expand_tabs argument, to control calling str.expandtabs (defaults to True to match parseString)
    . added mark_spaces argument to support display of a printing character in place of spaces, or Unicode symbols for space and tab characters
    . added mark_control argument to support highlighting of control characters using '.' or Unicode symbols, such as "␍" and "␊".

  • Modified helpers common_html_entity and replace_html_entity() to use the HTML entity definitions from html.entities.html5.

  • Updated the class diagram in the pyparsing docs directory, along with the supporting .puml file (PlantUML markup) used to create the diagram.

  • Added global method autoname_elements() to call set_name() on all locally defined ParserElements that haven't been explicitly named using set_name(), using their local variable name. Useful for setting names on multiple elements when creating a railroad diagram.

    a = pp.Literal("a")
    b = pp.Literal("b").set_name("bbb")
    pp.autoname_elements()
    

    a will get named "a", while b will keep its name "bbb".

pyparsing - pyparsing 3.0.0rc2

Published by ptmcg about 3 years ago

  • Added url expression to pyparsing_common. (Sample code posted by Wolfgang Fahl, very nice!)

    This new expression has been added to the urlExtractorNew.py example, to show how it extracts URL fields into separate results names.

  • Added method to pyparsing_testing to help debugging, with_line_numbers. Returns a string with line and column numbers corresponding to values shown when parsing with expr.set_debug():

    data = """\
       A
          100"""
    expr = pp.Word(pp.alphanums).set_name("word").set_debug()
    print(ppt.with_line_numbers(data))
    expr[...].parseString(data)
    

    prints:

                  1
         1234567890
       1:   A
       2:      100
      Match word at loc 3(1,4)
           A
           ^
      Matched word -> ['A']
      Match word at loc 11(2,7)
              100
              ^
      Matched word -> ['100']
    
  • Added new example cuneiform_python.py to demonstrate creating a new Unicode range, and writing a Cuneiform->Python transformer (inspired by zhpy).

  • Fixed issue #272, reported by PhasecoreX, when LineStart() expressions would match expressions that were not necessarily at the beginning of a line.

    As part of this fix, two new classes have been added: AtLineStart and AtStringStart.
    The following expressions are equivalent:

    LineStart() + expr      and     AtLineStart(expr)
    StringStart() + expr    and     AtStringStart(expr)
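
    The new classes could be exercised roughly as follows (a sketch, assuming pyparsing 3.x with the PEP-8 method names; the sample text is illustrative only):

```python
import pyparsing as pp

text = """\
AAA this line
AAA and this line
  AAA but not this one
B AAA and definitely not this one
"""

# Only occurrences of "AAA" that start in column 1 are reported.
line_start_hits = (pp.AtLineStart("AAA") + pp.rest_of_line).search_string(text)
for tokens in line_start_hits:
    print(tokens)

# AtStringStart(expr) can only match at the very beginning of the input,
# so the second number is not reported.
string_start_hits = pp.AtStringStart(pp.Word(pp.nums)).search_string("123 456")
print(string_start_hits)
```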
    
  • Fixed ParseFatalExceptions failing to override normal exceptions or expression matches in MatchFirst expressions. Addresses issue #251, reported by zyp-rgb.
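
    A small sketch of the fixed behavior, assuming a release that includes this fix (the grammar and names below are hypothetical): a fatal error raised inside one MatchFirst alternative should propagate instead of letting a later alternative match.

```python
import pyparsing as pp

# "let" must be followed by a name; the '-' operator turns a missing name
# into a fatal ParseSyntaxException (a subclass of ParseFatalException).
assignment = pp.Keyword("let") - pp.Word(pp.alphas)

# Fallback alternative that could otherwise match the bare word "let".
any_word = pp.Word(pp.alphanums)

statement = assignment | any_word

try:
    statement.parse_string("let 123")
    outcome = "matched fallback"
except pp.ParseFatalException as err:
    outcome = "fatal error"
    print(err)

print(outcome)
```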

  • Fixed bug in which ParseResults replaces a collection type value with an invalid type annotation (changed behavior in Python 3.9). Addresses issue #276, reported by Rob Shuler, thanks.

  • Fixed bug in ParseResults when calling __getattr__ for special double-underscored methods. Now raises AttributeError for non-existent results when accessing a name starting with '__'. Addresses issue #208, reported by Joachim Metz.
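
    In practice the difference looks roughly like this (a sketch, assuming a release that includes this fix; the results names are invented for illustration):

```python
import pyparsing as pp

expr = pp.Word(pp.alphas)("first") + pp.Word(pp.nums)("second")
result = expr.parse_string("abc 123")

# A defined results name returns the parsed value.
print(result.first)

# An undefined ordinary name still returns an empty string...
missing = result.no_such_name

# ...but an undefined double-underscore name raises AttributeError,
# so hasattr() reports False.
has_dunder = hasattr(result, "__no_such_dunder__")
print(repr(missing), has_dunder)
```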

  • Modified debug fail messages to include the expression name to make it easier to sync up match vs success/fail debug messages.

pyparsing - pyparsing 3.0.0rc1

Published by ptmcg about 3 years ago

  • Railroad diagrams have been reformatted:
    . creating diagrams is easier - call

      expr.create_diagram("diagram_output.html")
    

    create_diagram() takes 3 arguments:
    . the filename to write the diagram HTML
    . optional 'vertical' argument, to specify the minimum number of items in a path to be shown vertically; default=3
    . optional 'show_results_names' argument, to specify whether results name annotations should be shown; default=False

    . every expression that gets a name using setName() is drawn as its own subdiagram
    . results names can be shown as annotations to diagram items
    . Each, FollowedBy, and PrecededBy elements get [ALL], [LOOKAHEAD], and [LOOKBEHIND] annotations
    . removed annotations for Suppress elements
    . some diagram cleanup when a grammar contains Forward elements
    . check out the examples make_diagram.py and railroad_diagram_demo.py

  • Type annotations have been added to most public API methods and classes.

  • Better exception messages to show full word where an exception occurred.

    Word(alphas)[...].parseString("abc 123", parseAll=True)
    

    Was:

    pyparsing.ParseException: Expected end of text, found '1'  (at char 4), (line:1, col:5)
    

    Now:

    pyparsing.exceptions.ParseException: Expected end of text, found '123'  (at char 4), (line:1, col:5)
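
    Code that handles the exception itself, rather than reading the traceback, might look like this sketch (assuming pyparsing 3.x; the loc, lineno and col attributes carry the same position shown in the message):

```python
import pyparsing as pp

try:
    pp.Word(pp.alphas)[...].parse_string("abc 123", parse_all=True)
except pp.ParseException as err:
    # The message now names the full offending word, not just its first character.
    message = str(err)
    # Position of the failure: 0-based offset, 1-based line and column.
    location = (err.loc, err.lineno, err.col)
    print(message)
    print(location)
```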
    
  • Suppress can be used to suppress text skipped using "...".

    source = "lead in START relevant text END trailing text"
    start_marker = Keyword("START")
    end_marker = Keyword("END")
    find_body = Suppress(...) + start_marker + ... + end_marker
    print(find_body.parseString(source).dump())
    

    Prints:

    ['START', 'relevant text ', 'END']
    - _skipped: ['relevant text ']
    
  • New string constants identchars and identbodychars to help in defining identifier Word expressions

    Two new module-level strings have been added to help when defining identifiers, identchars and identbodychars.

    Instead of writing:

    import pyparsing as pp
    identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")
    

    you can now write:

    identifier = pp.Word(pp.identchars, pp.identbodychars)
    

    Those constants have also been added to all the Unicode string classes:

    import pyparsing as pp
    ppu = pp.pyparsing_unicode
    
    cjk_identifier = pp.Word(ppu.CJK.identchars, ppu.CJK.identbodychars)
    greek_identifier = pp.Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
    
  • Added a caseless parameter to the CloseMatch class to allow for casing to be ignored when checking for close matches. (Issue #281) (PR by Adrian Edwards, thanks!)
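
    A minimal sketch of the caseless option, assuming the max_mismatches keyword and the "mismatches" results name of CloseMatch in pyparsing 3.x (the sequence is sample data):

```python
import pyparsing as pp

# Allow up to 2 mismatched characters and ignore differences in case.
pattern = pp.CloseMatch("ATCATCGAATGGA", max_mismatches=2, caseless=True)

result = pattern.parse_string("atcatcgaaxgga")

# The matched input text is returned as-is; "mismatches" lists the
# positions that differ from the target string.
print(result[0])
print(list(result["mismatches"]))
```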

  • Fixed bug in Located class when used with a results name. (Issue #294)

  • Fixed bug in QuotedString class when the escaped quote string is not a repeated character. (Issue #263)

  • parseFile() and create_diagram() methods now will accept pathlib.Path arguments.
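
    For example, a Path object can be passed directly (a sketch using the PEP-8 name parse_file from later 3.0 releases; the temporary file and its contents are invented for the illustration):

```python
import tempfile
from pathlib import Path

import pyparsing as pp

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical input file created just for this sketch.
    source_path = Path(tmp_dir) / "words.txt"
    source_path.write_text("alpha beta gamma\n", encoding="utf-8")

    # The Path is passed as-is; no str() conversion is needed.
    words = pp.Word(pp.alphas)[1, ...]
    result = words.parse_file(source_path, parse_all=True)

print(result.as_list())
```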

pyparsing -

Published by ptmcg about 3 years ago

  • PEP-8 compatible names are being introduced in pyparsing version 3.0!
    Methods such as parseString have been renamed to their PEP-8 compliant
    forms, such as parse_string. Arguments such as parseAll have likewise
    been renamed, to parse_all. For backward compatibility, synonyms for
    all renamed methods and arguments have been added, so that existing
    pyparsing parsers will not break. These synonyms will be removed in a future
    release.

    In addition, the Optional class has been renamed to Opt, since it clashes
    with the common typing.Optional type specifier that is used in the Python
    type annotations. A compatibility synonym is defined for now, but will be
    removed in a future release.
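
    A short sketch of the two naming styles side by side, assuming a 3.x release in which the compatibility synonyms are still present (the grammar is a made-up example):

```python
import pyparsing as pp

# Opt is the new name for the Optional class.
greeting = pp.Word(pp.alphas) + pp.Opt(",") + pp.Word(pp.alphas)

# PEP-8 style method and argument names.
new_style = greeting.parse_string("Hello, World", parse_all=True)

# Pre-3.0 camelCase synonyms, kept for backward compatibility.
old_style = greeting.parseString("Hello, World", parseAll=True)

print(new_style.as_list())
print(old_style.as_list())
```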

  • HUGE NEW FEATURE - Support for left-recursive parsers!
    Following the method used in Python's PEG parser, pyparsing now supports
    left-recursive parsers when left recursion is enabled.

      import pyparsing as pp
      pp.ParserElement.enable_left_recursion()
    
      # a common left-recursion definition
      # define a list of items as 'list + item | item'
      # BNF:
      #   item_list := item_list item | item
      #   item := word of alphas
      item_list = pp.Forward()
      item = pp.Word(pp.alphas)
      item_list <<= item_list + item | item
    
      item_list.run_tests("""\
          To parse or not to parse that is the question
          """)
    

    Prints:

      ['To', 'parse', 'or', 'not', 'to', 'parse', 'that', 'is', 'the', 'question']
    

    Great work contributed by Max Fischer!

  • delimited_list now supports an additional flag allow_trailing_delim,
    to optionally parse an additional delimiter at the end of the list.
    Contributed by Kazantcev Andrey, thanks!
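
    The effect of the flag can be sketched as follows (assuming a 3.x release that still provides delimited_list; the input string is illustrative):

```python
import pyparsing as pp

item = pp.Word(pp.alphas)

strict_list = pp.delimited_list(item)
lenient_list = pp.delimited_list(item, allow_trailing_delim=True)

# With the flag set, the trailing comma is accepted and suppressed.
lenient_result = lenient_list.parse_string("a, b, c,", parse_all=True)
print(lenient_result.as_list())

# Without the flag, the trailing comma is left unparsed, so parse_all fails.
try:
    strict_list.parse_string("a, b, c,", parse_all=True)
    strict_ok = True
except pp.ParseException:
    strict_ok = False
print(strict_ok)
```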

  • Removed internal comparison of results values against b"", which
    raised a BytesWarning when run with python -bb. Fixes issue #271 reported
    by Florian Bruhin, thank you!

  • Fixed STUDENTS table in sql2dot.py example, fixes issue #261 reported by
    legrandlegrand - much better.

  • Python 3.5 will not be supported in the pyparsing 3 releases. This will allow
    for future pyparsing releases to add parameter type annotations, and to take
    advantage of dict key ordering in internal results name tracking.

pyparsing - Pyparsing 3.0.0b2

Published by ptmcg almost 4 years ago

  • API CHANGE
    locatedExpr is being replaced by the class Located. Located has the same constructor interface as locatedExpr, but fixes bugs in the returned ParseResults when the searched expression contains multiple tokens, or has internal results names.

    locatedExpr is deprecated, and will be removed in a future release.
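
    A minimal sketch of Located in use, assuming the results names locn_start, value and locn_end carried over from locatedExpr (the input string is sample data):

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# Each match carries the start offset, the matched tokens, and the end offset.
matches = pp.Located(word).search_string("ljsdf123lksdjjf123lkkjj1222")
for match in matches:
    print(match.locn_start, match.value.as_list(), match.locn_end)
```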
