RustyLR

GLR, LR(1) LALR(1) parser generator for Rust with custom reduce action

APACHE-2.0 License

Downloads
112.1K
Stars
9
RustyLR - v3.3.0 Latest Release

Published by ehwan 5 days ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v3.0.0...v3.3.0

v3.3.0

  • add error and tree feature for debug purpose
  • add Context::can_feed() for checking if given terminal can be feeded to current status without chaning context
RustyLR - v3.0.0

Published by ehwan about 2 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v2.7.0...v3.0.0

Breaking API

  • removed Parser::feed(), Parser::begin()

    • Context can be created with Context::new()
    • feed() moved to Context::feed()
  • removed lalr! macro

    • LALR parser can be generated with %lalr directive.
  • cleaned README and cargo doc

  • changed LICENSE from MIT to MIT OR APACHE 2.0

RustyLR - v2.7.0

Published by ehwan about 2 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v2.6.0...v2.7.0

Resolving Ambiguities

In GLR parser, there can be both shift/reduce action possible at the same time, this leads to ambiguity of the grammar.
You can resolve the ambiguties through the reduce action.

  • Returning Result::Err(Error) from the reduce action will revoke current reducing path.
  • Setting predefined variable shift: &mut bool to false will revoke the shift action with lookahead token.

Consider the following example:

E : E plus E
  | E star E
  | digit
  ;

And you are trying to feed 1 + 2 * 3 + 4 eof to the parser.
There are 5 ways to represent the input sequence:

  • ((1 + 2) * 3) + 4
  • (1 + (2 * 3)) + 4
  • 1 + ((2 * 3) + 4)
  • 1 + (2 * (3 + 4))
  • (1 + 2) * (3 + 4)

However, we know the 2nd path is the only correct one,
since the star has higher precedence than plus, and both are left-associative.

To resolve the ambiguity, you can write the reduce action as follows:

E : E plus E {
      match *lookahead {
          '*' => {
              // no reduce if the next token is '*'
              // this prevent
              // E + E   /   *
              //             ^ lookahead
              // to be  E *  ...
              //        ^ (E + E)
              return Err("".to_string());
          }
          _ => {
              // revoke the shift action
              // this prevent
              // E + E   /  +
              //            ^ lookahead
              // to be E + E +  ...
              // and enforce the reduced token takes place
              // E + ...
              // ^ (E + E)
              *shift = false;
          }

      }
  }
  | E star E {
      *shift = false;
  }
  | Number
  ;
RustyLR - v2.6.0

Published by ehwan 2 months ago

Add GLR parser generator

Adding %glr; directive will switch the parser generation to GLR parser.
With this directive, any Shift/Reduce, Reduce/Reduce conflicts will not be treated as errors.

Resolving Ambiguities

You can resolve the ambiguties through the reduce action.
Simply, returning Result::Err(Error) from the reduce action will revoke current path.


Add tree feature

[dependencies]
rusty_lr = { version = "2.6.0", features=["tree"] }

For debugging purpose, tree feature enables the automatic construction of the syntax tree.
You can call context.to_tree_list() to get the reduced syntax tree.

let parser = Parser::new();
let mut context = parser.begin();
/// feed tokens...
println!( "{:?}", context.to_tree_list() ); // print tree list with `Debug` trait
println!( "{}", context.to_tree_list() );   // print tree list with `Display` trait
TreeList
├─A
│ └─M
│   └─P
│     └─Number
│       ├─WS0
│       │ └─_space_Star1
│       │   └─_space_Plus0
│       │     ├─_space_Plus0
│       │     │ └─' '
│       │     └─' '
│       ├─_Digit_Plus3
│       │ └─Digit
│       │   └─_TerminalSet2
│       │     └─'1'
│       └─WS0
│         └─_space_Star1
│           └─_space_Plus0
│             └─' '
├─'+'
├─M
│ └─P
... continue
RustyLR - v2.5.0-test

Published by ehwan 2 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v2.4.0...v2.5.0


Add GLR parser generator

Adding %glr; directive will switch the parser generation to GLR parser.
With this directive, any Shift/Reduce, Reduce/Reduce conflicts will not be treated as errors.

Resolving Ambiguities

You can resolve the ambiguties through the reduce action.
Simply, returning Result::Err(Error) from the reduce action will revoke current path.

Note on GLR Parser

  • Still in development, not have been tested enough (patches are welcome!).
  • Since there are multiple paths, the reduce action can be called multiple times, even if the result will be thrown away in the future.
    • Every RuleType and Term must implement Clone trait.
  • User must be aware of the point where shift/reduce or reduce/reduce conflicts occur.
    Every time the parser diverges, the calculation cost will increase.

Add lookahead predefined variable in reduce action

You can refer via lookahead variable in the reduce action, which token caused the reduce action.


RustyLR - v2.4.0

Published by ehwan 2 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v2.1.0...v2.4.0

syntax added - token with lookahead

  • P / term
  • P / [term1 term_start-term_last]
  • P / [^term1 term_start-term_last]
    Pattern P followed by one of given terminal set. Lookaheads are not consumed.

syntax added - parenthesis group patterns

E : A p=( P1 P2 P3 ) B { A + p.0 + p.1 + p.2 + B } ;

Captures subsequence P1 P2 P3 as single token.

  • If none of the patterns hold value, the group itself will not hold any value.
  • If only one of the patterns holds value, the group will hold the value of the very pattern. And the variable name will be same as the pattern.
    (i.e. If P1 holds value, and others don't, then (P1 P2 P3) will hold the value of P1, and can be accessed via name P1)
  • If there are multiple patterns holding value, the group will hold Tuple of the values. There is no default variable name for the group, you must define the variable name explicitly by = operator.
NoRuleType: ... ;

I(i32): ... ;

// I will be chosen
A: (NoRuleType I NoRuleType) {
    println!( "Value of I: {:?}", I ); // can access by 'I'
    I
};

// ( i32, i32 )
B: i2=( I NoRuleType I ) {
    println!( "Value of I: {:?}", i2 ); // must explicitly define the variable name
};

RustyLR - v2.1.0

Published by ehwan 2 months ago

Add feature for build.rs support

  • This buildscripting tool will provide much more detailed, pretty-printed error messages than the procedural macros, at compile time.
  • Generated code will contain the same structs and functions as the procedural macros.
  • In your actual source code, you can include! the generated file.
    You can enable the feature build to use in the build script.
[build-dependencies]
rusty_lr = { version = "...", features = ["build"] }
// build.rs
use rusty_lr::build;

fn main() {
    println!("cargo::rerun-if-changed=src/parser.rs");

    let output = format!("{}/parser.rs", std::env::var("OUT_DIR").unwrap());
    build::Builder::new()
        .file("src/parser.rs") // path to the input file
    //  .lalr()                // to generate LALR(1) parser
        .build(&output);       // path to the output file
}

The program searches for %% in the input file, not the lr1!, lalr1! macro. The contents before %% will be copied into the output file as it is. And the context-free grammar must be followed by %%.

If there is any errors when building a grammar, it will print error messages to stderr and then panic. This will make the messages to be shown during compilation.

image

RustyLR - v2.0.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.6.0...v2.0.0

v2.0.0 Release (stable)

  • removed feed_callback()
  • fixed ParseError variants
    - InvalidNonTerm is deleted since it never happens. Moved it to unreachable!
    - CallbackError is deleted
  • add type aliases for Rule, State, ParseError in generated structs
  • add exclamation ! pattern to ignore <RuleType> of token in production rule
RustyLR - v1.6.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.5.0...v1.6.0

executable rustylr now prints pretty error message

Screenshot 2024-08-12 at 11 06 03 PM

RustyLR - v1.5.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.3.0...v1.5.0

add feature fxhash

[dependencies]
rusty_lr = { version = "1", features = ["fxhash"] }

This replace std::collections::HashMap by FxHashMap of DFA.

removed trait bound PartialOrd, Ord from terminal symbol type.

This enable std::mem::discriminant can be used for enum Token type

RustyLR - v1.3.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.2.0...v1.3.0

make parse error messages much readable

Sample error message:

error: Invalid Terminal: '*'
Expected one of: ' ', '(', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
-------------------------------Backtracing state--------------------------------
M -> M '*' • M
-----------------------------------Prev state-----------------------------------
M -> M • '*' M
-----------------------------------Prev state-----------------------------------
A -> • A '+' A
A -> A '+' • A
A -> • M
M -> • M '*' M
-----------------------------------Prev state-----------------------------------
A -> A • '+' A
-----------------------------------Prev state-----------------------------------
A -> • A '+' A
E -> • A
Augmented -> • E '\0'
RustyLR - v1.2.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.1.0...v1.2.0

Add support for regex character set, range pattern

%token lparen '(';
%token rparen ')';
%token zero '0';
...
%token nine '9';

A: [zero-nine]+; // zero to nine

B: [^lparen rparen]; // any token except lparen and rparen

C: [lparen rparen one-nine]*; // lparen and rparen, and one to nine

Note that in range pattern [first-last], the range is constructed by the order of the %token directives, not by the actual value of the token. If you define tokens in the following order:

%token one '1';
%token two '2';
...
%token zero '0';
%token nine '9';

The range [zero-nine] will be ['0', '9'], not ['0'-'9'].

RustyLR - v1.1.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.0.0...v1.1.0

macro generates an enum for non-terminal symbols.

  • name of generated enum is <StartSymbol>NonTerminals
  • enum holds every non terminal symbols in the CFGs.
  • it may contains auto-generated rules ( e.g. Augmented, or for '+', '*', '?' pattern )
RustyLR - v1.0.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.12.1...v1.0.0

variables in reduce action now takes the value itself, include terminal

%tokentype char;
%token plus ... ;
A(i32): A plus {
    println!( "A is i32: {}", A );
    println!( "plus is char: {}", plus ); // not a reference `&char`
}
;
  • removed slice capture
  • you can use Vec as <RuleType> manually.
  • remove TermData, NonTermData

Reduce action can be omitted if the production rule has only one token data

// <RuleType> not defined for 'Ws0'
Ws0: ... ; 

// 'Number' will be used.
// since Number is the only token that holds data
A(i32): Ws0 Number Ws0;
  • bootstrapped new syntax
  • fix calculator example following new syntax
  • return Err if RuleType is defined but action is not defined
RustyLR - v0.12.1

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.11.2...v0.12.1

Add syntax to support regex pattern partially

  • support * (zero or more), + (one or more), ? (zero or one)

Example:

E : A plus? a2=A*
  {
    println!("Value of 1st A: {}", A.value); // i32
    println!("Slice of 1st A: {:?}", A.slice);
    println!("Value of 2nd As: {:?}", a2.value); // Vec<i32>
    println!("Slice of 2nd As: {:?}", a2.slice);

    if let plus.slice.len() == 0 {
        // plus is not captured
    }else {
        // plus is captured
    }
  }
  ;

For regex pattern, <RuleType> will be modified by following:

Pattern Non-Terminal<RuleType>=T Non-Terminal<RuleType>=(not defined) Terminal
'*' Vec<T> (not defined) (not defined)
'+' Vec<T> (not defined) (not defined)
'?' Option<T> (not defined) (not defined)

fix some HashMap to BTreeMap for consistent output

  • same output for same input.
  • HashMap was randomizing the id of state, tokens
RustyLR - v0.11.2

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.10.0...v0.11.2

  • manually expand macros of bootstrap to avoid deep-recursive dependency
  • add lr1_runtime!, lalr1_runtime! macro.
    - This call grammar.build() on runtime
    - The generated code will be about 1/10 in size ( rust-analyzer dies a lot due to TONS of code generated by former macro )
RustyLR - v0.10.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.9.0...v0.10.0

Bootstrapped proc-macro line parser.

cargo's version control is simply amazing

  • add %moduleprefix syntax for future bootstrapping
RustyLR - v0.9.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.8.0...v0.9.0

  • Add support for custom error type in reduce action. String will be used if not set.
%error <ErrType>;
...

A: token0 token1 token2 {
  return Err( <ErrType>::new() );
};
match parser.feed( ... ) {
    Ok(_) => {},
    Err(e) => {
        match e {
            rysty_lr::ParseError::ReduceAction( err_from_action ) => { ... }
            _ => {}
        }
    }
}
  • clean some doc comments
RustyLR - v0.8.0

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.7.2...v0.8.0

Fixed syntax for capturing token data into variable.

  • s0, s1, ... v0, v1, ... are removed
  • can access data by its name directly
  • can remap the variable name
M(i32) : M star m2=M { *M * *m2 }
      | P { *P }
      ;
  • fixed eof as reserved keyword
RustyLR - v0.7.2

Published by ehwan 3 months ago

Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.7.0...v0.7.2

  • add more error checking and emit compile_error
    • check if <RuleType> is enclosed with parenthesis '(' and ')'
    • check if <ReduceAction> is enclosed with braces '{' and '}'
  • removed unnecessary clone()
  • moved all trait bounds in impl to each functions
  • add documents about proc-macro syntax in README