GLR, LR(1) LALR(1) parser generator for Rust with custom reduce action
APACHE-2.0 License
Full Changelog: https://github.com/ehwan/RustyLR/compare/v3.0.0...v3.3.0
error
and tree
feature for debug purposeContext::can_feed()
for checking if given terminal can be feeded to current status without chaning context
Published by ehwan about 2 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v2.7.0...v3.0.0
removed Parser::feed()
, Parser::begin()
Context
can be created with Context::new()
feed()
moved to Context::feed()
removed lalr!
macro
%lalr
directive.cleaned README and cargo doc
changed LICENSE from MIT to MIT OR APACHE 2.0
Published by ehwan about 2 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v2.6.0...v2.7.0
In GLR parser, there can be both shift/reduce action possible at the same time, this leads to ambiguity of the grammar.
You can resolve the ambiguties through the reduce action.
Result::Err(Error)
from the reduce action will revoke current reducing path.shift: &mut bool
to false
will revoke the shift action with lookahead token.Consider the following example:
E : E plus E
| E star E
| digit
;
And you are trying to feed 1 + 2 * 3 + 4 eof
to the parser.
There are 5 ways to represent the input sequence:
((1 + 2) * 3) + 4
(1 + (2 * 3)) + 4
1 + ((2 * 3) + 4)
1 + (2 * (3 + 4))
(1 + 2) * (3 + 4)
However, we know the 2nd path is the only correct one,
since the star
has higher precedence than plus
, and both are left-associative.
To resolve the ambiguity, you can write the reduce action as follows:
E : E plus E {
match *lookahead {
'*' => {
// no reduce if the next token is '*'
// this prevent
// E + E / *
// ^ lookahead
// to be E * ...
// ^ (E + E)
return Err("".to_string());
}
_ => {
// revoke the shift action
// this prevent
// E + E / +
// ^ lookahead
// to be E + E + ...
// and enforce the reduced token takes place
// E + ...
// ^ (E + E)
*shift = false;
}
}
}
| E star E {
*shift = false;
}
| Number
;
Published by ehwan 2 months ago
Adding %glr;
directive will switch the parser generation to GLR parser.
With this directive, any Shift/Reduce, Reduce/Reduce conflicts will not be treated as errors.
You can resolve the ambiguties through the reduce action.
Simply, returning Result::Err(Error)
from the reduce action will revoke current path.
tree
feature[dependencies]
rusty_lr = { version = "2.6.0", features=["tree"] }
For debugging purpose, tree
feature enables the automatic construction of the syntax tree.
You can call context.to_tree_list()
to get the reduced syntax tree.
let parser = Parser::new();
let mut context = parser.begin();
/// feed tokens...
println!( "{:?}", context.to_tree_list() ); // print tree list with `Debug` trait
println!( "{}", context.to_tree_list() ); // print tree list with `Display` trait
TreeList
├─A
│ └─M
│ └─P
│ └─Number
│ ├─WS0
│ │ └─_space_Star1
│ │ └─_space_Plus0
│ │ ├─_space_Plus0
│ │ │ └─' '
│ │ └─' '
│ ├─_Digit_Plus3
│ │ └─Digit
│ │ └─_TerminalSet2
│ │ └─'1'
│ └─WS0
│ └─_space_Star1
│ └─_space_Plus0
│ └─' '
├─'+'
├─M
│ └─P
... continue
Published by ehwan 2 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v2.4.0...v2.5.0
Adding %glr;
directive will switch the parser generation to GLR parser.
With this directive, any Shift/Reduce, Reduce/Reduce conflicts will not be treated as errors.
You can resolve the ambiguties through the reduce action.
Simply, returning Result::Err(Error)
from the reduce action will revoke current path.
RuleType
and Term
must implement Clone
trait.lookahead
predefined variable in reduce actionYou can refer via lookahead
variable in the reduce action, which token caused the reduce action.
Published by ehwan 2 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v2.1.0...v2.4.0
P / term
P / [term1 term_start-term_last]
P / [^term1 term_start-term_last]
P
followed by one of given terminal set. Lookaheads are not consumed.E : A p=( P1 P2 P3 ) B { A + p.0 + p.1 + p.2 + B } ;
Captures subsequence P1 P2 P3
as single token.
P1
holds value, and others don't, then (P1 P2 P3)
will hold the value of P1
, and can be accessed via name P1
)Tuple
of the values. There is no default variable name for the group, you must define the variable name explicitly by =
operator.NoRuleType: ... ;
I(i32): ... ;
// I will be chosen
A: (NoRuleType I NoRuleType) {
println!( "Value of I: {:?}", I ); // can access by 'I'
I
};
// ( i32, i32 )
B: i2=( I NoRuleType I ) {
println!( "Value of I: {:?}", i2 ); // must explicitly define the variable name
};
Published by ehwan 2 months ago
build.rs
supportinclude!
the generated file.build
to use in the build script.[build-dependencies]
rusty_lr = { version = "...", features = ["build"] }
// build.rs
use rusty_lr::build;
fn main() {
println!("cargo::rerun-if-changed=src/parser.rs");
let output = format!("{}/parser.rs", std::env::var("OUT_DIR").unwrap());
build::Builder::new()
.file("src/parser.rs") // path to the input file
// .lalr() // to generate LALR(1) parser
.build(&output); // path to the output file
}
The program searches for %%
in the input file, not the lr1!
, lalr1!
macro. The contents before %%
will be copied into the output file as it is. And the context-free grammar must be followed by %%
.
If there is any errors when building a grammar, it will print error messages to stderr and then panic. This will make the messages to be shown during compilation.
Published by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.6.0...v2.0.0
feed_callback()
ParseError
variantsInvalidNonTerm
is deleted since it never happens. Moved it to unreachable!
CallbackError
is deletedRule
, State
, ParseError
in generated structs!
pattern to ignore <RuleType>
of token in production rulePublished by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.5.0...v1.6.0
rustylr
now prints pretty error messagePublished by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.3.0...v1.5.0
fxhash
[dependencies]
rusty_lr = { version = "1", features = ["fxhash"] }
This replace std::collections::HashMap
by FxHashMap
of DFA.
PartialOrd
, Ord
from terminal symbol type.This enable std::mem::discriminant
can be used for enum Token type
Published by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.2.0...v1.3.0
Sample error message:
error: Invalid Terminal: '*'
Expected one of: ' ', '(', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
-------------------------------Backtracing state--------------------------------
M -> M '*' • M
-----------------------------------Prev state-----------------------------------
M -> M • '*' M
-----------------------------------Prev state-----------------------------------
A -> • A '+' A
A -> A '+' • A
A -> • M
M -> • M '*' M
-----------------------------------Prev state-----------------------------------
A -> A • '+' A
-----------------------------------Prev state-----------------------------------
A -> • A '+' A
E -> • A
Augmented -> • E '\0'
Published by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.1.0...v1.2.0
%token lparen '(';
%token rparen ')';
%token zero '0';
...
%token nine '9';
A: [zero-nine]+; // zero to nine
B: [^lparen rparen]; // any token except lparen and rparen
C: [lparen rparen one-nine]*; // lparen and rparen, and one to nine
Note that in range pattern [first-last]
, the range is constructed by the order of the %token
directives, not by the actual value of the token. If you define tokens in the following order:
%token one '1';
%token two '2';
...
%token zero '0';
%token nine '9';
The range [zero-nine]
will be ['0', '9']
, not ['0'-'9']
.
Published by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v1.0.0...v1.1.0
<StartSymbol>NonTerminals
Augmented
, or for '+', '*', '?' pattern )Published by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.12.1...v1.0.0
%tokentype char;
%token plus ... ;
A(i32): A plus {
println!( "A is i32: {}", A );
println!( "plus is char: {}", plus ); // not a reference `&char`
}
;
Vec
as <RuleType>
manually.// <RuleType> not defined for 'Ws0'
Ws0: ... ;
// 'Number' will be used.
// since Number is the only token that holds data
A(i32): Ws0 Number Ws0;
Published by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.11.2...v0.12.1
*
(zero or more), +
(one or more), ?
(zero or one)Example:
E : A plus? a2=A*
{
println!("Value of 1st A: {}", A.value); // i32
println!("Slice of 1st A: {:?}", A.slice);
println!("Value of 2nd As: {:?}", a2.value); // Vec<i32>
println!("Slice of 2nd As: {:?}", a2.slice);
if let plus.slice.len() == 0 {
// plus is not captured
}else {
// plus is captured
}
}
;
For regex pattern, <RuleType>
will be modified by following:
Pattern | Non-Terminal<RuleType>=T
|
Non-Terminal<RuleType>=(not defined)
|
Terminal |
---|---|---|---|
'*' | Vec<T> |
(not defined) | (not defined) |
'+' | Vec<T> |
(not defined) | (not defined) |
'?' | Option<T> |
(not defined) | (not defined) |
HashMap
to BTreeMap
for consistent outputHashMap
was randomizing the id of state, tokensPublished by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.10.0...v0.11.2
lr1_runtime!
, lalr1_runtime!
macro.grammar.build()
on runtimePublished by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.9.0...v0.10.0
cargo's version control is simply amazing
%moduleprefix
syntax for future bootstrappingPublished by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.8.0...v0.9.0
String
will be used if not set.%error <ErrType>;
...
A: token0 token1 token2 {
return Err( <ErrType>::new() );
};
match parser.feed( ... ) {
Ok(_) => {},
Err(e) => {
match e {
rysty_lr::ParseError::ReduceAction( err_from_action ) => { ... }
_ => {}
}
}
}
Published by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.7.2...v0.8.0
s0
, s1
, ... v0
, v1
, ... are removedM(i32) : M star m2=M { *M * *m2 }
| P { *P }
;
eof
as reserved keywordPublished by ehwan 3 months ago
Full Changelog: https://github.com/ehwan/RustyLR/compare/v0.7.0...v0.7.2
emit compile_error
<RuleType>
is enclosed with parenthesis '(' and ')'<ReduceAction>
is enclosed with braces '{' and '}'clone()
impl
to each functions