Light-weight easy-to-use PEG parser with some toppings!
[Try now!](TODO link to test page)
TODO
Welcome to Miniup, the pragmatic parsing library. Miniup is a pragmatic parsing library that is strongly inspired by context-free grammar parsing libraries such as Stratego/XT. Miniup offers grammars in a pragmatic and therefor easy-to-use way. Miniup is based on the PEG parsing formalism, but adds some nice features and language constructions on top of normal PEG grammars, which makes the tool excellent for rapid prototyping of new languages!
npm install -g miniup
-
to match nothing.a = a 'x' / 'y'
binop = 'and' > 'or' > expr
statements = (statement ; ';')+
number = (/\d+/)
numbers = @whitespace-on (@whitespace-off $('-'? [0-9]+))*
+ 1 -22 34
» ["1", "-22", "34"]
$
. (See previous example)Miniup can be run from the commandline if it is installed with npm
. (So install nodejs / npm first).
Install miniup by running npm install -g miniup
Now you can run miniup against a grammar file by using
miniup -s <grammar-file> -i <input file>
or by using
miniup -g <grammar> <input>
Example:
$ miniup -g 'numbers = @whitespace-on (FLOAT / INTEGER ; ",")*' '4, -5, 6.7'
[
"4",
"-5",
"6.7"
]
$ miniup -c -g 'g = greeting: ("hello" / "hi") (/\s+/) who: DOUBLEQUOTEDSTRING' 'hello "world"'
{
"greeting": "hello",
"who": "\"world\""
}
Check miniup --help
for a list of all command line options.
Embed miniup to your app. In a node environment, install and require miniup. Or in the browser include the miniup.js using a script tag or any other way. From there on, create load the grammar by using miniup.Grammar.load
or miniup.Grammar.loadFromFile
.
var miniup = require("miniup");
var grammar = miniup.Grammar.load("numbers = @whitespace-on (FLOAT / INTEGER ; \",\")*");
var numbers = grammar.parse("4, -5, 6.7", { cleanAST : true });
console.dir(numbers); //prints ["4", "-5", "6.7"]
Or, to put it in a one liner miniup can be invoked as function directly, with grammar
, input
and an optional parse options object. Note that we can even leave out the rule name if there is just one rule:
var numbers = require("miniup")("@whitespace-on (FLOAT / INTEGER ; \",\")*", "4, -
5, 6.7");
Thats all!
Java style multi-line comment
Java style single line comment
Defines a rule with rulename
. A rule with a certain name can be defined only once. expr
refers to any of the constructions defined below. The `"displayname" string is a human-friendly name used in error reporting and is optional.
Instead of usign the =
sign, using a left arrow <-
is allowed as well.
Searches for a rule named rulename
and tries to match it. Rules can be used before they are defined (syntactically speaking).
Example:
All examples in this section are in the format grammar
x input
» output
. The examples can be verified by running the command miniup -c -g "grammar" "input"
which should produce the mentioned output.
phone = number; number = [0-9]+;
x 45
» ["4", "5"]
Tries to match literal
literally in the input. Both single- and double quotes are allowed to define the literal. Normal java(script) escaping is allowed within the literal (e.g. 'quote\' and newline\n'
). Unicode, hexadecimal and octal escape sequences are allowed as well. The i
flag can be added after the closing quote to perform the match case-insensitive.
TODO: mention automatic word boundaries if auto whitespace.
Example:foo = "baR"i
x BAr
» "BAr"
Tries to match exactly one character from the given characterset. Ranges and negations can be used, similar to regular expressions. For example [^0-9 ]
matches everything but a digit or a space. The i
flag can be added after the closing bracket to perform the match case-insensitive. Within the characterset, the slash (\
) can be used as escape character.
Note that characterclasses can be combined with the $
operator to combine the individual matched characters.
Example: foo = [^bar]
x R
» "R"
Example: phone = number; number = [0-9]+;
x 45
» ["4", "5"]
Example: phone = number; number = $[0-9]+;
x 45
» "45"
Tries to match all the expressions or fails. Returns an object containing all submatches. If the extendedAST
option is enabled, each submatch is available under its (zero-based) index. Otherwise only labeled items are available (see label
)
Example: foo = 'bar' name:'baz'
x barbaz
» { name: "baz" }
Tries to match either expr1
, expr2
or expr-n
. The choice rule returns the first successful matches and does not attempt to match any subsequent choices. This is unlike the behavior of the |
operator in context free grammars.
Example: foo = 'a' / 'b' / 'b' / 'c'
x 'b'
» "b"
Groups the list of expressions. Behavior is identical to not using parentheses. But parentheses are very useful to make quantifiers or predicates match multiple expressions at the same time.
Example:
decl = @whitespace-on modifiers:(pub:'public'? stat:'static'?) name: IDENTIFIER '(' ')'
x static foo()
» { modifiers : { pub : null, stat: 'static' }, name: "foo" }
Matches expr
and stores it under label
in the resulting AST.
Items will not be available in the result AST unless they are either labeled or matched by a Regex or Characterclass (todo: verify:) or Literal matcher. If the extendedAST
parse option (--extended
) is used, all items are available in the resulting AST.
Example: abc = a:'a' 'b' c:'c'
x abc
» { a: "a", c: "c"}
Example (with extended AST enabled): abc = a:'a' 'b' c:'c'
x abc
» { a: "a", c: "c", 0: "a", 1: "b", 2: "c", length: 3 }
Optionally matches expr
. If expr
is not found, the parse is still considered to be successful.
Example: foo = bar:'bar'? baz:'baz'
x baz
» { bar: null, baz: 'baz'}
Matches expr
as many times as possible, but no matches are fine as well. The match will always be performed greedy. The matches will be returned as array.
Example: foo = 'a'*
x aaaa
» ['a', 'a', 'a', 'a']
In combination with the semicolon (;
) separators can be recognized. It is allowed to refer to other rules for separators.
Example: foo = ('a' ; ',')*
x a,a,a
» ['a','a','a']
Matches expr
as many times as possible, but at least once. The match will always be performed greedy. The matches will be returned as array. Use parentheses and semicolon to introduce separators.
Example: foo = 'a'+
x aaaa
» ['a', 'a', 'a', 'a']
Positive predicate. Tries to match expr
. If expr
is found, the match is considered successful, but the rule does not match anything. Can be used as 'lookahead'.
Example: foo = &'0' num:[0-9]+
x 017
» { num : [ '0', '1', '7' ] }
. But, this rule will fail on the input 117
.
Negative predicate. Tries to match expr
. If expr
is not found, the match is considered successful and parsing will continue. But fails if expr
is found.
Example: foo = 'idontlike ' !'coffee' what:/[a-z]*/
x idontlike tea
» { what: 'tea' }
. But, this rule will fail on idontlike coffee
. And rightfully so.
The .
matches any character
Matches expr
. But instead of returning the result, the original input string on which the match was made is returned.
Example: number = $('-'? [0-9]+)
x -44
» "44"
Example: number = sign:'-'? digits:[0-9]+
x -44
» { sign : null, digits:["4,4"]}
Matches the specified regular expression at the begin of the remainder of the input. Useful to express more powerful patterns than charactersets. The syntax is equal to native Javascript regular expressions and the same escaping rules apply. Only the i
flag is supported, g
and m
flags are not supported.
The surround parentheses are not strictly necessary, but are usually required to distinguish regular expressions from choices. For example a / b / c
matches just a
, b
or c
, while a (/ b /) c
only matches a b c
.
Example: float = (/[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?/)
x -34.3e523
» "-34.3e523"
Matches all items between the parentheses zero or more times. However, the last item of the sequence is used as separator to initiate the repitition.
Example: args = args:(expr ',')*?; expr = 'dummy'
x dummy,dummy,dummy
» { args: ["dummy", "dummy", "dummy"] }
Behaves the same as the *?
operator, but requires at least one match.
Example: args = args:(expr ',')*?; expr = 'dummy'
x dummy
» { args: ["dummy"] }
Matches any subset of the provided expressions, but none are required.
Example: modifiers = @whitespace-on (public:'public' static:'static' final: 'final')#
x final public
» {public:"public", static: null, final: "final"}
Enables or disables automatic whitespace parsing for this rule. Enabling automatic whitespace parsing avoids the need to explicitly match whitespace between tokens. This is very useful in many grammar. Automatic whitespace matching is by default turned off for compatibility with existing PEG grammars. Enabling whitespace enables it for the rest of this rule, and all rules called by it. After completing (or failing) the match, the whitespace status will be reset to its original value.
Note that using this construction requires a rule with the name 'whitespace' to be defined.
Example: numbers = @whitespace-on ($number)+; number = @whitespace-off '-'? [0-9] + ('.' [0-9]+)?; whitespace = WHITESPACECHARS
x 42 3.16 -12
» ["42", "3.16", "-12"]
TODO: @import "filename.peg".rule
Behaves similar to call
, but, with using @import
rules from other grammars can be imported and applied. This import statement applies the rule name
from the grammar with name grammar
. Note that the grammar has to be registered first.
Example:
var coffeeGrammar = miniup.Grammar.load("coffee = flavor : ('coffee' / 'cappucino')");
miniup.Grammar.register('CoffeeGrammar', coffeeGrammar);
var fooGrammar = miniup.Grammar.load("foo = @import CoffeeGrammar.coffee");
fooGrammar.parse("cappucino");
//returns: { flavor : "cappucino" }
TODO:
Some grammar
--
Some AST
Miniup can read input from files, standard input stream or arguments. To parse input from the input stream use the following command:
echo "+31 2344567" | miniup -rc -g "phone = '+' countrycode:[0-9]+ ' ' phonenumber:[0-9]+"
Which (TODO) outputs:
{
country : "31",
phonenumber: "2344567"
}
Or, if you provide invalid input:
echo "+31 23oops7" | miniup -rc -g "phone = '+' countrycode:[0-9]+ ' ' phonenumber:[0-9]+"
TODO: error output
All available command line options can be listed by using miniup --help
Build a grammar object from a grammar string:
var grammar = miniup.Grammar.load("phone = '+' countrycode:[0-9]+ ' ' phonenumber:[0-9]+");`
To parse input, use grammar.parse(input /*string*/ [, options /*options object*/])
var ast = grammar.parse("+31 2344567");
console.log("Country: " + ast.country + " Phone: " + ast.phone)
The following options can be provided to the parse function:
{
debug: true, /* default: false, prints the complete parse strategy*/
cleanAST: true, /* default: false, does not include additional match information such as $text, $rule and $pos*/
extendedAST: true, /* default: false, if true, unlabeled items are added to the resulting AST as well */
startSymbol: 'rulename', /* use alternative start symbol. Default: first rule that is defined in the grammar */
inputName: 'string' /* default: 'input', name of the input to use in error reporting, such as a filename */
allowLeftRecursion: bool; default: true. --no-leftrecursion to disable
}
The following tokens are available by default in every grammar and can be called by any rule.
These regexes are all available in the miniup.RegExpUtil
namespace.
Values for REGEX
, INTEGER
, FLOAT
, BOOLEAN
and strings will be automatically parsed to the corresponding javascript type.
IDENTIFIER /[a-zA-Z_][a-zA-Z_0-9]/ WHITESPACE /\s+/ REGEX //([^\/]|(\.))/i?/ SINGLEQUOTEDSTRING /'([^'\]|(\.))'/ DOUBLEQUOTEDSTRING /"([^"\]|(\.))"/ SINGLELINECOMMENT ///.(\n|$)/ MULTILINECOMMENT //*(?:[^]|*(?!/))?*// CHARACTERCLASS /[([^\/]|(\.))]/ INTEGER /(-|+)?\d+/ FLOAT /[-+]?[0-9]*.?[0-9]+([eE][-+]?[0-9]+)?/ BOOLEAN /(true|false)\b/ LINEENDCHAR /\r?\n|\u2028|\u2029/
* Interpreted as whitespace
** These regular expressions are in unescaped form. To use them in a Miniup grammar file, apply javascript style escaping: First prepend each forwardslash or backward slash with a backward slash, second, wrap the whole regular expression in forward slashes. Or, take a look at the miniup.txt file, which contains the miniup grammar described in miniup.