PrattParserInC

An implementation of Pratt Parsing (Vaughn Pratt 1973 "Top Down Operator Precedence") in C/C++

Stars
8

README for PrattParserInC

These files illustrate an elegant parsing method, implemented in C/C++, first described by Vaughn Pratt in his 1973 paper, "Top Down Operator Precedence."

Pratt's description was in state machines and lisp, and so somewhat hard to interpret. Douglas Crockford's description in Javascript (see Beautiful Code, and http://javascript.crockford.com/tdop/tdop.html) were instrumental in bringing this method to my attention.

They are part of a larger project, and so may not compile standalone at the moment. These files are here to illustrate the technique, and give examples of my code.

The file 'newpassing.l3' gives many examples of the code that is parsed with this method.

The heart of the Pratt method is the expression() routine, that starts on line 606 of l3pratt.cpp. As in Crockford's paper, the algorithm is surprisingly short; it is subtle. I'll reproduce it here for easy comparison.

// Crockford's version in Javascript:

var expression = function (rbp) { var left; var t = token; advance(); left = t.nud(); while (rbp < token.lbp) { t = token; advance(); left = t.led(left); } return left;

// My C/C++ implemention, stripped of debugging code:

// working expression() parsing routine. Reproduces the original Pratt and Crockford formulation.

// // _left holds the accumulated parse tree at any point in time. // "The parse Tree Up to this point, by consuming the tokens to the left" would be a better-but-too-long name. // // and... _left obviously is the stuff to the left of the current operator, hence the name. // // data flows from token -> t -> (possibly on the stack of t recursive munch_left calls) -> into the _left tree. // // better names: _left -> _accumulated_parsed_qtree (to be returned) // t -> cnode (the current token's qtree node to be processed. // Once we grab this we always advance() past it // before processing it, so that _global_next_token // (or _token) contains the following token. // // _token -> _global_next_token_node (since it is a global variable and // can/will be incremented by many routines, in particular advance();) // // meaning of rbp parameter: if you hit a token with a _token->_lbp < rbp, then don't bother // calling munch_left, stop and return what you have. // // better explanation: rbp = a lower bound on descendant nodes // precedence level, so we can guarantee the precenence-hill-climbing // property (small precedence at the top) in the resulting parse tree. // qtree* qtreefactory::expression(int rbp) {

qtree* cnode = _token;

if (eof()) {
    return cnode;
}

advance(0);

if (cnode) {

    // munch_right() of atoms returns this/itself, in which case: _accumulated_parsed_qtree = t; is the result.
    _accumulated_parsed_qtree = cnode->munch_right(cnode);
}

while(!eof() && rbp < _token->_lbp ) {
    assert(_token);

    cnode = _token;

    advance(0);

    // if cnode->munch_left() returns this/itself, then the net effect is: _accumulated_parsed_qtree = cnode;
    _accumulated_parsed_qtree = cnode->munch_left(cnode, _accumulated_parsed_qtree);

}

return _accumulated_parsed_qtree;

}

/////////////////////////////////////////////////////////////// ///////////////////////////////////////////////////////////////

For reference, giving the theory, I attach a pdf of the Pratt original. (file: Vaughan.Pratt.TDOP.pdf)

  • Jason

Jason E. Aten [email protected]