This controller implements a context-free grammar parser with Earley's algorithm on top of a lexer which uses derivatives of regular expressions.
It is intended for use by next-generation Guidance grammars. See plan.md for how it works.
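To make the "derivatives of regular expressions" idea concrete, here is a minimal sketch of derivative-based matching in Python. It is not the lexer used by this project; the class and function names (`Empty`, `Eps`, `Byte`, `Cat`, `Alt`, `Star`, `deriv`, `matches`, `literal`) are hypothetical and chosen only for illustration. The derivative of a regex with respect to a byte is the regex that matches whatever may follow that byte, so matching is repeated differentiation followed by a nullability check.

```python
from dataclasses import dataclass


# Tiny regex AST (hypothetical names, for illustration only).
@dataclass(frozen=True)
class Empty:  # matches nothing (the "dead" state)
    pass


@dataclass(frozen=True)
class Eps:  # matches only the empty string
    pass


@dataclass(frozen=True)
class Byte:
    b: int


@dataclass(frozen=True)
class Cat:
    left: object
    right: object


@dataclass(frozen=True)
class Alt:
    left: object
    right: object


@dataclass(frozen=True)
class Star:
    inner: object


def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Byte)):
        return False
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    raise TypeError(r)


def cat(a, b):
    """Concatenation with basic simplification."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Cat(a, b)


def alt(a, b):
    """Alternation with basic simplification."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty) or a == b:
        return a
    return Alt(a, b)


def deriv(r, b):
    """Derivative of r with respect to byte b."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Byte):
        return Eps() if r.b == b else Empty()
    if isinstance(r, Alt):
        return alt(deriv(r.left, b), deriv(r.right, b))
    if isinstance(r, Star):
        return cat(deriv(r.inner, b), r)
    if isinstance(r, Cat):
        first = cat(deriv(r.left, b), r.right)
        # If the left side can be empty, b may also start the right side.
        return alt(first, deriv(r.right, b)) if nullable(r.left) else first
    raise TypeError(r)


def matches(r, data):
    """Differentiate by each byte in turn, then check nullability."""
    for b in data:
        r = deriv(r, b)
    return nullable(r)


def literal(s):
    """Regex matching exactly the byte string s."""
    r = Eps()
    for b in reversed(s):
        r = cat(Byte(b), r)
    return r
```

For example, `matches(cat(Star(literal(b"ab")), Byte(ord("c"))), b"ababc")` checks the input against `(ab)*c`. A derivative equal to `Empty()` means no continuation can match, which is the property a lexer can use to reject a byte early.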
This is now available in the main branch of Guidance.
Guidance PR: https://github.com/guidance-ai/guidance/pull/951
Grammars are normally JSON-serialized. The following libraries produce llguidance grammars:
./scripts/install-deps.sh
./scripts/test-guidance.sh
This builds the Python bindings for the library and runs the tests (which mostly live in the Guidance repo - it will clone it).
The Rust crate is called llguidance_parser.
For usage see the README there.
to_regex_vec() in lexerspec.rs - non-contextual keywords
{when_sampled:[EOS],ff:[]} as slice when EOS ends gen()
See https://github.com/microsoft/llguidance/issues/2
g = select(["a", "abq", "c"]) + optional("bQ")
check_grammar(g, ["", "a‧b‧q‧≺EOS≻"]) # fails 'q' is forced
check_grammar(g, ["", "a‧b‧Q"]) # doesn't match at all
See https://github.com/microsoft/llguidance/issues/1
.forced_byte() method in derivre
.forced_byte() impl in llguidance
In the toktrie walk, if we encounter a forced byte, we enter forced mode, in which we simply chase all forced bytes. The first token found on this path is put on a list. None of these tokens are added to the allow set.
Then, after the token trie walk, for every token on this list we re-create the forced byte string, tokenize it, chop excess tokens, add the first token of the tokenization to the allow set, and record the remaining tokens (if any) as a conditional splice.
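The two steps above can be sketched in Python as follows. This is a toy model under several assumptions, not the planned implementation: `ToyGrammar` is a hypothetical stand-in for the parser that accepts a fixed set of byte strings, `tokenize` is a greedy longest-match tokenizer rather than a real one, and `chop` uses a deliberately simple rule (drop the last of several tokens if the grammar can continue past the forced string). The sketch also assumes that the walk returns to normal mode if the forced bytes run out before any token is found.

```python
def build_trie(vocab):
    """Byte trie over the vocabulary; a node may carry a token id."""
    root = {"tok": None, "kids": {}}
    for tok_id, bs in vocab.items():
        node = root
        for b in bs:
            node = node["kids"].setdefault(b, {"tok": None, "kids": {}})
        node["tok"] = tok_id
    return root


class ToyGrammar:
    """Hypothetical parser stand-in: accepts exactly the given byte strings."""

    def __init__(self, words):
        self.words = set(words)

    def allowed(self, prefix):
        n = len(prefix)
        return {w[n] for w in self.words if len(w) > n and w.startswith(prefix)}

    def forced_byte(self, prefix):
        nxt = self.allowed(prefix)
        # Forced only if there is a single continuation and we cannot stop here.
        if len(nxt) == 1 and prefix not in self.words:
            return next(iter(nxt))
        return None


def tokenize(vocab, data):
    """Greedy longest-match tokenizer (an assumption; real tokenizers differ)."""
    out, i = [], 0
    while i < len(data):
        fits = [t for t, bs in vocab.items() if bs and data.startswith(bs, i)]
        best = max(fits, key=lambda t: len(vocab[t]))
        out.append(best)
        i += len(vocab[best])
    return out


def chop(grammar, forced, toks):
    """Assumed rule: the last token may still merge with later bytes, so drop it."""
    if len(toks) > 1 and grammar.allowed(forced):
        return toks[:-1]
    return toks


def compute_mask(vocab, grammar):
    allow, pending, splices = set(), [], {}

    def walk(node, prefix):
        fb = grammar.forced_byte(prefix)
        forced = fb is not None
        for b in ({fb} if forced else grammar.allowed(prefix)):
            child = node["kids"].get(b)
            if child is None:
                continue
            path = prefix + bytes([b])
            if child["tok"] is not None:
                if forced:
                    pending.append(path)  # first token on the forced path
                    continue  # not added to the allow set, no deeper walk
                allow.add(child["tok"])
            walk(child, path)

    walk(build_trie(vocab), b"")

    for path in pending:
        forced = path
        # Re-create the forced byte string by chasing forced bytes to the end.
        while (b := grammar.forced_byte(forced)) is not None:
            forced += bytes([b])
        toks = chop(grammar, forced, tokenize(vocab, forced))
        allow.add(toks[0])
        if len(toks) > 1:
            splices[toks[0]] = toks[1:]  # conditional splice after toks[0]
    return allow, splices
```

With a vocabulary containing `a`, `ab`, `abc`, and `d`, and a grammar that forces `abcd`, a plain trie walk would allow `a`, `ab`, and `abc`. In this sketch only `abc` is allowed, with `d` recorded as its conditional splice.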
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.