This repository contains Warblre, a Coq mechanization of ECMAScript regexes. ECMAScript is the specification followed by JavaScript implementations, and a mechanization of its regex semantics makes it possible to reason formally about these regexes from a proof assistant.
The mechanization has the following properties:
opam install . --deps-only
This will allow you to step through the Coq code, extract the OCaml code and compile it.nvm install 21.7.2
as well as some JavaScript dependencies:
npm install # Install packages used by our JavaScript code
npm install -g webpack-cli # Install webpack-cli, which is used to pack the code in monolithic JavaScript files
opam install coq-serapi
Alternatively, a nix flake installing all the dependencies is provided:
nix develop
npm install
dune exec example
will run an example of matching a string with a regex (source).dune exec fuzzer
will build and run the fuzzer to compare the extracted engine against Irregexp (Node.js's regex engine).dune build examples/coq_proof
will build everything so that you can step through examples/coq-proof/Example.v, which demonstrates how Warblre can be used to reason about JavaScript regexes. Alternatively, if you installed Alectryon, you can open the generated webpage in your web browser.The repository is structured as follows:
.
├── mechanization
│ ├── spec
│ │ └── base
│ ├── props
│ ├── tactics
│ └── utils
├── engines
│ ├── common
│ ├── ocaml
│ └── js
├── examples
│ ├── browser_playground
│ ├── cmd_playground
│ ├── coq_proof
│ └── ocaml_example
└── tests
├── tests
├── fuzzer
└── test262
common
; the other directories contain code specific to one particular language.The follow subsections further detail some of these.
The mechanization
directory contains the Coq code mechanizing the subset of the ECMAScript specification which describes regexes and their semantics.
It is based on the 14th edition from June 2023, available here.
Regexes are described in chapter 22.2.
The way regexes work can be described using the following pipeline:
A regex is first parsed; it is then checked for early errors, and rejected if any are found; it is then compiled into a matcher; it is finally called with a concrete input string and start position, and yield a match if one is found.
The mechanization covers the last three phases; parsing is not included.
The mechanization depends on external types and parameters (for instance for unicode character manipulation functions).
This is encoded with a functor, whose parameter is described in mechanization/spec/API.v
.
Files are organized as follows:
r*
by the empty regex when r
is a strictly nullable regex is a correct optimization.The engines
directory contains the code needed to turn the extracted code into two fully featured engines, one in OCaml and one in JavaScript.
For instance, this is where implementations for the abstract types and Unicode operations of the functor discussed above are provided.
Some of this code is common to the both engines, for instance a pretty-printer for regexes, and is stored in the common
subdirectory.
The ocaml
subdirectory contains code specific to the OCaml engine.
This includes functions to manipulate unicode characters, using the library uucp
.
The js
subdirectory contains code specific to the JavaScript engine.
This also includes functions to manipulate unicode characters, as well as some functions to work with array exotic objects (see ArrayExotic.ml
) or a parser for regexes, based on regexpp.
Our publication:
De Santo, Noé, Aurèle Barrière, and Clément Pit-Claudel. "A Coq Mechanization of JavaScript Regular Expression Semantics." [DOI; Preprint]
Additional documentation on this repository (doc
directory):
Differences.md
;Implementation.md
;Test262.md
;vFlag.md
.