rast-jit-vm

Simplistic complier~virtual-machine that transforms AST into a Rust function, with basic type checking

MIT License

Stars
9

rast-jit-vm

rast-jit-vm is a simplistic, proof-of-concept~ish compiler / virtual machine that transforms syntax tree into a Rust function, type-checking it on the way:

use rast_jit_vm::prelude::*;

fn main() {
    let mul2 = Program {
        input: Type::Int,
        output: Type::Int,
        body: Node::Mul {
            lhs: Box::new(Node::Var("input")),
            rhs: Box::new(Node::Const(Value::Int(2))),
        },
    };
    
    let mul2 = vm::compile::<_, i32>(mul2);
    
    println!("{}", mul2(15)); 
}

Instead of transforming AST into bytecode, rast-jit-vm uses a bit lesser known technique that's oriented around thunks - basically, instead of doing:

fn eval(node: Node) -> Value {
    match node {
        Node::Add { lhs, rhs } => lhs + rhs,
        /* ... */
    }
}

... what rast-jit-vm does is:

type Thunk = Box<dyn Fn() -> Value>;

fn compile(node: Node) -> (Type, Thunk) {
    match node {
        Node::Add { lhs, rhs } => {
            let (lhs_ty, lhs) = compile(lhs);
            let (rhs_ty, rhs) = compile(rhs);
 
            match (lhs_ty, rhs_ty) {
                (Type::Int, Type::Int) => {
                    let ty = Type::Int;
                    let thunk = Box::new(|| lhs() + rhs());
                    
                    (ty, thunk)
                }
                
                (lhs_ty, rhs_th) => {
                    panic!("unsupported op: {:?} + {:?}", lhs_ty, rhs_ty);
                }
            }
        },
        
        /* ... */
    }
}

This allows not only to type-check all of the code up-front, but also to perform otherwise impossible optimizations such as preallocating variables into stack-slots:

struct CompilationContext {
    stack: Vec<Type>,
    vars: BTreeMap<String, usize>, // var-name => stack-slot
}

struct RuntimeContext {
    stack: Vec<Value>,
}

type Thunk = Box<dyn Fn(&mut RuntimeContext) -> Value>;

fn compile(ctxt: &mut CompilationContext, node: Node) -> (Type, Thunk) {
    match node {
        /// `let name = value`
        Node::Let { name, value } => {
            let (ty, value) = compile(ctxt, value);
            let id = ctxt.stack.len();

            ctxt.stack.push(ty);
            ctxt.vars.insert(name, ty);

            let thunk = Box::new(move |ctxt: &mut RuntimeContext| {
                // Voil, no HashMap / BTreeMap needed at runtime!
                ctxt.stack[id] = value(ctxt);

                Value::Unit
            });
            
            (Type::Unit, thunk)
        },
        
        /// `name`
        Node::Var(name) => {
            let id = *vars.get(name).unwrap_or_else(|| {
                panic!("variable not defined: {}", name);
            });
            
            let thunk = Box::new(move |ctxt: &mut RuntimeContext| {
                // Voil, no HashMap / BTreeMap needed at runtime!
                ctxt.stack[id].clone()
            });
            
            (ty, thunk)
        },
        
        /* ... */
    }
}

Thanks to this approach, an interpreter / a virtual machine written in this way can be faster than a typical one, because the "compiled" program doesn't have to perform any var-name => var-value map-based lookups anymore - to read or write a variable, it simply accesses it by its index!

Running locally

$ git clone https://github.com/Patryk27/rast-jit-vm
$ cd rast-jit-vm

# Uses the compilation technique as described above:
$ cargo run --bin mandelbrot-compile

# Uses just a regular interpreter-based evaluation approach:
$ cargo run --bin mandelbrot-eval

# Note that printing Mandelbrot fractal at this scale is pretty fast, so if you
# want to see a difference in the times, compare:
$ cargo run --release --bin mandelbrot-compile-100k
$ cargo run --release --bin mandelbrot-eval-100k 

Thoughts

Performance

Using the Mandelbrot fractal as a benchmark, tuned to 100k iterations for extra synthetic-benchmarking-juice, I've been able to obtain the following times:

  • Rust: 1.5s
  • rast-jit-vm, compiled: 50s
  • Python: 115s
  • rast-jit-vm, evaluated: 130s

(technology: time to render the fractal; less is better)

cd benches && rustc -O mandelbrot.rs && ./mandelbrot cargo run --release --bin mandelbrot-compile-100k cd benches && python3 ./mandelbrot.py cargo run --release --bin mandelbrot-eval-100k

Missing features

  • pretty error handling (it's just panic!() all over the place),
  • both the compiler and the evaluator don't understand scoping - it's possible
    to declare a variable inside a while block and use it outside
    (to do something like while false { let x = true; } println!("{}", x);),
  • rast-jit-vm's AST is Turing-incomplete (rough proof: all programs have
    statically allocated stack and there's no runtime malloc/free-like facilities,
    making recursion impossible; though arguably that's somewhere in-between a
    missing feature and a design decision).

License

Copyright (c) 2022, Patryk Wychowaniec [email protected]. Licensed under the MIT license.