tbsp - tree-based source-processing language

tbsp is an awk-like language that operates on tree-sitter syntax trees. to motivate the need for such a program, we could begin by writing a markdown-to-html converter using tbsp and tree-sitter-md [0]. we need some markdown to begin with:

# 1 heading

content of first paragraph

## 1.1 heading

content of nested paragraph

for future reference, this markdown is parsed like so by tree-sitter-md (visualization generated by tree-viz [1]):

document
|  section
|  |  atx_heading
|  |  |  atx_h1_marker "#"
|  |  |  heading_content inline "1 heading"
|  |  paragraph
|  |  |  inline "content of first paragraph"
|  |  section
|  |  |  atx_heading
|  |  |  |  atx_h2_marker "##"
|  |  |  |  heading_content inline "1.1 heading"
|  |  |  paragraph
|  |  |  |  inline "content of nested paragraph"

onto the converter itself. every tbsp program is written as a collection of stanzas. typically, we start with a stanza like so:

BEGIN {
    int depth = 0;

    print("<html>\n");
    print("<body>\n");
}

the stanza begins with a "pattern", in this case "BEGIN", and is followed by a block of code. this block is executed right at the beginning, before traversing the parse tree. in this stanza, we set a "depth" variable to keep track of the nesting of markdown headers, and begin our html document by printing the "<html>" and "<body>" tags.

we can follow this stanza with an "END" stanza, which is executed after the traversal:

END {
    print("</body>\n");
    print("</html>\n");
}

in this stanza, we close off the tags we opened at the start of the document. we can move on to the interesting bits of the conversion now:

enter section {
    depth += 1;
}
leave section {
    depth -= 1;
}

the above stanzas begin with "enter" and "leave" clauses, followed by the name of a tree-sitter node kind: "section". the "section" identifier is visible in the tree visualization above; it encompasses a markdown section, and is created for every markdown header. the trace below shows how tbsp executes the above stanzas:

document                                 ...  depth = 0 
|  section <-------- enter section (1)   ...  depth = 1 
|  |  atx_heading
|  |  |  inline
|  |  paragraph
|  |  |  inline
|  |  section <----- enter section (2)   ...  depth = 2 
|  |  |  atx_heading
|  |  |  | inline
|  |  |  paragraph
|  |  |  | inline
|  |  | <----------- leave section (2)   ...  depth = 1 
|  | <-------------- leave section (1)   ...  depth = 0 
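
the same enter/leave ordering can be sketched outside of tbsp. the rust snippet below is only an illustration, not the tbsp implementation: it walks a hand-built copy of the example tree depth-first and adjusts "depth" the way the two section stanzas do. the "Node" type and the "walk", "example_tree" and "trace_sections" functions are hypothetical names made up for this sketch.

```rust
// a toy syntax-tree node; real tree-sitter nodes carry much more information.
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

fn node(kind: &'static str, children: Vec<Node>) -> Node {
    Node { kind, children }
}

// depth-first walk: "enter" fires before a node's children are visited,
// "leave" fires after all of them have been visited.
fn walk(n: &Node, depth: &mut i32, trace: &mut Vec<String>) {
    if n.kind == "section" {
        *depth += 1; // enter section { depth += 1; }
        trace.push(format!("enter section, depth = {}", depth));
    }
    for child in &n.children {
        walk(child, depth, trace);
    }
    if n.kind == "section" {
        *depth -= 1; // leave section { depth -= 1; }
        trace.push(format!("leave section, depth = {}", depth));
    }
}

// the example document, reduced to the node kinds shown in the trace above.
fn example_tree() -> Node {
    node("document", vec![node("section", vec![
        node("atx_heading", vec![node("inline", vec![])]),
        node("paragraph", vec![node("inline", vec![])]),
        node("section", vec![
            node("atx_heading", vec![node("inline", vec![])]),
            node("paragraph", vec![node("inline", vec![])]),
        ]),
    ])])
}

// run the walk with depth starting at 0, as in the BEGIN stanza.
fn trace_sections() -> Vec<String> {
    let mut depth = 0;
    let mut trace = Vec::new();
    walk(&example_tree(), &mut depth, &mut trace);
    trace
}

fn main() {
    for line in trace_sections() {
        println!("{line}");
    }
}
```

running it prints one line per section event, in the same order as the arrows in the trace above.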

the following stanzas should be self-explanatory now:

enter atx_heading {
    print("<h");
    print(depth);
    print(">");
}
leave atx_heading {
    print("</h");
    print(depth);
    print(">\n");
}

enter inline {
    print(text(node));
}

but an explanation is included nonetheless:

document                                 ...  depth = 0 
|  section <-------- enter section (1)   ...  depth = 1 
|  |  atx_heading <- enter atx_heading   ...  print "<h1>"
|  |  |  inline <--- enter inline        ...  print ..
|  |  | <----------- leave atx_heading   ...  print "</h1>"
|  |  paragraph
|  |  |  inline <--- enter inline        ...  print ..
|  |  section <----- enter section (2)   ...  depth = 2 
|  |  |  atx_heading enter atx_heading   ...  print "<h2>"
|  |  |  | inline <- enter inline        ...  print ..
|  |  |  | <-------- leave atx_heading   ...  print "</h2>"
|  |  |  paragraph
|  |  |  | inline <- enter inline        ...  print ..
|  |  | <----------- leave section (2)   ...  depth = 1 
|  | <-------------- leave section (1)   ...  depth = 0 
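
to see what the stanzas from this section produce when put together, here is a small rust model of a "program as a collection of stanzas". it is a sketch under assumptions, not how the tbsp evaluator is actually written: "Pattern", "Stanza", "State", "fire", "walk" and "run" are hypothetical names, "text(node)" is modelled as a plain string stored on the node, and "print" is assumed to add no newline of its own.

```rust
// hypothetical model of a tbsp program as a list of stanzas;
// none of these names are taken from the tbsp source code.
#[derive(PartialEq)]
enum Pattern {
    Begin,
    End,
    Enter(&'static str),
    Leave(&'static str),
}

struct Node {
    kind: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

// interpreter state: the program's `depth` variable plus the captured output.
struct State {
    depth: i32,
    out: String,
}

type Action = fn(&mut State, &Node);

struct Stanza {
    pattern: Pattern,
    action: Action,
}

fn stanza(pattern: Pattern, action: Action) -> Stanza {
    Stanza { pattern, action }
}

// the stanzas from this section, one entry per stanza.
fn program() -> Vec<Stanza> {
    use Pattern::*;
    vec![
        stanza(Begin, |s: &mut State, _: &Node| s.out.push_str("<html>\n<body>\n")),
        stanza(End, |s: &mut State, _: &Node| s.out.push_str("</body>\n</html>\n")),
        stanza(Enter("section"), |s: &mut State, _: &Node| s.depth += 1),
        stanza(Leave("section"), |s: &mut State, _: &Node| s.depth -= 1),
        stanza(Enter("atx_heading"), |s: &mut State, _: &Node| {
            s.out.push_str(&format!("<h{}>", s.depth))
        }),
        stanza(Leave("atx_heading"), |s: &mut State, _: &Node| {
            s.out.push_str(&format!("</h{}>\n", s.depth))
        }),
        stanza(Enter("inline"), |s: &mut State, n: &Node| s.out.push_str(n.text)),
    ]
}

// run every stanza whose pattern equals `pattern`, in program order.
fn fire(prog: &[Stanza], pattern: Pattern, s: &mut State, n: &Node) {
    for st in prog.iter().filter(|st| st.pattern == pattern) {
        (st.action)(&mut *s, n);
    }
}

// enter stanzas run before a node's children, leave stanzas after them.
fn walk(prog: &[Stanza], s: &mut State, n: &Node) {
    fire(prog, Pattern::Enter(n.kind), s, n);
    for child in &n.children {
        walk(prog, s, child);
    }
    fire(prog, Pattern::Leave(n.kind), s, n);
}

// BEGIN, then the traversal, then END; returns everything that was printed.
fn run(prog: &[Stanza], root: &Node) -> String {
    let mut s = State { depth: 0, out: String::new() };
    fire(prog, Pattern::Begin, &mut s, root);
    walk(prog, &mut s, root);
    fire(prog, Pattern::End, &mut s, root);
    s.out
}

fn leaf(kind: &'static str, text: &'static str) -> Node {
    Node { kind, text, children: vec![] }
}

fn branch(kind: &'static str, children: Vec<Node>) -> Node {
    Node { kind, text: "", children }
}

// hand-built copy of the parse tree shown at the start of this document.
fn example_tree() -> Node {
    branch("document", vec![branch("section", vec![
        branch("atx_heading", vec![leaf("atx_h1_marker", "#"), leaf("inline", "1 heading")]),
        branch("paragraph", vec![leaf("inline", "content of first paragraph")]),
        branch("section", vec![
            branch("atx_heading", vec![leaf("atx_h2_marker", "##"), leaf("inline", "1.1 heading")]),
            branch("paragraph", vec![leaf("inline", "content of nested paragraph")]),
        ]),
    ])])
}

fn main() {
    print!("{}", run(&program(), &example_tree()));
}
```

in this model the headings come out as "<h1>1 heading</h1>" and "<h2>1.1 heading</h2>", while paragraph text is emitted bare, since none of the stanzas shown so far match "paragraph".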

the examples directory contains a complete markdown-to-html converter, along with a few other motivating examples.


usage:

the tbsp evaluator is written in rust; use cargo to build and run it:

cargo build --release
./target/release/tbsp --help

tbsp requires three inputs:

  • a tbsp program, referred to as "program file"
  • a language
  • an input file or some input text at stdin

you can run the interpreter like so (this program prints an overview of a rust file):

$ ./target/release/tbsp \
      -f./examples/code-overview/overview.tbsp \
      -l rust \
      src/main.rs
module
   └╴struct Cli
   └╴trait Cli
      └╴fn program
      └╴fn language
      └╴fn file
   └╴fn try_consume_stdin
   └╴fn main

roadmap:

  • interpreter performance
    • introduce a hir with arena allocated blocks, expr
    • bytecode VM?
    • look into embedding high perf VMs, lua etc.
  • pattern matching
    • allow matching on tree-sitter queries
    • support captures
  • language features
    • arrays and loops
    • access node children
    • access node fields
    • repr for ranges
    • comments
    • regexes