tangledown

Self-contained, one-step literate markdown.

Stars
14

Tangledown: One-Step Literate Markdown

Brian Beckman

Friday, 23 Sep 2022

v0.0.8

OVERVIEW

WRITING MATTERS

Leslie Lamport, Turing-Award Winner, 2013, said, approximately:

Writing is Nature's Way of showing you how sloppy your thinking is. Coding is Nature's Way of showing you how sloppy your writing is. Testing is Nature's Way of showing you how sloppy your coding is.

If you can't write, you can't think. If you're not writing, you only think you're thinking.

In here, we will show you how to combine thinking, writing, coding, and testing in a natural way. Your code will be the central character in a narrative, a story crafted to help your readers understand both what you're doing and how you're doing it. Your code will be tested because you (and your readers) can run it, right here and now, inside this Jupytext [sic] notebook. Your story and your code will never get out of sync because you will be working with both of them all the time.

NARRATIVE ORDER

Narrative order is the natural order for a story, but it's not the natural order for interpreters and compilers, even for Jupyter kernels. Tangledown lets you write in narrative order, then, later, tangle the code out into executable order, where the definitions of the parts precede the story. That executable order is backwards and inside-out from the reader's point of view! TangleUp lets you maintain your code by rebuilding your story in narrative order from sources, in executable order, that you may have changed on disk (TangleUp is abandoned. Turned out to be too difficult).

Without something like this, you're condemned to explaining how your code works before you can say much or anything about what your code is doing. Indulge us in a little theory of writing, will you?

CREATIVE WRITING 101

You're writing a murder mystery.

METHOD 1: Start with a data sheet: all the characters and their relationships. Francis stands to inherit, but Evelyn has a life-insurance policy on Victor. Bobbie is strong enough to swing an axe. Alice has poisonous plants in her garden. Charlie has a gun collection. Danielle is a chef and owns sharp knives. Lay out their schedules and whereabouts for several weeks. Finally, write down the murder, the solution, and all the loose ends your romantic detective might try.

METHOD 2: There's a murder. Your romantic detective asks "Who knew whom? Who benefitted? Who could have been at the scene of the crime? Who had opportunity? What was the murder weapon? Who could have done it?" Your detevtive pursues all the characters and their happenstances. In a final triumph of deductive logic, the detective identifies the killer despite compelling and ultimately distracting evidence to the contrary.

If your objective is to engage the audience, to motivate them to unravel the mystery and learn the twists and turns along the way, which is the better method? If your objective is to have them spend several hours wading through reference material, trying to guess where this is all going, which is the better method? If your objective is to organize your own thoughts prior to weaving the narrative, how do you start?

Now, you're writing about some software.

METHOD 1: Present all the functions and interfaces, cross dependencies, asynchronous versus synchronous, global and local state variables, possibilities for side effects. Finally, present unit tests and a main program.

METHOD 2: Explain the program's rationale and concept of operation, the solution it delivers, its modes and methods. Present the unit tests and main program that fulfill all that. Finally, present all the functions, interfaces, and procedures, all the bits and bobs that could affect and effect the solution.

If your objective is to engage your audience, to have them understand the software deeply, as if they wrote it themselves, which is the better method? If your objective is to have them spend unbounded time wading through reference material trying to guess what you mean to do, which is the better method?

SOFTWARE AS DOCUMENTATION

Phaedrus says:

I give good, long, descriptive names to functions and parameters to make my code readable. I use Doxygen and Sphinx to automate document production. I'm a professional!

And Socrates says:

That's nice, but you only document the pieces, and say nothing about how the pieces fit together. It's like giving me a jigsaw puzzle without the box top. It's almost sadistic.

You condemn me to reverse-engineering your software: to running it in a debugger or to tracing logs and printouts.

LITERATE PROGRAMMING

Literate Programming is the best known way to save your audience the work of reverse engineering your code, of giving them the box top with the jigsaw puzzle.

Who is your audience?

  • yourself, first, down the line, trying to remember "why the heck did I do that?!?"

  • other programmers, eventually, when they take over maintaining and extending your code

IMPERATIVES

First, write a paper about your code, explaining, at least to yourself, what you want to do and how you plan to do it. Flesh out your actual code inside the paper. RUN your code inside the paper, capturing printouts and charts and diagrams and what not, so others, including, first, your future self, can see the code at work. Iterate the process, rewriting prose and code as you invent and simplify, in a loop.

THAT'S JUST JUPYTER, RIGHT?

Ok, that's just ordinary Jupyter-notebook practice, right? Code inside your documentation, right? Doxygen and Sphinx inside-out, right?

Notebooks solve the inside-out problem, but ordinary programming is both inside out and upside-down from literate programming. Literate Programming solves the upside-down problem.

With ordinary notebook practice, you write everything in executable order, because a Jupyter notebook is just an interface to an execution kernel. Notebooks inherit the sequential constraints of the underlying intrepreter and compiler. Executable order usually forces us to define all details before using them. With Literate Programming, you write in narrative order, much more understandable to humans.

Executable order is usually the reverse of narrative order. Humans want to understrand the big picture first, then the details. They want to see the box-top of the jigsaw puzzle before looking at all the pieces. Executable order is upside-down to the human's point-of-view.

We've all had the experience of reading code and notebooks backwards so we don't get overwhelmed by details before understanding the big picture. That observation leads us to another imperative.

Write about your code in narrative order. Don't be tyrranized by your programming language into defining everything before you can talk about anything. Use tools to rearrange your code in executable order.

Donald Knuth invented Literate Programming so that he could both write about MetaFont and TeX and implement them in the same place. These are two of the most important computer programs ever written. Their longevity and quality testify to the viability of Literate Programming.

TANGLEDOWN IS HERE, INSIDE THIS README.md

Tangledown is the tool that rearranges code from any Markdown file into executable order. This here document, README.md, the one you're reading right now, is the Literate Program for Tangledown.

Because our documentation language is Markdown, the language of this document is Literate Markdown. This here README.md, which you're reading right now, contains all the source for the Literate-Markdown tool, tangledown.py, with all its documentation, all presented in narrative order, like a story.

tangledown.py tangles code out of any Markdown document, not just this here README.md that you're reading right now. The verb "tangle" is traditional in Literate Programming. You might think it should be "untangle," because a Literate Markdown document is all tangled up from executable order. But Knuth prefers the human's point of view. The Markdown document contains the code in the correct, narrative order. To address untangling or detangling, we have TangleUp.

You can also run Tangledown inside a Jupyter notebook, specifically one that is linked to this here document, README.md, the one you're reading right now. See this section for more.

We should mention that Tangledown is similar to org-babel in Emacs (or spacemacs for VIM users). Those are polished, best-of-breed Literate-Programming tools for the Down direction. You have to learn some Emacs to use them, and that's an barrier for many people. Markdown is good enough for Github, and thus for most of us right now.

TANGLEUP INTRO

Tangledown, as a distribution format, is a complete solution to Literate Programming. You get a single Markdown file and all the tested source for your project is included. Run Tangledown and the project is sprayed out on disk, ready for running, further testing, and deploying.

As a development format, it's not quite enough. With only Tangledown, when you modify the source tree in executable order, your narrative is instantly out of date. We can't have that. See Section TangleUp for more.

TANGLING UP EXISTING CODE

TangleUp can generate unique names, as GUIDs, say, for new source files and blocks. You should be able to TangleUp an existing source tree into a new, fresh, non-pre-existing Markdown file and, then round-trip TangleUp and TangleDown.

TANGLEDOWN DESIGN AND IMPLEMENTATION

Let's do Tangledown, first, and TangleUp later.

OH MY! JUPYTEXT

Jupytext [sic] automatically syncs a Markdown file with a Jupyter notebook. Read about it here. It works well in JupyterLab. Read about that here. Specifically, it lets you open this here Markdown file, README.md, that you're reading right now, as a Jupyter notebook, and you can evaluate some cells in the notebook.

Here's how I installed everything on an Apple Silicon (M1) Mac Book Pro, with Python 3.9:

pip install jupyter
pip install jupyterlab
pip install jupytext

Here is how I run it:

jupyter lab

or

PYTHONPATH=".:$HOME/Documents/GitHub/tangledow:$HOME/Library/Jupyter/kernels/tangledown_kernel" jupyter lab ~

when I want the Tangledown Kernel, and I almost always want the Tangledown kernel.

In JupyterLab

  • open README.md
  • View->Activate Command Palette
  • check Pair Notebook with Markdown
  • right-click README.md and say Open With -> Jupytext Notebook
  • edit one of the two, README.md or README.ipynb ...

Jupytext will update the other,

IMPORTANT: To see the updates in the notebook when you modify the Markdown, you must File->Reload Notebook from Disk, and to see updates in the Markdown when you modify the notebook, you must File->Reload Markdown File from Disk. Jupytext forces you to reload changed files manually. I'll apologize here, on behalf of the Jupytext team.

If you're reading or modifying README.ipynb, or if you Open With -> Jupytext Notebook on README.md (my preference), you may see tiny, unrendered Markdown cells above and below all your tagged nowebs and tangles. DON'T DELETE THE TINY CELLS. Renderers of Markdown simply ignore the tags, but Jupytext makes tiny, invisible cells out of them!

Unless you're running the optional, new Tangledown Kernel, don't RUN cells with embedded block tags in Jupyter, you'll just get syntax errors from Python.

LET ME TELL YOU A STORY

This here README.md, the one you're reading right now, should tell the story of Tangledown. We'll use Tangledown to create Tangledown. That's just like bootstrapping a compiler. We'll use Tangledown to tangle Tangledown itself out of this here document named README.md that you're reading right now.

The first part of the story is that I just started writing the story. The plan and outline was in my head (I didn't explicitly do Method 1). Then I filled in the code, moved everything around when I needed to, and kept rewriting until it all worked the way I wanted it to work. Actually, I'm still doing this now. Tangledown and TangleUp are living stories!

DISCLAIMER

This is a useful toy, but it has zero error handling. We currently talk only about the happy path. I try to be rude ("DUDE!") every place where I sense trouble, but I'm only sure I haven't been rude enough. Read this with a sense of humor. You're in on the story with me, and it's supposed to be fun!

I also didn't try it on Windows, but I did try it on WSL, the Windows Subsystem for Linux. Works great on WSL!

HOW TO RUN TANGLEDOWN

One way: run python3 tangledown.py REAMDE.md or just python tangledown.py at the command line. That command should overwrite tangledown.py. The code for tangledown.py is inside this here README.md that you're reading right now. The name of the file to overwrite, namely tangledown.py, is embedded inside this here README.md itself, in the file attribute of a <tangle ...> tag. Read about tangle tags below!

If you said python3 tangledown.py MY-FOO.md, then you would tangle code out of MY-FOO.md. You'll do that once you start writing your own code in Tangledown. You will love it! We have some big examples that we'll write about elsewhere. Those examples include embedded code and microcode for exotic hardware, all written in Python!

Tangledown is both a script and a module. You can run Tangledown in a Jupytext cell after importing some stuff from the module. The next cell illustrates the typical bootstrapping joke of tangling Tangledown itself out of this here README.md that you're reading right now, after this Markdown file has been linked to a Jupytext notebook.

from tangledown import get_lines, accumulate_lines, tangle_all
tangle_all(*accumulate_lines(*get_lines("README.md")))

After you have tangled at least once, as above, and if you switch the notebook kernel to the new, optional Tangledown Kernel, you can evaluate the source code for the whole program in the later cell i'm linking right here. How Cool is That?

You'll also need to re-tangle and restart the Tangledown Kernel when you add new nowebs to your files. Sorry about that. This is still just a toy.

Because Tangledown is a Python module, you can also run Tangledown from inside a standalone Python program, say in PyCharm or VS Code or whatever; hello_world_tangler.py in this repository is an example.

Once again, Jupytext lets you RUN code from a Markdown document in a JupyterLab notebook with just the ordinary Python3 kernel. If you open hello_world.md as a Jupytext notebook in JupyterLab then you can run Tangledown in Jupyter cells. Right-click on the name hello_world.md in the JupyterLab GUI and choose

Open With ... $\longrightarrow$ Jupytext Notebook

Then run cells! This is close to the high bar set by org-babel!

HOW IT WORKS: Markdown Ignores Mysterious Tags

How can we rearrange code cells in a notebook or Markdown file from human-understandable, narrative order to executable order?

Exploit the fact that most Markdown renderers, like Jupytext's, Github's, and PyCharm's, ignore HTML / XML tags (that is, stuff inside angle brackets) that they don't recognize. Let's enclose blocks of real, live code with noweb tags, like this:

<noweb name="my_little_tests">

class TestSomething ():
    def test_something (self):
        assert (3 == 2+1)

</noweb>

TAG CELLS CAN BE RAW OR MARKDOWN, NOT CODE

The markdown above renders as follows. You can see the noweb one-liner raw cells above and below the code in Jupytext. If they were Markdown cells, they'd be tiny and invisible. That's 100% OK, and may be more to your liking! Try changing the cells from RAW (press "R") to Markdown (press "M") and back, then closing them (Shift-Enter) and opening them (Enter). Don't mark the tag cells CODE (don't press "Y"). Tangledown won't work because Jupytext will surround them with triple-backticks.

class TestSomething ():
    def test_something (self):
        assert (42 == 6 * 7)

What are the <noweb ...> and </noweb> tags? We explain them immediately below.

THREE TAGS: noweb, block, and tangle

noweb tags

Markdown ignores <noweb ...> and </noweb> tags, but tangledown.py doesn't. tangledown.py sucks up the contents of the noweb tags and sticks them into a dictionary for later lookup when processing block tags.

CONTENTS OF A TAG

The contents of a noweb tag are between the opening <noweb ...> and closing </noweb> fenceposts. Markdown renders code contents with syntax coloring and indentation. That's why we want code cells to be CODE cells and not RAW cells.

The term contents is ordinary jargon from HTML, XML, SGML, etc., and applies to any opening <foo ...> and closing </foo> pair.

ATTRIBUTES OF A TAG

The Tangledown dictionary key for contents of a noweb tag is the string value of the name attribute. For example, in <noweb name="foo">, name is an attribute, and its string value is "foo".

Noweb names must be unique in a document. TangleUp ensures that when it writes a new Markdown file from existing source, or you may do it by hand.

NOTE: the name attribute of a noweb opener must be on the same line, like this:

<noweb name="foo">

Ditto for our other attributes, as in the following. Don't separate attributes with commas!

<noweb name="foo" language="python">

This single-line rule is a limitation of the regular expressions that detect noweb tags. Remeber, Tangledown is a toy, a useful toy, but it's limited.

FENCEPOST CELLS

You can create the fencepost cells, <noweb ...> and </noweb>, either in the plain-text Markdown file, or you can create them in the synchronized Jupytext notebook.

If you create fencepost cells in plain-text Markdown as opposed to the Jupytext notebook, leave a blank line after the opening <noweb ...> and a blank line before the closing </noweb>. If you don't, the Markdown renderer won't color and indent the contents. Tangledown will still work, but the Markdown renderer will format your code like text without syntax coloring and indentatino.

If you write fencepost cells in Markdown cells in the notebook or as blank-surrounded tags in the plain-text Markdown, the fenceposts appear as tiny, invisible Markdown cells because the renderer treats them as empty markdown cells. That's the fundamental operating principle of Tangledown: Markdown ignores tags it doesn't recognize! DON'T DELETE THE TINY, INVISIBLE CELLS, but you can open (Enter) and close (Shift-Enter) them.

If you create noweb and tangle tags in the notebook and you want them visible, mark them RAW by pressing "R" with the cell highlighted but not editable. Don't mark them CODE (don't press "Y"). Tangledown will break because Jupytext will surround them with triple-backticks.

block tags

Later, in the second pass, Tangledown blows the contents of noweb tags back out wherever it sees block tags with matching name attributes. That's how you can define code anywhere in your document and use it in any other place, later or earlier, more than once if you like.

block tags can and should appear in the contents of noweb tags and in the in the contets of tangle tags, too. That's how you structure your narrative!

Tangledown discards the contents of block tags. Only the name attribute of a block tag matters.

WRITE IN ANY-OLD-ORDER YOU LIKE

You don't have to write the noweb before you write a matching block tag. You can refer to a noweb tag before it exists in time and space, more than once if you like. You can define things and name things and use things in any order that makes your thinking and your story more clear. This is literature, after all.

tangle tags

A tangle tag sprays its block-expanded contents to a file on disk. What file? The file named in the file attribute of the tangle tag. Expanding contents of a tangle tag means replacing every block tag with the contents of its matching noweb tag, recursively, until everything bottoms out in valid Python.

The same rules about blank lines hold for tangle tags as they do for noweb tags: if you want Markdown to render the contents like code, surround the contents with blank lines or mark the tag cells RAW. The following Markdown

<tangle file="/dev/null">

<block name="my_little_tests"></block>

if __name__ == '__main__':
    TestSomething().test_something()

</tangle>

renders like this

import unittest

<block name="my_little_tests"></block>

if __name__ == '__main__':
    TestSomething().test_something()

See the tiny, invisible Markdown cells above and below the code? Play around with opening and closing them with Enter and Shift-Enter, respectively, and marking them RAW (Press "R") and Markdown ("M"). Don't mark them CODE ("Y").

You can evaluate the cell with the new, optional Tangledown Kernel. If you evaluate the code cell in the Python Kernel, you'll get a syntax error because the block tag is not valid Python. The syntax error is harmless to Tangledown.

This code tangles to the file /dev/null. That's a nifty trick for temporary tangle blocks. You can talk about them, validate them by executing their cells in the Tangledown Kernel, and throw them away.

TangleUp knows where Tangledown puts all the blocks and tangles. That's how, when you change code on disk, TangleUp can put it all back in the single file of Literate Markdown.

HUMAN! READ THE block TAGS!

Markdown renders block tags verbatim inside nowebs or tangles. This is good for humans, who will think

AHA!, this block refers to some code in a noweb tag somewhere else in this Markdown document. I can read all the details of that code later, when it will make more sense. I can look at the picture on the box before the pieces of the jigsaw puzzle.

Thank you, kindly, author! Without you, I'd be awash in details. I'd get tired and cranky before understanding the big picture!

See, I'll prove it to you. Below is the code for all of tangledown.py itself. You can understand this without understanding the implementations of the sub-pieces, just getting a rought idea of what they do from the names of the block tags. READ THE NAMES IN THE BLOCK TAGS. Later, if you want to, you can read all the details in the noweb tags named by the block tags.

TANGLE ALL

If you're running the new, optional Tangledown Kernel, you can evaluate this next cell and run Tangledown on Tangledown itself, right here in a Jupyter notebook. How Cool is That?

<block name="types"></block>
<block name="openers"></block>
<block name="tracer"></block>
<block name="oh-no-there-are-two-ways"></block>
<block name="accumulate-contents"></block>
<block name="accumulate-lines"></block>
<block name="thereIsABlockTag"></block>
<block name="eatBlockTag"></block>
<block name="expandBlocks"></block>
<block name="expand-tangles"></block>

def tangle_all(tracer: Tracer, nowebs: Nowebs, tangles: Tangles) -> None:
    for filename, liness in tangles.items ():
        Path(filename).parents[0].mkdir(parents=True, exist_ok=True)
        contents: str = expand_tangles(tracer, liness, nowebs)
        with open (filename, 'w') as outfile:
            print(f"WRITING FILE: {filename}")
            outfile.write (contents)
    tracer.dump()

if __name__ == "__main__":
    fn, lines = get_lines(get_aFile())
    # test_re_matching(lines)
    tracer, nowebs, tangles = accumulate_lines(fn, lines)
    tangle_all(tracer, nowebs, tangles)

The whole program is in the function tangle_all. We get hybrid vigor by testing __name__ against "__main__": tangledown.py is both a script and module.

All we do in tangle_all is loop over all the line lists in the tangles (for filename, liness in tangles.items()) and expand them to replace blocks with nowebs. Yes, "liness" has an extra "s". Remember Smeagol Pronounce it like "my preciouses, my lineses!"

The code will create the subdirectories needed. For example, if you tangle to file foo/bar/baz/qux.py, the code creates the directory chain ./foo/br/baz/ if it doesn't exist.

TYPES

Let us now explain the implementation. The first block in the tangle above is types. What is the noweb of types? It's here.

A Line is a string, Python base type str. Lines is the type of a list of lines. Liness is the type of a list of list of lines, in a pluralizing shorthand borrowed from Haskell practice. Pronounce liness the way Smeagol would do: "my preciouses, my lineses!"

A noweb name is a string, and a tangle file name is a string. A line number is an int, a Python base type.

Nowebs are dictionaries from noweb names to lines.

Tangles are dictionaries from file names to Liness --- lists of lists of lines. Tangledown accumulates output for tangle files mentioned more than once. If you tangle to qux.py in one place and then also tangle to qux.py somewhere else, the second tangle won't overwrite the first, but append to it. That's why tangles are lists of lists of lines, one list of lines for each mentioning of a tangle file. Read more about that in expand-tangles.

from typing import List, Dict, Tuple, Match

NowebName = str
FileName = str
TangleFileName = FileName
LineNumber = int
Line = str
Lines = List[Line]
Liness = List[Lines]
LinesTuple = Tuple[LineNumber, Lines]

Nowebs = Dict[NowebName, Lines]
Tangles = Dict[TangleFileName, Liness]

We'll implement all the noweb blocks, like accumulate_contents and eatBlockTag, later. You can read about them, or not, after you've gotten more of the big picture.

DEBUGGING AND REFACTORING

The Tangledown Kernel doesn't support the Jupytext debugger, yet. Sorry about that. Tangle the code out to disk and debug it with pudb or whatever, then tangle it back up into your Literate Markdown file via TangleUp.

Tangledown is still a toy. Ditto refactoring. PyCharm is great for that, but you'll have to do it on tangled files and detangle (paste) back into the Markdown.

EXPAND TANGLES

We separated out the inner loop over Liness [sic] into another function, expand_tangles, so that the Tangledown Kernel can import it and apply it to block tags. tangle_all calls expand_tangles; expand_tangles calls expand_blocks. Read about expand_blocks here.

from graphviz import Digraph
g = Digraph(graph_attr={'size': '8,5'}, node_attr={'fontname': 'courier'})
g.attr(rankdir='LR')
g.edge('tangle_all', 'expand_tangles')
g.edge('expand_tangles', 'expand_blocks')
g
def expand_tangles(tracer: Tracer, liness: Liness, nowebs: Nowebs) -> str:
    contents: Lines = []
    for lines in liness:
        while there_is_a_block_tag (lines):
            lines = expand_blocks (tracer, nowebs, lines)
        contents += lines
    return ''.join(contents)

Tangledown Tangles Itself?

Tangledown has two kinds of regular expressions (regexes) for matching tags in a Markdown file:

  • regexes for noweb and tangle tags that appear on lines by themselves, left-justified

  • regexes that match <block ...> tags that may be indented, and match their closing </block> tags, which may appear on the same line as <block ...> or on lines by themselves.

Both kinds of regex are safe: they do not match themselves. That means it's safe to run tangledown.py on this here READMD.md, which contains tangled source for tangledown.py.

The two regexes in noweb left_justified_regexes match noweb andtangle tags that appear on lines by themselves, left-justified.

They also wont match noweb and tangle tags that are indented. That lets us talk about noweb and tangle tags without processing them: just put the examples you're talking about in an indented Markdown code cell instead of in a triple-backticked Markdown code cell.

The names in the attributes of noweb and tangle tags must start with a letter, and they can contain letters, numbers, hyphens, underscores, whitespace, and dots.

The names of noweb tags must be globally unique within the Markdown file. Multiple tangle tags may refer to the same output file, in which cases, Tangledown appends the contents of the second and subsequent tangle tags to a list of list of lines, to a Liness.

LEFT-JUSTIIED REGEXES

There is a .* at the end to catch attributes beyon name. A bit of future-proofing.

noweb_start_re = re.compile (r'^<noweb name="(\w[\w\s\-.]*)".*>$')
noweb_end_re = re.compile (r'^</noweb>$')

tangle_start_re = re.compile (r'^<tangle file="(.+/\\[^/]+|.+)".*>$')
tangle_end_re = re.compile (r'^</tangle>$')

ANYWHERE REGEXES

The regexes in this noweb, anywhere_regexes, match block tags that may be indented, preserving indentation. The block_end_re regex also preserves indentation. Indentation is critical for Python, Haskell, and other languages.

I converted the 'o' in 'block' to a harmless regex group [o] so that block_end_re won't match itself. That makes it safe to run this code on this here document itself.

block_start_re = re.compile (r'^(\s*)<block name="(\w[\w\s\-.]*)">')
block_end_re = re.compile (r'^(\s)*</bl[o]ck>')

Test the Regular Expressions

OPENERS

The code in noweb openers has two block tags that refer to the nowebs of the regexes defined above, namely left_justified_regexes and anywhere_regexes. After Tangledown substitutes the contents of the nowebs for the blocks, the code becomes valid Python and you can call test_re_matching in the Tangledown Kernel or at the command line. When you call it, it proves that we can recognize all the various kinds of tags. We leave the regexes themselves as global pseudo-constants so that they're both easy to test and to use in the body of the code (Demeter weeps because of globals).

The code in hello_world.ipynb (after you have Paired a Notebook with the Markdown File hello_world.md) runs this test as its last act to check that tangledown.py was correctly tangled from this here README.md. That code works in the ordinary Python kernel and in the Tangledown Kernel.

Notice the special treatment for block ends, which are usually on the same lines as their block opener tags, but not necessarily so. That lets you put (useless) contents in block tags.

import re
import sys
from pathlib import Path

<block name="getting a file and its lines"></block>
<block name="left_justified_regexes"></block>
<block name="anywhere_regexes"></block>

def test_re_matching(fp: Path, lines: Lines) -> None:
    for line in lines:
        noweb_start_match = noweb_start_re.match (line)
        tangle_start_match = tangle_start_re.match (line)
        block_start_match = block_start_re.match (line)

        noweb_end_match = noweb_end_re.match (line)
        tangle_end_match = tangle_end_re.match (line)
        block_end_match = block_end_re.match (line)

        if (noweb_start_match):
            print ('NOWEB: ', noweb_start_match.group (0))
            print ('name of the block: ', noweb_start_match.group (1))
        elif (noweb_end_match):
            print ('NOWEB END: ', noweb_end_match.group (0))
        elif (tangle_start_match):
            print ('TANGLE: ', tangle_start_match.group (0))
            print ('name of the file: ', tangle_start_match.group (1))
        elif (tangle_end_match):
            print ('TANGLE END: ', tangle_end_match.group (0))
        elif (block_start_match):
            print ('BLOCK: ', block_start_match.group (0))
            print ('name of the block: ', block_start_match.group (1))
            if (block_end_match):
                print ('BLOCK END SAME LINE: ', block_end_match.group (0))
            else:
                print ('BLOCK NO END')
        elif (block_end_match):
            print ('BLOCK END ANOTHER LINE: ', block_end_match.group (0))
        else:
            pass

TANGLEDOWN: Two Passes

Tangledown passes once over the file to collect contents of noweb and tangle tags, and again over the tangle tags to expand block tags. In the second pass, Tangledown substitutes noweb contents for corresponding block tags until there are no more block tags, creating valid Python.

First Pass: Saving Noweb and Tangle Blocks

In the first pass over the file, we'll just save the contents of noweb and tangle into dictionaries, without expanding nested block tags.

GET A FILE NAME

tangledown.py is both a script and a module. As a script, you run it from the command line, so it gets its input file name from command-line arguments. As a module, called from another Python program, you probably want to give the file as an argument to a function, specifically, to get_lines.

Let's write two functions,

  • get_aFile, which parses command-line arguments and produces a file name; the default file name is README.md

  • get_lines, which

    • gets lines, without processing noweb, tangle, or block tags, from its argument, aFilename

    • replaces #raw and #endraw fenceposts with blank lines

    • writes out the full file path to a secret place where the Tangledown Kernel can pick it up

get_aFile can parse command-line arguments that come from either python on the command line or from a Jupitext notebook, which has a few kinds of command-line arguments we must ignore, namely command-line arguments that end in .py or in .json.

GET LINES

This method for getting a file name from the argument list will eat all options. It works for the Tangledown Kernel and for tangling down from a script or a notebook, but it's not future-proofed. Tangledown is still a toy.

print({f'len(sys.argv)': len(sys.argv), f'sys.argv': sys.argv})
def get_aFile() -> str:
    """Get a file name from the command-line arguments
    or 'README.md' as a default."""
    <block name="print-sys-argv"></block>
    aFile = 'README.md'  # default
    if len(sys.argv) > 1:
        file_names = [p for p in sys.argv
                        if (p[0] != '-')  # option
                           and (p[-3:] != '.py')
                           and (p[-5:] != '.json')]
        if file_names:
            aFile = sys.argv[1]
    return aFile

raw_line_re: re = re.compile(r'<!-- #(end)?raw -->')
def get_lines(fn: FileName) -> Lines:
    """Get lines from a file named fn. Replace
    'raw' fenceposts with blank lines. Write full path to
    a secret place for the Tangledown kernel to pick it up.
    Return tuple of file path (for TangleUp's Tracer) and
    lines."""
    <block name="save-afile-path-for-kernel"></block>
    xpath = save_aFile_path_for_kernel(fn)
    with open(fn) as f:
        in_lines: Lines = f.readlines ()
        out_lines: Lines = []
        for in_line in in_lines:
            out_lines.append(
                in_line if not raw_line_re.match(in_line) else "\n")
        return xpath, out_lines

NORMALIZE FILE PATH

We must normalize file names so that, for example, "foo.txt" and "./foo.txt" indicate the same file and so that ~/ denotes the home directory on Mac and Linux. I didn't test this on Windows.

def anchor_is_tilde(path_str: str) -> bool:
    result = (path_str[0:2] == "~/") and (Path(path_str).anchor == '')
    return result

def normalize_file_path(tangle_file_attribute: str) -> Path:
    result: Path = Path(tangle_file_attribute)
    if (anchor_is_tilde(tangle_file_attribute)):
        result = (Path.home() / tangle_file_attribute[2:])
    return result.absolute()

SAVE A FILE PATH FOR THE KERNEL

Returns its input file name after expanding its full path and saving the full path in a special place where the Tangledown Kernel can find it.

def save_aFile_path_for_kernel(fn: FileName) -> FileName:
    xpath: Path = Path.cwd() / Path(fn).name
    victim_file_name = str(xpath.absolute())
    safepath: Path = Path.home() / '.tangledown/current_victim_file.txt'
    Path(safepath).parents[0].mkdir(parents=True, exist_ok=True)
    print(f"SAVING {victim_file_name} in secret place {str(safepath)}")
    with open(safepath, "w") as t:
        t.write(victim_file_name)
    return xpath

OH NO! THERE ARE TWO WAYS

Turns out there are two ways to write code blocks in Markdown:

  1. indented by four spaces, useful for quoted Markdown and quoted triple-backtick blocks

  2. surrounded by triple backticks and not indented.

Tangledown must handle both ways.

We use the trick of a harmless regex group --- regex stuff inside square brackets --- around one of the backticks in the regex that recognizes triple backticks. This regex is safe to run on itself. See triple_backtick_re in the code immediately below.

The function first_non_blank_line_is_triple_backtick, in noweb oh-no-there-are-two-ways recognizes code blocks bracketed by triple backticks. The contents of this noweb tag is triple-bacticked, itself. Kind of a funny self-toast joke, no?

Remember the use-mention dichotomy from Philosophy class? No problem if you don't.

When we're talking about noweb and tangle tags, but don't want to process them, we indent the tags and the code blocks. Tangledown won't process indented noweb and tangle tags because the regexes in noweb left_justified_regexes won't match them.

We can also talk about triple-backticked blocks by indenting them. Tangledown won't mess with indented triple-backticked blocks, because the regex needs them left-justified. Markdown also wont get confused, so we can quote whole markdown files by indenting them. Yes, your Literate Markdown can also, recursively, tangle out more Markdown files. How cool is that? Will the recursive jokes never end?

TangleUp has a heuristic for placing language and id information on triple-backtick fence openers. Our function will retrieve those if present.

We see, below, why the code tracks line numbers. We might do all this in some super-bitchin', sophomoric list comprehension, but this is more obvious-at-a-glance. That's a good thing.

FIRST NON-BLANK LINE IS TRIPLE BACKTICK

Match lines with left-justified triple-backtick. Pass through lines with indented triple-backtick.

We must trace raw fenceposts, but not copy them to

triple_backtick_re = re.compile (r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)')
blank_line_re      = re.compile (r'^\s*$')

def first_non_blank_line_is_triple_backtick (
        i: LineNumber, lines: Lines) -> Match[Line]:
    while (blank_line_re.match (lines[i])):
        i = i + 1
    yes = triple_backtick_re.match (lines[i])
    language = "python"  # default
    id_ = None           # default
    if yes:
        language = yes.groups()[1] or language
        id_ = yes.groups()[3]  ## can be 'None'
    return i, yes, language, id_

ACCUMULATE CONTENTS

Tangledown is a funny little compiler. It converts Literate Markdown to Python or other languages (Tangledown supports Clojure and Markdown, too). We could go nuts and write it in highfalutin' style, and then it would be much bigger, more elaborate, and easier to explain to a Haskell programmer. It might also be less of a toy. However, we want this toy Tangledown for now to be:

  • very short

  • independent of rich libraries like beautiful soup and parser combinators

  • completely obvious to anyone

We'll just use iteration and array indices, but in a tasteful way so our functional friends won't puke. This is Python, after all, not Haskell! We can just get it done, with grace, panache, and aplomb.

The function accumulate_contents accumulates the contents of left-justified noweb or tangle tags. The function starts at line i of the input, then figures out whether a tag's first non-blank line is triple backtick, in which case it won't snip four spaces from the beginning of every line, and finally keeps going until it sees the closing fencepost, </noweb> or </tangle>. It returns a tuple of the line index after the closing fencepost, and the contents, possibly de-dented. The function manipulates line numbers to skip over triple backticks.

def accumulate_contents (
        lines: Lines, i: LineNumber, end_re: re) -> LinesTuple:
    r"""Harvest contents of a noweb or tangle tag. The start
    taglet was consumed by caller. Consume the end taglet."""
    i, yes, language, id_ = first_non_blank_line_is_triple_backtick(i, lines)
    snip = 0 if yes else 4
    contents_lines: Lines = []
    for j in range (i, len(lines)):
        if (end_re.match(lines[j])):
            return j + 1, language, id_, contents_lines  # the only return
        if not triple_backtick_re.match (lines[j]):
            contents_lines.append (lines[j][snip:])

NEW ACCUMULATE LINES

The old accumulate_lines has reached the end of its life. It ignores raw cells, except for some hacks for raw noweb and tangle tags. The new_accumulate_lines must parse several kinds of line-sequences explicitly. Let's be careful not to call line-sequences blocks so we don't confuse line-sequences with block tags.

  1. Regular

ACCUMULATE LINES

The function accumulate_lines calls accumulate_contents to suck up the contents of all the left-justified noweb tags and tangle tags out of a file, but doesn't expand any block tags that it finds. It just builds up dictionaries, noweb_blocks and tangle_files, keyed by name or file attributes it finds inside noweb or tangle tags.

<block name="normalize-file-path"></block>
raw_start_re = re.compile("<!-- #raw -->")
raw_end_re = re.compile("<!-- #endraw -->")
from pprint import pprint
def accumulate_lines(fp: Path, lines: Lines) -> Tuple[Tracer, Nowebs, Tangles]:
    tracer = Tracer()
    tracer.fp = fp
    nowebs: Nowebs = {}
    tangles: Tangles = {}
    i = 0
    while i < len(lines):
        noweb_start_match = noweb_start_re.match (lines[i])
        tangle_start_match = tangle_start_re.match (lines[i])
        if noweb_start_match:
            <block name="acclines_handle_noweb"></block>
        elif tangle_start_match:
            <block name="acclines_handle_tangle"></block>
        elif raw_start_re.match (lines[i]):
            <block name="acclines_handle_raw"></block>
        else:
            <block name="acclines_handle_markdown"></block>
    if in_between:  # Close out final markdown.
        tracer._end_betweens(i)
    return tracer, nowebs, tangles

ACCUMULATE LINES: HANDLE RAW

pass

ACCUMULATE LINES: HANDLE MARKDOWN

in_between = True
tracer.add_markdown(i, lines[i])
i += 1

ACCUMULATE LINES: HANDLE NOWEB

in_between = False
key: NowebName = noweb_start_match.group(1)
(i, language, id_, nowebs[key]) = \
    accumulate_contents(lines, i + 1, noweb_end_re)
tracer.add_noweb(i, language, id_, key, nowebs[key])

ACCUMULATE LINES: HANDLE TANGLE

in_between = False
key: TangleFileName = \
    str(normalize_file_path(tangle_start_match.group(1)))
if not (key in tangles):
    tangles[key]: Liness = []
(i, language, id_, things) = accumulate_contents(lines, i + 1, tangle_end_re)
tangles[key] += [things]
tracer.add_tangle(i, language, id_, key, tangles[key])

DUDE!

There is a lot that can go wrong. We can have all kinds of mal-formed contents:

  • too many or not enough triple-backtick lines
  • indentation errors
  • broken tags
  • mismatched fenceposts
  • dangling tags
  • misspelled names
  • syntax errors
  • infinite loops (cycles, hangs)
  • much, much more

We'll get to error handling someday, maybe. Tangledown is just a little toy at the moment, but I thought it interesting to write about. If it's ever distributed to hostile users, then we will handle all the bad cases. But not now. Let's get the happy case right.

Second Pass: Expanding Blocks

Iterate over all the noweb or tangle tag contents and expand the block tags we find in there, recursively. That means keep going until there are no more block tags, because nowebss are allowed (encouraged!) to refer to other nowebs via block tags. If there are cycles, this will hang.

DUDE! HANG?

We're doing the happy cases first, and will get to cycle detection someday, maybe.

THERE IS A BLOCK TAG

First, we need to detect that some list of lines contains a block tag, left-justified or not. That means we must keep running the expander on that list.

def there_is_a_block_tag (lines: Lines) -> bool:
    for line in lines:
        block_start_match = block_start_re.match (line)
        if (block_start_match):
            return True
    return False

EAT A BLOCK TAG

If there is a block tag, we must eat the tag and its meaningless contents:

def eat_block_tag (i: LineNumber, lines: Lines) -> LineNumber:
    for j in range (i, len(lines)):
        end_match = block_end_re.match (lines[j])
        # DUDE! Check leading whitespace against block_start_re
        if (end_match):
            return j + 1
        else:  # DUDE!
            pass

EXPAND BLOCKS

The following function does one round of block expansion. The caller must test whether any block tags remain, and keep running the expander until there are no more block tags. Our functional fu grandmaster might be appalled, but sometimes it's just easier to iterate than to recurse.

def expand_blocks (tracer: Tracer, nowebs: Nowebs, lines: Lines,
                   language: str = "python") -> Lines:
    out_lines = []
    block_key: NowebName = ""
    for i in range (len (lines)):
        block_start_match = block_start_re.match (lines[i])
        if (block_start_match):
            leading_whitespace: str = block_start_match.group (1)
            block_key: NowebName = block_start_match.group (2)
            block_lines: Lines = nowebs [block_key]  # DUDE!
            i: LineNumber = eat_block_tag (i, lines)
            for block_line in block_lines:
                out_lines.append (leading_whitespace + block_line)
        else:
            out_lines.append (lines[i])
    return out_lines

TRACER

For TangleUp, we'll need to trace the entire operation of Tangledown, first and second passes. TangleUp reverses Tangledown, so we will want a best-effort reconstruction of the original Markdown file.

Our first approach will be a sequential list of dictionaries with all the needed information.

from dataclasses import dataclass, field
from typing import Union  ## TODO
@dataclass
class Tracer:
    trace: List[Dict] = field(default_factory=list)
    line_no = 0
    current_betweens: Lines = field(default_factory=list)
    fp: Path = None
    # First Pass
    <block name="tracer.add_markdown"></block>
    <block name="tracer.add_raw"></block>
    <block name="tracer._end_betweens"></block>
    <block name="tracer.add_noweb"></block>
    <block name="tracer.add_tangle"></block>
    <block name="tracer.dump"></block>
    # Second Pass
    <block name="tracer.add_expanded_noweb"></block>
    <block name="tracer.add_expanded_tangle"><block>

TRACER.ADD_RAW

def add_raw(self, i, between: Line):
    self.line_no += 1
    self.current_betweens.append((self.line_no, between))

TRACER.ADD_MARKDOWN

def add_markdown(self, i, between: Line):
    self.line_no += 1
    self.current_betweens.append((self.line_no, between))

TRACER._END_BETWEENS

def _end_betweens(self, i):
    if self.current_betweens:
        self.trace.append({"ending_line_number": self.line_no, "i": i,
                           "language": "markdown", "kind": 'between',
                           "text": self.current_betweens})
    self.current_betweens = []

TRACER.ADD_NOWEB

def add_noweb(self, i, language, id_, key, noweb_lines):
    self._end_betweens(i)
    self.line_no = i
    self.trace.append({"ending_line_number": self.line_no, "i": i,
                       "language": language, "id_": id_,
                       "kind": 'noweb', key: noweb_lines})

TRACER.ADD_TANGLE

def add_tangle(self, i, language, id_, key, tangle_liness):
    self._end_betweens(i)
    self.line_no = i
    self.trace.append({"ending_line_number": self.line_no, "i": i,
                       "language": language, "id_": id_,
                       "kind": 'tangle', key: tangle_liness})

TRACER.ADD_EXPANDED_NOWEB

def add_expandedn_noweb(self, i, language, id_, key, noweb_lines):
    self._end_betweens(i)
    self.line_no = i
    self.trace.append({"ending_line_number": self.line_no, "i": i,
                       "language": language, "id_": id_,
                       "kind": 'expanded_noweb', key: noweb_lines})

TRACER.ADD_EXPANDED_TANGLE

def add_expanded_tangle(self, i, language, id_, key, tangle_liness):
    self._end_betweens(i)
    self.line_no = i
    self.trace.append({"ending_line_number": self.line_no, "i": i,
                       "language": language, "id_": id_,
                       "kind": 'expanded_tangle', key: tangle_liness})

TRACER.DUMP

def dump(self):
    pr = self.fp.parent
    fn = self.fp.name
    fn2 = fn.translate(str.maketrans('.', '_'))
    # Store the trace in the dir where the input md file is:
    vr = f'tangledown_trace_{fn2}'
    np = pr / (vr + ".py")
    with open(np, "w") as fs:
        print(f'sequential_structure = (', file=fs)
        pprint(self.trace, stream=fs)
        print(')', file=fs)

TANGLE IT, ALREADY!

Ok, you saw at the top that the code in this here Markdown document, README.md, when run as a script, will read in all the lines in ... this here Markdown document, README.md. Bootstrapping!

But you have to run something first. For that, I tangled the code manually just once and provide tangledown.py in the repository. The chicken definitely comes before the egg.

But if you have the chicken (tangledown.py), you can import it as a module and execute the following cell, a copy of the one at the top. That should overwrite tangledown.py with the contents of this notebook or Markdown file. So our little bootstrapping technique will forever update the Tangledown compiler if you change it in this here README.md that you're reading right now!

from tangledown import get_lines, accumulate_lines, tangle_all
tangle_all(*accumulate_lines(*get_lines("README.md")))

TODO

  • IN-PROGRESS: more examples, specifically, a test-generator in Clojure in subdirectory examples/asr.
  • IN-PROGRESS: TangleUp
  • NOT-STARTED: Have the Tangledown Kernel, when evaluating tangle-able cells, write them out one at a time. Without this feature, the only way to write out files is to tangle the entire notebook. Possibly do these as cell magics.
  • NOT-STARTED: Research cell magics for noweb and tangle cells.
  • NOT-STARTED: error handling (big job)
  • NOT-STARTED: type annotations for the kernel
  • DONE: convert relative file paths to absolute
  • DONE: modern Pythonic Type Annotation (PEP 484)
  • DONE: use pathlib to compare tangle file names
  • DONE: somehow get the Tangledown Kernel to tangle everything automatically when it's restarted
  • DONE: Support multiple instances of the Tangledown Kernel. Because it reads files with fixed names in the home directory, it has no way of processing multiple Tangledown notebooks.
  • DONE: find out whether pickle is a better alternative to json for dumping dictionaries for the kernel
  • DONE: Jupytext kernel for tangledown so we can run noweb and block tags that have block tags in them.

DUDE!

Some people write "TODO" in their code so they can find all the spots where they thought they might have trouble but didn't have time to write the error-handling (prophylactic) code at the time. I like to write "DUDE" because it sounds like both "TODO" but is more RUDE (also sounds like "DUDE") and funny. This story is supposed to be amusing.

KNOWN BUGS

I must apologize once again, but this is just a toy at this point! Recall the DISCLAIMER. The following are stackranked from highest to lowest priority.

  1. FIXED: writing to "tangledown.py" and to "./tangledown.py" clobbers the file rather than appending. Use pathlib to compare filenames rather than string comparison.
  2. FIXED: tangling to files in the home directory via ~ does not work. We know one dirty way to fix it, but proper practice with pathlib is a better answer.

TANGLEUP DESIGN AND IMPLEMENTATION

TANGLEUP TENETS

  1. Keep source tree and Literate Markdown consistent.

NON-REAL-TIME

We'll start with a non-real-time solution. You'll manually run tangleup to put modified source back into the Markdown. Later, we'll do something that can track changes on disk and update the Markdown in real time.

When you modify your source tree, tangleup puts the modified code back into the Markdown file with reminders to detangle and to write. There are two cases:

  1. You modified some source that corresponds to an existing noweb block in the Markdown.

  2. You added some source that doesn't yet correspond to a noweb block in the Markdown.

To assist TangleUp, Tangledown records unique names for existing noweb blocks along with the tangled source. Tangledown also records robust locations for existing blocks. Robust means that the boundary locations are flexible: starting and ending line and character positions in a source file are not enough because changing an early one invalidates all later ones.

NO PRE-EXISTING MARKDOWN

We don't need the trace file for this case.

Enumerate all the files in a directory tree. Pair each file name with a short, unique name for the nowebs. TODO: ignore files and directories listed in the .gitignore.

%pip install gitignore-parser

TANGLEUP FILES LIST

<block name="tangleup imports"></block>
def files_list(dir_name: str) -> List[str]:
    dir_path = Path(dir_name)
    files_result = []
    nyms_result = []
    file_count = 0
    <block name="unique-names"></block>
    <block name="ignore files in .gitignore"></block>
    <block name="recurse a dir"></block>
    find_first_gitignore()
    recurse_a_dir(dir_path)
    assert file_count == len(nyms_collision_check)
    return list(zip(files_result, nyms_result))

RECURSE A DIR

The only complexity, here, is ignoring .git and files in .gitignore

def recurse_a_dir(dir_path: Path) -> None:
    for p in dir_path.glob('*'):
        q = p.absolute()
        qs = str(q)
        try:  # don't skip files in dirs above .gitignore
            ok = not in_gitignore(qs)
        except ValueError as e: # one absolute and one relative?
            ok = True
        if p.name == '.git':
            ok = False
        if not ok:
            pprint(f'... IGNORING file or dir {p}')
        if ok and q.is_file():
            nonlocal file_count  # Assignment requires 'nonlocal'
            file_count += 1
            nyms_result.append(gsnym(q))  # 'nonlocal' not required
            files_result.append(qs)       # because not ass'gt but mutation
        elif ok and p.is_dir:
            recurse_a_dir(p)

UNIQUE NAMES

Correct for collisions, which will be really rare, so there is a negligible effect on speed.

nyms_collision_check = set()

def gsnym(p: Path) -> str:
    """Generate a short, unique name for a path."""
    nym = gsnym_candidate(p)
    while nym in nyms_collision_check:
        nym = gsnym_candidate(p)
    nyms_collision_check.add(nym)
    return nym


def gsnym_candidate(p: Path) -> str:
    """Generate a candidate short, unique name for a path."""
    return p.stem + '_' + uuid.uuid4().hex[:6].upper()

IGNORE FILES IN GITIGNORE

Find the first .gitignore in a directory tree. Parse it to produce a function that tests whether a file must be ignored by TangleUp.

in_gitignore = lambda _: False

def find_first_gitignore() -> Path:
    p = dir_path
    for p in dir_path.rglob('*'):
        if p.name == '.gitignore':
            in_gitignore = parse_gitignore(str(p.absolute()))
            break;
    return p

TANGLEUP IMPORTS

from pathlib import Path
from typing import List
import uuid
from gitignore_parser import parse_gitignore
from pprint import pprint

WRITE NOWEB TO LINES

Now write the contents of each Python or Clojure file to a noweb block with its ginned-up name and a corresponding tangle block. Parenthetically, this just screams for the Writer monad, but we'll just do it by hand in an obvious, kindergarten way.files_result

WARNING: The explicit '\n' newlines probably won't work on Windows.

from typing import Tuple
from pprint import pprint
<block name="wrap one as raw"></block>
<block name="wrap several with blank lines"></block>
<block name="wrap lines with triple backticks"></block>
<block name="indent four spaces"></block>
def write_noweb_to_lines(lines: List[str],
                         file_gsnym_pair: Tuple[str],
                         language: str) -> None:
    path = Path(file_gsnym_pair[0])
    wrap_n_blank(lines, [f'## {path.name}\n'])
    wrap_1_raw(lines, f'<noweb name="{file_gsnym_pair[1]}">\n')
    with open(file_gsnym_pair[0]) as f:
        try:
            inlines = f.readlines()
        except UnicodeDecodeError as e:
            pprint(f'... SKIPPING UNDECODABLE FILE {path}')
            return
        pprint(f'DETANGLING file {path}')
    bound = []  ## Really want the monadic bind, here.
    if language == "markdown":
        indent_4(bound, inlines)
    else:
        wrap_triple_backtick(bound, inlines, language)
    wrap_n_blank(lines, bound)
    wrap_1_raw(lines, '</noweb>\n')
    lines.append(BLANK_LINE)

WRAP ONE LINE AS RAW

BEGIN_RAW = '<!-- #raw -->\n'
END_RAW = '<!-- #endraw -->\n'
def wrap_1_raw(lines: List[str], s: str) -> None:
    lines.append(BEGIN_RAW)
    lines.append(s)
    lines.append(END_RAW)

WRAP SEVERAL LINES IN BLANK LINES

BLANK_LINE = '\n'
def wrap_n_blank(lines: List[str], ss: List[str]) -> None:
    lines.append(BLANK_LINE)
    for s in ss:
        lines.append(s)
    lines.append(BLANK_LINE)

WRAP LINES IN TRIPLE BACKTICKS

def wrap_triple_backtick(lines: List[str],
                         ss: List[str],
                         language: str) -> None:
    lines.append(f'```{language}\n')
    for s in ss:
        lines.append(s)
    lines.append(f'```\n')

INDENT ALL LINES FOUR SPACES

def indent_4(lines: List[str], ss: List[str]):
    for s in ss:
        lines.append('    ' + s)

WRITE TANGLE TO LINES

def write_tangle_to_lines(lines: List[str],
                          file_gsnym_pair: Tuple[str],
                          language: str) -> List[str]:
    wrap_1_raw(lines, f'<tangle file="{file_gsnym_pair[0]}">\n')
    bound = []
    wrap_triple_backtick(bound,
                         [f'<block name="{file_gsnym_pair[1]}"></block>\n'],
                         language)
    wrap_n_blank(lines, bound)
    wrap_1_raw(lines, f'</tangle>\n')

TANGLEUP OVERWRITE MARKDOWN

Test the whole magillah, the up direction. You may have to backpatch some 'language' names when you open the markdown, but 'language' only affects syntax coloring.

<block name="tangleup-files-list"></block>
<block name="tangleup-write-noweb-to-lines"></block>
<block name="tangleup-write-tangle-to-lines"></block>
def tangleup_overwrite_markdown(
        output_markdown_filename: str,
        input_directory: str,
        title: str = "Untitled") -> None:
    pprint(f'WRITING LITERATE MARKDOWN to file {output_markdown_filename}')
    file_gsnym_pairs = files_list(input_directory)
    lines: List[str] = [f'# {title}\n\n']
    for pair in file_gsnym_pairs:
        p = Path(pair[0])
        if p.suffix == '.clj':
            language = f'clojure id={uuid.uuid4()}'
        elif p.suffix == '.py':
            language = f'python id={uuid.uuid4()}'
        elif p.suffix == '.md':
            language = 'markdown'
        else:
            language = ''
        write_noweb_to_lines(lines, pair, language)
        write_tangle_to_lines(lines, pair, language)
    import json
    
    with open(output_markdown_filename, "w") as f:
        for line in lines:
            f.write(line)
    pass

YES PRE-EXISTING MARKDOWN

NO CHANGES ON DISK

If there are no changes to the tangled files on disk, then we must merely reassemble the nowebs, tangles, and block tags from the files on disk. On its first pass, Tangledown recorded the structure of nowebs and tangles and of the Markdown that surrounds them. When detangling a file:

  1. look for every tangle that mentions that file

YES CHANGES ON DISK

CHANGES TO EXISTING CONTENTS

NEW CONTENTS

DELETED CONTENTS

FIRST SHOT

PRO TIP: For the Tangldown Kernel, if your little scripts contain noweb tags, surround them with tangle to /dev/null, reload the kernel spec, restart the kernel, then you can run them in the notebook.

from pprint import pprint
from tangledown_trace_foobar_md import sequential_structure as cells
pprint(cells)
fn = "tanglup_foobar.md"
line_no = 0
for cell in cells:
    if cell["kind"] == "between":
        <block name="tangleup_write_between"></block>
    elif cell["kind"] == "noweb":
        <block name="tangleup_write_noweb"></block>
    elif cell["kind"] == "tangle":
        <block name="tangleup_write_tangle"></block>
    else:
        assert False, f"unknown kind: {cell['kind']}"
pass
pass
pass

UNIT TESTS

NO PRE-EXISTING MARKDOWN FILE

Run these at the console for now.

<block name="tangleup-overwrite-markdown"></block>
if __name__ == "__main__":
    tangleup_overwrite_markdown(
        "asr_tangleup_test.md",
        "./examples",
        title="This is a First Test of the Emergency Tangleup System")
<block name="tangleup-overwrite-markdown"></block>
if __name__ == "__main__":
    tangleup_overwrite_markdown(
        "tangleup-test.md",
        ".",
        title="This is a Second Test of the Emergency Tangleup System")

APPENDIX: Developer Notes

If you change the code in this README.md and you want to test it by running the cell in Section Tangle It, Already!, you usually must restart whatever Jupyter kernel you're running because Jupytext caches code. If things continue to not make sense, try restarting the notebook server. It rarely but occasionally produces incorrect answers for more obscure reasons.

APPENDIX: Tangledown Kernel

The Tangledown kernel is OPTIONAL, but nice. Everything I talked about so far works fine without it, but the Tangledown Kernel lets you evaluate Jupytext notebook cells that have block tags in them. For example, you can run Tangledown on Tangledown itself in this notebook just by evaluating the cell that contains all of Tangledown, including the source for the kernel, here.

The Tangledown Compiler writes the full path of the current Markdown file corresponding to the current notebook to fixed place in the home directory, and the Tangledown Kernel reads gets all the nowebs from there.

If you run more than one instance of the Tangledown Kernel at one time on your machine, you must RETANGLE THE FILE AND RESTART THE TANGLEDOWN KERNEL WHEN YOU SWITCH NOTEBOOKS because the name of the current file is a fixed singleton. The Tangledown Kernel has no way to dynamically know what file you're working with. Sorry about that!

Installing the Tangledown Kernel

After you tangle the code out of this here README.md at least once, you will have two new files

  • ./tangledown_kernel/tangledown_kernel.py
  • ./tangledown_kernel/kernel.json

You must inform Jupyter about your new kernel. The following works for me on the Mac. It might be different on your machine:

jupyter kernelspec install --user tangledown_kernel

Running the Tangledown Kernel

You must put the source for the Tangledown Kernel somewhere Python can find it before you start Jupyter Lab. One way is to modify the PYTHONPATH environment variable. The following works for me on the Mac:

PYTHONPATH=".:/Users/brian/Library/Jupyter/kernels/tangledown_kernel" jupyter lab

Once the kernel is installed, there are multiple ways to run it in Jupyter Lab. When you first open a notebook, you get a menu. The default is the regular Python 3 kernel, and it works fine, but you won't be able to run cells that have block tags in them. If you choose the Tangledown Kernel, you can run such cells.

If you modify the kernel:

  1. re-tangle the kernel source, say by running the cell in this section
  2. re-install the kernel by running the little bash script above
  3. restart the kernel inside the notebook

Most of the time, you don't have to restart Jupyter Lab itself, but sometimes after a really bad bug, you might have to.

Source for the Tangledown Kernel

Adapted from these official docs.

The kernel calls expand_tangles after reformatting the lines a little. We learned about the reformatting by experiment. We explain expand_tangles here in the section about Tangledown itself. The rest of this is boilerplate from the official kernel documentation. There is no point, by the way, in running the cell below in any kernel. It's meant for the Jupyterlab startup engine, only. You just need to tangle it out and install it, as above.

NOTE: You will get errors if you run this cell in the notebook.

TODO: plumb a Tracer through here?

<block name="kernel-imports">
class TangledownKernel(IPythonKernel):
    <block name="kernel-instance-variables">
    async def do_execute(self, code, silent, store_history=True, user_expressions=None,
                   allow_stdin=False):
        if not silent:
            cleaned_lines = [line + '\n' for line in code.split('\n')]
            # HERE'S THE BEEF!
            expanded_code = expand_tangles(None, [cleaned_lines], self.nowebs)
            reply_content = await super().do_execute(
                expanded_code, silent, store_history, user_expressions)
            stream_content = {
                'name': 'stdout',
                'text': reply_content,
            }
            self.send_response(self.iopub_socket, 'stream', stream_content)
        return {'status': 'ok',
                # The base class increments the execution count
                'execution_count': self.execution_count,
                'payload': [],
                'user_expressions': {},
               }
if __name__ == '__main__':
    from ipykernel.kernelapp import IPKernelApp
    IPKernelApp.launch_instance(kernel_class=TangledownKernel)
from ipykernel.ipkernel import IPythonKernel
from pprint import pprint
import sys  # for version_info
from pathlib import Path
from tangledown import \
        accumulate_lines, \
        get_lines, \
        expand_tangles

KERNEL INSTANCE VARIABLES

These get indented on expansion because the block tag is indented. You could do it the other way: indent the code here and DON'T indent the block tag, but that would be ugly, wouldn't it?

Notice this kernel runs Tangledown on the full file path that's stored in current_victim_file.txt. That file path got written to that special place when you tangled the file the first time. This may explain why you must tangle the file once and then restart the kernel whenever you switch notebooks that are running the Tangledown Kernel.

current_victim_filepath = ""
with open(Path.home() / '.tangledown/current_victim_file.txt') as v:
    fp = v.read()
tracer_, nowebs, tangles_ = accumulate_lines(*get_lines(fp))
implementation = 'Tangledown'
implementation_version = '1.0'
language = 'no-op'
language_version = '0.1'
language_info = {  # for syntax coloring
    "name": "python",
    "version": sys.version.split()[0],
    "mimetype": "text/x-python",
    "codemirror_mode": {"name": "ipython", "version": sys.version_info[0]},
    "pygments_lexer": "ipython%d" % 3,
    "nbconvert_exporter": "python",
    "file_extension": ".py",
}
banner = "Tangledown kernel - expanding 'block' tags"

Kernel JSON Installation Helper

{"argv":["python","-m","tangledown_kernel", "-f", "{connection_file}"],
 "display_name":"Tangledown"
}

APPENDIX: Experimental Playground