Self-contained, one-step literate markdown.
Leslie Lamport, Turing-Award Winner, 2013, said, approximately:

> Writing is Nature's Way of showing you how sloppy your thinking is. Coding is Nature's Way of showing you how sloppy your writing is. Testing is Nature's Way of showing you how sloppy your coding is.
>
> If you can't write, you can't think. If you're not writing, you only think you're thinking.
In here, we will show you how to combine thinking, writing, coding, and testing in a natural way. Your code will be the central character in a narrative, a story crafted to help your readers understand both what you're doing and how you're doing it. Your code will be tested because you (and your readers) can run it, right here and now, inside this Jupytext [sic] notebook. Your story and your code will never get out of sync because you will be working with both of them all the time.
Narrative order is the natural order for a story, but it's not the natural order for interpreters and compilers, even for Jupyter kernels. Tangledown lets you write in narrative order, then, later, tangle the code out into executable order, where the definitions of the parts precede the story. That executable order is backwards and inside-out from the reader's point of view! TangleUp lets you maintain your code by rebuilding your story in narrative order from sources, in executable order, that you may have changed on disk (TangleUp is abandoned. Turned out to be too difficult).
Without something like this, you're condemned to explaining how your code works before you can say much or anything about what your code is doing. Indulge us in a little theory of writing, will you?
You're writing a murder mystery.
METHOD 1: Start with a data sheet: all the characters and their relationships. Francis stands to inherit, but Evelyn has a life-insurance policy on Victor. Bobbie is strong enough to swing an axe. Alice has poisonous plants in her garden. Charlie has a gun collection. Danielle is a chef and owns sharp knives. Lay out their schedules and whereabouts for several weeks. Finally, write down the murder, the solution, and all the loose ends your romantic detective might try.
METHOD 2: There's a murder. Your romantic detective asks "Who knew whom? Who benefitted? Who could have been at the scene of the crime? Who had opportunity? What was the murder weapon? Who could have done it?" Your detective pursues all the characters and their happenstances. In a final triumph of deductive logic, the detective identifies the killer despite compelling and ultimately distracting evidence to the contrary.
If your objective is to engage the audience, to motivate them to unravel the mystery and learn the twists and turns along the way, which is the better method? If your objective is to have them spend several hours wading through reference material, trying to guess where this is all going, which is the better method? If your objective is to organize your own thoughts prior to weaving the narrative, how do you start?
Now, you're writing about some software.
METHOD 1: Present all the functions and interfaces, cross dependencies, asynchronous versus synchronous, global and local state variables, possibilities for side effects. Finally, present unit tests and a main program.
METHOD 2: Explain the program's rationale and concept of operation, the solution it delivers, its modes and methods. Present the unit tests and main program that fulfill all that. Finally, present all the functions, interfaces, and procedures, all the bits and bobs that could affect and effect the solution.
If your objective is to engage your audience, to have them understand the software deeply, as if they wrote it themselves, which is the better method? If your objective is to have them spend unbounded time wading through reference material trying to guess what you mean to do, which is the better method?
Phaedrus says:
I give good, long, descriptive names to functions and parameters to make my code readable. I use Doxygen and Sphinx to automate document production. I'm a professional!
And Socrates says:
That's nice, but you only document the pieces, and say nothing about how the pieces fit together. It's like giving me a jigsaw puzzle without the box top. It's almost sadistic.
You condemn me to reverse-engineering your software: to running it in a debugger or to tracing logs and printouts.
Literate Programming is the best known way to save your audience the work of reverse engineering your code, of giving them the box top with the jigsaw puzzle.
Who is your audience?

- yourself, first, down the line, trying to remember "why the heck did I do that?!?"
- other programmers, eventually, when they take over maintaining and extending your code
First, write a paper about your code, explaining, at least to yourself, what you want to do and how you plan to do it. Flesh out your actual code inside the paper. RUN your code inside the paper, capturing printouts and charts and diagrams and what not, so others, including, first, your future self, can see the code at work. Iterate the process, rewriting prose and code as you invent and simplify, in a loop.
Ok, that's just ordinary Jupyter-notebook practice, right? Code inside your documentation, right? Doxygen and Sphinx inside-out, right?
Notebooks solve the inside-out problem, but ordinary programming is both inside out and upside-down from literate programming. Literate Programming solves the upside-down problem.
With ordinary notebook practice, you write everything in executable order, because a Jupyter notebook is just an interface to an execution kernel. Notebooks inherit the sequential constraints of the underlying interpreter and compiler. Executable order usually forces us to define all details before using them. With Literate Programming, you write in narrative order, much more understandable to humans.
Executable order is usually the reverse of narrative order. Humans want to understand the big picture first, then the details. They want to see the box-top of the jigsaw puzzle before looking at all the pieces. Executable order is upside-down to the human's point-of-view.
We've all had the experience of reading code and notebooks backwards so we don't get overwhelmed by details before understanding the big picture. That observation leads us to another imperative.
Write about your code in narrative order. Don't be tyrannized by your programming language into defining everything before you can talk about anything. Use tools to rearrange your code in executable order.
Donald Knuth invented Literate Programming so that he could both write about MetaFont and TeX and implement them in the same place. These are two of the most important computer programs ever written. Their longevity and quality testify to the viability of Literate Programming.
Tangledown is the tool that rearranges code from any Markdown file into executable order. This here document, README.md, the one you're reading right now, is the Literate Program for Tangledown.
Because our documentation language is Markdown, the language of this document is Literate Markdown. This here README.md, which you're reading right now, contains all the source for the Literate-Markdown tool, tangledown.py, with all its documentation, all presented in narrative order, like a story.

tangledown.py tangles code out of any Markdown document, not just this here README.md that you're reading right now. The verb "tangle" is traditional in Literate Programming. You might think it should be "untangle," because a Literate Markdown document is all tangled up from executable order. But Knuth prefers the human's point of view. The Markdown document contains the code in the correct, narrative order. To address untangling or detangling, we have TangleUp.
You can also run Tangledown inside a Jupyter notebook, specifically one that is linked to this here document, README.md, the one you're reading right now. See this section for more.
We should mention that Tangledown is similar to org-babel in Emacs (or spacemacs for VIM users). Those are polished, best-of-breed Literate-Programming tools for the Down direction. You have to learn some Emacs to use them, and that's a barrier for many people. Markdown is good enough for Github, and thus for most of us right now.
Tangledown, as a distribution format, is a complete solution to Literate Programming. You get a single Markdown file and all the tested source for your project is included. Run Tangledown and the project is sprayed out on disk, ready for running, further testing, and deploying.
As a development format, it's not quite enough. With only Tangledown, when you modify the source tree in executable order, your narrative is instantly out of date. We can't have that. See Section TangleUp for more.
TangleUp can generate unique names, as GUIDs, say, for new source files and blocks. You should be able to TangleUp an existing source tree into a new, fresh, non-pre-existing Markdown file and, then round-trip TangleUp and TangleDown.
Let's do Tangledown, first, and TangleUp later.
Jupytext [sic] automatically syncs a Markdown file with a Jupyter notebook. Read about it here. It works well in JupyterLab. Read about that here. Specifically, it lets you open this here Markdown file, README.md, that you're reading right now, as a Jupyter notebook, and you can evaluate some cells in the notebook.
Here's how I installed everything on an Apple Silicon (M1) Mac Book Pro, with Python 3.9:
```bash
pip install jupyter
pip install jupyterlab
pip install jupytext
```
Here is how I run it:

```bash
jupyter lab
```

or

```bash
PYTHONPATH=".:$HOME/Documents/GitHub/tangledown:$HOME/Library/Jupyter/kernels/tangledown_kernel" jupyter lab ~
```

when I want the Tangledown Kernel, and I almost always want the Tangledown Kernel.
In JupyterLab: View -> Activate Command Palette, then Pair Notebook with Markdown. Or right-click README.md and say Open With -> Jupytext Notebook. When you edit and save either README.md or README.ipynb, Jupytext will update the other.

IMPORTANT: To see the updates in the notebook when you modify the Markdown, you must File -> Reload Notebook from Disk, and to see updates in the Markdown when you modify the notebook, you must File -> Reload Markdown File from Disk. Jupytext forces you to reload changed files manually. I'll apologize here, on behalf of the Jupytext team.
If you're reading or modifying README.ipynb, or if you Open With -> Jupytext Notebook on README.md (my preference), you may see tiny, unrendered Markdown cells above and below all your tagged nowebs and tangles. DON'T DELETE THE TINY CELLS. Renderers of Markdown simply ignore the tags, but Jupytext makes tiny, invisible cells out of them!

Unless you're running the optional, new Tangledown Kernel, don't RUN cells with embedded block tags in Jupyter; you'll just get syntax errors from Python.
This here README.md, the one you're reading right now, should tell the story of Tangledown. We'll use Tangledown to create Tangledown. That's just like bootstrapping a compiler. We'll use Tangledown to tangle Tangledown itself out of this here document named README.md that you're reading right now.
The first part of the story is that I just started writing the story. The plan and outline was in my head (I didn't explicitly do Method 1). Then I filled in the code, moved everything around when I needed to, and kept rewriting until it all worked the way I wanted it to work. Actually, I'm still doing this now. Tangledown and TangleUp are living stories!
This is a useful toy, but it has zero error handling. We currently talk only about the happy path. I try to be rude ("DUDE!") every place where I sense trouble, but I'm only sure I haven't been rude enough. Read this with a sense of humor. You're in on the story with me, and it's supposed to be fun!
I also didn't try it on Windows, but I did try it on WSL, the Windows Subsystem for Linux. Works great on WSL!
One way: run python3 tangledown.py README.md or just python tangledown.py at the command line. That command should overwrite tangledown.py. The code for tangledown.py is inside this here README.md that you're reading right now. The name of the file to overwrite, namely tangledown.py, is embedded inside this here README.md itself, in the file attribute of a <tangle ...> tag. Read about tangle tags below!
If you said python3 tangledown.py MY-FOO.md, then you would tangle code out of MY-FOO.md. You'll do that once you start writing your own code in Tangledown. You will love it! We have some big examples that we'll write about elsewhere. Those examples include embedded code and microcode for exotic hardware, all written in Python!
Tangledown is both a script and a module. You can run Tangledown in a Jupytext cell after importing some stuff from the module. The next cell illustrates the typical bootstrapping joke of tangling Tangledown itself out of this here README.md that you're reading right now, after this Markdown file has been linked to a Jupytext notebook.
```python
from tangledown import get_lines, accumulate_lines, tangle_all
tangle_all(*accumulate_lines(*get_lines("README.md")))
```
After you have tangled at least once, as above, and if you switch the notebook kernel to the new, optional Tangledown Kernel, you can evaluate the source code for the whole program in the later cell I'm linking right here. How Cool is That?
You'll also need to re-tangle and restart the Tangledown Kernel when you add new nowebs to your files. Sorry about that. This is still just a toy.
Because Tangledown is a Python module, you can also run Tangledown from inside a standalone Python program, say in PyCharm or VS Code or whatever;
hello_world_tangler.py
in this repository is an example.
Once again, Jupytext lets you RUN code from a Markdown document in a JupyterLab notebook with just the ordinary Python3 kernel. If you open hello_world.md as a Jupytext notebook in JupyterLab, then you can run Tangledown in Jupyter cells. Right-click on the name hello_world.md in the JupyterLab GUI and choose Open With ... $\longrightarrow$ Jupytext Notebook. Then run cells! This is close to the high bar set by org-babel!
How can we rearrange code cells in a notebook or Markdown file from human-understandable, narrative order to executable order?
Exploit the fact that most Markdown renderers, like Jupytext's, Github's, and PyCharm's, ignore HTML / XML tags (that is, stuff inside angle brackets) that they don't recognize. Let's enclose blocks of real, live code with noweb
tags, like this:
<noweb name="my_little_tests">

```python
class TestSomething ():
    def test_something (self):
        assert (3 == 2+1)
```

</noweb>
The markdown above renders as follows. You can see the noweb
one-liner raw cells above and below the code in Jupytext. If they were Markdown cells, they'd be tiny and invisible. That's 100% OK, and may be more to your liking! Try changing the cells from RAW (press "R") to Markdown (press "M") and back, then closing them (Shift-Enter) and opening them (Enter). Don't mark the tag cells CODE (don't press "Y"). Tangledown won't work because Jupytext will surround them with triple-backticks.
```python
class TestSomething ():
    def test_something (self):
        assert (42 == 6 * 7)
```
What are the <noweb ...>
and </noweb>
tags? We explain them immediately below.
noweb tags

Markdown ignores <noweb ...> and </noweb> tags, but tangledown.py doesn't. tangledown.py sucks up the contents of the noweb tags and sticks them into a dictionary for later lookup when processing block tags.
The contents of a noweb
tag are between the opening <noweb ...>
and closing </noweb>
fenceposts. Markdown renders code contents with syntax coloring and indentation. That's why we want code cells to be CODE cells and not RAW cells.
The term contents is ordinary jargon from HTML, XML, SGML, etc., and applies to any opening <foo ...>
and closing </foo>
pair.
The Tangledown dictionary key for contents of a noweb
tag is the string value of the name
attribute. For example, in <noweb name="foo">
, name
is an attribute, and its string value is "foo"
.
Noweb names must be unique in a document. TangleUp ensures this when it writes a new Markdown file from existing source; otherwise, you must ensure it by hand.
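To make the uniqueness rule concrete, here is a small sketch (duplicate_noweb_names is our hypothetical helper, not part of tangledown.py) that scans a Markdown string with a noweb-opener regex like Tangledown's and reports duplicated names:

```python
import re
from collections import Counter

# Hypothetical helper, not part of tangledown.py: find noweb names
# that appear more than once in a Markdown document.
noweb_start = re.compile(r'^<noweb name="(\w[\w\s\-.]*)".*>$')

def duplicate_noweb_names(markdown_text):
    names = [m.group(1)
             for line in markdown_text.splitlines()
             if (m := noweb_start.match(line))]
    return [name for name, count in Counter(names).items() if count > 1]

doc = '<noweb name="foo">\n</noweb>\n<noweb name="foo">\n</noweb>\n'
print(duplicate_noweb_names(doc))  # ['foo']
```

A TangleUp implementation could run such a check before writing a new Markdown file.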
NOTE: the name attribute of a noweb opener must be on the same line as the tag, like this:

    <noweb name="foo">

Ditto for our other attributes, as in the following. Don't separate attributes with commas!

    <noweb name="foo" language="python">
This single-line rule is a limitation of the regular expressions that detect noweb tags. Remember, Tangledown is a toy, a useful toy, but it's limited.
You can create the fencepost cells, <noweb ...>
and </noweb>
, either in the plain-text Markdown file, or you can create them in the synchronized Jupytext notebook.
If you create fencepost cells in plain-text Markdown as opposed to the Jupytext notebook, leave a blank line after the opening <noweb ...> and a blank line before the closing </noweb>. If you don't, the Markdown renderer won't color and indent the contents. Tangledown will still work, but the Markdown renderer will format your code like text, without syntax coloring and indentation.
If you write fencepost cells in Markdown cells in the notebook or as blank-surrounded tags in the plain-text Markdown, the fenceposts appear as tiny, invisible Markdown cells because the renderer treats them as empty markdown cells. That's the fundamental operating principle of Tangledown: Markdown ignores tags it doesn't recognize! DON'T DELETE THE TINY, INVISIBLE CELLS, but you can open (Enter) and close (Shift-Enter) them.
If you create noweb
and tangle
tags in the notebook and you want them visible, mark them RAW by pressing "R" with the cell highlighted but not editable. Don't mark them CODE (don't press "Y"). Tangledown will break because Jupytext will surround them with triple-backticks.
block tags

Later, in the second pass, Tangledown blows the contents of noweb tags back out wherever it sees block tags with matching name attributes. That's how you can define code anywhere in your document and use it in any other place, later or earlier, more than once if you like.
block tags can and should appear in the contents of noweb tags and in the contents of tangle tags, too. That's how you structure your narrative!
Tangledown discards the contents of block tags. Only the name attribute of a block tag matters.

You don't have to write the noweb before you write a matching block tag. You can refer to a noweb tag before it exists in time and space, more than once if you like. You can define things and name things and use things in any order that makes your thinking and your story more clear. This is literature, after all.
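To make the substitution idea concrete, here is a minimal sketch of block expansion (our illustration, simpler than the real expand_blocks: it ignores closing tags and does a single pass of substitution, but it does preserve the block tag's indentation):

```python
import re

# Sketch only: splice noweb contents wherever a matching block tag
# appears, re-indenting each spliced line by the block tag's indent.
block_re = re.compile(r'^(\s*)<block name="(\w[\w\s\-.]*)">.*$')

def expand_once(nowebs, lines):
    out = []
    for line in lines:
        m = block_re.match(line)
        if m:
            out.extend(m.group(1) + noweb_line
                       for noweb_line in nowebs[m.group(2)])
        else:
            out.append(line)
    return out

nowebs = {"greet": ['print("hello")\n']}
lines = ['def main():\n', '    <block name="greet"></block>\n']
print(expand_once(nowebs, lines))
# ['def main():\n', '    print("hello")\n']
```

The real code repeats this until no block tags remain, so nowebs may refer to other nowebs.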
tangle tags

A tangle tag sprays its block-expanded contents to a file on disk. What file? The file named in the file attribute of the tangle tag. Expanding contents of a tangle tag means replacing every block tag with the contents of its matching noweb tag, recursively, until everything bottoms out in valid Python.
The same rules about blank lines hold for tangle
tags as they do for noweb
tags: if you want Markdown to render the contents like code, surround the contents with blank lines or mark the tag cells RAW. The following Markdown
<tangle file="/dev/null">

```python
import unittest
<block name="my_little_tests"></block>

if __name__ == '__main__':
    TestSomething().test_something()
```

</tangle>
renders like this
```python
import unittest
<block name="my_little_tests"></block>

if __name__ == '__main__':
    TestSomething().test_something()
```
See the tiny, invisible Markdown cells above and below the code? Play around with opening and closing them with Enter and Shift-Enter, respectively, and marking them RAW (Press "R") and Markdown ("M"). Don't mark them CODE ("Y").
You can evaluate the cell with the new, optional Tangledown Kernel. If you evaluate the code cell in the Python Kernel, you'll get a syntax error because the block
tag is not valid Python. The syntax error is harmless to Tangledown.
This code tangles to the file /dev/null
. That's a nifty trick for temporary tangle
blocks. You can talk about them, validate them by executing their cells in the Tangledown Kernel, and throw them away.
TangleUp knows where Tangledown puts all the blocks and tangles. That's how, when you change code on disk, TangleUp can put it all back in the single file of Literate Markdown.
block TAGS!

Markdown renders block tags verbatim inside nowebs or tangles. This is good for humans, who will think:

> AHA!, this block refers to some code in a noweb tag somewhere else in this Markdown document. I can read all the details of that code later, when it will make more sense. I can look at the picture on the box before the pieces of the jigsaw puzzle. Thank you, kindly, author! Without you, I'd be awash in details. I'd get tired and cranky before understanding the big picture!
See, I'll prove it to you. Below is the code for all of tangledown.py itself. You can understand this without understanding the implementations of the sub-pieces, just getting a rough idea of what they do from the names of the block tags. READ THE NAMES IN THE BLOCK TAGS. Later, if you want to, you can read all the details in the noweb tags named by the block tags.
If you're running the new, optional Tangledown Kernel, you can evaluate this next cell and run Tangledown on Tangledown itself, right here in a Jupyter notebook. How Cool is That?
```python
<block name="types"></block>
<block name="openers"></block>
<block name="tracer"></block>
<block name="oh-no-there-are-two-ways"></block>
<block name="accumulate-contents"></block>
<block name="accumulate-lines"></block>
<block name="thereIsABlockTag"></block>
<block name="eatBlockTag"></block>
<block name="expandBlocks"></block>
<block name="expand-tangles"></block>

def tangle_all(tracer: Tracer, nowebs: Nowebs, tangles: Tangles) -> None:
    for filename, liness in tangles.items ():
        Path(filename).parents[0].mkdir(parents=True, exist_ok=True)
        contents: str = expand_tangles(tracer, liness, nowebs)
        with open (filename, 'w') as outfile:
            print(f"WRITING FILE: {filename}")
            outfile.write (contents)
    tracer.dump()

if __name__ == "__main__":
    fn, lines = get_lines(get_aFile())
    # test_re_matching(lines)
    tracer, nowebs, tangles = accumulate_lines(fn, lines)
    tangle_all(tracer, nowebs, tangles)
```
The whole program is in the function tangle_all. We get hybrid vigor by testing __name__ against "__main__": tangledown.py is both a script and a module.
All we do in tangle_all is loop over all the line lists in the tangles (for filename, liness in tangles.items()) and expand them to replace blocks with nowebs. Yes, "liness" has an extra "s". Remember Smeagol? Pronounce it like "my preciouses, my lineses!"
The code will create the subdirectories needed. For example, if you tangle to file foo/bar/baz/qux.py, the code creates the directory chain ./foo/bar/baz/ if it doesn't exist.
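The idiom behind that is pathlib's mkdir with parents=True and exist_ok=True. A tiny self-contained demonstration (not part of tangledown.py), using a temporary directory:

```python
from pathlib import Path
import tempfile

# parents=True creates the whole chain; exist_ok=True makes it idempotent.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "foo" / "bar" / "baz" / "qux.py"
    target.parents[0].mkdir(parents=True, exist_ok=True)
    target.write_text("x = 1\n")
    print(target.exists())  # True
```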
Let us now explain the implementation. The first block in the tangle above is types. What is the noweb of types? It's here.
A Line
is a string, Python base type str
. Lines
is the type of a list of lines. Liness
is the type of a list of list of lines, in a pluralizing shorthand borrowed from Haskell practice. Pronounce liness
the way Smeagol would do: "my preciouses, my lineses!"
A noweb name is a string, and a tangle file name is a string. A line number is an int
, a Python base type.
Nowebs are dictionaries from noweb names to lines.
Tangles are dictionaries from file names to Liness --- lists of lists of lines. Tangledown accumulates output for tangle
files mentioned more than once. If you tangle to qux.py
in one place and then also tangle to qux.py
somewhere else, the second tangle won't overwrite the first, but append to it. That's why tangles are lists of lists of lines, one list of lines for each mentioning of a tangle file. Read more about that in expand-tangles.
```python
from typing import List, Dict, Tuple, Match

NowebName = str
FileName = str
TangleFileName = FileName
LineNumber = int
Line = str
Lines = List[Line]
Liness = List[Lines]
LinesTuple = Tuple[LineNumber, Lines]
Nowebs = Dict[NowebName, Lines]
Tangles = Dict[TangleFileName, Liness]
```
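The append-don't-overwrite behavior for tangle files can be sketched like this (add_tangle is our hypothetical helper, not part of tangledown.py; the type aliases are repeated here so the sketch stands alone):

```python
from typing import Dict, List

# Each <tangle file="..."> mention contributes one list of lines.
Tangles = Dict[str, List[List[str]]]

def add_tangle(tangles: Tangles, filename: str, lines: List[str]) -> None:
    # setdefault keeps earlier mentions; plain assignment would clobber them
    tangles.setdefault(filename, []).append(lines)

tangles: Tangles = {}
add_tangle(tangles, "qux.py", ["x = 1\n"])
add_tangle(tangles, "qux.py", ["y = 2\n"])
print(tangles["qux.py"])  # [['x = 1\n'], ['y = 2\n']]
```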
We'll implement all the noweb blocks, like accumulate_contents
and eatBlockTag
, later. You can read about them, or not, after you've gotten more of the big picture.
The Tangledown Kernel doesn't support the Jupytext debugger, yet. Sorry about that. Tangle the code out to disk and debug it with pudb or whatever, then tangle it back up into your Literate Markdown file via TangleUp.
Tangledown is still a toy. Ditto refactoring. PyCharm is great for that, but you'll have to do it on tangled files and detangle (paste) back into the Markdown.
We separated out the inner loop over Liness [sic] into another function, expand_tangles
, so that the Tangledown Kernel can import it and apply it to block
tags. tangle_all
calls expand_tangles
; expand_tangles
calls expand_blocks
. Read about expand_blocks
here.
```python
from graphviz import Digraph
g = Digraph(graph_attr={'size': '8,5'}, node_attr={'fontname': 'courier'})
g.attr(rankdir='LR')
g.edge('tangle_all', 'expand_tangles')
g.edge('expand_tangles', 'expand_blocks')
g
```
```python
def expand_tangles(tracer: Tracer, liness: Liness, nowebs: Nowebs) -> str:
    contents: Lines = []
    for lines in liness:
        while there_is_a_block_tag (lines):
            lines = expand_blocks (tracer, nowebs, lines)
        contents += lines
    return ''.join(contents)
```
Tangledown has two kinds of regular expressions (regexes) for matching tags in a Markdown file:
regexes for noweb
and tangle
tags that appear on lines by themselves, left-justified
regexes that match <block ...>
tags that may be indented, and match their closing </block>
tags, which may appear on the same line as <block ...>
or on lines by themselves.
Both kinds of regex are safe: they do not match themselves. That means it's safe to run tangledown.py on this here README.md, which contains the tangled source for tangledown.py.
The two regexes in noweb left_justified_regexes match noweb and tangle tags that appear on lines by themselves, left-justified.

They also won't match noweb and tangle tags that are indented. That lets us talk about noweb and tangle tags without processing them: just put the examples you're talking about in an indented Markdown code cell instead of in a triple-backticked Markdown code cell.
The names in the attributes of noweb
and tangle
tags must start with a letter, and they can contain letters, numbers, hyphens, underscores, whitespace, and dots.
The names of noweb
tags must be globally unique within the Markdown file. Multiple tangle
tags may refer to the same output file, in which cases, Tangledown appends the contents of the second and subsequent tangle
tags to a list of list of lines, to a Liness
.
There is a .* at the end to catch attributes beyond name. A bit of future-proofing.
```python
noweb_start_re = re.compile (r'^<noweb name="(\w[\w\s\-.]*)".*>$')
noweb_end_re = re.compile (r'^</noweb>$')
tangle_start_re = re.compile (r'^<tangle file="(.+/\\[^/]+|.+)".*>$')
tangle_end_re = re.compile (r'^</tangle>$')
```
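As a quick sanity check on the name rule, here is how a noweb-opener regex of this shape behaves (our illustration; note that \w is slightly looser than "starts with a letter", since it also admits a leading digit or underscore):

```python
import re

# Exercising the noweb-opener pattern: a word character, then letters,
# digits, hyphens, underscores, whitespace, and dots.
name_re = re.compile(r'^<noweb name="(\w[\w\s\-.]*)".*>$')

good = '<noweb name="my block-2.0">'
bad = '<noweb name="">'
print(bool(name_re.match(good)), bool(name_re.match(bad)))  # True False
```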
The regexes in this noweb, anywhere_regexes
, match block
tags that may be indented, preserving indentation. The block_end_re
regex also preserves indentation. Indentation is critical for Python, Haskell, and other languages.
I converted the 'o' in 'block' to a harmless regex group [o]
so that block_end_re
won't match itself. That makes it safe to run this code on this here document itself.
```python
block_start_re = re.compile (r'^(\s*)<block name="(\w[\w\s\-.]*)">')
block_end_re = re.compile (r'^(\s)*</bl[o]ck>')
```
The code in noweb openers
has two block
tags that refer to the nowebs of the regexes defined above, namely left_justified_regexes
and anywhere_regexes
. After Tangledown substitutes the contents of the nowebs for the blocks, the code becomes valid Python and you can call test_re_matching
in the Tangledown Kernel or at the command line. When you call it, it proves that we can recognize all the various kinds of tags. We leave the regexes themselves as global pseudo-constants so that they're both easy to test and to use in the body of the code (Demeter weeps because of globals).
The code in hello_world.ipynb
(after you have Paired a Notebook with the Markdown File hello_world.md
) runs this test as its last act to check that tangledown.py
was correctly tangled from this here README.md
. That code works in the ordinary Python kernel and in the Tangledown Kernel.
Notice the special treatment for block ends, which are usually on the same lines as their block opener tags, but not necessarily so. That lets you put (useless) contents in block
tags.
```python
import re
import sys
from pathlib import Path

<block name="getting a file and its lines"></block>
<block name="left_justified_regexes"></block>
<block name="anywhere_regexes"></block>

def test_re_matching(fp: Path, lines: Lines) -> None:
    for line in lines:
        noweb_start_match = noweb_start_re.match (line)
        tangle_start_match = tangle_start_re.match (line)
        block_start_match = block_start_re.match (line)
        noweb_end_match = noweb_end_re.match (line)
        tangle_end_match = tangle_end_re.match (line)
        block_end_match = block_end_re.match (line)
        if (noweb_start_match):
            print ('NOWEB: ', noweb_start_match.group (0))
            print ('name of the block: ', noweb_start_match.group (1))
        elif (noweb_end_match):
            print ('NOWEB END: ', noweb_end_match.group (0))
        elif (tangle_start_match):
            print ('TANGLE: ', tangle_start_match.group (0))
            print ('name of the file: ', tangle_start_match.group (1))
        elif (tangle_end_match):
            print ('TANGLE END: ', tangle_end_match.group (0))
        elif (block_start_match):
            print ('BLOCK: ', block_start_match.group (0))
            print ('name of the block: ', block_start_match.group (1))
            if (block_end_match):
                print ('BLOCK END SAME LINE: ', block_end_match.group (0))
            else:
                print ('BLOCK NO END')
        elif (block_end_match):
            print ('BLOCK END ANOTHER LINE: ', block_end_match.group (0))
        else:
            pass
```
Tangledown passes once over the file to collect contents of noweb
and tangle
tags, and again over the tangle
tags to expand block
tags. In the second pass, Tangledown substitutes noweb contents for corresponding block
tags until there are no more block
tags, creating valid Python.
In the first pass over the file, we'll just save the contents of noweb and tangle into dictionaries, without expanding nested block
tags.
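Here is a toy sketch of that first pass (our illustration, much simplified from the real accumulate_lines: no Tracer, no error handling, left-justified tags only):

```python
import re

# Simplified first pass: collect noweb contents into a dict by name,
# and tangle contents into a dict of lists of line-lists by file name.
noweb_start = re.compile(r'^<noweb name="(\w[\w\s\-.]*)".*>$')
noweb_end = re.compile(r'^</noweb>$')
tangle_start = re.compile(r'^<tangle file="(.+)".*>$')
tangle_end = re.compile(r'^</tangle>$')

def accumulate(lines):
    nowebs, tangles = {}, {}
    key, bucket, kind = None, [], None
    for line in lines:
        if m := noweb_start.match(line):
            key, bucket, kind = m.group(1), [], "noweb"
        elif m := tangle_start.match(line):
            key, bucket, kind = m.group(1), [], "tangle"
        elif noweb_end.match(line):
            nowebs[key], kind = bucket, None
        elif tangle_end.match(line):
            tangles.setdefault(key, []).append(bucket)
            kind = None
        elif kind:  # inside an open noweb or tangle: copy verbatim
            bucket.append(line)
    return nowebs, tangles

doc = ['<noweb name="a">\n', 'pass\n', '</noweb>\n']
nw, tg = accumulate(doc)
print(nw)  # {'a': ['pass\n']}
```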
tangledown.py
is both a script and a module. As a script, you run it from the command line, so it gets its input file name from command-line arguments. As a module, called from another Python program, you probably want to give the file as an argument to a function, specifically, to get_lines
.
Let's write two functions:

- get_aFile, which parses command-line arguments and produces a file name; the default file name is README.md
- get_lines, which
  - gets lines, without processing noweb, tangle, or block tags, from its argument, aFilename
  - replaces #raw and #endraw fenceposts with blank lines
  - writes out the full file path to a secret place where the Tangledown Kernel can pick it up
get_aFile can parse command-line arguments that come from either python on the command line or from a Jupytext notebook, which has a few kinds of command-line arguments we must ignore, namely command-line arguments that end in .py or in .json.
This method for getting a file name from the argument list will eat all options. It works for the Tangledown Kernel and for tangling down from a script or a notebook, but it's not future-proofed. Tangledown is still a toy.
```python
print({f'len(sys.argv)': len(sys.argv), f'sys.argv': sys.argv})
```
```python
def get_aFile() -> str:
    """Get a file name from the command-line arguments
    or 'README.md' as a default."""
    <block name="print-sys-argv"></block>
    aFile = 'README.md'  # default
    if len(sys.argv) > 1:
        file_names = [p for p in sys.argv
                      if (p[0] != '-')  # option
                      and (p[-3:] != '.py')
                      and (p[-5:] != '.json')]
        if file_names:
            aFile = sys.argv[1]
    return aFile
```
```python
raw_line_re: re.Pattern = re.compile(r'<!-- #(end)?raw -->')

def get_lines(fn: FileName) -> Tuple[FileName, Lines]:
    """Get lines from a file named fn. Replace
    'raw' fenceposts with blank lines. Write full path to
    a secret place for the Tangledown kernel to pick it up.
    Return tuple of file path (for TangleUp's Tracer) and
    lines."""
    <block name="save-afile-path-for-kernel"></block>
    xpath = save_aFile_path_for_kernel(fn)
    with open(fn) as f:
        in_lines: Lines = f.readlines ()
    out_lines: Lines = []
    for in_line in in_lines:
        out_lines.append(
            in_line if not raw_line_re.match(in_line) else "\n")
    return xpath, out_lines
```
We must normalize file names so that, for example, "foo.txt" and "./foo.txt" indicate the same file, and so that `~/` denotes the home directory on Mac and Linux. I didn't test this on Windows.
```python
def anchor_is_tilde(path_str: str) -> bool:
    result = (path_str[0:2] == "~/") and (Path(path_str).anchor == '')
    return result

def normalize_file_path(tangle_file_attribute: str) -> Path:
    result: Path = Path(tangle_file_attribute)
    if (anchor_is_tilde(tangle_file_attribute)):
        result = (Path.home() / tangle_file_attribute[2:])
    return result.absolute()
```
The function `save_aFile_path_for_kernel`, below, returns its input file name after expanding its full path and saving the full path in a special place where the Tangledown Kernel can find it.
```python
def save_aFile_path_for_kernel(fn: FileName) -> FileName:
    xpath: Path = Path.cwd() / Path(fn).name
    victim_file_name = str(xpath.absolute())
    safepath: Path = Path.home() / '.tangledown/current_victim_file.txt'
    Path(safepath).parents[0].mkdir(parents=True, exist_ok=True)
    print(f"SAVING {victim_file_name} in secret place {str(safepath)}")
    with open(safepath, "w") as t:
        t.write(victim_file_name)
    return xpath
```
Turns out there are two ways to write code blocks in Markdown:

- indented by four spaces, useful for quoted Markdown and quoted triple-backtick blocks
- surrounded by triple backticks and not indented

Tangledown must handle both ways.
We use the trick of a harmless regex group (regex stuff inside square brackets) around one of the backticks in the regex that recognizes triple backticks. This regex is safe to run on itself. See `triple_backtick_re` in the code immediately below.

The function `first_non_blank_line_is_triple_backtick`, in noweb `oh-no-there-are-two-ways`, recognizes code blocks bracketed by triple backticks. The contents of this `noweb` tag is triple-backticked, itself. Kind of a funny self-toast joke, no?
Remember the use-mention dichotomy from Philosophy class? No problem if you don't.
When we're talking about `noweb` and `tangle` tags, but don't want to process them, we indent the tags and the code blocks. Tangledown won't process indented `noweb` and `tangle` tags because the regexes in noweb `left_justified_regexes` won't match them.

We can also talk about triple-backticked blocks by indenting them. Tangledown won't mess with indented triple-backticked blocks, because the regex needs them left-justified. Markdown also won't get confused, so we can quote whole Markdown files by indenting them. Yes, your Literate Markdown can also, recursively, tangle out more Markdown files. How cool is that? Will the recursive jokes never end?
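To make the left-justified-only rule concrete, here's a tiny, self-contained check using the same "harmless group" pattern that appears in the code below: a left-justified fence matches, an indented one passes through untouched.

```python
import re

# The same "harmless group" trick shown below: a group around one
# backtick lets this regex mention triple backticks without being one.
triple_backtick_re = re.compile(r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)')

print(bool(triple_backtick_re.match("```python\n")))      # left-justified: True
print(bool(triple_backtick_re.match("    ```python\n")))  # indented: False
```

The `^` anchor is what makes indentation a safe quoting mechanism.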
TangleUp has a heuristic for placing language and id information on triple-backtick fence openers. Our function will retrieve those if present.

We see, below, why the code tracks line numbers. We might do all this in some super-bitchin', sophomoric list comprehension, but this is more obvious-at-a-glance. That's a good thing. Match lines with a left-justified triple backtick; pass through lines with an indented triple backtick. We must trace `raw` fenceposts, but not copy them to the output.
```python
triple_backtick_re = re.compile (r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)')
blank_line_re = re.compile (r'^\s*$')

def first_non_blank_line_is_triple_backtick (
        i: LineNumber, lines: Lines) -> Match[Line]:
    while (blank_line_re.match (lines[i])):
        i = i + 1
    yes = triple_backtick_re.match (lines[i])
    language = "python"  # default
    id_ = None  # default
    if yes:
        language = yes.groups()[1] or language
        id_ = yes.groups()[3]  # can be 'None'
    return i, yes, language, id_
```
Tangledown is a funny little compiler. It converts Literate Markdown to Python or other languages (Tangledown supports Clojure and Markdown, too). We could go nuts and write it in highfalutin' style, and then it would be much bigger, more elaborate, and easier to explain to a Haskell programmer. It might also be less of a toy. However, we want this toy Tangledown for now to be:

- very short
- independent of rich libraries like Beautiful Soup and parser combinators
- completely obvious to anyone

We'll just use iteration and array indices, but in a tasteful way so our functional friends won't puke. This is Python, after all, not Haskell! We can just get it done, with grace, panache, and aplomb.
The function `accumulate_contents` accumulates the contents of left-justified `noweb` or `tangle` tags. The function starts at line `i` of the input, then figures out whether a tag's first non-blank line is a triple backtick, in which case it won't snip four spaces from the beginning of every line, and finally keeps going until it sees the closing fencepost, `</noweb>` or `</tangle>`. It returns a tuple of the line index after the closing fencepost, and the contents, possibly de-dented. The function manipulates line numbers to skip over triple backticks.
```python
def accumulate_contents (
        lines: Lines, i: LineNumber, end_re: re.Pattern) -> LinesTuple:
    r"""Harvest contents of a noweb or tangle tag. The start
    taglet was consumed by caller. Consume the end taglet."""
    i, yes, language, id_ = first_non_blank_line_is_triple_backtick(i, lines)
    snip = 0 if yes else 4
    contents_lines: Lines = []
    for j in range (i, len(lines)):
        if (end_re.match(lines[j])):
            return j + 1, language, id_, contents_lines  # the only return
        if not triple_backtick_re.match (lines[j]):
            contents_lines.append (lines[j][snip:])
```
The old `accumulate_lines` has reached the end of its life. It ignores raw cells, except for some hacks for raw noweb and tangle tags. The `new_accumulate_lines` must parse several kinds of line-sequences explicitly. Let's be careful not to call line-sequences blocks, so we don't confuse line-sequences with `block` tags.

The function `accumulate_lines` calls `accumulate_contents` to suck up the contents of all the left-justified `noweb` tags and `tangle` tags out of a file, but doesn't expand any `block` tags that it finds. It just builds up dictionaries, `noweb_blocks` and `tangle_files`, keyed by the `name` or `file` attributes it finds inside `noweb` or `tangle` tags.
```python
<block name="normalize-file-path"></block>
raw_start_re = re.compile("<!-- #raw -->")
raw_end_re = re.compile("<!-- #endraw -->")
from pprint import pprint

def accumulate_lines(fp: Path, lines: Lines) -> Tuple[Tracer, Nowebs, Tangles]:
    tracer = Tracer()
    tracer.fp = fp
    nowebs: Nowebs = {}
    tangles: Tangles = {}
    i = 0
    while i < len(lines):
        noweb_start_match = noweb_start_re.match (lines[i])
        tangle_start_match = tangle_start_re.match (lines[i])
        if noweb_start_match:
            <block name="acclines_handle_noweb"></block>
        elif tangle_start_match:
            <block name="acclines_handle_tangle"></block>
        elif raw_start_re.match (lines[i]):
            <block name="acclines_handle_raw"></block>
        else:
            <block name="acclines_handle_markdown"></block>
    if in_between:  # Close out final markdown.
        tracer._end_betweens(i)
    return tracer, nowebs, tangles
    pass
```
The body of the `acclines_handle_markdown` block:

```python
in_between = True
tracer.add_markdown(i, lines[i])
i += 1
```
The body of the `acclines_handle_noweb` block:

```python
in_between = False
key: NowebName = noweb_start_match.group(1)
(i, language, id_, nowebs[key]) = \
    accumulate_contents(lines, i + 1, noweb_end_re)
tracer.add_noweb(i, language, id_, key, nowebs[key])
```
in_between = False
key: TangleFileName = \
str(normalize_file_path(tangle_start_match.group(1)))
if not (key in tangles):
tangles[key]: Liness = []
(i, language, id_, things) = accumulate_contents(lines, i + 1, tangle_end_re)
tangles[key] += [things]
tracer.add_tangle(i, language, id_, key, tangles[key])
There is a lot that can go wrong: we can have all kinds of mal-formed contents. We'll get to error handling someday, maybe. Tangledown is just a little toy at the moment, but I thought it interesting to write about. If it's ever distributed to hostile users, then we will handle all the bad cases. But not now. Let's get the happy case right.
Iterate over all the `noweb` or `tangle` tag contents and expand the `block` tags we find in there, recursively. That means keep going until there are no more `block` tags, because nowebs are allowed (encouraged!) to refer to other nowebs via `block` tags. If there are cycles, this will hang. We're doing the happy cases first, and will get to cycle detection someday, maybe.

First, we need to detect that some list of lines contains a `block` tag, left-justified or not. That means we must keep running the expander on that list.
```python
def there_is_a_block_tag (lines: Lines) -> bool:
    for line in lines:
        block_start_match = block_start_re.match (line)
        if (block_start_match):
            return True
    return False
```
If there is a `block` tag, we must eat the tag and its meaningless contents:
```python
def eat_block_tag (i: LineNumber, lines: Lines) -> LineNumber:
    for j in range (i, len(lines)):
        end_match = block_end_re.match (lines[j])
        # DUDE! Check leading whitespace against block_start_re
        if (end_match):
            return j + 1
        else:  # DUDE!
            pass
```
The following function does one round of block expansion. The caller must test whether any `block` tags remain, and keep running the expander until there are no more `block` tags. Our functional fu grandmaster might be appalled, but sometimes it's just easier to iterate than to recurse.
```python
def expand_blocks (tracer: Tracer, nowebs: Nowebs, lines: Lines,
                   language: str = "python") -> Lines:
    out_lines = []
    block_key: NowebName = ""
    for i in range (len (lines)):
        block_start_match = block_start_re.match (lines[i])
        if (block_start_match):
            leading_whitespace: str = block_start_match.group (1)
            block_key: NowebName = block_start_match.group (2)
            block_lines: Lines = nowebs [block_key]  # DUDE!
            i: LineNumber = eat_block_tag (i, lines)
            for block_line in block_lines:
                out_lines.append (leading_whitespace + block_line)
        else:
            out_lines.append (lines[i])
    return out_lines
```
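To see the run-until-no-more-blocks idea in miniature, here's a hedged, self-contained sketch of the driver loop the text describes. The regex and the expander are simplified stand-ins for Tangledown's real ones (one-line, self-closing `block` placeholders; no `eat_block_tag`), but the fixed-point shape is the same: nested blocks need one round per level of nesting.

```python
import re

# Simplified stand-in for Tangledown's block_start_re: matches a
# one-line placeholder like '    <block name="foo"></block>'.
block_start_re = re.compile(r'(\s*)<block name="(\w+)"></block>')

def there_is_a_block_tag(lines):
    return any(block_start_re.match(line) for line in lines)

def expand_once(nowebs, lines):
    """One round of expansion: splice each named noweb in place,
    preserving the placeholder's leading whitespace."""
    out = []
    for line in lines:
        m = block_start_re.match(line)
        if m:
            indent, key = m.group(1), m.group(2)
            out.extend(indent + body for body in nowebs[key])
        else:
            out.append(line)
    return out

nowebs = {"inner": ["print('hi')\n"],
          "outer": ['<block name="inner"></block>\n']}
lines = ['<block name="outer"></block>\n']
while there_is_a_block_tag(lines):  # two rounds: outer, then inner
    lines = expand_once(nowebs, lines)
print(lines)  # ["print('hi')\n"]
```

With a cycle (say, `inner` referring back to `outer`), this loop never terminates, which is exactly the hang the text warns about.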
For TangleUp, we'll need to trace the entire operation of Tangledown, first and second passes. TangleUp reverses Tangledown, so we will want a best-effort reconstruction of the original Markdown file.
Our first approach will be a sequential list of dictionaries with all the needed information.
```python
from dataclasses import dataclass, field
from typing import Union  # TODO

@dataclass
class Tracer:
    trace: List[Dict] = field(default_factory=list)
    line_no = 0
    current_betweens: Lines = field(default_factory=list)
    fp: Path = None
    # First Pass
    <block name="tracer.add_markdown"></block>
    <block name="tracer.add_raw"></block>
    <block name="tracer._end_betweens"></block>
    <block name="tracer.add_noweb"></block>
    <block name="tracer.add_tangle"></block>
    <block name="tracer.dump"></block>
    # Second Pass
    <block name="tracer.add_expanded_noweb"></block>
    <block name="tracer.add_expanded_tangle"></block>
```
```python
def add_raw(self, i, between: Line):
    self.line_no += 1
    self.current_betweens.append((self.line_no, between))

def add_markdown(self, i, between: Line):
    self.line_no += 1
    self.current_betweens.append((self.line_no, between))

def _end_betweens(self, i):
    if self.current_betweens:
        self.trace.append({"ending_line_number": self.line_no, "i": i,
                           "language": "markdown", "kind": 'between',
                           "text": self.current_betweens})
        self.current_betweens = []

def add_noweb(self, i, language, id_, key, noweb_lines):
    self._end_betweens(i)
    self.line_no = i
    self.trace.append({"ending_line_number": self.line_no, "i": i,
                       "language": language, "id_": id_,
                       "kind": 'noweb', key: noweb_lines})

def add_tangle(self, i, language, id_, key, tangle_liness):
    self._end_betweens(i)
    self.line_no = i
    self.trace.append({"ending_line_number": self.line_no, "i": i,
                       "language": language, "id_": id_,
                       "kind": 'tangle', key: tangle_liness})

def add_expanded_noweb(self, i, language, id_, key, noweb_lines):
    self._end_betweens(i)
    self.line_no = i
    self.trace.append({"ending_line_number": self.line_no, "i": i,
                       "language": language, "id_": id_,
                       "kind": 'expanded_noweb', key: noweb_lines})

def add_expanded_tangle(self, i, language, id_, key, tangle_liness):
    self._end_betweens(i)
    self.line_no = i
    self.trace.append({"ending_line_number": self.line_no, "i": i,
                       "language": language, "id_": id_,
                       "kind": 'expanded_tangle', key: tangle_liness})

def dump(self):
    pr = self.fp.parent
    fn = self.fp.name
    fn2 = fn.translate(str.maketrans('.', '_'))
    # Store the trace in the dir where the input md file is:
    vr = f'tangledown_trace_{fn2}'
    np = pr / (vr + ".py")
    with open(np, "w") as fs:
        print(f'sequential_structure = (', file=fs)
        pprint(self.trace, stream=fs)
        print(')', file=fs)
```
Ok, you saw at the top that the code in this here Markdown document, README.md, when run as a script, will read in all the lines in ... this here Markdown document, `README.md`. Bootstrapping!

But you have to run something first. For that, I tangled the code manually just once and provide `tangledown.py` in the repository. The chicken definitely comes before the egg.

But if you have the chicken (`tangledown.py`), you can import it as a module and execute the following cell, a copy of the one at the top. That should overwrite `tangledown.py` with the contents of this notebook or Markdown file. So our little bootstrapping technique will forever update the Tangledown compiler if you change it in this here README.md that you're reading right now!
```python
from tangledown import get_lines, accumulate_lines, tangle_all
tangle_all(*accumulate_lines(*get_lines("README.md")))
```
- `examples/asr`
- `noweb` and `tangle` cells
- `tangledown`, so we can run `noweb` and `block` tags that have `block` tags in them

Some people write "TODO" in their code so they can find all the spots where they thought they might have trouble but didn't have time to write the error-handling (prophylactic) code at the time. I like to write "DUDE" because it sounds like "TODO" but is more RUDE ("RUDE" also sounds like "DUDE") and funny. This story is supposed to be amusing.
I must apologize once again, but this is just a toy at this point! Recall the DISCLAIMER. The following are stack-ranked from highest to lowest priority.

- `~` does not work. We know one dirty way to fix it, but proper practice with pathlib is a better answer.

We'll start with a non-real-time solution. You'll manually run `tangleup` to put modified source back into the Markdown. Later, we'll do something that can track changes on disk and update the Markdown in real time.
When you modify your source tree, `tangleup` puts the modified code back into the Markdown file with reminders to detangle and to write. There are two cases:

1. You modified some source that corresponds to an existing noweb block in the Markdown.
2. You added some source that doesn't yet correspond to a noweb block in the Markdown.
To assist TangleUp, Tangledown records unique names for existing noweb blocks along with the tangled source. Tangledown also records robust locations for existing blocks. Robust means that the boundary locations are flexible: starting and ending line and character positions in a source file are not enough because changing an early one invalidates all later ones.
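The text doesn't pin down what a "robust" location looks like, so here is a purely illustrative sketch (the names and structure are my assumptions, not Tangledown's): a location that carries a little surrounding context and is re-found by search, so edits earlier in the file don't invalidate it the way absolute line and character positions would.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RobustLocation:
    """Hypothetical: locate a block by context lines, not absolute offsets."""
    before: List[str]  # lines expected just above the block
    first: str         # first line of the block itself

    def find(self, lines: List[str]) -> Optional[int]:
        n = len(self.before)
        for i in range(n, len(lines)):
            if lines[i] == self.first and lines[i - n:i] == self.before:
                return i  # index of the block's first line
        return None

source = ["# intro\n", "## helpers\n", "def gsnym(p):\n", "    ...\n"]
loc = RobustLocation(before=["## helpers\n"], first="def gsnym(p):\n")
print(loc.find(source))  # 2
```

Inserting or deleting lines above the block shifts the index `find` returns, but the location record itself stays valid, which is the flexibility the paragraph above asks for.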
We don't need the trace file for this case.
Enumerate all the files in a directory tree. Pair each file name with a short, unique name for the nowebs. TODO: ignore files and directories listed in the `.gitignore`.
```python
%pip install gitignore-parser
```

```python
<block name="tangleup imports"></block>

def files_list(dir_name: str) -> List[str]:
    dir_path = Path(dir_name)
    files_result = []
    nyms_result = []
    file_count = 0
    <block name="unique-names"></block>
    <block name="ignore files in .gitignore"></block>
    <block name="recurse a dir"></block>
    find_first_gitignore()
    recurse_a_dir(dir_path)
    assert file_count == len(nyms_collision_check)
    return list(zip(files_result, nyms_result))
```
The only complexity, here, is ignoring `.git` and files in `.gitignore`:
```python
def recurse_a_dir(dir_path: Path) -> None:
    for p in dir_path.glob('*'):
        q = p.absolute()
        qs = str(q)
        try:  # don't skip files in dirs above .gitignore
            ok = not in_gitignore(qs)
        except ValueError as e:  # one absolute and one relative?
            ok = True
        if p.name == '.git':
            ok = False
        if not ok:
            pprint(f'... IGNORING file or dir {p}')
        if ok and q.is_file():
            nonlocal file_count  # Assignment requires 'nonlocal'
            file_count += 1
            nyms_result.append(gsnym(q))  # 'nonlocal' not required
            files_result.append(qs)       # because not ass'gt but mutation
        elif ok and p.is_dir():  # was 'p.is_dir', always truthy without the call
            recurse_a_dir(p)
```
Correct for collisions, which will be really rare, so there is a negligible effect on speed.
```python
nyms_collision_check = set()

def gsnym(p: Path) -> str:
    """Generate a short, unique name for a path."""
    nym = gsnym_candidate(p)
    while nym in nyms_collision_check:
        nym = gsnym_candidate(p)
    nyms_collision_check.add(nym)
    return nym

def gsnym_candidate(p: Path) -> str:
    """Generate a candidate short, unique name for a path."""
    return p.stem + '_' + uuid.uuid4().hex[:6].upper()
```
Find the first `.gitignore` in a directory tree. Parse it to produce a function that tests whether a file must be ignored by TangleUp.
```python
in_gitignore = lambda _: False

def find_first_gitignore() -> Path:
    nonlocal in_gitignore  # rebind the default above, not a new local
    p = dir_path
    for p in dir_path.rglob('*'):
        if p.name == '.gitignore':
            in_gitignore = parse_gitignore(str(p.absolute()))
            break
    return p
```
```python
from pathlib import Path
from typing import List
import uuid
from gitignore_parser import parse_gitignore
from pprint import pprint
```
Now write the contents of each Python or Clojure file to a noweb block with its ginned-up name, and a corresponding tangle block. Parenthetically, this just screams for the Writer monad, but we'll just do it by hand in an obvious, kindergarten way, iterating over `files_result`.

WARNING: The explicit '\n' newlines probably won't work on Windows.
```python
from typing import Tuple
from pprint import pprint

<block name="wrap one as raw"></block>
<block name="wrap several with blank lines"></block>
<block name="wrap lines with triple backticks"></block>
<block name="indent four spaces"></block>

def write_noweb_to_lines(lines: List[str],
                         file_gsnym_pair: Tuple[str],
                         language: str) -> None:
    path = Path(file_gsnym_pair[0])
    wrap_n_blank(lines, [f'## {path.name}\n'])
    wrap_1_raw(lines, f'<noweb name="{file_gsnym_pair[1]}">\n')
    with open(file_gsnym_pair[0]) as f:
        try:
            inlines = f.readlines()
        except UnicodeDecodeError as e:
            pprint(f'... SKIPPING UNDECODABLE FILE {path}')
            return
    pprint(f'DETANGLING file {path}')
    bound = []  # Really want the monadic bind, here.
    if language == "markdown":
        indent_4(bound, inlines)
    else:
        wrap_triple_backtick(bound, inlines, language)
    wrap_n_blank(lines, bound)
    wrap_1_raw(lines, '</noweb>\n')
    lines.append(BLANK_LINE)
```
```python
BEGIN_RAW = '<!-- #raw -->\n'
END_RAW = '<!-- #endraw -->\n'

def wrap_1_raw(lines: List[str], s: str) -> None:
    lines.append(BEGIN_RAW)
    lines.append(s)
    lines.append(END_RAW)

BLANK_LINE = '\n'

def wrap_n_blank(lines: List[str], ss: List[str]) -> None:
    lines.append(BLANK_LINE)
    for s in ss:
        lines.append(s)
    lines.append(BLANK_LINE)
```
```python
def wrap_triple_backtick(lines: List[str],
                         ss: List[str],
                         language: str) -> None:
    lines.append(f'```{language}\n')
    for s in ss:
        lines.append(s)
    lines.append(f'```\n')

def indent_4(lines: List[str], ss: List[str]):
    for s in ss:
        lines.append('    ' + s)  # four spaces, per the function's name
```
```python
def write_tangle_to_lines(lines: List[str],
                          file_gsnym_pair: Tuple[str],
                          language: str) -> List[str]:
    wrap_1_raw(lines, f'<tangle file="{file_gsnym_pair[0]}">\n')
    bound = []
    wrap_triple_backtick(bound,
                         [f'<block name="{file_gsnym_pair[1]}"></block>\n'],
                         language)
    wrap_n_blank(lines, bound)
    wrap_1_raw(lines, f'</tangle>\n')
```
Test the whole megillah, the up direction. You may have to backpatch some 'language' names when you open the Markdown, but 'language' only affects syntax coloring.
```python
<block name="tangleup-files-list"></block>
<block name="tangleup-write-noweb-to-lines"></block>
<block name="tangleup-write-tangle-to-lines"></block>

def tangleup_overwrite_markdown(
        output_markdown_filename: str,
        input_directory: str,
        title: str = "Untitled") -> None:
    pprint(f'WRITING LITERATE MARKDOWN to file {output_markdown_filename}')
    file_gsnym_pairs = files_list(input_directory)
    lines: List[str] = [f'# {title}\n\n']
    for pair in file_gsnym_pairs:
        p = Path(pair[0])
        if p.suffix == '.clj':
            language = f'clojure id={uuid.uuid4()}'
        elif p.suffix == '.py':
            language = f'python id={uuid.uuid4()}'
        elif p.suffix == '.md':
            language = 'markdown'
        else:
            language = ''
        write_noweb_to_lines(lines, pair, language)
        write_tangle_to_lines(lines, pair, language)
    import json
    with open(output_markdown_filename, "w") as f:
        for line in lines:
            f.write(line)
    pass
```
If there are no changes to the tangled files on disk, then we must merely reassemble the nowebs, tangles, and block tags from the files on disk. On its first pass, Tangledown recorded the structure of nowebs and tangles and of the Markdown that surrounds them. When detangling a file:

PRO TIP: For the Tangledown Kernel, if your little scripts contain noweb tags, surround them with tangle to `/dev/null`, reload the kernel spec, restart the kernel, then you can run them in the notebook.
```python
from pprint import pprint
from tangledown_trace_foobar_md import sequential_structure as cells

pprint(cells)
fn = "tanglup_foobar.md"
line_no = 0
for cell in cells:
    if cell["kind"] == "between":
        <block name="tangleup_write_between"></block>
    elif cell["kind"] == "noweb":
        <block name="tangleup_write_noweb"></block>
    elif cell["kind"] == "tangle":
        <block name="tangleup_write_tangle"></block>
    else:
        assert False, f"unknown kind: {cell['kind']}"
        pass
    pass
pass
```
Run these at the console for now.
```python
<block name="tangleup-overwrite-markdown"></block>

if __name__ == "__main__":
    tangleup_overwrite_markdown(
        "asr_tangleup_test.md",
        "./examples",
        title="This is a First Test of the Emergency Tangleup System")
```
```python
<block name="tangleup-overwrite-markdown"></block>

if __name__ == "__main__":
    tangleup_overwrite_markdown(
        "tangleup-test.md",
        ".",
        title="This is a Second Test of the Emergency Tangleup System")
```
If you change the code in this README.md and you want to test it by running the cell in Section Tangle It, Already!, you usually must restart whatever Jupyter kernel you're running because Jupytext caches code. If things continue to not make sense, try restarting the notebook server. It rarely but occasionally produces incorrect answers for more obscure reasons.
The Tangledown kernel is OPTIONAL, but nice. Everything I talked about so far works fine without it, but the Tangledown Kernel lets you evaluate Jupytext notebook cells that have `block` tags in them. For example, you can run Tangledown on Tangledown itself in this notebook just by evaluating the cell that contains all of Tangledown, including the source for the kernel, here.
The Tangledown Compiler writes the full path of the current Markdown file, corresponding to the current notebook, to a fixed place in the home directory, and the Tangledown Kernel reads that path and gets all the nowebs from there.
If you run more than one instance of the Tangledown Kernel at one time on your machine, you must RETANGLE THE FILE AND RESTART THE TANGLEDOWN KERNEL WHEN YOU SWITCH NOTEBOOKS because the name of the current file is a fixed singleton. The Tangledown Kernel has no way to dynamically know what file you're working with. Sorry about that!
After you tangle the code out of this here README.md at least once, you will have two new files:

- `./tangledown_kernel/tangledown_kernel.py`
- `./tangledown_kernel/kernel.json`
You must inform Jupyter about your new kernel. The following works for me on the Mac. It might be different on your machine:
```bash
jupyter kernelspec install --user tangledown_kernel
```
You must put the source for the Tangledown Kernel somewhere Python can find it before you start Jupyter Lab. One way is to modify the `PYTHONPATH` environment variable. The following works for me on the Mac:

```bash
PYTHONPATH=".:/Users/brian/Library/Jupyter/kernels/tangledown_kernel" jupyter lab
```
Once the kernel is installed, there are multiple ways to run it in Jupyter Lab. When you first open a notebook, you get a menu. The default is the regular Python 3 kernel, and it works fine, but you won't be able to run cells that have `block` tags in them. If you choose the Tangledown Kernel, you can run such cells.
If you modify the kernel:
Most of the time, you don't have to restart Jupyter Lab itself, but sometimes after a really bad bug, you might have to.
Adapted from these official docs.
The kernel calls `expand_tangles` after reformatting the lines a little. We learned about the reformatting by experiment. We explain `expand_tangles` here, in the section about Tangledown itself. The rest of this is boilerplate from the official kernel documentation. There is no point, by the way, in running the cell below in any kernel. It's meant for the JupyterLab startup engine, only. You just need to tangle it out and install it, as above.

NOTE: You will get errors if you run this cell in the notebook.

TODO: plumb a Tracer through here?
```python
<block name="kernel-imports"></block>

class TangledownKernel(IPythonKernel):
    <block name="kernel-instance-variables"></block>

    async def do_execute(self, code, silent, store_history=True,
                         user_expressions=None, allow_stdin=False):
        if not silent:
            cleaned_lines = [line + '\n' for line in code.split('\n')]
            # HERE'S THE BEEF!
            expanded_code = expand_tangles(None, [cleaned_lines], self.nowebs)
            reply_content = await super().do_execute(
                expanded_code, silent, store_history, user_expressions)
            stream_content = {
                'name': 'stdout',
                'text': reply_content,
            }
            self.send_response(self.iopub_socket, 'stream', stream_content)
            return {'status': 'ok',
                    # The base class increments the execution count
                    'execution_count': self.execution_count,
                    'payload': [],
                    'user_expressions': {},
                    }

if __name__ == '__main__':
    from ipykernel.kernelapp import IPKernelApp
    IPKernelApp.launch_instance(kernel_class=TangledownKernel)
```
```python
from ipykernel.ipkernel import IPythonKernel
from pprint import pprint
import sys  # for version_info
from pathlib import Path
from tangledown import \
    accumulate_lines, \
    get_lines, \
    expand_tangles
```
These get indented on expansion because the `block` tag is indented. You could do it the other way: indent the code here and DON'T indent the `block` tag, but that would be ugly, wouldn't it?

Notice this kernel runs Tangledown on the full file path that's stored in `current_victim_file.txt`. That file path got written to that special place when you tangled the file the first time. This may explain why you must tangle the file once and then restart the kernel whenever you switch notebooks that are running the Tangledown Kernel.
```python
current_victim_filepath = ""
with open(Path.home() / '.tangledown/current_victim_file.txt') as v:
    fp = v.read()
tracer_, nowebs, tangles_ = accumulate_lines(*get_lines(fp))
```
```python
implementation = 'Tangledown'
implementation_version = '1.0'
language = 'no-op'
language_version = '0.1'
language_info = {  # for syntax coloring
    "name": "python",
    "version": sys.version.split()[0],
    "mimetype": "text/x-python",
    "codemirror_mode": {"name": "ipython", "version": sys.version_info[0]},
    "pygments_lexer": "ipython%d" % 3,
    "nbconvert_exporter": "python",
    "file_extension": ".py",
}
banner = "Tangledown kernel - expanding 'block' tags"
```
```json
{"argv": ["python", "-m", "tangledown_kernel", "-f", "{connection_file}"],
 "display_name": "Tangledown"
}
```