A Python package for creating scientific and engineering documents via pandoc including inline-executable Python code.
MIT License
This is inspired by pweave, codebraid, knitr, and cousins, but I always seemed to have
to do some pre/post-processing to get things the way I want them. I already use
other pandoc filters (e.g. pandoc-citeproc, pandoc-crossref), so why not simply
have another pandoc filter that will execute inline code and insert the results?
Another key is getting quality diagrams from scientific python code. For example, pweave automatically inserts generated images, but there doesn't seem to be a way to get SVG images without, again, pre- and post-processing in another script. SVG plots are, obviously, scalable and work much better for web and PDF outputs.
Use poetry for local environment management.
After cloning the repository:
$ cd <project-repo>
$ poetry install
$ poetry shell
To package and release:
$ poetry build
$ poetry publish
Be sure to configure your credentials prior to publishing.
See also this page.
An example Pandoc markdown file can be found in example. To process this
file, you need to have pandoc installed and in your path. You also need to
install the Pandoc filters pandoc-crossref and pandoc-citeproc, which provide
nice cross-referencing and reference/bibliography handling.
On macOS, Linux, or Linux on Windows via WSL, pandoc and the filters can be
installed via Homebrew. (On Linux/WSL, install linuxbrew.) Then simply run:
$ brew install pandoc
$ brew install pandoc-crossref
$ brew install pandoc-citeproc
$ brew install librsvg
Then, of course, you need to install this filter and some other helpers for the example. The example helpers can be installed into your Python virtual environment by running:
$ poetry install -E examples
To set up an environment for Windows from scratch including terminals, editors,
Python, etc., see
this gist.
Additional installation steps to use this library include installing pandoc
and additional filters and utilities.
Install pandoc by downloading the installer and following the standard
instructions. This should also get you pandoc-citeproc.exe for managing
citations.
Install pandoc-crossref (for managing intra-document cross-references) by
downloading the zipped Windows release. Unzip it, and move pandoc-crossref.exe
to a location that is on your system path. For example, you can move it next to
pandoc-citeproc.exe in C:\Program Files\Pandoc.
Finally, to handle embedding SVG images in PDF documents, this library relies on
rsvg-convert. This can be installed via Chocolatey. Install the Chocolatey
package manager if you do not already have it, and then run:
$ choco install rsvg-convert
Instead of (or in addition to) Chocolatey, you can also install the Scoop
installer. Scoop does not currently have a formula for rsvg-convert, but
rsvg-convert can also be installed from SourceForge if you do not want to use
Chocolatey.
The underlying Pandoc filter for executing Python code embedded in your documents relies on inter-process communication with a Python REPL behind the scenes. The default inter-process character encoding for Python on Windows is CP-1252, and this can cause problems if your Python scripts generate output with special characters (and if you are doing any scientific or engineering writing, they definitely will).
Fortunately, this is easily worked around by setting the Windows environment
variable PYTHONIOENCODING to utf-8. After setting this, be sure to restart
any open terminal windows for the change to take effect.
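One way to confirm that the setting took effect is to ask Python which encoding its standard streams use. The following is a minimal sketch and not part of this library; the helper name `stdio_is_utf8` is made up for illustration.

```python
import codecs
import sys


def stdio_is_utf8(stream=sys.stdout):
    """Return True if the given text stream encodes its output as UTF-8.

    Hypothetical helper for illustration; not provided by this package.
    """
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return False
    try:
        # codecs.lookup normalizes aliases such as "utf8" or "UTF-8".
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


if __name__ == "__main__":
    print("stdout encoding:", sys.stdout.encoding)
    print("UTF-8 stdio:", stdio_is_utf8())
```

Run this from a freshly opened terminal; if it still reports a non-UTF-8 encoding, the environment variable has probably not been picked up by that terminal session.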
If you use matplotlib for generating plots in inline Python code in your
document, you should explicitly set the Agg backend early in your document (see
example/example.md in this repo). Without this, document conversion can
hang when the svg_figure helper function is called.
Somewhere near the top of your Markdown document, add an executable Python code
block (without .echo so it won't appear in the output) that includes:
import matplotlib
matplotlib.use('Agg')
This plugin relies on the panflute Python package as a bridge between Python
and pandoc's Haskell. The panflute README lists API compatibility requirements
between versions of panflute and versions of pandoc. Double-check this if you
run into errors that mention panflute when compiling a document.
If you are running an older version of pandoc (e.g. 2.9.2) and start a new
project, you will need to explicitly install the compatible panflute version
in your environment with e.g. poetry add panflute@<version>, taking the version
from the panflute README. Alternatively, install pandoc version 2.11.x or
later.
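To compare what is installed against the compatibility table in the panflute README, a short script along the following lines can report both versions. This is only a sketch using the standard library; the function names are made up for illustration, and it assumes that `pandoc --version` prints the version on a line of the form `pandoc X.Y.Z`.

```python
import re
import shutil
import subprocess
from importlib import metadata


def parse_pandoc_version(version_output):
    """Extract a version tuple such as (2, 11, 4) from `pandoc --version` text."""
    match = re.search(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_versions():
    """Return (pandoc_version, panflute_version); either may be None."""
    pandoc_version = None
    if shutil.which("pandoc"):
        result = subprocess.run(
            ["pandoc", "--version"], capture_output=True, text=True, check=False
        )
        pandoc_version = parse_pandoc_version(result.stdout)
    try:
        panflute_version = metadata.version("panflute")
    except metadata.PackageNotFoundError:
        panflute_version = None
    return pandoc_version, panflute_version


if __name__ == "__main__":
    pandoc_version, panflute_version = installed_versions()
    print("pandoc:", pandoc_version)
    print("panflute:", panflute_version)
```

Run it inside the project's virtual environment (e.g. via poetry run) so that the panflute version it reports is the one the filter will actually import.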
To generate PDF files through Pandoc, you need to have xelatex installed.
On Linux/WSL:
$ sudo apt-get install texlive-xetex
On macOS:
$ brew install --cask mactex
On Windows (without WSL):
Download the MiKTeX installer and install as usual. Then ensure that the binary
folder is in your path (e.g.
C:\Users\<username>\AppData\Local\Programs\MiKTeX 2.9\miktex\bin\x64\). Note
that the first time you generate a document, MiKTeX will prompt you to install a
lot of packages, so watch for a MiKTeX window popping up (possibly behind other
windows) and follow the prompts.
The example templates rely on having a few fonts installed. The fonts to get are the Google Source Sans Pro, Source Code Pro, and Source Serif Pro families.
On macOS or Windows (without WSL), these can simply be downloaded and installed
as you would any other font. On Linux via WSL, you can install these normally
on the Windows side and then synchronize the Windows font folder to the Linux
side. To do this, edit (using sudo) /etc/fonts/local.conf and add:
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
<dir>/mnt/c/Windows/Fonts</dir>
</fontconfig>
Then update the font cache on the Linux side:
$ sudo fc-cache -fv
The example file uses an HTML template that includes a CSS stylesheet that is generated from SCSS. To compile this automatically, you need to have SASS installed.
On macOS, this can be installed via Homebrew:
$ brew install sass/sass/sass
On macOS/Linux/WSL/Windows it can be installed as a Node.js package (assuming you already have Node.js/npm installed):
$ npm install -g sass
This Python library provides a script, compiledoc, that will appear in your
poetry or pipenv virtual environment's path (or globally) once the library
is installed. In general, you provide an output directory and an input markdown
file, and it will build an HTML output when the --html flag is used (and also
by default).
$ compiledoc -o output --html mydoc.md
To build a PDF (via xelatex):
$ compiledoc -o output --pdf mydoc.md
To build a Markdown file with executable Python output included (e.g. for
debugging purposes), specify --md. This will generate a file in the output
directory with (perhaps confusingly) the same name as the input:
$ compiledoc -o output --md mydoc.md
To build everything, specify --all:
$ compiledoc -o output --all mydoc.md
To see all available command line options (for specifying templates, paths to required external executables, static files like images and bibliography files, etc.):
$ compiledoc --help
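If you would rather drive these builds from a Python script (for example, from a task runner), the invocations above can be wrapped with subprocess. This is a minimal sketch: the function names are made up for illustration, and only the flags documented above (-o, --html, --pdf, --md, --all) are used.

```python
import subprocess

# Output-format flags as documented above.
FORMAT_FLAGS = {"html": "--html", "pdf": "--pdf", "md": "--md", "all": "--all"}


def compiledoc_command(source, output_dir="output", fmt="html"):
    """Build the argument list for one compiledoc invocation.

    Hypothetical helper; raises KeyError for an unknown format name.
    """
    return ["compiledoc", "-o", output_dir, FORMAT_FLAGS[fmt], source]


def build(source, output_dir="output", fmt="html"):
    """Run compiledoc and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(compiledoc_command(source, output_dir, fmt), check=True)


if __name__ == "__main__":
    # Show the command that would be run for a PDF build of mydoc.md.
    print(" ".join(compiledoc_command("mydoc.md", fmt="pdf")))
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems when file names contain spaces.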
Once everything is set up, compile the example HTML file by running:
$ cd example
$ compiledoc -o output example.md
Open example/output/example.html in your browser or use e.g. the Live Server
plugin for VS Code.
To regenerate the document automatically (e.g. the HTML version, whose output is watched by Live Server), you can use Watchman.
To create a trigger on a particular directory (doc/ in this example) with a
notebook.md file (change this to suit your purposes), copy the following into
a temporary trigger.json file:
[
  "trigger",
  "doc/",
  {
    "name": "build_html",
    "expression": [
      "anyof",
      [
        "match",
        "notebook.md",
        "wholename"
      ]
    ],
    "command": [
      "poetry",
      "run",
      "compiledoc",
      "-o",
      "output",
      "--html",
      "notebook.md"
    ]
  }
]
Then from your project root directory run:
watchman -j < trigger.json
rm trigger.json
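If you set up triggers for several documents, the same JSON can be generated rather than edited by hand. The sketch below mirrors the trigger definition above; the function name `build_trigger` and the script name used in the usage note are made up for illustration.

```python
import json


def build_trigger(watch_dir="doc/", source="notebook.md", output_dir="output"):
    """Return the Watchman trigger command as a JSON-serializable list."""
    return [
        "trigger",
        watch_dir,
        {
            "name": "build_html",
            # Fire only when the source document itself changes.
            "expression": ["anyof", ["match", source, "wholename"]],
            "command": [
                "poetry", "run", "compiledoc",
                "-o", output_dir, "--html", source,
            ],
        },
    ]


if __name__ == "__main__":
    print(json.dumps(build_trigger(), indent=2))
```

Saved as, say, make_trigger.py, its output can be piped straight into Watchman from the project root with python make_trigger.py | watchman -j, which avoids the temporary file.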
It is also recommended that you add a .watchmanconfig file to the watched
directory (e.g. doc/; also add .watchmanconfig to your .gitignore) with
the following contents:
{
  "settle": 3000
}
The settle parameter is in milliseconds.
To turn off watchman:
watchman shutdown-server
To turn it back on:
cd <project-root>
watchman watch doc/
To watch the Watchman:
tail -f /usr/local/var/run/watchman/<username>-state/log
(Note that on Windows/WSL, to get tail to work the way you expect, you need to
add ---disable-inotify to the command; and yes, that's three - for some
reason.)
For pandoc 2.9 and earlier, the citation manager pandoc-citeproc was a
separate filter that gets added to the compilation pipeline. The path to this
filter can be specified on the command line to compiledoc with the
--pandoc-citeproc PATH flag.
In newer versions of pandoc (2.11 and beyond), the citeproc filter is built
into pandoc and is run by adding --citeproc to the pandoc command line. The
compiledoc script adds this by default unless the flag --use-pandoc-citeproc
is added, in which case the older filter will be used.
If you do not wish to run citeproc at all, you can add the flag
compiledoc --no-citeproc to skip citation processing altogether.
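The choice between these flags can be summarized in a few lines of Python. This is an illustrative sketch only, not code from this library: the function name is made up, pandoc versions between 2.9 and 2.11 are assumed to behave like the older releases, and combining --use-pandoc-citeproc with --pandoc-citeproc PATH in one invocation is likewise an assumption.

```python
def citeproc_flags(pandoc_version, use_citations=True, filter_path=None):
    """Return the extra compiledoc flags for citation handling.

    pandoc_version is a tuple such as (2, 9, 2) or (2, 11, 4).
    Hypothetical helper; the flag names come from the text above.
    """
    if not use_citations:
        return ["--no-citeproc"]
    if tuple(pandoc_version) >= (2, 11):
        # Built-in citeproc is the compiledoc default; nothing to add.
        return []
    # Assumption: anything older than 2.11 needs the separate filter.
    flags = ["--use-pandoc-citeproc"]
    if filter_path:
        flags += ["--pandoc-citeproc", filter_path]
    return flags


if __name__ == "__main__":
    print(citeproc_flags((2, 9, 2)))
    print(citeproc_flags((2, 11, 4)))
```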