This commandline tool exports data from selected Notion pages and databases into YAML and markdown files. Internally, it converts the Notion pages into a Pandoc AST, which enables fine-grained customization of the conversion process.
We use it at Innolitics to generate pages for our website, which allows us to use Notion as a content management system. We also use it to generate PDFs and Word documents from Notion pages.
First, install the pandoc and mermaid CLIs. Please note that n2y has only been tested with pandoc 2.19.2 and mermaid-cli 9.4.0. The pandoc 2.19.2 release is available from pandoc's GitHub releases page, and the mermaid-cli GitHub page has installation instructions. If you use npm to install mermaid, append @9.4.0 to the npm command to install that specific version.
Finally, install n2y:
pip install n2y
Before you can export any content from your Notion account, you'll first need to give n2y permission to access the pages. You'll need to be an admin to do this.
To do this, go to the "Settings and Members" page in Notion. You should see an "Integrations" option in the sidebar. Click the link that says "Develop your own integrations" and follow the instructions on the page. Copy the "Internal Integration Token" into the NOTION_ACCESS_TOKEN environment variable.
Finally, in Notion you'll need to share the relevant pages with your internal integration, just like you'd share a page with another person.
N2y is configured using a single YAML file. This file contains a few top-level keys:
Top-level key | Description |
---|---|
media_url | Sets the base URL for all downloaded media files (e.g., images, videos, PDFs, etc.) |
media_root | The directory where media files should be downloaded to |
exports | A list of export configuration items, indicating how a notion page or database is to be exported. See below for the keys. |
export_defaults | Default values for the export configuration items. |
The export configuration items may contain the following keys:
Export key | Description |
---|---|
id | The notion database or page id, taken from the "share URL". |
node_type | Either "database_as_yaml", "database_as_files", or "page". |
output | The path of the output file or directory where the data will be written. |
pandoc_format | The pandoc format that we're generating. |
pandoc_options | A list of strings that are writer options for pandoc. |
content_property | When set, it indicates the property name that will contain the content of the Notion pages in that database. If set to null, then only the page's properties will be included in the export. (Only applies to the database_as_files node type.) |
yaml_front_matter | Only used when exporting to text files. Indicates if the page properties should be exported as yaml front matter. |
id_property | When set, this indicates the property name in which to place the page's underlying notion ID. |
url_property | When set, this indicates the property name in which to place the page's underlying notion url. |
filename_template | This key is only used for the "database_as_files" node type; when set, it provides a format string that is evaluated against the page's properties to generate the file name. Note that the filenames are sanitized. When not set, the title property is used and the extension is deduced from the pandoc_format. A special "TITLE" property may be used to access the title property in the template string. |
plugins | A list of python modules to use as plugins. |
notion_filter | A notion filter object to be applied to the database. |
notion_sorts | A notion sorts object to be applied to the database. |
property_map | A mapping between the name of properties in Notion, and the name of the properties in the exported files. Set the new value to null to discard the property. |
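The property_map behavior described in the table (rename a property, or map it to null to discard it) can be illustrated with a small Python sketch. apply_property_map is a hypothetical helper, not part of n2y's API, and it assumes that properties missing from the map are kept under their original names.

```python
from typing import Any, Dict, Optional


def apply_property_map(
    properties: Dict[str, Any], property_map: Dict[str, Optional[str]]
) -> Dict[str, Any]:
    """Rename or drop exported properties (hypothetical helper, not n2y's API).

    Properties missing from property_map keep their names; properties mapped
    to None (YAML null) are discarded.
    """
    exported: Dict[str, Any] = {}
    for name, value in properties.items():
        if name not in property_map:
            exported[name] = value  # unmapped: keep the original name
        elif property_map[name] is not None:
            exported[property_map[name]] = value  # renamed
        # mapped to None: the property is dropped
    return exported


page = {"Name": "Intro", "Tags": ["DHF"], "Internal Notes": "draft"}
mapping = {"Name": "title", "Internal Notes": None}
print(apply_property_map(page, mapping))  # {'title': 'Intro', 'Tags': ['DHF']}
```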
The command is run using n2y configuration.yaml.
A notion database (e.g., with a share URL like this https://www.notion.so/176fa24d4b7f4256877e60a1035b45a4?v=130ffd3224fd4512871bb45dbceaa7b2) could be exported into a YAML file using this minimal configuration file:
exports:
  - id: 176fa24d4b7f4256877e60a1035b45a4
    node_type: database_as_yaml
    output: database.yml
The same database could be exported into a set of markdown files as follows:
exports:
  - id: 176fa24d4b7f4256877e60a1035b45a4
    node_type: database_as_files
    output: directory
    filename_template: "{Name}.md"
Each page in the database will generate a single markdown file, named according to the filename_template. This process will automatically skip pages whose "Name" property is empty.
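The file-naming step can be sketched as a Python format-string evaluation, assuming "Name" is the database's title property (as in the example above). render_filename and the set of characters treated as unsafe are illustrative assumptions; n2y's actual sanitization rules may differ.

```python
import re
from typing import Any, Dict, Optional

# Characters replaced during sanitization (an assumption for this sketch).
UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|]')


def render_filename(template: str, properties: Dict[str, Any], title: str) -> Optional[str]:
    """Evaluate a filename_template against a page's properties (hypothetical helper).

    Returns None when the page should be skipped because its title is empty.
    """
    if not title:
        return None  # mirrors the documented "skip pages with an empty Name" rule
    context = dict(properties)
    context["TITLE"] = title  # special property that aliases the title property
    name = template.format(**context)
    return UNSAFE_CHARS.sub("_", name).strip()


print(render_filename("{Name}.md", {"Name": "Design: v2/final"}, "Design: v2/final"))
# Design_ v2_final.md
```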
An individual notion page (e.g., with a share URL like this https://www.notion.so/All-Blocks-Test-Page-5f18c7d7eda44986ae7d938a12817cc0) could be exported to markdown with this minimal configuration file:
exports:
  - id: 5f18c7d7eda44986ae7d938a12817cc0
    node_type: page
    output: page.md
Sometimes it is useful to ensure that a root Notion page and its child pages don't contain links to any Notion pages outside the hierarchy. The n2yaudit tool can be used to audit a page hierarchy for any of these links:
n2yaudit PAGE_LINK
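The kind of check n2yaudit performs can be sketched in Python, assuming the page hierarchy and each page's outgoing links have already been fetched into plain dictionaries keyed by page ID. The data shapes and both function names are hypothetical; the real tool works against the Notion API.

```python
from typing import Dict, List, Set, Tuple


def collect_hierarchy(root: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return the IDs of the root page and all of its descendants."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        page_id = stack.pop()
        if page_id in seen:
            continue  # guards against cycles in the input data
        seen.add(page_id)
        stack.extend(children.get(page_id, []))
    return seen


def find_external_links(
    root: str, children: Dict[str, List[str]], links: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """List (source, target) pairs whose target page lies outside the hierarchy."""
    inside = collect_hierarchy(root, children)
    return [
        (source, target)
        for source in sorted(inside)
        for target in links.get(source, [])
        if target not in inside
    ]


children = {"root": ["a", "b"], "a": ["a1"]}
links = {"a": ["b"], "a1": ["elsewhere"], "b": ["root"]}
print(find_external_links("root", children, links))  # [('a1', 'elsewhere')]
```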
This example shows how you can use the export_defaults property to avoid duplicated configuration between export items. It also shows how you can use Notion filters to export pages from the same database into two different directories.
media_root: "media"
media_url: "./media/"
export_defaults:
  plugins:
    - "n2y.plugins.mermaid"
    - "n2y.plugins.rawcodeblocks"
    - "n2y.plugins.removecallouts"
    - "n2y.plugins.deepheaders"
    - "n2y.plugins.expandlinktopages"
  content_property: null
  id_property: id
  url_property: url
exports:
  - output: "documents/dhf"
    node_type: "database_as_files"
    filename_template: "{Name}.md"
    id: e24f839e724848d69342d43c07cb5f3e
    plugins:
      - "n2y.plugins.mermaid"
      - "n2y.plugins.rawcodeblocks"
      - "n2y.plugins.removecallouts"
      - "n2y.plugins.deepheaders"
      - "n2y.plugins.expandlinktopages"
      - "plugins.page"
      - "plugins.idmentions"
    notion_filter:
      property: "Tags"
      multi_select: { "contains": "DHF" }
  - output: "documents/510k"
    id: e24f839e724848d69342d43c07cb5f3e
    filename_template: "{Name}.md"
    node_type: "database_as_files"
    plugins:
      - "n2y.plugins.mermaid"
      - "n2y.plugins.rawcodeblocks"
      - "n2y.plugins.removecallouts"
      - "n2y.plugins.deepheaders"
      - "n2y.plugins.expandlinktopages"
      - "plugins.page"
      - "plugins.idmentions"
    notion_filter:
      property: "Tags"
      multi_select: { "contains": "510(k)" }
  - output: "data/Roles.yml"
    id: b47a694953714222810152736d9dc66c
    node_type: "database_as_yaml"
    content_property: "Description"
  - output: "data/Glossary.yml"
    id: df6bef74e2372118becd93e321de2c69
    node_type: "database_as_yaml"
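The example repeats the full default plugin list inside the export items that add extra plugins, which suggests that an item's own value replaces the default rather than extending it. Assuming that shallow-override behavior (an inference, not confirmed here), applying export_defaults can be sketched as below; resolve_exports is a hypothetical helper operating on the already-parsed YAML.

```python
from typing import Any, Dict, List


def resolve_exports(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Apply export_defaults to each export item (shallow merge; item keys win).

    Note: values such as the plugins list are shared, not copied, between items.
    """
    defaults = config.get("export_defaults", {})
    return [{**defaults, **item} for item in config.get("exports", [])]


config = {
    "export_defaults": {
        "plugins": ["n2y.plugins.mermaid"],
        "content_property": None,
        "id_property": "id",
    },
    "exports": [
        {"id": "b47a694953714222810152736d9dc66c", "node_type": "database_as_yaml",
         "content_property": "Description"},
        {"id": "df6bef74e2372118becd93e321de2c69", "node_type": "database_as_yaml"},
    ],
}
for item in resolve_exports(config):
    print(item["id"], item["content_property"], item["plugins"])
```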
At the core of n2y are a set of python classes that represent the various parts of a Notion workspace:
Notion Object Type | Description |
---|---|
Page | Represents a Notion page (which may or may not be in a database) |
Database | A Notion database, which can also be thought of as a set of Notion pages with some structured metadata, or properties |
Property | A type descriptor for a property (or column) in a Notion database |
PropertyValue | The value that a particular page in a database has for a particular Property |
Block | A bit of content within a Page |
RichTextArray | A sequence of formatted text in Notion; present in many blocks and property values |
RichText | A segment of text with the same styling |
Mention | A reference to another Notion object (e.g., a page, database, block, user, etc. ) |
User | A notion user; used in property values and in page, block, and database metadata |
File | A file |
Emoji | An emoji |
The Property, PropertyValue, Block, RichText, and Mention classes have subclasses that represent the various subtypes. E.g., there is a ParagraphBlock that represents a paragraph.
These classes are responsible for converting the Notion data into pandoc abstract syntax tree objects. We use a Python wrapper library that makes it easier to work with pandoc's AST; see its documentation for details. See the Notion API documentation for details about Notion's data structures.
The default implementation of these classes can be modified using a plugin system. To create a plugin, follow these steps:
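1. Create a Python module that subclasses the builtin classes you want to change, overriding the to_pandoc method as desired.
2. Set the plugins property in your export config to the module name (e.g., n2y.plugins.deepheaders).

See the builtin plugins for examples.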
You can use multiple plugins. If two plugins provide classes for the same notion object, then the last one that was loaded will be instantiated first.
Often you'll want to use a different class only in certain situations. For example, you may want to use a different Page class with its own unique behavior only for pages in a particular database. To accomplish this, you can use the n2y.errors.UseNextClass exception. If your plugin class raises the n2y.errors.UseNextClass exception in its constructor, then n2y will move on to the next class (which may be the builtin class if only one plugin was used).
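The lookup order and fallback described above can be modeled in a few lines of Python, assuming classes are tried in reverse load order until one constructor succeeds. UseNextClass is redefined locally as a stand-in for n2y.errors.UseNextClass so the sketch runs on its own; instantiate and GlossaryPage are hypothetical names, and the parent shape follows the Notion API's page parent object.

```python
from typing import List, Type


class UseNextClass(Exception):
    """Local stand-in for n2y.errors.UseNextClass."""


class Page:
    """Stand-in for the builtin page class."""

    def __init__(self, notion_data: dict) -> None:
        self.notion_data = notion_data


class GlossaryPage(Page):
    """Hypothetical plugin class that only handles pages from one database."""

    GLOSSARY_DB = "df6bef74e2372118becd93e321de2c69"

    def __init__(self, notion_data: dict) -> None:
        if notion_data.get("parent", {}).get("database_id") != self.GLOSSARY_DB:
            raise UseNextClass()  # defer to the next class in the chain
        super().__init__(notion_data)


def instantiate(classes: List[Type[Page]], notion_data: dict) -> Page:
    """Try classes from last-loaded to first-loaded until one accepts the data."""
    for cls in reversed(classes):
        try:
            return cls(notion_data)
        except UseNextClass:
            continue
    raise RuntimeError("no class accepted the notion data")


# The builtin class is loaded first, so it is tried last.
chain = [Page, GlossaryPage]
print(type(instantiate(chain, {"parent": {"database_id": GlossaryPage.GLOSSARY_DB}})).__name__)  # GlossaryPage
print(type(instantiate(chain, {"parent": {"page_id": "abc"}})).__name__)  # Page
```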
You may use different plugins for different export items, but keep in mind that each plugin module is imported only once. Also, if you export the same Page or Database multiple times with different plugins, then, due to an internal cache, the plugins that were enabled during the first run will be used.
Here are the default block classes that can be extended:
Class Name | Noteworthy Behavior |
---|---|
BookmarkBlock | Converts visual bookmark into plain text link in markdown, using the caption as the link text. |
BreadcrumbBlock | These blocks are ignored |
BulletedListItemBlock | |
CalloutBlock | The content of the callout block is extracted, but the emoji and background color are ignored. |
ChildDatabaseBlock | These blocks are ignored |
ChildPageBlock | These blocks are ignored |
ColumnBlock | |
ColumnListBlock | Converted into a table in which each Notion column becomes a table column. |
DividerBlock | |
EmbedBlock | These blocks are ignored |
EquationBlock | Converted to "display math" using LaTeX; see the pandoc documentation. |
FencedCodeBlock | |
FileBlock | Acts the same way as the ImageBlock, except that in the documents it only ever shows the URL. |
HeadingOneBlock | |
HeadingTwoBlock | |
HeadingThreeBlock | |
ImageBlock | Uses the URL for external images, but downloads uploaded images to media_root and replaces the path with a relative URL based on media_url. The "caption" is used for the alt text. |
LinkToPageBlock | Transcribes the block into a plain text link |
NumberedListItemBlock | |
ParagraphBlock | |
PdfBlock | Acts the same way as the ImageBlock |
QuoteBlock | |
RowBlock | |
SyncedBlock | Transcribes the contents of the synced block as they were at the time it was constructed |
TableBlock | |
TableOfContentsBlock | These blocks are ignored |
TemplateBlock | These blocks are ignored |
ToDoItemBlock | |
ToggleBlock | Converts the toggles into a bulleted list. |
VideoBlock | Acts the same way as the ImageBlock |
Most of the Notion blocks can generate their pandoc AST from only their own data. The one exception is the list item blocks; pandoc, unlike Notion, has an encompassing node in the AST for the entire list. The ListItemBlock.list_to_pandoc class method is responsible for generating this top-level node.
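The grouping problem that list_to_pandoc solves can be illustrated with plain tuples instead of pandoc wrapper objects: each run of consecutive list items of the same kind is wrapped in one list node, named after pandoc's BulletList and OrderedList constructors. group_list_items and the (type, text) block representation are hypothetical simplifications that ignore nested children.

```python
from itertools import groupby
from typing import List, Tuple, Union

Block = Tuple[str, str]  # (Notion block type, plain text), a simplification
Node = Union[Block, Tuple[str, List[str]]]

LIST_TYPES = {"bulleted_list_item": "BulletList", "numbered_list_item": "OrderedList"}


def group_list_items(blocks: List[Block]) -> List[Node]:
    """Wrap each run of consecutive same-type list items in a single list node."""
    nodes: List[Node] = []
    for block_type, run in groupby(blocks, key=lambda block: block[0]):
        items = list(run)
        if block_type in LIST_TYPES:
            nodes.append((LIST_TYPES[block_type], [text for _, text in items]))
        else:
            nodes.extend(items)  # non-list blocks pass through unchanged
    return nodes


blocks = [
    ("paragraph", "Intro"),
    ("bulleted_list_item", "a"),
    ("bulleted_list_item", "b"),
    ("numbered_list_item", "one"),
    ("paragraph", "End"),
]
print(group_list_items(blocks))
```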
N2y provides a few builtin plugins. These plugins are all turned off by default. Brief descriptions are provided below, but see the code for details.
CodeBlocks whose caption begins with "{jinja=pandocformat}" will be rendered using Jinja into text and then read into the AST using the specified pandoc input format. Note that, other than the special "plain" format, the pandocformat must be available both for reading and writing.
Any databases that are mentioned in the codeblock's caption will be made available in the Jinja render context within the databases dictionary.
Take the following code block for example:
<table>
<tr><th>Name</th><th>Email</th></tr>
{% for person in databases["People"] %}
<tr><td>{{person.Name}}</td><td>{{person.Email}}</td></tr>
{% endfor %}
</table>
The caption would be "{jinja=html} @People", where "@People" is a mention of the "People" database, and "Name" and "Email" are properties in that database.
Note that any rich text properties are rendered into the pandoc input format specified.
The codeblock's parent page's properties are made available in the page context variable.
The Jinja context has a few special filters and objects available to it:
Name | Type | Description |
---|---|---|
join_to | filter | Makes it possible to join Notion databases together. |
first_pass_output | object | Makes it possible to access the full text of the initial page's render. Useful if you want to render a glossary of terms that are used on the page. |
See the code for details.
Notion only supports three levels of headers, but sometimes this is not enough. This plugin enables support for h4 and h5 headers in the documents exported from Notion. Any Notion h3 whose text begins with the characters "= " is converted to an h4, any h3 that begins with "== " is converted to an h5, and so on.
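The prefix-to-level rule can be sketched as a pure string function. deep_header_level is a hypothetical helper that works on plain text; the actual plugin operates on Notion rich text and pandoc headers.

```python
import re
from typing import Tuple

# "= " -> h4, "== " -> h5, and so on: one extra level per leading "=".
DEEP_PREFIX = re.compile(r"^(=+) ")


def deep_header_level(h3_text: str) -> Tuple[int, str]:
    """Return the (level, text) that a Notion h3 maps to under this convention."""
    match = DEEP_PREFIX.match(h3_text)
    if match is None:
        return 3, h3_text  # ordinary h3, left unchanged
    return 3 + len(match.group(1)), h3_text[match.end():]


print(deep_header_level("== Edge cases"))  # (5, 'Edge cases')
```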
Completely remove all callout blocks. It's often helpful to include help text in callout blocks, but usually this help text should be stripped out of the final generated documents.
Any code block whose caption begins with "{=pandocformat}" will be made into a raw block for pandoc to parse. This is useful if you need to drop into Raw HTML or other output formats that pandoc supports. See the pandoc documentation for more details on the raw code blocks.
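The caption check can be sketched as follows. The dictionary produced by to_raw_block follows the shape of a RawBlock node in pandoc's JSON AST ({"t": "RawBlock", "c": [format, text]}); raw_block_format and to_raw_block are hypothetical helpers, not the plugin's code.

```python
import re
from typing import Optional

# Matches a caption that starts with "{=FORMAT}", e.g. "{=html}" or "{=latex}".
RAW_CAPTION = re.compile(r"^\{=([A-Za-z0-9_+-]+)\}")


def raw_block_format(caption: str) -> Optional[str]:
    """Return the pandoc format named by a "{=format}" caption prefix, or None."""
    match = RAW_CAPTION.match(caption)
    return match.group(1) if match else None


def to_raw_block(caption: str, code: str) -> Optional[dict]:
    """Build a pandoc JSON RawBlock for a qualifying code block, else None."""
    fmt = raw_block_format(caption)
    if fmt is None:
        return None  # not a raw block; keep it as an ordinary code block
    return {"t": "RawBlock", "c": [fmt, code]}


print(to_raw_block("{=html}", "<b>bold</b>"))
```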
Adds support for generating mermaid diagrams from codeblocks with the "mermaid" language, as supported in the Notion UI.
This plugin assumes that the mmdc mermaid command-line tool is available, and will throw an exception if it is not.
If there are errors in the mermaid syntax, the block is treated as a normal codeblock and a warning is logged.
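A minimal sketch of the documented behavior (fail loudly when mmdc is missing; log a warning and fall back when rendering fails), assuming the diagram source is written to a temporary .mmd file and rendered with mmdc's -i and -o options. render_mermaid is hypothetical and is not the plugin's actual implementation.

```python
import logging
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def render_mermaid(source: str, output_path: Path, mmdc: str = "mmdc") -> Optional[Path]:
    """Render mermaid source with the mmdc CLI.

    Raises RuntimeError if mmdc is not on PATH. Returns None (after logging a
    warning) if mmdc fails, so the caller can keep an ordinary code block.
    """
    executable = shutil.which(mmdc)
    if executable is None:
        raise RuntimeError(f"{mmdc!r} was not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "diagram.mmd"
        input_path.write_text(source, encoding="utf-8")
        result = subprocess.run(
            [executable, "-i", str(input_path), "-o", str(output_path)],
            capture_output=True,
            text=True,
        )
    if result.returncode != 0:
        logger.warning("mermaid rendering failed: %s", result.stderr.strip())
        return None
    return output_path
```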
Replace headers with links back to the originating notion block.
Adds support for Pandoc-style footnotes. Any rich texts of the "text" type that contain footnote references in the format [^NUMBER] (e.g., ...some claim [^2].) will be linked to the corresponding footnote paragraph block starting with [NUMBER]: (e.g., [2]: This is a footnote.).
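The matching step between references and footnote paragraphs can be sketched with two regular expressions over plain strings. This covers only the lookup, not building the pandoc footnote nodes; collect_footnotes and resolve_references are hypothetical helpers.

```python
import re
from typing import Dict, List

REFERENCE = re.compile(r"\[\^(\d+)\]")                       # "[^2]" inside text
DEFINITION = re.compile(r"^\[(\d+)\]:\s*(.*)$", re.DOTALL)   # "[2]: ..." paragraph


def collect_footnotes(paragraphs: List[str]) -> Dict[str, str]:
    """Map footnote numbers to their text, from paragraphs like "[2]: Text"."""
    footnotes: Dict[str, str] = {}
    for paragraph in paragraphs:
        match = DEFINITION.match(paragraph)
        if match:
            footnotes[match.group(1)] = match.group(2)
    return footnotes


def resolve_references(text: str, footnotes: Dict[str, str]) -> List[str]:
    """Return the footnote texts referenced by "[^N]" markers, in order of appearance."""
    return [footnotes[n] for n in REFERENCE.findall(text) if n in footnotes]


paragraphs = ["...some claim [^2].", "[2]: This is a footnote.", "Unrelated."]
notes = collect_footnotes(paragraphs)
print(resolve_references(paragraphs[0], notes))  # ['This is a footnote.']
```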
Adds general footnote support (not specialized for markdown) with the expectation that all footnote content is placed in an inline database on the original page wherein footnote references are made. Specific footnote references are made with Notion's page mention feature, mentioning a specific footnote page in the database. For example, each page in the database can simply be titled with a footnote number and contain the footnote text as the page content. The inline database must have a title that ends with "Footnotes."
When this plugin is enabled, any "link to page" block (which can be created using the /link command in the Notion UI) will be replaced with the content of the page that is linked to. This makes it possible to use the "link to page" block to include repeated content in multiple locations. In this way it is like a "synced content block", but unlike synced content blocks, which don't play well when child pages are duplicated, "link to page" blocks can be duplicated more easily.
Note that any link to a page that the integration doesn't have access to will be skipped entirely (Notion returns an "Unsupported Block" in this case).
When this plugin is enabled, any "blue" colored toggle block will have its children directly rendered. The text on the toggle itself will be ignored.
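An n2y run is divided into four stages:
1. Loading the configuration (config.py)
2. Retrieving data from Notion and instantiating the Page, Block, RichText, etc. objects
3. Converting to the pandoc AST (block.to_pandoc())
4. Writing the output (export.py)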
Every page object has a parent property, which may be a page, a database, or a workspace.
Every block has a page property, which refers to the page that contains it.
Every rich text object and mention has a block property that either refers back to the block that contains it, or is None if the rich text is within a property value or some other location.
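How these back-references chain together can be shown with simplified stand-in dataclasses that carry only the attributes described above (parent, page, block). These are hypothetical and are not n2y's actual classes; breadcrumb walks from a rich text up to the workspace.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Workspace:
    name: str = "workspace"


@dataclass
class Page:
    title: str
    parent: Union[Page, Database, Workspace]


@dataclass
class Database:
    title: str
    parent: Union[Page, Workspace]


@dataclass
class Block:
    page: Page  # the page that contains the block


@dataclass
class RichText:
    plain_text: str
    block: Optional[Block] = None  # None when the text lives in a property value


def breadcrumb(rich_text: RichText) -> List[str]:
    """Walk rich text -> block -> page -> parents, stopping at the workspace."""
    if rich_text.block is None:
        return []
    trail: List[str] = []
    node = rich_text.block.page
    while not isinstance(node, Workspace):
        trail.append(node.title)
        node = node.parent
    return list(reversed(trail))


home = Page("Home", Workspace())
spec = Page("Spec", Database("Docs", home))
print(breadcrumb(RichText("hello", Block(spec))))  # ['Home', 'Docs', 'Spec']
```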
During development, it can be convenient to cache requests to Notion. This feature can be enabled by setting the N2Y_CACHE environment variable to the location of the SQLite file you'd like to use as your Notion request cache. To flush the cache, simply delete the file. Note that this feature is only available if you install the dev dependencies.
Any git commit tagged with a string starting with "v" will automatically be published to PyPI.
Be sure to update the setup.py version number first.
You can create the tag using, e.g., git tag v0.3.4, and you can push it using git push && git push --tags.
Before pushing such commits, be sure to update the change log below.
To work on the repository, clone the git repo. From within the repo (and ideally within a Python virtual environment), install n2y from the local copy, including the dev dependencies:
pip install -e '.[dev]'
You can then run the tests and the linter as follows:
flake8 .
pytest tests
n2y is built on top of pandoc, using it to build intermediate representations and to generate output. Familiarity with the pandoc AST is important for n2y developers. Documentation is available for the native Haskell AST and for the corresponding Python wrapper library used directly by n2y.
As part of a typical development cycle, you can use the pandoc CLI to inspect the relationship between, for example, GitHub-flavored Markdown and the native pandoc AST. The following CLI invocation of pandoc takes in the GitHub-flavored Markdown file "example.md", converts it to the pandoc AST in JSON format, and pipes this output to the jq command-line JSON processor for viewing in the terminal:
pandoc -f gfm -t json example.md | jq .
The following command performs a full round-trip from GitHub-flavored Markdown to the pandoc AST and back to GitHub-flavored Markdown:
pandoc -f gfm -t json example.md | pandoc -f json -t gfm
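The same inspection can be scripted from Python. In pandoc's JSON output, the top-level "blocks" list holds objects tagged with a "t" constructor name such as "Para" or "Header". This sketch assumes pandoc is on PATH for markdown_to_ast_json; block_type_counts only needs the JSON text. Both helper names are hypothetical.

```python
import json
import subprocess
from collections import Counter
from typing import Dict


def block_type_counts(ast_json: str) -> Dict[str, int]:
    """Count the top-level block constructors in a pandoc JSON AST."""
    ast = json.loads(ast_json)
    return dict(Counter(block["t"] for block in ast["blocks"]))


def markdown_to_ast_json(path: str) -> str:
    """Run `pandoc -f gfm -t json PATH` and return its stdout (needs pandoc on PATH)."""
    return subprocess.run(
        ["pandoc", "-f", "gfm", "-t", "json", path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
```

For example, block_type_counts(markdown_to_ast_json("example.md")) summarizes which block constructors a Markdown file produces.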
Here are some features we're planning to add in the future:
…

Change log:
- … _get_child_blocks to the ExpandingLinkToPageBlock in the expandlinktopages.py … Client.get_child_blocks inside of the to_pandoc method.
- … page argument to the self.page of the ExpandingLinkToPageBlock instance, … ExpandingLinkToPageBlock and not the linked page itself.
- … UniqueIdProperty class to represent database properties of the "unique_id" type
- … UniqueIdPropertyValue class to represent page property values of the "unique_id" type
- … APIErrorCode class so that the in keyword can be used on it and to have two new …
- … internallinks.py plugin that adds a resolver for internal links
- … ConnectionThrottled exception inherit the HTTPResponseError exception and update tests
- … 104 error status code from the list of error status codes that initiate a retry in the retry_api_call wrapper function.
- … Client.retry attribute to determine whether or not API calls should be retried after … retry_api_call wrapper function.
- … ConnectionThrottled exception
- … n2y.logger module, pass it as an argument wherever …
- … rich_text.py
- … pandoc_options setting to wherever pandoc.write is called for consistency. Also, … Client().page_class_is_in_use to Client().class_is_in_use
- … Client().get_notion_user in favor of Client().wrap_notion_user
- … request_id to the list of keys that are not to be copied in Client()._copy_notion_database_child_page
- … request_id to the list of keys that are not to be copied in Client()._copy_notion_database_child_database
- … mock_comment to notion_mocks.py
- … downloadfileproperty.py to plugins which downloads the files in each file property value to …
- … utils.py
- … property_map option
- … filename_property configuration option with the more generic filename_template.
- … jinjarenderpage so that it no pulls databases mentioned in the …
- … jinjarenderpage plugin to render jinja in code blocks
- … Client.append_child_notion_blocks() …
- … get_children() method to databases and pages in order to update Database._children and Page._children manually
- … Client.copy_notion_database_children() which allows users to copy a list of children (pages) into another database
- … Client.append_child_notion_blocks() (it now copies database children the appended child_databases)
- … init.py file must have been deleted).
- … Client.append_child_notion_blocks() had a typo that is now fixed. It also limits the number of children sent to the Notion API to be appended to 100 at a time, per their parameters
- … Block class to allow for the updating of children lists after children have been appended.
- … child_page and child_database block types.
- … Block class objects is now stored by them in the Block.notion_data attribute and what used to be stored there (the notion type data) is now stored in a property called Block.notion_type_data
- … pandoc_format and pandoc_options fields, making it possible to output to any format that pandoc supports.
- … property_map option
- … --url-property.
- … n2y.plugins.footnotes plugin
- … n2yaudit tool.
- … id_property commandline argument
- … --database-config property
- … n2y.plugins.removecallouts plugin