Alfred Workflow to extract annotations from PDF files.
MIT License
An Alfred workflow that extracts PDF annotations as a Markdown file. It is primarily intended for scientific papers, but can also be used for non-academic PDF files.
Automatically determines correct page numbers, inserts them as Pandoc citations, merges highlights across page breaks, prepends a YAML header with bibliographic information, and more.
Install pdfannots2json by running the following command in your terminal: `brew install mgmeyers/pdfannots2json/pdfannots2json`
PDF Annotation Extractor works on any PDF that has valid annotations saved in the PDF file. Some PDF readers like Skim or Zotero 6 do not store annotations in the PDF itself by default.
This workflow automatically determines the citekey based on the filename of your PDF file.
If no matching citekey is found, PDF Annotation Extractor still extracts the annotations, but without the bibliographic information.

The filename should begin with the citekey (without `@`) and can follow the pattern `{citekey}_{title}.pdf`. For example, with the filename `Grieser2023_Interdependent Technologies.pdf`, the citekey is `Grieser2023`.

[!TIP] You can achieve such a filename pattern with the automatic renaming rules of most reference managers, for example with the ZotFile plugin for Zotero or the AutoFile feature of BibDesk.
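The filename convention can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation: the function name `citekey_from_filename` is hypothetical, and the sketch assumes the citekey is simply everything before the first underscore in the filename.

```python
from pathlib import PurePath


def citekey_from_filename(filename: str) -> str:
    """Return the part of the filename before the first underscore.

    Assumption: the citekey itself contains no underscore and the
    file follows the `{citekey}_{title}.pdf` pattern.
    """
    stem = PurePath(filename).stem  # drop the ".pdf" extension
    return stem.split("_", 1)[0]    # keep only the text before the first "_"


print(citekey_from_filename("Grieser2023_Interdependent Technologies.pdf"))
```

A filename without any underscore, such as `Grieser2023.pdf`, yields the whole stem as citekey under this assumption.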
Use the hotkey to trigger the annotation extraction on the PDF file currently selected in Finder. The hotkey also works when triggered from PDF Expert or Highlights. Alternatively, use the `anno` keyword to search for PDFs and select one.
Annotation Types extracted
Reminders.app
as a task due today in the default list

Instead of the PDF page numbers, this workflow retrieves information about the real page numbers from the BibTeX library and inserts them. If there is no page data in the BibTeX entry (for example, for monographs), you are prompted to enter the page number manually.
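The relation between PDF page numbers and real page numbers can be sketched as a simple offset. The following Python snippet is only an illustration under stated assumptions: the function names are hypothetical, the mapping is assumed to be linear, and the exact citation format the workflow writes is not confirmed here (the `[@citekey, p. N]` form is standard Pandoc citation syntax).

```python
def real_page(pdf_page: int, first_real_page: int) -> int:
    """Map a 1-based PDF page index to the real page number.

    Assumption: pages are numbered consecutively, so the real number is
    the real number of the first PDF page plus the distance from it.
    """
    return first_real_page + (pdf_page - 1)


def pandoc_citation(citekey: str, page: int) -> str:
    """Format a Pandoc citation with a page locator."""
    return f"[@{citekey}, p. {page}]"


# An article whose first PDF page is printed as page 123:
print(pandoc_citation("Grieser2023", real_page(3, 123)))
```

With a negative first page number, the same arithmetic places real page 1 later in the PDF: `real_page(12, -10)` evaluates to `1`.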
The real page number 1 often occurs later in the PDF. If the first PDF page would then have the page number -10, you enter the value -10 when prompted for a page number.

Insert the following codes at the beginning of an annotation to invoke special actions on that annotation. Annotation codes do not apply to strikethroughs.
- `+`: Merge this highlight with the previous highlight or underline. This also works for highlights split across a page break.
- `? foo` (free comments): Turns "foo" into a question callout (`> [!QUESTION]`) and moves it up. (Callouts are Obsidian-specific.)
- `##`: Turns highlighted text into a heading that is added at that location. The number of `#` determines the heading level. If the annotation is a free comment, the text after the `#` is used as heading instead. (The space after the `#` is required.)
- `=`: Adds highlighted text as tags to the YAML frontmatter. If the annotation is a free comment, the text after the `=` is used instead. In both cases, the annotation is removed afterward.
- `_`: A copy of the annotation is sent to Reminders.app as a task due today.

[!TIP] You can run the Alfred command `acode` to display a cheat sheet of all annotation codes.
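To make the annotation codes more concrete, here is a small Python sketch that classifies a comment by its leading code. It is not the workflow's own code: `Action`, `parse_annotation`, and the action names are hypothetical, and the sketch does not model how highlighted text and comment text are combined or how strikethroughs are skipped.

```python
import re
from typing import NamedTuple


class Action(NamedTuple):
    kind: str       # "merge", "question", "heading", "tags", "task", or "plain"
    text: str       # comment text with the leading code removed
    level: int = 0  # heading level, only meaningful for "heading"


def parse_annotation(comment: str) -> Action:
    """Classify a comment by the annotation code at its beginning."""
    stripped = comment.lstrip()
    # One or more "#", optionally followed by whitespace and heading text.
    heading = re.fullmatch(r"(#+)(?:\s+(.*))?", stripped, re.DOTALL)
    if heading:
        return Action("heading", heading.group(2) or "", len(heading.group(1)))
    # Single-character codes mapped to hypothetical action names.
    codes = {"+": "merge", "?": "question", "=": "tags", "_": "task"}
    if stripped[:1] in codes:
        return Action(codes[stripped[0]], stripped[1:].lstrip())
    return Action("plain", stripped)


print(parse_annotation("## Methods"))
print(parse_annotation("? foo"))
```

Returning a small named tuple keeps the classification separate from whatever is done with it afterward, such as writing a callout or creating a reminder.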
Images are saved in the `attachments` sub-folder of the output location and named `{citekey}_image{n}.png`. They are embedded with the `![[ ]]` syntax, for example `![[filename.png|foobar]]`.
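The image naming and embed syntax can be sketched in Python as follows. The helper names and the `label` parameter are hypothetical, and the sketch makes no claim about how the workflow numbers images or what it puts after the `|`.

```python
from pathlib import PurePosixPath


def image_path(output_dir: str, citekey: str, n: int) -> PurePosixPath:
    """Build the path of the n-th image inside the `attachments` sub-folder."""
    return PurePosixPath(output_dir) / "attachments" / f"{citekey}_image{n}.png"


def embed(filename: str, label: str = "") -> str:
    """Return a `![[ ]]` embed, with an optional part after the pipe."""
    return f"![[{filename}|{label}]]" if label else f"![[{filename}]]"


path = image_path("notes", "Grieser2023", 1)
print(path)
print(embed(path.name, "foobar"))
```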
Any `rectangle` type annotation in the PDF is extracted as an image.

Update `pdfannots2json` by running `brew upgrade pdfannots2json` in your terminal.

[!NOTE] As a fallback, you can use `pdfannots` as the extraction engine, as a different PDF engine sometimes fixes issues. This requires installing pdfannots via `pip3 install pdfannots` and switching the fallback engine in the settings. Note that `pdfannots` does not support image extraction and its extraction quality is slightly worse, so you generally want to use `pdfannots2json`.
If you want to mention this software project in an academic publication, please cite it as:
Grieser, C. (2023). PDF Annotation Extractor [Computer software].
https://github.com/chrisgrieser/pdf-annotation-extractor-alfred
For other citation styles, use the following metadata:
pdfannots
In my day job, I am a sociologist studying the social mechanisms underlying the digital economy. For my PhD project, I investigate the governance of the app economy and how software ecosystems manage the tension between innovation and compatibility. If you are interested in this subject, feel free to get in touch.