percollate

A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.

MIT License

percollate -

Published by danburzo over 3 years ago

Bug fixes

  • Fixes the rendering of non-utf-8-encoded web pages (#118, #119, thanks @yashha!).

percollate -

Published by danburzo over 3 years ago

Bug fixes

  • Fixes the error thrown when converting multiple URLs to PDF without providing an explicit --author (#120, thanks @yashha).

percollate - v1.2.0

Published by danburzo almost 4 years ago

New features

Hyphenation

Added support for hyphenation with Hyphenopoly, based on the explicit document language or best guess with franc. Hyphenation is enabled by default for PDF, and disabled for EPUB and HTML. The --hyphenate and --no-hyphenate flags let you explicitly opt in or out of the feature.

Thanks @yashha for this feature!
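
The defaults described above can be sketched as a small resolution function. This is an illustration only, not percollate's actual code: the function name `shouldHyphenate` and its parameters are hypothetical, and `flag` is assumed to be `true` for --hyphenate, `false` for --no-hyphenate, and `undefined` when neither flag is passed.

```javascript
// Hypothetical helper, not taken from percollate's source.
// format: 'pdf' | 'epub' | 'html'
// flag:   true (--hyphenate), false (--no-hyphenate), or undefined (neither given)
function shouldHyphenate(format, flag) {
  // An explicit flag always wins over the per-format default.
  if (typeof flag === 'boolean') {
    return flag;
  }
  // Default: enabled for PDF, disabled for EPUB and HTML.
  return format === 'pdf';
}

console.log(shouldHyphenate('pdf'));        // true
console.log(shouldHyphenate('epub'));       // false
console.log(shouldHyphenate('epub', true)); // true
console.log(shouldHyphenate('pdf', false)); // false
```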

Bug fixes

  • Wrap <pre> elements in <figure> elements to make sure Readability doesn't strip them out (#66);
  • Wait for the EPUB file to be fully generated in the epub() call (thanks @pascalw!).

Other

  • Added README section on community-maintained packages; linked to nodejs-percollate AUR package (thanks @pedrolucasp!).
  • Upgraded dependencies to their respective latest versions.

percollate -

Published by danburzo about 4 years ago

Note: I botched v1.1.1 — never release at midnight!

percollate -

Published by danburzo about 4 years ago

This release sees a few new features and improvements from @yashha:

  • We now use pdf-lib to add a title and author to the PDF metadata (#88);
  • Added an --author option to the CLI to use for populating PDF and EPUB metadata (#104);
  • Prevent headings from appearing by themselves at the end of a page through some clever CSS (#110).

percollate -

Published by danburzo about 4 years ago

  • Escaped ampersands in EPUB metadata;
  • Added percollate User Agent string to all requests;
  • When using the --debug flag, log fetches for EPUB resources (images, etc.).

percollate -

Published by danburzo about 4 years ago

Functionally identical to 0.8.2, but declaring an official 1.0.0 release!

percollate -

Published by danburzo about 4 years ago

Bug fixes:

  • EPUB: Fixes a regression introduced in v0.8.0 due to DOMPurify serializing the content as HTML rather than XHTML.
  • EPUB: Make the process more resilient to errors in fetching remote resources.
  • Ignore URLs that don't point to HTML files.

percollate -

Published by danburzo about 4 years ago

Bump the required Node version to 10.18.1 as needed by the puppeteer package. (#101)

percollate -

Published by danburzo about 4 years ago

  • De-vendorize Readability and use the freshly minted @mozilla/readability npm package instead (thanks @gijsk!)
  • Sanitize the metadata extracted with Readability with DOMPurify
  • Make slugify() stricter about which characters to leave in the file name
  • Configure Puppeteer to produce tagged PDF files (See #47)
  • Accept the file:// protocol, and absolute/relative paths to files on disk (See #34)
  • Make JSON-LD extraction handle more cases
  • Log, but don't break on, invalid srcset attributes
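
To illustrate what a stricter file-name slug might look like, here is a minimal sketch. The release notes do not say which characters percollate's slugify() keeps, so the rules below (ASCII letters and digits only, everything else collapsed into hyphens) and the name `strictSlug` are assumptions made for the example.

```javascript
// Hypothetical sketch; percollate's real slugify() may use different rules.
function strictSlug(title) {
  return title
    .normalize('NFKD')                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')       // collapse every other run of characters into one hyphen
    .replace(/^-+|-+$/g, '');          // trim leading and trailing hyphens
}

console.log(strictSlug('Hello, World!')); // hello-world
console.log(strictSlug('Café: a/b?'));    // cafe-a-b
```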
percollate -

Published by danburzo about 4 years ago

CLI changes:

  • the --cover flag is enabled implicitly when using the --title option or when bundling more than one item; disable the cover page with --no-cover;
  • the --toc flag is enabled implicitly when bundling more than one item; disable the ToC with --no-toc.
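
The implicit defaults in the two rules above can be sketched as follows. The function `resolveBundleOptions` and its option names are hypothetical and only model the rules as stated; `cover` and `toc` are assumed to be `true` or `false` when the corresponding flag (or its --no- variant) was passed, and `undefined` otherwise.

```javascript
// Hypothetical helper modelling the rules above; not percollate's actual code.
function resolveBundleOptions({ cover, toc, title, itemCount }) {
  const multiple = itemCount > 1;
  return {
    // Explicit --cover / --no-cover wins; otherwise a title or a bundle implies a cover.
    cover: typeof cover === 'boolean' ? cover : Boolean(title) || multiple,
    // Explicit --toc / --no-toc wins; otherwise only a bundle implies a ToC.
    toc: typeof toc === 'boolean' ? toc : multiple
  };
}

console.log(resolveBundleOptions({ itemCount: 1 }));
// { cover: false, toc: false }
console.log(resolveBundleOptions({ title: 'My reading list', itemCount: 1 }));
// { cover: true, toc: false }
console.log(resolveBundleOptions({ itemCount: 3, cover: false }));
// { cover: false, toc: true }
```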

Improvements:

  • When bundling multiple items, ignore failing conversions and bundle the rest of the items
  • Readability: prefer JSON-LD Schema.org metadata (#72)
  • Fixes GitHub heading anchors (#49)
  • Lazy loaded images: support more common scenarios (attributes using the data-lazy- prefix)
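
As a rough illustration of the lazy-loading case, the sketch below promotes attributes carrying the data-lazy- prefix to their plain names. It works on a plain object of attributes rather than on DOM nodes, and the function name `promoteLazyAttributes` is hypothetical; how percollate handles this internally is not described in these notes.

```javascript
// Hypothetical sketch operating on an attribute map, not on real DOM elements.
function promoteLazyAttributes(attrs) {
  const prefix = 'data-lazy-';
  const result = { ...attrs };
  for (const [name, value] of Object.entries(attrs)) {
    // e.g. data-lazy-src -> src, data-lazy-srcset -> srcset
    if (name.startsWith(prefix) && name.length > prefix.length) {
      result[name.slice(prefix.length)] = value;
    }
  }
  return result;
}

const img = promoteLazyAttributes({
  src: 'placeholder.gif',
  'data-lazy-src': 'photo.jpg',
  alt: 'A photo'
});
console.log(img.src); // photo.jpg
```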
percollate -

Published by danburzo about 4 years ago

Adds support for a cover page (#44) using the --cover flag.

percollate -

Published by danburzo about 4 years ago

  • Use the value of the --output option as a prefix for file names when converting multiple files with the --individual flag.
  • Switched fetch implementation from got to the smaller node-fetch package.

percollate -

Published by danburzo about 4 years ago

You can now generate (basic) EPUB and HTML with the epub and html commands, respectively.

Added the ability to read HTML content from stdin with the - operand; you can pass the original URL with the -u / --url option.

percollate -

Published by danburzo about 4 years ago

Enhancement: expand <details> elements.

Chore: upgrade dependencies to their latest versions.

percollate - Percollate 0.3.0

Published by danburzo almost 6 years ago

Prefer the AMP version of an article, if available.

Support for lazy-loaded images. (#71)

Increased the Puppeteer navigation timeout to 2 minutes (#80, thanks @butu5!). Also added a --debug flag to print more information about the process.

Fixed the encoding of URLs before they are fetched. (#83, thanks @ncsing!)
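
One common way to make sure a URL is properly encoded before requesting it is to pass it through the standard WHATWG `URL` class, which percent-encodes spaces and non-ASCII characters. This is a general sketch of that technique, not necessarily how the fix was implemented in percollate; the helper name `encodeForFetch` is hypothetical.

```javascript
// Hypothetical helper; `URL` is a global in modern Node.js and in browsers.
function encodeForFetch(rawUrl) {
  // Parsing and re-serializing percent-encodes unsafe characters
  // and leaves sequences that are already encoded untouched.
  return new URL(rawUrl).href;
}

console.log(encodeForFetch('https://example.com/a b?q=café'));
// https://example.com/a%20b?q=caf%C3%A9
```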

Generate a Table of Contents page (#81, thanks @guybedo!) when using the --toc option.