parse5

HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.

MIT License

Downloads
245.7M
Stars
3.6K
Committers
39

parse5 - Latest Release

Published by fb55 almost 2 years ago

This release includes parse5 and parse5-parser-stream.

Full Changelog: https://github.com/inikulin/parse5/compare/v7.1.0...v7.1.2

parse5 - v7.1.0

Published by fb55 about 2 years ago

This release is only for the parse5 module.

What's Changed

New Contributors

Full Changelog: https://github.com/inikulin/parse5/compare/v7.0.0...v7.1.0

parse5 - v7.0.0

Published by fb55 over 2 years ago

Welcome to parse5@7.0.0! ✨ This is a huge release with many changes, features and fixes.

From an organisational perspective, the most important change is that parse5 is now maintained by a team, consisting of James (@43081j), Titus (@wooorm) and me (@fb55). We come from three projects that rely on parse5 — namely Cheerio, rehype, and Lit.

We need your support to continue the project! If you care about parse5, please support us financially on OpenCollective.

Headlining features of this release are ES Modules, TypeScript, and performance improvements: 7.0.0 is 45% faster than 6.0.1 with default options, and 167% faster with location information enabled (for the bench/perf benchmark, on an M1 Mac). Version 7.0.0 is a revamp of every part of the library. There are too many changes to list them all here, so here is a high-level overview:

Breaking: ESM

All of parse5’s packages are now ECMAScript Modules. We are providing dual packages for parse5 and parse5-htmlparser2-tree-adapter for now (see https://github.com/inikulin/parse5/pull/418 and https://github.com/inikulin/parse5/pull/496).

To migrate, please read this Gist on how to update. Note that private internals are no longer available; instead, everything that you need should be imported from the main package.

Implemented by @43081j in #351

Breaking: TypeScript

The codebase has been ported to TypeScript. This helped uncover a number of subtle logic bugs, such as dc4e269022ebbae0767d8f790a29d6be1835fe1e, b4b5d4ad6f90b3c9fd03a90e2ed5267929979a11, or a0aff9578bb44511bc169c1d7f9e2f2780f7f8a0. TypeScript also helps us refactor with confidence and a lot of the changes in this release would have been much harder to do without it.

To migrate, please remove @types/parse5* as we now ship our own types.

Implemented by @fb55 in #362

Potentially breaking changes

If you are using deep imports for any parts of the codebase, you will likely encounter some breakages:

Other changes

  • minor add hooks for stack events to tree adapter interface #385
  • minor add support for fragments in parse5-parser-stream #487
  • minor add serializeOuter (like .outerHTML), scriptingEnabled option #383
  • patch fix << in comments being wrongly parsed as <! (#326)
  • patch fix position of endTag for mixed-case foreign elements (#353)
  • patch fix end position of html, body (#436)
  • docs: parse5 has a new documentation website at parse5.js.org #443

New Contributors

Thanks @anko, @TrySound, @samouri, @alan-agius4, and @pmdartus!

Full Changelog: https://github.com/inikulin/parse5/compare/v6.0.1...v7.0.0

parse5 -

Published by fb55 almost 3 years ago

  • Added (breaking): Tree adapter interface now has updateNodeSourceCodeLocation method which enables usage of custom location info formats (GH #314) (by @DMartens).
parse5 -

Published by fb55 almost 3 years ago

  • Fixed: Handling of self-closing <hr> tags (by @43081j).
  • Fixed: Broken link in TreeAdapter document (GH #317) (by @ursm).
  • Fixed: SAXParser example (GH #316) (by @mvasilkov).
parse5 - v5.1.1

Published by inikulin almost 5 years ago

  • Fixed: Serialization of attributes in non-standard namespaces (by @Zirro).
  • Fixed: Quirks and limited-quirks mode detection by doctype (by @squidfunk).
parse5 - v5.1.0

Published by RReverser about 6 years ago

  • Fixed: Location info for text events in SAXParser and RewritingStream now contains
    correct endCol and endLine covering all concatenated raw tokens (GH #266).
  • Fixed: SAXParser and RewritingStream now flush last buffered chunk when calling .end() with
    no parameters (GH #271).
  • Updated (breaking): ParserStream, SAXParser and RewritingStream no longer assume that
    each binary chunk is a valid finished UTF-8 chunk, and instead accept only decoded strings (GH #269).
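
Because the streaming parsers now accept only decoded strings, callers that read raw bytes need to decode them first, and a multi-byte character may be split across two chunks. The sketch below shows one way to handle that with the standard `TextDecoder` in streaming mode. `decodeChunks` is a hypothetical helper for illustration, not part of parse5; the strings it returns are what would be written to the parser stream.

```typescript
// Hypothetical helper (not part of parse5): turns raw UTF-8 byte chunks into
// strings that can be written to a string-only parser stream.
function decodeChunks(chunks: Uint8Array[]): string[] {
    const decoder = new TextDecoder('utf-8');
    const decoded: string[] = [];

    for (const chunk of chunks) {
        // `stream: true` keeps an incomplete trailing byte sequence buffered
        // until the next chunk completes it.
        const text = decoder.decode(chunk, { stream: true });
        if (text.length > 0) decoded.push(text);
    }

    // Flush whatever is still buffered at the end of the input.
    const rest = decoder.decode();
    if (rest.length > 0) decoded.push(rest);

    return decoded;
}

// "é" takes two bytes in UTF-8, so cut the input in the middle of it.
const bytes = new TextEncoder().encode('<p>é</p>');
const parts = decodeChunks([bytes.slice(0, 4), bytes.slice(4)]);
console.log(parts.join(''));
```

Decoding each chunk independently would instead produce replacement characters at the split point, which is the failure mode this change avoids inside the library.
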
parse5 - v5.0.0

Published by inikulin over 6 years ago

Starting from this release, parse5 functionality is shipped in separate packages, with the parse5 package containing only basic functionality. Please refer to the list of packages for more info.

  • Updated (breaking): Source code location is now inserted by the tree adapter, so tree adapter developers have control over the location info property name. Tree adapters should implement the setNodeSourceCodeLocation and getNodeSourceCodeLocation methods. The location info property added by the currently implemented tree adapters has been renamed from __location to sourceCodeLocation (GH #189).

  • Updated (breaking): Location info line and col properties have been renamed to startLine and startCol respectively.

  • Updated (breaking): SAXParser now passes token objects to event handlers instead of separate arguments (GH #247). See SAXParser documentation for more info.

  • Added: endLine and endCol location info properties.

  • Added: scriptingEnabled flag in ParserOptions, which controls how <noscript> tags are handled by the parser (GH #192).

  • Added: HTML rewriting stream (GH #222).

  • Removed (breaking): parse5 no longer ships TypeScript definitions. Existing TypeScript definitions have been moved to the DefinitelyTyped repo. Please track the PR in the DefinitelyTyped repo for updates.
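
Code that has to work with trees produced both before and after the location renames above can read the start position through a small shim. This is only a sketch: `NodeLike` and `startPosition` are hypothetical names, and the interfaces list only the properties named in these notes rather than the full location objects.

```typescript
// Location shape before v5.0.0, stored under `__location`.
interface LegacyLocation {
    line: number;
    col: number;
}

// Location shape from v5.0.0 on, stored under `sourceCodeLocation`.
interface CurrentLocation {
    startLine: number;
    startCol: number;
    endLine: number;
    endCol: number;
}

// Hypothetical node type: only the two location properties matter here.
interface NodeLike {
    __location?: LegacyLocation;
    sourceCodeLocation?: CurrentLocation;
}

// Returns the start position regardless of which shape the node carries.
function startPosition(node: NodeLike): { line: number; col: number } | undefined {
    if (node.sourceCodeLocation) {
        const { startLine, startCol } = node.sourceCodeLocation;
        return { line: startLine, col: startCol };
    }
    if (node.__location) {
        return { line: node.__location.line, col: node.__location.col };
    }
    // Location info was not enabled, or the node has none.
    return undefined;
}

const current: NodeLike = {
    sourceCodeLocation: { startLine: 1, startCol: 5, endLine: 1, endCol: 9 },
};
const legacy: NodeLike = { __location: { line: 2, col: 3 } };

console.log(startPosition(current), startPosition(legacy), startPosition({}));
```
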

parse5 - v4.0.0

Published by inikulin almost 7 years ago

This is a major release that delivers a few minor (but breaking) changes to work around recent issues with the versioning of the TypeScript Node.js typings and with the use of parse5 in environments other than Node.js (see https://github.com/inikulin/parse5/issues/235 for details).

  • Updated (breaking): TypeScript definitions are now disabled by default. See the TypeScript definitions section for details on how to enable them.
  • Updated: APIs that depend on Node.js-specific functionality (namely ParserStream, PlainTextConversionStream, SerializerStream, SAXParser) are now lazily loaded. This enables bundling of the basic functionality for other platforms (e.g. for browsers via webpack).
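
Lazy loading of this kind is commonly done with a property getter that defers the expensive load until first access. The following is a generic sketch of that pattern, not parse5's actual implementation; `loadStreamApi` is a hypothetical stand-in for a platform-specific module load.

```typescript
let loadCount = 0;

// Stand-in for an expensive, platform-specific module load
// (in a Node.js library this would typically be a `require()` call).
function loadStreamApi(): { name: string } {
    loadCount++;
    return { name: 'ParserStream' };
}

const api = {} as { ParserStream: { name: string } };

Object.defineProperty(api, 'ParserStream', {
    enumerable: true,
    configurable: true,
    get() {
        const value = loadStreamApi();
        // Replace the getter with the loaded value so later reads are plain lookups.
        Object.defineProperty(api, 'ParserStream', { value, enumerable: true });
        return value;
    },
});

const before = loadCount; // nothing has been loaded yet
const first = api.ParserStream; // triggers the load
const second = api.ParserStream; // served from the cached value
console.log(before, loadCount, first === second);
```

Consumers that never touch the stream-based APIs never trigger the load, which is what lets a bundler target platforms where those dependencies are unavailable.
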
parse5 - v3.0.3

Published by inikulin almost 7 years ago

  • Fixed: Loosen the dependency version of @types/node (by @gfx).
  • Fixed: Incorrect AST generated if empty string fed to ParserStream (GH #195) (by @stevenvachon).
parse5 - v3.0.2

Published by inikulin over 7 years ago

  • Fixed: location.startTag is not available if end tag is missing (GH #181).
parse5 - v3.0.1

Published by inikulin almost 8 years ago

  • Fixed: MarkupData.Location.col description in TypeScript definition file (GH #170).
parse5 - v3.0.0

Published by inikulin almost 8 years ago

  • Added: parse5 now ships with TypeScript definitions, from which the new documentation website is generated (GH #125).
  • Added: PlainTextConversionStream (GH #135).
  • Updated: Significantly reduced initial memory consumption (GH #52).
  • Updated (breaking): Added support for limited quirks mode. document.quirksMode property was replaced with document.mode property which can have
    'no-quirks', 'quirks' and 'limited-quirks' values. Tree adapter setQuirksMode and isQuirksMode methods were replaced with setDocumentMode and getDocumentMode methods (GH #83).
  • Updated (breaking): AST collections (e.g. attributes dictionary) don't have prototype anymore (GH #119).
  • Updated (breaking): Doctype now always serialized as <!DOCTYPE html> as per spec (GH #137).
  • Fixed: Incorrect line for __location.endTag when the start tag contains newlines (GH #166) (by @webdesus).
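
The prototype-less collections mentioned above behave differently from ordinary objects: inherited methods such as `hasOwnProperty` are simply absent. The sketch below illustrates this with a plain `Object.create(null)` dictionary; `attrs` and `hasKey` are illustrative names, not parse5 API.

```typescript
// A dictionary without a prototype, similar in spirit to the AST collections.
const attrs: Record<string, string> = Object.create(null);
attrs['class'] = 'note';

// Safe membership check that does not rely on inherited methods.
function hasKey(dict: Record<string, string>, key: string): boolean {
    return Object.prototype.hasOwnProperty.call(dict, key);
}

// Inherited helpers are not available on the dictionary itself.
const inheritedType = typeof (attrs as any).hasOwnProperty;

console.log(inheritedType); // "undefined"
console.log(hasKey(attrs, 'class')); // true
console.log('toString' in attrs); // false: no inherited keys leak in
console.log(Object.keys(attrs)); // [ 'class' ]
```

Code that previously called `dict.hasOwnProperty(key)` on such a collection needs to switch to a form like `hasKey`, or to the `in` operator.
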
parse5 - v2.2.3

Published by inikulin about 8 years ago

  • Fixed: Incorrect LocationInfo.endOffset for non-implicitly closed elements (refix for GH #109) (by @wooorm).
parse5 - v2.2.2

Published by inikulin about 8 years ago

  • Fixed: Incorrect location info for text in SAXParser (GH #153).
  • Fixed: Incorrect LocationInfo.endOffset for implicitly closed <p> element (GH #109).
  • Fixed: Infinite input data buffering in streaming parsers. Parsers now try not to buffer more than 64K of input data. Some edge cases that lead to significant memory consumption remain, but they are quite exotic and extremely rare in the wild (GH #102, GH #130).
parse5 - v2.2.1

Published by inikulin about 8 years ago

  • Fixed: SAXParser HTML integration point handling for adjustable SVG tags.
  • Fixed: SAXParser now adjusts SVG tag names for end tags.
  • Fixed: Location info line calculation on tokenizer character unconsumption (by @ChadKillingsworth).
parse5 - v2.2.0

Published by inikulin about 8 years ago

  • SAXParser (by @RReverser)

  • Latest spec changes

    • Updated: <isindex> no longer has special handling (GH #122).
    • Updated: Adoption agency algorithm now preserves lexical order of text nodes (GH #129).
    • Updated: <menuitem> now behaves like <option>.
  • Fixed: Element nesting corrections now take namespaces into consideration.

parse5 - v2.1.5

Published by inikulin over 8 years ago

  • Fixed: ParserStream accidentally hangs up on scripts (GH #101).
parse5 - v2.1.4

Published by inikulin over 8 years ago

  • Fixed: Keep ParserStream sync for the inline scripts (GH #98 follow up).
parse5 - v2.1.3

Published by inikulin over 8 years ago

  • Fixed: Synchronously calling resume() leads to crash (GH #98).