HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.
MIT License
This release includes parse5 and parse5-parser-stream.
ERR as ErrorCodes by @milahu in https://github.com/inikulin/parse5/pull/704
parser-stream by @fb55 in https://github.com/inikulin/parse5/pull/716
Full Changelog: https://github.com/inikulin/parse5/compare/v7.1.0...v7.1.2
Published by fb55 about 2 years ago
This release is only for the parse5 module.
</button> close <p> by @fb55 in https://github.com/inikulin/parse5/pull/534
Full Changelog: https://github.com/inikulin/parse5/compare/v7.0.0...v7.1.0
Welcome to parse5@7.0.0! ✨ This is a huge release with many changes, features and fixes.
From an organisational perspective, the most important change is that parse5 is now maintained by a team, consisting of James (@43081j), Titus (@wooorm) and me (@fb55). We come from three projects that rely on parse5 — namely Cheerio, rehype, and Lit.
We need your support to continue the project! If you care about parse5, please support us financially on OpenCollective.
Headlining features of this release are ES Modules, TypeScript, and performance improvements: 7.0.0 is 45% faster than 6.0.1 with default options, and 167% faster with location information enabled (for the bench/perf benchmark, on an M1 Mac). Version 7.0.0 is a revamp of every part of the library. There are too many changes to list them all here, so here is a high-level overview:
All of parse5’s packages are now ECMAScript Modules. We are providing dual packages for parse5 and parse5-htmlparser2-tree-adapter for now (see https://github.com/inikulin/parse5/pull/418 and https://github.com/inikulin/parse5/pull/496).
To migrate, please read this Gist on how to update. Note that private internals are no longer available; instead, everything that you need should be imported from the main package.
Implemented by @43081j in #351
The codebase has been ported to TypeScript. This helped uncover a number of subtle logic bugs, such as dc4e269022ebbae0767d8f790a29d6be1835fe1e, b4b5d4ad6f90b3c9fd03a90e2ed5267929979a11, or a0aff9578bb44511bc169c1d7f9e2f2780f7f8a0. TypeScript also helps us refactor with confidence and a lot of the changes in this release would have been much harder to do without it.
To migrate, please remove @types/parse5* as we now ship our own types.
Implemented by @fb55 in #362
parse5-serializer-stream package was removed https://github.com/inikulin/parse5/pull/481
serialize function exported by parse5.
domhandler’s node interface (https://github.com/inikulin/parse5/pull/327 by @TrySound)
If you are using deep imports for any parts of the codebase, you will likely encounter some breakages:
5d7a780 (#362)
OpenElementStack now uses callbacks https://github.com/inikulin/parse5/pull/429
getNextToken was removed https://github.com/inikulin/parse5/pull/461
_bootstrap method was removed https://github.com/inikulin/parse5/pull/384
entities module for encoding and decoding entities, sharing maintenance & optimisation work with projects such as htmlparser2 (2b92054 (#362), https://github.com/inikulin/parse5/pull/486)
entities adopted a variant of parse5’s approach of decoding entities. As a result, decoding performance is equivalent, while memory consumption is slightly lower.
parse5-parser-stream #487
serializeOuter (like .outerHTML), scriptingEnabled option #383
<< in comments parsed wrongly as <! (#326)
endTag for mixed-case foreign elements (#353)
html, body (#436)
parse5.js.org #443
Thanks @anko, @TrySound, @samouri, @alan-agius4, and @pmdartus!
Full Changelog: https://github.com/inikulin/parse5/compare/v6.0.1...v7.0.0
Published by inikulin almost 5 years ago
Published by RReverser about 6 years ago
text events in SAXParser and RewritingStream now contains endCol and endLine covering all concatenated raw tokens (GH #266).
SAXParser and RewritingStream now flush last buffered chunk when calling .end() with
ParserStream, SAXParser and RewritingStream no longer assume that
Published by inikulin over 6 years ago
Starting from this release, parse5 functionality will be shipped in separate packages, with the parse5 package containing only basic functionality. Please refer to the list of packages for more info.
Updated (breaking): source code location is now inserted by the tree adapter, so tree adapter developers have control over the location info property name. Tree adapters should implement the setNodeSourceCodeLocation and getNodeSourceCodeLocation methods. The location info property added by the currently implemented tree adapters has been renamed from __location to sourceCodeLocation (GH #189).
Updated (breaking): Location info line and col properties have been renamed to startLine and startCol respectively.
Updated (breaking): SAXParser now passes token objects to event handlers instead of separate arguments. See SAXParser documentation for more info (GH #247).
Added: scriptingEnabled flag to the ParserOptions, which controls how <noscript> tags are handled by the parser (GH #192).
Added: HTML rewriting stream (GH #222).
Removed (breaking): parse5 no longer ships TypeScript definitions. Existing TypeScript definitions have been moved to the DefinitelyTyped repo. Please track the PR in the DefinitelyTyped repo for updates.
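As a rough illustration of the two location-info renames listed above (__location to sourceCodeLocation, and line/col to startLine/startCol), code that has to read nodes produced by both older and newer versions could normalise them with a small helper. This is only a sketch: the interfaces and the getStartPosition function are hypothetical names introduced here for illustration and are not part of parse5; only the property names are taken from the notes above.

```typescript
// Hypothetical shapes for illustration only; just the property names
// (__location, sourceCodeLocation, line, col, startLine, startCol)
// come from the release notes.
interface LegacyLocation {
  line: number;
  col: number;
}

interface RenamedLocation {
  startLine: number;
  startCol: number;
}

interface NodeLike {
  __location?: LegacyLocation; // property name used before this release
  sourceCodeLocation?: RenamedLocation; // property name used from this release on
}

// Returns the start position of a node regardless of which naming it uses,
// or undefined when the node carries no location info at all.
function getStartPosition(node: NodeLike): RenamedLocation | undefined {
  if (node.sourceCodeLocation) {
    const { startLine, startCol } = node.sourceCodeLocation;
    return { startLine, startCol };
  }
  if (node.__location) {
    // Map the old line/col names onto the new startLine/startCol names.
    return { startLine: node.__location.line, startCol: node.__location.col };
  }
  return undefined;
}
```

A helper like this keeps the rename in one place, so calling code can rely on startLine and startCol only.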
Published by inikulin almost 7 years ago
This is a major release that delivers a few minor (but breaking) changes to work around recent issues with the versioning of the TypeScript Node.js typings and with the use of parse5 in environments other than Node.js (see https://github.com/inikulin/parse5/issues/235 for the details).
ParserStream, PlainTextConversionStream, SerializerStream, SAXParser) is now lazily loaded. That enables bundling of the basic functionality for other platforms (e.g. for browsers via webpack).
Published by inikulin almost 8 years ago
document.quirksMode property was replaced with document.mode property which can have 'no-quirks', 'quirks' and 'limited-quirks' values. Tree adapter setQuirksMode and isQuirksMode methods were replaced with setDocumentMode and getDocumentMode methods (GH #83).
<!DOCTYPE html> as per spec (GH #137).
__location.endTag when the start tag contains newlines (GH #166) (by @webdesus).
Published by inikulin about 8 years ago
LocationInfo.endOffset for implicitly closed <p> element (GH #109).
Published by inikulin about 8 years ago
Published by inikulin about 8 years ago
SAXParser (by @RReverser)
\n in <pre>, <textarea> and <listing>.
<image>.
Latest spec changes
Fixed: Element nesting corrections now take namespaces into consideration.