htmlparser2

The fast & forgiving HTML and XML parser

MIT License

Downloads: 135M
Stars: 4.3K
Committers: 66

htmlparser2 - Latest Release

Published by fb55 10 months ago

Fixes

  • Fixed onattribend's endIndex (#1540 by @DimaIT)
  • Treat textarea as special tag (#1719 by @DimaIT)

Features

  • Export QuoteType (#1543 by @DimaIT) and Handler interface (#1690 by @benkroeger)

htmlparser2 - v9.0.0

Published by fb55 over 1 year ago

Breaking Changes

  • The tokenizer now uses the EntityDecoder from the entities module https://github.com/fb55/htmlparser2/pull/1480
    • Parsing of entities in attributes is now aligned with the HTML spec, and some inputs will produce different results. E.g., in <a href='&amp=boo'> the attribute value is no longer modified.
    • The ontextentity tokenizer callback now has an endIndex argument; if you use the tokenizer directly, make sure indices are still the same.
  • Stacks inside the parser have been reversed. https://github.com/fb55/htmlparser2/pull/1511
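
The attribute change above follows a legacy rule in the HTML spec: inside an attribute value, a named reference that lacks its trailing ";" and is followed by "=" or an ASCII alphanumeric is left untouched. The sketch below illustrates only that rule for the single reference "&amp". The function decodeAmp is a hypothetical helper written for this example; it is not part of htmlparser2 or the entities module and does not reflect how EntityDecoder is implemented.

```typescript
// Hypothetical helper (not a library API): decodes only "&amp" / "&amp;"
// to keep the illustration short.
function decodeAmp(input: string, inAttribute: boolean): string {
    return input.replace(/&amp;?/g, (match: string, offset: number) => {
        const next = input.charAt(offset + match.length);
        const terminated = match.endsWith(";");
        // Legacy attribute rule: an unterminated reference followed by "="
        // or an ASCII alphanumeric is kept as written.
        if (inAttribute && !terminated && /^[=A-Za-z0-9]$/.test(next)) {
            return match;
        }
        return "&";
    });
}

console.log(decodeAmp("&amp=boo", true)); // "&amp=boo" (attribute: unchanged)
console.log(decodeAmp("&amp=boo", false)); // "&=boo" (text: decoded)
console.log(decodeAmp("a&amp;b", true)); // "a&b" (terminated: always decoded)
```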

Full Changelog: https://github.com/fb55/htmlparser2/compare/v8.0.2...v9.0.0

htmlparser2 - v8.0.2

Published by fb55 over 1 year ago

Full Changelog: https://github.com/fb55/htmlparser2/compare/v8.0.1...v8.0.2

htmlparser2 -

Published by fb55 over 2 years ago

  • Added missing WritableStream export in the package.json 6923fca

https://github.com/fb55/htmlparser2/compare/v8.0.0...v8.0.1

htmlparser2 - v8.0.0

Published by fb55 over 2 years ago

Full Changelog: https://github.com/fb55/htmlparser2/compare/v7.2.0...v8.0.0

htmlparser2 - v7.2.0

Published by fb55 almost 3 years ago

What's Changed

Refactors:

The refactors lead to a combined ~5% speed-up.

Full Changelog: https://github.com/fb55/htmlparser2/compare/v7.1.2...v7.2.0

htmlparser2 -

Published by fb55 about 3 years ago

  • Fix indices of self-closing tags in XML (#949, reported in #941) 3287ef2
  • Bump domhandler from 4.2.0 to 4.2.2 (#935) 45b2cfe

https://github.com/fb55/htmlparser2/compare/v7.1.1...v7.1.2

htmlparser2 -

Published by fb55 about 3 years ago

  • Fixed a bug where implied close tags would be misreported (#933) 903fb43
  • Fixed endIndex of text events being off by 1 (#932) 78ef1b7

https://github.com/fb55/htmlparser2/compare/v7.1.0...v7.1.1

htmlparser2 -

Published by fb55 about 3 years ago

Features:

  • Added an isImplied flag to the onopentag/onclosetag events (#930) f917004
  • It is now possible to get indices for attributes (#929) 28c162b
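
To make the idea of an "implied" tag event concrete, here is a minimal, self-contained model. It does not use htmlparser2 and only covers implied closes. The names impliesClose, TagEvent and simulate are hypothetical, and the two-entry table is an assumption chosen for illustration, not the parser's real rule set.

```typescript
// Hypothetical table: opening the key tag implicitly closes these open tags.
const impliesClose = new Map<string, Set<string>>([
    ["li", new Set(["li"])],
    ["p", new Set(["p"])],
]);

type TagEvent = { type: "open" | "close"; name: string; isImplied: boolean };

// Takes tag names such as "ul" (open) or "/ul" (close) and returns the events
// a stack-based parser could emit for them.
function simulate(tags: string[]): TagEvent[] {
    const events: TagEvent[] = [];
    const stack: string[] = [];
    for (const tag of tags) {
        if (tag.startsWith("/")) {
            const pos = stack.lastIndexOf(tag.slice(1));
            if (pos === -1) continue; // stray closing tag: ignored in this sketch
            // Elements above the match close implicitly; the match closes explicitly.
            while (stack.length > pos) {
                const name = stack.pop() as string;
                events.push({ type: "close", name, isImplied: stack.length !== pos });
            }
        } else {
            const closes = impliesClose.get(tag);
            while (closes !== undefined && stack.length > 0 && closes.has(stack[stack.length - 1])) {
                events.push({ type: "close", name: stack.pop() as string, isImplied: true });
            }
            stack.push(tag);
            events.push({ type: "open", name: tag, isImplied: false });
        }
    }
    // Anything still open at the end of input is closed implicitly.
    while (stack.length > 0) {
        events.push({ type: "close", name: stack.pop() as string, isImplied: true });
    }
    return events;
}

const summary = simulate(["ul", "li", "li", "/ul"])
    .map((e) => `${e.type}:${e.name}${e.isImplied ? "(implied)" : ""}`)
    .join(" ");
console.log(summary);
// open:ul open:li close:li(implied) open:li close:li(implied) close:ul
```

A consumer that rewrites or re-serializes markup can use such a flag to avoid emitting closing tags that were never present in the source.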

Fixes:

  • The previous release changed how indices were computed. Unfortunately, a lot of edge cases weren't handled correctly. This version fixes this.
    • refactor: Fix how indices are computed, add attrib indices (#929) 28c162b
    • fix(parser): Fix indices for end, CDATA, add indices to tests (#928) 4e25252
    • fix(parser): Don't override position for implied opening tags (#917) fac221d
    • fix(parser): Index of closing tag was misaligned (#913) 04c411c
  • .pause would lead to data being wrongfully discarded (#927) 78af88d
  • The tokenizer would still emit some data after an error (#923) 08b2040
  • Issue in foreign content: The tag name foreignObject will always be lowercased in HTML e852205

Refactors:

  • refactor(feeds): Move getFeed to domutils (#931) f10dc03
  • refactor(tokenizer): Use explicit empty buffer if we have reached the end 9c30fe6
  • chore(tests): Add test for error without a listener 0eb0067
  • chore(tests): Use proxies to collect events (#920) a2b0bf3
  • chore(tests): Move stream tests into WritableStream.spec (#916) da67eba
  • refactor(tokenizer): Remove unused branches, improve test coverage (#914) a2eae51
  • docs(readme): Update benchmark results d45fc82

https://github.com/fb55/htmlparser2/compare/v7.0.0...v7.1.0

htmlparser2 -

Published by fb55 about 3 years ago

This release changes a lot of internals, resulting in a 20% overall performance improvement in AndreasMadsen's htmlparser-benchmark.

Breaking changes:

  • Fixed how start & end index positions are calculated (#910) 5ab080e
    • Some indices, especially end indices, will now have changed. Most importantly, end indices will now always be greater than or equal to start indices (whoops!).
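
The invariant matters to code that slices the original source by event positions. The sketch below shows such a slice under two assumptions that are not stated in these notes: that the end index is inclusive, and that events carry startIndex and endIndex fields. IndexedEvent and sourceOf are hypothetical names for this example only.

```typescript
// Hypothetical event record; assumes `endIndex` points at the last character.
interface IndexedEvent {
    startIndex: number;
    endIndex: number;
}

// Returns the slice of `source` covered by the event, rejecting ranges that
// violate the end >= start invariant.
function sourceOf(source: string, event: IndexedEvent): string {
    if (event.endIndex < event.startIndex) {
        throw new RangeError("endIndex must not be smaller than startIndex");
    }
    return source.slice(event.startIndex, event.endIndex + 1);
}

const html = "<p>hi</p>";
console.log(sourceOf(html, { startIndex: 0, endIndex: 2 })); // "<p>"
console.log(sourceOf(html, { startIndex: 3, endIndex: 4 })); // "hi"
```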

Features:

  • Added an isVoidElement method to the parser (#785) 00ce57a

Refactors:

  • Use a trie to decode HTML & XML entities in the tokenizer (#863) 9a47a55
    • Leads to large speed-ups when dealing with entities.
  • Iterate over char codes in the tokenizer (#894) f5aed75
    • Improved tokenizer performance by ~40%.
  • Use Map for openImpliesClose in the parser (#911) 39a8109
  • Moved logic of FeedHandler to a function (#912) 3a672ff
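
The first two refactors above can be pictured with a small, self-contained model: a trie keyed by char codes that is walked one code unit at a time while remembering the longest match. This is a sketch of the general technique only. TrieNode, buildTrie, decodeWithTrie and the four-entry entity table are hypothetical, and the library's actual data layout is not described in these notes.

```typescript
interface TrieNode {
    children: Map<number, TrieNode>;
    value?: string;
}

// Builds a trie from entity names (without the leading "&") to their values.
function buildTrie(entities: Record<string, string>): TrieNode {
    const root: TrieNode = { children: new Map() };
    for (const name of Object.keys(entities)) {
        let node = root;
        for (let i = 0; i < name.length; i++) {
            const code = name.charCodeAt(i);
            let next = node.children.get(code);
            if (next === undefined) {
                next = { children: new Map() };
                node.children.set(code, next);
            }
            node = next;
        }
        node.value = entities[name];
    }
    return root;
}

const AMPERSAND = 0x26;

// Decodes known entities, comparing char codes instead of substrings.
function decodeWithTrie(input: string, root: TrieNode): string {
    let out = "";
    let i = 0;
    while (i < input.length) {
        if (input.charCodeAt(i) !== AMPERSAND) {
            out += input[i++];
            continue;
        }
        let node = root;
        let j = i + 1;
        let matchEnd = -1;
        let matchValue = "";
        while (j < input.length) {
            const next = node.children.get(input.charCodeAt(j));
            if (next === undefined) break;
            node = next;
            j++;
            if (node.value !== undefined) {
                matchEnd = j; // remember the longest match seen so far
                matchValue = node.value;
            }
        }
        if (matchEnd === -1) {
            out += "&"; // no known entity: keep the ampersand as text
            i++;
        } else {
            out += matchValue;
            i = matchEnd;
        }
    }
    return out;
}

// Tiny illustrative table; "amp" without ";" models a legacy entity.
const trie = buildTrie({ "amp;": "&", amp: "&", "lt;": "<", "gt;": ">" });
console.log(decodeWithTrie("a &lt; b &amp; c", trie)); // "a < b & c"
```

A trie avoids building candidate substrings for every "&" and stops at the first character that cannot extend a match, which is one plausible reason this kind of refactor helps with entity-heavy input.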

htmlparser2 -

Published by fb55 over 3 years ago

Features:

  • Export tokenizer callback interface from main module (#751) ab0b3fc f59473a

Fixes:

  • Allow XML tags to start with any character (#778) 0b94ab5

Upgrades:

  • Bump domhandler from 4.0.0 to 4.1.0 e64e8e5
  • Bump domelementtype from 2.1.0 to 2.2.0 8bc1719
  • Bump domutils from 2.4.4 to 2.5.2 8b91d97 cf77476 7c233de

https://github.com/fb55/htmlparser2/compare/v6.0.1...v6.1.0

htmlparser2 -

Published by fb55 over 3 years ago

  • Fix parsing special closing tags (#746) 214ab08
    • Thanks to @BenoitZugmeyer for the report (#745)!

https://github.com/fb55/htmlparser2/compare/v6.0.0...v6.0.1

htmlparser2 -

Published by fb55 almost 4 years ago

Breaking:

  • Bump domhandler, domutils 4dd4233 0d278fd
    • The new version of domhandler now comes with an actual root element for the document. This might break tests in a few cases. See the domhandler release notes for more details.
  • Make some private properties actually private 1c71e60

Features:

  • Add a parseDocument method 4653f23
    • This returns the root node of the document, instead of an array of the first nodes. You likely want to use this instead of the now deprecated getDOM method.
  • Improve docs df7ea98 1ce1d3b 0437d9c

Minor:

  • FeedHandler: Slightly restructure code b6b4382

https://github.com/fb55/htmlparser2/compare/v5.0.1...v6.0.0

htmlparser2 -

Published by fb55 almost 4 years ago

  • Fix: Parse entities in <title> tags (#614, #615 by @billneff79) 3295a8b
  • Fix: Remove @types/node as a peer dependency 1ace384

Also pulls in a new version of the entities module, which features more compact entity maps.

htmlparser2 -

Published by fb55 about 4 years ago

Breaking changes:

  • Default the decodeEntities option to true 8ac01e0
  • Removes underscores in front of many private properties & methods. 6e296d2
  • Removes EVENTS, WritableStream and CollectingHandler exports from module import. The latter two are still part of the module, but now have to be imported explicitly. 6e296d2
  • The parser no longer extends EventEmitter f30f13c
  • HTML <title> tag content is now processed as text (#483 by @billneff79) 0189e56

Features:

  • Add media content parsing to FeedHandler (#560 by @gcandal) a85e4e0
  • Expose the quotes that were used in the onattribute event 3c86256
  • Add "sideEffects: false" to package.json (#474 by @ericjeney) d90dd64
  • Explain stream usage in README (#446 by @mnmkng) 4c0fba8

Bug Fixes:

  • Properly back out of numeric entities, decode entities in attributes (fixes #276) eaf2872
  • Fix broken parsing after self-closing special tags (#515 by @warriordog) 4ec596f
  • Fix parse bug when tag name is not ASCII alpha (#497 by @Zuckjet) bc010de

Miscellaneous:

  • Improve Coverage (#540 by @brettz9) 6d8a2ff
  • Check missing elem with getOneElement (#543 by @brettz9) 1cf297e
  • Add test for #125 40d9556

Thanks to everyone that contributed to this release!

Commit Range:
https://github.com/fb55/htmlparser2/compare/v4.1.0...v5.0.0

htmlparser2 -

Published by fb55 over 4 years ago

  • Don't fail when parsing <__proto__> (fixes #387)
  • Add types field to package.json
  • Update dependencies

htmlparser2 -

Published by fb55 about 5 years ago

  • Port to TypeScript, Jest
  • Remove the Stream and ProxyHandler exports
  • Order some conditionals in Tokenizer by their likelihood to be hit
  • Fix implicit closing of certain tags — @voithos
  • Fix: options.Tokenizer modified outer scope — @thorn0

htmlparser2 -

Published by fb55 about 11 years ago
