meeseeks

An Elixir library for parsing and extracting data from HTML and XML with CSS or XPath selectors.

MIT License

Downloads: 181.1K | Stars: 315 | Committers: 8
meeseeks - v0.17.0 (Latest Release)

Published by mischov over 1 year ago

Compatibility

  • No longer support Elixir versions under 1.12 or Erlang/OTP versions under 23.0
  • Support Elixir 1.13, Elixir 1.14, and Erlang/OTP 25.0

Enhancements

  • Update to meeseeks_html5ever v0.14.3, which supports NIF precompilation
meeseeks - v0.16.1

Published by mischov about 3 years ago

Compatibility

  • Use meeseeks_html5ever v0.13.1, which supports compilation on Apple M1
meeseeks - v0.16.0

Published by mischov over 3 years ago

Compatibility

  • No longer support Elixir 1.6 or Erlang/OTP 20
  • Support Elixir 1.12 and Erlang/OTP 24
  • Use meeseeks_html5ever v0.13.0, which supports Erlang/OTP 24
meeseeks - v0.15.1

Published by mischov over 4 years ago

Fixes

  • [Select] Support unicode characters in XPath selectors
meeseeks - v0.15.0

Published by mischov over 4 years ago

Compatibility

  • Support Elixir 1.10

Enhancements

  • [Parse] Prevent tuple tree parser from accepting invalid input
  • [Select] Prohibit XPath attribute steps outside of predicates
meeseeks - v0.14.0

Published by mischov about 5 years ago

Breaking

  • [Extract] The private Document.Node behaviour was removed, so any direct use of its callbacks with nodes will be broken
  • [Extract] Comments no longer have spaces added around the content when encoding to HTML, so HTML output may be slightly different than before
  • [Extract] A space is now only added between nodes by text extractors if the previous sibling's text didn't end in whitespace, so data, own_text, and text output may be slightly different than before

Enhancements

  • [Extract] Refactor extractors, removing the Document.Node behaviour and adding that functionality to modules under Meeseeks.Extractor
  • [Extract] Use iodata in string building extractors instead of string concatenation
  • [Extract] Optimize how whitespace is collapsed by text extractors
  • [Extract] Document which extractors collapse whitespace and make it optional (on by default)

Fixes

  • [Extract] Remove incorrectly added whitespace when encoding comments to HTML
  • [Extract] No longer add space between nodes when extracting text if the previous sibling's text ended in whitespace
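
The v0.14.0 text-extraction rule can be illustrated with a small Python sketch. It is not Meeseeks' Elixir implementation: it joins a flat list of sibling text strings instead of walking a document tree, and the function `join_sibling_texts` and its `collapse_whitespace` flag are hypothetical. It also assumes that "collapsing" whitespace means replacing each run of whitespace with one space and trimming the ends.

```python
import re


def join_sibling_texts(texts: list[str], collapse_whitespace: bool = True) -> str:
    """Join sibling text nodes following the v0.14.0 notes (hypothetical sketch)."""
    parts: list[str] = []  # pieces are joined once at the end, similar in spirit to iodata
    previous = None
    for text in texts:
        # Add a separating space only if the previous sibling's text
        # did not already end in whitespace.
        if previous is not None and not previous[-1:].isspace():
            parts.append(" ")
        parts.append(text)
        previous = text
    joined = "".join(parts)
    if collapse_whitespace:
        # Assumption: each whitespace run becomes one space, and the ends are trimmed.
        joined = re.sub(r"\s+", " ", joined).strip()
    return joined
```

With `collapse_whitespace=False`, `["Hello ", "world"]` yields `"Hello world"` with a single space, because the first string already ends in whitespace.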
meeseeks - v0.13.1

Published by mischov about 5 years ago

Enhancements

  • [Parse] Update to meeseeks_html5ever v0.12.1, which uses a dirty scheduler for the NIF instead of working asynchronously
meeseeks - v0.13.0

Published by mischov about 5 years ago

Compatibility

  • No longer support Elixir 1.4, Elixir 1.5, or Erlang/OTP 19 (minimum tested compatibility is now Elixir 1.6 and Erlang/OTP 20)
  • Support Elixir 1.9 and Erlang/OTP 22

Fixes

  • [Parse] Update to meeseeks_html5ever v0.12.0, which supports Erlang/OTP 22
meeseeks - v0.12.0

Published by mischov about 5 years ago

Breaking

  • [Extract] Meeseeks.html/1 now escapes problematic characters when encoding attribute values and text, so its output may be slightly different than before

Fixes

  • [Extract] Always use double quotes and escape & and " when encoding attribute values with Meeseeks.html/1
  • [Extract] Escape <, >, and & when encoding text with Meeseeks.html/1
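
The v0.12.0 escaping rules can be illustrated with a short Python sketch. It is not Meeseeks' code: the helper names are hypothetical, and the entity spellings (`&amp;`, `&quot;`, `&lt;`, `&gt;`) are assumptions, since the notes only say which characters are escaped.

```python
def escape_attr_value(value: str) -> str:
    # Replace "&" first so the entities introduced next are not escaped again.
    return value.replace("&", "&amp;").replace('"', "&quot;")


def escape_text(text: str) -> str:
    # Same ordering concern: "&" must be handled before "<" and ">".
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def encode_element(tag: str, attrs: list[tuple[str, str]], text: str) -> str:
    """Encode one element with a text child; attribute values always use double quotes."""
    attr_str = "".join(f' {name}="{escape_attr_value(value)}"' for name, value in attrs)
    return f"<{tag}{attr_str}>{escape_text(text)}</{tag}>"
```

For example, `encode_element("a", [("title", 'Tom & "Jerry"')], "1 < 2")` returns `'<a title="Tom &amp; &quot;Jerry&quot;">1 &lt; 2</a>'`. Because values are always double-quoted, a single quote inside an attribute value needs no escaping.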
meeseeks - v0.11.2

Published by mischov over 5 years ago

Fixes

  • [Select] Support escaped characters in CSS selector names, idents, and strings
  • [Select] Support Elixir-style unicode code points in CSS selector names, idents, and strings
  • [Select] Add better errors when parsing CSS selectors
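
To illustrate what "escaped characters" in CSS names and idents means, here is a Python sketch that decodes backslash escapes in an identifier following the CSS Syntax Module Level 3 rules. In a selector, `#foo\:bar` therefore names the id `foo:bar`, and `.\31 23` names the class `123`. The sketch does not reproduce Meeseeks' selector parser or its Elixir-style code point support, and the function name `unescape_css_ident` is hypothetical.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
WHITESPACE = {" ", "\t", "\n"}  # assumes newlines were already normalized to "\n"
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def unescape_css_ident(raw: str) -> str:
    """Decode CSS backslash escapes in an identifier (hypothetical helper)."""
    out: list[str] = []
    i, n = 0, len(raw)
    while i < n:
        if raw[i] != "\\":
            out.append(raw[i])
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= n:
            out.append(REPLACEMENT)  # backslash at end of input
            break
        if raw[i] == "\n":
            raise ValueError("backslash-newline is not a valid escape in an identifier")
        if raw[i] in HEX_DIGITS:
            # Up to six hex digits form a code point.
            start = i
            while i < n and i - start < 6 and raw[i] in HEX_DIGITS:
                i += 1
            code = int(raw[start:i], 16)
            if i < n and raw[i] in WHITESPACE:
                i += 1  # a single whitespace character terminates the hex escape
            if code == 0 or 0xD800 <= code <= 0xDFFF or code > 0x10FFFF:
                out.append(REPLACEMENT)  # null, surrogate, or out-of-range code point
            else:
                out.append(chr(code))
        else:
            out.append(raw[i])  # any other escaped character is taken literally
            i += 1
    return "".join(out)
```

The single whitespace character after a hex escape is consumed so that a following hex-like character (as in `\31 23`) is not swallowed into the code point.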
meeseeks - v0.11.1

Published by mischov over 5 years ago

Deprecations

  • [Parse] Deprecate parsing tuple trees with parse/1

Enhancements

  • [Parse] Add :tuple_tree type to parse/2

Fixes

  • [Parse] Update to meeseeks_html5ever v0.11.1, which returns a better error when provided with non-UTF-8 input
  • [Parse] Return parser errors if parsing an invalid tuple tree
meeseeks - v0.11.0

Published by mischov over 5 years ago

Compatibility

  • No longer support Elixir 1.3 (minimum tested compatibility is now Elixir 1.4 and Erlang/OTP 19.3)
  • Support Elixir 1.8

Enhancements

  • [Parse] Update to meeseeks_html5ever v0.11.0, which is faster and more memory efficient on Erlang/OTP 21
meeseeks - v0.10.1

Published by mischov about 6 years ago

  • [Meta] Test more Elixir+OTP combinations with Travis CI
meeseeks - v0.10.0

Published by mischov over 6 years ago

Fixes

  • [Parse] Update to meeseeks_html5ever v0.10.0, which supports Erlang/OTP 21
meeseeks - v0.9.5

Published by mischov over 6 years ago

Fixes

  • [Select] Remove optimization in Select.handle_match that could indirectly cause matches stored in the context for filtering to be prematurely cleared
meeseeks - v0.9.4

Published by mischov over 6 years ago

Fixes

  • [Select] Fix error in how context was updated in Select.filter_nodes
  • [Select] Fix error in how context was updated in XPath.Expr.Step.eval
  • [Select] Fix error in how nodes were filtered in XPath.Expr.Step.eval
  • [Select] Include filters when transpiling absolute XPaths to root selectors
meeseeks - v0.9.3

Published by mischov over 6 years ago

Fixes

  • [Parse] Update to meeseeks_html5ever v0.9.0, which resolves a Dialyzer error
meeseeks - v0.9.2

Published by mischov over 6 years ago

Enhancements

  • [Select] The css and xpath macros now accept vars
meeseeks - v0.9.1

Published by mischov over 6 years ago

Fixes

  • [Select] Fix inconsistency in Document.get_nodes/1
  • [Select] Fix bug in Document.get_nodes/2, courtesy of @asonge
  • [Select] Fix various typespecs, courtesy of @asonge
meeseeks - v0.9.0

Published by mischov over 6 years ago

Breaking

  • [Errors] Errors returned and raised throughout the project now use Meeseeks.Error instead of the assorted formats used before

Enhancements

  • [Errors] Add Meeseeks.Error, a generic error struct implementing Exception
  • [Select] Add Meeseeks.fetch_all and Meeseeks.fetch_one

Fixes

  • [Extract] Fix bug in Meeseeks.html when encoding element attribute values that contain double quotes