graby

Graby helps you extract article content from web pages

MIT License

Downloads: 282.7K
Stars: 365
Committers: 21

graby - 1.8.0

Published by j0k3r over 7 years ago

This release adds support for returning headers to the client, so the client has access to all headers related to the URL it requested content from.
Since all headers are now retrieved, the language can also be detected from them when none of the other methods work.

All changes:

  • Blog post now redirects to https #95
  • Return all headers #97
  • Retrieve language from header #98
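
The header-based language fallback described above can be sketched as follows. This is a minimal, language-agnostic illustration in Python, not Graby's actual PHP code: the function names `language_from_headers` and `detect_language` are hypothetical, and the use of the `Content-Language` header is an assumption, since the release notes do not name the header that is read.

```python
from typing import Mapping, Optional


def language_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the first language tag of a Content-Language header, if any.

    Assumption: Content-Language is the header of interest.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    raw = normalised.get("content-language", "")
    # The header may list several tags ("en, fr"); keep the first non-empty one.
    for tag in raw.split(","):
        tag = tag.strip()
        if tag:
            return tag
    return None


def detect_language(from_content: Optional[str],
                    headers: Mapping[str, str]) -> Optional[str]:
    """Prefer a language found in the page itself; fall back to the headers."""
    return from_content or language_from_headers(headers)
```

Here `from_content` stands in for whatever the earlier detection methods produced; the headers are consulted only when that value is empty.
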
graby - 1.7.1

Published by j0k3r over 7 years ago

  • Fix multiple authors #93
graby - 1.7.0

Published by j0k3r over 7 years ago

  • Add support of date from site config #68
  • Add support of author from site config #89
  • Fix iframe removal when using xss_filter parameter #91
graby - 1.6.2

Published by j0k3r over 7 years ago

  • Use parseContent method for parsing PDFs #87
graby - 1.6.1

Published by j0k3r over 7 years ago

  • Fixed xss-filter part which was useless #82
graby - 1.6.0

Published by j0k3r over 7 years ago

  • Add ability to add pre/post filters to Readability #76
  • Add support for native_ad_clue #77
  • Use wallabag/tcpdf #80
graby - 1.5.4

Published by j0k3r over 7 years ago

  • Fix urlencoded header #73
graby - 1.5.3

Published by j0k3r over 7 years ago

  • Fix importNode error in some cases #72
graby - 1.5.2

Published by j0k3r almost 8 years ago

  • Make preview image absolute #70
graby - 1.5.1

Published by j0k3r almost 8 years ago

  • Update regex for html IE conditional #66
graby - 1.5.0

Published by j0k3r almost 8 years ago

  • Clean IE conditional comments #63
  • Add support of http_header property in site config #61 #65
  • Add support of referer from site config #64

Thanks to @Kdecherf for his work on the header part! 🤗

graby - 1.4.5

Published by j0k3r almost 8 years ago

  • Support lazy-load image from Vice #58
  • Share ConfigBuilder between classes #59 (should eat less memory)
  • Remove forceutf8 (and fix SinglePage issue) #60
graby - 1.4.4

Published by j0k3r about 8 years ago

  • Fix UTF16 content #57
graby - 1.4.3

Published by j0k3r about 8 years ago

  • Minor CS #52
  • Handle ZIP files #55
  • Handle a new img lazyload #56
graby - 1.4.2

Published by j0k3r about 8 years ago

  • Fix tests without tidy #43
  • Fix spaces in url #51
graby - 1.4.1

Published by j0k3r over 8 years ago

  • Revert #45, need to find another solution ...
graby - 1.4.0

Published by j0k3r over 8 years ago

  • Add support for accent in host & path #46
  • Avoid sending encoded query string #45
  • Avoid data:image in open graph data #47
graby - 1.3.0

Published by j0k3r over 8 years ago

  • Add a default title when no title is found
  • Fix a non-object error in some cases
  • Fingerprints now handle single quotes
graby - 1.2.0

Published by j0k3r over 8 years ago

  • add a timeout (default 10s) on each request
  • add a maximum number of redirects (default 10)
graby - 1.1.0

Published by j0k3r almost 9 years ago

  • allow customization of the Guzzle client's cookie handling (see #34)
  • add login settings support (see #35) in anticipation of handling content that requires the user to be logged in
  • some refactoring & cleanup
Package Rankings
Top 2.11% on Packagist.org