graby

Graby helps you extract article content from web pages

MIT License

graby - 2.1.1

Published by j0k3r over 4 years ago

  • Lock to PHPUnit 7 #220
  • Handle meta refresh url when attributes are reversed #221
graby - 2.1.0

Published by j0k3r almost 5 years ago

Graby got a logo!

🎉 Thanks to @caneco, Graby has a logo! 🎉

Formatted date

This release includes a fix for dates. Previously, we validated a date and returned null if it wasn't valid. Now, if the date is valid, we return it as a W3C-formatted date: Y-m-d\TH:i:sP.
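Here is a minimal Python sketch of the new behaviour, for illustration only (Graby itself is written in PHP). It assumes ISO 8601 input parsed with `datetime.fromisoformat`, treats dates without a timezone as UTC (an assumption, not necessarily what Graby does), and uses the hypothetical name `normalize_date`. Its output mirrors PHP's `Y-m-d\TH:i:sP` pattern, for example `2019-03-01T10:00:00+02:00`.

```python
from datetime import datetime, timezone
from typing import Optional


def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Return raw as a W3C date (e.g. 2019-03-01T10:00:00+02:00), or None if invalid."""
    if not raw:
        return None
    try:
        parsed = datetime.fromisoformat(raw.strip())
    except ValueError:
        # Invalid date: return None, as before
        return None
    if parsed.tzinfo is None:
        # Assumption for this sketch: dates without an offset are treated as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    # timespec="seconds" drops microseconds, matching the Y-m-d\TH:i:sP shape
    return parsed.isoformat(timespec="seconds")
```

For example, `normalize_date("2019-03-01")` yields `2019-03-01T00:00:00+00:00` under the UTC assumption.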

List of merged PRs

  • Fix tests #213
  • Fix typo #215
  • Convert date to a known format #216
  • Add brand new logo to the project #214
  • JSON LD multiple authors #218
  • Improve readme with logo & credits #217
graby - 2.0.2

Published by j0k3r about 5 years ago

  • Enable inferPrivatePropertyTypeFromConstructor on PHPStan #208
  • Add data-srcset as lazyload attributes #209
graby - 2.0.1

Published by j0k3r over 5 years ago

  • Applied some changes from Full-Text RSS #206
  • Change the way to merge config for find & replace string #207
graby - 2.0.0

Published by j0k3r over 5 years ago

✨ Major changes ✨

Work on 2.0 started almost 2 years ago, on an initiative by @aaa2000 👏, to add support for HTTPlug.

Version 1.x was tightly coupled to Guzzle 5, which is quite old now, and some people started to complain about that coupling. The best solution was clearly to switch to HTTPlug and let the end user decide which HTTP client to use.

Version 2.0.0 is tested with these clients:

  • Guzzle 6
  • Guzzle 5
  • cURL

Many other available HTTP clients should work too.
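The idea behind the switch is dependency inversion: the library codes against a client interface, and the user injects whichever implementation they prefer. The following is a simplified Python sketch with hypothetical names (`HttpClient`, `Extractor`, `FakeClient`). Real HTTPlug clients send PSR-7 request objects and return PSR-7 responses rather than plain strings.

```python
from typing import Dict, Protocol


class HttpClient(Protocol):
    """Hypothetical client interface, loosely analogous to an HTTPlug client."""

    def send(self, url: str) -> str: ...


class Extractor:
    """Hypothetical extractor that depends only on the interface, never on a concrete client."""

    def __init__(self, client: HttpClient) -> None:
        self.client = client

    def fetch(self, url: str) -> str:
        # The extractor does not know (or care) which HTTP library is behind the client
        return self.client.send(url)


class FakeClient:
    """Stand-in implementation: any object with a matching send() satisfies the Protocol."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    def send(self, url: str) -> str:
        return self.pages[url]
```

In this sketch, swapping one HTTP library for another only changes which client is passed to `Extractor(...)`. The extractor code itself stays the same.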

Other changes are:

  • dropped support for PHP < 7.1
  • the Tidy extension is now required
  • enabled PHPStan level 7
  • Open Graph & JSON-LD data are fetched first, and information from the HTML overrides them
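The following minimal Python sketch illustrates that precedence rule. `merge_metadata` is a hypothetical helper. The relative order of Open Graph and JSON-LD (JSON-LD wins here) is an assumption, since these notes only state that HTML values override both. The sketch also assumes that empty values should not erase data found earlier.

```python
from typing import Any, Dict

Metadata = Dict[str, Any]


def merge_metadata(open_graph: Metadata, json_ld: Metadata, html: Metadata) -> Metadata:
    """Merge metadata sources so that later sources override earlier ones.

    Assumed priority (lowest to highest): Open Graph, JSON-LD, HTML.
    Empty values (None, "", []) never override a value already found.
    """
    merged: Metadata = {}
    for source in (open_graph, json_ld, html):
        for key, value in source.items():
            if value in (None, "", []):
                continue  # keep whatever an earlier source provided
            merged[key] = value
    return merged
```

With this ordering, a title found in the HTML replaces the Open Graph and JSON-LD titles, while an image that only Open Graph provides is kept.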

If you want to upgrade, follow these steps.

Finally, huge thanks to @aaa2000 for starting this work 2 years ago, and thanks to @jtojnar for helping me with this release.

graby - 1.20.1

Published by j0k3r over 5 years ago

  • Logger info added for JsonLd parsing #203
graby - 1.20.0

Published by j0k3r over 5 years ago

  • Fix tests on multipage (clubic) #202
  • Add support of referrerpolicy for img tags #201
graby - 1.19.1

Published by j0k3r over 5 years ago

  • Rework JsonLd extraction: ignore some objects and some names #196
graby - 2.0.0-alpha.0

Published by j0k3r over 5 years ago

This 2.0 alpha uses HTTPlug 1.0, which means you can use any HTTP client implementation compatible with it.
It is also the last version supporting PHP 5.

The final 2.0.0 will use HTTPlug 2.0 and require PHP >= 7.1.

graby - 1.19.0

Published by j0k3r over 5 years ago

  • Handle "if_page_contains" for "next_page_link" #193
graby - 1.18.1

Published by j0k3r over 5 years ago

  • Debug mode #184
  • Avoid sending bad dates #192
graby - 1.18.0

Published by j0k3r almost 6 years ago

  • Handle "if_page_contains" for "single_page_link" #190
graby - 1.17.0

Published by j0k3r almost 6 years ago

  • Add “Accept” http_header in site_config #188
  • Avoid json-ld date to be an array #189
  • Delete empty lines in retrieved HTML code #187
graby - 1.16.0

Published by j0k3r almost 6 years ago

  • Add ability to reload config files #186
graby - 1.15.5

Published by j0k3r almost 6 years ago

  • New travis infra #179 (Drop HHVM support & add PHP 7.3 to the build)
  • Update iframe regex #178 #183
  • ContentExtractor: fix xpath query with concat/normalize #181
  • Avoid deprecated message to fail the build #182
graby - 1.15.4

Published by j0k3r almost 6 years ago

  • Meta name author auto detection #158
  • Move some functional tests to mocked tests #175
graby - 1.15.3

Published by j0k3r almost 6 years ago

  • Use own solution to parse cookie #174
graby - 1.15.2

Published by j0k3r almost 6 years ago

  • Adding missing cookie parser #173
graby - 1.15.1

Published by j0k3r almost 6 years ago

  • Allow symfony/phpunit-bridge ~4.0 #171
  • Fix cookies injection into request #172
graby - 1.15.0

Published by j0k3r almost 6 years ago

  • Use of name from JsonLd as title #166
  • Add ability to skip getting data from json-ld #170