article-extractor

Extract the main article from a given URL with Node.js

MIT License

Downloads
14.8K
Stars
1.6K
Committers
13

article-extractor - v8.0.9 Latest Release

Published by ndaidong 6 months ago

  • Stop using purified HTML to extract content (#388)
  • Update examples & test with puppeteer (#389)
article-extractor - v8.0.8

Published by ndaidong 6 months ago

  • Decode content using detected charset
  • Update dependencies
    • Update eslint config

Related issues: #386, #320

Thanks to @martinrotter for the advice 🤝
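
The charset change in v8.0.8 can be illustrated with a minimal sketch. This is not the library's actual implementation: `detectCharset` and `decodeBody` are hypothetical helpers that read the charset from a Content-Type header value and decode raw bytes with the standard `TextDecoder`, falling back to UTF-8 when the label is missing or unsupported.

```javascript
// Hypothetical helper: pull the charset label out of a Content-Type value.
// Defaults to 'utf-8' when no charset parameter is present.
function detectCharset(contentType = '') {
  const match = /charset\s*=\s*["']?([^"';\s]+)/i.exec(contentType)
  return match ? match[1].toLowerCase() : 'utf-8'
}

// Hypothetical helper: decode raw bytes using the detected charset.
function decodeBody(bytes, contentType) {
  const charset = detectCharset(contentType)
  try {
    return new TextDecoder(charset).decode(bytes)
  } catch (err) {
    // TextDecoder throws a RangeError for labels it does not support
    return new TextDecoder('utf-8').decode(bytes)
  }
}
```

With the standard fetch API, the bytes would typically come from `await response.arrayBuffer()` and the header from `response.headers.get('content-type')`.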

article-extractor - v8.0.7

Published by ndaidong 7 months ago

  • Update dependencies

Related issue: #382

article-extractor - v8.0.6

Published by ndaidong 8 months ago

  • Update dependencies
  • Update security email
article-extractor - v8.0.5

Published by ndaidong 9 months ago

  • Fix error while parsing ldjson (#378)
  • Update dependencies
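
The release note does not say what the ldjson parsing error was. As a general illustration only, a defensive way to read JSON-LD text is to wrap `JSON.parse` so that malformed input yields `null` instead of an exception. `parseLdJson` is a hypothetical helper, not a function exported by the library.

```javascript
// Hypothetical helper: parse the text of a <script type="application/ld+json">
// element, returning null instead of throwing on malformed input.
function parseLdJson(text) {
  try {
    const data = JSON.parse(text)
    // Only objects and arrays are meaningful JSON-LD payloads
    return data !== null && typeof data === 'object' ? data : null
  } catch (err) {
    return null
  }
}
```

Callers can then skip a page's structured data when the result is `null` and fall back to other metadata sources.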
article-extractor - v8.0.4

Published by ndaidong 11 months ago

  • Merge pr #374 by @andremacola (issue #373)
  • Update examples
  • Update dependencies
  • Update CI config
  • Fix function call in evaluation script
article-extractor - v8.0.3

Published by ndaidong about 1 year ago

  • Merge pr #369
  • Update deno example (#368)
  • Stop CI tests with Node <= 16, which has reached end-of-life
article-extractor - v8.0.2

Published by ndaidong about 1 year ago

  • Use childNodes instead of children to get the same behaviour as Deno DOM
  • Update dependencies
article-extractor - v7.3.1

Published by ndaidong about 1 year ago

  • Build a CJS version and export it for outdated platforms

Attempts to fix issues #359 and #360

article-extractor - v8.0.1

Published by ndaidong about 1 year ago

  • Update dependencies
  • Fix imports section

Related issues: #345 #357

article-extractor - v8.0.0

Published by ndaidong over 1 year ago

  • Add deno.json & import sections
  • Update deps
  • Improve README
article-extractor - v7.3.0

Published by ndaidong over 1 year ago

  • Add support for signal
  • Stop supporting Node < 15
  • Stop supporting the CommonJS version
    • Remove build script
  • Update examples code
  • Update dependencies

Example with signal

import { extract } from '@extractus/article-extractor'

const url = 'https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html'

const article = await extract(url, null, {
  signal: AbortSignal.timeout(5000),
})
console.log(article)
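
When the signal fires, the pending request is aborted and the promise rejects. Assuming `extract` passes that rejection through unchanged (these notes do not confirm it), a caller could tell a timeout apart from other failures by the error name: `AbortSignal.timeout()` aborts with a `TimeoutError`, while `AbortController.abort()` aborts with an `AbortError` by default. `isAbortError` below is a hypothetical helper, and the second half sketches manual cancellation as an alternative to a fixed timeout.

```javascript
// Hypothetical helper: true when an error was caused by an aborted signal.
function isAbortError(err) {
  return Boolean(err) && (err.name === 'TimeoutError' || err.name === 'AbortError')
}

// Manual cancellation with AbortController, as an alternative to a timeout.
// controller.signal could be passed as the `signal` option instead.
const controller = new AbortController()
controller.abort()
console.log(controller.signal.aborted) // true
```

In a `catch` block around the `extract` call, `isAbortError(err)` would separate cancelled requests from parsing or network errors.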
article-extractor - v7.2.18

Published by ndaidong over 1 year ago

  • Add test for proxy agent
  • Fix README issue
  • Update dependencies
article-extractor - v7.2.17

Published by ndaidong over 1 year ago

  • Merge pr #350 by @LarchLiu
  • Add agent to fetchOptions
  • Update CI to test with Node 20
  • Update dependencies
  • Update README

Example article extraction via proxy server with agent

import { extract } from '@extractus/article-extractor'

import { HttpsProxyAgent } from 'https-proxy-agent'

const proxy = 'http://abc:[email protected]:31113'

const url = 'https://www.cnbc.com/2022/09/21/what-another-major-rate-hike-by-the-federal-reserve-means-to-you.html'

const article = await extract(url, {}, {
  agent: new HttpsProxyAgent(proxy),
})
console.log('Run article-extractor with proxy:', proxy)
console.log(article)
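
Proxy credentials often contain characters that are not valid in a URL as-is. A minimal sketch, using only the standard `URL` class, of building the proxy string from separate parts so that such characters are percent-encoded. `buildProxyUrl` is a hypothetical helper and every value below is a placeholder.

```javascript
// Hypothetical helper: build a proxy URL from separate parts.
// The URL setters percent-encode reserved characters in the credentials.
function buildProxyUrl({ host, port, username, password }) {
  const url = new URL(`http://${host}:${port}`)
  url.username = username
  url.password = password
  return url.href
}

// Placeholder values, not real credentials
const proxyUrl = buildProxyUrl({
  host: 'proxy.example.com',
  port: 31113,
  username: 'abc',
  password: 'p@ss:word',
})
```

The resulting string can be passed to `new HttpsProxyAgent(proxyUrl)` in the same way as the literal proxy string in the example above.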
article-extractor - v7.2.16

Published by ndaidong over 1 year ago

  • Fix issue #347
  • Update dependencies
article-extractor - v7.2.15

Published by ndaidong over 1 year ago

  • Merge with changes from pr #341
  • Fix unsupported package string-similarity
  • Update deps
article-extractor - v7.2.14

Published by ndaidong over 1 year ago

  • Add support for parsely meta tags

These tags appear to come from Parse.ly. Users found that several websites, such as The Verge, have started using these non-standard meta tags, which may break the extraction process. This release should help with such pages.
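
For illustration, Parse.ly metadata is published as `<meta>` tags whose names start with `parsely-`, such as `parsely-title` and `parsely-author`. The sketch below collects them with regular expressions. It is not how the library reads them: `readParselyMeta` is a hypothetical helper, and a real implementation should use a DOM parser, since this version does not handle attribute values containing `>`.

```javascript
// Hypothetical helper: collect parsely-* meta tags from an HTML string.
function readParselyMeta(html) {
  const result = {}
  const metaTag = /<meta\s+([^>]*)>/gi
  for (const [, attrs] of html.matchAll(metaTag)) {
    const name = /name\s*=\s*["'](parsely-[\w-]+)["']/i.exec(attrs)
    const content = /content\s*=\s*"([^"]*)"|content\s*=\s*'([^']*)'/i.exec(attrs)
    if (name && content) {
      // content[1] holds a double-quoted value, content[2] a single-quoted one
      result[name[1].toLowerCase()] = content[1] ?? content[2]
    }
  }
  return result
}

const sampleHtml = `
  <meta name="parsely-title" content="Hello world">
  <meta content='Jane Doe' name="parsely-author">
  <meta name="description" content="ignored">
`
const parselyMeta = readParselyMeta(sampleHtml)
```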


article-extractor - v7.2.13

Published by ndaidong over 1 year ago

  • Fix issue while fetching data from some websites (Deno platform only)
article-extractor - v7.2.12

Published by ndaidong over 1 year ago

  • Set default user-agent
  • Avoid error if parserOptions is null
  • Update dependencies
article-extractor - v7.2.11

Published by ndaidong over 1 year ago

  • Merge pr #333 (thanks to @willwashburn)
  • Update dependencies