article-extractor | Node.js Ecosystem Directory

article-extractor - v7.2.10

Published by ndaidong over 1 year ago

Fix issue #331
Update dependencies
Remove unnecessary watermark

article-extractor - v7.2.9

Published by ndaidong over 1 year ago

Fix issue #329
Update dependencies
Improve unit test

article-extractor - v7.2.8

Published by ndaidong almost 2 years ago

Expose new API method extractFromHtml()
Update dependencies
Change coding style (remove standardjs)

Related issues: #321, #326

article-extractor - v7.2.7

Published by ndaidong almost 2 years ago

Update dependencies
Fix CI issues
Update docs & links

article-extractor - v7.2.6 - Change name

Published by ndaidong almost 2 years ago

Change package name from article-parser to @extractus/article-extractor
Move to new organization Extractus

article-extractor - v7.2.5

Published by ndaidong almost 2 years ago

Update dependencies
Improve meta data extraction
Add security policy

article-extractor - v7.2.4

Published by ndaidong about 2 years ago

Improve space/newline processing
- line breaks are no longer all removed, but multiple empty lines are stripped
- similarly, multiple spaces are replaced with a single space
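
The whitespace handling described above could look roughly like the following. This is a minimal sketch, not the library's actual code: `normalizeWhitespace` is a hypothetical name, and collapsing runs of empty lines down to a single empty line is an assumption about what "stripped" means here.

```javascript
// Hypothetical helper illustrating the described behavior.
// Assumption: runs of empty lines collapse to one empty line.
function normalizeWhitespace(text) {
  return text
    .replace(/\r\n?/g, '\n') // unify line endings
    .replace(/[ \t]+/g, ' ') // collapse runs of spaces or tabs into one space
    .replace(/ ?\n ?/g, '\n') // drop a stray space on either side of a line break
    .replace(/\n{3,}/g, '\n\n') // keep at most one empty line in a row
    .trim()
}

const sample = 'First   paragraph.\n\n\n\nSecond  paragraph.'
console.log(JSON.stringify(normalizeWhitespace(sample)))
```

Single line breaks survive, so paragraph structure in the extracted text is kept.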

article-extractor - v7.2.3

Published by ndaidong about 2 years ago

Optimize performance

Removing the HTML validation step made extraction about 4x to 5x faster.

Previously, article-parser checked whether the input to extract() was a URL or valid HTML before deciding on the next step.
Now, if the input is not a URL, it is assumed to be an HTML string and extraction starts immediately.
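
The simplified decision can be sketched as below. This is an illustration under assumptions, not the library's source: `isHttpUrl` and `classifyInput` are hypothetical names, and treating only http/https as URLs is an assumption.

```javascript
// Hypothetical helpers illustrating the "URL, otherwise HTML" decision.
function isHttpUrl(input) {
  try {
    const { protocol } = new URL(input) // throws on strings that are not absolute URLs
    return protocol === 'http:' || protocol === 'https:'
  } catch {
    return false
  }
}

// No HTML validation: anything that is not a URL is handed on as HTML.
function classifyInput(input) {
  return isHttpUrl(input) ? 'url' : 'html'
}

console.log(classifyInput('https://example.com/post'))
console.log(classifyInput('<html><body><p>Hello</p></body></html>'))
```

The trade-off is that malformed input is no longer rejected up front; it simply yields a poor or empty extraction result.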

[Benchmark screenshots: v7.2.2 (before) and v7.2.3 (after)]

article-extractor - v7.2.2

Published by ndaidong about 2 years ago

Add options to extract method
- Replace global config with on-request parserOptions
- Add new param fetchOptions to extract() method
  - Allow requests to be passed through a proxy
Remove unnecessary dependencies to reduce bundle size
Fix a problem when building the ESM version for the browser
Add a demo for running in the browser

article-extractor - v7.2.1

Published by ndaidong about 2 years ago

Use external string-similarity
Improve fetch control
Update build script
Fix typo in example packages

article-extractor - v7.2.0

Published by ndaidong about 2 years ago

Refactor some parts to run on Deno, Bun and ts-node
- Use an internal string-similarity file to bypass a bun.js resolve error
- Stop depending on urlpattern-polyfill to bypass a Deno/Bun error
  - Replace URLPattern syntax with regular RegExp
Add some examples for each platform
Remove rarely used configuration methods
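
As an illustration of replacing URLPattern syntax with a regular RegExp, here is a minimal sketch. The pattern, the example.com domain and the `matchesAny` helper are hypothetical and are not taken from the library.

```javascript
// A URLPattern-style rule such as "https://*.example.com/posts/*" needs the
// URLPattern API (or a polyfill). A roughly comparable check written as a
// plain RegExp runs on any JavaScript runtime without extra dependencies.
const postPattern = /^https?:\/\/([\w-]+\.)*example\.com\/posts\/.+/i

// Hypothetical helper: true if the URL matches at least one pattern.
function matchesAny(url, patterns) {
  return patterns.some((re) => re.test(url))
}

console.log(matchesAny('https://blog.example.com/posts/hello', [postPattern]))
console.log(matchesAny('https://example.org/posts/hello', [postPattern]))
```

Plain regular expressions are less readable than URLPattern strings, but they remove a runtime-specific dependency.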

article-extractor - v7.1.0

Published by ndaidong about 2 years ago

The first step toward getting it to work in Deno and Bun environments

Replace axios with cross-fetch
Remove 4 API methods relating to axios and htmlcrush

article-extractor - v7.0.3

Published by ndaidong about 2 years ago

Update dependencies
Remove dependency on tldts
Use conditional exports
Improve pre-defined options

article-extractor - v7.0.2

Published by ndaidong about 2 years ago

Update dependencies
Add button "Deploy to Deta"
Use Deta service for example faas
Copy types definition to cjs dist (#287)

article-extractor - v7.0.1

Published by ndaidong about 2 years ago

Fix potential logic error while generating description
Update dependencies

article-extractor - v7.0.0

Published by ndaidong about 2 years ago

Release v7.0.0 for production.

This version uses transformations instead of queryRules. The absence of pre-defined rules may break extraction for some article sources, but you can easily fix these problems with a little knowledge of DOM manipulation. Beyond that, transformations can help you improve extraction results in a completely new way.

article-extractor - v7.0.0rc4

Published by ndaidong over 2 years ago

Use tldts to get the domain and use this value as source (for a consistent format)
- with the domain as source, you can get its favicon from https://www.google.com/s2/favicons?domain={DOMAIN.TLD}
Increase description length, prefer taking the summary from content, remove unnecessary parts
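
Building the favicon address from a source domain could look like this. It is a small sketch: `faviconUrl` and `hostnameOf` are hypothetical helpers, and `hostnameOf` returns the full hostname from the standard URL API, which is only an approximation of the registrable domain that tldts provides.

```javascript
// Hypothetical helper: fill the {DOMAIN.TLD} placeholder of the favicon URL.
function faviconUrl(domain) {
  return `https://www.google.com/s2/favicons?domain=${encodeURIComponent(domain)}`
}

// Assumption: without tldts, fall back to the hostname of the article URL.
// This keeps subdomains such as "www", unlike a registrable-domain lookup.
function hostnameOf(articleUrl) {
  return new URL(articleUrl).hostname
}

console.log(faviconUrl('example.com'))
console.log(faviconUrl(hostnameOf('https://www.example.com/news/1')))
```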