node-website-scraper | JavaScript Ecosystem Directory

node-website-scraper - v3.3.0

Published by s0ph1e about 7 years ago

2ec63543065d1b4d2db91dc944e90856538c19c7 - Add requestConcurrency option

node-website-scraper - v3.2.0

Published by s0ph1e about 7 years ago

40c2cec71b3d87428694af39603d09d00beac96f - add media subdirectory
0eb38295966d9944bcdf643429efe8b9d0726e72 - move to organization

node-website-scraper - v3.1.0

Published by s0ph1e over 7 years ago

874e5606462bc2bf312cdea220c72801e092c84a add option updateMissingSources

node-website-scraper - v3.0.0

Published by s0ph1e over 7 years ago

2d7a193060caf744e3dcd21e645b6e19cca76cc7 Add domain directory in bySiteStructure filenameGenerator

Breaking changes in bySiteStructure filename generator

Before (no domain directory in generated path)

/ => DIRECTORY/index.html
/about => DIRECTORY/about/index.html
https://another-site.com/about => DIRECTORY/about/index.html

Now (add domain directory to generated path)

/ => DIRECTORY/example.com/index.html
/about => DIRECTORY/example.com/about/index.html
https://another-site.com/about => DIRECTORY/another-site.com/about/index.html
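
The two mappings above can be reproduced with a small sketch. This is not the library's implementation: `sitePath` and its `includeDomain` flag are hypothetical names used only for illustration, `example.com` is assumed to be the scraped site as in the listings, and every URL is treated as a directory containing `index.html`.

```javascript
const path = require('path');

// Hypothetical helper (not part of website-scraper): maps a resource URL to
// the kind of path shown in the "Before" and "Now" listings.
function sitePath(resourceUrl, baseUrl, includeDomain) {
  // Resolve relative URLs such as "/about" against the scraped site.
  const u = new URL(resourceUrl, baseUrl);
  // Split the URL path into its non-empty segments.
  const segments = u.pathname.split('/').filter(Boolean);
  // v3.0.0 behavior: put the domain in front of the path segments.
  const parts = includeDomain ? [u.hostname, ...segments] : segments;
  return path.posix.join('DIRECTORY', ...parts, 'index.html');
}

const base = 'https://example.com/'; // assumed scraped site

// Before v3.0.0: both URLs collapse to the same file.
console.log(sitePath('/about', base, false));
console.log(sitePath('https://another-site.com/about', base, false));

// From v3.0.0: the domain directory keeps them apart.
console.log(sitePath('/about', base, true));
console.log(sitePath('https://another-site.com/about', base, true));
```

The sketch also shows why the change is useful: without the domain directory, same-named pages from different sites map to one path.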

node-website-scraper - v2.4.1

Published by s0ph1e over 7 years ago

04e95ce54ca8ce5a9802019d655f4d48d9f02707 Remove SRI check for loaded resources

node-website-scraper - v2.4.0

Published by s0ph1e over 7 years ago

13b4e527a5c88167ccbe97ae747773fb09122907 - Add maxRecursiveDepth option
7f57b303dfe740eb6e88b7fee2713b37f1160ae7 - Download more resources by default (add frame, iframe)

node-website-scraper - v2.3.0

Published by s0ph1e over 7 years ago

3b8a025a5c2c11f5b7093d123a8f9d933893714d - add custom resource saver
cbc9f00ec08bc228498b75284b20502fee8cd7a9 - download more resources by default (add ogp, audio, video tags)

node-website-scraper - v2.2.2

Published by s0ph1e over 7 years ago

6ea1d6ce721c8118f36811a43b6cd6fde7a0e675 - Add info about website-scraper-phantom to Readme

node-website-scraper - v2.2.1

Published by aivus over 7 years ago

4c848d8316b368e0ba353ea6c7ba5f093c5f0449 - Fix ignoring different string cases in extensions
65c45d26bf5f067fa57cc95f4b1186a0f585a8e5 - Export default options

node-website-scraper - v2.2.0

Published by s0ph1e over 7 years ago

52fc13ea8856cf8c8c707e1ea6627bc81e08390c - add custom httpResponseHandler
4b92006242d49d23dd7d99187d3240bde6d5c7e4 - add onResourceSaved and onResourceError callbacks
423792d059d471c10a2f145fdfd3017729eca8d1 - update readme structure
c2fb99d3369fb4fea8620f40d668b245bbcc2628 - update author name in license and package.json

node-website-scraper - v2.1.1

Published by s0ph1e over 7 years ago

c28bf7c5633723a105365ba8c70c887dfdc8dee3 - Support picture srcset attr, save .webp files to images dir

node-website-scraper - v2.1.0

Published by s0ph1e over 7 years ago

b2a18e3770875d0a6f1f07ea2c1f8b4e232e0862 decode url-based filenames
caf894e2661e80627ed488fe430c6d2f50b55905 correctly handle errors which occur on downloading main urls
1af32749268a820ec2a11d76a658b531a2e2dfd3 refactor code duplicates

node-website-scraper - v2.0.0

Published by s0ph1e over 7 years ago

Breaking changes

3aacd8e9d74a270de06927b04ff03207d96c161c - drop nodejs < 4 support
23e76c0e49f4419e8de066bdab71062a84f788b4 - export function instead of object
c10a4d2cc321cb358825a2a21b8020bb04007e20 - rework css handling
3e698b8ae06054abf03a405ce8185b7441044264 - use mime-types to determine resource's type
bfb55d7 - rename assets to children in result object
40043ba - ignore errors by default

Non-breaking changes

7600e64549ac524ab0f1b8e30f79792d75c7fba3 - correctly handle different URI-schemas (mailto:, skype:, etc.)
3bb47eaab4af7ec9f278aa96d429be83acf8832b - handle svg external links by default
a2b5c56 - send referer in each request
a2b5c56 - decode html entities in url found in html resource

Migrate from v1.*

Call the exported function instead of the scrape method. Example:

// old usage
var scraper = require('website-scraper');
scraper.scrape({/*options*/});

// new usage
var scrape = require('website-scraper');
scrape({/*options*/});

CSS text is handled by default. If you set the sources option yourself and want to keep the previous behavior, add the following objects to sources:

{ selector: 'style' },
{ selector: '[style]', attr: 'style' }
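
For context, here is a sketch of how the two entries above might sit inside a complete options object. The URL, the directory, and the `img` and `link` entries are placeholder assumptions for illustration, not values taken from these release notes.

```javascript
// Sketch only: urls, directory and the first two sources are placeholders.
const options = {
  urls: ['https://example.com/'],
  directory: './saved-site',
  sources: [
    { selector: 'img', attr: 'src' },
    { selector: 'link[rel="stylesheet"]', attr: 'href' },
    // The two entries from the migration note:
    { selector: 'style' },
    { selector: '[style]', attr: 'style' }
  ]
};

// With the package installed, the object would be passed to the exported
// function, as in the "new usage" snippet:
// var scrape = require('website-scraper');
// scrape(options);
console.log(options.sources.length + ' sources configured');
```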

node-website-scraper - v1.2.3

Published by s0ph1e almost 8 years ago

8b094bc69265592014543ec052193767d8c1a84d - update debug version (prevent using a broken debug release)

node-website-scraper - v1.2.2

Published by s0ph1e almost 8 years ago

589deb6c9162c04f355c9257743170969b2105b6 Fix error handling (#151)

node-website-scraper - v1.2.1

Published by s0ph1e almost 8 years ago

a8546f9cd0a19bb2cac72b18c00953b0cd71f066 - fix error for urls with emails (#140)

node-website-scraper - v1.2.0

Published by s0ph1e about 8 years ago

137413ac79fc317f05b1d2242912910531a9c552 - add ignoreErrors option
001c231db2d307157b8eb5353aeac48e110b9d68 - fix too long filename error, shorten filename if needed
70d02256977511bc644b0e3fd4bb671df5bb2d1e - add logger
eeb09148ede3736fcdec5075c899924d0efa00e2 - save woff2 to fonts directory by default