node-website-scraper

Download a website to a local directory (including all CSS, images, JS, etc.)

MIT License

Downloads: 44.5K · Stars: 1.6K · Committers: 19


node-website-scraper - v3.3.0

Published by s0ph1e about 7 years ago

  • 2ec63543065d1b4d2db91dc944e90856538c19c7 - Add requestConcurrency option
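The requestConcurrency option caps how many requests run at the same time. The sketch below is not the library's code; it is a minimal promise pool with a hypothetical name (`runWithConcurrency`), assuming each task is a function that returns a promise, to show what such a cap bounds.

```javascript
// Hypothetical helper (not part of website-scraper): runs async tasks while
// keeping at most `limit` of them in flight, the kind of bound an option
// like requestConcurrency enforces on HTTP requests.
function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;      // index of the next task to start
  let active = 0;    // tasks currently running
  let maxActive = 0; // highest number of tasks seen running at once

  async function worker() {
    // Each worker keeps pulling the next unstarted task until none remain.
    while (next < tasks.length) {
      const i = next++;
      active++;
      maxActive = Math.max(maxActive, active);
      try {
        results[i] = await tasks[i]();
      } finally {
        active--;
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  const workers = Array.from({ length: workerCount }, () => worker());
  return Promise.all(workers).then(() => ({ results, maxActive }));
}
```

With a limit of 2, five simulated requests resolve with their results in the original order while never more than two run at once.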
node-website-scraper - v3.2.0

Published by s0ph1e about 7 years ago

  • 40c2cec71b3d87428694af39603d09d00beac96f - add media subdirectory
  • 0eb38295966d9944bcdf643429efe8b9d0726e72 - move to organization
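Resources are sorted into subdirectories by file extension, and this release adds one for media. A hedged sketch of that lookup follows; the `{ directory, extensions }` rule shape, the extension lists, and the `byTypePath` name are assumptions for illustration, not the library's actual defaults.

```javascript
const path = require('path');

// Illustrative rules only; the real defaults may list different extensions.
const subdirectories = [
  { directory: 'images', extensions: ['.png', '.jpg', '.jpeg', '.gif', '.svg', '.webp'] },
  { directory: 'js', extensions: ['.js'] },
  { directory: 'css', extensions: ['.css'] },
  { directory: 'media', extensions: ['.mp4', '.webm', '.mp3', '.ogg', '.wav'] },
  { directory: 'fonts', extensions: ['.woff', '.woff2', '.ttf', '.eot'] },
];

// Hypothetical helper: returns the relative path a file would be saved under,
// or the bare filename when no rule matches its extension.
function byTypePath(filename, rules) {
  // Compare extensions case-insensitively so "CLIP.MP4" still matches ".mp4".
  const ext = path.extname(filename).toLowerCase();
  const rule = rules.find((r) => r.extensions.includes(ext));
  return rule ? path.posix.join(rule.directory, filename) : filename;
}
```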
node-website-scraper - v3.1.0

Published by s0ph1e over 7 years ago

  • 874e5606462bc2bf312cdea220c72801e092c84a add option updateMissingSources
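updateMissingSources concerns references to resources that were not saved locally. Assuming it rewrites such references to absolute URLs on the original site (so the saved page can still load them), the idea can be sketched with the standard `URL` class; `rewriteReference` and the `savedFiles` map are hypothetical.

```javascript
// Hypothetical helper: given a reference found in a page, return the local
// path if the resource was saved, otherwise an absolute URL pointing at the
// original site (the behavior assumed here for updateMissingSources).
function rewriteReference(ref, pageUrl, savedFiles) {
  // Resolve relative references against the page they appear in.
  const absolute = new URL(ref, pageUrl).href;
  // savedFiles maps absolute URLs to local relative paths.
  return savedFiles.has(absolute) ? savedFiles.get(absolute) : absolute;
}
```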
node-website-scraper - v3.0.0

Published by s0ph1e over 7 years ago

  • 2d7a193060caf744e3dcd21e645b6e19cca76cc7 - Add domain directory in bySiteStructure filenameGenerator

Breaking changes in the bySiteStructure filename generator (examples assume the scraped site is example.com)

Before (no domain directory in the generated path):
  • / => DIRECTORY/index.html
  • /about => DIRECTORY/about/index.html
  • https://another-site.com/about => DIRECTORY/about/index.html
Now (the domain directory is added to the generated path):
  • / => DIRECTORY/example.com/index.html
  • /about => DIRECTORY/example.com/about/index.html
  • https://another-site.com/about => DIRECTORY/another-site.com/about/index.html
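The new mapping can be sketched as a small function. This is not the library's generator: `siteStructurePath` is a hypothetical name, and the rule that extensionless paths get an `index.html` is inferred from the examples listed.

```javascript
const path = require('path');

// Hypothetical sketch of the v3 bySiteStructure layout: the host name becomes
// the first directory, and paths without a file extension get index.html.
function siteStructurePath(directory, resourceUrl) {
  const { hostname, pathname } = new URL(resourceUrl);
  let filePath = pathname;
  if (!path.posix.extname(filePath)) {
    filePath = path.posix.join(filePath, 'index.html');
  }
  return path.posix.join(directory, hostname, filePath);
}
```

Running it on the three URLs above reproduces the "Now" column.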
node-website-scraper - v2.4.1

Published by s0ph1e over 7 years ago

  • 04e95ce54ca8ce5a9802019d655f4d48d9f02707 Remove SRI check for loaded resources
node-website-scraper - v2.4.0

Published by s0ph1e over 7 years ago

  • 13b4e527a5c88167ccbe97ae747773fb09122907 - Add maxRecursiveDepth option
  • 7f57b303dfe740eb6e88b7fee2713b37f1160ae7 - Download more resources by default (add frame, iframe)
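maxRecursiveDepth limits how far the scraper follows links from page to page. A breadth-first sketch over an in-memory link graph shows the effect; the graph, the `crawl` name, and the convention that the start page sits at depth 0 are assumptions for illustration.

```javascript
// Hypothetical breadth-first crawl over an in-memory link graph. Pages more
// than maxDepth links away from the start page are never visited.
function crawl(startUrl, links, maxDepth) {
  const visited = new Set([startUrl]);
  let frontier = [startUrl];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const nextFrontier = [];
    for (const url of frontier) {
      for (const child of links[url] || []) {
        if (!visited.has(child)) {
          visited.add(child);
          nextFrontier.push(child);
        }
      }
    }
    frontier = nextFrontier;
  }
  return [...visited];
}
```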
node-website-scraper - v2.3.0

Published by s0ph1e over 7 years ago

  • 3b8a025a5c2c11f5b7093d123a8f9d933893714d - add custom resource saver
  • cbc9f00ec08bc228498b75284b20502fee8cd7a9 - download more resources by default (add ogp, audio, video tags)
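A custom resource saver lets resources be stored somewhere other than the local file system. The in-memory class below is only a sketch: it assumes a saver exposes `saveResource(resource)` and `errorCleanup(err)` and that a resource offers `getFilename()` and `getText()`. Check these names against the README for your version before relying on them.

```javascript
// Hypothetical in-memory saver. The method names (saveResource, errorCleanup)
// and the resource accessors (getFilename, getText) are assumptions.
class MemoryResourceSaver {
  constructor() {
    this.files = new Map();
  }

  // Store the resource text under its generated filename.
  saveResource(resource) {
    this.files.set(resource.getFilename(), resource.getText());
    return Promise.resolve();
  }

  // Discard everything saved so far when scraping fails.
  errorCleanup(err) {
    this.files.clear();
    return Promise.resolve();
  }
}
```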
node-website-scraper - v2.2.2

Published by s0ph1e over 7 years ago

  • 6ea1d6ce721c8118f36811a43b6cd6fde7a0e675 - Add info about website-scraper-phantom to Readme
node-website-scraper - v2.2.1

Published by aivus over 7 years ago

  • 4c848d8316b368e0ba353ea6c7ba5f093c5f0449 - Fix ignoring different string cases in extensions
  • 65c45d26bf5f067fa57cc95f4b1186a0f585a8e5 - Export default options
node-website-scraper - v2.2.0

Published by s0ph1e over 7 years ago

  • 52fc13ea8856cf8c8c707e1ea6627bc81e08390c - add custom httpResponseHandler
  • 4b92006242d49d23dd7d99187d3240bde6d5c7e4 - add onResourceSaved and onResourceError callbacks
  • 423792d059d471c10a2f145fdfd3017729eca8d1 - update readme structure
  • c2fb99d3369fb4fea8620f40d668b245bbcc2628 - update author name in license and package.json
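httpResponseHandler allows each response to be inspected before it is saved. The handler below is a sketch assuming it receives a response object with `statusCode` and `body` and returns a promise that resolves with the body to keep or rejects to skip the resource; the 404 rule is only an example.

```javascript
// Hypothetical response handler: reject 404 responses so they are not saved,
// and pass every other body through unchanged.
function httpResponseHandler(response) {
  if (response.statusCode === 404) {
    return Promise.reject(new Error('status is 404'));
  }
  return Promise.resolve(response.body);
}
```

Such a function would be passed as the httpResponseHandler option, alongside the onResourceSaved and onResourceError callbacks added in the same release.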
node-website-scraper - v2.1.1

Published by s0ph1e over 7 years ago

  • c28bf7c5633723a105365ba8c70c887dfdc8dee3 - Support picture srcset attr, save .webp files to images dir
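Supporting `srcset` means extracting several URLs from a single attribute. A naive sketch with a hypothetical `srcsetUrls` helper, assuming candidates are comma-separated and URLs contain no commas (the full HTML parsing rules are more involved):

```javascript
// Hypothetical srcset parser: returns the URL of each image candidate,
// dropping width/density descriptors such as "2x" or "480w".
function srcsetUrls(srcset) {
  return srcset
    .split(',')
    .map((candidate) => candidate.trim().split(/\s+/)[0])
    .filter((url) => url.length > 0);
}
```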
node-website-scraper - v2.1.0

Published by s0ph1e over 7 years ago

  • b2a18e3770875d0a6f1f07ea2c1f8b4e232e0862 - decode URL-based filenames
  • caf894e2661e80627ed488fe430c6d2f50b55905 - correctly handle errors that occur while downloading main URLs
  • 1af32749268a820ec2a11d76a658b531a2e2dfd3 - refactor duplicated code
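Decoding URL-based filenames means a file requested as `my%20photo.png` is saved as `my photo.png`. A minimal sketch with a hypothetical `filenameFromUrl` helper that falls back to the raw name when the percent-encoding is malformed:

```javascript
const path = require('path');

// Hypothetical helper: derive a readable filename from a URL path.
function filenameFromUrl(resourceUrl) {
  const basename = path.posix.basename(new URL(resourceUrl).pathname);
  try {
    return decodeURIComponent(basename);
  } catch (err) {
    // Malformed sequences such as a lone "%" make decodeURIComponent throw.
    return basename;
  }
}
```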
node-website-scraper - v2.0.0

Published by s0ph1e over 7 years ago

Breaking changes

  • 3aacd8e9d74a270de06927b04ff03207d96c161c - drop nodejs < 4 support
  • 23e76c0e49f4419e8de066bdab71062a84f788b4 - export function instead of object
  • c10a4d2cc321cb358825a2a21b8020bb04007e20 - rework css handling
  • 3e698b8ae06054abf03a405ce8185b7441044264 - use mime-types to determine resource's type
  • bfb55d7 - rename assets to children in result object
  • 40043ba - ignore errors by default
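Determining a resource's type from its MIME type means relying on the `Content-Type` header rather than the URL extension. The change refers to mime-types; the sketch below does not use that package and instead maps a few content types by hand with a hypothetical `resourceTypeFromContentType` helper whose type names are also illustrative.

```javascript
// Hypothetical mapping from a Content-Type header to a resource type.
// Parameters such as "; charset=utf-8" are ignored.
function resourceTypeFromContentType(contentType) {
  const mimeType = (contentType || '').split(';')[0].trim().toLowerCase();
  switch (mimeType) {
    case 'text/html':
    case 'application/xhtml+xml':
      return 'html';
    case 'text/css':
      return 'css';
    default:
      return 'other';
  }
}
```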

Non-breaking changes

  • 7600e64549ac524ab0f1b8e30f79792d75c7fba3 - correctly handle different URI schemes (mailto:, skype:, etc.)
  • 3bb47eaab4af7ec9f278aa96d429be83acf8832b - handle svg external links by default
  • a2b5c56 - send a Referer header with each request
  • a2b5c56 - decode HTML entities in URLs found in HTML resources
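Two of these changes concern URLs found in HTML: entities such as `&amp;` are decoded before the URL is used, and links with schemes like `mailto:` or `skype:` are not downloaded. A combined sketch with a hypothetical `downloadableUrl` helper, assuming only `http:` and `https:` URLs are fetched and handling just a few common entities (a real HTML entity decoder covers far more):

```javascript
// A few common entities only; real HTML defines many more.
const ENTITIES = { '&amp;': '&', '&quot;': '"', '&#39;': "'", '&lt;': '<', '&gt;': '>' };

// Hypothetical helper: turn a raw attribute value from HTML into an absolute
// URL to download, or null when the scheme is not fetchable.
function downloadableUrl(rawValue, pageUrl) {
  const decoded = rawValue.replace(/&(amp|quot|#39|lt|gt);/g, (entity) => ENTITIES[entity]);
  let url;
  try {
    url = new URL(decoded, pageUrl);
  } catch (err) {
    return null; // not a parseable URL
  }
  return url.protocol === 'http:' || url.protocol === 'https:' ? url.href : null;
}
```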

Migrate from v1.*

  • Call the exported function instead of the scrape method. Example:
// old usage
var scraper = require('website-scraper');
scraper.scrape({/*options*/});

// new usage
var scrape = require('website-scraper');
scrape({/*options*/});
  • CSS in style tags and style attributes is handled by default, but if you pass your own sources option and want to keep the previous behavior, add the following objects to sources:
{ selector: 'style' },
{ selector: '[style]', attr: 'style' }
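For example, a custom sources list can be extended with the two entries above before it is passed to scrape. The `withInlineCss` helper is hypothetical, and the `img` and `link` entries stand in for whatever your own list contains.

```javascript
// The two entries from the migration note above.
const INLINE_CSS_SOURCES = [
  { selector: 'style' },
  { selector: '[style]', attr: 'style' },
];

// Hypothetical helper: append the inline-CSS sources unless the list already
// contains equivalent entries.
function withInlineCss(sources) {
  const missing = INLINE_CSS_SOURCES.filter(
    (css) => !sources.some((s) => s.selector === css.selector && s.attr === css.attr)
  );
  return sources.concat(missing);
}

const sources = withInlineCss([
  { selector: 'img', attr: 'src' },
  { selector: 'link[rel="stylesheet"]', attr: 'href' },
]);
// Pass `sources` to scrape() together with your other options.
```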
node-website-scraper - v1.2.3

Published by s0ph1e almost 8 years ago

  • 8b094bc69265592014543ec052193767d8c1a84d - update debug version (prevent using a broken debug release)
node-website-scraper - v1.2.2

Published by s0ph1e almost 8 years ago

  • 589deb6c9162c04f355c9257743170969b2105b6 - Fix error handling (#151)
node-website-scraper - v1.2.1

Published by s0ph1e almost 8 years ago

  • a8546f9cd0a19bb2cac72b18c00953b0cd71f066 - fix error for urls with emails (#140)
node-website-scraper - v1.2.0

Published by s0ph1e about 8 years ago

  • 137413ac79fc317f05b1d2242912910531a9c552 - add ignoreErrors option
  • 001c231db2d307157b8eb5353aeac48e110b9d68 - fix too long filename error, shorten filename if needed
  • 70d02256977511bc644b0e3fd4bb671df5bb2d1e - add logger
  • eeb09148ede3736fcdec5075c899924d0efa00e2 - save woff2 to fonts directory by default
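Shortening a too-long filename has to keep its extension so the file type survives. A sketch assuming a 255-character limit (common on many file systems, but not universal) and a hypothetical `shortenFilename` helper; note that it counts UTF-16 code units, not bytes.

```javascript
const path = require('path');

// Hypothetical helper: truncate the base name so the whole filename fits in
// maxLength characters while keeping the extension.
function shortenFilename(filename, maxLength = 255) {
  if (filename.length <= maxLength) {
    return filename;
  }
  const ext = path.extname(filename);
  const base = filename.slice(0, filename.length - ext.length);
  return base.slice(0, Math.max(0, maxLength - ext.length)) + ext;
}
```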
node-website-scraper - v1.1.1

Published by s0ph1e over 8 years ago

  • a74927dea50f97d8b67bdc08ebeef60c43d37285 - Fix deadlock in css handler
node-website-scraper - v1.1.0

Published by s0ph1e over 8 years ago

  • 479b8ecfca4844c73dd37ee3cf6d3f6a1cd01e1d - Improve mechanism for re-using already loaded / loading resources
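Re-using resources that are already loaded or still loading can be done by caching one promise per URL, so a second request for the same URL waits on the first instead of starting a new download. A minimal sketch; `createLoader` and the injected `fetchResource` function are hypothetical.

```javascript
// Hypothetical loader that shares one in-flight or completed promise per URL.
function createLoader(fetchResource) {
  const cache = new Map();
  return function load(url) {
    if (!cache.has(url)) {
      cache.set(url, fetchResource(url));
    }
    return cache.get(url);
  };
}
```

Caching the promise rather than the finished result is what covers the "loading" case: callers that arrive mid-download share the same pending promise. A failed download also stays cached in this sketch, so a retry would need the entry removed.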
node-website-scraper - v1.0.3

Published by s0ph1e over 8 years ago

  • 623e63e04c10c2c0e8bfedc5c04e3c2e6bae0923 - Fix exceptions in path.extname for node v6
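From Node.js v6 on, `path` functions such as `path.extname` throw a `TypeError` when given a non-string argument. One defensive pattern, assuming the value may sometimes be missing, is shown below; `safeExtname` is hypothetical and this is not necessarily how the fix was made.

```javascript
const path = require('path');

// Hypothetical wrapper: treat non-string input as "no extension" instead of
// letting path.extname throw a TypeError.
function safeExtname(value) {
  return typeof value === 'string' ? path.extname(value) : '';
}
```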