node-website-scraper

Download a website to a local directory (including all CSS, images, JS, etc.)

MIT License

node-website-scraper - v1.0.2

Published by s0ph1e over 8 years ago

  • Get rid of createOutputObject and return Resource as-is
node-website-scraper - v1.0.1

Published by s0ph1e over 8 years ago

  • 8c3f1faf6d0aaea9e9e4a9389e92b995e5b73ed8 - Fix promise bug
  • 8ec9aa78873a2477c31a56c62ec595e093508a79 - Fix recursion bug when two HTML resources link to each other
node-website-scraper - v1.0.0

Published by s0ph1e over 8 years ago

  • 92eb86fd8e6e24f328e3544c338cc622158b552f - Change output format: it now returns the full tree of assets for each resource
  • a1b347b418370cbf5935b6fc7a4f2a5e9612fde8 - Add prettifyUrls feature
  • e7f8b80770a96f65329e370631869de208aecb44 - Add urlFilter feature
  • 00443b691bab205778cb95a3bc7c0666a9688d9e - Add filenameGenerator feature
  • dc4ab93d211df25bb031ef70373ebd70e35c3beb - Use lodash instead of underscore
  • e88abb2b0ba6e0964a8fefdb3bd2e155be58b84e - Add missing extensions for html and css resources

Breaking changes
The output format has changed.
Previously, a flat array of root resources was returned:

[ { url: 'http://example.com', filename: 'index.html' } ];

Now, a tree of resources is returned:

[ { 
  url: 'http://example.com', 
  filename: 'index.html',
  assets: [ // dependencies of index.html
    { 
      url: 'http://example.com/style.css', 
      filename: 'style.css', 
      assets: [ // dependencies of style.css
        { url: 'http://example.com/img-from-styles.png', filename: 'img-from-styles.png', assets: [] },
      ] 
    }
    /* other dependencies of index.html */
  ]
} ];
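Code that relied on the old flat array can recover a flat list by walking the `assets` tree. The sketch below assumes only the `url`, `filename`, and `assets` fields shown above; `flattenResources` is a hypothetical helper, not part of the library. It defensively tracks visited nodes so that two resources referring to each other cannot cause an endless walk.

```javascript
// Hypothetical helper: flattens the v1.0.0 resource tree into a flat list.
// Assumes each node has { url, filename, assets } as in the example above.
function flattenResources(resources) {
  const seen = new Set(); // node objects already visited
  const result = [];

  function visit(resource) {
    if (seen.has(resource)) {
      return; // skip nodes reached a second time
    }
    seen.add(resource);
    result.push({ url: resource.url, filename: resource.filename });
    (resource.assets || []).forEach(visit); // depth-first into dependencies
  }

  resources.forEach(visit);
  return result;
}

// Usage with a tree shaped like the example above
const tree = [{
  url: 'http://example.com',
  filename: 'index.html',
  assets: [{
    url: 'http://example.com/style.css',
    filename: 'style.css',
    assets: [{ url: 'http://example.com/img-from-styles.png', filename: 'img-from-styles.png', assets: [] }]
  }]
}];

console.log(flattenResources(tree).map(function (r) { return r.filename; }));
// logs: [ 'index.html', 'style.css', 'img-from-styles.png' ]
```

Depth-first order keeps each resource next to the resources it depends on, which makes the flattened list easy to compare with the old output.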
node-website-scraper - v0.3.6

Published by s0ph1e over 8 years ago

  • 9732eb1b71a502f7fc696a2586729a2c985dc555 Fix similar CSS URLs not being updated
node-website-scraper - v0.3.5

Published by s0ph1e over 8 years ago

  • bf0a25e68eddc942ef7cb2539e725c178d288b48 Fix bug with resources that have no protocol specified on HTTPS pages
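A resource URL with no protocol, such as `//cdn.example.com/lib.js`, inherits the scheme of the page that references it, so on an HTTPS page it has to resolve to `https:`. The sketch below shows this rule with Node's built-in WHATWG `URL` class; `resolveResourceUrl` is a hypothetical helper illustrating the resolution, not the scraper's own code.

```javascript
// Resolve a resource reference against the URL of the page that contains it.
// Protocol-relative references ("//host/path") take the page's scheme.
function resolveResourceUrl(pageUrl, reference) {
  return new URL(reference, pageUrl).href;
}

console.log(resolveResourceUrl('https://example.com/index.html', '//cdn.example.com/lib.js'));
// logs: https://cdn.example.com/lib.js
console.log(resolveResourceUrl('http://example.com/index.html', '//cdn.example.com/lib.js'));
// logs: http://cdn.example.com/lib.js
```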
node-website-scraper - v0.3.4

Published by s0ph1e over 8 years ago

  • 8e85c7318696dce2ea293d8d23ec5f74ee07ddbf - Fix loading from <img srcset="">
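The `srcset` attribute lists several image candidates, each a URL optionally followed by a width (`480w`) or pixel-density (`2x`) descriptor, so every candidate is a separate resource to download. A minimal sketch of extracting the URLs; `parseSrcsetUrls` is a hypothetical helper, and splitting on commas is a simplification of the HTML specification's parsing algorithm that assumes the URLs themselves contain no commas.

```javascript
// Simplified srcset parser: returns the URL of each image candidate.
// Assumption: candidate URLs contain no commas (the full HTML spec
// algorithm handles that case; this sketch does not).
function parseSrcsetUrls(srcset) {
  return srcset
    .split(',')
    .map(function (candidate) { return candidate.trim().split(/\s+/)[0]; })
    .filter(function (url) { return url.length > 0; }); // drop empty entries
}

console.log(parseSrcsetUrls('img/small.png 480w, img/large.png 1080w'));
// logs: [ 'img/small.png', 'img/large.png' ]
```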
node-website-scraper - v0.3.3

Published by s0ph1e over 8 years ago

  • a8871eb844290e917441fa669988930a4f1abcb0 - Accept gzip
node-website-scraper - v0.3.2

Published by s0ph1e over 8 years ago

  • 448514fe61f9c48a5f8df2d165852979567bb09c Handle hash anchors
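A fragment (`#section`) is never sent to the server, so `page.html#intro` and `page.html#usage` name the same document. One way to treat such links as a single resource is to drop the fragment before comparing URLs; the sketch below assumes that approach, and `withoutHash` is a hypothetical helper rather than the scraper's implementation.

```javascript
// Strip the fragment so URLs that differ only by "#..." map to one resource.
function withoutHash(url) {
  const parsed = new URL(url);
  parsed.hash = ''; // removes the fragment, including the "#"
  return parsed.href;
}

const links = [
  'http://example.com/page.html#intro',
  'http://example.com/page.html#usage',
  'http://example.com/page.html'
];
const unique = Array.from(new Set(links.map(withoutHash)));
console.log(unique); // logs: [ 'http://example.com/page.html' ]
```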
node-website-scraper - v0.3.1

Published by s0ph1e over 8 years ago

  • 16a28ef11c4efd980810515214c52bbd86c0e3d8 update dependencies
  • bef139f2bf398cca08195c8c47ef557adfdb23ad add options maxDepth and recursive
node-website-scraper - v0.3.0

Published by s0ph1e about 9 years ago

  • 69ab9ebf557ee08864e3530d6577f3ad1d1b9726 - refactor
  • 9636962b1fdb4f52bd2c9c8fc6cc36e3a7e4b85f - improve detection of duplicate URLs
  • b2d2bed8302410d5b1689a2cb64e957f6edd19af - improve recognition of resource types
  • remove log from options
  • cover with tests

Breaking changes

  • The filename returned by scrape has changed: it is now a path relative to directory
var options = {
  urls: 'http://example.com',
  directory: '/path/to/save'
};
scrape(options).then(console.log); 
// earlier: [ { url: 'http://example.com', filename: '/path/to/save/index.html' } ];
// now:  [ { url: 'http://example.com', filename: 'index.html' } ];
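Since `filename` is now relative to `directory`, code that needs the absolute location on disk has to join the two. A minimal sketch using Node's `path` module; `toAbsolutePaths` is a hypothetical helper, and the result array is written out by hand in the format shown above rather than produced by a real scrape.

```javascript
const path = require('path');

// Rebuild absolute paths from the v0.3.0 result format,
// where `filename` is relative to `options.directory`.
function toAbsolutePaths(directory, resources) {
  return resources.map(function (resource) {
    return path.join(directory, resource.filename);
  });
}

const result = [{ url: 'http://example.com', filename: 'index.html' }];
console.log(toAbsolutePaths('/path/to/save', result));
// logs on POSIX systems: [ '/path/to/save/index.html' ]
```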
node-website-scraper - v0.2.4

Published by s0ph1e over 9 years ago

  • 181a4d84542ff3c4069082952bcf97a7df6412d6 - Fix _.extend issue that overwrote options
  • add more tests
node-website-scraper - v0.2.3

Published by s0ph1e over 9 years ago

  • 7f66a8f5036c9505cbf5c86cf80a942b0ad817b7 Fix regexp for css
  • bd1a778acbc92b505b00dab2c05c56d9cc24aaf5 Add custom request object to options
node-website-scraper -

Published by s0ph1e almost 10 years ago

  • 107773d9a03bcba6a2012a4c9d66da5612f8a091 Use cookies in requests; fix behavior on redirect
node-website-scraper - v0.2.1

Published by s0ph1e almost 10 years ago

node-website-scraper - v0.2.0-beta

Published by aivus almost 10 years ago

  • Add support for multiple URLs (the url option is removed; urls is added)
  • Rename option properties:
    • path -> directory
    • scrToLoad -> sources
    • indexFile -> defaultFilename
  • Start using unit tests
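Options objects written for v0.1.x need the renames above before they work with v0.2.0-beta. A sketch of a hypothetical `migrateOptions` helper that applies them; the old and new names come from these notes, wrapping the single `url` in an array is an assumption about how `urls` is meant to be passed, and any other key is copied unchanged.

```javascript
// Hypothetical helper: converts a pre-0.2.0 options object to the
// 0.2.0-beta names listed above. Unknown keys are copied as-is.
const RENAMED = {
  path: 'directory',
  scrToLoad: 'sources',
  indexFile: 'defaultFilename'
};

function migrateOptions(oldOptions) {
  const options = {};
  Object.keys(oldOptions).forEach(function (key) {
    if (key === 'url') {
      // assumption: the single `url` becomes a one-element `urls` list
      options.urls = [oldOptions.url];
    } else {
      options[RENAMED[key] || key] = oldOptions[key];
    }
  });
  return options;
}

console.log(migrateOptions({ url: 'http://example.com', path: '/path/to/save', indexFile: 'index.html' }));
```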
node-website-scraper - v0.1.4

Published by s0ph1e about 10 years ago

  • c6007718db155d06bea36cae7e07d4ae66b80833 Fix updating options.url when a <base> tag is found
node-website-scraper - v0.1.3

Published by aivus about 10 years ago

node-website-scraper - v0.1.0

Published by s0ph1e about 10 years ago

First public release