node-crawl-mf2

Crawl microformats2 data for h-entries and h-feeds

LGPL-3.0 License

crawl-mf2

Crawl a microformats2 site to find things like canonical URLs for h-entries.

Note: this module does not really handle pages with more than one top-level microformats2 node.

Installation

npm install crawl-mf2

Example

Start a crawl and log canonical h-entry URLs found on https://strugee.net/blog/:

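var crawl = require('crawl-mf2');

// Calling the exported function with the base URL returns an EventEmitter
var crawler = crawl('https://strugee.net/blog/');

// Log the canonical URL of each h-entry page found
crawler.on('h-entry', function(url, mf2node) {
	console.log(url);
});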

API

The module exports a single function, crawlMf2, which takes a single argument, the base URL to crawl from.

It returns an EventEmitter.

Events

'error'

Emitted when an error occurs. Currently this means either the microformats2 parser failed or an HTTP error occurred.

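Note: Node.js treats 'error' events specially. If no listener is registered for 'error', the emitter throws the error instead of emitting it.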
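Because of that special treatment, it is usually worth attaching an 'error' listener before anything else. The following is a minimal sketch, not taken from the module itself. A plain EventEmitter stands in for the crawler so the snippet runs without network access, and it assumes the listener receives an Error object, which this README does not state.

```javascript
var EventEmitter = require('events');

// Stand-in for the emitter returned by require('crawl-mf2')(baseUrl).
var errorDemoCrawler = new EventEmitter();

var crawlErrors = [];

// Assumption: the listener is passed an Error object.
errorDemoCrawler.on('error', function(err) {
	crawlErrors.push(err);
	console.error('crawl error:', err.message);
});

// Simulate a failure. Without the listener above, this emit() would throw.
errorDemoCrawler.emit('error', new Error('simulated HTTP failure'));
```

With a real crawler, the same listener would be attached to the return value of crawl(...) in place of the stand-in.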

'urlDisco'

  • String The URL being discovered

Emitted when a new URL is discovered, including the initial base URL.
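One way to use this event is to keep a record of every URL the crawl has touched. The sketch below is illustrative only: trackDiscoveredUrls is a hypothetical helper, not part of crawl-mf2, and a plain EventEmitter stands in for the crawler so the example runs on its own.

```javascript
var EventEmitter = require('events');

// Hypothetical helper: collects each distinct URL reported via 'urlDisco'.
function trackDiscoveredUrls(emitter) {
	var seen = new Set();
	emitter.on('urlDisco', function(url) {
		seen.add(url);
	});
	return seen;
}

// Stand-in for the emitter returned by require('crawl-mf2')(baseUrl).
var discoDemoCrawler = new EventEmitter();
var discoveredUrls = trackDiscoveredUrls(discoDemoCrawler);

// Simulated discoveries, including a repeat of the base URL.
discoDemoCrawler.emit('urlDisco', 'https://example.com/blog/');
discoDemoCrawler.emit('urlDisco', 'https://example.com/blog/post-1');
discoDemoCrawler.emit('urlDisco', 'https://example.com/blog/');

console.log(Array.from(discoveredUrls));
```

A Set is used so the record stays free of duplicates whether or not the crawler reports a URL more than once.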

'mf2Parse'

Emitted when a URL is parsed for microformats2 markup.

'h-feed'

Emitted when an h-feed page is discovered.

'h-entry'

Emitted when an h-entry page is discovered.
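The two page events can be combined to build a simple index of a site. This is a sketch under stated assumptions: collectPages is a hypothetical helper, a plain EventEmitter stands in for the crawler, and 'h-feed' listeners are assumed to receive the same (url, mf2node) arguments that the 'h-entry' example above uses.

```javascript
var EventEmitter = require('events');

// Hypothetical helper: records every h-feed and h-entry page an emitter reports.
function collectPages(emitter) {
	var pages = { feeds: [], entries: new Map() };
	// Assumption: 'h-feed' passes the page URL as its first argument.
	emitter.on('h-feed', function(url) {
		pages.feeds.push(url);
	});
	// Keyed by URL, so a repeated h-entry overwrites instead of duplicating.
	emitter.on('h-entry', function(url, mf2node) {
		pages.entries.set(url, mf2node);
	});
	return pages;
}

// Stand-in for the emitter returned by require('crawl-mf2')(baseUrl).
var pageDemoCrawler = new EventEmitter();
var pages = collectPages(pageDemoCrawler);

// Simulated events with made-up URLs and minimal mf2 nodes.
pageDemoCrawler.emit('h-feed', 'https://example.com/blog/', { type: ['h-feed'] });
pageDemoCrawler.emit('h-entry', 'https://example.com/blog/post-1', { type: ['h-entry'] });
pageDemoCrawler.emit('h-entry', 'https://example.com/blog/post-1', { type: ['h-entry'] });

console.log(pages.feeds.length, pages.entries.size);
```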

License

LGPL 3.0+

Author

AJ Jordan