baha-crawler

baha-crawler is a web crawler module designed to scrape data from Bahamut Forum.

MIT License



(Overview)


Bahamut Forum is the largest and best-known gaming forum in Taiwan, and it is familiar to most gamers there. Crawler modules for Bahamut Forum are hard to find, especially ones written in JavaScript. To scrape data from Bahamut Forum with Node.js, I wrote this simple crawler module in JavaScript and am sharing it for anyone to use.

(What can it do?)


  • Scrapes posts from any board on Bahamut Forum
  • Supports scraping multiple pages in one call
  • Supports skipping the pinned posts at the top of a board
  • Scraped posts contain titles and hyperlinks. (Other data such as authors, dates, and likes are not implemented yet; forks and pull requests are welcome.)

(How to use it in your project?)

  • Install the package with npm
npm install @waynechang65/baha-crawler
  • Include the @waynechang65/baha-crawler package in your project
const baha_crawler = require('@waynechang65/baha-crawler');
  • Add the code below inside an async function in your project
// *** Initialize ***
await baha_crawler.initialize();

// *** GetResults ***
// ToS board (23805), 3 pages, skip the pinned posts at the top
const baha = await baha_crawler.getResults({
    board: '23805',
    pages: 3,
    skipTPs: true
});

// *** Close ***
await baha_crawler.close();
  • getResults() returns the scraped data as an object with the shape shown below.
{ titles[], urls[] }
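
Since the result holds titles and URLs in two separate arrays, it can be convenient to combine them into one object per post. The sketch below is not part of baha-crawler: toPosts is a hypothetical helper, and it assumes that titles[i] and urls[i] describe the same post, which the shape above suggests but does not guarantee. The sample data is made up for illustration.

```javascript
// Hypothetical helper (not part of baha-crawler): combines the parallel
// titles/urls arrays into an array of { title, url } objects.
// Assumption: titles[i] and urls[i] belong to the same post.
function toPosts(result) {
    const { titles = [], urls = [] } = result;
    // Use the shorter length so a mismatch never produces undefined fields.
    const count = Math.min(titles.length, urls.length);
    const posts = [];
    for (let i = 0; i < count; i++) {
        posts.push({ title: titles[i], url: urls[i] });
    }
    return posts;
}

// Made-up sample shaped like the object returned by getResults().
const sample = {
    titles: ['First post', 'Second post'],
    urls: ['https://example.com/1', 'https://example.com/2']
};

for (const post of toPosts(sample)) {
    console.log(`${post.title} -> ${post.url}`);
}
```

In a real project, the object returned by getResults() would be passed to toPosts() in place of sample.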

(How to run the example?)

  • Clone baha-crawler from GitHub
git clone https://github.com/WayneChang65/baha-crawler.git
  • Change into the baha-crawler directory
cd baha-crawler

  • Install dependencies in the cloned baha-crawler folder

npm install
  • Run the demo with npm (the demo example is in ./examples/demo.js)
npm run start

(Base Methods)

  • initialize(): initializes the baha-crawler object
  • getResults(options): scrapes data
      options.board: the Bahamut Forum board to scrape (for example '23805')
      options.pages: number of pages to scrape
      options.skipTPs: whether to skip the pinned posts at the top of the board
  • close(): closes the baha-crawler object
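
The three methods are meant to be called in order: initialize(), then getResults(), then close(). One way to make sure close() still runs when scraping fails is a small wrapper with try/finally. This is a minimal sketch, not part of baha-crawler: scrapeBoard is a hypothetical name, and the crawler is passed in as a parameter so the function relies only on the three methods listed above.

```javascript
// Hypothetical wrapper (not part of baha-crawler): runs the three base
// methods in order and always calls close(), even if getResults() throws.
async function scrapeBoard(crawler, options) {
    await crawler.initialize();
    try {
        return await crawler.getResults(options);
    } finally {
        await crawler.close();
    }
}

// Possible usage inside an async function (requires the package installed):
//   const baha_crawler = require('@waynechang65/baha-crawler');
//   const result = await scrapeBoard(baha_crawler, {
//       board: '23805',
//       pages: 3,
//       skipTPs: true
//   });
```

Passing the crawler in as an argument also makes the wrapper easy to exercise with a stub object, without touching the network.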


(Contribution)


Even though baha-crawler is a small project, I hope it keeps improving. If you find a bug, please open an issue, and feel free to fork the project and send a pull request. Thanks. :)