This scraper is straightforward and specifically designed to extract quotes from the Goodreads website. Built in Node.js with the Puppeteer library, it scrapes up to 100 pages, collecting approximately 8,000 records. Each record includes the quote text, the writer's name, and a link to their image. Follow the steps below to set up the project on your local machine; you can then modify and use it as needed.
Clone this repository:
git clone https://github.com/NomanSiddiqui0000/goodreads_quotes-scraper.git
Navigate to the project directory:
cd goodreads_quotes-scraper
Install the required npm packages:
npm install
Open the Scraper_script.js file and adjust the maxPages variable if you want to scrape fewer pages. By default, it is set to scrape 100 pages.
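For orientation, the scraping loop in Scraper_script.js can be sketched roughly as follows. This is an assumption based on the description above, not a copy of the actual script: the buildPageUrl helper and the CSS selectors are hypothetical.

```javascript
const maxPages = 100; // lower this to scrape fewer pages

// Hypothetical helper: build the URL for one page of Goodreads quotes.
function buildPageUrl(page) {
  return `https://www.goodreads.com/quotes?page=${page}`;
}

async function scrape() {
  // Required lazily so the pure helper above works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: false }); // true = headless mode
  const page = await browser.newPage();
  const records = [];

  for (let i = 1; i <= maxPages; i++) {
    await page.goto(buildPageUrl(i), { waitUntil: 'domcontentloaded' });
    // Selectors below are assumptions about the page markup.
    const pageRecords = await page.$$eval('.quote', nodes =>
      nodes.map(n => ({
        quote: n.querySelector('.quoteText')?.innerText.trim() ?? '',
        author: n.querySelector('.authorOrTitle')?.innerText.trim() ?? '',
        image: n.querySelector('img')?.src ?? '',
      }))
    );
    records.push(...pageRecords);
  }

  await browser.close();
  return records;
}

module.exports = { buildPageUrl, scrape };
```

The page-by-page loop keeps memory use flat regardless of how many pages you scrape, and lowering maxPages is the only change needed for a smaller run.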
Run the script:
node Scraper_script.js
The script will navigate through the Goodreads quotes pages, scrape quotes and image sources, and save the data to quotes_and_images.csv.
Change { headless: false } to { headless: true } in the script if you want to run the browser in headless mode.

The scraped data is saved to quotes_and_images.csv in the root directory of the project.

This project is licensed under the MIT License.
Muhammad Noman