imdb-web-scraper

IMDB web scraper built on the Scrapy framework, plus a Flask server for data visualization.


IMDB Web Scraper

Scrapy is a Python framework for crawling websites and scraping data. I created several crawlers to learn Scrapy and improve my Python skills.

Project

This repository contains several demo Scrapy spiders:

- Quotes scraper
- IMDB movies scraper
- Books scraper

It also contains a simple Flask web server for viewing the data collected by the spiders.

The spiders save their data to an SQLite3 database. The website queries data from the database.
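
The storage step can be sketched as a Scrapy item pipeline. This is a minimal sketch under stated assumptions, not the repository's code: the class name SQLitePipeline, the movies.db file, and the movies(title, year, rating) schema are all hypothetical. Scrapy calls open_spider, process_item, and close_spider on every enabled pipeline, so the class itself needs only the standard library's sqlite3 module.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical item pipeline that stores scraped movies in SQLite3.

    The database file name and the movies(title, year, rating) schema are
    illustrative assumptions, not the project's actual layout.
    """

    def __init__(self, db_path="movies.db"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Open one connection per crawl and create the table if it is missing.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS movies ("
            "title TEXT PRIMARY KEY, year INTEGER, rating REAL)"
        )

    def process_item(self, item, spider):
        # INSERT OR REPLACE keyed on title, so re-running the spider
        # updates existing rows instead of duplicating them.
        self.conn.execute(
            "INSERT OR REPLACE INTO movies (title, year, rating) VALUES (?, ?, ?)",
            (item["title"], item.get("year"), item.get("rating")),
        )
        # Committing per item keeps the sketch simple; batching would be faster.
        self.conn.commit()
        return item  # hand the item on to any later pipelines

    def close_spider(self, spider):
        self.conn.close()
```

To use a pipeline like this, Scrapy's ITEM_PIPELINES setting in settings.py would list it, for example ITEM_PIPELINES = {"imdb.pipelines.SQLitePipeline": 300}, where the module path is hypothetical. The Flask server can then open the same database file read-only for its queries.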

Setup

I recommend using virtualenv to isolate the project's dependencies.

Install virtualenv
- pip3 install --user virtualenv
- If you install it with sudo instead, newer versions of pip may require sudo -H
Create a new virtual environment named env
- virtualenv env
Activate the virtual environment
- source env/bin/activate
Install the package dependencies
- pip install -r requirements.txt
- Installs Scrapy and its dependencies (for the spiders)
- Installs Flask and its dependencies (for the server)

Running the Scrapy Spiders

Run a spider using the name defined in its class (the spider's name attribute)
- scrapy crawl movies
- Currently available spiders:
  - movies
    - Scrapes IMDB movie data and saves it to an SQLite3 database
  - quotes
  - books
Run the Scrapy shell to test HTML selectors interactively
- scrapy shell [url]
- You can then run selector queries against the response
  - Example: response.css('div.summary::text').get()
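
To reason about what the example selector returns without fetching a page, the hypothetical helper below roughly mimics div.summary::text with the standard library's html.parser. It is only an illustrative stand-in, not how Scrapy evaluates selectors (Scrapy uses the parsel library), it handles just this one hard-coded selector, and it assumes well-formed HTML.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SummaryTextParser(HTMLParser):
    """Hypothetical helper: collects text nodes whose direct parent is a
    <div> with class "summary", roughly what div.summary::text selects.
    Assumes every non-void tag is properly closed."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one flag per open tag: True if it is a div.summary
        self.texts = []  # matching text nodes, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(tag == "div" and "summary" in classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # ::text keeps only text whose innermost enclosing tag is the match,
        # so text inside nested tags such as <b> is skipped.
        if self.stack and self.stack[-1]:
            self.texts.append(data)


def first_summary_text(html):
    """Return the first matching text node (like .get()), or None."""
    parser = SummaryTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts[0] if parser.texts else None
```

For example, on <div class="summary">Hello <b>bold</b> tail</div> the helper returns "Hello ", matching the behavior of ::text with .get(): text inside the nested <b> is not a direct child, and .get() returns only the first text node.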

Running the Flask Server

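Set the shell environment variables
- export FLASK_APP=server
- export FLASK_ENV=development
- On Windows cmd, use set instead of export (e.g. set FLASK_APP=server)
- Flask 2.3 and later ignore FLASK_ENV; use export FLASK_DEBUG=1 to enable debug mode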
Start the server
- flask run
Site Pages
- /movies
  - Lists all movies stored in the database
  - Supports title search and pagination
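
The title search and pagination on /movies can be sketched as one parameterized SQLite query. The function name search_movies, the movies(title, year, rating) schema, and the default page size of 20 are assumptions for illustration; the repository's server may build its query differently.

```python
import sqlite3


def search_movies(conn: sqlite3.Connection, title_query="", page=1, per_page=20):
    """Return (rows, page_count) for one page of movies matching title_query.

    Hypothetical helper; assumes a movies(title, year, rating) table.
    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    page = max(page, 1)
    # Escape LIKE wildcards so characters like % and _ in the search box
    # are matched literally.
    escaped = (
        title_query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    pattern = f"%{escaped}%"
    total = conn.execute(
        "SELECT COUNT(*) FROM movies WHERE title LIKE ? ESCAPE '\\'",
        (pattern,),
    ).fetchone()[0]
    # LIMIT/OFFSET fetches only the rows for the requested page.
    rows = conn.execute(
        "SELECT title, year, rating FROM movies WHERE title LIKE ? ESCAPE '\\' "
        "ORDER BY title LIMIT ? OFFSET ?",
        (pattern, per_page, (page - 1) * per_page),
    ).fetchall()
    page_count = max((total + per_page - 1) // per_page, 1)
    return rows, page_count
```

In a Flask view, the inputs could come from request.args.get("q", "") and request.args.get("page", 1, type=int), where the query-string names q and page are hypothetical. Passing the search text through ? placeholders rather than string formatting keeps user input out of the SQL itself.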

Screenshot: simple Flask site for viewing and searching the scraped data
