seo_crawler

Bare-bones basic SEO crawler using Python Scrapy (a newer version is now available)

Ecosystems: Python

SEO Crawler

This project has become part of the advertools package; check out its documentation page.

Using Scrapy, get the main SEO elements for exploratory analysis of a website. You supply a list of known URLs, and the crawler fetches each one and returns structured results.

The main elements include:

url: the actual URL
slug: the URI (path) part of the URL, after the domain
directories: splits the URI by slashes to return the different folders (directories) in each URI
title: the <title> tag
h1, h2, h3, h4: heading tags
description: the meta description
link_urls: not activated by default; needs site-specific configuration so that only links to the sites you care about are collected
link_text: the anchor text of each link; depends on link_urls being configured
link_count: number of links on the page that match your criteria
load_time: page load time in seconds
status_code: HTTP response code of the page (200, 301, 404, etc.)
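The URL-derived elements in the list above (url, slug, directories) can be computed without fetching the page. The sketch below uses only the standard library and assumes that "slug" means the path component of the URL; the function name `url_parts` is made up for this example.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the url / slug / directories elements."""
    # Assumption: the slug is the path component, without query or fragment.
    path = urlsplit(url).path
    # Split on slashes and drop the empty strings produced by
    # leading, trailing, or doubled slashes.
    directories = [segment for segment in path.split("/") if segment]
    return {"url": url, "slug": path, "directories": directories}


parts = url_parts("https://example.com/blog/2020/my-post/?ref=home")
print(parts["slug"])         # /blog/2020/my-post/
print(parts["directories"])  # ['blog', '2020', 'my-post']
```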

Many other elements could be added to the list, but they differ from site to site. Some examples:

publishing date
product price
content category
tags of an article
whether or not a certain keyword is in a certain location
type of content (inferred from a URL directory, or from certain content on page)
etc.
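To make a few of these site-specific elements concrete, here is a small sketch that infers a content type from the first URL directory, checks whether a keyword appears in the title, and pulls a dollar price out of the page text. Everything in it is hypothetical: the `CONTENT_TYPES` mapping, the function name `custom_elements`, and the price pattern would all need adapting to the site being crawled.

```python
import re
from urllib.parse import urlsplit

# Hypothetical mapping from a top-level directory to a content type.
CONTENT_TYPES = {"blog": "article", "products": "product"}

# Hypothetical price pattern: a dollar sign followed by an amount.
PRICE_PATTERN = re.compile(r"\$\s?(\d+(?:\.\d{2})?)")


def custom_elements(url, title, body_text, keyword):
    """Derive example site-specific elements for one page."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    first_dir = segments[0] if segments else ""
    price_match = PRICE_PATTERN.search(body_text)
    return {
        # Type of content, inferred from the first URL directory.
        "content_type": CONTENT_TYPES.get(first_dir, "other"),
        # Whether the keyword appears in the title (case-insensitive).
        "keyword_in_title": keyword.lower() in (title or "").lower(),
        # First price found in the page text, if any.
        "price": float(price_match.group(1)) if price_match else None,
    }


print(custom_elements(
    "https://example.com/products/red-shoe",
    "Red Shoe",
    "Now only $19.99!",
    "shoe",
))
```

In a Scrapy spider, a helper like this would typically be called from the parse callback and merged into the record yielded for each page.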
