crawl4takeover

A crawler that collects all the external links from a website

MIT License

Setup

$ pip install -r requirements.txt

Usage

$ python scan.py {URL}

Working

The script scans the given page and collects all the URLs found in the page and in the JavaScript files it references.
Links on the same domain are stored in memory and scanned in turn.
Only the links matching the filters configured in scan.py are kept.
The script creates two output files:
- output.txt: all the links found after filtering
- broken.txt: the links from output.txt that are broken
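The scanning and filtering steps above can be sketched in Python roughly as follows. This is not the code from scan.py: the names `LinkParser`, `extract_links`, `extract_js_links`, `split_links`, `filter_links` and the `FILTER_KEYWORDS` list are hypothetical, the keywords are example third-party hosts chosen only for illustration, and the JavaScript regex is a rough heuristic that only finds absolute `http(s)` URLs written literally in the source. Only the standard library is used.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical filter configuration (the real one lives in scan.py): keep only
# external links whose host contains one of these substrings.
FILTER_KEYWORDS = ["s3.amazonaws.com", "github.io", "herokuapp.com"]

# Rough heuristic: absolute http(s) URLs written literally in JavaScript source.
JS_URL_RE = re.compile(r"""https?://[^\s"'`<>)]+""")


class LinkParser(HTMLParser):
    """Collect href/src attribute values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []    # every href/src value (a, link, img, script, ...)
        self.scripts = []  # only <script src=...> values, to scan as JS later

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and name in ("href", "src"):
                self.links.append(value)
                if tag == "script" and name == "src":
                    self.scripts.append(value)


def extract_links(base_url, html):
    """Return (absolute links, absolute script URLs) found in an HTML page."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    links = [urljoin(base_url, link) for link in parser.links]
    scripts = [urljoin(base_url, src) for src in parser.scripts]
    return links, scripts


def extract_js_links(js_source):
    """Return the absolute URLs that appear literally in a JavaScript file."""
    return JS_URL_RE.findall(js_source)


def split_links(base_url, links):
    """Split links into same-domain ones (to crawl further) and external ones."""
    base_host = urlparse(base_url).netloc
    internal, external = set(), set()
    for link in links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, data:, ...
        (internal if parsed.netloc == base_host else external).add(link)
    return internal, external


def filter_links(links, keywords=FILTER_KEYWORDS):
    """Keep only links whose host contains one of the configured keywords."""
    return {link for link in links
            if any(keyword in urlparse(link).netloc for keyword in keywords)}
```

In this sketch a link counts as same-domain only when its host matches exactly, so `www.example.com` and `example.com` are treated as different sites; `split_links` would need adjusting if subdomains should be crawled as well.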
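The crawl loop and the two output files could look roughly like the sketch below, assuming a breadth-first crawl with an in-memory queue of same-domain pages. Only the file names output.txt and broken.txt come from the description above; `crawl`, `write_reports`, `get_links`, `is_broken`, the `keep` callback and the `max_pages` limit are hypothetical, and the regex-based link extraction is a simplification of proper HTML parsing. Here a link is treated as broken if requesting it returns a 4xx/5xx status, fails DNS resolution or times out.

```python
import http.client
import os
import re
import urllib.request
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

# Simplified extraction of quoted href/src values (hypothetical, not from
# scan.py); a real crawler should use an HTML parser instead.
LINK_RE = re.compile(r"""(?:href|src)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

# Errors meaning "this URL could not be fetched" (HTTPError and URLError
# are subclasses of OSError; malformed URLs raise ValueError).
FETCH_ERRORS = (OSError, ValueError, http.client.HTTPException)


def get_links(url, timeout=10):
    """Download a page and return the absolute URLs it links to."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return [urljoin(url, value) for value in LINK_RE.findall(html)]


def is_broken(url, timeout=10):
    """Return True if the URL answers with 4xx/5xx or cannot be reached.

    Some servers reject urllib's default User-Agent, which would show up
    here as a false positive.
    """
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
        return False
    except FETCH_ERRORS:
        return True


def crawl(start_url, get_links=get_links, keep=lambda link: True, max_pages=100):
    """Breadth-first crawl of one site; return the kept external links."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])  # same-domain pages still to scan (in memory)
    seen = {start_url}
    external = set()
    crawled = 0
    while queue and crawled < max_pages:
        page = queue.popleft()
        crawled += 1
        try:
            links = get_links(page)
        except FETCH_ERRORS:
            continue  # skip pages that cannot be downloaded
        for link in links:
            link = urldefrag(link).url  # drop any "#fragment"
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https"):
                continue  # mailto:, javascript:, data:, ...
            if parsed.netloc == host:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
            elif keep(link):
                external.add(link)
    return external


def write_reports(links, out_dir=".", is_broken=is_broken):
    """Write output.txt (all kept links) and broken.txt (the broken subset)."""
    links = sorted(links)
    broken = [link for link in links if is_broken(link)]
    with open(os.path.join(out_dir, "output.txt"), "w") as f:
        f.writelines(link + "\n" for link in links)
    with open(os.path.join(out_dir, "broken.txt"), "w") as f:
        f.writelines(link + "\n" for link in broken)
    return broken
```

With this sketch, `write_reports(crawl("https://example.com/"))` would crawl up to 100 pages of the site and write both files to the current directory. Passing `get_links` and `is_broken` in as parameters keeps network access out of the core logic, so the crawler can be exercised with fake data.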

Related Projects

creepy

Dead simple web crawler for Python

python-seo-analyzer

An SEO tool that analyzes the structure of a site, crawls the site, count words in the body of th...

LinkFinder

A python script that finds endpoints in JavaScript files

arxiv_crawler

Move arxiv.org articles to the Great web

Broken-Link-Crawler

Python bot that crawls your website looking for dead stuff

pylinkvalidator

pylinkvalidator is a standalone and pure python link validator and crawler that traverses a web s...

seo_crawler

Bare-bones basic SEO crawler using Python Scrapy

BBScan

A fast vulnerability scanner helps pentesters pinpoint possibly vulnerable targets from a large n...

crau

Easy-to-use Web archiver

AutoCrawler

Google, Naver multiprocess image web crawler (Selenium)

blc

Broken link checker

Link-scraper-in-python

A Python script to scrape all links in a given website using requests and Beautiful Soup

wiki-crawler

w2-2-gw moving tool

robotScraper

RobotScraper is a simple tool written in Python to check each of the paths found in the robots.tx...

gh-crawl

Crawler for GitHub repositories. Finds all the broken links in the repositories