What is CrawleMe!?

CrawleMe! is an easy way to crawl image or link URLs from any web site.

How It Works

Create your web page wrapper class.

from crawleme.base import BasePage

class MyPage(BasePage):
	url = 'http://www.mysite.com'
	item_path = '//*[@id="campaign_list"]/div/a'
	item_attribute = 'href'

Create an instance of the wrapper class and call its crawle method.

crawler = MyPage()
urls = crawler.crawle()

for url in urls:
	print(url)

Result:

http://www.mysite.com/id/5
http://www.mysite.com/aboutus/
http://www.mysite.com/foo/
http://www.mysite.com/bar/
http://www.mysite.com/baz/
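
To make the roles of item_path and item_attribute concrete, here is a minimal sketch of the same select-then-read-attribute idea. It does not use CrawleMe! and makes no claim about how the library is implemented: it relies on the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup. SAMPLE_PAGE and extract_attribute are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the downloaded page.
SAMPLE_PAGE = """
<html><body>
  <div id="campaign_list">
    <div><a href="http://www.mysite.com/id/5">Five</a></div>
    <div><a href="http://www.mysite.com/aboutus/">About us</a></div>
  </div>
</body></html>
"""


def extract_attribute(markup, item_path, item_attribute):
    """Return the given attribute of every element matched by item_path."""
    root = ET.fromstring(markup.strip())
    # ElementTree only accepts relative paths, so anchor the
    # expression at the root element with a leading '.'.
    elements = root.findall("." + item_path)
    values = []
    for element in elements:
        value = element.get(item_attribute)
        if value is not None:  # skip matches that lack the attribute
            values.append(value)
    return values


urls = extract_attribute(
    SAMPLE_PAGE, '//*[@id="campaign_list"]/div/a', 'href')

for url in urls:
    print(url)
```

Real pages are rarely well-formed XML, so a tolerant HTML parser with full XPath support would be needed in practice.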

You can also pass or override the url or item_path of the wrapper class when creating an instance.

crawler = MyPage(url='http://www.mysite.com/id/112312')
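
The override pattern can be illustrated with a small stand-alone sketch: class attributes act as defaults, and keyword arguments given to the constructor replace them on that one instance. SketchPage is a hypothetical stand-in written for this example, not the real crawleme.base.BasePage, whose constructor may behave differently.

```python
class SketchPage(object):
    """Hypothetical base class: class attributes are the defaults."""
    url = None
    item_path = None
    item_attribute = None

    def __init__(self, **overrides):
        for name, value in overrides.items():
            # Reject names that are not declared on the class.
            if not hasattr(type(self), name):
                raise TypeError("unknown option: %s" % name)
            # Set on the instance only; the class default is untouched.
            setattr(self, name, value)


class MyPage(SketchPage):
    url = 'http://www.mysite.com'
    item_path = '//*[@id="campaign_list"]/div/a'
    item_attribute = 'href'


default_page = MyPage()
other_page = MyPage(url='http://www.mysite.com/id/112312')

print(default_page.url)
print(other_page.url)
```

Because the override is stored on the instance, other instances of MyPage keep the class-level url.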

Properties:

url: URL of the page that will be crawled.

item_path: XPath of the selected DOM element(s).

item_attribute: Attribute of the selected DOM element(s) to read.

has_only_single_item (default=False): When True, the crawle method returns a single value instead of a list.

fix_urls (default=True): DOM attributes sometimes contain only a path, without protocol and hostname. When True, each parsed value is expanded to a full URL.
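
The kind of normalization that fix_urls describes can be approximated with the standard library. This is a sketch under the assumption that values are resolved against the page URL; the helper name fix_url is hypothetical and the library's actual logic may differ. It targets Python 3 (urllib.parse).

```python
from urllib.parse import urljoin


def fix_url(page_url, value):
    """Resolve a possibly relative attribute value against the page URL."""
    # urljoin leaves absolute URLs unchanged and fills in the
    # protocol and hostname for path-only values.
    return urljoin(page_url, value)


page_url = 'http://www.mysite.com'

fixed = [
    fix_url(page_url, '/id/5'),
    fix_url(page_url, 'aboutus/'),
    fix_url(page_url, 'http://other.example/foo/'),
]

for url in fixed:
    print(url)
```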

Methods:

crawle([timeout=crawleme.conf.REQUEST_TIMEOUT],[renew=False]): Parses a list of values, or a single value, from the page using the specified attributes.

get_filename([timeout=crawleme.conf.REQUEST_TIMEOUT]): Returns the requested filename.

read([timeout=crawleme.conf.REQUEST_TIMEOUT]): Reads data from the stream.
