Uscrapper

Uscrapper Vanta: dive deeper into the web with this open-source OSINT tool. It extracts details such as email addresses, social media links, author names, and geolocations from both surface and dark web sources. Fast, reliable, and user-friendly, Uscrapper Vanta is built for researchers and analysts.

MIT License

What's new in Uscrapper Vanta:

  • Dark Web Support: Uscrapper Vanta can now handle .onion (dark web) links. This lets users extract information from previously inaccessible sources, giving a more comprehensive view of the digital landscape.

  • Keyword-Based Scraping: With the introduction of a new model, Uscrapper Vanta now allows users to scrape web pages for a specific keyword or a list of keywords. This tailored approach lets users focus on extracting only the information relevant to their needs (see the example run below).
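
For instance, both capabilities can be combined in a single run. The command below is illustrative; the .onion address is a placeholder, and a working Tor connection is assumed to be configured on the system:

python Uscrapper-vanta.py -u http://placeholderaddress.onion -k login password admin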

Uscrapper extracts the following details from the provided website:

  • Email Addresses: Displays email addresses found on the website.
  • Social Media Links: Displays links to various social media platforms found on the website.
  • Author Names: Displays the names of authors associated with the website.
  • Geolocations: Displays geolocation information associated with the website.
  • Non-Hyperlinked Details: Displays non-hyperlinked details found on the website, including email addresses, phone numbers, and usernames.
  • Keyword-Based Extraction: Displays relevant data by matching user-specified terms or curated keyword lists (see the sketch after this list).
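
Under the hood, this kind of extraction comes down to pattern matching over the fetched page source. The snippet below is a minimal, self-contained sketch of the idea in Python; the patterns and the extract_details helper are illustrative, not Uscrapper's actual implementation:

import re

# Illustrative patterns; a production scraper would use more robust expressions
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(?:twitter|facebook|instagram|linkedin|github)\.com/[\w./-]+"
)

def extract_details(page_source: str) -> dict:
    """Return de-duplicated matches for each category found in the page."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(page_source))),
        "phones": sorted(set(PHONE_RE.findall(page_source))),
        "social_links": sorted(set(SOCIAL_RE.findall(page_source))),
    }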

💻 Installation:

git clone https://github.com/z0m31en7/Uscrapper.git
cd Uscrapper/install/ 
chmod +x ./install.sh && ./install.sh    # For Unix/Linux systems

🔮 Usage:

python Uscrapper-vanta.py [-h] [-u URL] [-O] [-ns] [-c CRAWL] [-t THREADS] [-k KEYWORDS [KEYWORDS ...]] [-f FILE]

Arguments:

  • -u URL, --url URL (URL of the website)
  • -O, --generate-report (Generate a report)
  • -ns, --nonstrict (Display non-strict usernames (may show inaccurate results))
  • -c CRAWL, --crawl CRAWL (Specify the maximum number of links to crawl and scrape within the same scope)
  • -t THREADS, --threads THREADS (Number of threads to utilize while crawling (default=4))
  • -k KEYWORDS [KEYWORDS ...], --keywords KEYWORDS [KEYWORDS ...] (Keywords to search for, as space-separated arguments)
  • -f FILE, --file FILE (Path to a text file containing keywords)
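
A few example invocations (the target URL and keyword file are placeholders):

python Uscrapper-vanta.py -u https://example.com -O
python Uscrapper-vanta.py -u https://example.com -c 20 -t 8 -k admin login signup
python Uscrapper-vanta.py -u https://example.com -f keywords.txt -ns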

📜 Note:

  • Uscrapper relies on web scraping techniques to extract information from websites. Make sure to use it responsibly and in compliance with the website's terms of service and applicable laws.

  • The accuracy and completeness of the extracted details depend on the structure and content of the website being analyzed.

  • To bypass some anti-web-scraping measures, Uscrapper uses Selenium, which can slow down the overall process (see the sketch below).
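
For context, fetching a page through a headless browser along these lines is what typically replaces a plain HTTP request when such measures are in play. This is a generic Selenium sketch, not Uscrapper's exact code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chrome so JavaScript-rendered content is captured."""
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # a full browser round-trip: slower than a plain HTTP GET
        return driver.page_source
    finally:
        driver.quit()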

💌 Contribution: Want a new feature to be added?

  • Make a pull request with all the necessary details, and it will be merged after review.
  • You can contribute by making the regular expressions more efficient and accurate, or by suggesting some more features that can be added.