Uscrapper

Uscrapper Vanta: dive deeper into the web with this open-source OSINT tool. It extracts details such as email addresses, social media links, author names, and geolocations from both surface and dark web sources. Fast, reliable, and user-friendly, Uscrapper Vanta is built for researchers and analysts.

MIT License

What's new in Uscrapper Vanta:

  • Dark Web Support: Uscrapper Vanta can now handle .onion (dark web) links. This lets users extract information from previously inaccessible sources, giving a more comprehensive view of the digital landscape.

  • Keyword-Based Scraping: With the introduction of a new model, Uscrapper Vanta now allows users to scrape web pages for a specific keyword or a list of keywords. This tailored approach lets users focus on extracting only the information relevant to their needs (see the example run below).
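
For instance, both capabilities can be combined in a single run. The command below is illustrative; the .onion address is a placeholder, and a working Tor connection is assumed to be configured on the system:

python Uscrapper-vanta.py -u http://placeholderaddress.onion -k login password admin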

Uscrapper extracts the following details from the provided website:

  • Email Addresses: Displays email addresses found on the website.
  • Social Media Links: Displays links to various social media platforms found on the website.
  • Author Names: Displays the names of authors associated with the website.
  • Geolocations: Displays geolocation information associated with the website.
  • Non-Hyperlinked Details: Displays non-hyperlinked details found on the website, including email addresses, phone numbers, and usernames.
  • Keyword-Based Extraction: Displays relevant data by matching user-specified terms or curated keyword lists (see the sketch after this list).
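
Under the hood, this kind of extraction comes down to pattern matching over the fetched page source. The snippet below is a minimal, self-contained sketch of the idea in Python; the patterns and the extract_details helper are illustrative, not Uscrapper's actual implementation:

import re

# Illustrative patterns; a production scraper would use more robust expressions
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(?:twitter|facebook|instagram|linkedin|github)\.com/[\w./-]+"
)

def extract_details(page_source: str) -> dict:
    """Return de-duplicated matches for each category found in the page."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(page_source))),
        "phones": sorted(set(PHONE_RE.findall(page_source))),
        "social_links": sorted(set(SOCIAL_RE.findall(page_source))),
    }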

💻 Installation:

git clone https://github.com/z0m31en7/Uscrapper.git
cd Uscrapper/install/ 
chmod +x ./install.sh && ./install.sh    # For Unix/Linux systems

🔮 Usage:

python Uscrapper-vanta.py [-h] [-u URL] [-O] [-ns] [-c CRAWL] [-t THREADS] [-k KEYWORDS [KEYWORDS ...]] [-f FILE]

Arguments:

  • -u URL, --url URL (URL of the website)
  • -O, --generate-report (Generate a report)
  • -ns, --nonstrict (Display non-strict usernames (may show inaccurate results))
  • -c CRAWL, --crawl CRAWL (Specify the maximum number of links to crawl and scrape within the same scope)
  • -t THREADS, --threads THREADS (Number of threads to utilize while crawling (default=4))
  • -k KEYWORDS [KEYWORDS ...], --keywords KEYWORDS [KEYWORDS ...] (Keywords to search for, as space-separated arguments)
  • -f FILE, --file FILE (Path to a text file containing keywords)
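
A few example invocations (the target URL and keyword file are placeholders):

python Uscrapper-vanta.py -u https://example.com -O
python Uscrapper-vanta.py -u https://example.com -c 20 -t 8 -k admin login signup
python Uscrapper-vanta.py -u https://example.com -f keywords.txt -ns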

📜 Note:

  • Uscrapper relies on web scraping techniques to extract information from websites. Make sure to use it responsibly and in compliance with the website's terms of service and applicable laws.

  • The accuracy and completeness of the extracted details depend on the structure and content of the website being analyzed.

  • To bypass some anti-web-scraping measures, Uscrapper uses Selenium, which can slow down the overall process (see the sketch below).
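
For context, fetching a page through a headless browser along these lines is what typically replaces a plain HTTP request when such measures are in play. This is a generic Selenium sketch, not Uscrapper's exact code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chrome so JavaScript-rendered content is captured."""
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # a full browser round-trip: slower than a plain HTTP GET
        return driver.page_source
    finally:
        driver.quit()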

💌 Contribution: Want a new feature to be added?

  • Make a pull request with all the necessary details, and it will be merged after review.
  • You can contribute by making the regular expressions more efficient and accurate, or by suggesting some more features that can be added.