PPS-22-Scooby

A Scala application that crawls and scrapes the web pages given as input, following rules passed to it through a DSL.

Apache-2.0 License

PPS-22-Scooby 🔍

Team:

👨‍💻 Giovanni Antonioni

👨‍💻 Valerio Di Zio

👨‍💻 Francesco Magnani

👨‍💻 Luca Rubboli

Technologies:

🔄 Scrum

🛠 SBT

🔗 Git

🎯 YouTrack

🚀 GitHub Actions

Overview:

PPS-22-Scooby is a web scraping and crawling application. It enables users to extract data from web pages by crawling through links and scraping specific content according to predefined rules.

Features:

🕷 Crawling: The application navigates web pages, follows links, and retrieves content.

🔍 Scraping: Relevant data is extracted from HTML/XML pages using XPath, CSS selectors, or regular expressions.

🛠 Customization: Users can define custom scraping and crawling rules to suit their specific needs.

⚙️ Parallel Processing: Work is executed concurrently, using actors, for efficient execution.

📤 Export: Users can export extracted data in various formats according to their preferences.
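The crawling, scraping, and export steps listed above can be illustrated with a minimal sketch. It is not Scooby's actual API or DSL: the object and method names (`FeatureSketch`, `links`, `headings`, `toJson`) are hypothetical, the page is an in-memory string rather than a fetched document, and plain regular expressions from the Scala standard library stand in for a real HTML parser.

```scala
import scala.util.matching.Regex

// Illustrative only: these names are hypothetical and are not Scooby's API.
object FeatureSketch {
  // Matches href="..." attributes and captures the link target.
  private val Href: Regex = "href=\"([^\"]+)\"".r
  // Matches <h2> elements on a single line and captures their inner text.
  private val Heading: Regex = "<h2[^>]*>(.*?)</h2>".r

  // Crawling: collect the link targets found in a page.
  def links(html: String): List[String] =
    Href.findAllMatchIn(html).map(_.group(1)).toList

  // Scraping: keep only the text of <h2> elements
  // (a regex rule; a real HTML parser is more robust).
  def headings(html: String): List[String] =
    Heading.findAllMatchIn(html).map(_.group(1).trim).toList

  // Export: render the scraped values as a JSON array of strings,
  // escaping backslashes and double quotes.
  def toJson(values: List[String]): String =
    values
      .map(v => "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
      .mkString("[", ",", "]")
}
```

A crawler would feed the result of `links` back into its queue of pages to visit, while `headings` plays the role of a user-defined scraping rule and `toJson` that of one possible export format.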

Implementation:

PPS-22-Scooby is built in Scala, with actor libraries for concurrency management. The project uses Git for version control, YouTrack for project management, and GitHub Actions for continuous integration.
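To give a rough idea of what processing several pages concurrently looks like in Scala, here is a minimal sketch. It deliberately swaps in `Future`s from the standard library instead of actors, so that it runs without extra dependencies; it does not reflect how Scooby is actually structured. `ParallelSketch`, `fetch`, `scrape`, and `scrapeAll` are hypothetical names, and `fetch` fabricates a page locally instead of going to the network.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical sketch: Futures stand in for the actor-based design.
object ParallelSketch {
  // Stand-in for a network fetch, so the example runs offline.
  def fetch(url: String): String = s"<title>$url</title>"

  // Stand-in for a scraping rule: extract the <title> text.
  def scrape(html: String): String =
    html.stripPrefix("<title>").stripSuffix("</title>")

  // Process every URL concurrently and collect the results in input order.
  def scrapeAll(urls: List[String]): List[String] = {
    val jobs = urls.map(url => Future(scrape(fetch(url))))
    Await.result(Future.sequence(jobs), 10.seconds)
  }
}
```

With actors, each page would typically be handled by its own actor exchanging messages with a coordinator, rather than by a `Future` whose result is awaited.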

Get Started:

To use PPS-22-Scooby, see the Get Started section at https://pps-22-scooby.github.io/