A sophisticated web scraper for tracking GPU prices across e-commerce platforms, featuring proxy rotation, CAPTCHA handling, and data visualization.
MIT License
GPU Price Tracker monitors and analyzes GPU prices across multiple e-commerce platforms. Built with scalability and efficiency in mind, the project demonstrates advanced scraping techniques, data management, and full-stack development.
The project follows a modular architecture, separating concerns for improved maintainability and scalability:
src/api.js: RESTful API endpoints
src/db/: Database connection and schema definitions
src/models/: Mongoose models for data structures
src/repositories/: Data access layer
src/scheduler.js: Orchestrates scraping jobs
src/scraper/: Custom scrapers for each e-commerce platform
src/services/: Core business logic, including proxy management and CAPTCHA handling
src/telegram/: Telegram bot integration for notifications and manual interventions
src/web/my-app/: Next.js frontend application
Clone the repository:
git clone https://github.com/vedovati-matteo/gpu-price-tracker.git
Install dependencies:
cd gpu-price-tracker
npm install
Set up environment variables: Create a .env file in the root directory and add the following variables:
MONGO_INITDB_ROOT_USERNAME=...
MONGO_INITDB_ROOT_PASSWORD=...
MONGO_PRICECOMPARE_USERNAME=...
MONGO_PRICECOMPARE_PASSWORD=...
TELEGRAM_BOT_TOKEN=...
PORT=3000
Replace the ... with your actual values. These variables are crucial for:
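A forgotten or placeholder value usually shows up only later as a failed database connection or a silent bot, so a startup check can help. The following is a minimal sketch, assuming the application reads these values from process.env. The variable names come from the listing above, but the missingEnvVars helper is hypothetical and not part of the project.

```javascript
// Variable names are taken from the .env listing above; the check itself is
// a hypothetical helper, not part of the project.
const REQUIRED_ENV_VARS = [
  "MONGO_INITDB_ROOT_USERNAME",
  "MONGO_INITDB_ROOT_PASSWORD",
  "MONGO_PRICECOMPARE_USERNAME",
  "MONGO_PRICECOMPARE_PASSWORD",
  "TELEGRAM_BOT_TOKEN",
  "PORT",
];

// Return the names of required variables that are unset, empty,
// or still set to the "..." placeholder.
function missingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => {
    const value = env[name];
    if (value === undefined) return true;
    const trimmed = String(value).trim();
    return trimmed === "" || trimmed === "...";
  });
}

// Possible use at startup (commented out so the sketch has no side effects):
// const missing = missingEnvVars(process.env);
// if (missing.length > 0) {
//   throw new Error(`Missing environment variables: ${missing.join(", ")}`);
// }
```

Failing fast at startup keeps a configuration mistake from looking like a scraper or network problem.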
Start the application:
docker-compose up -d
Access the application:
http://localhost:3000
http://localhost:3001
The project implements a smart proxy rotation system to ensure optimal performance and avoid detection:
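The rotation strategy is not spelled out here. As a rough illustration only, one common approach is a round-robin pool that temporarily sidelines proxies after a failure. The sketch below assumes that approach; the ProxyPool class, its method names, and the cooldown value are hypothetical and may differ from what the project does.

```javascript
// Hypothetical round-robin proxy pool with a cooldown for failing proxies.
// None of these names come from the project; they are illustrative only.
class ProxyPool {
  constructor(proxies, cooldownMs = 60000) {
    if (proxies.length === 0) {
      throw new Error("ProxyPool needs at least one proxy");
    }
    this.proxies = proxies.map((url) => ({ url, blockedUntil: 0 }));
    this.cooldownMs = cooldownMs;
    this.cursor = 0;
  }

  // Return the next proxy that is not cooling down, or null if all are.
  next(now = Date.now()) {
    for (let i = 0; i < this.proxies.length; i++) {
      const candidate = this.proxies[this.cursor];
      this.cursor = (this.cursor + 1) % this.proxies.length;
      if (candidate.blockedUntil <= now) return candidate.url;
    }
    return null;
  }

  // Take a proxy out of rotation for cooldownMs after a failure or block.
  reportFailure(url, now = Date.now()) {
    const entry = this.proxies.find((p) => p.url === url);
    if (entry) entry.blockedUntil = now + this.cooldownMs;
  }
}
```

A scraper would call next() before each request and reportFailure() when a request is blocked or times out. Passing the clock in as a parameter keeps the pool easy to test.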
When encountered, CAPTCHAs are solved through a unique system leveraging Telegram bot notifications and noVNC for remote desktop access, allowing for manual intervention without breaking the scraping flow.
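The pause-and-resume part of this flow can be sketched as a small gate built on a promise: the scraper waits on it after notifying the operator, and the bot's /captcha command releases it. This is a minimal sketch under those assumptions; the CaptchaGate class and its methods are hypothetical, and the Telegram and noVNC wiring is omitted.

```javascript
// Hypothetical gate between the scraper and the Telegram bot.
// The scraper awaits wait(); the /captcha handler calls resolve().
class CaptchaGate {
  constructor() {
    // Holds { promise, release } while a CAPTCHA is outstanding.
    this.pending = null;
  }

  // Called by the scraper when it detects a CAPTCHA. Returns a promise
  // that settles once a human reports the CAPTCHA as solved.
  // Repeated calls while waiting share the same promise.
  wait() {
    if (!this.pending) {
      let release;
      const promise = new Promise((resolve) => {
        release = resolve;
      });
      this.pending = { promise, release };
    }
    return this.pending.promise;
  }

  // Called by the (assumed) /captcha command handler.
  // Returns false when no scraper was waiting.
  resolve() {
    if (!this.pending) return false;
    this.pending.release();
    this.pending = null;
    return true;
  }

  get isWaiting() {
    return this.pending !== null;
  }
}
```

In a scraper this might look like `await gate.wait()` placed right after the notification is sent, so the browser session stays open while the operator solves the CAPTCHA remotely.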
Implements various techniques to mimic human behavior, including:
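The specific techniques are not listed here. One widely used example is spacing requests with randomized delays instead of a fixed interval. The sketch below assumes that technique; the function names and the default bounds are hypothetical, not taken from the project.

```javascript
// Hypothetical helper: pick an integer delay uniformly in [minMs, maxMs].
// The random source is injectable so the function can be tested.
function randomDelayMs(minMs, maxMs, random = Math.random) {
  if (minMs > maxMs) {
    throw new RangeError("minMs must not exceed maxMs");
  }
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Pause for a randomized interval so requests are not evenly spaced.
// The default bounds are illustrative, not tuned values.
function humanPause(minMs = 1500, maxMs = 4000) {
  return new Promise((resolve) => {
    setTimeout(resolve, randomDelayMs(minMs, maxMs));
  });
}
```

A scraper loop could call `await humanPause()` between page loads.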
The Telegram bot serves as a powerful tool for monitoring and controlling the scraping process:
Command List:
/start: Initiates the bot with a welcome message and prompts to explore commands.
/help: Provides a concise guide to the bot's capabilities.
/status: Displays the current status of the scraping process, including active runs and next scheduled runs.
/execute [source]: Triggers a scraping run. Can focus on specific sources or test CAPTCHA functionality.
/captcha: Signals successful CAPTCHA resolution, allowing the scraper to resume.
Additional Functionality:
The frontend provides intuitive visualizations of GPU prices, including:
Server Environment:
Frontend Access:
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.
For any queries or suggestions, please open an issue or contact the maintainer at [email protected].
Built by Matteo Vedovati