GPU Price Tracker

Overview

GPU Price Tracker is a web scraping project that monitors and analyzes GPU prices across multiple e-commerce platforms. It combines Puppeteer-based scraping, historical price storage in MongoDB, and a Next.js frontend, and it demonstrates scraping techniques, data management, and full-stack development.

Key Features

Scrapes GPU prices from eBay, Mediaworld, and Hardware-planet
Stores historical price data in MongoDB
Implements proxy rotation with free proxy lists
Handles CAPTCHAs through manual user intervention, requested via a Telegram bot
Visualizes price trends and comparisons through a reactive Next.js frontend
Containerized with Docker for easy deployment and scaling

Technologies

Backend: Node.js, Express.js
Scraping: Puppeteer
Database: MongoDB with Mongoose
Frontend: Next.js, Shadcn/UI
DevOps: Docker, Docker Compose, Nginx
Bot Integration: Telegram Bot API

Architecture

The project follows a modular architecture, separating concerns for improved maintainability and scalability:

src/api.js: RESTful API endpoints
src/db/: Database connection and schema definitions
src/models/: Mongoose models for data structures
src/repositories/: Data access layer
src/scheduler.js: Orchestrates scraping jobs
src/scraper/: Custom scrapers for each e-commerce platform
src/services/: Core business logic, including proxy management and CAPTCHA handling
src/telegram/: Telegram bot integration for notifications and manual interventions
src/web/my-app/: Next.js frontend application

Getting Started

Clone the repository:

git clone https://github.com/vedovati-matteo/gpu-price-tracker.git

Install dependencies:
```
cd gpu-price-tracker
npm install
```
Set up environment variables: Create a .env file in the root directory and add the following variables:
```
MONGO_INITDB_ROOT_USERNAME=...
MONGO_INITDB_ROOT_PASSWORD=...
MONGO_PRICECOMPARE_USERNAME=...
MONGO_PRICECOMPARE_PASSWORD=...
TELEGRAM_BOT_TOKEN=...
PORT=3000
```
Replace the ... with your actual values. These variables are crucial for:
- Connecting to your MongoDB instance
- Authenticating your Telegram bot
- Setting the port for your application
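A startup check can catch a missing variable before any scraping job runs. The following is a minimal sketch, not the project's actual code: the variable names come from the .env example above, while `missingEnv` and `readPort` are hypothetical helper names, and the fallback port of 3000 is an assumption based on the example value.

```javascript
// Variable names are taken from the .env example; the helpers are hypothetical.
const REQUIRED_ENV = [
  'MONGO_INITDB_ROOT_USERNAME',
  'MONGO_INITDB_ROOT_PASSWORD',
  'MONGO_PRICECOMPARE_USERNAME',
  'MONGO_PRICECOMPARE_PASSWORD',
  'TELEGRAM_BOT_TOKEN',
];

// Return the names of required variables that are unset or empty.
function missingEnv(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// Read the port, falling back to 3000 when PORT is unset or not a valid port number.
function readPort(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '', 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : 3000;
}
```

Calling `missingEnv()` at startup and exiting with a clear message when the returned list is non-empty tends to be easier to debug than a failed database or bot connection later on.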
Start the application:
```
docker-compose up -d
```
Access the application:

Backend server: http://localhost:3000
Frontend interface: http://localhost:3001

Advanced Features

Proxy Rotation

The project rotates proxies to spread requests across IP addresses and reduce the chance of detection:

Proxy Source: Free proxies are obtained from ProxyScrape, which publishes free proxy lists.
Proxy Testing: Each proxy is tested before use to confirm that it works.
Categorization: Proxies are categorized by how they behave:
- Functional proxies are used for regular scraping operations.
- Proxies that encounter CAPTCHAs are moved to a separate list.
Fallback Mechanism: When all functional proxies are exhausted, the system falls back to the CAPTCHA-prone list, accepting more CAPTCHA challenges in exchange for being able to keep scraping.
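
The categorization and fallback described above can be sketched as a small two-list pool. This is an illustration under stated assumptions rather than the project's implementation: the `ProxyPool` class, its method names, and the outcome strings `'ok'`, `'captcha'`, and `'failed'` are all hypothetical.

```javascript
// Hypothetical sketch: class, method names, and outcome strings are illustrative.
class ProxyPool {
  constructor(proxies) {
    this.functional = [...proxies]; // tested proxies with no CAPTCHA seen so far
    this.captchaProne = [];         // proxies that have run into a CAPTCHA
  }

  // Return the next proxy, preferring functional ones.
  // Falls back to the CAPTCHA-prone list, and returns null when both are empty.
  next() {
    if (this.functional.length > 0) return this.functional.shift();
    if (this.captchaProne.length > 0) return this.captchaProne.shift();
    return null;
  }

  // File a proxy back into the matching list based on how the request went.
  report(proxy, outcome) {
    if (outcome === 'ok') this.functional.push(proxy);
    else if (outcome === 'captcha') this.captchaProne.push(proxy);
    // Any other outcome (for example 'failed') drops the proxy.
  }
}

const pool = new ProxyPool(['proxy-a', 'proxy-b']);
const first = pool.next();       // 'proxy-a'
pool.report(first, 'captcha');   // moved to the CAPTCHA-prone list
const second = pool.next();      // 'proxy-b'
pool.report(second, 'failed');   // dropped
const fallback = pool.next();    // 'proxy-a', taken from the fallback list
```

Keeping the two lists separate means the fallback order stays explicit: a CAPTCHA-prone proxy is only handed out once no functional proxy remains.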

CAPTCHA Handling

When a CAPTCHA is encountered, the scraper pauses and the Telegram bot sends a notification with a noVNC link for remote desktop access. The developer solves the CAPTCHA manually and the scraper then resumes, so the run is not aborted.
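
One way to pause a scraper until a human has acted is a promise that the scraper awaits and the bot resolves. The sketch below assumes this design; `CaptchaGate`, `waitForSolve`, and `markSolved` are hypothetical names, and the `notify` callback stands in for whatever sends the Telegram message.

```javascript
// Hypothetical sketch: class and method names are illustrative.
class CaptchaGate {
  constructor() {
    this.pending = null; // { promise, resolve } while a CAPTCHA is waiting
  }

  // Called by the scraper when it detects a CAPTCHA.
  // Sends one notification per CAPTCHA and resolves once it is marked solved.
  waitForSolve(notify) {
    if (!this.pending) {
      let resolve;
      const promise = new Promise((res) => { resolve = res; });
      this.pending = { promise, resolve };
      notify(); // for example, send the noVNC link through the bot
    }
    return this.pending.promise;
  }

  // Called when the developer confirms the CAPTCHA is solved.
  // Returns false if no CAPTCHA was waiting.
  markSolved() {
    if (!this.pending) return false;
    this.pending.resolve();
    this.pending = null;
    return true;
  }
}
```

In a scraper, `await gate.waitForSolve(sendLink)` would sit at the point where the CAPTCHA is detected, and the bot's confirmation handler would call `gate.markSolved()`.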

Bot Detection Avoidance

Implements various techniques to mimic human behavior, including:

Dynamic user agent rotation
Realistic scrolling patterns
Randomized delays between actions
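
The three techniques above can be illustrated with a few small helpers. This is a sketch under assumptions: the function names, the user agent strings, and the step sizes are example values, not ones taken from the project.

```javascript
// Hypothetical values: the user agent strings and step sizes are examples only.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15',
  'Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0',
];

// Pick a user agent at random; rng is injectable to make the choice testable.
function pickUserAgent(rng = Math.random) {
  return USER_AGENTS[Math.floor(rng() * USER_AGENTS.length)];
}

// Return a whole number of milliseconds between minMs and maxMs, inclusive.
function randomDelayMs(minMs, maxMs, rng = Math.random) {
  return Math.floor(minMs + rng() * (maxMs - minMs + 1));
}

// Resolve after the given number of milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Split a scroll distance into uneven steps of 100 to 399 pixels,
// with a shorter final step if needed to reach the exact total.
function scrollSteps(totalPx, rng = Math.random) {
  const steps = [];
  let remaining = totalPx;
  while (remaining > 0) {
    const step = Math.min(remaining, 100 + Math.floor(rng() * 300));
    steps.push(step);
    remaining -= step;
  }
  return steps;
}
```

With Puppeteer, the result of `pickUserAgent()` could be passed to `page.setUserAgent`, and each scroll step could be sent with `page.mouse.wheel({ deltaY: step })` followed by `await sleep(randomDelayMs(200, 800))`; the delay bounds here are again only examples.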

Telegram Bot Integration

The Telegram bot is used to monitor and control the scraping process:

Command List:

/start: Initiates the bot with a welcome message and prompts to explore commands.
/help: Provides a concise guide to the bot's capabilities.
/status: Displays the current status of the scraping process, including active runs and next scheduled runs.
/execute [source]: Triggers a scraping run. The optional argument restricts the run to a specific source or tests the CAPTCHA workflow.
/captcha: Signals successful CAPTCHA resolution, allowing the scraper to resume.
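
Commands like these arrive as plain message text, so the bot has to split the command name from its arguments. The parser below is a minimal sketch with a hypothetical function name. It also accepts the `/command@BotName` form that Telegram uses in group chats. The source name `ebay` in the usage lines is an assumed argument value.

```javascript
// Hypothetical helper: parses "/command arg1 arg2" into its parts.
// Returns null when the text is not a command.
function parseCommand(text) {
  const match = /^\/([a-z_]+)(?:@\w+)?(?:\s+(.*))?$/i.exec(text.trim());
  if (!match) return null;
  const args = match[2] ? match[2].trim().split(/\s+/) : [];
  return { command: match[1].toLowerCase(), args };
}

const run = parseCommand('/execute ebay');      // { command: 'execute', args: ['ebay'] }
const status = parseCommand('/status@GpuBot');  // { command: 'status', args: [] }
const chatter = parseCommand('hello');          // null
```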

Additional Functionality:

CAPTCHA Requests: Notifies the developer when a CAPTCHA is encountered, providing a noVNC link for manual solving.
Status Updates: Keeps the developer informed about scraping progress across different platforms.
Run Completion Reports: Provides comprehensive summaries after each scraping run.
Reminders: Sends notifications before scheduled scraping runs.

Data Visualization

The frontend provides intuitive visualizations of GPU prices, including:

Current prices across different platforms
Historical price trends
Comparative analysis tools
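
A historical trend chart needs one data point per day and platform. As a rough sketch, the scraped records could be reduced to the lowest price per day and source before being plotted. The record shape `{ source, date, price }` and the function name are assumptions, since the actual schema is not shown here.

```javascript
// Hypothetical record shape: { source: string, date: 'YYYY-MM-DD', price: number }.
// Returns the lowest price per (date, source), sorted by date and then by source.
function dailyLowest(records) {
  const byKey = new Map();
  for (const { source, date, price } of records) {
    const key = `${date}|${source}`;
    const best = byKey.get(key);
    if (best === undefined || price < best.price) {
      byKey.set(key, { date, source, price });
    }
  }
  return [...byKey.values()].sort(
    (a, b) => a.date.localeCompare(b.date) || a.source.localeCompare(b.source),
  );
}
```

Grouping by date first keeps the output in the order a line chart consumes it, with one series per source.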

Deployment

Server Environment:
- Deployed on a DigitalOcean droplet (VPS)
- Runs on a Linux operating system
Frontend Access:
- The live frontend application is accessible at: https://pricecoma.tech/
- Features up-to-date GPU price information, automatically updated daily