Steam-Sales-Analysis

This repository features an ETL pipeline for retrieving, processing, validating, and ingesting game metadata and sales data from SteamSpy and Steam APIs. Data is stored in a MySQL database on Aiven Cloud and visualized using Tableau dashboards for insightful analysis of gaming trends and sales performance.


Steam Sales Analysis

Overview

Welcome to Steam Sales Analysis – an innovative project designed to harness the power of data for insights into the gaming world. We have meticulously crafted an ETL (Extract, Transform, Load) pipeline that covers every essential step: data retrieval, processing, validation, and ingestion. By leveraging the robust SteamSpy and Steam APIs, we collect comprehensive game-related metadata, details, and sales figures.

But we don’t stop there. The culmination of this data journey sees the information elegantly loaded into a MySQL database hosted on Aiven Cloud. From this solid foundation, we take it a step further: the data is analyzed and visualized through dynamic and interactive Tableau dashboards. This transforms raw numbers into actionable insights, offering a clear window into gaming trends and sales performance. Join us as we dive deep into the data and bring the world of gaming to life!

steamstore CLI

Setup

Installing the package

For general use, setting up the environment and dependencies is straightforward:

# Install the Python package from PyPI
pip install steamstore-etl
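
To confirm the installation, print the CLI's help screen (the --help flag is documented under Options below):

# Verify the CLI is on your PATH and list the available commands
steamstore --help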

Setting up the environment variables

  • Create a .env file in the directory from which you will run the CLI.
# Database configuration
MYSQL_USERNAME=<your_mysql_username>
MYSQL_PASSWORD=<your_mysql_password>
MYSQL_HOST=<your_mysql_host>
MYSQL_PORT=<your_mysql_port>
MYSQL_DB_NAME=<your_mysql_db_name>
  • Open a terminal in that directory

    For Ubuntu (or other Unix-like systems)

    1. Load .env Variables into the Terminal

      To load the variables from the .env file into your current terminal session, you can use the export command directly, or the dotenv utility if you have it installed.

      Using export directly (manual method):

      export $(grep -v '^#' .env | xargs)
      
      • grep -v '^#' .env filters out comment lines from the file.
      • xargs joins the remaining KEY=VALUE lines into arguments for a single export command.

      Using dotenv (requires installation):

      If you prefer a tool, the dotenv CLI from the python-dotenv package can run commands with the .env file applied:

      • Install the CLI if you don't have it:
      pip install "python-dotenv[cli]"
      
      • Then run any command with the .env variables loaded, for example:
      dotenv run -- steamstore --help
      
      Note that dotenv run applies the variables only to the wrapped command; it does not export them into your current shell session.

      Using source (not typical for .env but useful for .sh files):

      If your .env file is simple, you can source it directly (this method assumes no special parsing is needed):

      source .env
      
      Note that source works only if your .env file contains simple KEY=VALUE pairs. On its own, it creates shell variables without exporting them to child processes; the sketch below shows how to auto-export them.
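
      A minimal sketch using bash's auto-export mode, so the sourced variables are also visible to child processes such as the steamstore CLI:

      set -a       # auto-export every variable assigned from here on
      source .env  # read the KEY=VALUE pairs from the file
      set +a       # turn auto-export back off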

    2. Verify the Variables

      After loading, you can check that the environment variables are set:

      echo $MYSQL_USERNAME
      

    For Windows

    1. Load .env Variables into PowerShell

      You can use a PowerShell script to load the variables from the .env file.

      Create a PowerShell script (e.g., load_env.ps1):

      Get-Content .env | ForEach-Object {
         if ($_ -match "^(.*?)=(.*)$") {
            [System.Environment]::SetEnvironmentVariable($matches[1], $matches[2], [System.EnvironmentVariableTarget]::Process)
         }
      }
      
      • This script reads each line from the .env file and sets it as an environment variable for the current PowerShell session.

      Run the script:

      .\load_env.ps1
      

      Verify the Variables:

      echo $env:MYSQL_USERNAME
      
    2. Load .env Variables into Command Prompt

      The Command Prompt does not have built-in support for .env files. You can use a batch script to achieve this.

      Create a batch script (e.g., load_env.bat):

      @echo off
      rem Skip comment lines starting with '#'; split on the first '=' only, so values may contain '='
      for /f "eol=# tokens=1,* delims==" %%A in (.env) do set "%%A=%%B"
      

      Run the batch script:

      load_env.bat
      

      Verify the Variables:

      echo %MYSQL_USERNAME%
      

CLI for Steam Store Data Ingestion ETL Pipeline

Usage:

$ steamstore [OPTIONS] COMMAND [ARGS]...

Options:

  • --install-completion: Install completion for the current shell.
  • --show-completion: Show completion for the current shell, to copy it or customize the installation.
  • --help: Show this message and exit.

Commands:

  • clean_steam_data: Clean the Steam Data and ingest into the Custom Database
  • fetch_steamspy_data: Fetch from SteamSpy Database and ingest data into Custom Database
  • fetch_steamspy_metadata: Fetch metadata from SteamSpy Database and ingest metadata into Custom Database
  • fetch_steamstore_data: Fetch from Steam Store Database and ingest data into Custom Database
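
Each command also accepts --help, which prints that command's own options and defaults:

$ steamstore fetch_steamspy_data --help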

Detailed Command Usage

steamstore clean_steam_data

Clean the Steam Data and ingest into the Custom Database

Usage:

$ steamstore clean_steam_data [OPTIONS]

Options:

  • --batch-size INTEGER: Number of records to process in each batch. [default: 1000]
  • --help: Show this message and exit.

steamstore fetch_steamspy_data

Fetch from SteamSpy Database and ingest data into Custom Database

Usage:

$ steamstore fetch_steamspy_data [OPTIONS]

Options:

  • --batch-size INTEGER: Number of records to process in each batch. [default: 1000]
  • --help: Show this message and exit.

steamstore fetch_steamspy_metadata

Fetch metadata from SteamSpy Database and ingest metadata into Custom Database

Usage:

$ steamstore fetch_steamspy_metadata [OPTIONS]

Options:

  • --max-pages INTEGER: Number of pages to fetch from SteamSpy. [default: 100]
  • --help: Show this message and exit.

steamstore fetch_steamstore_data

Fetch from Steam Store Database and ingest data into Custom Database

Usage:

$ steamstore fetch_steamstore_data [OPTIONS]

Options:

  • --batch-size INTEGER: Number of app IDs to process in each batch. [default: 5]
  • --bulk-factor INTEGER: Factor to determine when to perform a bulk insert (batch_size * bulk_factor). [default: 10]
  • --reverse / --no-reverse: Process app IDs in reverse order. [default: no-reverse]
  • --help: Show this message and exit.
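
For example, per the option descriptions above, running with --batch-size 5 and --bulk-factor 10 processes app IDs in batches of 5 and performs a bulk insert once 5 × 10 = 50 records have accumulated:

$ steamstore fetch_steamstore_data --batch-size 5 --bulk-factor 10 --reverse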

Setup Instructions

Development Setup

For development purposes, you will need some additional dependencies and tools:

  1. Clone the repository:

    git clone https://github.com/DataForgeOpenAIHub/Steam-Sales-Analysis.git
    cd Steam-Sales-Analysis
    
  2. Create a virtual environment:

    • Using venv:
      python -m venv game
      source game/bin/activate  # On Windows use `game\Scripts\activate`
      
    • Using conda:
      conda env create -f environment.yml
      conda activate game
      
  3. Install dependencies:

    • Install general dependencies:
      pip install -r requirements.txt
      
    • Install development dependencies:
      pip install -r dev-requirements.txt
      
  4. Configuration:

    • Create a .env file in the root directory of the repository.
    • Add the following variables to the .env file:
      # Database configuration
      MYSQL_USERNAME=<your_mysql_username>
      MYSQL_PASSWORD=<your_mysql_password>
      MYSQL_HOST=<your_mysql_host>
      MYSQL_PORT=<your_mysql_port>
      MYSQL_DB_NAME=<your_mysql_db_name>
      

Database Integration

The project connects to a MySQL database hosted on Aiven Cloud using the credentials provided in the .env file. Ensure that the database is properly set up and accessible with the provided credentials.
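
As a quick connectivity check, you can query the database with the mysql command-line client (a sketch assuming the client is installed and the .env variables are exported; Aiven MySQL services typically require TLS, hence --ssl-mode=REQUIRED):

mysql --host="$MYSQL_HOST" --port="$MYSQL_PORT" \
      --user="$MYSQL_USERNAME" --password="$MYSQL_PASSWORD" \
      --ssl-mode=REQUIRED "$MYSQL_DB_NAME" -e "SELECT 1;"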

Running Individual Parts of the ETL Pipeline

To execute the ETL pipeline, use the following commands:

  1. To collect metadata:

    steamstore fetch_steamspy_metadata
    
  2. To collect SteamSpy data:

    steamstore fetch_steamspy_data --batch-size 1000
    
  3. To collect Steam data:

    steamstore fetch_steamstore_data --batch-size 5 --bulk-factor 10
    
  4. To clean Steam data:

    steamstore clean_steam_data --batch-size 1000
    

This will start the process of retrieving data from the SteamSpy and Steam APIs, processing and validating it, and then loading it into the MySQL database.
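
The four steps can also be chained into a single script; a minimal sketch, assuming the .env variables are already exported and using the same defaults as above:

#!/usr/bin/env bash
set -euo pipefail  # stop at the first failing step

steamstore fetch_steamspy_metadata
steamstore fetch_steamspy_data --batch-size 1000
steamstore fetch_steamstore_data --batch-size 5 --bulk-factor 10
steamstore clean_steam_data --batch-size 1000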

Dashboard

Authors

  1. Kayvan Shah | MS in Applied Data Science | USC
  2. Sudarshana S Rao | MS in Electrical Engineering (Machine Learning & Data Science) | USC
  3. Rohit Veeradhi | MS in Electrical Engineering (Machine Learning & Data Science) | USC

References:

API Used:

  • SteamSpy API
  • Steam Store API

Repository:

  • https://github.com/DataForgeOpenAIHub/Steam-Sales-Analysis

LICENSE

This repository is licensed under the MIT License. See the LICENSE file for details.

Disclaimer
