Energy Data Analytics

Description:

The project objective is to analyze global energy data, derive meaningful insights from it and visualize findings in a comprehensible format. In addition, a Streamlit app with an interactive dashboard was created and deployed, and also by using Machine Learning several models of predicting CO2 emissions were trained and evaluated.

You can find the dataset on Kaggle with a detailed description.

Python libraries used:

pandas: data cleaning and transformation
plotly: visualizing data, creating various charts
scikit-learn: ML models and evaluation metrics
streamlit: creating an online dashboard

Project overview

1. Data cleaning and transformation

Checking the dataset for missing values and duplicates, shortening column names and adding new derived columns.

2. Exploratory Data Analysis

Answering questions regaring the dataset by using data transformation and filtering, and creating various types of charts:

Top 10 countries by the amount of electricity generated from renewable sources
Correlation matrix of all numerical variables
Distribution of renewable energy share in countries on each continent
CO2 emissions per capita vs. GDP per capita
Distribution of the amount of renewable energy produced in European Union
Dynamics of electricity generation in the European Union
Energy intensity level of primary energy vs. renewable energy share in power generation in the EU

After that, the relevant dataset is downloaded in order to be used for the Streamlit dashboard.

3. Machine Learning - Predicting CO2 emissions

Pre-processing data: dropping irrelevant columns, removing rows with missing values and duplicates, encoding categorical data into numeric.
Feature Engineering: calculating features' correlations with the target variable and removing irrelevant ones, dropping some features that highly correlate with each other to avoid multicollinearity.
Modeling: splitting the data into training and testing sets, scaling the features, creating and evaluating machine learning models.

5 different Machine Learning models are trained: Linear Regression, Random Forest Regression, Gradient Boosting Regression, K Neighbors Regression, XGBoost Regression. The models are evaluated by several metrics: median absolute error, mean absolute error, mean squared error, root mean squared error, coefficient of determination.

Based on the evaluation metrics, Random Forest Regression outperformed others and demonstrated the lowest error and the highest prediction accuracy.

4. Streamlit Dashboard

Creating an interactive dashboard with:

Slider and checkbox filter with the options to choose a particular year, continent and type of energy
Colored map representing the amount of electricity generation in each country for a selected year
Pie chart with the energy distribution for a selected continent
Bar chart illustrating top 5 energy producers for a selected continent and year
Button to download the filtered dataset

Notes

How to run the code:

Ensure to download the necessary packages as in requirements.txt
Download the dataset global-data-on-sustainable-energy.csv from Kaggle
In case you change the file name, also change the data path in the code pd.read_csv('<YOUR DATA PATH>')
Download the processed dataset for the dashboard
Run the Streamlit dashboard by typing streamlit run energy_dashboard.py on the terminal

Badges

Extracted from project README

Related Projects

anomaly-detection-in-time-series-based-on-statistical-features-and-forcasting

Detects anomalies in time series using statistical features and forecasts future values with an L...

29 Aug 2024 0

SBSPS-Challenge-3925-Predicting-the-Energy-Output-of-wind-turbine-based-on-the-weather-conditions-

Predict Energy Output of a wind turbine at any geo-coordinate for a time-series of next 72 hours ...

22 Jun 2020 24

CO2Oracle

The CO2 Oracle project uses machine learning and AI to analyze and predict CO2 emissions for envi...

02 Aug 2024 1

energy-data-analysis

This is the cloud model analyzing real world dataset with BigQuery and other big-data analyzing t...

04 Aug 2024 0

Mutual-funds-Analysis-and-prediction

In this project I have performed analysis and prediction on 1,3,and 5 year returns on 1064 mutual...

24 Jul 2022 4

Data_Analysis

Repository to track Data Analysis done on various datasets available online

01 Aug 2024 0

ml-cheatsheet

A constantly updated python machine learning cheatsheet

11 Apr 2017 166

Fraud-Warden

Fraudulent Credit Transaction detection system using SMOTE, Random Forest Classifier and Streamlit

31 Jul 2024 0

vis-tutorial

Tutorial on visualising data in Python: matplotlib, pandas, seaborn, plotly...

19 Apr 2017 7

Power_Consumption_Prediction

Future power consumption prediction using LSTM, GRU and Transformer models

22 Aug 2024 4

electricitymaps-contrib

A real-time visualisation of the CO2 emissions of electricity consumption

21 May 2016 3,325

APACHE-SPARK-PYSPARK-DATABRICKS-MACHINE-LEARNING-MLIB

Apache Spark Machine Learning project using MLlib and Linear Regression on Databricks!

07 Aug 2024 0

Machine-Learning-for-Solar-Energy-Prediction

Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning

06 May 2018 220

Gemello

09 Oct 2015 18

CASS-PROPEL

Complete evaluation of traditional "SK-learn like" machine learning models for post-operative com...

11 Sep 2023 4

Energy-Data-Analytics-ML