Hey guys! My name is R M Srinivas. In this project I analysed and predicted the 1-, 3-, and 5-year returns of 1064 mutual funds in India. I scraped the data from a website that is the most visited website for mutual fund investments. I tested the following regression models: linear model, SGD Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge, MLP Regressor, and linear model (Lasso). I then selected the best-performing model, performed hyperparameter tuning, and deployed an interactive application that can generate the visualization and send an email with the visualization to the user's email address.

Here is a gif of the application 📹

ETL

Extraction (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/scraper%20and%20extraction.py): In this file I used Beautiful Soup to extract data from the most visited site for studying, analysing, and investing in mutual funds. I extracted 20 columns for 1064 mutual funds, and tried to extract the data in a way that would leave little data cleaning afterwards. I then saved the file as raw_data.xlsx.
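As a rough illustration of this kind of extraction, here is a minimal sketch that parses a table with Beautiful Soup. The HTML snippet, the table id `funds`, the column names, and the helper `parse_fund_table` are all hypothetical, since the real site's markup is not shown here; the actual scraper in the linked file may work quite differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption).
SAMPLE_HTML = """
<table id="funds">
  <tr><th>Fund</th><th>AUM (cr)</th><th>1Y return</th></tr>
  <tr><td>Example Equity Fund</td><td>1,250.5</td><td>12.4</td></tr>
  <tr><td>Example Debt Fund</td><td>830.0</td><td>6.1</td></tr>
</table>
"""


def parse_fund_table(html):
    """Return one dict per fund row, keyed by the header cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table", id="funds").find_all("tr")
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(header, cells)))
    return records


funds = parse_fund_table(SAMPLE_HTML)
```

The resulting list of dicts can be passed to `pandas.DataFrame` and written out with `to_excel`, which is one way to end up with a file like raw_data.xlsx.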

Transform (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Tranformation.py): As the data did not need many cleaning steps, I cleaned the raw data from the step above by removing the few columns that had more than 30% missing values (np.nan) and converting the column with the fund's AUM in cr (crore) to float. I saved the file as cleaned_data.xlsx.
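The two cleaning steps described above could look roughly like the following sketch. The column names (`aum_cr`, `mostly_missing`), the sample values, and the helper `clean_raw_data` are made up for illustration, and the assumption that AUM values arrive as strings with thousands separators is mine, not something stated in the project.

```python
import numpy as np
import pandas as pd


def clean_raw_data(df, aum_col="aum_cr", max_missing=0.30):
    """Drop sparse columns and convert the AUM column to float."""
    # Share of missing values per column; keep columns at or below the limit.
    missing_share = df.isna().mean()
    kept = df.loc[:, missing_share <= max_missing].copy()
    if aum_col in kept.columns:
        # Assumption: AUM is text such as "1,250.5"; strip the commas first.
        as_text = kept[aum_col].astype(str).str.replace(",", "", regex=False)
        kept[aum_col] = pd.to_numeric(as_text, errors="coerce")
    return kept


# Hypothetical sample rows (not real fund data).
raw = pd.DataFrame({
    "fund": ["A", "B", "C", "D"],
    "aum_cr": ["1,250.5", "830", "95.25", "10"],
    "mostly_missing": [np.nan, np.nan, 1.0, np.nan],
})
cleaned = clean_raw_data(raw)
```

Here `mostly_missing` is 75% empty, so it is dropped, while `aum_cr` becomes a float column.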

Load (https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/data_storage.py): In this file I uploaded the data to a Heroku server for storage, using PostgreSQL to send and save the data. Every time the file runs, it takes the updated data, drops the existing table if it exists, and then adds the updated table to the server.
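The drop-and-reload behaviour can be sketched with pandas' `to_sql` and `if_exists="replace"`. A Heroku PostgreSQL instance cannot be reached from a standalone example, so this sketch uses an in-memory SQLite database as a stand-in; with PostgreSQL one would typically pass a SQLAlchemy engine to the same call. The table name `mutual_funds`, the helper `load_table`, and the sample rows are assumptions.

```python
import sqlite3

import pandas as pd


def load_table(df, conn, table_name="mutual_funds"):
    """Replace the stored table with the latest data."""
    # if_exists="replace" drops the table when present, then recreates it.
    df.to_sql(table_name, conn, if_exists="replace", index=False)


# SQLite stands in for the PostgreSQL server here (assumption).
conn = sqlite3.connect(":memory:")

# First run stores two rows; the second run replaces them with three.
load_table(pd.DataFrame({"fund": ["A", "B"], "aum_cr": [1250.5, 830.0]}), conn)
load_table(
    pd.DataFrame({"fund": ["A", "B", "C"], "aum_cr": [1300.0, 820.0, 95.25]}),
    conn,
)

row_count = conn.execute("SELECT COUNT(*) FROM mutual_funds").fetchone()[0]
```

After the second call only the three updated rows remain, which matches the "drop, then add" behaviour described above.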

EDA (Exploratory Data Analysis)

Go through the following links for individual ipynb files.

5 year returns model testing ->

https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Model%20testing%20for%205%20year%20analysis.ipynb

3 year returns model testing ->

https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Model%20testing%20for%203%20year%20analysis.ipynb

1 year returns model testing ->

https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Model%20testing%20for%201%20year%20analysis.ipynb

Let's first understand the relation of our target variable (returns over the period of 1, 3 and 5 years) with the remaining variables. To start, here are some basic definitions. AUM, or Assets Under Management, is the total funds that a mutual fund scheme holds.

What does NAV mean? The performance of a particular scheme of a Mutual Fund is denoted by Net Asset Value (NAV). In simple words, NAV is the market value of the securities held by the scheme. Mutual Funds invest the money collected from investors in securities markets.

Risk of the fund: Mutual fund schemes are not guaranteed or assured return products. Investment in mutual fund units involves investment risks such as trading volumes, settlement risk, liquidity risk, and default risk, including the possible loss of principal. All of this is considered and the fund is rated accordingly.

Minimum investment: It's the minimum amount limit for investing in a mutual fund.

Type of the fund: Funds are classified based on the diversification of their investments: equity funds, debt funds, hybrid funds, solution-based funds, etc.

Outlier analysis and treatment.

Here is some basic information about the columns, obtained using the describe function. We can see that there are outliers in a few columns; let's go ahead and investigate those columns and treat them.

Here is the box plot and dist plot of the AUM column before outlier treatment.

Here is the box plot and dist plot of the debt percentage column before outlier treatment.

Here is the box plot and dist plot of the 5 year returns column before outlier treatment.

Here is the box plot and dist plot of the equity percentage column before outlier treatment.

Here is the box plot and dist plot of the 3 year returns column before outlier treatment.

Treatment of outliers

I tried replacing the values greater than 0.85 with the mean and with the median, normalized each column, and compared the results, which are documented in the bottom section of the table.
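One possible reading of this treatment is sketched below. It assumes that 0.85 refers to the 0.85 quantile of each column, that the replacement statistic is computed on the untreated column, and that "normalized" means min-max scaling; none of these details are confirmed by the text. The helper names and the sample values are hypothetical.

```python
import pandas as pd


def replace_upper_tail(series, q=0.85, strategy="median"):
    """Replace values above the q-th quantile with the mean or median."""
    cutoff = series.quantile(q)  # assumption: 0.85 is a quantile level
    fill = series.median() if strategy == "median" else series.mean()
    # Keep values at or below the cutoff, substitute the rest.
    return series.where(series <= cutoff, fill)


def min_max_normalize(series):
    """Scale a column to the [0, 1] range (assumed normalization)."""
    span = series.max() - series.min()
    return (series - series.min()) / span if span else series * 0.0


# Hypothetical AUM values with one extreme point.
aum = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 500.0])
treated = replace_upper_tail(aum, strategy="median")
scaled = min_max_normalize(treated)
```

Calling `replace_upper_tail` once with `strategy="mean"` and once with `strategy="median"` gives the two variants that can then be compared on model scores.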

Here is the box plot and dist plot of the AUM column after outlier treatment.

Here is the box plot and dist plot of the debt percentage column after outlier treatment.

Here is the box plot and dist plot of the 5 year returns column after outlier treatment.

Here is the box plot and dist plot of the equity percentage column after outlier treatment.

Here is the box plot and dist plot of the 3 year returns column after outlier treatment.

Here is the correlation matrix of the data after outlier treatment.

Here is a table that shows the testing scores of various models on the 5 year returns target variable.

Here is a table that shows the testing scores of various models on the 3 year returns target variable.

Here is a table that shows the testing scores of various models on the 1 year returns target variable.
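A comparison like the ones in these tables can be produced with a loop over candidate models. The sketch below uses synthetic data in place of the fund dataset, covers only a subset of the models listed in the introduction, and assumes the testing score is R² as returned by scikit-learn's `score` method; the notebooks may use a different metric or split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the fund features and a returns target (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model on the training split and record R^2 on the test split.
test_scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in candidates.items()
}
```

Sorting `test_scores` by value gives a ranking of the same shape as the tables above, one per target variable.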

In the above images for the 1-, 3- and 5-year returns model testing, the best model according to the scores obtained is the Random Forest Regressor, so I performed hyperparameter tuning on it individually for each target to get the best results. I then pickled the models for use in the deployment phase of the project.
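The tuning and pickling step might look like the sketch below. It uses `GridSearchCV` as the search method, a small made-up parameter grid, and synthetic data; the notebooks may use a different search strategy and different parameters.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption, not the fund dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=120)

# Hypothetical grid; the real search space is not documented here.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
search.fit(X, y)

# Serialize the tuned model, then restore it as the deployed app would.
payload = pickle.dumps(search.best_estimator_)
restored = pickle.loads(payload)

# Per-feature importances of the tuned forest; they sum to 1.
importances = restored.feature_importances_
```

Repeating this once per target (1, 3 and 5 year returns) yields three pickled models, and `importances` is the kind of array a feature importance graph is drawn from.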

Here are the final graphs for each target after hyperparameter optimization, along with the feature importance graph.

Graphs for 1 year predictions

Graphs for 3 year predictions

Graphs for 5 year predictions

Front end application(https://github.com/srinivasRM/Mutual-funds-Analysis-and-prediction/blob/main/Deployment.py)

Using Streamlit, I created the following application (animation).

The above application has a sidebar for moving through the 5 different pages. The Definition page has basic information about the various fund-related terms. It is followed by a series of 3 pages that predict the returns based on the inputs provided. In the back end, when each page is opened, the respective model saved in pickle format is loaded, and the user inputs are normalized and converted to get the prediction. The last page has all the visualizations and analysis with descriptions. I created a requirements.txt for future deployment of the project onto AWS or Heroku.
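The back-end step of a prediction page (load the pickled model, normalize the inputs, predict) can be sketched without the Streamlit widgets. Everything specific here is an assumption: the feature names, the bounds in `FEATURE_BOUNDS`, min-max scaling as the normalization, and the tiny linear model that stands in for a pickled Random Forest.

```python
import io
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for a pickled model file; fits y = 1 + 2*x0 + 1*x1 exactly.
train_X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
train_y = np.array([1.0, 3.0, 2.0, 4.0])
model_file = io.BytesIO(pickle.dumps(LinearRegression().fit(train_X, train_y)))

# Hypothetical training ranges used to min-max scale the user inputs.
FEATURE_BOUNDS = {"aum_cr": (0.0, 5000.0), "equity_pct": (0.0, 100.0)}


def predict_return(model_file, user_inputs):
    """Load the pickled model, scale the inputs, and return one prediction."""
    model = pickle.load(model_file)
    scaled = [
        (user_inputs[name] - low) / (high - low)
        for name, (low, high) in FEATURE_BOUNDS.items()
    ]
    return float(model.predict(np.array([scaled]))[0])


predicted = predict_return(model_file, {"aum_cr": 2500.0, "equity_pct": 100.0})
```

In a Streamlit page, the values in `user_inputs` would come from input widgets and the result would be displayed on the page; the scaling must match whatever was applied when the model was trained.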

Let me know if you have any suggestions. You can contact me on this email - [email protected]
