Stock Recommendation System

============ =========== =======
Language     Version     IDE
============ =========== =======
Python       2.7         PyCharm
============ =========== =======

This is a simple Python project that offers stock investment recommendations based on two signals: a Machine Learning prediction built from last year's values of any market symbol, and a Twitter sentiment analysis of retrieved tweets whose text contains that symbol.

Installation

* `Python <https://www.python.org>`_ >= 2.7
* `pandas <https://pypi.org/project/pandas/>`_ >= 0.23.4
* `numpy <https://pypi.org/project/numpy/>`_ >= 1.15
* `tweepy <https://pypi.org/project/tweepy/>`_ >= 3.6.0
* `fix_yahoo_finance <https://pypi.org/project/fix-yahoo-finance/>`_ >= 0.0.22
* `textblob <https://pypi.org/project/textblob/>`_ >= 0.15.1
* `scikit-learn <https://pypi.org/project/scikit-learn/>`_ >= 0.19.2

Requirements

Once you have installed the packages, create a file named constants.py inside the project folder with the following contents:

.. code:: python

    # Twitter API credentials
    consumer_key = ''
    consumer_secret = ''
    access_token = ''
    access_token_secret = ''

    # Number of tweets to retrieve; replace with an integer such as 100
    num_of_tweets = int()

This file contains the Twitter API keys, which you can get from the `Twitter Developer Platform <https://developer.twitter.com>`_ by applying for them with your own Twitter account, and the number of tweets you want to retrieve for the sentiment analysis.

Explanation

To check whether the stock symbol entered at the prompt is valid (that is, whether it exists), I check if it appears in a `Company List CSV <./companylist.csv>`_ that contains every stock symbol available:

.. code:: python

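    import pandas as pd

    flag = False
    # Only the first column (the symbols) is needed
    df = pd.read_csv('companylist.csv', usecols=[0])

    # Keep asking until the entered symbol appears in the list
    while flag is False:
        # raw_input is Python 2; use input on Python 3
        symbol = raw_input('Enter a stock symbol to retrieve data from: ').upper()
        for index in range(len(df)):
            if df['Symbol'][index] == symbol:
                flag = True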
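
The loop above scans every row of the list for each attempt. A minimal sketch of the same check with a set lookup follows, assuming the CSV has a header column named Symbol as in the snippet above. The helper names load_symbols and is_valid_symbol are hypothetical, and the inline sample only stands in for companylist.csv.

```python
import csv
import io


def load_symbols(csv_file):
    """Read the 'Symbol' column of an open CSV file into a set."""
    reader = csv.DictReader(csv_file)
    return {row['Symbol'].strip().upper() for row in reader}


def is_valid_symbol(symbol, symbols):
    """Return True when the symbol (case-insensitive) is in the known set."""
    return symbol.strip().upper() in symbols


# Illustrative sample standing in for companylist.csv (not the real file contents).
sample = io.StringIO(u"Symbol,Name\nAAPL,Apple Inc.\nMSFT,Microsoft Corporation\n")
symbols = load_symbols(sample)

print(is_valid_symbol('aapl', symbols))  # True
print(is_valid_symbol('XXXX', symbols))  # False
```

With the real file, the set would be built once before the prompt loop and each attempt would cost a single lookup.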

The next step is to create the pandas DataFrame with the stock market values of the entered symbol over the last year. The information is retrieved from Yahoo! Finance using `fix-yahoo-finance <https://github.com/ranaroussi/fix-yahoo-finance>`_.

.. code:: python

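    import datetime as dt

    import fix_yahoo_finance as yf
    import pandas as pd

    # One-year window ending today
    actual_date = dt.date.today()
    past_date = actual_date - dt.timedelta(days=366)

    actual_date = actual_date.strftime("%Y-%m-%d")
    past_date = past_date.strftime("%Y-%m-%d")

    # symbol is the validated ticker entered at the prompt, e.g. "AAPL"
    data = yf.download(symbol, start=past_date, end=actual_date)
    df = pd.DataFrame(data=data)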
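
To make the date arithmetic concrete, here is a small sketch that computes the same window for a fixed date, so the result is reproducible. The function name one_year_window and its parameters are hypothetical; the 366-day span is the one used in the snippet above.

```python
import datetime as dt


def one_year_window(today=None, days=366):
    """Return (start, end) as YYYY-MM-DD strings for a window ending at `today`."""
    end = today or dt.date.today()
    start = end - dt.timedelta(days=days)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


start, end = one_year_window(dt.date(2018, 10, 10))
print(start, end)  # 2017-10-09 2018-10-10
```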

Once the DataFrame is created, I start preparing it for the prediction algorithms from scikit-learn. In this case I use LinearRegression because it gives a good score overall, but you can use almost any of the regression algorithms listed in `sklearn <http://scikit-learn.org/stable/supervised_learning.html#supervised-learning>`_ for stock prediction.

First I set up the model for the forecast:

.. code:: python

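    import math

    df = df[['Open', 'High', 'Low', 'Close', 'Volume']]
    # Gap between the daily high and the close, as a percentage of the close
    df['HighLoad'] = (df['High'] - df['Close']) / df['Close'] * 100.0
    # Daily open-to-close move, as a percentage of the open
    df['Change'] = (df['Close'] - df['Open']) / df['Open'] * 100.0

    df = df[['Close', 'HighLoad', 'Change', 'Volume']]

    forecast_col = 'Close'
    # Forecast horizon: 1% of the rows, rounded up
    forecast_out = int(math.ceil(0.01 * len(df)))
    # Label each row with the close forecast_out rows ahead
    df['Label'] = df[forecast_col].shift(-forecast_out)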
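
As an illustration of what these columns contain, the following plain-Python sketch repeats the same arithmetic on three made-up rows. The prices are invented for the example, and the list-based shift is only a stand-in for the pandas shift call above.

```python
import math

# Made-up daily prices, only to illustrate the arithmetic.
rows = [
    {'Open': 100.0, 'High': 110.0, 'Low': 95.0, 'Close': 105.0},
    {'Open': 105.0, 'High': 108.0, 'Low': 101.0, 'Close': 102.0},
    {'Open': 102.0, 'High': 112.0, 'Low': 100.0, 'Close': 110.0},
]

for row in rows:
    # Distance of the close from the day's high, as a percentage of the close.
    row['HighLoad'] = (row['High'] - row['Close']) / row['Close'] * 100.0
    # Open-to-close move, as a percentage of the open.
    row['Change'] = (row['Close'] - row['Open']) / row['Open'] * 100.0

# Same rule as above: forecast 1% of the rows, rounded up (1 row here).
forecast_out = int(math.ceil(0.01 * len(rows)))

# Equivalent of shift(-forecast_out): each row is labelled with the close
# forecast_out rows later; the last rows get no label (None here, NaN in pandas).
closes = [row['Close'] for row in rows]
labels = closes[forecast_out:] + [None] * forecast_out
for row, label in zip(rows, labels):
    row['Label'] = label

print(labels)  # [102.0, 110.0, None]
```

The unlabelled rows at the end are exactly the ones the model is later asked to forecast.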

Then I preprocess the X values, which are the feature columns (every column except Label), by scaling them. The last forecast_out rows, 1% of the whole dataset rounded up, have no label because of the shift, so I set them aside as the input for the forecast. After that I use train_test_split to divide the remaining X and y values at random into training (80%) and testing (20%) sets.

.. code:: python

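    import numpy as np
    from sklearn import preprocessing
    from sklearn.model_selection import train_test_split

    # Features: every column except the label, standardized column by column
    X = np.array(df.drop(['Label'], axis=1))
    X = preprocessing.scale(X)
    # The last forecast_out rows have no label; keep them for the forecast
    X_forecast = X[-forecast_out:]
    X = X[:-forecast_out]

    df.dropna(inplace=True)
    y = np.array(df['Label'])

    # Random hold-out split: 80% for training, 20% for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)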
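
For readers unfamiliar with these two steps, here is a simplified plain-Python sketch of the idea behind preprocessing.scale (zero mean, unit variance per column) and a shuffled hold-out split. It is not the scikit-learn implementation; the helper names standardize and holdout_split, the seed parameter, and the sample numbers are all assumptions made for the example.

```python
import random


def standardize(column):
    """Scale a list of numbers to zero mean and unit (population) variance."""
    mean = sum(column) / float(len(column))
    variance = sum((x - mean) ** 2 for x in column) / float(len(column))
    std = variance ** 0.5 or 1.0  # avoid dividing by zero for constant columns
    return [(x - mean) / std for x in column]


def holdout_split(X, y, test_size=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(test_size * len(X)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Made-up closes and labels, only to exercise the helpers.
closes = [10.0, 12.0, 14.0, 16.0, 18.0]
scaled = standardize(closes)
X = [[value] for value in scaled]
y = [11.0, 13.0, 15.0, 17.0, 19.0]

X_train, X_test, y_train, y_test = holdout_split(X, y)
print(len(X_train), len(X_test))  # 4 1
```

Because the split is random, rows from any date can land in either set; a chronological split would be a different design choice from the one used here.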

Finally, I fit the LinearRegression model to the training data, score it on the test data (the score is the R² coefficient of determination), and plot the result to see the forecast.

.. code:: python

    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    clf = LinearRegression(n_jobs=-1)
    clf.fit(X_train, y_train)
    # score() returns the R^2 coefficient on the test set
    accuracy = clf.score(X_test, y_test)
    forecast = clf.predict(X_forecast)

    df['Prediction'] = np.nan

    last_date = df.iloc[-1].name
    last_date = dt.datetime.strptime(str(last_date), "%Y-%m-%d %H:%M:%S")

    # Append one row per predicted day after the last known date
    for pred in forecast:
        last_date += dt.timedelta(days=1)
        df.loc[last_date.strftime("%Y-%m-%d")] = [np.nan for _ in range(len(df.columns) - 1)] + [pred]

    df['Close'].plot(color='black')
    df['Prediction'].plot(color='green')
    plt.legend(loc=4)
    plt.xlabel('Date')
    plt.ylabel('Price')
    plt.show()

For example, for Apple Inc. (AAPL) the prediction plot looks like this:

.. image:: https://i.gyazo.com/29121a2a8d0b1a87d0ad0870fb8224ec.png
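
The value stored in accuracy by the regression snippet is the coefficient of determination (R²) returned by score, not a classification accuracy. The sketch below shows how that number is computed, using a single-feature least-squares fit in plain Python as a stand-in for LinearRegression. The helper names fit_line and r_squared and all the data points are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = float(len(xs))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / float(len(ys))
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up training data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

# Held-out points with a little noise, so the score falls just below 1.
score = r_squared([4.0, 5.0, 6.0], [9.5, 10.5, 13.0], slope, intercept)

# Forecast for an unseen input.
forecast = slope * 7.0 + intercept
print(slope, intercept, forecast)  # 2.0 1.0 15.0
```

A score close to 1 means the fitted line explains most of the variance in the held-out values; it says nothing by itself about whether the forecast direction is right.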

Once the stock prediction has finished, the Twitter sentiment analysis starts. It retrieves the most recent tweets posted in English that contain the entered symbol (the last 100 when num_of_tweets is set to 100) and stores them in a list of Tweet objects, a class defined in `Tweet.py <./Tweet.py>`_, each holding the tweet's text and its polarity from TextBlob.

.. code:: python

    import tweepy
    from textblob import TextBlob

    # constants.py and Tweet.py are the project files described in this README
    import constants as ct
    from Tweet import Tweet

    auth = tweepy.OAuthHandler(ct.consumer_key, ct.consumer_secret)
    auth.set_access_token(ct.access_token, ct.access_token_secret)
    user = tweepy.API(auth)

    tweets = tweepy.Cursor(user.search, q=str(symbol), tweet_mode='extended', lang='en').items(ct.num_of_tweets)

    tweet_list = []
    global_polarity = 0
    for tweet in tweets:
        tw = tweet.full_text
        blob = TextBlob(tw)
        polarity = 0
        # Sum the polarity of every sentence in the tweet
        for sentence in blob.sentences:
            polarity += sentence.sentiment.polarity
            global_polarity += sentence.sentiment.polarity
        tweet_list.append(Tweet(tw, polarity))

    # Average polarity per tweet (assumes at least one tweet was retrieved)
    global_polarity = global_polarity / len(tweet_list)
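
The averaging step can be tried without Twitter credentials. The sketch below takes any scoring function and guards against an empty result set, which the division above does not. The toy_score function and its word lists are a hypothetical stand-in for TextBlob's polarity, used only so the example runs on its own, and the tweets are invented.

```python
def average_polarity(texts, score):
    """Average the per-text polarity; return 0.0 when there are no texts."""
    if not texts:
        return 0.0
    return sum(score(text) for text in texts) / float(len(texts))


# Hypothetical stand-in for TextBlob's polarity, for illustration only:
# +1 per positive word, -1 per negative word, divided by the word count.
POSITIVE = {'up', 'gain', 'great'}
NEGATIVE = {'down', 'loss', 'bad'}


def toy_score(text):
    words = text.lower().split()
    hits = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return max(-1.0, min(1.0, hits / float(len(words) or 1)))


# Invented tweets: one positive, one negative, one neutral.
tweets = ['AAPL is up great gain', 'AAPL down today', 'AAPL earnings tomorrow']
global_polarity = average_polarity(tweets, toy_score)
print(global_polarity > 0)  # True
```

In the project itself the scoring function would be the TextBlob-based polarity from the snippet above rather than the toy word list.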

Now, to make the recommendation, I check whether the prediction is favorable, that is, whether the last predicted value is higher than the last non-predicted Close value. If it is, the global polarity is checked: when it is positive (> 0.0 according to TextBlob values), the recommendation is to invest in that symbol. When the polarity is negative, or when the prediction is not favorable, the recommendation is not to invest.

.. code:: python

    # Favorable forecast: the last predicted value is above the last known close
    if df.iloc[-forecast_out-1]['Close'] < df.iloc[-1]['Prediction']:
        if global_polarity > 0:
            print("According to the predictions and twitter sentiment analysis -> Investing in %s is a GREAT idea!" % str(symbol))
        elif global_polarity < 0:
            print("According to the predictions and twitter sentiment analysis -> Investing in %s is a BAD idea!" % str(symbol))
    else:
        print("According to the predictions and twitter sentiment analysis -> Investing in %s is a BAD idea!" % str(symbol))
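
The decision rule can also be written as a small function that is easy to test without any data. This is a sketch under one assumption: the final else in the snippet above belongs to the outer if, so an unfavorable forecast always means "do not invest". The function name recommend and its return values are hypothetical.

```python
def recommend(last_close, predicted_close, global_polarity):
    """Return True to invest, False not to, None when sentiment is exactly neutral."""
    if last_close < predicted_close:
        if global_polarity > 0:
            return True
        if global_polarity < 0:
            return False
        # Favorable forecast but zero polarity: no message is printed above.
        return None
    return False


print(recommend(100.0, 105.0, 0.2))   # True
print(recommend(100.0, 105.0, -0.1))  # False
print(recommend(100.0, 95.0, 0.9))    # False
```

Returning a value instead of printing keeps the rule separate from the message, so the three cases can be checked independently.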

Author & License

Head of Development

`Álvaro Bartolomé del Canto <https://github.com/alvarob96>`_

twitter-stock-recommendation is distributed under the MIT License. See the `LICENSE.txt <./LICENSE>`_ file in the release for details.
