ECON4130 Assignment 3 - Trading $TSLA with @elonmusk

Introduction¶

Good papers are usually the ones with interesting ideas. Although techniques are important, motivations should always come first. In this assignment, we will try to gain insight from tweets of Tesla's CEO, Elon Musk (@elonmusk), for trading his company's stock ($TSLA). We hope you will find this exercise interesting. For simplicity, settings in this exercise are simple and therefore might not yield the best outcome. You are always welcome to consider extensions of this exercise, and design your own trading mechanism.

Coverage¶

In this assignment, we will cover skills mentioned in the following notebooks:

D2 Text Data

Important Notes¶

When writing your answers, please run all cells including those specified #Nothing is required to do in this cell.

Part 1. Twitter Data¶

Scraping data from Twitter is not an easy task. Although you can access to Twitter's official API (after applying for an account here) for scrapping tweets, various limitations would be posted to users. For example, with the standard API, you would only be able to retrieve tweets up to 7 days ago.

Developers has devoted great effort in overcoming these challeges. While python packages like twitterscraper (details in here) has been developed, their performances are highly unstable. When we first prepared this assignment, the package worked well and now it is not working.

Therefore, in this exercise, we will switch to use a dataset avaliable in Kaggle, called "Elon Musk's Tweets". The dataset is downloaded from https://www.kaggle.com/kulgen/elon-musks-tweets.

Q1.1 Download the data file data_elonmusk.csv from the course website and put it in the same file with your assignment. Load the data and store it with the name raw_data_tweet. When loading the data, use the encoding latin1.

Hint: When you want to read a csv file with the encoding latin1, you can use pandas.read_csv("yourfile.csv", encoding = "latin1").

Q1.2 Show the first 30 rows of the raw_data_tweet.

Q1.3 Create another dataframe named df_tweets_formatted using raw_data_tweet with the following formatting:

Step 1: Delete all the tweets that are retweeted from others. (i.e. Keep rows which raw_data_tweet["Retweet from"].isna() == True). Name the new dataframe as s1.

Step 2: Keep only the 2 columns: Tweet and Time of s1. Name this new dataframe as s2.

(Given) Step 3&4: Transform the column Time into datetime format and leave only data in 2016.

In [3]:

#Put your answer here


###########################################################################
import datetime as dt 
import pandas as pd 

#Step 3:
s3 = s2.copy()
s3["Time"] = pd.to_datetime(s3["Time"])
s3["Time"] = s3["Time"].dt.date

#Step 4:
start = dt.date(2016,1,1)
end = dt.date(2016,12,31)
df_tweets_formatted = s3[(s3["Time"]>=start) & (s3["Time"]<= end)]

Besides the above formatting, you might also note that it is common to have hyperlinks in tweets. For example:

Since nothing can be inferred from these links, we will want to delete them when doing our analysis. Below we will use the re package for doing so. Details of the package can be referred in here.

where

https?:\/\/ matches any "http://" and "https://" in string
\S+ matches strings that do not contain white spaces

Q1.4 Define a function del_https to delete hyperlinks in tweets and another function del_n to delete the phase \n in tweets. Apply the 2 functions on the column df_tweets_wohttp["Tweet"].

Hint: As in the previous assignment, you can apply a function func on the whole column col of the dataframe df by using the line df["col"].apply(func).

To mention someone in twitter, as like in instagram and other social media platforms, you will put @ followed by the user's username. For example, people talk about Elon Musk and Donald Trump in their tweets with @elonmusk and @realDonaldTrump.

As we are going to perform sentiment analysis on Elon Musk's tweets for trading $TSLA, tweets mentioning other users (or companies) might bring inrelevant information and give confusing trading signal. For instance, given NIO is one of the major competitior to Tesla, if Elon Musk publishes a tweet: @NIO is doing really well on their new models! (This is fake and for demonstration purpose only.), it is an obvious selling signal to $TSLA but the tweet is highly positive in its sentiment and we might be misguided to buy.

Q1.5 Delete the tweets containing "@" in df_tweets_wohttp (mentioning others). Reset the index of it and call it df_tweets_final.

Hint: You may want to use the line df_tweets_wohttp[df_tweets_wohttp["Tweet"].str.contains('@') == False].

Q1.6 Do the sentiment analysis on df_tweets_final["Tweet"] with TextBlob. Save the polarity of tweets into another column named Sentiment of df_tweets_final. (i.e. df_tweets_final["Sentiment"])

Q1.7 Since there could have multiple tweets within the same date, calculate the average across same date

Hint: You may want to use the line df_tweets_final.groupby('Time').mean().

Part 2. Financial Data¶

Q2.1 Download adjusted closing stock price of TSLA from 2016-01-01 to 2016-12-31 by using the package yfinance. Details of the package can be referred in here. Save the adjusted closing stock price of TSLA from 2016-01-04 (first trading day in 2016) to 2016-12-30 (last trading day in 2016) as stock_data.

Hint: You may want to use the line yf.download('TSLA',start, end)

Q2.2 Plot the time series graph of stock_data using matplotlib.pyplot, and export it as ts_Yourname.png and ts_Yourname.pdf. Details of the package can be referred in here. You DO NOT need to submit the png and pdf file to me.

Hint: You may want to use the functions plt.plot() and plt.savefig().

Part 3. Trading with sentiment analysis¶

The simpliest way to evaluate the performance of a trading strategy is to compare it with the buy and hold strategy, that is, to buy at the beginning and to sell at the end. If a specific strategy fail to beat even this simpliest buy and hold strategy, it should never be considered as a good strategy.

Q3.1 Save the adjusted closing price of TSLA on its first trading day and its last trading day in 2016 asfirst_trading_day_price_2016 and last_trading_day_price_2016 respectively. Calculate the percentage gain(or loss) if we buy one share of TSLA on its first trading day in 2016 and sell it on its last trading day in 2016. Save the percentage gain(or loss) calculated as buy_hold_gain and print it out.

Hint: The percentage gain(or loss) is calculated by: $$ \frac{Total Revenue - Total Cost}{Total Cost} $$

Therefore, in this buy and hold strategy, the percentage gain(or loss) is calculated by: $$ \frac{2016LastTradingDayPrice - 2016FirstTradingDayPrice}{2016FirstTradingDayPrice} $$

Now we have to prepare the data needed for our sentiment trading strategy. Since it is possible that some of these tweets were publish after the market trading hours, we would trade TSLA according to the average sentiment of Elon Musk's tweets one day before (t-1, where t is the current date).

In [10]:

#Nothing is required to do in this cell

trad_sentiment_data = pd.concat([stock_data,sentiment_t_df], axis = 1)

#We would trade according to Sentiment_tminus1
trad_sentiment_data["Sentiment_tminus1"] = trad_sentiment_data["Sentiment"].shift(+1)   

trad_sentiment_data

Q3.2 Define a function buy_sell_signal where gives 1 when sentiment>0.5 and gives -1 when sentiment<-0.5. Apply the function on trad_sentiment_data["Sentiment_tminus1"] to create the buy and sell signal. Save the signal in another column trad_sentiment_data["BuySell"].

In here, we have 1 means buy and -1 means sell. When Elon Musk's tweets on yesterday are highly optimistic, we buy TSLA, and when they are highly pessimistic, we sell the stock.

Dropping the NA entries, we can now have a clear picture on the buy sell signals indicated by the sentiments of Elon Musk's tweets.

In [12]:

#Nothing is required to do in this cell

trad_sentiment_data_buysell = trad_sentiment_data[["Adj Close", "Sentiment_tminus1", "BuySell"]]

trad_sentiment_data_buysell = trad_sentiment_data_buysell.dropna()

trad_sentiment_data_buysell

Finally, we want to get the profit of our trading strategy.

Q3.3 By using the provided function percentage_profit_senti(), calculate the profit of our sentiment trading strategy. Save it as senti_gain and print it out.

In [15]:

def percentage_profit_senti(data, last_trading_day_price):
    revenue = 0
    cost = 0
    accu_position = 0
    
    for i in range(0, len(data)):
        if data["BuySell"][i] == 1:     #Buy
            cost = cost + data["Adj Close"][i]
            accu_position = accu_position + 1
            
            
        elif data["BuySell"][i] == -1:  #Sell
            revenue = revenue + data["Adj Close"][i]   #Allow short sell without incurring any interests
            accu_position = accu_position - 1
            
            
    close_position = accu_position
    
    if close_position == 0:
        total_gain = revenue - cost
        
    elif close_position <0:
        cost = cost + last_trading_day_price*abs(close_position) #Close all short position at the end
        total_gain = revenue - cost
        
    elif close_position >0: 
        revenue = revenue + last_trading_day_price*close_position #Close all long position at the end
        total_gain = revenue - cost
        
        
    return total_gain/cost

#Put your answer here

Compare with the buy and hold strategy, we can see that trading with elon musk's tweets can generate way higher percentage in profit!

In [14]:

#Nothing is required to do in this cell

print("The percentage gain of a buy and hold strategy on TSLA in 2016 is", 
      buy_hold_gain*100, "%")
print("The percentage gain of  the sentiment trading strategy on TSLA in 2016 is", 
      senti_gain*100, "%")