Calculating financial returns in R

One of the most important tasks in financial markets is to analyze historical returns on various investments. To perform this analysis we need historical data for the assets. There are many data providers, some are free most are paid. In this chapter we will use the data from Yahoo’s finance website.

In this post we will:

  1. Download prices
  2. Calculate Returns
  3. Calculate mean and standard deviation of returns

Lets load the libraries first.

library(tidyquant)
library(timetk)

We will get the price for first for Netflix price.

netflix <- tq_get("NFLX",                    
                  from = '2009-01-01',
                  to = "2018-03-01",
                  get = "stock.prices")

Next we will plot the adjusted price of Netflix.

netflix %>%
  ggplot(aes(x = date, y = adjusted)) +
  geom_line() +
  ggtitle("Netflix since 2009") +
  labs(x = "Date", "Price") +
  scale_x_date(date_breaks = "years", date_labels = "%Y") +
  labs(x = "Date", y = "Adjusted Price") +
  theme_bw()

Calculating the daily and monthly returns for individual stock

Once we downloaded the stock prices from yahoo finance, the next thing to do is to calculate the returns. We will again use tidyquant package to do the calculations. We have already download the price data for Netflix above, if you haven’t done that then see the above section.

# Calculate daily returns

netflix_daily_returns <- netflix %>%
  tq_transmute(select = adjusted,           # this specifies which column to select   
               mutate_fun = periodReturn,   # This specifies what to do with that column
               period = "daily",      # This argument calculates Daily returns
               col_rename = "nflx_returns") # renames the column

#Calculate monthly returns just change the argument "period"

netflix_monthly_returns <- netflix %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = "monthly",      # This argument calculates Monthly returns
               col_rename = "nflx_returns")

Charting the daily and monthly for Netflix

# We will use a line chart for daily returns

netflix_daily_returns %>%
  ggplot(aes(x = date, y = nflx_returns)) +
  geom_line() +
  theme_classic() +
  labs(x = "Date", y = "Daily returns") +
  ggtitle("Daily Returns for Netflix") +
  scale_x_date(date_breaks = "years", date_labels = "%Y") +
  scale_y_continuous(breaks = seq(-0.5,0.6,0.05),
                     labels = scales::percent) 

After looking at the daily returns chart for Netflix we can conclude that the returns are quite volatile and the stock can move +/- 5% on any given day. To get a sense of how extreme the returns can be we can plot a histogram.

netflix_daily_returns %>%
  ggplot(aes(x = nflx_returns)) +
  geom_histogram(binwidth = 0.015) +
  theme_classic() +
  labs(x = "Daily returns") +
  ggtitle("Daily Returns for Netflix") +
  scale_x_continuous(breaks = seq(-0.5,0.6,0.05),
                     labels = scales::percent) +
  annotate(geom = 'text', x = -0.30, y= 200, label = "Extremely\nnegative\nreturns") +
  annotate(geom = 'segment', x = -0.305, xend = -0.35,  y = 120, yend = 20, color = 'red', arrow = arrow()) +
  annotate(geom = 'segment', x = 0.405, xend = 0.42,  y = 120, 
           yend = 20, color = 'blue', arrow = arrow(type = "open")) +
  annotate(geom = 'text', x = 0.430, y = 200, label = "Extremely\npositive\nreturns")

Next we can plot the monthly returns for Netflix since 2009. We will use the bar chart to plot the data.

# Charting the monthly returns for Netflix. Using bar charts

netflix_monthly_returns %>%
  ggplot(aes(x = date, y = nflx_returns)) +
  geom_bar(stat = "identity") +
  theme_classic() +
  labs(x = "Date", y = "Monthly returns") +
  ggtitle("Monthly Returns for Netflix") +
  geom_hline(yintercept = 0) +
  scale_y_continuous(breaks = seq(-0.6,0.8,0.1),
                     labels = scales::percent) +
  scale_x_date(date_breaks = "years", date_labels = "%Y")

Calculating the cumulative returns for the Netflix stock

Plotting the daily and monthly returns are useful for understanding the daily and monthly volatility of the investment. To calculate the growth of our investment or in other word, calculating the total returns from our investment, we need to calculate the cumulative returns from that investment. To calculate the cumulative returns we will use the cumprod() function.

netflix_cum_returns <- netflix_daily_returns %>%
  mutate(cr = cumprod(1 + nflx_returns)) %>%      # using the cumprod function
  mutate(cumulative_returns = cr - 1)
netflix_cum_returns %>%
  ggplot(aes(x = date, y = cumulative_returns)) +
  geom_line() +
  theme_classic() +
  labs(x = "Date", y = "Cumulative Returns") +
  ggtitle("Cumulative returns for Netflix since 2009",
          subtitle = "$1 investment in 2009 grew to $85")

This chart shows the cumulative returns since 2009 for Netflix. With the power of hindsight, one could have made $85 on a $1 investment since 2009. That is quite a remarkable performance. But as we know its easier said then done. During the 10 year or so period there were times when the investment lost 50% of its value during the Qwickster fiasco. Very few investors can hold onto investments through such periods.

netflix_monthly_returns %>%
  mutate(cr = cumprod(1 + nflx_returns)) %>%
  mutate(cumulative_returns = cr - 1) %>%
  ggplot(aes(x = date, y = cumulative_returns)) +
  geom_line() +
  theme_classic() +
  labs(x = "Date", y = "Cumulative Returns") +
  ggtitle("Cumulative returns for Netflix since 2010")

We can visualize that the monthly returns chart is much more smoother than the daily chart.

Multiple stocks

Downloading stock market data for multiple stocks.

# Setting our stock symbols to a variable

tickers <- c("FB", "AMZN", "AAPL", "NFLX", "GOOG") 

# Dowload the stock price data

multpl_stocks <- tq_get(tickers,
                        from = "2013-01-01",
                        to = "2018-03-01",
                        get = "stock.prices")

Charting the stock prices for multiple stocks

Next we will chart the stock prices for multiple stocks

multpl_stocks %>%
  ggplot(aes(x = date, y = adjusted, color = symbol)) +
  geom_line() +
  ggtitle("Price chart for multiple stocks")

This is not the result we expected. Since these stocks have huge price differences (FB is under 165 and AMZN is above 1950), their scales are different. We can overcome this problem by plotting stocks on their own individual y scale.

multpl_stocks %>%
  ggplot(aes(x = date, y = adjusted)) +
  geom_line() +
  facet_wrap(~symbol, scales = "free_y") +  # facet_wrap is used to make diff frames
  theme_classic() +       # using a new theme
  labs(x = "Date", y = "Price") +
  ggtitle("Price chart FAANG stocks")

### Calculating the returns for multiple stocks

Calculating the the returns for multiple stocks is just as easy as the single stock. Here were just need to pass an additional argument. We need to use the argument group_by(symbol) to calculate the returns for individual stocks.

#Calculating the daily returns for multiple stocks

multpl_stock_daily_returns <- multpl_stocks %>%
  group_by(symbol) %>%                            # We are grouping the stocks by the stock symbol
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = 'daily',
               col_rename = 'returns')

#Calculating the monthly returns for multiple stocks

multpl_stock_monthly_returns <- multpl_stocks %>%
  group_by(symbol) %>%                             # We are grouping the stocks by symbol
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = 'monthly',
               col_rename = 'returns')

Charting the returns for multiple stocks

Once we have the returns calculation we can plot the returns on the chart.

multpl_stock_daily_returns %>%
  ggplot(aes(x = date, y = returns)) +
  geom_line() +
  geom_hline(yintercept = 0) +
  facet_wrap(~symbol, scales = "free_y") +
  scale_y_continuous(labels = scales::percent) +
  ggtitle("Daily returns for FAANG stock") +
  labs(x = "Date", y = "Returns") +
  scale_color_brewer(palette = "Set2",
                     name = "",
                     guide = FALSE) +
  theme_classic()

multpl_stock_monthly_returns %>%
  ggplot(aes(x = date, y = returns)) +
  geom_bar(stat = "identity") +
  geom_hline(yintercept = 0) +
  facet_wrap(~symbol, scales = "free_y") +
  scale_y_continuous(labels = scales::percent,
                     breaks = seq(-0.5,0.75,0.05)) +
  ggtitle("Monthly returns for FAANG stock") +
  labs(x = "Date", y = "Returns") +
  scale_fill_brewer(palette = "Set1",   # We will give them different colors instead of black
                     name = "",
                     guide = FALSE) +
  theme_classic()

Apple is the least volatile of the FAANG stocks and Facebook and Netflix are the most volatile. This is obvious with the kind of business they are in. Apple is a steady business, with stable cash flows. Its products are liked and used by millions people and they have enormous loyalty towards Apple. Netflix and Facebook are also incredible businesses but they are in the high growth phase and any issues (earnings or user growth decline) can affect the stock materially.

Calculating Cumulative returns for multiple stocks

Often we would like to see which investment produced the best results in the past. To do this we can calculate the cumulative results. Below we compare the investment result for all the FAANG stocks since 2013. Which was the best investment since 2013?

multpl_stock_monthly_returns %>%
  mutate(returns = if_else(date == "2013-01-31", 0, returns)) %>%
  group_by(symbol) %>%  # Need to group multiple stocks
  mutate(cr = cumprod(1 + returns)) %>%
  mutate(cumulative_returns = cr - 1) %>%
  ggplot(aes(x = date, y = cumulative_returns, color = symbol)) +
  geom_line() +
  labs(x = "Date", y = "Cumulative Returns") +
  ggtitle("Cumulative returns for all since 2013") +
  scale_y_continuous(breaks = seq(0,20,2),
                     labels = scales::percent) +
  scale_color_brewer(palette = "Set1",
                     name = "") +
  theme_bw()

Not surprisingly, Netflix had the best returns since 2013. Amazon and Facebook come in distant second and third. The most surprising result is Google. It has severely under performed the other stocks in the FAANG group. Maybe the market participants are worried about its spending on the moon shot projects (Google glass, X Labs, Waymo etc). Whether these projects can produce results is yet to be seen.

A contrarian could argue that given the investments in the future projects, Google is currently undervalued an could be the better investment among the FAANG stocks.

Statistical Data

Calculating the Mean, standard deviation for Individual Stock

We already have the daily and monthly returns data for Netflix. Now we we will calculate the daily and monthly mean and standard deviations of the returns. We will use mean() and sd() functions for our purpose.

# Calculating the mean
nflx_daily_mean_ret <- netflix_daily_returns %>%
  select(nflx_returns) %>%
  .[[1]] %>%
  mean(na.rm = TRUE)

nflx_monthly_mean_ret <- netflix_monthly_returns %>%
  select(nflx_returns) %>%
  .[[1]] %>%
  mean(na.rm = TRUE)

# Calculating the standard deviation

nflx_daily_sd_ret <- netflix_daily_returns %>%
  select(nflx_returns) %>%
  .[[1]] %>%
  sd()

nflx_monthly_sd_ret <- netflix_monthly_returns %>%
  select(nflx_returns) %>%
  .[[1]] %>%
  sd()

nflx_stat <- tibble(period = c("Daily", "Monthly"),
                    mean = c(nflx_daily_mean_ret, nflx_monthly_mean_ret),
                    sd = c(nflx_daily_sd_ret, nflx_monthly_sd_ret))

nflx_stat 
## # A tibble: 2 x 3
##   period     mean     sd
##   <chr>     <dbl>  <dbl>
## 1 Daily   0.00240 0.0337
## 2 Monthly 0.0535  0.176

We can see that Netflix has an average daily returns of 0.2% and 3.3% standard deviation. Its monthly mean returns is 5.2% and 17% standard deviation. This data is for the entire period since 2009. What if we want to calculate the mean and standard deviations for each year. We can calculate this by grouping the Netflix returns data by year and performing our calculations.

netflix_monthly_returns %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  summarise(Monthly_Mean_Returns = mean(nflx_returns),
            MOnthly_Standard_Deviation = sd(nflx_returns)) 
## # A tibble: 10 x 3
##     year Monthly_Mean_Returns MOnthly_Standard_Deviation
##    <dbl>                <dbl>                      <dbl>
##  1  2009              0.0566                      0.0987
##  2  2010              0.110                       0.142 
##  3  2011             -0.0492                      0.209 
##  4  2012              0.0562                      0.289 
##  5  2013              0.137                       0.216 
##  6  2014              0.00248                     0.140 
##  7  2015              0.0827                      0.148 
##  8  2016              0.0138                      0.126 
##  9  2017              0.0401                      0.0815
## 10  2018              0.243                       0.233

We can also plot the result for better understanding.

netflix_monthly_returns %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  summarise(Mean_Returns = mean(nflx_returns),
            Standard_Deviation = sd(nflx_returns)) %>%
  gather(Mean_Returns, Standard_Deviation, key = statistic, value = value) %>%
  ggplot(aes(x = year, y = value, fill = statistic)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_y_continuous(breaks = seq(-0.1,0.4,0.02),
                     labels = scales::percent) +
  scale_x_continuous(breaks = seq(2009,2018,1)) +
  labs(x = "Year", y = "") +
  theme_bw() +
  theme(legend.position = "top") +
  scale_fill_brewer(palette = "Set1",
                    name = "",
                    labe = c("Mean", "Standard Deviation")) +
  ggtitle("Netflix Monthly Mean and Standard Deviation since 2009")

As we can see, the monthly returns and standard deviations have fluctuated a lot since 2009. In 2011 the average monthly return was -5%. This was the year when Netflix had the Qwickster fiasco. The stock lost about 75% during this year.

Calculating the Mean, standard deviation for Multiple Stocks

Next we can calculate the mean and standard deviations for multiple stocks.

multpl_stock_daily_returns %>%
  group_by(symbol) %>%
  summarise(mean = mean(returns),
            sd = sd(returns)) 
## # A tibble: 5 x 3
##   symbol     mean     sd
##   <chr>     <dbl>  <dbl>
## 1 AAPL   0.00100  0.0153
## 2 AMZN   0.00153  0.0183
## 3 FB     0.00162  0.0202
## 4 GOOG   0.000962 0.0141
## 5 NFLX   0.00282  0.0300
multpl_stock_monthly_returns %>%
  group_by(symbol) %>%
  summarise(mean = mean(returns),
            sd = sd(returns)) 
## # A tibble: 5 x 3
##   symbol   mean     sd
##   <chr>   <dbl>  <dbl>
## 1 AAPL   0.0213 0.0725
## 2 AMZN   0.0320 0.0800
## 3 FB     0.0339 0.0900
## 4 GOOG   0.0198 0.0568
## 5 NFLX   0.0614 0.157

Calculating the yearly mean and standard deviation of returns.

multpl_stock_monthly_returns %>%
  mutate(year = year(date)) %>%
  group_by(symbol, year) %>%
  summarise(mean = mean(returns),
            sd = sd(returns))
## # A tibble: 30 x 4
## # Groups:   symbol [?]
##    symbol  year      mean     sd
##    <chr>  <dbl>     <dbl>  <dbl>
##  1 AAPL    2013  0.0210   0.0954
##  2 AAPL    2014  0.0373   0.0723
##  3 AAPL    2015 -0.000736 0.0629
##  4 AAPL    2016  0.0125   0.0752
##  5 AAPL    2017  0.0352   0.0616
##  6 AAPL    2018  0.0288   0.0557
##  7 AMZN    2013  0.0391   0.0660
##  8 AMZN    2014 -0.0184   0.0706
##  9 AMZN    2015  0.0706   0.0931
## 10 AMZN    2016  0.0114   0.0761
## # ... with 20 more rows

We can also plot this statistical data.

multpl_stock_monthly_returns %>%
  mutate(year = year(date)) %>%
  group_by(symbol, year) %>%
  summarise(mean = mean(returns),
            sd = sd(returns)) %>%
  ggplot(aes(x = year, y = mean, fill = symbol)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +
  scale_y_continuous(breaks = seq(-0.1,0.4,0.02),
                     labels = scales::percent) +
  scale_x_continuous(breaks = seq(2009,2018,1)) +
  labs(x = "Year", y = "Mean Returns") +
  theme_bw() +
  theme(legend.position = "top") +
  scale_fill_brewer(palette = "Set1",
                    name = "Stocks") +
  ggtitle("Monthly Mean returns for FAANG stocks")

multpl_stock_monthly_returns %>%
  mutate(year = year(date)) %>%
  group_by(symbol, year) %>%
  summarise(mean = mean(returns),
            sd = sd(returns)) %>%
  ggplot(aes(x = year, y = sd, fill = symbol)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +
  scale_y_continuous(breaks = seq(-0.1,0.4,0.02),
                     labels = scales::percent) +
  scale_x_continuous(breaks = seq(2009,2018,1)) +
  labs(x = "Year", y = "Std Dev") +
  theme_bw() +
  theme(legend.position = "top") +
  scale_fill_brewer(palette = "Set1",
                    name = "Stocks") +
  ggtitle("Monthly Standard Deviation returns for FAANG stocks")

Calculating covariance and correlation for Multiple Stocks

Another important statistical calculation is the correlation and covariance of the stocks. For calculating these stats, we need to modify our data. We will need to spread the returns in such a way that they are in the wide format. Then we will convert it to an xts object.

The Covariance table.

# Calculating the Covariance
multpl_stock_monthly_returns %>%
  spread(symbol, value = returns) %>%
  tk_xts(silent = TRUE) %>%
  cov()
##               AAPL        AMZN          FB         GOOG          NFLX
## AAPL  5.254736e-03 0.001488462 0.000699818 0.0007420307 -1.528193e-05
## AMZN  1.488462e-03 0.006399439 0.001418561 0.0028531565  4.754894e-03
## FB    6.998180e-04 0.001418561 0.008091594 0.0013566480  3.458228e-03
## GOOG  7.420307e-04 0.002853157 0.001356648 0.0032287790  3.529245e-03
## NFLX -1.528193e-05 0.004754894 0.003458228 0.0035292451  2.464202e-02

The correlation table

# Calculating the correlation

multpl_stock_monthly_returns %>%
  spread(symbol, value = returns) %>%
  tk_xts(silent = TRUE) %>%
  cor() 
##              AAPL      AMZN        FB      GOOG         NFLX
## AAPL  1.000000000 0.2566795 0.1073230 0.1801471 -0.001342964
## AMZN  0.256679539 1.0000000 0.1971334 0.6276759  0.378644485
## FB    0.107322952 0.1971334 1.0000000 0.2654184  0.244905437
## GOOG  0.180147089 0.6276759 0.2654184 1.0000000  0.395662114
## NFLX -0.001342964 0.3786445 0.2449054 0.3956621  1.000000000

We can plot this using a corrplot() package to chart the correlation matrix chart.

library(corrplot)
## corrplot 0.84 loaded
multpl_stock_monthly_returns %>%
  spread(symbol, value = returns) %>%
  tk_xts(silent = TRUE) %>%
  cor() %>%
  corrplot()

Summary

We did a lot in this port.

  1. Download prices
  2. Calculate returns
  3. calculate mean and standard deviations
  4. calculate the correlation and covariance of stocks.