Factor Based Analysis in Python

In the last post we performed several steps in downloading and analyzing the fund performance data. We used the Fama French’s 3 factor model to analyze Fidelity Contrafund Fund (FCNTX). In this post we will repeat the same steps without all the explanation. We will try to make things clear using the comments in our code. So lets begin by loading all the modules we will need to run our analysis.

# Pandas to read csv file and other things
import pandas as pd
# Datareader to download price data from Yahoo Finance
import pandas_datareader as web
# Statsmodels to run our multiple regression model
import statsmodels.api as smf
# To download the Fama French data from the web
import urllib.request
# To unzip the ZipFile 
import zipfile

As we did in the last post our eventual goal is to automate the process. So we will build several functions and in the end combine all those into one function that automates the regression analysis for us in one line of code.

Get Fama French Data

def get_fama_french():
    # Web url
    ff_url = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip"
    
    # Download the file and save it
    # We will name it fama_french.zip file
    
    urllib.request.urlretrieve(ff_url,'fama_french.zip')
    zip_file = zipfile.ZipFile('fama_french.zip', 'r')
    
    # Next we extact the file data
    
    zip_file.extractall()
    
    # Make sure you close the file after extraction
    
    zip_file.close()
    
    # Now open the CSV file
    
    ff_factors = pd.read_csv('F-F_Research_Data_Factors.csv', skiprows = 3, index_col = 0)
    # We want to find out the row with NULL value
    # We will skip these rows
    
    ff_row = ff_factors.isnull().any(1).nonzero()[0][0]
    
    # Read the csv file again with skipped rows
    ff_factors = pd.read_csv('F-F_Research_Data_Factors.csv', skiprows = 3, nrows = ff_row, index_col = 0)
    
    # Format the date index
    ff_factors.index = pd.to_datetime(ff_factors.index, format= '%Y%m')
    
    # Format dates to end of month
    ff_factors.index = ff_factors.index + pd.offsets.MonthEnd()
    
    # Convert from percent to decimal
    ff_factors = ff_factors.apply(lambda x: x/ 100)
    return ff_factors

Lets see if the data is downloaded correctly.

ff_data = get_fama_french()
print(ff_data.tail())

##             Mkt-RF     SMB     HML      RF
## 2019-01-31  0.0841  0.0302 -0.0060  0.0021
## 2019-02-28  0.0340  0.0202 -0.0284  0.0018
## 2019-03-31  0.0110 -0.0315 -0.0407  0.0019
## 2019-04-30  0.0396 -0.0170  0.0198  0.0021
## 2019-05-31 -0.0694 -0.0125 -0.0238  0.0021

This looks good. Our data has been downloaded correctly.

Get the Mutual Fund Data

We want our mutual fund price data to align with the fama french data, so we need to get the last date of FF data.

# Last day of FF data
ff_last = ff_data.index[ff_data.shape[0] - 1].date()
# Build the get_price function
# We need 3 arguments, ticker, start and end date
def get_price_data(ticker, start, end):
    price = web.get_data_yahoo(ticker, start, end)
    price = price['Adj Close'] # keep only the Adj Price col
    return price

Lets check if this function works.

# Get Price data for Fidelity's fund
price_data = get_price_data("FCNTX", "1980-01-01", "2019-06-30")
# Make sure to only have data upto last date of Fama French data
price_data = price_data.loc[:ff_last]
print(price_data.tail())

## Date
## 2019-05-24    12.60
## 2019-05-28    12.59
## 2019-05-29    12.47
## 2019-05-30    12.53
## 2019-05-31    12.36
## Name: Adj Close, dtype: float64

As we can see the last date matches with the FF data. Next we need to build the returns calculation function.

Get Returns data

def get_return_data(price_data, period = "M"):
    
    # Resample the data to monthly price
    price = price_data.resample(period).last()
    
    # Calculate the percent change
    ret_data = price.pct_change()[1:]
    
    # convert from series to DataFrame
    ret_data = pd.DataFrame(ret_data)
    
    # Rename the Column
    ret_data.columns = ['portfolio']
    return ret_data
    
ret_data = get_return_data(price_data, "M")
print(ret_data.tail())

##             portfolio
## Date                 
## 2019-01-31   0.094460
## 2019-02-28   0.023960
## 2019-03-31   0.022077
## 2019-04-30   0.048800
## 2019-05-31  -0.057208

Next we need to merge this data with the Fama French data

Merge the portfolio return data with Fama French data

In this step we will merge the data. We also need to rename the columns to something more appropriate. We will also calculate the portfolio excess returns.

# Merging the data
all_data = pd.merge(pd.DataFrame(ret_data),ff_data, how = 'inner', left_index= True, right_index= True)
# Rename the columns
all_data.rename(columns={"Mkt-RF":"mkt_excess"}, inplace=True)
# Calculate the excess returns
all_data['port_excess'] = all_data['portfolio'] - all_data['RF']
print(all_data.tail())

##             portfolio  mkt_excess     ...           RF  port_excess
## 2019-01-31   0.094460      0.0841     ...       0.0021     0.092360
## 2019-02-28   0.023960      0.0340     ...       0.0018     0.022160
## 2019-03-31   0.022077      0.0110     ...       0.0019     0.020177
## 2019-04-30   0.048800      0.0396     ...       0.0021     0.046700
## 2019-05-31  -0.057208     -0.0694     ...       0.0021    -0.059308
## 
## [5 rows x 6 columns]

Run the multiple regression model

Finally our data is ready to run the regression model.

model = smf.formula.ols(formula = "port_excess ~ mkt_excess + SMB + HML", data = all_data).fit()
print(model.params)

## Intercept     0.001178
## mkt_excess    0.893443
## SMB           0.032794
## HML          -0.103232
## dtype: float64

Success!!!

We can see that the results match the results we got from our last post. We have successfully replicated the process in Python. Now you know how to calculate the alpha and beta of any portfolio returns against the Fama & French’s 3 factors model.

Finally lets combine all these functions into one function that automates our analysis in the future.

def run_reg_model(ticker,start,end):
    # Get FF data
    ff_data = get_fama_french()
    ff_last = ff_data.index[ff_data.shape[0] - 1].date()
    #Get the fund price data
    price_data = get_price_data(ticker,start,end)
    price_data = price_data.loc[:ff_last]
    ret_data = get_return_data(price_data, "M")
    all_data = pd.merge(pd.DataFrame(ret_data),ff_data, how = 'inner', left_index= True, right_index= True)
    all_data.rename(columns={"Mkt-RF":"mkt_excess"}, inplace=True)
    all_data['port_excess'] = all_data['portfolio'] - all_data['RF']
    # Run the model
    model = smf.formula.ols(formula = "port_excess ~ mkt_excess + SMB + HML", data = all_data).fit()
    return model.params

Finally here comes the one line code to download and analyze a new fund. Lets test this on the same Goldman Sachs’s Strategic Growth Fund (GGRAX).

ggrax_model = run_reg_model("GGRAX", start = "1999-05-01", end = "2019-06-30")
print(ggrax_model)

## Intercept    -0.000192
## mkt_excess    1.024486
## SMB          -0.191246
## HML          -0.173825
## dtype: float64

Great!!! It works on a different fund as well. So we have just build a powerful function that can download any public fund data and find the regression results against the Fama French 3 factors model.

(The slight difference in alpha compared to the last post is due to missing May 1999 returns which was included in the code we did in R.)