We will perform the same step as in the last post, this time in Python. We need two standard-library modules: urllib.request to download the file and zipfile to extract the contents of the zip archive.

import urllib.request
import zipfile

Once we have imported the modules, we are ready to download the file from the website.

# Create the download url
ff_url = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip"
# Download the file and save it locally as fama_french.zip
urllib.request.urlretrieve(ff_url, 'fama_french.zip')
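When re-running the script, it can be convenient to avoid downloading the archive again. The following is a minimal sketch, not part of the original workflow: the helper name download_if_missing is hypothetical, and it assumes that any file already present under the target name is the one we want.

```python
import os
import urllib.request


def download_if_missing(url, filename):
    """Download url to filename unless a file with that name already exists.

    Hypothetical helper for illustration; it does not verify that an
    existing file is complete or up to date.
    """
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    return filename


# Usage, with the URL defined above:
# download_if_missing(ff_url, 'fama_french.zip')
```

Delete the local file whenever you want a fresh copy, since the helper never refreshes an existing download.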

Next, we will open the zip file and extract its contents.

# Use the zipfile module to open the archive in read mode
zip_file = zipfile.ZipFile('fama_french.zip', 'r')
# Next we extract the archive into the current directory.
# extractall() takes a destination directory, not a file name,
# so the CSV keeps the name it has inside the archive
zip_file.extractall()
# Make sure you close the file after extraction
zip_file.close()
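The open, extract, and close steps can also be written with a with-block, which closes the archive automatically. The sketch below is one possible variant rather than the method used in this post: extract_archive is a hypothetical helper, and it returns the member names so you can see what the extracted file is actually called.

```python
import zipfile


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    # The with-block closes the archive even if extraction raises an error
    with zipfile.ZipFile(zip_path, "r") as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    return names


# Usage, assuming fama_french.zip has already been downloaded:
# print(extract_archive('fama_french.zip'))
```

Printing the returned names is a quick way to confirm the exact file name to pass to the CSV reader.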

Next, we can use the pandas package to read the CSV file. We already know that the first 3 rows are unnecessary, so we will skip them.

import pandas as pd
# Skip the 3 descriptive rows at the top of the file
ff_factors = pd.read_csv('F-F_Research_Data_Factors.csv', skiprows=3)
print(ff_factors.head())
##   Unnamed: 0    Mkt-RF       SMB       HML        RF
## 0     192607      2.96     -2.30     -2.87      0.22
## 1     192608      2.64     -1.40      4.19      0.25
## 2     192609      0.36     -1.32      0.01      0.23
## 3     192610     -3.24      0.04      0.51      0.32
## 4     192611      2.53     -0.20     -0.35      0.31

Success! Again, this data needs to be cleaned, which we will do in the next post.