We will perform the same steps as we did in the last post, but this time we will do it in Python. We need a couple of standard-library modules to download our data: urllib.request to download the file and zipfile to extract the contents of the zip file.
import urllib.request
import zipfile
Once we have imported the modules, we are ready to download the files from the website.
# Create the download url
ff_url = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip"

# Download the file and save it
# We will name the file fama_french.zip
urllib.request.urlretrieve(ff_url, 'fama_french.zip')
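If you rerun the script often, you may not want to download the archive every time. Here is a minimal sketch, assuming a hypothetical helper called download_if_missing (it is not part of urllib), that skips the download when the file is already on disk:

```python
import os
import urllib.request


def download_if_missing(url, dest):
    """Download url to dest unless dest already exists (hypothetical helper).

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(dest):
        # The file is already on disk, so skip the network request
        return False
    urllib.request.urlretrieve(url, dest)
    return True
```

It could be called as download_if_missing(ff_url, 'fama_french.zip'). Note that this sketch never refreshes an old copy, so delete the file by hand when you want the latest data.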
Next, we will open the zip file and extract its contents.
# Use the zipfile package to load the contents
# Here we are opening the file for reading
zip_file = zipfile.ZipFile('fama_french.zip', 'r')

# Next we extract the file data into the current directory
# The CSV keeps the name it has inside the archive
# (extractall takes a target directory, not a new file name)
zip_file.extractall()

# Make sure you close the file after extraction
zip_file.close()
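The extraction step can also be written with a with block, which closes the archive automatically even if an error occurs. The sketch below additionally returns the file names stored in the archive via namelist(), which is a handy way to check what the CSV inside is actually called. The function name extract_archive is a hypothetical helper used only for illustration.

```python
import zipfile


def extract_archive(zip_path, dest_dir="."):
    """Extract every file in zip_path into dest_dir and return the file names."""
    # The with block closes the archive for us, so no explicit close() is needed
    with zipfile.ZipFile(zip_path, "r") as zf:
        names = zf.namelist()
        zf.extractall(dest_dir)
    return names
```

Calling print(extract_archive('fama_french.zip')) would extract into the current directory and show the names of the extracted files.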
Next, we can use the pandas package to read the CSV file. We already know that the first 3 rows are unnecessary, so we will skip those rows.
import pandas as pd

ff_factors = pd.read_csv('F-F_Research_Data_Factors.csv', skiprows=3)
print(ff_factors.head())
##   Unnamed: 0  Mkt-RF   SMB   HML    RF
## 0     192607    2.96 -2.30 -2.87  0.22
## 1     192608    2.64 -1.40  4.19  0.25
## 2     192609    0.36 -1.32  0.01  0.23
## 3     192610   -3.24  0.04  0.51  0.32
## 4     192611    2.53 -0.20 -0.35  0.31
Success! Again, this data needs to be cleaned, which we will do in the next post.