Typically we will just be either reading csv files directly or using pandas-datareader or quandl. Consider this lecture just a quick overview of what is possible with pandas (we won't be working with SQL or excel files in this course). Pandas is not the best way to work with SQL. Use other specialized libray instead.

Data Input and Output

Pandas can read and write to a variety of file types using its pd.read_ and pd.to_ methods respectively.

To get a list of file formats that pandas can read/write to, type pd.read_/pd.to_ followed by the tab key

In [3]:
import numpy as np
import pandas as pd

CSV

CSV Input

Use read_csv() to read csv files as a DataFrame

In [4]:
df = pd.read_csv('example')
df
Out[4]:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15

CSV Output

Use to_csv() to write a DataFrame to csv file. Pass the argument index=False if you don't want to save the index 0, 1, .. 3 as a column in csv file.

In [5]:
df.to_csv('example',index=False)

Excel

Pandas can read and write excel files, keep in mind, pandas only imports data, not formulas or images. Having images or macros may cause this read_excel() method to crash.

You may be prompted to install xlrd library for reading excel file, just type conda install xlrd in command line terminal to install

Excel Input

Use read_excel() to read excel file. Pandas view each worksheet as a DataFrame

In [7]:
pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
Out[7]:
a b c d
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15

Excel Output

Use to_excel() to write a dataframe to excel file

In [8]:
df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')

HTML

You may need to install htmllib5,lxml, and BeautifulSoup4. In your terminal/command prompt run:

conda install lxml
conda install html5lib
conda install BeautifulSoup4

Then restart Jupyter Notebook. (or use pip install if you aren't using the Anaconda Distribution)

Pandas can read table tabs off of html. For example:

HTML Input

Pandas read_html() function will read tables off of a webpage and return a list of DataFrame objects:

In [9]:
df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
In [10]:
type(df)
Out[10]:
list
In [12]:
df[0].head()
Out[12]:
Bank Name City ST CERT Acquiring Institution Closing Date Updated Date
0 Washington Federal Bank for Savings Chicago IL 30570 Royal Savings Bank December 15, 2017 February 21, 2018
1 The Farmers and Merchants State Bank of Argonia Argonia KS 17719 Conway Bank October 13, 2017 February 21, 2018
2 Fayette County Bank Saint Elmo IL 1802 United Fidelity Bank, fsb May 26, 2017 July 26, 2017
3 Guaranty Bank, (d/b/a BestBank in Georgia & Mi... Milwaukee WI 30003 First-Citizens Bank & Trust Company May 5, 2017 July 26, 2017
4 First NBC Bank New Orleans LA 58302 Whitney Bank April 28, 2017 December 5, 2017