Typically we will just be either reading csv files directly or using pandas-datareader or quandl. Consider this lecture just a quick overview of what is possible with pandas (we won't be working with SQL or excel files in this course). Pandas is not the best way to work with SQL. Use other specialized libray instead.
Pandas can read and write to a variety of file types using its pd.read_
and pd.to_
methods respectively.
To get a list of file formats that pandas can read/write to, type pd.read_
/pd.to_
followed by the tab
key
import numpy as np
import pandas as pd
df = pd.read_csv('example')
df
Use to_csv()
to write a DataFrame to csv file. Pass the argument index=False
if you don't want to save the index 0, 1, .. 3 as a column in csv file.
df.to_csv('example',index=False)
Pandas can read and write excel files, keep in mind, pandas only imports data, not formulas or images. Having images or macros may cause this read_excel()
method to crash.
You may be prompted to install xlrd
library for reading excel file, just type conda install xlrd
in command line terminal to install
Use read_excel()
to read excel file. Pandas view each worksheet as a DataFrame
pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
Use to_excel()
to write a dataframe to excel file
df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
You may need to install htmllib5,lxml, and BeautifulSoup4. In your terminal/command prompt run:
conda install lxml
conda install html5lib
conda install BeautifulSoup4
Then restart Jupyter Notebook.
(or use pip install
if you aren't using the Anaconda Distribution)
Pandas can read table tabs off of html. For example:
Pandas read_html()
function will read tables off of a webpage and return a list of DataFrame objects:
df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
type(df)
df[0].head()