Typically we will just be either reading csv files directly or using pandas-datareader or quandl. Consider this lecture just a quick overview of what is possible with pandas (we won't be working with SQL or excel files in this course). Pandas is not the best way to work with SQL. Use other specialized libray instead.
Pandas can read and write to a variety of file types using its pd.read_ and pd.to_ methods respectively.
To get a list of file formats that pandas can read/write to, type pd.read_/pd.to_ followed by the tab key
import numpy as np
import pandas as pd
df = pd.read_csv('example')
df
Use to_csv() to write a DataFrame to csv file. Pass the argument index=False if you don't want to save the index 0, 1, .. 3 as a column in csv file.
df.to_csv('example',index=False)
Pandas can read and write excel files, keep in mind, pandas only imports data, not formulas or images. Having images or macros may cause this read_excel() method to crash.
You may be prompted to install xlrd library for reading excel file, just type conda install xlrd in command line terminal to install
Use read_excel() to read excel file. Pandas view each worksheet as a DataFrame
pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
Use to_excel() to write a dataframe to excel file
df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
You may need to install htmllib5,lxml, and BeautifulSoup4. In your terminal/command prompt run:
conda install lxml
conda install html5lib
conda install BeautifulSoup4
Then restart Jupyter Notebook.
(or use pip install if you aren't using the Anaconda Distribution)
Pandas can read table tabs off of html. For example:
Pandas read_html() function will read tables off of a webpage and return a list of DataFrame objects:
df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
type(df)
df[0].head()