Operations

Pandas has many useful operations that don't fall into any distinct category. This lecture walks through them:

Grab the First/Last n Rows with head() or tail()

Use head(n) to get the first n rows of the DataFrame and tail(n) to get the last n rows. Both default to 5 rows (n=5).

import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})
df.head()
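As a small illustration of passing n explicitly, the sketch below builds a separate frame (the name demo is only for this example, so df above is left untouched) and takes two rows from each end:

```python
import pandas as pd

# Separate frame for illustration; same data as df above
demo = pd.DataFrame({'col1': [1, 2, 3, 4],
                     'col2': [444, 555, 666, 444],
                     'col3': ['abc', 'def', 'ghi', 'xyz']})

first_two = demo.head(2)   # rows with index labels 0 and 1
last_two = demo.tail(2)    # rows with index labels 2 and 3

print(first_two)
print(last_two)

# With only 4 rows, the default head() (n=5) simply returns all of them
print(len(demo.head()))
```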

Finding Info on Unique Values

The unique() Method

Use the unique() method to find the unique values in a column (a Series). For this integer column it returns a NumPy array.

df['col2'].unique()

array([444, 555, 666], dtype=int64)

The nunique() Method

Use the nunique() method to count the unique values in a column

df['col2'].nunique()

3

The value_counts() Method

The value_counts() method returns a table of the unique values in a column and how many times each one appears, most frequent first

df['col2'].value_counts()

444    2
555    1
666    1
Name: col2, dtype: int64
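The three methods are closely related. A minimal sketch of how they fit together, using a standalone Series named col (a name chosen just for this example) with the same values as col2:

```python
import pandas as pd

# Standalone Series with the same values as df['col2']
col = pd.Series([444, 555, 666, 444], name='col2')

uniques = col.unique()        # distinct values, in order of first appearance
n_unique = col.nunique()      # number of distinct values
counts = col.value_counts()   # distinct value -> number of occurrences

# With no missing values, the count of uniques matches the length of unique()
print(len(uniques) == n_unique)

# The occurrence counts add up to the number of rows
print(counts.sum() == len(col))

# Look up how often a single value occurs
print(counts.loc[444])
```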

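Selecting Data

Pass a conditional selection statement to the DataFrame inside square brackets. The condition evaluates to a sequence of boolean values [False, False, True, ..., True], and only the rows marked True are kept.

Select from a DataFrame using criteria from multiple columns. Wrap each condition in parentheses and combine them with & (and) or | (or):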

newdf = df[(df['col1']>2) & (df['col2']==444)]
newdf
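To make the boolean values visible, the following sketch stores the condition in a variable before using it. The names demo, mask, selected, either and rest are illustrative only:

```python
import pandas as pd

demo = pd.DataFrame({'col1': [1, 2, 3, 4],
                     'col2': [444, 555, 666, 444],
                     'col3': ['abc', 'def', 'ghi', 'xyz']})

# The combined condition is itself a boolean Series, one value per row
mask = (demo['col1'] > 2) & (demo['col2'] == 444)
print(mask.tolist())   # [False, False, False, True]

selected = demo[mask]  # keeps only the rows where mask is True

# | means "or": rows matching at least one condition
either = demo[(demo['col1'] > 2) | (demo['col2'] == 444)]

# ~ negates a mask: rows that do NOT match
rest = demo[~mask]

print(selected)
print(either)
print(rest)
```

The parentheses around each condition matter because & and | bind more tightly than comparison operators in Python.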

The apply() Method

The apply() method lets you apply your own custom functions or built-in functions to a DataFrame or to one of its columns

Applying a custom function

def times2(x):
    return x*2

This applies the function to every value in col1

df['col1'].apply(times2)

0    2
1    4
2    6
3    8
Name: col1, dtype: int64

Alternatively, you can apply a lambda function

df['col2'].apply(lambda x:x*2)

0     888
1    1110
2    1332
3     888
Name: col2, dtype: int64

Applying a built-in function

df['col3'].apply(len)

0    3
1    3
2    3
3    3
Name: col3, dtype: int64

df['col1'].sum()

10
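For simple arithmetic like times2, a plain vectorized expression gives the same result as apply(). The sketch below compares the two and also shows apply() on a whole DataFrame with axis=1, where the function receives one row at a time. The frame demo and the label format are assumptions made for this example:

```python
import pandas as pd

demo = pd.DataFrame({'col1': [1, 2, 3, 4],
                     'col2': [444, 555, 666, 444],
                     'col3': ['abc', 'def', 'ghi', 'xyz']})

def times2(x):
    return x * 2

via_apply = demo['col1'].apply(times2)   # calls times2 once per value
vectorized = demo['col1'] * 2            # same result, computed on the whole column

print(via_apply.equals(vectorized))      # True

# axis=1 passes each row (as a Series) to the function
labels = demo.apply(lambda row: f"{row['col3']}-{row['col1']}", axis=1)
print(labels.tolist())                   # ['abc-1', 'def-2', 'ghi-3', 'xyz-4']
```

apply() is most useful when the logic cannot be written as a column-wise expression.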

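Permanently Removing a Column

Use drop() with axis=1 and inplace=True to remove the column from df itself:

df.drop('col1', axis=1, inplace=True)

Alternatively, use del. Run only one of the two: once col1 is gone, trying to remove it again raises a KeyError.

del df['col1']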

df
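When the original DataFrame should stay intact, drop() can be called without inplace=True, in which case it returns a new DataFrame. A short sketch, using a separate frame named demo for illustration:

```python
import pandas as pd

demo = pd.DataFrame({'col1': [1, 2, 3, 4],
                     'col2': [444, 555, 666, 444],
                     'col3': ['abc', 'def', 'ghi', 'xyz']})

# Without inplace=True, drop() returns a new DataFrame
without_col1 = demo.drop('col1', axis=1)

print(list(demo.columns))          # ['col1', 'col2', 'col3'] (unchanged)
print(list(without_col1.columns))  # ['col2', 'col3']

# The columns= keyword is an equivalent spelling of axis=1
same_result = demo.drop(columns=['col1'])
print(same_result.equals(without_col1))  # True
```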

Get Column Names and Index Info

Use the .columns attribute to get the column names

df.columns

Index(['col2', 'col3'], dtype='object')

Use the .index attribute to inspect the row index. For the default RangeIndex, it shows the start, stop and step size

df.index

RangeIndex(start=0, stop=4, step=1)
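Both attributes are Index objects, so they can be turned into plain lists or used in membership checks. A small sketch on a stand-in frame named demo (an assumption for this example, mirroring df after col1 was removed):

```python
import pandas as pd

demo = pd.DataFrame({'col2': [444, 555, 666, 444],
                     'col3': ['abc', 'def', 'ghi', 'xyz']})

print(demo.columns.tolist())     # ['col2', 'col3']
print(demo.index.tolist())       # [0, 1, 2, 3]
print(demo.shape)                # (4, 2): 4 rows, 2 columns

# Check whether a column exists before using it
print('col2' in demo.columns)    # True
print('col1' in demo.columns)    # False
```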

Sorting and Ordering a DataFrame

df

Use sort_values() to sort by column or by row. Note that inplace=False by default, so a sorted copy is returned and df itself is unchanged.

df.sort_values(by='col2')

Alternatively, pass the column name as the first positional argument:

df.sort_values('col2')

The sort order is ascending by default (ascending=True). Use ascending=False to sort in descending order

df.sort_values('col2', ascending=False)
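sort_values() also accepts a list of columns, with one ascending flag per column. The sketch below, on an illustrative frame named demo, sorts by col2 ascending and breaks ties by col3 descending:

```python
import pandas as pd

demo = pd.DataFrame({'col2': [444, 555, 666, 444],
                     'col3': ['abc', 'def', 'ghi', 'xyz']})

# Sort by col2 ascending; among equal col2 values, sort col3 descending
sorted_demo = demo.sort_values(by=['col2', 'col3'], ascending=[True, False])

print(sorted_demo.index.tolist())   # [3, 0, 1, 2]
print(demo.index.tolist())          # [0, 1, 2, 3] (original is unchanged)

# The old index labels travel with the rows; reset them if a clean 0..n-1 index is wanted
renumbered = sorted_demo.reset_index(drop=True)
print(renumbered)
```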

Find Null Values or Check for Null Values

Find Null Values

To find null/missing values in a DataFrame, use isnull(), which returns a DataFrame of boolean values with True wherever a value is missing.

df.isnull()

Drop rows with NaN Values

df.dropna()

Filling in NaN values with something else:

import numpy as np

df = pd.DataFrame({'col1':[1,2,3,np.nan],
                   'col2':[np.nan,555,666,444],
                   'col3':['abc','def','ghi','xyz']})
df.head()

df.fillna('FILL')
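A sketch that puts the three operations side by side on data that actually contains missing values. The frame name demo and the chosen fill values (the column mean for col1, 0 for col2) are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                     'col2': [np.nan, 555, 666, 444],
                     'col3': ['abc', 'def', 'ghi', 'xyz']})

# Count missing values per column
print(demo.isnull().sum())

# dropna() removes rows with any NaN; axis=1 removes columns instead
complete_rows = demo.dropna()
complete_cols = demo.dropna(axis=1)

# fillna() accepts a dict to use a different fill value per column
filled = demo.fillna({'col1': demo['col1'].mean(), 'col2': 0})

print(complete_rows)
print(complete_cols)
print(filled)
```

Filling numeric columns with a number rather than a string such as 'FILL' keeps them numeric, which matters for later calculations.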

Create a Pivot Table

data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}

df = pd.DataFrame(data)

df

Use pivot_table() to create a pivot table. Passing a list to index produces a pivot table with a multi-level index, and the values are aggregated with the mean by default:

df.pivot_table(values='D',index=['A', 'B'],columns=['C'])
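To show how the result can be read, the sketch below rebuilds the same data in a frame named demo (an illustrative name) and looks up individual cells. Combinations that never occur in the data, such as ('bar', 'two') with 'x', come out as NaN. The second table, using aggfunc='sum', is an added variation rather than part of the original example:

```python
import pandas as pd

demo = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
                     'B': ['one', 'one', 'two', 'two', 'one', 'one'],
                     'C': ['x', 'y', 'x', 'y', 'x', 'y'],
                     'D': [1, 3, 2, 5, 4, 1]})

# Rows are (A, B) pairs, columns are the values of C, cells are the mean of D
table = demo.pivot_table(values='D', index=['A', 'B'], columns=['C'])
print(table)

# Read one cell with a tuple for the multi-level row label
print(table.loc[('foo', 'one'), 'x'])            # 1.0
print(pd.isna(table.loc[('bar', 'two'), 'x']))   # True: no such row in the data

# A different aggregation: total of D per A and C
totals = demo.pivot_table(values='D', index='A', columns='C', aggfunc='sum')
print(totals)
```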

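Rendered DataFrame outputs from the cells above, in order of appearance:

df.head() on the original DataFrame:

   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz

df after col1 is removed:

   col2 col3
0   444  abc
1   555  def
2   666  ghi
3   444  xyz

df.sort_values(by='col2'):

   col2 col3
0   444  abc
3   444  xyz
1   555  def
2   666  ghi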
