Missing Data

Let's look at a few convenient methods for dealing with missing data in pandas:

import numpy as np
import pandas as pd

Create a DataFrame from a Dictionary

The keys of the dictionary become the column names. Use np.nan to mark a missing / null value.

df = pd.DataFrame({'A':[1,2,np.nan],
                  'B':[5,np.nan,np.nan],
                  'C':[1,2,3]})

df
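Before dropping or filling anything, it can help to see where the gaps are. The sketch below rebuilds the same DataFrame and counts missing values per column with isna(); the variable names mask and missing_per_column are only illustrative.

```python
import numpy as np
import pandas as pd

# Same DataFrame as above
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Boolean mask: True wherever a value is missing
mask = df.isna()

# Summing the mask column-wise counts the missing values per column
missing_per_column = mask.sum()
print(missing_per_column)
```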

The dropna() Method

Use dropna() to remove every row that contains at least one null/missing value.

df.dropna()
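A small sketch of what this call does with the example data: only row 0 has no missing values, so it is the only row kept. By default dropna() returns a new DataFrame and leaves df untouched, so assign the result if you want to keep it. The name rows_kept is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Drop every row that has at least one missing value
rows_kept = df.dropna()
print(rows_kept)

# The original DataFrame still has all three rows
print(df.shape)
```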

Use dropna(axis=1) to remove every column that contains at least one null/missing value.

df.dropna(axis=1)
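As a quick check with the example data, columns A and B both contain NaN, so only column C should survive. The sketch also uses axis='columns', which pandas accepts as a more readable alias for axis=1. The names cols_kept and cols_kept_alias are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Drop every column that has at least one missing value
cols_kept = df.dropna(axis=1)
print(cols_kept)

# axis='columns' is an equivalent spelling of axis=1
cols_kept_alias = df.dropna(axis='columns')
```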

Use the thresh argument to set the minimum number of non-NA values a row must have to be kept: a row is kept if its count of non-NA values is >= thresh.

df.dropna(thresh=2)
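To make the rule concrete, the sketch below counts the non-NA values in each row and applies the same >= 2 comparison by hand. With the example data the counts are 3, 2 and 1, so rows 0 and 1 are kept and row 2 is dropped. The names non_na_counts, kept and manual are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Number of non-NA values in each row
non_na_counts = df.notna().sum(axis=1)
print(non_na_counts)

# Keep rows with at least 2 non-NA values
kept = df.dropna(thresh=2)

# The same selection written out as an explicit comparison
manual = df[non_na_counts >= 2]
print(kept.equals(manual))
```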

The fillna() Method

Missing values are shown as NaN. We can replace them with fillna():

df.fillna(value='FILL VALUE')
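A single fill value is applied to every column. fillna() also accepts a dictionary that maps column names to fill values, which is useful when each column needs a different replacement. The sketch below is one way to use it; the fill values 0 and -1 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Fill column A with 0 and column B with -1 (arbitrary example values);
# columns not listed in the dictionary are left as they are
filled = df.fillna({'A': 0, 'B': -1})
print(filled)
```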

Set the fill value to the mean of the column:

df['A'].fillna(value=df['A'].mean())

0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
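The same idea extends to the whole DataFrame. As a sketch, passing df.mean() (a Series indexed by column name) to fillna() fills each column with its own mean. Like the call above, this returns a new object, so assign the result to keep it. The name filled_means is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# df.mean() skips NaN by default: A -> 1.5, B -> 5.0, C -> 2.0
column_means = df.mean()

# Each column's missing values are replaced by that column's mean
filled_means = df.fillna(column_means)
print(filled_means)

# To store the single-column result back into df, assign it
df['A'] = df['A'].fillna(df['A'].mean())
```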

Output of df:

	A	B	C
0	1.0	5.0	1
1	2.0	NaN	2
2	NaN	NaN	3

Output of df.dropna(thresh=2):

	A	B	C
0	1.0	5.0	1
1	2.0	NaN	2

Output of df.fillna(value='FILL VALUE'):

	A	B	C
0	1	5	1
1	2	FILL VALUE	2
2	FILL VALUE	FILL VALUE	3
