Let's look at a few convenient methods for dealing with missing data in pandas:
import numpy as np
import pandas as pd
The keys of the dictionary become the column names. Use np.nan to signify a missing (null) value.
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})
df
Use dropna() to remove every row that contains at least one null/missing value.
df.dropna()
Use dropna(axis=1) to remove every column that contains at least one null/missing value.
df.dropna(axis=1)
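A quick check of what the two calls above leave behind may help. This is a minimal sketch that rebuilds the same example frame; the variable names rows_dropped and cols_dropped are only illustrative. It also shows that dropna() returns a new DataFrame and leaves df untouched.

```python
import numpy as np
import pandas as pd

# Same example frame as above.
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Drop rows that contain any missing value: only the first row is complete.
rows_dropped = df.dropna()

# Drop columns that contain any missing value: only 'C' is complete.
cols_dropped = df.dropna(axis=1)

print(rows_dropped.shape)          # (1, 3)
print(list(cols_dropped.columns))  # ['C']

# dropna() returned new objects; the original frame is unchanged.
print(df.shape)                    # (3, 3)
```

To keep a result, assign it back, for example df = df.dropna().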
Use the thresh argument to specify a minimum number of non-NA values. A row is kept if its number of non-NA values is >= thresh.
df.dropna(thresh=2)
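To see why thresh=2 keeps the rows it does, it can help to count the non-NA values per row by hand. The sketch below rebuilds the same example frame and compares dropna(thresh=2) with an equivalent boolean filter; the names non_na_per_row and kept_manually are illustrative only.

```python
import numpy as np
import pandas as pd

# Same example frame as above.
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# Number of non-missing values in each row.
non_na_per_row = df.notna().sum(axis=1)
print(non_na_per_row.tolist())  # [3, 2, 1]

# Hand-written version of the rule: keep rows with at least 2 non-NA values.
kept_manually = df[non_na_per_row >= 2]

# The same rule expressed with thresh.
kept_with_thresh = df.dropna(thresh=2)

# Rows 0 and 1 survive; row 2 has only one non-NA value and is dropped.
print(list(kept_with_thresh.index))  # [0, 1]
```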
Missing values are shown as NaN. We can replace them with fillna().
df.fillna(value='FILL VALUE')
Set the fill value to the mean of the column (mean() skips NaN by default):
df['A'].fillna(value=df['A'].mean())
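The same idea can be applied to every column at once. As a sketch, assuming all columns are numeric (as in this example), passing the Series returned by df.mean() to fillna() fills each column with its own mean. The name filled is illustrative.

```python
import numpy as np
import pandas as pd

# Same example frame as above.
df = pd.DataFrame({'A': [1, 2, np.nan],
                   'B': [5, np.nan, np.nan],
                   'C': [1, 2, 3]})

# df.mean() yields one mean per column, ignoring NaN:
# A -> 1.5, B -> 5.0, C -> 2.0
column_means = df.mean()

# fillna() matches the Series index to the column names,
# so each column is filled with its own mean.
filled = df.fillna(column_means)

print(filled['A'].tolist())  # [1.0, 2.0, 1.5]
print(filled['B'].tolist())  # [5.0, 5.0, 5.0]

# As with dropna(), the original frame still contains its NaN values.
print(int(df['A'].isna().sum()))  # 1
```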