The groupby method lets you group rows that share a value in a column and then apply an aggregate function to each group.
Create a DataFrame:
import pandas as pd
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
'Sales':[200,120,340,124,243,350]}
df = pd.DataFrame(data)
df
Now you can use the .groupby() method to group rows together based on a column name. For instance, let's group by Company. This creates a DataFrameGroupBy object:
df.groupby('Company')
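The DataFrameGroupBy object does not display its groups when printed. As a minimal sketch of how its contents might be inspected (the names grouped, fb_rows and group_names are illustrative and not part of the original walkthrough):

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

grouped = df.groupby('Company')  # illustrative variable name

# Number of distinct groups (one per company)
print(grouped.ngroups)

# Pull out the rows that belong to a single group
fb_rows = grouped.get_group('FB')
print(fb_rows)

# Iterating yields (group key, sub-DataFrame) pairs, sorted by key by default
group_names = [name for name, _ in grouped]
print(group_names)
```

Nothing is aggregated at this point; the object only records which rows belong to which company.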
You can save this object as a new variable:
by_comp = df.groupby("Company")
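You can then call aggregate methods on the object:
Get the average sales for each company. Pass numeric_only=True so that the non-numeric "Person" column is skipped; in pandas 2.0 and later, calling mean() or std() on a text column raises a TypeError:
by_comp.mean(numeric_only=True)
df.groupby('Company').mean(numeric_only=True)
df.groupby('Company').mean(numeric_only=True).loc['FB']
by_comp.std(numeric_only=True)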
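Another way to keep text columns out of a numeric aggregate is to select the column before aggregating. The sketch below assumes only the "Sales" column is of interest and uses agg() to compute several statistics at once; sales_stats and fb_mean are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# Select 'Sales' first, then compute mean and sample standard deviation per company
sales_stats = df.groupby('Company')['Sales'].agg(['mean', 'std'])
print(sales_stats)

# Look up a single value by group label and statistic name
fb_mean = sales_stats.loc['FB', 'mean']
print(fb_mean)  # 296.5
```

Selecting the column first returns one column per statistic, which can be easier to read than a full DataFrame when only one measure matters.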
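Note that "Person" is returned as well. min() also works on strings, comparing them alphabetically, so each group shows the name that sorts first. Each column is reduced independently, so the "Person" and "Sales" values in a row may come from different original rows: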
by_comp.min()
Note that "Person" is returned as well. For strings, max() returns the value that sorts last alphabetically within each group:
by_comp.max()
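Because min() and max() reduce each column separately, the name shown next to a sales figure is not necessarily the person who made that sale. A possible way to retrieve the complete row of the top seller in each company is idxmax(), sketched here with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# Column-wise minimum: for MSFT this pairs 'Amy' with 124,
# even though the 124 sale belongs to Vanessa
col_min = df.groupby('Company').min()
print(col_min.loc['MSFT'])

# idxmax() returns the row label of the largest 'Sales' value in each group;
# df.loc then fetches those full rows
top_rows = df.loc[df.groupby('Company')['Sales'].idxmax()]
print(top_rows)

top_sellers = top_rows.set_index('Company')['Person'].to_dict()
```

Here top_sellers maps each company to the person with the highest sale, with the name and the amount taken from the same row.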
Note that pandas will count "Person" as well, because count() tallies the non-null values in every column:
by_comp.count()
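The difference between counting values and counting rows only shows up when data is missing. As a small sketch, assume one "Person" entry is unknown (df_missing is a hypothetical modified copy, not part of the original data):

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# Hypothetical variant of the data with one missing name
df_missing = df.copy()
df_missing.loc[1, 'Person'] = None

# count() skips missing values, column by column
counts = df_missing.groupby('Company').count()
print(counts)

# size() reports the number of rows in each group, missing values included
sizes = df_missing.groupby('Company').size()
print(sizes)
```

For GOOG, count() reports one "Person" but two "Sales" values, while size() reports two rows.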
The describe() method returns the count, mean, std, min, max, and quartile values for each group:
by_comp.describe()
by_comp.describe().transpose()
by_comp.describe().transpose()['GOOG']
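When only one column is of interest, selecting it before calling describe() gives a flat table with one row per company, which avoids the transpose step. A minimal sketch, with summary and goog as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
})

# One row per company, one column per summary statistic
summary = df.groupby('Company')['Sales'].describe()
print(summary)

# Statistics for a single company, as a Series
goog = summary.loc['GOOG']
print(goog)
```

The row for GOOG holds the same statistics that the transposed version exposes as a column.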