Groupby

The groupby method lets you group together rows that share the same value in a column and then apply an aggregate function, such as a sum or an average, to each group.
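To make the idea concrete, here is a minimal sketch of the split-apply-combine pattern that groupby performs: first by hand with a plain dictionary, then with groupby. It assumes a cut-down copy of the sales data used below (the Person column is omitted) and uses sum() as the aggregate function; the names groups, manual_totals and pandas_totals are illustrative only.

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# Split: collect the Sales values of each company in a list.
groups = {}
for company, sales in zip(df['Company'], df['Sales']):
    groups.setdefault(company, []).append(sales)

# Apply + combine: reduce every group to a single number (here, the total).
manual_totals = {company: sum(values) for company, values in groups.items()}

# groupby does the same split-apply-combine in one call.
pandas_totals = df.groupby('Company')['Sales'].sum()

print(manual_totals)
print(pandas_totals.to_dict())
```

Both approaches give the same totals; groupby additionally sorts the group keys and returns a labeled Series.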

Create a DataFrame

import pandas as pd

data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}

df = pd.DataFrame(data)
df

Now you can use the .groupby() method to group rows together based on a column name. For instance, let's group by Company. This creates a DataFrameGroupBy object:

df.groupby('Company')

<pandas.core.groupby.DataFrameGroupBy object at 0x02806CD0>
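The printed object does not show the groups themselves, but you can inspect how the rows were split. A small sketch using the groupby attributes ngroups and groups, the get_group() method, and plain iteration (the variable names are illustrative):

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
by_comp = df.groupby('Company')

# Number of groups and the row labels that belong to each one.
print(by_comp.ngroups)  # 3
print(by_comp.groups)

# The rows of a single group, as an ordinary DataFrame.
fb_rows = by_comp.get_group('FB')
print(fb_rows)

# Iterating yields (group key, sub-DataFrame) pairs in sorted key order.
for company, group in by_comp:
    print(company, list(group['Person']))
```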

You can save this object as a new variable:

by_comp = df.groupby("Company")

You can then call aggregate methods on this object:

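Get the average sales for each company.

Older versions of pandas silently dropped the non-numeric column "Person" here. Since pandas 2.0, mean() raises a TypeError on that column unless you pass numeric_only=True:

by_comp.mean(numeric_only=True)

The same result as a single chained call:

df.groupby('Company').mean(numeric_only=True)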
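Two ways to restrict the average to numeric data, which recent pandas versions require when a text column such as "Person" is present. This sketch assumes pandas 1.x or 2.x; mean_sales and mean_frame are illustrative names.

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
by_comp = df.groupby('Company')

# Option 1: select the numeric column first; the result is a Series.
mean_sales = by_comp['Sales'].mean()

# Option 2: keep a DataFrame but skip non-numeric columns such as "Person".
mean_frame = by_comp.mean(numeric_only=True)

print(mean_sales)
print(mean_frame)
```

Selecting the column is the more explicit choice when you only care about one value per group; numeric_only=True is convenient when there are several numeric columns.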

Using .loc[] with groupby()

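Because mean() returns an ordinary DataFrame indexed by company, .loc[] can then select a single company's row:

df.groupby('Company').mean(numeric_only=True).loc['FB']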

Sales    296.5
Name: FB, dtype: float64
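.loc[] accepts other label combinations as well. A short sketch of three common forms on the grouped means (means, fb_row, fb_sales and subset are illustrative names):

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
means = df.groupby('Company').mean(numeric_only=True)

fb_row = means.loc['FB']               # one row label -> Series (one value per column)
fb_sales = means.loc['FB', 'Sales']    # row and column label -> single number
subset = means.loc[['FB', 'MSFT']]     # list of row labels -> DataFrame with those rows

print(fb_row)
print(fb_sales)
print(subset)
```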

More examples of aggregate methods

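Standard deviation of sales for each company (recent pandas versions need numeric_only=True here to skip the text column "Person"):

by_comp.std(numeric_only=True)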
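std() computes the sample standard deviation, dividing by n - 1 (ddof=1). The sketch below reproduces GOOG's value by hand from its two sales, 200 and 120, and shows that ddof=0 gives the population version instead; the helper variable names are illustrative.

```python
import math
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
std_by_company = df.groupby('Company')['Sales'].std()

# Sample formula by hand for GOOG: squared deviations divided by (n - 1).
goog = [200, 120]
mean = sum(goog) / len(goog)                                         # 160.0
sample_var = sum((x - mean) ** 2 for x in goog) / (len(goog) - 1)    # 3200.0
manual_std = math.sqrt(sample_var)                                   # about 56.568542

# ddof=0 divides by n instead, giving the population standard deviation.
pop_std = df.groupby('Company')['Sales'].std(ddof=0)

print(std_by_company['GOOG'], manual_std, pop_std['GOOG'])
```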

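min() also works on text columns, so "Person" is returned as well: for strings, the minimum is the name that comes first alphabetically. Each column is reduced independently, so one row of the result can combine values from different people.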

by_comp.min()
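Because min() reduces each column on its own, the "Person" and "Sales" values in one row of the result can come from different records. The sketch below shows this for MSFT and, as one way to get the complete record with the lowest sale per company, combines idxmin() with .loc[] (mins and lowest_rows are illustrative names):

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

mins = df.groupby('Company').min()
# For MSFT the smallest name is "Amy", but the smallest sale (124) is Vanessa's.
print(mins.loc['MSFT'])

# idxmin() returns the row label of the lowest sale in each group;
# .loc[] then pulls out those complete rows.
lowest_rows = df.loc[df.groupby('Company')['Sales'].idxmin()]
print(lowest_rows)
```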

Likewise, max() returns the alphabetically last name in "Person" alongside the highest sale in each company:

by_comp.max()
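min() and max() are two of many reductions. To compute several at once, agg() accepts a list of function names, and named aggregation lets you label each output column. A sketch, where top_sale and n_people are hypothetical output names chosen for this example:

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# Several reductions of one column: one output column per function name.
summary = df.groupby('Company')['Sales'].agg(['min', 'max', 'sum'])

# Named aggregation: output_name=(input_column, function).
named = df.groupby('Company').agg(
    top_sale=('Sales', 'max'),
    n_people=('Person', 'count'),
)

print(summary)
print(named)
```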

count() returns the number of non-null values in every column, so "Person" is counted as well:

by_comp.count()
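count() skips missing values, while size() counts rows regardless of their contents. With the tutorial data both give 2 for every company, so this sketch uses a hypothetical variant of the data in which Charlie's sale is missing (NaN) to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the tutorial data: Charlie's sale is missing.
data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, np.nan, 340, 124, 243, 350]}
df_missing = pd.DataFrame(data)
by_comp_missing = df_missing.groupby('Company')

# count(): non-null values per column, so GOOG has 2 names but only 1 sale.
counts = by_comp_missing.count()

# size(): number of rows in each group, whether or not values are missing.
sizes = by_comp_missing.size()

print(counts)
print(sizes)
```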

Using describe() with groupby()

describe() returns the count, mean, standard deviation, min, max, and the 25%, 50% and 75% percentiles of each numeric column, for every group:

by_comp.describe()
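The percentiles reported by describe() are computed with linear interpolation between sorted values, the default of Series.quantile(). A sketch that checks FB's 25% value by hand (desc, fb_quartiles and manual_q1 are illustrative names):

```python
import math
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)

# describe() on a single selected column: one row per company, eight statistics.
desc = df.groupby('Company')['Sales'].describe()
print(desc)

# The same quartiles for FB via quantile() (linear interpolation by default).
fb_sales = df.loc[df['Company'] == 'FB', 'Sales']
fb_quartiles = fb_sales.quantile([0.25, 0.5, 0.75])

# By hand for FB's two sales, 243 and 350: 243 + 0.25 * (350 - 243) = 269.75
manual_q1 = 243 + 0.25 * (350 - 243)
print(fb_quartiles.tolist(), manual_q1)
```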

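Transpose describe()

Transposing swaps the rows and columns of the describe() output, and indexing the transposed result with ['GOOG'] keeps only the statistics for one company: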

by_comp.describe().transpose()

by_comp.describe().transpose()['GOOG']
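In recent pandas versions, by_comp.describe() has one row per company and two-level columns such as ('Sales', 'mean'), so the transposed selection returns a Series indexed by (column, statistic). A sketch, assuming such a version, together with a shorter route to the same numbers (goog_stats and goog_sales are illustrative names):

```python
import pandas as pd

data = {'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
by_comp = df.groupby('Company')

# After transposing, the companies become the columns.
transposed = by_comp.describe().transpose()
goog_stats = transposed['GOOG']   # Series indexed by (column, statistic)
print(goog_stats)

# Shorter route: describe one column, then pick the company's row.
goog_sales = by_comp['Sales'].describe().loc['GOOG']
print(goog_sales['mean'], goog_sales['max'])
```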

Output of by_comp.describe() in the stacked layout used by older pandas versions (recent versions show one row per company, with the statistics as columns):

                     Sales
Company
FB       count    2.000000
         mean   296.500000
         std     75.660426
         min    243.000000
         25%    269.750000
         50%    296.500000
         75%    323.250000
         max    350.000000
GOOG     count    2.000000
         mean   160.000000
         std     56.568542
         min    120.000000
         25%    140.000000
         50%    160.000000
         75%    180.000000
         max    200.000000
MSFT     count    2.000000
         mean   232.000000
         std    152.735065
         min    124.000000
         25%    178.000000
         50%    232.000000
         75%    286.000000
         max    340.000000

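Output of by_comp.describe().transpose() for the same older layout (the notebook display truncated some GOOG columns as ...):

Company  FB                                                            GOOG                             MSFT
         count  mean   std        min    25%     50%    75%     max    count  mean   ...  75%    max    count  mean   std         min    25%    50%    75%    max
Sales    2.0    296.5  75.660426  243.0  269.75  296.5  323.25  350.0  2.0    160.0  ...  180.0  200.0  2.0    232.0  152.735065  124.0  178.0  232.0  286.0  340.0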

Output of df, the DataFrame created at the start:

   Company   Person  Sales
0     GOOG      Sam    200
1     GOOG  Charlie    120
2     MSFT      Amy    340
3     MSFT  Vanessa    124
4       FB     Carl    243
5       FB    Sarah    350
