Time Resampling¶

We will usually get data that has a DateTime Index on a smaller time scale (every day, every hour, etc…). However, it is often a good idea to aggregate the data based off some frequency (monthly, quarterly, etc…)

While using groupby could solve part of this issue, a simple groupby isn’t “smart” enough to understand things like business quarters or business year starts. Luckily, pandas has frequency sampling tools to help with this!

Imports¶

import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt

Read CSV File¶

Use read_csv() to read CSV file

df = pd.read_csv('time_data/walmart_stock.csv')

Faster alternative¶

If your have a standard date format %y%m%d, you can use this method. Simply pass arguments index_col='Date' and parse_dates=True to the .read_csv() method

df = pd.read_csv('time_data/walmart_stock.csv',index_col='Date', parse_dates=True)

The index_col='Date' argument will set the Date column as index but the column is still of data type String. The parse_dates=True argument will convert Date column to DateTime object

df.index

DatetimeIndex(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06',
               '2012-01-09', '2012-01-10', '2012-01-11', '2012-01-12',
               '2012-01-13', '2012-01-17',
               ...
               '2016-12-16', '2016-12-19', '2016-12-20', '2016-12-21',
               '2016-12-22', '2016-12-23', '2016-12-27', '2016-12-28',
               '2016-12-29', '2016-12-30'],
              dtype='datetime64[ns]', name='Date', length=1258, freq=None)

If you use this method, you can jump to section "Resample" to continue.

df.head()

Notice that there is a "Date" column and this column is what we want as a Date Index

Check out DataFrame info

Notice that the "Date" column is of data type "non-null object". This indicates the "Date" column is a string.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 7 columns):
Date         1258 non-null object
Open         1258 non-null float64
High         1258 non-null float64
Low          1258 non-null float64
Close        1258 non-null float64
Volume       1258 non-null int64
Adj Close    1258 non-null float64
dtypes: float64(5), int64(1), object(1)
memory usage: 63.9+ KB

Create a DateTime Object .to_datetime()¶

Create a date index from the date column

Method 1

Use to_datetime() method and pass in the entire date Series. The to_datetime() method will re-format the Series to a datetime object. Set the DateTime object to 'Date' column in the DataFrame

# df['Date'] = pd.to_datetime(df['Date'])

Method 2

Use .apply() to apply pd.to.datetime to all elements in the Date Series

# df['Date'] = df['Date'].apply(pd.to_datetime)
df.head()

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 7 columns):
Date         1258 non-null datetime64[ns]
Open         1258 non-null float64
High         1258 non-null float64
Low          1258 non-null float64
Close        1258 non-null float64
Volume       1258 non-null int64
Adj Close    1258 non-null float64
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 68.8 KB

Notice that the Date column changed to DateTime object

Format Argument¶

If the date is not in standard format, use the format argument in .to_datetime() method to specify the format e.g. "%d/%m/%Y" (%y: 2-digit year)

Set DateTime Index .set_index()¶

As we have a DateTime object now, we can pass it to the set_index() method with argument inplace=True to set it as an index

df.set_index('Date',inplace=True)
df.head()

Resampling (Grouping) .resample()¶

A common operation with time series data is resamplling based on the time series index. The resample() method is essentially acting as a groupby method specifc to time series data that has a DateTime object.

First of all, make sure that we have a DateTime object as index

Table: Time Series Offest strings¶

Alias	Description
B	business day frequency
C	custom business day frequency (experimental)
D	calendar day frequency
W	weekly frequency
M	month end frequency
SM	semi-month end frequency (15th and end of month)
BM	business month end frequency
CBM	custom business month end frequency
MS	month start frequency
SMS	semi-month start frequency (1st and 15th)
BMS	business month start frequency
CBMS	custom business month start frequency
Q	quarter end frequency
BQ	business quarter endfrequency
QS	quarter start frequency
BQS	business quarter start frequency
A	year end frequency
BA	business year end frequency
AS	year start frequency
BAS	business year start frequency
BH	business hour frequency
H	hourly frequency
T, min	minutely frequency
S	secondly frequency
L, ms	milliseconds
U, us	microseconds
N	nanoseconds

# Our index
df.index

DatetimeIndex(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06',
               '2012-01-09', '2012-01-10', '2012-01-11', '2012-01-12',
               '2012-01-13', '2012-01-17',
               ...
               '2016-12-16', '2016-12-19', '2016-12-20', '2016-12-21',
               '2016-12-22', '2016-12-23', '2016-12-27', '2016-12-28',
               '2016-12-29', '2016-12-30'],
              dtype='datetime64[ns]', name='Date', length=1258, freq=None)

You need to call resample with the rule parameter, then you need to call some sort of aggregation function. This is because due to resampling, we need some sort of mathematical rule to join the rows by (mean,sum,count,etc...)

Yearly Means

Use rule='A', year end frequency, we can expect the last day of the year as our resampling

df.resample(rule='A').mean()

Custom Resampling¶

You could technically also create your own custom resampling function:

def first_day(entry):
    """
    Returns the first instance of the period, regardless of samplling rate.
    """
    return entry[0]

Note that we PASS firstday in (without parentheses), NOT calling it

df.resample(rule='A').apply(first_day)

Visualization with Resampling¶

Reample by Year

Use bar plot as dataset is not continuous

df['Close'].resample('A').mean().plot(kind='bar')
plt.title('Yearly Mean Close Price for Walmart')

<matplotlib.text.Text at 0x11d782080>

Reample by Month

df['Open'].resample('M').max().plot(kind='bar',figsize=(16,6))
plt.title('Monthly Max Opening Price for Walmart')

<matplotlib.text.Text at 0x11e8ddda0>

That is it! Up next we'll learn about time shifts!

	Date	Open	High	Low	Close	Volume	Adj Close
0	2012-01-03	59.970001	61.060001	59.869999	60.330002	12668800	52.619235
1	2012-01-04	60.209999	60.349998	59.470001	59.709999	9593300	52.078475
2	2012-01-05	59.349998	59.619999	58.369999	59.419998	12768200	51.825539
3	2012-01-06	59.419998	59.450001	58.869999	59.000000	8069400	51.459220
4	2012-01-09	59.029999	59.549999	58.919998	59.180000	6679300	51.616215

	Date	Open	High	Low	Close	Volume	Adj Close
0	2012-01-03	59.970001	61.060001	59.869999	60.330002	12668800	52.619235
1	2012-01-04	60.209999	60.349998	59.470001	59.709999	9593300	52.078475
2	2012-01-05	59.349998	59.619999	58.369999	59.419998	12768200	51.825539
3	2012-01-06	59.419998	59.450001	58.869999	59.000000	8069400	51.459220
4	2012-01-09	59.029999	59.549999	58.919998	59.180000	6679300	51.616215

	Open	High	Low	Close	Volume	Adj Close
Date
2012-01-03	59.970001	61.060001	59.869999	60.330002	12668800	52.619235
2012-01-04	60.209999	60.349998	59.470001	59.709999	9593300	52.078475
2012-01-05	59.349998	59.619999	58.369999	59.419998	12768200	51.825539
2012-01-06	59.419998	59.450001	58.869999	59.000000	8069400	51.459220
2012-01-09	59.029999	59.549999	58.919998	59.180000	6679300	51.616215

	Open	High	Low	Close	Volume	Adj Close
Date
2012-12-31	67.158680	67.602120	66.786520	67.215120	9239015	59.389349
2013-12-31	75.264048	75.729405	74.843055	75.320516	6951496	68.147179
2014-12-31	77.274524	77.740040	76.864405	77.327381	6515612	71.709712
2015-12-31	72.569405	73.064167	72.034802	72.491111	9040769	68.831426
2016-12-31	69.481349	70.019643	69.023492	69.547063	9371645	68.054229

	Open	High	Low	Close	Volume	Adj Close
Date
2012-12-31	59.970001	61.060001	59.869999	60.330002	12668800	52.619235
2013-12-31	68.930000	69.239998	68.449997	69.239998	10390800	61.879708
2014-12-31	78.720001	79.470001	78.500000	78.910004	6878000	72.254228
2015-12-31	86.269997	86.720001	85.550003	85.900002	4501800	80.624861
2016-12-31	60.500000	61.490002	60.360001	61.459999	11989200	59.289713

Table of Contents