ETS Decomposition Code Along

Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Read CSV

In [2]:
airline = pd.read_csv('airline_passengers.csv', index_col="Month")
In [3]:
airline.head()
Out[3]:
Thousands of Passengers
Month
1949-01 112.0
1949-02 118.0
1949-03 132.0
1949-04 129.0
1949-05 121.0
In [4]:
airline.plot()
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7a36ed0>

The chart above suggests the series has seasonality and an upward trend. Is the trend linear or exponential? It is hard to tell from this plot alone, so we use ETS (Error, Trend, Seasonality) decomposition to separate the components.

Get Data in Correct Format

Use pd.to_datetime() to convert the index to a DatetimeIndex

In [5]:
airline.dropna(inplace=True)
airline.index = pd.to_datetime(airline.index)
In [6]:
airline.head()
Out[6]:
Thousands of Passengers
Month
1949-01-01 112.0
1949-02-01 118.0
1949-03-01 132.0
1949-04-01 129.0
1949-05-01 121.0

In the table above, the "Month" index now shows full dates (e.g. 1949-01-01 instead of 1949-01), which indicates it has been converted to a DatetimeIndex.
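Rather than judging by eye, the index type can be checked directly. The sketch below uses a small hand-made frame whose values mirror the first rows shown above; the names `demo` and `jan_feb` are hypothetical and only for illustration.

```python
import pandas as pd

# Small stand-in for the airline data; `demo` is a hypothetical name.
demo = pd.DataFrame(
    {"Thousands of Passengers": [112.0, 118.0, 132.0]},
    index=pd.Index(["1949-01", "1949-02", "1949-03"], name="Month"),
)

# Before conversion the index holds plain strings.
was_datetime = isinstance(demo.index, pd.DatetimeIndex)

demo.index = pd.to_datetime(demo.index)

# After conversion it is a DatetimeIndex, so date-based slicing works.
is_datetime = isinstance(demo.index, pd.DatetimeIndex)
jan_feb = demo.loc["1949-01":"1949-02"]

print(was_datetime, is_datetime)
print(jan_feb)
```

The same isinstance check can be run on airline.index after cell In [5].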

Perform Decomposition

Choosing the Right Model

An additive model fits when the trend looks roughly linear and the seasonal swings stay about the same size over time (e.g. every year we add 10,000 passengers). A multiplicative model is more appropriate when the series grows (or shrinks) at a non-linear rate (e.g. each year the number of passengers doubles).
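To make the two model forms concrete, here is a minimal sketch on synthetic data. All numbers and variable names are made up for illustration and are not taken from the airline data set. It also shows a commonly used link between the two: taking logarithms turns a multiplicative series into an additive one.

```python
import numpy as np

months = np.arange(48)                       # four years of monthly data
trend = 100.0 + 2.0 * months                 # made-up linear trend
wave = np.sin(2 * np.pi * months / 12)       # 12-month seasonal pattern

seasonal_offset = 10.0 * wave                # seasonal effect in absolute units
seasonal_factor = 1 + 0.1 * wave             # seasonal effect as a ratio

# Additive: a fixed number of units is added to the trend.
additive = trend + seasonal_offset

# Multiplicative: the seasonal effect is a percentage of the current level.
multiplicative = trend * seasonal_factor

# Taking logs turns the product into a sum of components.
log_series = np.log(multiplicative)
log_sum = np.log(trend) + np.log(seasonal_factor)
print(np.allclose(log_series, log_sum))
```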

From the chart above, the trend in these early years appears to rise slightly faster than linearly. A quick way to check is to compare the seasonal peaks from year to year: if the peaks and the size of the swings grow with the level of the series, a multiplicative model is the better choice.
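The peak comparison can also be done numerically. The sketch below is an illustration on a synthetic series, not on the airline data: the values, the start date, and the names `series` and `summary` are assumptions. It summarizes each calendar year by its mean, its peak, and its peak-to-trough swing.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (made-up numbers) with multiplicative seasonality.
idx = pd.date_range("1949-01-01", periods=48, freq="MS")
t = np.arange(48)
values = (100.0 + 2.0 * t) * (1 + 0.1 * np.sin(2 * np.pi * t / 12))
series = pd.Series(values, index=idx)

# Summarize each calendar year.
by_year = series.groupby(series.index.year)
summary = pd.DataFrame({
    "mean": by_year.mean(),
    "peak": by_year.max(),
    "swing": by_year.max() - by_year.min(),
})
print(summary)
```

If the swing column grows along with the mean, as it does here, that points towards a multiplicative model. A swing that stays roughly flat while the mean rises points towards an additive one.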

In [7]:
# multiplicative model is used
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(airline['Thousands of Passengers'], model='multiplicative')

We can plot the observed, trend or seasonal component separately

In [8]:
result.observed.plot(title="Observed")
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0xda36cf0>
In [9]:
result.trend.plot(title="Trend Portion")
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0xdcc7730>
In [10]:
result.seasonal.plot(title="Seasonal Portion")
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0xdcaa270>
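In a multiplicative decomposition the components multiply back to the observed series: observed = trend × seasonal × residual. The sketch below illustrates the classical moving-average approach on synthetic data using only NumPy. It is a simplified illustration under stated assumptions (made-up numbers, a period of 12, edges simply dropped) and is not the statsmodels implementation, whose details may differ.

```python
import numpy as np

# Synthetic monthly data (made-up numbers): four years, multiplicative seasonality.
t = np.arange(48)
observed = (100.0 + 2.0 * t) * (1 + 0.1 * np.sin(2 * np.pi * t / 12))

# Trend: centered 12-month moving average (13 weights, half weight on the ends).
weights = np.r_[0.5, np.ones(11), 0.5] / 12
trend = np.convolve(observed, weights, mode="valid")  # covers months 6..41
core = observed[6:-6]                                  # same months as `trend`

# Seasonal: average observed/trend ratio per calendar month, scaled to mean 1.
ratio = core / trend
month = t[6:-6] % 12
factors = np.array([ratio[month == m].mean() for m in range(12)])
factors /= factors.mean()
seasonal = factors[month]

# Residual: what is left after dividing out trend and seasonal.
resid = core / (trend * seasonal)

print(np.allclose(trend * seasonal * resid, core))
```

Because the moving average needs six months on either side, the trend (and therefore the residual) is undefined at the edges of the series in this sketch.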

Or we can plot everything together:

In [11]:
fig = result.plot()

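If you run result.plot() on its own in a notebook cell, you may see the same figure twice. This is a display quirk rather than a problem with the decomposition: the method draws the figure and also returns it, so the notebook renders the returned value a second time. Assigning the result to a variable, as in fig = result.plot() above, suppresses the duplicate and leaves a single plot.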