import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
airline = pd.read_csv('airline_passengers.csv', index_col="Month")
airline.head()
airline.plot()
The chart above shows clear seasonality and an upward trend. Is the trend linear or exponential? That is hard to tell from this plot alone, so we use an ETS (Error, Trend, Seasonality) decomposition to separate the components.
Use to_datetime() to convert the index to a DatetimeIndex.
airline.dropna(inplace=True)
airline.index = pd.to_datetime(airline.index)
airline.head()
In the table above, the "Month" index now holds full dates, which confirms that it is a DatetimeIndex.
We can use an additive model when the trend looks roughly linear and the seasonal and trend components seem constant over time (e.g. every year we add 10,000 passengers). A multiplicative model is more appropriate when the series increases (or decreases) at a non-linear rate (e.g. each year we double the number of passengers).
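To make the distinction concrete, here is a minimal sketch on synthetic data (the series, growth rates, and variable names are illustrative assumptions, not taken from the airline dataset). In the additive series the year-over-year change is a constant amount; in the multiplicative series it is a constant percentage.

```python
import numpy as np
import pandas as pd

# Four years of made-up monthly data (assumption: not the airline dataset)
months = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
season = np.sin(2 * np.pi * t / 12)  # repeats every 12 months

# Additive: level + trend + seasonal swing of fixed size
additive = pd.Series(100 + 2 * t + 10 * season, index=months)

# Multiplicative: level * growth * seasonal factor, so the swing scales with the level
multiplicative = pd.Series(100 * 1.03 ** t * (1 + 0.1 * season), index=months)

# Compare each month with the same month one year earlier
add_yoy = additive.diff(12).dropna()                               # absolute change
mul_yoy = (multiplicative / multiplicative.shift(12) - 1).dropna() # relative change

print(add_yoy.round(2).unique())  # one distinct value: constant absolute growth
print(mul_yoy.round(4).unique())  # one distinct value: constant percentage growth
```

If the absolute change is stable, an additive model is a reasonable first choice; if the relative change is stable, a multiplicative model is likely the better fit.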
In the chart above, the trend in the earlier years appears to increase slightly faster than linearly. A quick way to check is to compare the height of the seasonal peak from one year to the next.
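The peak comparison can also be done numerically. The helper below is a hypothetical sketch (the function name yearly_peak_changes and the demo series are assumptions for illustration): it takes the highest value in each calendar year and reports how much that peak changes in absolute and in relative terms.

```python
import numpy as np
import pandas as pd

def yearly_peak_changes(series):
    """Hypothetical helper: yearly peaks of a datetime-indexed series and their changes."""
    peaks = series.groupby(series.index.year).max()
    return pd.DataFrame({
        "peak": peaks,
        "abs_change": peaks.diff(),                # roughly constant suggests additive
        "rel_change": peaks / peaks.shift(1) - 1,  # roughly constant suggests multiplicative
    })

# Synthetic demo series with 2% monthly growth and a seasonal factor (assumption)
months = pd.date_range("2000-01-01", periods=60, freq="MS")
t = np.arange(60)
demo = pd.Series(100 * 1.02 ** t * (1 + 0.2 * np.sin(2 * np.pi * t / 12)), index=months)

changes = yearly_peak_changes(demo)
print(changes)
```

The same helper could be applied to airline['Thousands of Passengers'] once the index has been converted with to_datetime(). Growing absolute changes alongside steadier relative changes would point toward the multiplicative model.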
# multiplicative model is used
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(airline['Thousands of Passengers'], model='multiplicative')
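For intuition about what a multiplicative decomposition produces, the following is a hand-rolled sketch of classical decomposition using only pandas and NumPy. It is an illustration under stated assumptions (an even period, a centered moving average for the trend, synthetic demo data, and a hypothetical function name), not the statsmodels implementation, and its output may differ from seasonal_decompose in detail.

```python
import numpy as np
import pandas as pd

def classical_multiplicative(y, period=12):
    """Hypothetical sketch of classical multiplicative decomposition (even period only)."""
    half = period // 2
    # Centered moving average: half weight on the two end points, full weight in between
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = pd.Series(np.nan, index=y.index)
    trend.iloc[half:-half] = np.convolve(y.to_numpy(), weights, mode="valid")

    # Dividing out the trend leaves seasonal factor times noise
    detrended = y / trend
    position = np.arange(len(y)) % period             # position within the seasonal cycle
    indices = detrended.groupby(position).mean()      # average factor per position, NaNs skipped
    indices = indices / indices.mean()                # normalize so the factors average to 1
    seasonal = pd.Series(indices.to_numpy()[position], index=y.index)

    # Whatever trend and seasonality do not explain
    resid = y / (trend * seasonal)
    return trend, seasonal, resid

# Synthetic demo data (assumption): linear level times a fixed seasonal pattern
months = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
y = pd.Series((100 + 2 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)), index=months)

trend, seasonal, resid = classical_multiplicative(y)
recombined = trend * seasonal * resid  # equals y wherever the trend is defined
```

In a multiplicative model the three components multiply back to the observed series, whereas in an additive model they would sum to it. The first and last half-period of the trend are missing because the centered moving average needs data on both sides.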
We can plot the observed, trend, or seasonal component separately:
result.observed.plot(title="Observed")
result.trend.plot(title="Trend Portion")
result.seasonal.plot(title="Seasonal Portion")
Or we can plot everything together:
fig = result.plot()
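When you run result.plot() on its own, you may see the same plot twice. This is a known quirk rather than a problem with the decomposition: the notebook draws the figure once and then displays it again as the cell's return value. To remove the duplicate, assign result.plot() to a variable, as in fig = result.plot() above, and run the cell again. You should then see a single plot.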