import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
airline = pd.read_csv('airline_passengers.csv', index_col="Month")
# Simple moving averages over 6-month and 12-month windows
airline['6-month-SMA'] = airline['Thousands of Passengers'].rolling(window=6).mean()
airline['12-month-SMA'] = airline['Thousands of Passengers'].rolling(window=12).mean()
airline.head()
airline.plot(figsize=(10,8))
We just showed how to calculate the SMA based on some window. However, the basic SMA has some weaknesses: it weights every observation in the window equally, it lags behind the data (and the lag grows with the window size), and because it averages it never reaches the full peak or valley of the data.
To help fix some of these issues, we can use an EWMA (exponentially weighted moving average).
An EWMA reduces the lag effect of the SMA by putting more weight on values that occurred more recently (hence the name). How much weight the most recent values receive depends on the parameters passed to the EWMA and on the number of periods in the window. Full details on the mathematics behind this can be found here. Here is a shorter version of the explanation behind EWMA.
The formula for EWMA is:
$ y_t = \frac{\sum\limits_{i=0}^t w_i x_{t-i}}{\sum\limits_{i=0}^t w_i} $
where $x_t$ is the input value, $w_i$ is the applied weight (note how it can change from $i=0$ to $t$), and $y_t$ is the output.
Now the question is, how do we define the weight term $w_i$?
This depends on the adjust parameter you provide to the .ewm() method.
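When adjust is True (default), weighted averages are calculated using the weights $w_i = (1 - \alpha)^i$, which gives:
$ y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + \ldots + (1 - \alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + \ldots + (1 - \alpha)^t} $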
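The adjust=True case, with weights $(1 - \alpha)^i$ plugged into the general formula above, can be checked by hand on a small series. The following is a minimal sketch, not the pandas implementation: the helper `ewma_adjusted` and the toy values are hypothetical, and the result is compared against `.ewm(alpha=..., adjust=True).mean()`.

```python
import numpy as np
import pandas as pd


def ewma_adjusted(values, alpha):
    """Weighted average with weights w_i = (1 - alpha)**i (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        weights = (1 - alpha) ** np.arange(t + 1)  # w_0, w_1, ..., w_t
        history = x[t::-1]                         # x_t, x_{t-1}, ..., x_0
        out[t] = np.sum(weights * history) / np.sum(weights)
    return out


toy = pd.Series([1.0, 2.0, 4.0, 8.0])  # toy data, assumed for illustration
manual = ewma_adjusted(toy, alpha=0.5)
builtin = toy.ewm(alpha=0.5, adjust=True).mean().to_numpy()

print(manual)
print(np.allclose(manual, builtin))
```

For example, the third value is (4 + 0.5·2 + 0.25·1) / (1 + 0.5 + 0.25) = 3.0.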
When adjust=False is specified, moving averages are calculated as:
$\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,\end{split}$
which is equivalent to using weights:
$\begin{split}w_i = \begin{cases} \alpha (1 - \alpha)^i & \text{if } i < t \\ (1 - \alpha)^i & \text{if } i = t. \end{cases}\end{split}$
When adjust=False we have $y_0 = x_0$ and, from the recursion above, $y_t = \alpha x_t + (1 - \alpha) y_{t-1}$. This carries an assumption that $x_0$ is not an ordinary value but rather an exponentially weighted moment of the infinite series up to that point.
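The adjust=False recursion and its equivalent weights can be verified on the same kind of toy series. This is a sketch under stated assumptions: `ewma_recursive` is a hypothetical helper written for illustration, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def ewma_recursive(values, alpha):
    """y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = (1 - alpha) * out[t - 1] + alpha * x[t]
    return out


alpha = 0.5
toy = pd.Series([1.0, 2.0, 4.0, 8.0])  # toy data, assumed for illustration
recursive = ewma_recursive(toy, alpha)
builtin = toy.ewm(alpha=alpha, adjust=False).mean().to_numpy()

# Equivalent weights for the last point: alpha * (1 - alpha)**i for i < t,
# and (1 - alpha)**t for the oldest observation (i = t).
t = len(toy) - 1
weights = alpha * (1 - alpha) ** np.arange(t + 1)
weights[t] = (1 - alpha) ** t
last_from_weights = np.sum(weights * toy.to_numpy()[::-1])

print(recursive, last_from_weights, weights.sum())
```

Note that these weights already sum to 1, which is why the recursive form needs no normalizing denominator.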
One must have 0<α≤1, and while since version 0.18.0 it has been possible to pass α directly, it’s often easier to think about either the span, center of mass (com) or half-life of an EW moment:
$\begin{split}\alpha = \begin{cases} \frac{2}{s + 1}, & \text{for span}\ s \geq 1\\ \frac{1}{1 + c}, & \text{for center of mass}\ c \geq 0\\ 1 - \exp\left(\frac{\log 0.5}{h}\right), & \text{for half-life}\ h > 0 \end{cases}\end{split}$
The span corresponds to what is commonly called an "N-day EW moving average".
import pandas as pd
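The three conversions to $\alpha$ are simple enough to evaluate directly. The sketch below uses only the standard library; the function names are hypothetical and exist only to mirror the three cases of the formula above.

```python
import math


def alpha_from_span(s):
    """alpha = 2 / (s + 1), for span s >= 1."""
    return 2.0 / (s + 1.0)


def alpha_from_com(c):
    """alpha = 1 / (1 + c), for center of mass c >= 0."""
    return 1.0 / (1.0 + c)


def alpha_from_halflife(h):
    """alpha = 1 - exp(log(0.5) / h), for half-life h > 0."""
    return 1.0 - math.exp(math.log(0.5) / h)


span = 12
print(alpha_from_span(span))            # 2 / 13
print(alpha_from_com((span - 1) / 2))   # same alpha, since c = (s - 1) / 2
print(alpha_from_halflife(1))           # weight halves after one period
```

Equating the first two cases shows that a span $s$ and a center of mass $c = (s - 1)/2$ describe the same smoothing.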
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
airline = pd.read_csv('airline_passengers.csv', index_col="Month")
It's not a DateTime Index. It's a string index.
airline.index
There are missing data points in our dataset as well. Use dropna() to fix this first!
airline.dropna(inplace=True)
airline.index = pd.to_datetime(airline.index)
airline.head()
It's now a DateTime index
airline.index
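The same clean-up steps can be tried without the data file. This is a small sketch on a made-up in-memory CSV (the three rows below are illustrative values, not a claim about what the real file contains), showing the index changing from strings to a DatetimeIndex after `dropna()` and `pd.to_datetime()`.

```python
import io

import pandas as pd

# Hypothetical miniature CSV; the last row has a missing value
csv_text = (
    "Month,Thousands of Passengers\n"
    "1949-01,112\n"
    "1949-02,118\n"
    "1949-03,\n"
)

toy = pd.read_csv(io.StringIO(csv_text), index_col="Month")
print(type(toy.index))           # a plain Index holding strings

toy = toy.dropna()                     # drop the row with the missing value
toy.index = pd.to_datetime(toy.index)  # parse the month strings into timestamps
print(type(toy.index))           # now a DatetimeIndex
```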
Use the span approach to calculate the EWMA. For monthly data, pass span=12 to the .ewm() method and then call .mean().
airline['EWMA12'] = airline['Thousands of Passengers'].ewm(span=12).mean()
airline[['Thousands of Passengers','EWMA12']].plot()
The behavior at the beginning differs from the behavior at the end: the seasonal pattern is clearer toward the end of the series than at the beginning. This is because points closer to the present are weighted more heavily than older points.
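The reduced lag of the EWMA relative to the SMA is easier to see in isolation on a synthetic series. The sketch below uses a hypothetical step series (20 zeros followed by 20 ones) rather than the airline data, and compares a 12-period SMA with an EWMA of span 12 just after the jump.

```python
import numpy as np
import pandas as pd

# Hypothetical step series: 20 zeros followed by 20 ones
step = pd.Series(np.r_[np.zeros(20), np.ones(20)])

sma = step.rolling(window=12).mean()
ewma = step.ewm(span=12).mean()

# Rows around the jump; positions 20, 21 and 22 are the first three ones
comparison = pd.DataFrame({"step": step, "SMA12": sma, "EWMA12": ewma})
print(comparison.iloc[18:26])
```

Three observations after the jump, the SMA has only reached 3/12 = 0.25, while the EWMA is already higher because the newest values carry the largest weights.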