One of the most widely used Python libraries for working with time series data is statsmodels. It is heavily inspired by the R statistical programming language.
Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and for each estimator. We will focus on time series data.
To install it manually, you can run: conda install statsmodels
The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at <a href="https://www.statsmodels.org/stable/index.html" target = "_blank">statsmodels.org</a>
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
You can safely ignore the following warning if it appears:
Please use the pandas.tseries module instead. from pandas.core import datetools
import statsmodels.api as sm
df = sm.datasets.macrodata.load_pandas().data
df.head()
print(sm.datasets.macrodata.SOURCE)
print(sm.datasets.macrodata.NOTE)
index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3'))
df.index = index
df.head()
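As a cross-check on the index built above, the same kind of quarterly date range can be sketched with pandas alone. This is a minimal sketch rather than the method used in this notebook: the names `quarters` and `quarter_ends` are illustrative, and it assumes the index should hold the last calendar day of each quarter.

```python
import pandas as pd

# Quarterly periods from 1959Q1 through 2009Q3, inclusive.
quarters = pd.period_range("1959Q1", "2009Q3", freq="Q")

# Convert each period to the last calendar day of the quarter, at midnight.
quarter_ends = quarters.to_timestamp(how="end").normalize()

print(len(quarter_ends))
print(quarter_ends[0], quarter_ends[-1])
```

The length of this range should match the number of rows in the macrodata frame, which is a quick way to confirm that the start and end quarters are right before assigning a new index.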
df['realgdp'].plot()
plt.ylabel("REAL GDP")
The Hodrick-Prescott filter separates a time series $y_t$ into a trend component $\tau_t$ and a cyclical component $\zeta_t$:
$y_t = \tau_t + \zeta_t$
The components are determined by minimizing the following quadratic loss function, where $\lambda$ is a smoothing parameter that penalizes changes in the slope of the trend:
$\min_{\\{ \tau_{t}\\} }\sum_{t=1}^{T}\zeta_{t}^{2}+\lambda\sum_{t=3}^{T}\left[\left(\tau_{t}-\tau_{t-1}\right)-\left(\tau_{t-1}-\tau_{t-2}\right)\right]^{2}$
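To make the loss function concrete, here is a minimal NumPy sketch that solves the minimization directly. Setting the derivative with respect to the trend to zero gives the linear system $(I + \lambda K^{\top}K)\,\tau = y$, where $K$ is the second-difference matrix. The function name `hp_filter_dense` and the toy series are illustrative assumptions, not part of statsmodels, and the dense solve is only sensible for short series.

```python
import numpy as np

def hp_filter_dense(y, lam=1600.0):
    """Solve the HP minimization in closed form (dense matrices, small T only)."""
    y = np.asarray(y, dtype=float)
    T = y.shape[0]
    # K is the (T-2) x T second-difference operator:
    # (K @ tau)[i] = tau[i] - 2*tau[i+1] + tau[i+2]
    K = np.zeros((T - 2, T))
    rows = np.arange(T - 2)
    K[rows, rows] = 1.0
    K[rows, rows + 1] = -2.0
    K[rows, rows + 2] = 1.0
    # First-order condition of the loss: (I + lam * K'K) tau = y
    trend = np.linalg.solve(np.eye(T) + lam * (K.T @ K), y)
    cycle = y - trend
    return cycle, trend  # same (cycle, trend) order as in the notebook

# A straight line has zero second differences, so it comes back as pure trend.
line = np.linspace(0.0, 10.0, 50)
line_cycle, line_trend = hp_filter_dense(line)

# On a noisy series, a larger lam produces a smoother trend.
rng = np.random.default_rng(0)
noisy = line + rng.normal(scale=0.5, size=line.size)
_, trend_small = hp_filter_dense(noisy, lam=1.0)
_, trend_large = hp_filter_dense(noisy, lam=1e6)
```

The two extremes are a useful way to read $\lambda$: as it approaches zero the trend follows the data exactly, and as it grows the trend approaches a straight line.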
Use tuple unpacking to grab the cycle and trend components; the filter returns them in that order.
# Tuple unpacking
gdp_cycle, gdp_trend = sm.tsa.filters.hpfilter(df.realgdp)
gdp_cycle
type(gdp_cycle)
Add a column for "trend"
df["trend"] = gdp_trend
df[['trend','realgdp']].plot()
# Zoom in on the observations from 2000-03-31 onward
df.loc["2000-03-31":, ['trend','realgdp']].plot(figsize=(12,8))