Machine learning notes - seasonality of time series
2022-06-26 13:04:00 【Sit and watch the clouds rise】
I. Seasonality
A time series exhibits seasonality whenever there is a regular, periodic change in the mean of the series. Seasonal changes generally follow the clock and the calendar: repetitions over a day, a week, or a year are common. Seasonality is often driven by the cycles of the natural world over days and years, or by conventions of social behavior surrounding dates and times.

There are two kinds of features that model seasonality. The first kind, indicators, is best for a season with few observations, such as a weekly season of daily observations. The second kind, Fourier features, is best for a season with many observations, such as an annual season of daily observations.
II. Seasonal plots and seasonal indicators
Just as we used a moving average plot to discover the trend in a series, we can use a seasonal plot to discover seasonal patterns.
A seasonal plot shows segments of the time series plotted against some common period, the period being the "season" you want to observe. The figure shows a seasonal plot of the daily views of Wikipedia's article on Trigonometry: the article's daily views are plotted over a common weekly period.
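To make the idea of "segments plotted against a common period" concrete, here is a minimal sketch of the data behind a weekly seasonal plot. The series is synthetic and the names `views`, `weekday_effect`, `segments`, and `profile` are illustrative assumptions, not part of the original lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 8 full weeks with a fixed weekday effect plus small noise
rng = np.random.default_rng(0)
index = pd.date_range("2016-01-04", periods=56, freq="D")  # 2016-01-04 is a Monday
day = index.dayofweek.to_numpy()                # 0 = Monday, ..., 6 = Sunday
week = np.arange(len(index)) // 7               # which 7-day segment each day belongs to
weekday_effect = np.array([5.0, 4.0, 3.0, 2.0, 1.0, -3.0, -4.0])
views = 100 + weekday_effect[day] + rng.normal(0, 0.1, len(index))

# One row per week, one column per day of the week.
# Each row is one line of the seasonal plot.
frame = pd.DataFrame({"views": views, "day": day, "week": week}, index=index)
segments = frame.pivot(index="week", columns="day", values="views")

# Averaging the rows gives the typical level for each day of the week
profile = segments.mean()
print(profile.round(2))
```

Each row of `segments` would be drawn as one line in the seasonal plot; if the rows look alike, the series has a weekly season.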

1. Seasonal indicators
Seasonal indicators are binary features that represent seasonal differences in the level of a time series. They are what you get if you treat a seasonal period as a categorical feature and apply one-hot encoding.
By one-hot encoding the days of the week, we get weekly seasonal indicators. Creating weekly indicators for the Trigonometry series gives us six new "dummy" features. (Linear regression works best if you drop one of the indicators; we dropped Monday in the table below.)
| Date | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|
| 2016-01-04 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2016-01-05 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2016-01-06 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2016-01-07 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2016-01-08 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 2016-01-09 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2016-01-10 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2016-01-11 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... |
Adding seasonal indicators to the training data helps a model distinguish the mean levels within a seasonal period.
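The indicator table above can be reproduced with a few lines of pandas. This is a sketch under the assumption that plain one-hot encoding with the first category dropped is what is wanted; the variable names are illustrative.

```python
import pandas as pd

# The same eight dates as in the table above (2016-01-04 is a Monday)
index = pd.date_range("2016-01-04", periods=8, freq="D")

day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Treat the day of the week as an ordered categorical feature
days = pd.Series(
    pd.Categorical(index.day_name(), categories=day_names),
    index=index,
)

# One-hot encode; drop_first=True drops the first category (Monday),
# leaving six indicator columns
indicators = pd.get_dummies(days, dtype=float, drop_first=True)
print(indicators)
```

A Monday row is all zeros, so the Monday level is absorbed by the model's intercept and every other indicator measures a difference from Monday.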

2. Fourier features and the periodogram
The kind of feature we discuss now is better suited to long seasons with many observations, where indicators would be impractical. Instead of creating a feature for each date, Fourier features try to capture the overall shape of the seasonal curve with just a few features.
Let's take a look at a plot of the annual season in the Trigonometry series. Notice the repetitions at various frequencies: a long up-and-down movement three times a year, short weekly movements 52 times a year, and perhaps others.

It is these frequencies within a season that we try to capture with Fourier features. The idea is to include in our training data periodic curves that have the same frequencies as the season we are trying to model. The curves we use are those of the trigonometric functions sine and cosine.
Fourier features are pairs of sine and cosine curves, one pair for each potential frequency in the season, starting with the longest. Fourier pairs modeling annual seasonality would have the frequencies once per year, twice per year, three times per year, and so on.

If we add a set of these sine / cosine curves to our training data, the linear regression algorithm will work out the weights that fit the seasonal component in the target series. The figure illustrates how linear regression used four Fourier pairs to model the annual seasonality in the Wiki Trigonometry series.
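As a rough illustration of how regression finds those weights, the sketch below builds four sine / cosine pairs for a synthetic series with known seasonal weights and recovers them with ordinary least squares. The data, the helper `fourier_pairs`, and the chosen weights are all assumptions made for the example; plain NumPy least squares stands in for the linear regression used in the lesson.

```python
import numpy as np


def fourier_pairs(t, period, order):
    """Return columns sin(2*pi*k*t/period), cos(2*pi*k*t/period) for k = 1..order."""
    cols = []
    for k in range(1, order + 1):
        angle = 2 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


# Hypothetical target: three years of daily data with a known seasonal component
t = np.arange(3 * 365, dtype=float)
period = 365.0
true_weights = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.3, 0.0, 0.0])  # 4 pairs -> 8 weights
X = fourier_pairs(t, period, order=4)
rng = np.random.default_rng(0)
y = 10.0 + X @ true_weights + rng.normal(0, 0.1, len(t))

# Ordinary least squares with an explicit intercept column
design = np.column_stack([np.ones_like(t), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, weights = coef[0], coef[1:]
print(np.round(weights, 2))
```

The fitted weights land close to `true_weights`, which is the sense in which the regression "figures out" how much of each sine and cosine curve the seasonal component contains.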

Notice that we needed only eight features (four sine / cosine pairs) to get a good estimate of the annual seasonality. Compare this with the seasonal indicator method, which would have required hundreds of features (one for each day of the year). By modeling only the "main effect" of the seasonality with Fourier features, you usually need to add far fewer features to your training data, which means less computation time and less risk of overfitting.
3. Choosing Fourier features with the periodogram
How many Fourier pairs should we actually include in our feature set? We can answer this question with the periodogram. The periodogram tells you the strength of the frequencies in a time series. Specifically, the value on the y-axis of the graph is $(a^2 + b^2) / 2$, where $a$ and $b$ are the coefficients of the sine and cosine at that frequency (as in the Fourier components plot above).

From left to right, the periodogram drops off after Quarterly, four times a year. That is why we chose four Fourier pairs to model the annual season. We ignore the Weekly frequency because it is better modeled with indicators.
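The relationship between the periodogram value and the sine and cosine coefficients can be checked numerically. The sketch below is an assumption-laden toy: a single noise-free component at four cycles per year with coefficients a = 3 and b = 4, and a one-sided power spectrum computed directly with NumPy's FFT rather than with the plotting helper used later.

```python
import numpy as np

n = 365                          # one year of daily observations
t = np.arange(n) / n             # time measured in years
a, b = 3.0, 4.0                  # assumed sine and cosine coefficients
freq = 4                         # cycles per year ("Quarterly")
y = a * np.sin(2 * np.pi * freq * t) + b * np.cos(2 * np.pi * freq * t)

# One-sided power spectrum: bin k corresponds to k cycles per year
spectrum = np.abs(np.fft.rfft(y)) ** 2 / n**2
spectrum[1:] *= 2                # fold the negative frequencies into the positive ones

print(spectrum[freq])            # close to (a**2 + b**2) / 2 = 12.5
print(np.argmax(spectrum))       # the strongest frequency is bin 4
```

All other bins are essentially zero here, so a real periodogram can be read as showing how much variance each frequency contributes.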
Computing Fourier features
Knowing how Fourier features are computed isn't essential to using them, but if seeing the details would clarify things, the code below illustrates how a set of Fourier features could be derived from the index of a time series. (In our applications, however, we will use a library function from statsmodels.)
import numpy as np
import pandas as pd


def fourier_features(index, freq, order):
    # Time steps 0, 1, 2, ... for each observation in the index
    time = np.arange(len(index), dtype=np.float32)
    # Angle of the base frequency: one full cycle every `freq` time steps
    k = 2 * np.pi * (1 / freq) * time
    features = {}
    for i in range(1, order + 1):
        # One sine / cosine pair per harmonic i of the base frequency
        features.update({
            f"sin_{freq}_{i}": np.sin(i * k),
            f"cos_{freq}_{i}": np.cos(i * k),
        })
    return pd.DataFrame(features, index=index)


# Compute Fourier features to the 4th order (8 new features) for a
# series y with daily observations and annual seasonality:
#
# fourier_features(y.index, freq=365.25, order=4)

III. Example: Tunnel Traffic
We'll continue once more with the Tunnel Traffic dataset. This hidden cell loads the data and defines two functions: seasonal_plot and plot_periodogram.
from pathlib import Path
from warnings import simplefilter

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

simplefilter("ignore")  # silence warnings to keep the notebook output clean

# Set Matplotlib defaults
# (on Matplotlib 3.6+ this style is named "seaborn-v0_8-whitegrid")
plt.style.use("seaborn-whitegrid")
plt.rc("figure", autolayout=True, figsize=(11, 5))
plt.rc(
    "axes",
    labelweight="bold",
    labelsize="large",
    titleweight="bold",
    titlesize=16,
    titlepad=10,
)
plot_params = dict(
    color="0.75",
    style=".-",
    markeredgecolor="0.25",
    markerfacecolor="0.25",
    legend=False,
)
# IPython/Jupyter magic; remove this line when running as a plain script
%config InlineBackend.figure_format = 'retina'
# annotations: https://stackoverflow.com/a/49238256/5769929
def seasonal_plot(X, y, period, freq, ax=None):
    if ax is None:
        _, ax = plt.subplots()
    # One color per seasonal period (for example, per week or per year)
    palette = sns.color_palette("husl", n_colors=X[period].nunique(),)
    ax = sns.lineplot(
        x=freq,
        y=y,
        hue=period,
        data=X,
        ci=False,
        ax=ax,
        palette=palette,
        legend=False,
    )
    ax.set_title(f"Seasonal Plot ({period}/{freq})")
    # Label each line at its right-hand end with the period it represents
    for line, name in zip(ax.lines, X[period].unique()):
        y_ = line.get_ydata()[-1]
        ax.annotate(
            name,
            xy=(1, y_),
            xytext=(6, 0),
            color=line.get_color(),
            xycoords=ax.get_yaxis_transform(),
            textcoords="offset points",
            size=14,
            va="center",
        )
    return ax
def plot_periodogram(ts, detrend='linear', ax=None):
    from scipy.signal import periodogram
    # Sampling frequency: number of daily observations per year.
    # Newer pandas versions may reject the ambiguous "1Y" unit;
    # pd.Timedelta("365D") is a close substitute there.
    fs = pd.Timedelta("1Y") / pd.Timedelta("1D")
    frequencies, spectrum = periodogram(
        ts,
        fs=fs,
        detrend=detrend,
        window="boxcar",
        scaling='spectrum',
    )
    if ax is None:
        _, ax = plt.subplots()
    ax.step(frequencies, spectrum, color="purple")
    ax.set_xscale("log")
    # Tick positions are in cycles per year
    ax.set_xticks([1, 2, 4, 6, 12, 26, 52, 104])
    ax.set_xticklabels(
        [
            "Annual (1)",
            "Semiannual (2)",
            "Quarterly (4)",
            "Bimonthly (6)",
            "Monthly (12)",
            "Biweekly (26)",
            "Weekly (52)",
            "Semiweekly (104)",
        ],
        rotation=30,
    )
    ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
    ax.set_ylabel("Variance")
    ax.set_title("Periodogram")
    return ax
data_dir = Path("../input/ts-course-data")
tunnel = pd.read_csv(data_dir / "tunnel.csv", parse_dates=["Day"])
tunnel = tunnel.set_index("Day").to_period("D")

Let's take a look at seasonal plots over a week and over a year.
X = tunnel.copy()
# days within a week
X["day"] = X.index.dayofweek # the x-axis (freq)
X["week"] = X.index.week # the seasonal period (period)
# days within a year
X["dayofyear"] = X.index.dayofyear
X["year"] = X.index.year
fig, (ax0, ax1) = plt.subplots(2, 1, figsize=(11, 6))
seasonal_plot(X, y="NumVehicles", period="week", freq="day", ax=ax0)
seasonal_plot(X, y="NumVehicles", period="year", freq="dayofyear", ax=ax1);
Now let's look at the periodogram:
plot_periodogram(tunnel.NumVehicles);
The periodogram agrees with the seasonal plots above: a strong weekly season and a weaker annual season. We'll model the weekly season with indicators and the annual season with Fourier features. From right to left, the periodogram falls off between Bimonthly (6) and Monthly (12), so let's use 10 Fourier pairs.
We'll create our seasonal features using DeterministicProcess, the same utility we used in Lesson 2 to create trend features. To use two seasonal periods (weekly and annual), we need to instantiate one of them as an "additional term":
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

fourier = CalendarFourier(freq="A", order=10)  # 10 sin/cos pairs for "A"nnual seasonality
dp = DeterministicProcess(
    index=tunnel.index,
    constant=True,               # dummy feature for bias (y-intercept)
    order=1,                     # trend (order 1 means linear)
    seasonal=True,               # weekly seasonality (indicators)
    additional_terms=[fourier],  # annual seasonality (fourier)
    drop=True,                   # drop terms to avoid collinearity
)
X = dp.in_sample()  # create features for dates in tunnel.index

With our feature set created, we're ready to fit the model and make predictions. We'll add a 90-day forecast to see how our model extrapolates beyond the training data.
y = tunnel["NumVehicles"]
model = LinearRegression(fit_intercept=False)
_ = model.fit(X, y)
y_pred = pd.Series(model.predict(X), index=y.index)
X_fore = dp.out_of_sample(steps=90)
y_fore = pd.Series(model.predict(X_fore), index=X_fore.index)
ax = y.plot(color='0.25', style='.', title="Tunnel Traffic - Seasonal Forecast")
ax = y_pred.plot(ax=ax, label="Seasonal")
ax = y_fore.plot(ax=ax, label="Seasonal Forecast", color='C3')
_ = ax.legend()
There is still more we can do with time series to improve our forecasts. The next step is to use time series themselves as features. Using time series as inputs to a forecast lets us model another component often found in series: cycles.