📑 Reading Notes for "LTSF-Linear"
Overview
This paper argues that a time series is an ordered sequence of continuous points, and that temporal information is lost when it is processed by the Transformer structure. To support this claim, the authors propose a set of models named LTSF-Linear that achieve outstanding performance, and conduct comprehensive empirical studies.

What problem does the paper try to solve?
It attempts to verify whether the Transformer is effective for time series forecasting, and whether a simple model can surpass the current Transformer-based models.

What is the proposed solution?
The LTSF-Linear model is proposed as an alternative, achieving very strong performance with only a single-layer network.

What are the key experimental results in this paper?
It achieves better performance than Transformer-based methods on multiple datasets covering domains such as electricity, healthcare, and meteorology.

What are the main contributions of the paper?
- Challenges the Transformer structure on the long-term time series forecasting (LTSF) task.
- Introduces the LTSF-Linear model, which has only one layer yet achieves comparable results across various domains.
- Conducts comprehensive empirical studies on various aspects of existing Transformer-based solutions.

What are the strong points and weak points in this paper?
- Strong points: It identifies potential issues in the current research route and opens up a new perspective in a simple way.
- Weak points: It only studies the forecasting problem and does not explore other tasks, such as anomaly detection.
Background
Over the past several years, the Transformer has been widely used as a time series forecasting (TSF) solution. However, the self-attention mechanism is permutation-invariant, so applying it inevitably loses temporal ordering information. Time series typically carry less semantic meaning than NLP or CV data and depend more heavily on temporal information, which makes this problem more severe. Thus, this paper challenges Transformer-based LTSF solutions with direct multi-step forecasting strategies.
For a time series containing $C$ variates, the historical data can be represented as $\mathcal{X}=\{X_1^t,\dots,X_C^t\}_{t=1}^L$, wherein $L$ is the look-back window size. The forecasting problem is to predict the values of the $T$ future time steps $\hat{\mathcal{X}}=\{\hat{X}_1^t,\dots,\hat{X}_C^t\}_{t=L+1}^{L+T}$. When $T>1$, the methods can be divided into two categories:
- Iterated multi-step (IMS): learns a single-step forecaster and applies it iteratively to obtain multi-step predictions. This method has smaller variance but accumulates errors.
- Direct multi-step (DMS): directly optimizes the multi-step forecasting objective at once. This method yields more accurate predictions when $T$ is large.
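The IMS/DMS distinction above can be illustrated with a toy sketch: both forecasters below are deliberately trivial stand-ins (a "repeat the last delta" rule and a least-squares trend fit, neither taken from the paper), chosen only to show the difference between feeding predictions back iteratively and producing all $T$ steps at once.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T = 8, 3  # look-back window and forecast horizon (illustrative sizes)

# Toy series: a noisy upward trend.
series = np.arange(20, dtype=float) + 0.1 * rng.standard_normal(20)
history = series[:L]

def ims_forecast(x, steps):
    """IMS: reuse a one-step forecaster (here a naive 'repeat the last
    delta' rule) and feed each prediction back in as the next input."""
    x = list(x)
    for _ in range(steps):
        x.append(x[-1] + (x[-1] - x[-2]))  # same one-step model, reused
    return np.array(x[-steps:])

def dms_forecast(x, steps):
    """DMS: map the whole window to all `steps` outputs at once.
    A least-squares trend fit stands in for a learned linear layer."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return intercept + slope * (len(x) + np.arange(steps))

ims = ims_forecast(history, T)
dms = dms_forecast(history, T)
```

In the IMS loop, any error in step one is baked into the input for step two, which is exactly the error-accumulation issue mentioned above; the DMS map never re-ingests its own predictions.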
Methods
TransformerBased Methods
The vanilla Transformer model has some limitations when applied to time series problems, so various works try to improve performance by adding to or replacing parts of the Transformer. Generally speaking, the modifications can be divided into four major parts.
Preprocessing: to apply the Transformer to time series datasets, some preprocessing is needed to adapt the data, such as zero-mean normalization and adding timestamps, as in NLP. Notably, Autoformer introduces seasonal-trend decomposition to separate the trend part from the cyclical part, which exposes the data's patterns more clearly.
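A common form of the seasonal-trend decomposition mentioned above uses a moving average for the trend and keeps the remainder as the seasonal part. The sketch below is an illustrative re-implementation of that idea (the function name and edge-padding choice are mine, not the authors' code):

```python
import numpy as np

def series_decomp(x, kernel_size=5):
    """Moving-average seasonal-trend decomposition:
    trend = moving average of x, seasonal = x - trend.
    Edge values are replicated so the average is defined at the
    boundaries and the outputs keep the input's length."""
    pad = (kernel_size - 1) // 2
    padded = np.concatenate([np.full(pad, x[0]), x, np.full(pad, x[-1])])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend
    return seasonal, trend
```

By construction the two components sum back to the original series, so the decomposition loses no information; it only reorganizes it.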
Embedding: in NLP Transformers, the embedding maps words to vectors in a latent space that captures their meaning. In time series, temporal information is critically important, so various timestamp-embedding methods have been proposed to help the model preserve it.
Encoder/Decoder: many improvements have been made to the encoder and decoder to fit the Transformer structure to time series problems. On the encoder side, most improvements aim to reduce computational cost and increase speed. On the decoder side, to avoid accumulated errors, designs have begun to shift from IMS to DMS.
The success of the Transformer in NLP is largely attributed to its understanding of the semantic relationships between words, but in time series problems temporal information matters even more. However, the Transformer's ability to model time comes largely from the timestamp embeddings rather than from its structure.
LTSF-Linear
This paper hypothesizes that the improvement of Transformer-based models is due to the DMS strategy rather than the Transformer itself. To verify this, the authors propose LTSF-Linear, which directly regresses historical time series for future prediction via a weighted sum operation. The model can be formulated as:
$$ \hat{X}_i=WX_i $$
wherein $W\in\mathbb{R}^{T\times L}$ is a linear layer along the temporal axis. Furthermore, the paper proposes two variants: DLinear, which uses decomposition to obtain the trend part and the seasonal part, and NLinear, which subtracts the last value of the input sequence before making the prediction and adds it back afterwards.
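As a concrete illustration, the weighted-sum formulation $\hat{X}_i = WX_i$ and the NLinear variant can be sketched as follows. This is a minimal NumPy sketch with untrained, randomly initialized weights; the class names and shapes are illustrative choices, not the authors' code.

```python
import numpy as np

class LTSFLinearSketch:
    """Sketch of the linear baseline: a single weight matrix
    W (T x L) regressing T future steps directly from L past steps."""
    def __init__(self, L, T, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.W = rng.standard_normal((T, L)) / L  # untrained weights

    def forward(self, x):
        # x: history of one variate, shape (L,); output shape (T,)
        return self.W @ x

class NLinearSketch(LTSFLinearSketch):
    """NLinear variant: subtract the window's last value before the
    linear map and add it back after the prediction."""
    def forward(self, x):
        last = x[-1]
        return self.W @ (x - last) + last
```

In practice $W$ is learned (e.g. with an MSE loss), and DLinear applies two such linear maps, one to the trend part and one to the seasonal part of the decomposed input, summing their outputs.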
Experiments
To verify the quality of LTSF-Linear, the authors selected common real-life time series datasets and compared against five popular Transformer-based models. The results are shown as follows: