¹ Trip.com Group, Shanghai, China
Email: {yh_liao,zt_wang,weip,qq_nie,zhenhuazhang}@trip.com

TripCast: Pre-training of Masked 2D Transformers for Trip Time Series Forecasting

Yuhua Liao (🖂 corresponding author), Zetian Wang, Peng Wei, Qiangqiang Nie, Zhenhua Zhang

Abstract

Deep learning and pre-trained models have shown great success in time series forecasting. However, in the tourism industry, time series data often exhibit a leading time property, presenting a 2D structure. This introduces unique challenges for forecasting in this sector. In this study, we propose a novel modelling paradigm, TripCast, which treats trip time series as 2D data and learns representations through masking and reconstruction processes. Pre-trained on large-scale real-world data, TripCast notably outperforms other state-of-the-art baselines in in-domain forecasting scenarios and demonstrates strong scalability and transferability in out-domain forecasting scenarios.

Keywords: Trip Time Series · Pre-trained Models · Transformer · Tourism.

1 Introduction

Figure 1: An illustration of flight booking time series data (left). The vertical axis represents the flight takeoff date, and the horizontal axis represents the booking process. Within each takeoff date (right), the booking process is shown as a 1D time series, and the entire data is shown as a 2D matrix. Across different takeoff dates, the unobserved booking process forms a triangle.

Time series forecasting is widely used in various real-world fields, such as finance, speech analysis, action recognition, and traffic flow forecasting [21]. Accurate forecasts empower businesses to optimize decision-making, enhance operations, and improve overall efficiency [1]. In the tourism industry, time series forecasting plays a crucial role in revenue management [11], demand planning [15], and dynamic pricing [26].

In the past decades, deep learning methods [2, 23, 39] have achieved significant success in time series forecasting [21]. These methods are flexible in modeling complex patterns and dependencies in time series and have been widely used in various domains. However, training deep learning models from scratch requires large amounts of data and computational resources, which limits their practical use. In the tourism sector, new routes and flights are scheduled every month without any historical data, so it is impractical to train a robust and accurate deep learning model for each new route or flight. More critically, in some domains the application of deep time series models is hindered by the cold start problem, owing to the difficulty or cost of data collection [24]. Meanwhile, large-scale pre-training has become a key element of training large neural networks in the vision [19, 27] and text [3, 7] domains [10]. Large Language Models (LLMs) learn general representations from web-scale text data, and scaling both model size and data [14] enhances their zero-shot and in-context learning abilities. This inspires us to investigate the potential of pre-training time series models for the tourism industry, especially given the limited research currently available in this field.

However, the time series data of the tourism industry inherently exhibits a dual-axis nature, as illustrated in Figure 1. The vertical axis represents the event time, such as the flight departure date, while the horizontal axis denotes the leading time prior to the event, such as the booking date. Existing forecasting paradigms typically address this problem from either the event time axis or the leading time axis. These dichotomous approaches result in two primary challenges: accuracy and efficiency.

Firstly, observations of time series in the tourism industry are typically influenced by both past event time points and past leading time points. For instance, the booking rate of a flight on a specific departure date depends on the booking rates of the same flight on previous departure dates as well as on its booking rates at earlier leading times. Consequently, models that ignore the complex dependencies and causality across event times and leading times may fail to yield accurate forecasts. Secondly, building separate models for different leading time steps or event time steps is inefficient and time-consuming. This fragmented approach requires significant computational resources and may lead to redundancy and suboptimal use of data.

To address these challenges, we propose TripCast, a novel modelling paradigm that treats each trip time series as a single 2D matrix and learns local and global dependencies through masking and reconstruction during training. Furthermore, to validate the transferability and scalability of TripCast as a zero-shot forecaster in the tourism industry, we conduct extensive zero-shot forecasting experiments in both in-domain and out-domain scenarios.

Our contributions are as follows:

* For the first time, we formulate the problem of trip time series forecasting and introduce a novel modelling paradigm that treats trip time series as 2D data to capture the intrinsic correlations and causality between different event times and leading times.

* To address the challenges of trip time series forecasting, we propose TripCast, which learns local and global dependencies through masking and reconstruction processes.

* We perform comprehensive experiments on large-scale datasets from an online travel agency. The results show that our method, as a zero-shot forecaster, outperforms deep learning and pre-trained models in in-domain scenarios and exhibits strong scalability and transferability in out-domain scenarios.

2 Problem Statement

2.1 Trip Time Series

We define a trip time series as sequential data with two axes, event time and leading time (Figure 2). The event time axis represents when a good or service is consumed, such as a flight takeoff date or a hotel room check-in date. The leading time axis represents the time before consumption, such as the booking date or search date. Formally, a trip time series $X$ is a 2D matrix of size $H\times C$, where $H$ is the length of the event time axis and $C$ is the length of the leading time axis. For simplicity, we omit the covariate dimension in all definitions.
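To make the layout concrete, the following minimal sketch assembles $X$ from long-format records. It assumes integer day indices for event dates, a lead measured in steps before the event, and that column $C-1$ holds the last leading step; the function name `build_trip_matrix` and the record format are hypothetical. Cells that are not yet observed stay NaN.

```python
import numpy as np

def build_trip_matrix(records, event_dates, C):
    """Pivot (event_date, lead, value) records into an H x C trip matrix.

    Hypothetical layout: row h is the h-th entry of `event_dates`; column
    C - 1 - lead holds the value observed `lead` steps before the event, so
    the last column is the last leading step. Unobserved cells stay NaN.
    """
    row_of = {d: h for h, d in enumerate(event_dates)}
    X = np.full((len(event_dates), C), np.nan)
    for event_date, lead, value in records:
        if event_date in row_of and 0 <= lead < C:
            X[row_of[event_date], C - 1 - lead] = value
    return X

# Three departure days; later departures have fewer observed leading steps.
records = [(0, 2, 0.1), (0, 1, 0.3), (0, 0, 0.5),
           (1, 2, 0.2), (1, 1, 0.4),
           (2, 2, 0.1)]
X = build_trip_matrix(records, event_dates=[0, 1, 2], C=3)
print(X)
```

The NaN cells form the lower-right triangle described in the caption of Figure 1.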

2.2 Trip Time Series Forecasting

Given a trip time series $X\in\mathbb{R}^{H\times C}$, let $H_{obs}$ and $H_{pred}$ be the numbers of observed and predicted time steps along the event time axis, so that $H=H_{obs}+H_{pred}$. Correspondingly, $X$ has at most $H_{pred}$ unobserved steps along the leading time axis, and the number of unobserved leading steps grows as the event time moves further into the future. Our goal is to predict the unobserved leading time steps of future event time steps. Formally, the task can be defined as a parameterized function $\mathcal{F}_{\theta}$:

$$X_{H_{obs}:,\,C_{obs}:}=\mathcal{F}_{\theta}\left(X_{:H_{obs},\,:}\cup X_{H_{obs}:,\,:C_{obs}}\right) \tag{1}$$

where $C_{obs}$ denotes the observed leading time steps, whose number varies across event time steps. The problem is illustrated in Figure 2.
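The split in Eq. (1) can be expressed as a boolean mask over $X$. The sketch below is hypothetical: it assumes the $k$-th future event step ($k=1,\dots,H_{pred}$) is missing exactly its last $k$ leading steps, which reproduces the triangle and the bound of at most $H_{pred}$ unobserved leading steps per row. The helper name `unobserved_mask` is illustrative.

```python
import numpy as np

def unobserved_mask(H_obs: int, H_pred: int, C: int) -> np.ndarray:
    """Boolean (H_obs + H_pred) x C mask, True where X[h, c] is a target.

    Rows before H_obs are fully observed; the k-th future row hides its
    last k leading steps, so it keeps C_obs = C - k observed steps.
    """
    assert H_pred <= C, "cannot hide more leading steps than exist"
    mask = np.zeros((H_obs + H_pred, C), dtype=bool)
    for k in range(1, H_pred + 1):
        mask[H_obs + k - 1, C - k:] = True
    return mask

m = unobserved_mask(H_obs=3, H_pred=2, C=4)
X = np.arange(m.size, dtype=float).reshape(m.shape)
inputs = np.where(m, np.nan, X)   # X_{:H_obs,:} together with X_{H_obs:, :C_obs}
targets = X[m]                    # X_{H_obs:, C_obs:}, the values to forecast
```

Here the two future rows hide 1 and 2 leading steps respectively, so `targets` holds three values.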

2.3 In-domain and Out-domain Forecasting

Conceptually, temporal datasets can be categorized into three levels of granularity: domain, collection, and time series [32], as shown in Figure 3. In-domain forecasting involves training and evaluating the model on the same dataset source. Conversely, out-domain forecasting entails training the model on multiple datasets and evaluating it on a dataset from a different domain. In this study, we focus on evaluating the effectiveness and scalability of TripCast in both in-domain and out-domain tasks.

3 Related Work

3.1 Tourism Industry and Time Series Forecasting

Time series forecasting is crucial in the tourism industry for revenue management, demand planning, and dynamic pricing. Existing forecasting methods can be classified into three categories: historical-data-based methods, advanced-data-based methods, and combined methods [31]. Popular traditional methods in the tourism industry include ARIMA [5, 8], exponential smoothing [36], and Holt-Winters [12]. With advances in deep learning, some studies have explored deep learning models for tourism forecasting. Wang et al. [30] trained forecasting models with the temporal fusion transformer (TFT) [18] for five different airports and found that TFT outperforms traditional methods.

3.2 Pre-training Modelling for Time Series Analysis

Inspired by advances in pre-training across various fields, self-supervised learning has been adopted for time series forecasting. TS2Vec [35] and CoST [34] learn representations through contrastive learning. However, due to the limited scale of available datasets, they only consider in-domain scenarios, and their transferability is not well studied. With the rise of large language models (LLMs) [25, 29], some studies leverage LLMs for time series forecasting [38]. Time-LLM [13] uses text data to reprogram the time series modality into the language modality. GPT4TS [40] fine-tunes LLMs on time series datasets and achieves state-of-the-art performance in various forecasting scenarios. TEMPO [4] introduces a prompt-based structure to improve the distribution adaptation of LLMs for time series forecasting. Recently, foundation models pre-trained on time series data have been proposed [6, 28, 33].

4 Methodology

In this section, we first outline the architecture of TripCast, which is designed to accommodate the dual-axis properties of trip time series. We then describe the training strategies for both pre-training and downstream tasks.

4.1 Model Structure

Input Projection and Masking. Unlike in image modeling [9], we cannot directly apply patch masking to trip time series, because observed and unobserved values may be mixed within the same patch. Inspired by TS2Vec [35], we tokenize unobserved and missing values by projecting each input value $x_{h,c}$ to a higher-dimensional latent vector $z_{h,c}$ and applying a token-level mask. Notably, we mask the latent vectors rather than the raw input data, because the value range of the raw input data varies, which makes a fixed mask value impractical. In this way, observed and unobserved tokens are separated in the latent representation space. Furthermore, we adopt two masking strategies during the pre-training stage:

* Random masking: This strategy simulates missing data by masking each token of the projected data independently with a predetermined probability $p$ (Figure 4). It improves the robustness of TripCast models and helps maintain stable performance in real-world applications.

$$\mathrm{Mask}^{random}_{h,c}\sim\mathrm{Bernoulli}(p),\qquad \mathrm{Mask}^{random}_{h,c}\in\{0,1\}$$

* Progressive masking: In trip time series, unobserved values typically form a triangle, and as time progresses, unobserved values along the diagonal are gradually revealed. To inject this prior knowledge into training and help the model learn causality, we mask triangular regions of the input data in a progressive manner, as shown in Figure 4.

During inference, we mask only the unobserved tokens and feed the masked input into the model to predict their values.
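The masking steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `random_mask`, `progressive_mask`, and `project_and_mask` are hypothetical names, the triangle-sampling rule is one plausible reading of "progressive", and masked latents are set to zero in the style of TS2Vec's latent masking. `True` marks a hidden token throughout.

```python
import numpy as np

def random_mask(H, C, p, rng):
    """Random masking: hide each token independently with probability p."""
    return rng.random((H, C)) < p

def progressive_mask(H, C, max_h_pred, rng):
    """Progressive masking (hypothetical rule): hide a lower-right triangle.

    Draw h_pred uniformly from 1..max_h_pred; the k-th of the last h_pred
    event rows (k = 1..h_pred) hides its last k leading steps.
    """
    h_pred = int(rng.integers(1, max_h_pred + 1))
    m = np.zeros((H, C), dtype=bool)
    for k in range(1, h_pred + 1):
        m[H - h_pred + k - 1, C - k:] = True
    return m

def project_and_mask(X, masked, W, b):
    """Project scalars x[h, c] to d-dim latents z[h, c], then zero masked tokens.

    W, b: (d,) weights of a hypothetical scalar-to-vector linear projection.
    Because b is generally nonzero, an observed 0 maps to b rather than to the
    zero vector, so masked and observed tokens stay distinguishable.
    """
    x = np.where(masked, 0.0, np.nan_to_num(X))   # masked raw values are never used
    z = x[..., None] * W + b                        # (H, C, d)
    return np.where(masked[..., None], 0.0, z)

rng = np.random.default_rng(0)
H, C, d = 60, 40, 8
X = rng.random((H, C))
# Pre-training: combine both strategies. At inference, `masked` would mark
# exactly the unobserved tokens instead.
masked = random_mask(H, C, 0.1, rng) | progressive_mask(H, C, 15, rng)
z = project_and_mask(X, masked, rng.normal(size=d), rng.normal(size=d))
```

Masking after the projection is what lets a single fixed value (the zero vector) stand for "hidden" regardless of the raw value range.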

Patching and Positional Encoding. As demonstrated by PatchTST [22] and the Vision Transformer [9], patching is an effective way to capture local patterns. In TripCast, we segment the input data into non-overlapping patches and apply a linear projection to each patch. This reduces input redundancy and extracts local semantic information. To capture the order of the patches, we use sinusoidal positional encoding to encode their positional information.

$$z^{patch}=\mathrm{PatchEmbed}(z)+\mathrm{SinusoidalPositionalEncoding}(z) \tag{2}$$
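A minimal sketch of Eq. (2), assuming $P\times P$ patches over the $(H, C)$ grid of latent tokens and a standard 1D sinusoidal encoding over the row-major patch sequence. The patch size is not stated in the text, so `P = 10` (which divides both $H=60$ and $C=40$) is a hypothetical choice, as are the function names and the random projection `W_patch`.

```python
import numpy as np

def sinusoidal_pe(n_pos, d_model):
    """Standard 1D sinusoidal encoding, shape (n_pos, d_model); d_model even."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def patch_embed(z, P, W_patch):
    """Cut (H, C, d) latents into non-overlapping P x P patches and project them.

    Returns (n_patches, d_model) tokens in row-major patch order;
    W_patch has shape (P * P * d, d_model).
    """
    H, C, d = z.shape
    assert H % P == 0 and C % P == 0, "patch size must divide H and C"
    patches = z.reshape(H // P, P, C // P, P, d).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape((H // P) * (C // P), P * P * d)
    return patches @ W_patch

rng = np.random.default_rng(0)
H, C, d, P, d_model = 60, 40, 8, 10, 16
z = rng.normal(size=(H, C, d))
W_patch = rng.normal(scale=0.02, size=(P * P * d, d_model))
tokens = patch_embed(z, P, W_patch)
z_patch = tokens + sinusoidal_pe(tokens.shape[0], d_model)   # Eq. (2)
```

With these sizes the 60 x 40 grid becomes 6 x 4 = 24 tokens instead of 2,400 per-cell tokens, which is where patching reduces redundancy and attention cost.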

Transformer Encoder. After patching and positional encoding, we use a standard transformer encoder to map the input tokens to latent representations. Each encoder layer consists of a multi-head self-attention block followed by a feed-forward network, each wrapped with a residual connection and layer normalization:

$$z_{1}^{enc}=\mathrm{SelfAttention}(z^{patch}) \tag{3}$$
$$z_{2}^{enc}=\mathrm{LayerNorm}(z_{1}^{enc}+z^{patch}) \tag{4}$$
$$z_{3}^{enc}=\mathrm{FeedForward}(z_{2}^{enc}) \tag{5}$$
$$z^{enc}=\mathrm{LayerNorm}(z_{3}^{enc}+z_{2}^{enc}) \tag{6}$$
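Eqs. (3) to (6) describe a post-norm encoder layer. The NumPy sketch below follows them literally; the ReLU feed-forward network, its width `d_ff`, the random weights, and the omission of biases and LayerNorm gains are simplifying assumptions rather than details given in the text.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm over the feature axis (learnable gain and bias omitted)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head scaled dot-product self-attention over n tokens of width d."""
    n, d = x.shape
    dh = d // n_heads
    def split(t):                                          # (n, d) -> (heads, n, dh)
        return t.reshape(n, n_heads, dh).transpose(1, 0, 2)
    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)       # concatenate heads
    return out @ Wo

def encoder_layer(z_patch, p, n_heads):
    z1 = self_attention(z_patch, p["Wq"], p["Wk"], p["Wv"], p["Wo"], n_heads)  # Eq. (3)
    z2 = layer_norm(z1 + z_patch)                                              # Eq. (4)
    z3 = np.maximum(z2 @ p["W1"], 0.0) @ p["W2"]                               # Eq. (5)
    return layer_norm(z3 + z2)                                                 # Eq. (6)

rng = np.random.default_rng(0)
n, d, d_ff, heads = 24, 16, 64, 4
shapes = {"Wq": (d, d), "Wk": (d, d), "Wv": (d, d), "Wo": (d, d),
          "W1": (d, d_ff), "W2": (d_ff, d)}
params = {name: rng.normal(scale=d ** -0.5, size=s) for name, s in shapes.items()}
z_patch = rng.normal(size=(n, d))
z_enc = encoder_layer(z_patch, params, heads)
```

A full encoder would apply this layer repeatedly with separate parameters (4 or 6 layers, per Table 2).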

Reconstruction. Given the latent representations from the transformer encoder, we project each patch's latent vector to $P\times P\times N$ values, where $P$ is the patch size and $N$ is the number of predicted series. In this work, we focus on the univariate scenario, so $N=1$. We then reshape the projected vectors to $(B,H,C,N)$, where $B$ is the batch size, as the output of the model.
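The reconstruction head is the inverse of patching. A minimal sketch, assuming patches are stored in row-major order and a hypothetical linear head `W_head`; with an identity head, un-patching exactly recovers the original grid, which serves as a sanity check.

```python
import numpy as np

def reconstruct(z_enc, W_head, H, C, P, N=1):
    """Project each token to P*P*N values and un-patch to (B, H, C, N).

    z_enc: (B, n_patches, d_model), patches in row-major order.
    W_head: (d_model, P * P * N), a hypothetical linear output head.
    """
    B = z_enc.shape[0]
    y = z_enc @ W_head                          # (B, n_patches, P*P*N)
    y = y.reshape(B, H // P, C // P, P, P, N)   # patch grid x in-patch offsets
    y = y.transpose(0, 1, 3, 2, 4, 5)           # (B, H/P, P, C/P, P, N)
    return y.reshape(B, H, C, N)

# Sanity check: patchify a grid, then reconstruct it with an identity head.
B, H, C, P = 2, 60, 40, 10
X = np.random.default_rng(0).normal(size=(B, H, C, 1))
tokens = (X.reshape(B, H // P, P, C // P, P, 1)
           .transpose(0, 1, 3, 2, 4, 5)
           .reshape(B, (H // P) * (C // P), P * P))
Y = reconstruct(tokens, np.eye(P * P), H, C, P)
```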

Instance Normalization. To mitigate distribution shift between training and test data, we apply reversible instance normalization (RevIN) [16] in TripCast models. This module normalizes each input instance by its mean and standard deviation and then reverses the scaling on the output predictions. Although our input data is 2D, the mean and variance are computed in the same manner as in standard instance normalization.
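A minimal sketch of the normalize/denormalize pair for one 2D instance. Two points are assumptions not stated in the text: statistics are computed over the observed (unmasked) entries only, since masked values are unknown at inference, and RevIN's learnable affine parameters are omitted.

```python
import numpy as np

def revin_normalize(X, masked, eps=1e-5):
    """Normalize a 2D instance by the mean/std of its observed entries.

    masked: (H, C) bool, True for hidden tokens (excluded from statistics).
    Returns the normalized grid plus (mean, std) for the reverse step.
    """
    obs = X[~masked]
    mean = obs.mean()
    std = np.sqrt(obs.var() + eps)
    Xn = (X - mean) / std
    Xn[masked] = 0.0        # placeholder; hidden tokens are masked later anyway
    return Xn, mean, std

def revin_denormalize(Y, mean, std):
    """Undo the scaling on model outputs."""
    return Y * std + mean

X = np.arange(12, dtype=float).reshape(3, 4)
masked = np.zeros_like(X, dtype=bool)
masked[2, 3] = True
Xn, mean, std = revin_normalize(X, masked)
X_back = revin_denormalize(Xn, mean, std)
```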

4.2 Pre-training and Downstream Tasks

We split each dataset into pre-train and train-test partitions at a ratio of roughly 90/10. To prevent data leakage, we ensure that each route and its return route, together with their flights, fall entirely within either the pre-train set or the train-test set. For the train-test sets, we use data from 2019-06-01 to 2019-08-31 as the validation set and data from 2019-09-01 to 2019-12-31 as the test set on all datasets. All TripCast models are trained on the pre-train datasets and evaluated on the train-test datasets, since our aim is to demonstrate the potential of TripCast as a zero-shot forecaster in the tourism industry.
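The leakage-avoiding split can be sketched as below. Everything here is an illustrative assumption: route keys are (origin, destination) codes, a route and its return route share one canonical group key, groups are assigned by hashing that key so the assignment is deterministic, and the validation/test windows apply to event dates, with earlier dates labeled "train" (presumably the portion used by baselines trained from scratch).

```python
import hashlib
from datetime import date

def route_group(origin: str, dest: str) -> str:
    """Canonical key shared by a route and its return route (A-B == B-A)."""
    return "-".join(sorted((origin, dest)))

def assign_partition(origin: str, dest: str, test_fraction: float = 0.1) -> str:
    """Send a whole route group, and all its flights, to one partition."""
    digest = hashlib.md5(route_group(origin, dest).encode()).hexdigest()
    bucket = int(digest, 16) % 1000 / 1000
    return "train-test" if bucket < test_fraction else "pre-train"

def split_period(event_date: date) -> str:
    """Time-based split inside the train-test partition."""
    if date(2019, 6, 1) <= event_date <= date(2019, 8, 31):
        return "validation"
    if date(2019, 9, 1) <= event_date <= date(2019, 12, 31):
        return "test"
    return "train"
```

Grouping before splitting, rather than splitting series independently, is what prevents an outbound route in pre-training from leaking information about its return route in the test set.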

Pre-training. In this work, we focus on supervised pre-training, since our main goal is to demonstrate the effectiveness and transferability of this novel modelling paradigm. In all pre-training tasks, we set $H$ to 60, $C$ to 40, and the maximum $H_{pred}$ of progressive masking to 15. We use the mean absolute error (MAE) as the loss function during the pre-training stage.
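With these settings, the pre-training loss can be sketched as an MAE over hidden tokens. Restricting the average to masked positions is an assumption (the text only states that MAE is the loss), and the full-size triangle below is one possible draw of the progressive mask.

```python
import numpy as np

H, C, MAX_H_PRED = 60, 40, 15   # pre-training settings from the text

def masked_mae(pred, target, masked):
    """Mean absolute error over masked (hidden) tokens only."""
    return float(np.abs(pred[masked] - target[masked]).mean())

# Largest progressive mask: the k-th of the last 15 rows hides its last k steps.
masked = np.zeros((H, C), dtype=bool)
for k in range(1, MAX_H_PRED + 1):
    masked[H - MAX_H_PRED + k - 1, C - k:] = True

target = np.zeros((H, C))
pred = np.full((H, C), 100.0)   # large errors on visible tokens...
pred[masked] = 1.0              # ...are ignored; only hidden tokens count
loss = masked_mae(pred, target, masked)
```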

Downstream Tasks. After pre-training, we evaluate TripCast on two downstream tasks: in-domain forecasting and out-domain forecasting. In in-domain forecasting, we pre-train the model within each domain and assess its performance within the same domain. In out-domain forecasting, we pre-train a unified model on multiple domains and evaluate it on a domain held out from pre-training.

5 Experiments

In this work, we collect five large-scale, real-world datasets from an online travel agency (OTA) to evaluate TripCast. These collections encompass flight sales data, flight booking price data, and user search data. First, we pre-train TripCast models of small and base sizes on each dataset and evaluate them in in-domain forecasting scenarios. Next, we compare our method with deep learning and pre-trained time series models. Then, to investigate the transferability and scalability of TripCast, we pre-train TripCast models up to the large size on the four datasets other than UserSearch and evaluate them on out-domain forecasting tasks. Finally, we conduct ablation studies to examine the impact of the masking strategy and positional encoding on the performance of TripCast.

5.1 Datasets

All datasets are preprocessed into univariate time series with date features. Below are the details of the datasets:

Dataset	Period	Total n_series (pre-train)	Total n_series (train-test)	Total n_obs (pre-train)	Total n_obs (train-test)	Frequency
FlightSales	2018-01 to 2019-12	3,947	489	110,997,640	13,712,640	Daily
RouteSales	2018-01 to 2019-12	2,572	286	68,789,440	7,626,080	Daily
FlightPrice	2017-08 to 2019-12	5,395	595	173,911,800	19,237,680	Daily
RoutePrice	2017-01 to 2019-12	3,996	445	159,749,040	17,685,080	Daily
UserSearch	2017-04 to 2019-12	3,298	367	124,884,320	13,690,880	Daily

Table 1: Key details of datasets.

* FlightSales: This dataset contains the daily seat sales rate, i.e., the ratio of the number of seats sold to the flight capacity. All time series in this dataset are aggregated by flight.

* RouteSales: This dataset is similar to FlightSales, but the time series are aggregated by route.

* FlightPrice: This dataset contains the cumulative average order price of flights. All time series in this dataset are aggregated by flight.

* RoutePrice: This dataset is similar to FlightPrice, but the time series are aggregated by route.

* UserSearch: This dataset contains the cumulative user search count of routes.

5.2 Training

We pre-train models in three sizes, from small to large, with detailed hyperparameters shown in Table 2. TripCast_small has fewer than 1 million parameters, while TripCast_large has nearly 20 million. All models are trained with a batch size of 256 for 50,000 iterations. We use Adam [17] with an initial learning rate of 3e-4 and cosine learning rate decay. Training is conducted on NVIDIA V100 GPUs with mixed-precision training.

Model	Layers	Dimension	Heads	Params	Iterations
TripCast_small	4	128	4	928K	50,000
TripCast_base	4	256	8	3.4M	50,000
TripCast_large	6	512	8	19.4M	50,000

Table 2: Details of the hyperparameters of TripCast models in different sizes.
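For reference, the learning-rate schedule described above follows the usual cosine-decay formula. This sketch assumes decay to zero over all 50,000 iterations with no warmup, neither of which the text specifies; PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=50000` and `eta_min=0` is documented with the same closed form.

```python
import math

def cosine_lr(step: int, total_steps: int = 50_000,
              base_lr: float = 3e-4, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 25_000, 50_000):
    print(step, cosine_lr(step))
```

Halfway through training, the rate has fallen to half its initial value (1.5e-4).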

5.3 Evaluation Metrics

We evaluate forecasts with the mean absolute error (MAE) and the weighted absolute percentage error (WAPE):

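$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right|,\qquad \mathrm{WAPE}=\frac{\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right|}{\sum_{i=1}^{n}\left|y_{i}\right|} \tag{7}$$

where $y_i$ and $\hat{y}_i$ are the ground-truth and predicted values, and $n$ is the number of evaluated points.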
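Both metrics in Eq. (7) are straightforward to compute; the toy values in this sketch are made up for illustration. Because WAPE equals MAE divided by the mean absolute target, it is scale-free, which is why WAPE can be compared across datasets whose MAE values differ by orders of magnitude (sales rates versus prices in Table 4).

```python
import numpy as np

def mae(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def wape(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sum(np.abs(y - y_hat)) / np.sum(np.abs(y)))

y, y_hat = [1, 2, 3, 4], [1.5, 2, 2, 5]
print(mae(y, y_hat), wape(y, y_hat))   # 0.625 0.25
```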

5.4 Baselines

For deep learning methods, we compare TripCast with the linear family (Linear, NLinear, DLinear) [37], iTransformer [20], and PatchTST [22]. For pre-trained models, we compare TripCast with GPT4TS [40]. The hyperparameter search ranges of the baselines are listed in Table 3.

Baseline	Hyperparameter	Values
LinearFamily	model type	{Linear, NLinear, DLinear}
PatchTST	d_model	{128, 256}
	num_layers	{2, 3, 4}
iTransformer	d_model	{128, 256}
	num_layers	{2, 3, 4}
	use_norm	{true, false}
GPT4TS	block_size	{1024}
	n_head	{12}
	d_model	{768}
	num_layers	{6}

Table 3: Hyperparameter search range for baselines.

Because all baselines are single-axis time series models, we simplify their forecasting task to predicting the value of the last leading step. The look-back period and prediction horizon of the baselines are set to 45 and 15, consistent with the TripCast setting ($H=60=45+15$, with a maximum $H_{pred}$ of 15). This ensures that TripCast and the baselines are evaluated at the same time points. Deep learning baselines are trained with a batch size of 256 for 10,000 iterations, with early stopping based on the validation loss, which is computed and reported every 100 iterations; the checkpoint with the lowest validation loss is selected. For pre-trained models, we use the same training hyperparameters as for TripCast. In summary, deep learning models are trained from scratch on the train-test datasets, while pre-trained models are trained on the pre-train datasets and evaluated zero-shot on the train-test datasets.
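The reduction used for the single-axis baselines can be sketched as follows, assuming that the last column of the $H\times C$ matrix is the last leading step; `baseline_windows` is a hypothetical helper. TripCast's predictions would be scored on the same 15 cells of that column, which keeps the comparison aligned.

```python
import numpy as np

LOOKBACK, HORIZON = 45, 15   # 45 + 15 = H = 60 event steps

def baseline_windows(X):
    """Reduce an H x C trip matrix to the 1D task given to the baselines.

    Keeps only the last leading step (final column) of each event step and
    splits it into a 45-step look-back window and a 15-step target.
    """
    series = X[:, -1]
    assert series.shape[0] == LOOKBACK + HORIZON
    return series[:LOOKBACK], series[LOOKBACK:]

X = np.arange(60 * 40, dtype=float).reshape(60, 40)
history, target = baseline_windows(X)
# TripCast would be scored on the same cells: X[LOOKBACK:, -1]
```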

6 Results

6.1 In-domain Forecasting

The performance of TripCast models and baselines in in-domain scenarios is reported in Table 4. Both TripCast_small and TripCast_base outperform all baselines on all datasets. Among deep learning methods, PatchTST performs best on three of the five datasets (FlightPrice, RoutePrice, and UserSearch), indicating that patching and transformer-based models effectively capture trip time series patterns. GPT4TS, an LLM-based model, outperforms the deep learning methods on three of the five datasets. We speculate that the strong transferability of GPT-2 and the extensive pre-training data contribute to its superior performance. This also highlights the potential of pre-trained models for trip time series forecasting.

	FlightSales		RouteSales		FlightPrice		RoutePrice		UserSearch
	MAE	WAPE	MAE	WAPE	MAE	WAPE	MAE	WAPE	MAE	WAPE
Linear	0.064	0.193	0.048	0.153	116.8	0.151	167.4	0.192	94.3	0.127
NLinear	0.063	0.193	0.048	0.153	115.7	0.149	169.1	0.195	95.3	0.129
DLinear	0.064	0.193	0.048	0.152	113.1	0.146	166.7	0.191	92.2	0.124
PatchTST	0.064	0.193	0.048	0.155	109.7	0.142	162.3	0.186	88.8	0.119
iTransformer	0.064	0.193	0.048	0.152	110.8	0.143	163.1	0.187	90.3	0.121
GPT4TS	0.063	0.193	0.047	0.152	110.0	0.142	161.2	0.185	79.9	0.108
TripCast_small	0.052	0.159	0.038	0.122	94.7	0.122	106.6	0.127	47.2	0.064
TripCast_base	0.050	0.153	0.038	0.121	91.4	0.118	103.7	0.124	44.5	0.061

Table 4: Test set results for deep learning and pre-trained baseline methods. Optimal results are highlighted in bold.

6.2 Towards Foundation Model (Out-domain Forecasting)

The ultimate goal of our research is to develop a foundation model for trip time series forecasting. To this end, we investigate the effectiveness of our model in out-domain forecasting. We pre-train models of different sizes (Figure 5) on all datasets except UserSearch and evaluate them on the UserSearch dataset. TripCast models perform well on UserSearch: the accuracy of TripCast_small is close to that of PatchTST, while TripCast_base and TripCast_large outperform GPT4TS, even though GPT4TS is pre-trained on the target domain. Furthermore, the performance of TripCast models scales well with the number of training iterations. This suggests that our method is a promising candidate for a foundation model for trip time series forecasting.

6.3 Ablation Study

6.3.1 Masking Strategy.

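We conduct an ablation study on the masking strategy, focusing on progressive masking, since robustness to missing data, which random masking targets, is not our primary concern in this work. Table 5 shows that removing progressive masking increases MAE on all three datasets (e.g., from 44.5 to 45.3 on UserSearch), suggesting that progressive masking helps the model learn causality and improves performance.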

6.3.2 Positional Encoding.

Self-attention is permutation-equivariant, i.e., it has no inherent notion of token order, so transformer models rely on positional encoding to capture the order of the input sequence. We compare TripCast_base with learned positional encoding, fixed (sinusoidal) positional encoding, and no positional encoding, where the last variant relies only on date/time features. Table 6 shows that both positional encodings clearly outperform the variant without one. Fixed positional encoding performs best on FlightSales and UserSearch, while learned positional encoding is slightly better on FlightPrice.
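The permutation property can be checked numerically. In this toy sketch (identity projections, and a random matrix standing in for any position-dependent encoding; both are assumptions for illustration), shuffling tokens merely shuffles the attention output, whereas adding a per-position term makes the output depend on token order.

```python
import numpy as np

def attention(x):
    """Single-head self-attention with identity Q/K/V projections (toy)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
pe = rng.normal(size=(5, 8))        # stand-in positional encoding, one row per slot
perm = np.array([4, 2, 0, 1, 3])

no_pe_equivariant = np.allclose(attention(x[perm]), attention(x)[perm])
with_pe_equivariant = np.allclose(attention(x[perm] + pe), attention(x + pe)[perm])
print(no_pe_equivariant, with_pe_equivariant)
```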

	FlightSales		FlightPrice		UserSearch
	MAE	WAPE	MAE	WAPE	MAE	WAPE
TripCast_base	0.050	0.153	91.4	0.118	44.5	0.061
w/o Progressive Mask	0.051	0.153	92.1	0.119	45.3	0.062

Table 5: Ablation study of the masking strategy.

Date/Time	SPE	LPE	FlightSales		FlightPrice		UserSearch
			MAE	WAPE	MAE	WAPE	MAE	WAPE
✓			0.062	0.186	99.2	0.127	79.5	0.107
✓	✓		0.050	0.153	91.4	0.118	44.5	0.061
✓		✓	0.052	0.157	90.1	0.116	49.7	0.067

Table 6: Ablation study of the positional encoding. Date/Time: date/time features; SPE: sinusoidal (fixed) positional encoding; LPE: learned positional encoding.

7 Conclusion

In this study, we formulate the trip time series forecasting problem and propose a novel modelling paradigm to tackle its challenges. We pre-train transformer-based models on five large-scale real-world datasets and evaluate them in in-domain forecasting, where our approach outperforms other deep learning and pre-trained models. Additionally, we show that our method scales well with model size and training iterations in out-domain forecasting. Our work opens new possibilities for time series forecasting in tourism, and we hope it will inspire further research in this area.

References

[1] Bączek, J., Zhylko, D., Titericz, G., Darabi, S., Puget, J.F., Putterman, I., Majchrowski, D., Gupta, A., Kranen, K., Morkisz, P.: Tspp: A unified benchmarking tool for time-series forecasting. arXiv preprint arXiv:2312.17100 (2023)
[2] Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271 (2018)
[3] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901 (2020)
[4] Cao, D., Jia, F., Arik, S.O., Pfister, T., Zheng, Y., Ye, W., Liu, Y.: Tempo: Prompt-based generative pre-trained transformer for time series forecasting. arXiv preprint arXiv:2310.04948 (2023)
[5] Carmona-Benítez, R.B., Nieto, M.R.: Sarima damp trend grey forecasting model for airline industry. Journal of air transport management 82, 101736 (2020)
[6] Das, A., Kong, W., Sen, R., Zhou, Y.: A decoder-only foundation model for time-series forecasting. arXiv preprint arXiv:2310.10688 (2023)
[7] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
[8] Do, Q.H., Lo, S., Chen, J., Le, C., Anh, L.H.: Forecasting air passenger demand: a comparison of lstm and sarima. Journal of Computer Science 16(7), 1063–1084 (2020)
[9] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
[10] Gruver, N., Finzi, M., Qiu, S., Wilson, A.G.: Large language models are zero-shot time series forecasters. Advances in Neural Information Processing Systems 36 (2024)
[11] Hayes, D.K., Hayes, J.D., Hayes, P.A.: Revenue management for the hospitality industry. John Wiley & Sons (2021)
[12] Huang, L., Zheng, W.: Hotel demand forecasting: a comprehensive literature review. Tourism Review 78(1), 218–244 (2023)
[13] Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J.Y., Shi, X., Chen, P.Y., Liang, Y., Li, Y.F., Pan, S., et al.: Time-llm: Time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728 (2023)
[14] Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., Amodei, D.: Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020)
[15] Kim, S., et al.: Forecasting short-term air passenger demand using big data from search engine queries. Automation in Construction 70, 98–108 (2016)
[16] Kim, T., Kim, J., Tae, Y., Park, C., Choi, J.H., Choo, J.: Reversible instance normalization for accurate time-series forecasting against distribution shift. In: International Conference on Learning Representations (2021)
[17] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
[18] Lim, B., Arık, S.Ö., Loeff, N., Pfister, T.: Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting 37(4), 1748–1764 (2021)
[19] Liu, H., Li, C., Wu, Q., Lee, Y.J.: Visual instruction tuning. Advances in neural information processing systems 36 (2024)
[20] Liu, Y., Hu, T., Zhang, H., Wu, H., Wang, S., Ma, L., Long, M.: itransformer: Inverted transformers are effective for time series forecasting. arXiv preprint arXiv:2310.06625 (2023)
[21] Ma, Q., Liu, Z., Zheng, Z., Huang, Z., Zhu, S., Yu, Z., Kwok, J.T.: A survey on time-series pre-trained models. arXiv preprint arXiv:2305.10716 (2023)
[22] Nie, Y., Nguyen, N.H., Sinthong, P., Kalagnanam, J.: A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730 (2022)
[23] Oreshkin, B.N., Carpov, D., Chapados, N., Bengio, Y.: N-beats: Neural basis expansion analysis for interpretable time series forecasting. arXiv preprint arXiv:1905.10437 (2019)
[24] Oreshkin, B.N., Carpov, D., Chapados, N., Bengio, Y.: Meta-learning framework with applications to zero-shot time-series forecasting. In: Proceedings of the AAAI conference on artificial intelligence. vol. 35, pp. 9242–9250 (2021)
[25] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al.: Training language models to follow instructions with human feedback. Advances in neural information processing systems 35, 27730–27744 (2022)
[26] Pereira, L.N.: An introduction to helpful forecasting methods for hotel revenue management. International Journal of Hospitality Management 58, 13–23 (2016)
[27] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PMLR (2021)
[28] Rasul, K., Ashok, A., Williams, A.R., Khorasani, A., Adamopoulos, G., Bhagwatkar, R., Biloš, M., Ghonia, H., Hassen, N.V., Schneider, A., et al.: Lag-llama: Towards foundation models for time series forecasting. arXiv preprint arXiv:2310.08278 (2023)
[29] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al.: Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)
[30] Wang, L., Mykityshyn, A., Johnson, C., Cheng, J.: Flight demand forecasting with transformers. In: AIAA AVIATION 2022 Forum. p. 3708 (2022)
[31] Weatherford, L.R., Kimes, S.E.: A comparison of forecasting methods for hotel revenue management. International journal of forecasting 19(3), 401–415 (2003)
[32] Woo, G., Liu, C., Kumar, A., Sahoo, D.: Pushing the limits of pre-training for time series forecasting in the cloudops domain. arXiv preprint arXiv:2310.05063 (2023)
[33] Woo, G., Liu, C., Kumar, A., Xiong, C., Savarese, S., Sahoo, D.: Unified training of universal time series forecasting transformers. arXiv preprint arXiv:2402.02592 (2024)
[34] Woo, G., Liu, C., Sahoo, D., Kumar, A., Hoi, S.: Cost: Contrastive learning of disentangled seasonal-trend representations for time series forecasting. arXiv preprint arXiv:2202.01575 (2022)
[35] Yue, Z., Wang, Y., Duan, J., Yang, T., Huang, C., Tong, Y., Xu, B.: Ts2vec: Towards universal representation of time series. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 8980–8987 (2022)
[36] Yüksel, S.: An integrated forecasting approach to hotel demand. Mathematical and Computer Modelling 46(7-8), 1063–1070 (2007)
[37] Zeng, A., Chen, M., Zhang, L., Xu, Q.: Are transformers effective for time series forecasting? In: Proceedings of the AAAI conference on artificial intelligence. vol. 37, pp. 11121–11128 (2023)
[38] Zhang, X., Chowdhury, R.R., Gupta, R.K., Shang, J.: Large language models for time series: A survey. arXiv preprint arXiv:2402.01801 (2024)
[39] Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.: Informer: Beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI conference on artificial intelligence. vol. 35, pp. 11106–11115 (2021)
[40] Zhou, T., Niu, P., Sun, L., Jin, R., et al.: One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems 36, 43322–43355 (2023)