Population-Wide Depression Incidence Forecasting Comparing Autoregressive Integrated Moving Average and Vector Autoregressive Integrated Moving Average to Temporal Fusion Transformers: Longitudinal Observational Study

J Med Internet Res. 2025 May 12;27:e67156. doi: 10.2196/67156.

ABSTRACT

BACKGROUND: Accurate prediction of population-wide depression incidence is vital for effective public mental health management. However, this incidence is often influenced by socioeconomic factors, such as abrupt events or changes, including pandemics, economic crises, and social unrest, creating complex structural break scenarios in the time-series data. These structural breaks can affect the performance of forecasting methods in various ways. Therefore, understanding and comparing different models across these scenarios is essential.

OBJECTIVE: This study aimed to develop depression incidence forecasting models and compare the performance of autoregressive integrated moving average (ARIMA), vector ARIMA (VARIMA), and temporal fusion transformer (TFT) models under different structural break scenarios.

METHODS: We developed population-wide depression incidence forecasting models and compared the performance of ARIMA- and VARIMA-based methods with that of TFT-based methods. Using monthly depression incidence from 2002 to 2022 in Hong Kong, we applied sliding windows to segment the whole time series into 72 ten-year subsamples. The forecasting models were trained, validated, and tested on each subsample. Within each 10-year subsample, the first 7 years were used for training, the eighth year for hold-out validation, and the ninth and tenth years for testing. Forecast accuracy on the testing set of each 10-year subsample was measured by symmetric mean absolute percentage error (SMAPE).
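The windowing and scoring scheme described in the methods can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `smape` and `sliding_splits`, the `step` parameter (the abstract does not state the stride between windows), and the choice of SMAPE variant (mean of the absolute values in the denominator, reported in percent) are all assumptions made for illustration.

```python
import numpy as np


def smape(actual, forecast):
    """SMAPE in percent, using one common definition (an assumption here):
    mean of |F - A| / ((|A| + |F|) / 2). Terms with a zero denominator count as 0."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    safe_denom = np.where(denom == 0, 1.0, denom)  # avoid division by zero
    ratio = np.where(denom == 0, 0.0, np.abs(forecast - actual) / safe_denom)
    return 100.0 * ratio.mean()


def sliding_splits(n_months, step=1, train=84, val=12, test=24):
    """Yield (train, validation, test) index ranges for each 10-year window.

    Defaults follow the abstract: 7 years training, 1 year validation,
    2 years testing, on monthly data. `step` is hypothetical, because the
    stride between consecutive windows is not given in the abstract.
    """
    window = train + val + test
    for start in range(0, n_months - window + 1, step):
        train_end = start + train
        val_end = train_end + val
        yield (
            range(start, train_end),
            range(train_end, val_end),
            range(val_end, start + window),
        )
```

In use, each model would be fitted on the training indices, tuned on the validation indices, and scored with `smape` on the test indices of every window, after which the per-window scores can be averaged within each structural break scenario.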

RESULTS: In subsamples without a significant slope or trend change (ie, without a structural break), multivariate TFT significantly outperformed univariate TFT, VARIMA, and ARIMA, with an average SMAPE of 11.6% compared to 13.2% (P=.01) for univariate TFT, 16.4% (P=.002) for VARIMA, and 14.8% (P=.003) for ARIMA. Adjusting for the unemployment rate improved TFT performance more than it improved VARIMA performance. During fluctuating outbreaks, TFT was more robust to sharp interruptions, whereas VARIMA and ARIMA performed better when incidence surged and remained high.

CONCLUSIONS: This study provides a comparative evaluation of TFT, ARIMA, and VARIMA models for forecasting depression incidence under various structural break scenarios, offering insights into predicting disease burden during both stable and unstable periods. The findings support a decision-making framework for model selection based on the nature of disruptions and data characteristics. For public health policymaking, the results suggest that TFT may be a more suitable tool for disease burden forecasting during periods of stable burden or when sudden temporary interruptions, such as pandemics or socioeconomic variation, affect disease occurrence.

PMID:40354111 | DOI:10.2196/67156
