
Spatial and temporal analysis and forecasting of TB reported incidence in western China

Abstract

Objective

Tuberculosis (TB) remains an important public health concern in western China. This study aimed to explore and analyze the spatial and temporal distribution characteristics of TB reported incidence in 12 provinces and municipalities in western China and to construct the optimal models for prediction, which would provide a reference for the prevention and control of TB and the optimization of related health policies.

Methods

We collected monthly data on TB reported incidence in 12 provinces and municipalities in western China and used ArcGIS software to analyze the spatial and temporal distribution characteristics of TB reported incidence. We applied the seasonal index method for the seasonal analysis of TB reported incidence and then established the SARIMA and Holt-Winters models for TB reported incidence in 12 provinces and municipalities in western China.

Results

The reported incidence of TB in the 12 provinces and municipalities in western China showed apparent spatial clustering, with Moran's I greater than 0 (p < 0.05) in 8 years of the reporting period. Tibet was the hotspot for TB incidence among the 12 provinces and municipalities. From 2004 to 2018, the reported incidence of TB showed clear seasonal characteristics, with seasonal indices greater than 100% in both the first and second quarters. The optimal models constructed for TB reported incidence in the 12 provinces and municipalities all passed the white noise test (p > 0.05).

Conclusions

As a hotspot of reported TB incidence, Tibet should continue to strengthen government leadership and policy support and explore the causes of TB and strategies for intervention. The optimal prediction models we developed for reported TB incidence differed among the 12 provinces and municipalities in western China.


Introduction

Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis and is a serious threat to human health [1, 2]. Before the COVID-19 pandemic, TB was one of the leading causes of death from a single infectious disease worldwide, and it has already surpassed HIV/AIDS in terms of mortality [3]. According to the WHO's Global Tuberculosis Report 2022, the estimated number of TB deaths increased between 2019 and 2021, and the global incidence of TB increased by 3.6% between 2020 and 2021 alone [3]. The Global Burden of Disease Study 2015 shows that the global economic burden of disease due to TB remains high, for example in Asia, Eastern Europe, and sub-Saharan Africa [4]. Therefore, strengthening the monitoring and early warning of TB incidence remains a global public health priority.

In China, TB is an important public health issue, and many efforts have been made to prevent and control this infectious disease over the past decades [5]. In 2004, China built a national network of direct reporting systems for infectious diseases, including a tuberculosis management information system, which provides an important reference for the effective development of TB prevention and control strategies. However, China remains one of the countries with a high global TB burden [6]. In 2021, 30 high-burden TB countries accounted for 87% of the estimated total number of cases worldwide, and eight of these countries, including China, accounted for more than two-thirds of the global total [3]. TB is classified as a Class B infectious disease in China, and its morbidity and mortality are at the top of the list of notifiable infectious diseases [7]. The incidence of TB is higher in the western region because its economic conditions and level of medical resource allocation lag behind those of the eastern and central regions of China [8, 9]. Therefore, there is a great need to continue strengthening the monitoring and early warning of this infectious disease in western China.

Surveillance and early warning of infectious diseases play an important role in the development of timely and effective preventive measures by the relevant departments of health administration [10, 11]. Surveillance and early warning of infectious diseases in China consist of three main systems: case, event, and symptom surveillance [10]. However, infectious disease surveillance and early warning systems must rely on data science approaches and big data [12]. For example, spatiotemporal analysis and prediction of TB can provide an important basis for infectious disease surveillance and early warnings.

Spatial epidemiology describes and analyzes the geographic distribution of diseases and is currently widely used in epidemiology [13]. The spatiotemporal analysis of infectious diseases is generally conducted using GIS software, which provides powerful spatial analysis capabilities and methods for the spatial analysis of infectious diseases [14]. TB incidence is typical spatiotemporal data, and many studies [15,16,17] have confirmed that TB incidence is characterized by spatial aggregation. Therefore, we used GIS software to analyze the spatiotemporal distribution of reported TB incidence and to identify its aggregation range and changes, which is essential for the development of effective prevention and control strategies.

Infectious disease prediction combines epidemiology, mathematics, statistics, computer science, and other related disciplines. It involves building a model to simulate the spread of an infectious disease and estimate its future incidence, so that preventive measures can be formulated in advance and medical resources used rationally. Prediction techniques are therefore an important part of infectious disease surveillance [18]. Currently, several types of techniques are available for predicting TB incidence. The first type comprises traditional time-series models, such as the autoregressive integrated moving average (ARIMA) [19], autoregressive integrated moving average with exogenous variables (ARIMAX) [19], seasonal autoregressive integrated moving average (SARIMA) [20, 21], and Error, Trend, Seasonal (ETS) models [20]. The second type comprises physical models, such as the susceptible exposed infectious recovered (SEIR) model [22]. The third type comprises machine learning models, such as the long short-term memory network (LSTM) [23], support vector regression (SVR) [24], random forest (RF) [24], and back propagation neural network (BPNN) [24]. The fourth type comprises hybrid models, which usually combine a traditional time-series model with a machine learning model, such as the ARIMA-LSTM [23], SARIMA-ANFIS [25], and SARIMA-NNAR [26] models. However, different models suit different scenarios and place different requirements on sample size and data distribution [7]. Therefore, selecting a suitable model for prediction based on research design and data characteristics is an important part of this study.

Previous studies have been limited to spatial and temporal analyses of the distribution of TB incidence in southwest China or in a single province or city in eastern China [27,28,29], and to forecasting studies of TB incidence in a single province or city in western China [30, 31]. Therefore, this study was designed to analyze the spatial and temporal characteristics and to forecast the reported incidence of TB in 12 provinces and municipalities in western China, with the aim of providing recommendations for the formulation of public policies by the government and related departments.

Methods

Study area

According to the criteria of the National Bureau of Statistics of China, the western region of China includes 12 provinces and municipalities: Inner Mongolia Autonomous Region, Guangxi Zhuang Autonomous Region, Chongqing, Sichuan, Guizhou, Yunnan, Tibet Autonomous Region, Shaanxi, Gansu, Qinghai, Ningxia Hui Autonomous Region, and Xinjiang Uyghur Autonomous Region (Fig. 1). The total area is about 6.86 million square kilometres, accounting for about 72% of the country's total area.

Fig. 1 The geolocation of the western region of China

Data Source

Monthly data on TB reported incidence were obtained from the China Public Health Science Data Center website (Additional file 1). In China, a nationwide directly reported infectious disease network system has been under construction since 2004, covering medical and health institutions at the secondary level and above, which has played an important role in the early monitoring and warning of outbreaks and public health emergencies. The Law of the People's Republic of China on the Prevention and Treatment of Infectious Diseases classifies tuberculosis as a category B infectious disease. In China, when a patient with a category B infectious disease is diagnosed at any level or type of medical and health institution, the physician must complete a direct report on the network within 24 h. In this study, we collected time-series data on the monthly reported incidence of TB from January 2004 to December 2018 in 12 provinces and municipalities in western China.

Spatial auto-correlation analysis

Spatial autocorrelation is a statistical approach that analyzes the correlation of the same variable at different spatial locations [32, 33]. It is classified into global spatial autocorrelation and local spatial autocorrelation. Global spatial autocorrelation is employed to determine whether the study area is spatially clustered, and it is usually measured by Moran's I. Equation (1) is used to calculate Moran's I, which takes values in the range (-1, 1) [33].

If the value of Moran's I is greater than 0, there is a positive spatial correlation in the study area; the larger the value, the more strongly the objects of study are clustered [34]. If the value of Moran's I is less than 0, there is a negative spatial correlation in the study area; the smaller the value, the more strongly the objects of study are dispersed. A Moran's I value close to 0 indicates that there is no spatial autocorrelation.

$$\mathrm I=\frac{\mathrm n\;\times\;{\sum\limits_{\mathrm i=1}^{\mathrm n}}{\sum\limits_{\mathrm j=1}^{\mathrm n}}\;{\mathrm w}_{\mathrm{ij}}\;\left({\mathrm x}_{\mathrm i}-\overline{\mathrm x}\right)\;\;\left({\mathrm x}_{\mathrm j}-\overline{\mathrm x}\right)}{\left(\sum\limits_{\mathrm i=1}^{\mathrm n}\sum\limits_{\mathrm j=1}^{\mathrm n}\;{\mathrm w}_{\mathrm{ij}}\right)\sum\limits_{\mathrm i=1}^{\mathrm n}\left({\mathrm x}_{\mathrm i}-\overline{\mathrm x}\right)^2},\;\mathrm i\neq\mathrm j$$
(1)
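As an illustration of Eq. (1), the following is a minimal pure-Python sketch, not the ArcGIS procedure used in this study. The function name `global_morans_i`, the four-region chain and its binary adjacency weights, and the incidence values are all hypothetical and chosen only to show how clustered and alternating patterns change the sign of Moran's I.

```python
def global_morans_i(values, weights):
    """Global Moran's I (Eq. 1) for a list of values and a square weight matrix.

    Diagonal weights (i == j) are ignored, matching the i != j condition.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Weighted cross-products of deviations over all pairs with i != j.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n) if i != j)
    w_total = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)
    squared = sum(d * d for d in dev)
    return n * cross / (w_total * squared)


# Hypothetical layout: four regions in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10.0, 10.0, 2.0, 2.0]    # similar values are adjacent
alternating = [10.0, 2.0, 10.0, 2.0]  # high and low values alternate

i_clustered = global_morans_i(clustered, chain)      # positive: clustering
i_alternating = global_morans_i(alternating, chain)  # negative: dispersion
```

In this toy layout the clustered values give a positive statistic and the alternating values give a negative one. Significance testing, which ArcGIS reports alongside the statistic, is not covered by this sketch.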

Local spatial autocorrelation is used to determine the degree of spatial clustering between individual spatial objects in a region and their neighbors. The local spatial autocorrelation Ii was calculated using Eq. (2) [35]. When Ii is greater than 0, it indicates a high-high ("hot spots") or low-low ("cold spots") type, and when Ii is less than 0, it indicates a high-low or low-high ("spatial outliers") type.

$${\text{I}}_{{\text{i}}} = \frac{{\sum\limits_{j = 1}^{n} {w_{ij} (x_{i} - \overline{x})(x_{j} - \overline{x})} }}{{\frac{1}{n}\sum\limits_{i = 1}^{n} {(x_{i} - \overline{x})^{2} } }},i \ne j$$
(2)
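Eq. (2) and the four cluster types can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `local_morans_i`, the chain adjacency matrix, and the values are hypothetical, and the labels are assigned from the signs of the deviation and its spatial lag only, without the significance testing that LISA maps normally apply.

```python
def local_morans_i(values, weights):
    """Local Moran's I_i (Eq. 2) with a sign-based cluster label per region."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # (1/n) * sum of squared deviations
    results = []
    for i in range(n):
        # Spatial lag: weighted sum of the neighbours' deviations (j != i).
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        local_i = dev[i] * lag / m2
        if dev[i] > 0 and lag > 0:
            kind = "high-high"
        elif dev[i] < 0 and lag < 0:
            kind = "low-low"
        elif dev[i] > 0 and lag < 0:
            kind = "high-low"
        elif dev[i] < 0 and lag > 0:
            kind = "low-high"
        else:
            kind = "undefined"  # deviation or lag is exactly zero
        results.append((local_i, kind))
    return results


# Hypothetical layout: four regions in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
gradient = local_morans_i([12.0, 10.0, 3.0, 1.0], chain)  # high side, low side
spike = local_morans_i([1.0, 12.0, 2.0, 1.0], chain)      # one high outlier
```

Positive I_i values correspond to the high-high and low-low labels, and negative values to the two outlier labels, as described above.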

The Getis-Ord Gi* index is designed to identify the level of local spatial autocorrelation and determine the location of cold or hot spots [36]. When the value of Gi* is close to 0, it indicates that the region where the observation is located is randomly distributed; Gi* < 0 indicates that the observation is in a cold spot region; and Gi* > 0 indicates that the observation is in a hot spot region. The formula for the Getis-Ord Gi* index is as follows [37,38,39]:

$${\text{G}}_{i}^{*} = \frac{\sum\limits_{j = 1}^{n} w_{ij} x_{j} - \overline{x}\sum\limits_{j = 1}^{n} w_{ij}}{S\sqrt{\frac{n\sum\limits_{j = 1}^{n} w_{ij}^{2} - \left(\sum\limits_{j = 1}^{n} w_{ij}\right)^{2}}{n - 1}}}$$
(3)
$${\text{S}} = \sqrt{\frac{\sum\limits_{j = 1}^{n} x_{j}^{2}}{n} - (\overline{x})^{2}}$$
(4)
$$\overline{x} = \frac{\sum\limits_{j = 1}^{n} x_{j}}{n}$$
(5)

where xj is the attribute value of element j; wij is the spatial weight between elements i and j; and n is the total number of elements.
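The Gi* statistic of Eqs. (3) to (5) can be computed directly, as in the sketch below. It is an illustration rather than the ArcGIS hot spot tool: the function name `getis_ord_gi_star`, the five-region chain, and the values are hypothetical, and the weight matrix is assumed to include each region itself (w_ii = 1), which is the usual convention for the starred statistic.

```python
import math


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* (Eqs. 3-5) for each region; weights include w_ii."""
    n = len(values)
    mean = sum(values) / n                                        # Eq. (5)
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)     # Eq. (4)
    scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq = sum(w * w for w in weights[i])
        numerator = sum(w * x for w, x in zip(weights[i], values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)                    # Eq. (3)
    return scores


# Hypothetical layout: five regions in a row; each row also weights itself.
chain_with_self = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
incidence = [20.0, 18.0, 10.0, 3.0, 2.0]  # made-up values, high to low
scores = getis_ord_gi_star(incidence, chain_with_self)
```

With these made-up values, the two regions at the high end receive positive scores (hot spot side), the two at the low end receive negative scores (cold spot side), and the middle region is close to 0.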

Seasonal index

The seasonal index is a method of calculating an index of seasonal variation that describes the movement of a time series characterized by seasonal periodic movements [40]. Seasonal index = average of the same month (season) / average of the months (seasons) of each year × 100%. In public health, a seasonal index > 100% means that the month (quarter) is an epidemic season, and a seasonal index ≤ 100% means that the month (quarter) is a non-epidemic season [40]. The seasonal index method was used to characterize the seasonal pattern of TB incidence in this study.
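The seasonal index formula can be illustrated with a short sketch. The function name `seasonal_indices` and the two years of quarterly values are hypothetical and are not data from this study; the series is assumed to be ordered in time and to cover whole years.

```python
def seasonal_indices(series, period):
    """Seasonal index (%) for each position in the seasonal cycle.

    index_k = mean of all observations in season k / overall mean * 100
    """
    if len(series) % period != 0:
        raise ValueError("series must cover a whole number of seasonal cycles")
    overall_mean = sum(series) / len(series)
    indices = []
    for k in range(period):
        same_season = series[k::period]  # every observation from season k
        indices.append(100.0 * (sum(same_season) / len(same_season)) / overall_mean)
    return indices


# Hypothetical quarterly incidence for two years (Q1..Q4, Q1..Q4).
quarterly = [12.0, 14.0, 8.0, 6.0, 13.0, 15.0, 7.0, 5.0]
indices = seasonal_indices(quarterly, period=4)
epidemic = [index > 100.0 for index in indices]  # True marks an epidemic season
```

For monthly data the same function would be called with `period=12`.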

SARIMA model

The SARIMA model is widely used in the field of public health to predict the incidence of infectious diseases. The autoregressive integrated moving average (ARIMA) model consists of three components: autoregressive (AR), difference (Diff), and moving average (MA).

$$\text{ARIMA}(\text{p},\text{ d},\text{ q}) =\text{ AR}(\text{p}) +\text{ Diff}(\text{d}) +\text{ MA}(\text{q})$$
(6)

SARIMA is an extended form of the ARIMA model, which consists of a sequence of trends and seasonal components. The seasonal component was added to the ARIMA model to yield the SARIMA model [20, 26, 39]. The SARIMA model is expressed as SARIMA(p, d, q) (P, D, Q)s [41]. In the expression for the SARIMA model, p is the order of autoregression, d is the degree of trend difference, q is the order of moving average, P is the seasonal autoregression lag, D is the degree of seasonal difference, Q is the seasonal moving average, and s is the length of seasonality in the time series [7, 20]. In general, with a monthly time series, s has a value of 12.

The SARIMA modeling process is the same as that of the ARIMA model and involves several steps [18, 42, 43]. SARIMA modeling requires a stationary time series. Therefore, the first step is to assess the stationarity of the time series by performing an ADF test or plotting the time series. The second step is to estimate the SARIMA model parameters p, q, P, and Q. This can be done by plotting the ACF and PACF to obtain possible parameter values, which we combined to yield multiple candidate SARIMA models. Subsequently, t-tests were performed on these candidate models, and the R-squared, RMSE, MAPE, MAE, and standardized BIC values were calculated. The third step uses the Ljung-Box Q test to diagnose the model. If the p-value of the Ljung-Box Q test is greater than 0.05, the residuals of the model are considered a white noise series. The coefficient of determination (R2) was used to evaluate the fit of the SARIMA model; the larger the R2 value, the better the fit. RMSE, MAPE, and MAE were employed to evaluate the prediction accuracy of the model; the smaller their values, the higher the prediction accuracy. The optimal SARIMA model was determined by the minimum values of BIC, RMSE, MAPE, and MAE, the maximum value of R2, and passing the white noise test.
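Two building blocks of the steps above, seasonal differencing and the Ljung-Box Q test, can be sketched in pure Python. This is an illustration only and not the SPSS procedure used in this study: all function names and the synthetic monthly series are hypothetical, the chi-square tail probability uses a closed form that holds only for an even number of degrees of freedom, and the degrees of freedom are taken as the number of lags without the reduction that applies to residuals of a fitted model.

```python
import math


def seasonal_difference(series, s=12):
    """Seasonal differencing (D = 1): x_t - x_{t-s}."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def chi2_sf_even(x, df):
    """Upper tail probability of a chi-square variable; exact for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def ljung_box(series, lags):
    """Ljung-Box Q statistic and p-value (df assumed equal to `lags`)."""
    n = len(series)
    r = acf(series, lags)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf_even(q, lags)


# Synthetic monthly series: linear trend plus a 12-month seasonal cycle.
raw = [10 + 0.05 * t + 3 * math.sin(2 * math.pi * t / 12) for t in range(60)]
flat = seasonal_difference(raw, s=12)  # the seasonal cycle cancels out
q_raw, p_raw = ljung_box(raw, lags=12)  # strongly autocorrelated: p < 0.05
```

In the modeling workflow the test would be applied to the residuals of each candidate model, where p > 0.05 is read as white noise; here it is applied to the raw synthetic series only to show that strong seasonality is flagged.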

Holt-Winters model

The Holt-Winters model is suited to forecasting time series with trend and seasonality, including non-stationary series [44]. It is used extensively to predict the incidence of infectious diseases. The Holt-Winters model includes additive and multiplicative forms [45]. The additive model assumes an additive relationship between the trend and seasonal components of the time series, while the multiplicative model assumes a multiplicative relationship between them [45]. The Holt-Winters model has three smoothing parameters, α, β, and γ, which adjust the weights given to level, trend, and seasonality, respectively, and all take values in [0, 1] [46]. Ljung-Box Q tests on the residuals are also required for the Holt-Winters model. The optimal Holt-Winters model requires the minimum values of BIC, RMSE, MAPE, and MAE, with residuals that pass the Ljung-Box Q test (p > 0.05) [46].
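The additive form can be written out in a few lines, as in the sketch below. It is a textbook-style illustration, not the SPSS implementation used in this study: the function name `holt_winters_additive`, the simple initialization from the first two seasons, the fixed smoothing parameters, and the quarterly values are all assumptions made for the example.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecasts for `horizon` steps ahead.

    m is the season length; alpha, beta, gamma smooth the level, trend, and
    seasonal components. Initial values come from the first two seasons.
    """
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons")
    first_mean = sum(series[:m]) / m
    second_mean = sum(series[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    seasonal = [x - first_mean for x in series[:m]]
    for t in range(m, len(series)):
        x = series[t]
        previous_level = level
        old_season = seasonal[t % m]
        level = alpha * (x - old_season) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (x - level) + (1 - gamma) * old_season
    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Hypothetical quarterly series with a stable seasonal pattern and no trend.
history = [12.0, 14.0, 8.0, 6.0] * 3
forecast = holt_winters_additive(history, m=4, alpha=0.3, beta=0.1,
                                 gamma=0.2, horizon=4)
```

Because the example series repeats exactly, the forecast reproduces the seasonal pattern. In practice the three smoothing parameters would be estimated by minimizing the fitting error rather than fixed by hand.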

In Eqs. (7) to (10), \({\hat{\text{X}}}_{t}\) is the predicted value at time t; \(X_{t}\) is the observed value at time t; \(\overline{X}_{t}\) is the mean of the observed values; and n is the sequence sample size.

$${\text{R}}^{2} = 1{ - }\frac{{\sum {\left( {{\text{X}}_{t} - {\hat{\text{X}}}_{t} } \right)^{2} } }}{{\sum {\left( {{\text{X}}_{t} - \overline{X}_{t} } \right)^{2} } }}$$
(7)
$${\text{RMSE}} = \sqrt {\frac{{\sum\limits_{t = 1}^{n} {(X_{t} - {\hat{\text{X}}}_{t} )^{2} } }}{n}}$$
(8)
$${\text{MAPE}} = \frac{{\sum\limits_{t = 1}^{n} {\left| {\frac{{X_{t} - {\hat{\text{X}}}_{t} }}{{X_{t} }}} \right|} \times 100\% }}{n}$$
(9)
$${\text{MAE}} = \frac{{\sum\limits_{t = 1}^{n} {\left| {X_{t} - {\hat{\text{X}}}_{t} } \right|} }}{n}$$
(10)
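The four evaluation metrics in Eqs. (7) to (10) can be computed as in the following sketch. The function name `evaluation_metrics` and the observed and predicted values are hypothetical and serve only as a worked example.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R2, RMSE, MAPE (%), and MAE as defined in Eqs. (7)-(10)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [x - p for x, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((x - mean_obs) ** 2 for x in observed)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": 100.0 * sum(abs(e / x) for e, x in zip(errors, observed)) / n,
        "MAE": sum(abs(e) for e in errors) / n,
    }


# Hypothetical observed and predicted incidence values.
observed = [10.0, 12.0, 8.0, 10.0]
predicted = [9.0, 12.0, 10.0, 9.0]
metrics = evaluation_metrics(observed, predicted)
```

Note that MAPE is undefined when an observed value is 0, so it is only meaningful for series that stay away from zero.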

Data analysis

ArcGIS software (version 10.8.1) was used to calculate the global and local Moran's I of TB incidence in western China from 2004 to 2018 and to perform hotspot analysis. SPSS version 23.0 (SPSS, IBM Corp., Armonk, NY, USA) was used to construct the SARIMA model. The level of significance was set at p < 0.05.

Results

Spatial and temporal analysis

Table 1 presents the Moran's I values for TB reported incidence in western China from 2004 to 2018. During this period, there were 8 years in which Moran's I values were greater than 0 and passed the significance test (p < 0.05), indicating that TB reported incidence showed significant spatial clustering characteristics. The Moran's I values for these 8 years were, in descending order, 2009 (0.611), 2008 (0.446), 2006 (0.423), 2015 (0.407), 2016 (0.351), 2013 (0.347), 2011 (0.270), and 2010 (0.244).

Table 1 Global Moran's I analysis of tuberculosis incidence in western China, 2004-2018

There is a great need for further local spatial autocorrelation analyses of TB reported incidence because global spatial autocorrelation is only valid for homogeneous spaces, and the 12 provinces and municipalities in western China are prone to spatial heterogeneity due to large differences in their social and natural environments.

The spatial clustering patterns of TB incidence in western China changed significantly from 2004 to 2018 (Fig. 2 and Table 2). In 2004 and 2005, the reported incidence of TB showed low-low aggregation, including in major provinces such as Yunnan, Guizhou, and Guangxi. From 2006 to 2015, the reported incidence of TB was predominantly characterized by high-high, high-low, and low-low aggregation. Among them, the Tibet Autonomous Region showed high-high aggregation in 2006, 2008, 2009, 2010, and 2015; Chongqing, Yunnan, Guizhou, Gansu, and Inner Mongolia showed high-low aggregation; and Sichuan, Yunnan, Guizhou, Shaanxi, and Guangxi showed low-low aggregation. In 2016-2018, the reported incidence of TB was mainly characterized by low-low aggregation, which included four provinces: Yunnan, Chongqing, Guizhou, and Guangxi.

Fig. 2 Spatial autocorrelation LISA plots for 12 provinces and municipalities in western China (A 2004, B 2009, C 2013, D 2018). To describe the temporal and spatial changes and evolution of TB reported incidence in western China, we divided the entire time series (2004-2018) approximately equally into 4 spatial and temporal phases, using the years 2004, 2009, 2013, and 2018 as cut points

Table 2 Spatial clustering patterns of TB incidence in western China, 2004-2018

The results of Getis-Ord Gi* showed that the cold and hot spots of reported TB incidence shifted from 2004 to 2018, with hot spots located in Tibet, Qinghai, and Sichuan, and cold spots mainly located in Yunnan, Guizhou, Guangxi, Chongqing, Inner Mongolia, and Gansu, among others (Fig. 3 and Table 3).

Fig. 3 Localized Getis-Ord Gi* maps for 12 provinces and municipalities in western China (A 2004, B 2009, C 2013, D 2018). To describe the temporal and spatial changes and evolution of TB reported incidence in western China, we divided the entire time series (2004-2018) approximately equally into 4 spatial and temporal phases, using the years 2004, 2009, 2013, and 2018 as cut points

Table 3 Seasonal index of TB incidence in western China, 2004-2018

Seasonal analysis

Table 4 shows that the seasonal index of reported TB incidence in the 12 provinces and municipalities in western China exceeded 100% in the first and second quarters, indicating that the incidence of TB showed obvious seasonal characteristics.

Table 4 Seasonal index of TB incidence in western China, 2004-2018

SARIMA model

The reported incidence of TB in the 12 provinces and municipalities in western China showed significant seasonal characteristics, which fits the requirements of SARIMA modeling well. We imported the monthly TB reported incidence data for the 12 provinces and municipalities into the time-series module of SPSS to construct the SARIMA models. We identified the optimal SARIMA model through repeated experiments, using the results of the Expert Modeler in SPSS as a reference. As shown in Table 5, all 12 optimal SARIMA models passed the white noise test (p > 0.05). Parameters and t-test results of the 12 optimal SARIMA models are detailed in Additional file 2.

Table 5 The optimal SARIMA model statistics and Ljung-Box Q test in Western China, 2004-2018

Holt-Winters model

We fitted both the additive and multiplicative Holt-Winters models to the time-series data on TB reported incidence for each of the 12 provinces and municipalities in western China; five of the fitted models failed the white noise test (p < 0.05). The results are shown in Additional file 3.

Optimal model

The optimal prediction models were selected by comparing the R2, RMSE, MAPE, and MAE values of the SARIMA and Holt-Winters models and by checking whether their residuals passed the white noise test. The results are shown in Table 6. We plotted the optimal prediction model for the TB reported incidence in each of the 12 provinces and municipalities in western China. In Fig. 4, the blue line follows the same trend as the red line and largely overlaps it, indicating that the optimal models have good predictive performance. However, as can be seen in Fig. 4, the saw-tooth pattern (regular sharp rises and falls) in the predicted values of reported TB incidence for the 12 provinces and municipalities is very distinct. This reflects the observed values, which themselves showed regular sharp rises and falls; the predictions simulated using the SARIMA and Holt-Winters models were therefore consistent with the trend of the observations.

Table 6 The optimal model for the TB reported incidence in 12 provinces and municipalities in western China
Fig. 4 The prediction performance of the optimal model for 12 provinces and municipalities in western China. The red line represents the observed value, the blue line represents the predicted value, and the purple dotted line represents the 95% confidence interval of the predicted value. A Inner Mongolia, B Guangxi, C Chongqing, D Sichuan, E Guizhou, F Yunnan, G Tibet, H Shaanxi, I Gansu, J Qinghai, K Ningxia, L Xinjiang

Discussion

The present study showed obvious spatial aggregation of TB reported incidence in 12 provinces and municipalities in western China from 2004 to 2018, with Moran's I greater than 0 and a p-value of less than 0.05 in 8 years. Furthermore, local spatial autocorrelation analysis showed that the hotspot of TB reported incidence was located predominantly in Tibet, and that the pattern of TB reported incidence has gradually changed from hot spots to cold spots since 2016. This may be because of the following reasons. First, the DOTS prevention strategy was promoted and health resource allocation was optimized at an early stage in China [47,48,49], which provided better conditions for the diagnosis and treatment of TB patients. Second, China has long been devoted to strengthening disease surveillance for TB [50] and enhancing the testing capacity of tuberculosis prevention and treatment organizations, in particular by targeting western regions and impoverished populations, providing focused assistance, and implementing the referral system for TB. As a result, the reported incidence of TB in the 12 provinces and municipalities in western China has changed from hot to cold.

TB is one of the main causes of death in Tibet [51]. The concentration of TB hotspots in Tibet may be closely related to economic, environmental, population distribution, and other factors. The economy of Tibet lags behind that of some of China's other provinces and municipalities [31], the population is dispersed, and healthcare resources are relatively underallocated. In addition, Tibet is located in southwestern China and is dominated by plateau terrain, with an average altitude of more than 4,500 m and a colder climate; residents are accustomed to spending long periods indoors, which facilitates cross-transmission of TB [31]. Studies have shown that the tourism industry in Tibet is relatively well-developed and that the movement of large-scale tourist populations may contribute to the spread of TB [31]. Furthermore, the lack of knowledge about TB treatment and poor self-management ability among residents of rural and pastoral Tibetan areas is also an important factor affecting TB treatment [52].

Seasonality is a characteristic of most infectious diseases [53]. In our study, the reported incidence of TB in the 12 provinces and municipalities in western China showed a clear seasonal pattern, with a significantly higher incidence in the 1st and 2nd quarters (spring and summer) than in the 3rd and 4th quarters (fall and winter). This is the same seasonality of TB as in most countries in the Northern Hemisphere, such as Japan, India, Pakistan, Iran, Portugal, and the United States [54]. The seasonal features of TB in the 12 provinces and municipalities in western China may be related to geographic, meteorological, and other factors. The 12 provinces and municipalities have complex topography, including plateaus, basins, plains, and hills. Although their climates are heterogeneous, their seasonal characteristics are essentially the same. Thus, the seasonal characteristics of TB incidence were similar.

In addition, several studies have shown that the seasonal character of the reported incidence of TB may also be related to the supply of and/or demand for health services [55, 56]. TB is a chronic health problem. Patients' access to health care may be affected by human (socio-cultural) behaviors, such as planting and harvesting seasons [55] and long public holidays, which may lead to delays in seeking health care. As a result, the reported incidence may not fully reflect the actual incidence of the disease. In China, for example, the longer holidays, National Day and Chinese New Year, are characterized by high population mobility, which makes the spread of TB more likely. Mobile patients are prone to delayed access to health care due to unstable living environments and insufficient access to health care services [57].

Several issues must be considered in order to obtain more accurate predictions. First, because different models yield different predictions, the choice of prediction model is crucial for obtaining accurate results [7]. Therefore, we should choose an optimal predictive model based on the availability and characteristics of infectious disease data. Second, different models are suitable for different application scenarios; therefore, the most popular prediction models should not be adopted blindly [58]. Currently, some scholars have introduced popular AI prediction models, such as Transformer [59], into the field of infectious disease prediction and have achieved better prediction performance. However, whether these cutting-edge AI prediction models can be further generalized in the field of infectious disease prediction needs to be supported by a large amount of in-depth research. The SARIMA model can be constructed without relying on many assumptions or incorporating more variables, as long as 36 time-series samples are available [60]. The Holt-Winters model provides more accurate forecasts by simultaneously accounting for trends, seasonality, and cyclicality in time-series data [44, 45]. In our study, we collected only time-series data on TB reported incidence in 12 provinces and municipalities in western China from 2004 to 2018, without other variables, so it is reasonable to apply the SARIMA and Holt-Winters models to TB reported incidence.

This study has some limitations. First, data collection was incomplete. Studies have shown that the reported incidence of TB is influenced by a combination of economic [61], social [61], and demographic factors [62]; however, indicators related to these influencing factors were not included in this study. Second, the study design lacks a comparison with machine learning models for predicting TB reported incidence. In future studies, we should comprehensively collect data on factors affecting TB reported incidence and build machine learning models for comparison in order to obtain more accurate predictions. Third, the reasons for the seasonality of TB reported incidence are complicated and involve multiple factors; therefore, the implications of this seasonality deserve further exploration in the future.

Conclusions

This study showed that Tibet is a hotspot of TB reported incidence among the 12 provinces and municipalities in the western region of China, which needs to be emphasized by the government and health administration departments in the future. In addition, TB reported incidence shows obvious seasonal characteristics; surveillance of key populations should therefore be performed in spring and summer, and preventive measures should be arranged in advance.

Availability of data and materials

The data used or analyzed during the current study are available from the China Public Health Science Data Center website. The metadata is available through the user's registered account. The data do not include detailed personal patient information.

Abbreviations

ARIMA: Autoregressive integrated moving average
SARIMA: Seasonal autoregressive integrated moving average
ARIMAX: Autoregressive integrated moving average with exogenous variables
SARIMAX: Seasonal autoregressive integrated moving average with exogenous variables
ETS: Error, Trend, Seasonal
SEIR: Susceptible exposed infectious recovered
LSTM: Long short-term memory network
SVR: Support vector regression
RF: Random forest
BPNN: Back propagation neural network
ANFIS: Adaptive neuro fuzzy inference system
NNAR: Neural network autoregressive
MAE: Mean absolute error
MAPE: Mean absolute percentage error
RMSE: Root mean square error
BIC: Bayesian Schwarz information criterion

References

  1. Furin J, Cox H, Pai M. Tuberculosis. Lancet. 2019;393(10181):1642–56.

  2. Natarajan A, Beena PM, Devnikar AV, Mali S. A systemic review on tuberculosis. Indian J Tuberc. 2020;67(3):295–311.

  3. World Health Organization. Global tuberculosis report 2022. Geneva: World Health Organization. 2022.

  4. GBD Tuberculosis Collaborators. The global burden of tuberculosis: results from the Global Burden of Disease Study 2015. Lancet Infect Dis. 2018;18(3):261–84.

  5. Long Q, Guo L, Jiang W, Huan S, Tang S. Ending tuberculosis in China: health system challenges. Lancet Public Health. 2021;6(12):e948–53.

  6. Li T, Yan X, Du X, Huang F, Wang N, Ni N, Ren J, Zhao Y, Jia Z. Extrapulmonary tuberculosis in China: a national survey. Int J Infect Dis. 2023;128:69–77.

  7. Zhao D, Zhang H, Cao Q, Wang Z, He S, Zhou M, Zhang R. The research of ARIMA, GM(1,1), and LSTM models for prediction of TB cases in China. PLoS One. 2022;17(2):e0262734.

  8. Technical Guidance group of the Fifth National TB Epidemiological Survey. The office of the Fifth National TB Epidemiological Survey. The fifth national tuberculosis epidemiological survey in 2010. Chin J Antituberculosis. 2012;34:485–508 (in Chinese).

  9. Mijiti P, Yuehua L, Feng X, Milligan PJ, Merle C, Gang W, Nianqiang L, Upur H. Prevalence of pulmonary tuberculosis in western China in 2010–11: a population-based, cross-sectional survey. Lancet Glob Health. 2016;4(7):e485–94.

  10. Chen W, Ren X, Geng MJ, Deng Y, Huang S, Liu CJ, Wang R, Chen ZM, Wang LP. Priority, difficulties and optimization ideas of infectious disease surveillance and early warning at present stage. Disease Surveillance. 2022;37(6):730–3 (in Chinese).

  11. Yang WZ, Lan YJ, Lyu W, Leng ZW, Feng LZ, Lai SJ, Ye CC, Wang Q. Establishment of multi-point trigger and multi-channel surveillance mechanism for intelligent early warning of infectious diseases in China. Zhonghua Liu Xing Bing Xue Za Zhi. 2020;41(11):1753–7.

  12. Zhang Q. Data science approaches to infectious disease surveillance. Philos Trans A Math Phys Eng Sci. 2022;380(2214):20210115.

  13. Xue M, Zhong J, Gao M, Pan R, Mo Y, Hu Y, Du J, Huang Z. Analysis of spatial-temporal dynamic distribution and related factors of tuberculosis in China from 2008 to 2018. Sci Rep. 2023;13(1):4974.

  14. Li X, Chen D, Zhang Y, Xue X, Zhang S, Chen M, Liu X, Ding G. Analysis of spatial-temporal distribution of notifiable respiratory infectious diseases in Shandong Province, China during 2005–2014. BMC Public Health. 2021;21(1):1597.

  15. Romanyukha AA, Karkach AS, Borisov SE, Belilovsky EM, Sannikova TE, Krivorotko OI. Small-scale stable clusters of elevated tuberculosis incidence in Moscow, 2000–2015: Discovery and spatiotemporal analysis. Int J Infect Dis. 2020;91:156–61.

  16. Amsalu E, Liu M, Li Q, Wang X, Tao L, Liu X, Luo Y, Yang X, Zhang Y, Li W, Li X, Wang W, Guo X. Spatial-temporal analysis of tuberculosis in the geriatric population of China: An analysis based on the Bayesian conditional autoregressive model. Arch Gerontol Geriatr. 2019;83:328–37.

  17. Asemahagn MA, Alene GD, Yimer SA. Spatial-temporal clustering of notified pulmonary tuberculosis and its predictors in East Gojjam Zone, Northwest Ethiopia. PLoS One. 2021;16(1):e0245378.

  18. Zhao D, Zhang H, Zhang R, He S. Research on hand, foot and mouth disease incidence forecasting using hybrid model in mainland China. BMC Public Health. 2023;23(1):619.

  19. Li ZQ, Pan HQ, Liu Q, Song H, Wang JM. Comparing the performance of time series models with or without meteorological factors in predicting incident pulmonary tuberculosis in eastern China. Infect Dis Poverty. 2020;9(1):151.

  20. Kuan MM. Applying SARIMA, ETS, and hybrid models for prediction of tuberculosis incidence rate in Taiwan. PeerJ. 2022;10:e13117.

  21. Mao Q, Zhang K, Yan W, Cheng C. Forecasting the incidence of tuberculosis in China using the seasonal auto-regressive integrated moving average (SARIMA) model. J Infect Public Health. 2018;11(5):707–12.

  22. Xu A, Wen ZX, Wang Y, Wang WB. Prediction of different interventions on the burden of drug-resistant tuberculosis in China: a dynamic modelling study. J Glob Antimicrob Resist. 2022;29:323–30.

  23. Yang E, Zhang H, Guo X, Zang Z, Liu Z, Liu Y. A multivariate multi-step LSTM forecasting model for tuberculosis incidence with model explanation in Liaoning Province, China. BMC Infect Dis. 2022;22(1):490.

  24. Tang N, Yuan M, Chen Z, Ma J, Sun R, Yang Y, He Q, Guo X, Hu S, Zhou J. Machine Learning Prediction Model of Tuberculosis Incidence Based on Meteorological Factors and Air Pollutants. Int J Environ Res Public Health. 2023;20(5):3910.

  25. Mohammed SH, Ahmed MM, Al-Mousawi AM, Azeez A. Seasonal behavior and forecasting trends of tuberculosis incidence in Holy Kerbala, Iraq. Int J Mycobacteriol. 2018;7(4):361–7.

  26. Azeez A, Obaromi D, Odeyemi A, Ndege J, Muntabayi R. Seasonality and Trend Forecasting of Tuberculosis Prevalence Data in Eastern Cape, South Africa, Using a Hybrid Model. Int J Environ Res Public Health. 2016;13(8):757.

  27. Wang J, Liu X, Jing Z, Yang J. Spatial and temporal clustering analysis of pulmonary tuberculosis and its associated risk factors in southwest China. Geospat Health. 2023;18(1).

  28. Duan Y, Cheng J, Liu Y, Fang Q, Sun M, Cheng C, Han C, Li X. Epidemiological Characteristics and Spatial-Temporal Analysis of Tuberculosis at the County-Level in Shandong Province, China, 2016–2020. Trop Med Infect Dis. 2022;7(11):346.

  29. Li L, Xi Y, Ren F. Spatio-Temporal Distribution Characteristics and Trajectory Similarity Analysis of Tuberculosis in Beijing, China. Int J Environ Res Public Health. 2016;13(3):291.

  30. Liao Z, Zhang X, Zhang Y, Peng D. Seasonality and Trend Forecasting of Tuberculosis Incidence in Chongqing, China. Interdiscip Sci. 2019;11(1):77–85.

  31. Li J, Li Y, Ye M, Yao S, Yu C, Wang L, Wu W, Wang Y. Forecasting the Tuberculosis Incidence Using a Novel Ensemble Empirical Mode Decomposition-Based Data-Driven Hybrid Model in Tibet, China. Infect Drug Resist. 2021;14:1941–55.

  32. Yu Y, Wu B, Wu C, Wang Q, Hu D, Chen W. Spatial-temporal analysis of tuberculosis in Chongqing, China 2011–2018. BMC Infect Dis. 2020;20(1):531.

  33. Shi B, Fu Y, Bai X, Zhang X, Zheng J, Wang Y, Li Y, Zhang L. Spatial Pattern and Spatial Heterogeneity of Chinese Elite Hospitals: A Country-Level Analysis. Front Public Health. 2021;9:710810.

  34. Liu MY, Li QH, Zhang YJ, Ma Y, Liu Y, Feng W, Hou CB, Amsalu E, Li X, Wang W, Li WM, Guo XH. Spatial and temporal clustering analysis of tuberculosis in the mainland of China at the prefecture level, 2005–2015. Infect Dis Poverty. 2018;7(1):106.

  35. Sun S, Xie Y, Li Y, Yuan K, Hu L. Analysis of Dynamic Evolution and Spatial-Temporal Heterogeneity of Carbon Emissions at County Level along “The Belt and Road”-A Case Study of Northwest China. Int J Environ Res Public Health. 2022;19(20):13405.

  36. Kianfar N, Mesgari MS. GIS-based spatio-temporal analysis and modeling of COVID-19 incidence rates in Europe. Spat Spatiotemporal Epidemiol. 2022;41:100498.

  37. Panahi MH, Parsaeian M, Mansournia MA, Khoshabi M, Gouya MM, Hemati P, Fotouhi A. A spatio-temporal analysis of influenza-like illness in Iran from 2011 to 2016. Med J Islam Repub Iran. 2020;34:65.

  38. Chen C, Li J, Huang J. Spatial-Temporal Patterns of Population Aging in Rural China. Int J Environ Res Public Health. 2022;19(23):15631.

  39. Alam MS, Tabassum NJ. Spatial pattern identification and crash severity analysis of road traffic crash hot spots in Ohio. Heliyon. 2023;9(5):e16303.

  40. Miao J. Analysis of seasonal changes in hospital admissions in a hospital from 2010 to 2014. China Health Statistics. 2016;33(3):503–4 (in Chinese).

  41. Wang Y, Xu C, Li Y, Wu W, Gui L, Ren J, Yao S. An Advanced Data-Driven Hybrid Model of SARIMA-NNNAR for Tuberculosis Incidence Time Series Forecasting in Qinghai Province, China. Infect Drug Resist. 2020;13:867–80.

  42. Zhao D, Zhang H, Cao Q, Wang Z, Zhang R. The research of SARIMA model for prediction of hepatitis B in mainland China. Medicine (Baltimore). 2022;101(23):e29317.

  43. Liu J, Yu F, Song H. Application of SARIMA model in forecasting and analyzing inpatient cases of acute mountain sickness. BMC Public Health. 2023;23(1):56.

  44. Chen Q, Zheng X, Shi H, Zhou Q, Hu H, Sun M, Xu Y, Zhang X. Prediction of influenza outbreaks in Fuzhou, China: comparative analysis of forecasting models. BMC Public Health. 2024;24(1):1399.

  45. Cañedo MC, Lopes TIB, Rossato L, Nunes IB, Faccin ID, Salomé TM, Simionatto S. Impact of COVID-19 pandemic in the Brazilian maternal mortality ratio: A comparative analysis of Neural Networks Autoregression, Holt-Winters exponential smoothing, and Autoregressive Integrated Moving Average models. PLoS One. 2024;19(1):e0296064.

  46. Xun MJ, Li JL, Huang AJ, Chen P. Application of ARIMA and Holt-Winters exponential smoothing in the prediction of tuberculosis in Guizhou Province. Chin Prev Med. 2023;24(7):678–82 (in Chinese).

  47. Wang L, Liu J, Chin DP. Progress in tuberculosis control and the evolving public-health system in China. Lancet. 2007;369(9562):691–6.

  48. Hu M, Feng Y, Li T, Zhao Y, Wang J, Xu C, Chen W. Unbalanced Risk of Pulmonary Tuberculosis in China at the Subnational Scale: Spatiotemporal Analysis. JMIR Public Health Surveill. 2022;8(7):e36242.

  49. Zhang Q, Song W, Liu S, An Q, Tao N, Zhu X, Yang D, Wan D, Li Y, Li H. An Ecological Study of Tuberculosis Incidence in China, From 2002 to 2018. Front Public Health. 2022;9:766362.

  50. Wang YS, Zhu WL, Li T, Chen W, Wang WB. Changes in newly notified cases and control of tuberculosis in China: time-series analysis of surveillance data. Infect Dis Poverty. 2021;10(1):16.

  51. Li B, Zhang X, Guo J, Wang J, Pianduo B, Wei X, Yin T, Hu J. Prevalence of pulmonary tuberculosis in Tibet Autonomous Region, China, 2014. Int J Tuberc Lung Dis. 2019;23(6):735–40.

  52. Zhang J, Yang Y, Qiao X, Wang L, Bai J, Yangchen T, Chodron P. Factors Influencing Medication Nonadherence to Pulmonary Tuberculosis Treatment in Tibet, China: A Qualitative Study from the Patient Perspective. Patient Prefer Adherence. 2020;14:1149–58.

  53. Nickbakhsh S, Ho A, Marques DFP, McMenamin J, Gunson RN, Murcia PR. Epidemiology of Seasonal Coronaviruses: Establishing the Context for the Emergence of Coronavirus Disease 2019. J Infect Dis. 2020;222(1):17–25.

  54. Kim EH, Bae JM. Seasonality of tuberculosis in the Republic of Korea, 2006–2016. Epidemiol Health. 2018;40:e2018051.

  55. Kirolos A, Thindwa D, Khundi M, Burke RM, Henrion MYR, Nakamura I, Divala TH, Nliwasa M, Corbett EL, MacPherson P. Tuberculosis case notifications in Malawi have strong seasonal and weather-related trends. Sci Rep. 2021;11(1):4621.

  56. Paz LC, Saavedra CAPB, Braga JU, Kimura H, Evangelista MDSN. Analysis of the seasonality of tuberculosis in Brazilian capitals and the Federal District from 2001 to 2019. Cad Saude Publica. 2022;38(7):e00291321. Portuguese.

  57. Zhen LL, Lu LY, Ren Y, Wang SL, Zhou J, Ren XH, He HH, Liu JY, Wang YL, Jiang J. Trend and influencing factors of delayed treatment of pulmonary tuberculosis patients in Yantai City from 2012 to 2021. Modern Preventive Medicine. 2024;51(8):1507–11 (in Chinese).

  58. Zhao D, Zhang R, Zhang H, He S. Prediction of global omicron pandemic using ARIMA, MLR, and Prophet models. Sci Rep. 2022;12(1):18138.

  59. Li L, Jiang Y, Huang B. Long-term prediction for temporal propagation of seasonal influenza using Transformer-based model. J Biomed Inform. 2021;122:103894.

  60. Wang YW, Shen ZZ, Jiang Y. Comparison of ARIMA and GM(1,1) models for prediction of hepatitis B in China. PLoS One. 2018;13(9):e0201987.

  61. Dye C, Lönnroth K, Jaramillo E, Williams BG, Raviglione M. Trends in tuberculosis incidence and their determinants in 134 countries. Bull World Health Organ. 2009;87(9):683–91.

  62. Saunders MJ, Evans CA. COVID-19, tuberculosis and poverty: preventing a perfect storm. Eur Respir J. 2020;56(1):2001348.

Acknowledgements

We thank Xuelian Wu and Lan Zhang for their efforts in collecting and analyzing the data for this study. We also thank Shiyuan Li from the Center for Disease Control and Prevention of Chongzhou City, Sichuan Province for administrative support and assistance with this study.

Funding

This study was supported by the Sichuan Provincial Primary Health Service Development Research Center (Grant No. SWFZ21-Q-59) and the Sichuan Hospital Association Medical Management Branch 2022 Youth Program (Grant No. SCYW039).

Author information

Contributions

Conceptualization: Daren Zhao and Shiyuan Li. Data curation: Xuelian Wu, Lan Zhang, and Daren Zhao. Formal analysis: Xuelian Wu, Lan Zhang, and Daren Zhao. Writing–original draft: Daren Zhao, Xuelian Wu, Lan Zhang, and Shiyuan Li. Writing–review & editing: Daren Zhao and Huiwu Zhang.

Corresponding authors

Correspondence to Daren Zhao or Huiwu Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable. Data were obtained from publicly accessible sources. Formal ethical approval was not required for this study.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this article

Cite this article

Zhao, D., Zhang, H., Wu, X. et al. Spatial and temporal analysis and forecasting of TB reported incidence in western China. BMC Public Health 24, 2504 (2024). https://doi.org/10.1186/s12889-024-19994-6

  • DOI: https://doi.org/10.1186/s12889-024-19994-6

Keywords