A dose-effect relationship analysis of traditional Chinese Medicine (TCM) is crucial to the modernization of TCM. However, due to the complex and nonlinear nature of TCM data, such as multicollinearity, it can be challenging to conduct a dose-effect relationship analysis. Partial least squares can be applied to multicollinearity data, but its internally extracted principal components cannot adequately express the nonlinear characteristics of TCM data. To address this issue, this paper proposes an analytical model based on a deep Boltzmann machine (DBM) and partial least squares. The model uses the DBM to extract nonlinear features from the feature space, replaces the components in partial least squares, and performs a multiple linear regression. Ultimately, this model is suitable for analyzing the dose-effect relationship of TCM. The model was evaluated using experimental data from Ma Xing Shi Gan Decoction and datasets from the UCI Machine Learning Repository. The experimental results demonstrate that the prediction accuracy of the model based on the DBM and partial least squares method is on average 10% higher than that of existing methods.
Citation: Wangping Xiong, Yimin Zhu, Qingxia Zeng, Jianqiang Du, Kaiqi Wang, Jigen Luo, Ming Yang, Xian Zhou. Dose-effect relationship analysis of TCM based on deep Boltzmann machine and partial least squares[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 14395-14413. doi: 10.3934/mbe.2023644
It has been shown that factor model estimates can be heavily affected by outliers, that is, data points that differ significantly from the other observations in the sample. An outlier may be due to variability in the measurement or to significant experimental errors; the latter are sometimes excluded from the data set. After the 2009 crisis and the COVID-19 pandemic, the treatment of outliers attracted the attention of both researchers and the institutes of official statistics, which provided some guidelines on monitoring the effects of outliers when using their data (e.g., see Eurostat, 2020). In this paper we follow Artis et al. (2005), Croux et al. (2003), Bai et al. (2022), and Fan et al. (2021) and apply robust estimation methods to factor models to limit the effects of outliers. We contribute to the robust factor literature by comparing alternative robust factor models in terms of forecasting performance on a set of variables that are central to economic analysis. Our database includes the 2009 crisis and the beginning of the COVID-19 pandemic in March 2020, with January 2021 as the last observation; the pandemic is potentially the most important source of outliers, and its effects on economic systems have been extensively investigated in some recent studies (Fabeil et al., 2020; Fernandes, 2020; McKibbin and Vines, 2020; McKibbin and Roshen, 2021; Liu, 2021). We note that the amount of sample information is not large enough to estimate forecasting models with structural breaks, since adopting them implies that the current model is estimated using only data observed since the most recent break. Similarly, it is not possible to test for a break and compare the two models for the periods before and after the pandemic, since the spread of contagion and its effects have not yet come to an end.
This paper provides an alternative solution and shows that samples from the pandemic period have some information content which can still be used to estimate models without breaks provided a proper inference technique, such as robust inference for outliers, is applied.
The structure of the paper is as follows. Section 2 presents some background on robust inference for outliers. Section 3 introduces the standard factor model and the two methodologies used to treat the outliers. Section 4 provides a data description and the empirical results obtained with robust inference methods for factor models. Section 5 concludes.
2. Background on robust estimation
The true nature of outliers can be very elusive, and dealing with data affected by outliers poses some challenges. There is no unanimous definition of what an outlier is. Outliers could be atypical samples that have an unusually large influence on the estimated model parameters. Outliers could also be perfectly valid samples from the same distribution as the rest of the data that happen to be small-probability instances. Alternatively, outliers could be samples drawn from a different model, and therefore they will likely not be consistent with the model derived from the rest of the data. There is no way to tell which is the case for a particular "outlying" sample point; nevertheless, some techniques can be applied to detect outliers. A standard procedure makes use of the linear projection of the dependent variable onto the linear space of the covariates, that is, the hat matrix of the data. The diagonal of the hat matrix is used to detect outlying observations that may have an impact on the inference. Usually, outliers are excluded from the dataset when estimating the model (data trimming); see, for example, Davidson and MacKinnon (2004). In this paper, we compare trimming with two alternative approaches.
The first approach is based on Mahalanobis distances and can be applied for both detection and robust estimation. We consider robust estimators of multivariate location and scatter computed from the explanatory variables. Many methods for estimating multivariate location and scatter break down in the presence of $T/(n+1)$ outliers, where $T$ is the number of observations and $n$ is the number of variables, as was pointed out by Donoho (1982). For the breakdown value of the multivariate M-estimators of Maronna (1976), see Hampel et al. (1986). In the meantime, several positive-breakdown estimators of multivariate location and scatter have been proposed. The Minimum Covariance Determinant (MCD), a highly robust estimator of multivariate location and scatter (Rousseeuw, 1984) which uses only the observations whose covariance matrix has the lowest determinant, was proposed by Rousseeuw and Leroy (1987). Consistency and asymptotic normality of the MCD estimator have been shown by Butler et al. (1993) and Cator and Lopuhaä (2010), whereas it has been demonstrated that the MVE (Minimum Volume Ellipsoid) has a lower convergence rate (Davies, 1992). The MCD has a bounded influence function (Croux and Haesbroeck, 1999) and it has the highest possible breakdown value (i.e., 50%) when the number of observations used is $\lfloor (T+n+1)/2 \rfloor$ (Lopuhaä and Rousseeuw, 1991). In addition to being highly resistant to outliers, the MCD is affine equivariant, i.e., the estimates behave properly under affine transformations of the data. Although the MCD was introduced in 1984, its practical use only became feasible with the introduction of the computationally efficient Fast MCD (FMCD) algorithm of Rousseeuw and Van Driessen (1999), and some extensions have since been developed (Hubert et al., 2017); in this paper we follow the FMCD technique.
The MCD has been successfully applied in many fields such as finance and econometrics (Gambacciani & Paolella, 2017; Orhan et al., 2001), quality control (Jensen et al., 2007), geophysics (Neykov et al., 2007), geochemistry (Filzmoser et al., 2005), and image analysis (Vogler et al., 2007). The MCD has been used for robust factor model estimation by Croux et al. (2003) and Filzmoser et al. (2003).
The second approach considered is the Iteratively Reweighted Least Squares (IRLS) proposed in De la Torre and Black (2004), which relies on the residuals of the linear projection of the dependent variable on a space generated by a set of factors. The outliers are detected as those observations that have a large residual with respect to the identified subspace. A new subspace is estimated with the outliers downweighted, and this process is repeated until the estimated model stabilizes. With this algorithm, a weight is determined iteratively for every multivariate sample, reducing the weights related to the outliers until the procedure converges. This technique has been used for outlier reduction (Bergstrom and Edlund, 2014), for observations afflicted by outliers (Kargoll et al., 2018), and in forecasting (Mbamalu et al., 1993). Other applications are statistical estimation (Green, 1984), matrix rank minimization (Mohan and Fazel, 2012), and sparse recovery (Daubechies et al., 2009).
3. Factor models
In the following, we introduce Factor Models (FM), data trimming and three approaches to outlier handling: ⅰ) standard FM (FM Std), where all data are included without any transformation; ⅱ) the Fast Minimum Covariance Determinant methodology combined with FM (FM FMCD); and ⅲ) Iterated Reweighted Least Squares combined with FM (FM IRLS).
In the empirical analysis, a Vector Autoregressive (VAR) model is used for predicting the factors, and according to Lütkepohl (2005), series without unit roots should be used when forecasting with VARs. To meet this requirement for the factors, we perform an Augmented Dickey-Fuller (ADF) unit root test on all variables included in the factor analysis. Where necessary, variables have been differenced to obtain stationary time series; after this step, we normalize the series and extract the factors. Thus, in the following we assume that our $T \times n$ data matrix $X$ is covariance stationary with zero mean and unit standard deviation.
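The preprocessing just described, together with the inverse mapping needed to express forecasts in the original units, can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code: the function names `preprocess` and `to_original_units` are hypothetical, and the decision to difference a series (the outcome of the ADF test) is taken as a given input rather than computed here.

```python
import numpy as np

def preprocess(x, difference):
    """Optionally first-difference a series, then standardize it.

    `difference` is assumed to come from a unit root test run elsewhere.
    Returns the transformed series and the quantities needed to invert it.
    """
    x = np.asarray(x, dtype=float)
    y = np.diff(x) if difference else x
    mu, sd = y.mean(), y.std()
    meta = {"mu": mu, "sd": sd, "difference": difference, "last_level": x[-1]}
    return (y - mu) / sd, meta

def to_original_units(z_hat, meta):
    """Map standardized forecasts back to levels of the original series."""
    y_hat = np.asarray(z_hat, dtype=float) * meta["sd"] + meta["mu"]
    if meta["difference"]:
        # Integrate: cumulate the forecast changes on top of the last observed level.
        return meta["last_level"] + np.cumsum(y_hat)
    return y_hat

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
z, meta = preprocess(x, difference=True)
```

Here `z` has zero mean and unit standard deviation, and `to_original_units` undoes the two steps in reverse order.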
Let $X_t$, $t > 0$, be a random process with $X_t = (x_{1t}, \ldots, x_{nt})'$ an $(n \times 1)$ random vector. The time index $t$ represents months or quarters, and we assume the process is covariance stationary with zero mean and a standard deviation equal to one. Latent factor extraction relies on the following decomposition:
$$E[X_t X_t'] a_i = \Gamma_X a_i = \lambda_i a_i, \tag{1}$$
where $a_i$ and $\lambda_i$, $i = 1, \ldots, n$, are the $n$-dimensional eigenvectors and the eigenvalues in decreasing order, respectively. Let $A$ be an $(n \times n)$ orthonormal matrix with the normalized eigenvectors in the columns, also called the factor loading matrix; then:
$$\Gamma_X A = A \Lambda, \tag{2}$$
where $\Lambda$ is a diagonal matrix with elements $\lambda_i$, $i = 1, \ldots, n$, on the main diagonal. The vector of $n$ factors $F_{n,t} = (f_{1,t}, \ldots, f_{n,t})'$ is given by the linear transformation:
$$F_{n,t} = A' X_t, \tag{3}$$
and $f_{k,t}$, $t > 0$, is the $k$-th factor. Let us denote by $\Gamma_n$ the expectation of the outer product of the factors, $E[F_{n,t} F_{n,t}']$; then one obtains the following relationship between $\Gamma_n$ and the eigenvalue matrix $\Lambda$:
$$\Gamma_n = E[A' X_t X_t' A] = A' E[X_t X_t'] A = A' \Gamma_X A = \Lambda. \tag{4}$$
Let $F_{k,t} = (f_{1,t}, \ldots, f_{k,t})'$ be the collection of the first $k$ factors at time $t$, with $k < n$; then:
$$F_{k,t} = A_k' X_t, \tag{5}$$
where $A_k$ is the matrix containing the first $k$ columns of $A$. Since the columns of $A$ are orthonormal, $A_k' A_k = I_k$. The first $k$ factors capture the following proportion of the total variance:
$$V_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i}. \tag{6}$$
The collection of factors $F_{k,t}$ is customarily called the standard FM (FM Std).
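As an illustration of Eqs. (1)-(6), the sketch below extracts the first $k$ factors from a standardized data matrix using the sample analogue of $\Gamma_X$. It is a minimal example under stated assumptions: the function name `extract_factors` and the simulated data are hypothetical, and the population expectation is replaced by the sample average $X'X/T$.

```python
import numpy as np

def extract_factors(X, k):
    """Standard FM on a (T, n) matrix X with zero-mean, unit-variance columns."""
    T, n = X.shape
    gamma_x = X.T @ X / T                      # sample analogue of E[X_t X_t']
    eigvals, eigvecs = np.linalg.eigh(gamma_x) # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to decreasing, as in Eq. (1)
    eigvals, A = eigvals[order], eigvecs[:, order]
    A_k = A[:, :k]                             # first k columns of the loading matrix
    F_k = X @ A_k                              # row t holds F_{k,t}' = (A_k' X_t)', Eq. (5)
    V_k = eigvals[:k].sum() / eigvals.sum()    # explained variance share, Eq. (6)
    return A_k, F_k, eigvals, V_k

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))              # simulated data, for illustration only
X = (X - X.mean(axis=0)) / X.std(axis=0)
A_k, F_k, eigvals, V_k = extract_factors(X, k=2)
```

The sample covariance of the extracted factors is diagonal with the leading eigenvalues on the diagonal, which mirrors Eq. (4).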
3.2. A robust factor model: the fast minimum covariance determinant estimator
In $n$-variate data, $n > 2$, it is difficult to detect outliers because one can no longer rely on visual inspection; nevertheless, a set of summary statistics can be used. One of the statistics used in the literature is the Mahalanobis distance:
$$D(x_t, \hat{\mu}, \hat{\Sigma}) = D_t = \sqrt{(x_t - \hat{\mu})' \hat{\Sigma}^{-1} (x_t - \hat{\mu})}, \tag{7}$$
where $x_t$ is the $t$-th row of the data matrix $X$, $\hat{\mu}$ is the estimator of the location, and $\hat{\Sigma}$ is the covariance matrix estimator. Using this distance, one obtains the classical tolerance ellipse defined as the set of $n$-dimensional points $x_t$, $t = 1, \ldots, T$. Detecting outliers by means of the Mahalanobis distance no longer suffices for multiple outliers because of the masking effect, by which multiple outliers do not necessarily have large Mahalanobis distances (Hubert et al., 2017). We consider a robust estimator of multivariate location and scatter based on the notion of Minimum Covariance Determinant (MCD) (Rousseeuw, 1984; Rousseeuw and Leroy, 1987; Hubert et al., 2017). In the MCD, only the $r$ observations, $\lfloor (T+n+1)/2 \rfloor \le r \le T$, whose classical covariance matrix has the lowest determinant are considered in the computation of the robust distance:
$$RD(x_t, \hat{\mu}_{MCD}, \hat{\Sigma}_{MCD}) = \sqrt{(x_t - \hat{\mu}_{MCD})' \hat{\Sigma}_{MCD}^{-1} (x_t - \hat{\mu}_{MCD})}, \tag{8}$$
where $\hat{\mu}_{MCD}$ and $\hat{\Sigma}_{MCD}$ are the MCD estimators of the mean and the covariance matrix, respectively, defined as follows:
where $W(d_t^2)$ is an appropriate weight function and $c_1$ is a consistency factor (e.g., see Lopuhaä and Rousseeuw, 1991). Note that the MCD estimator can only be computed when $r > n$, otherwise the covariance matrix of any $r$-subset has determinant 0, so we need at least $T > 2n$. To avoid excessive noise, it is recommended that $T > 5n$, so that we have at least five observations per dimension.
The MCD estimator is computationally expensive, as it requires the evaluation of $\binom{T}{r}$ subsets of size $r$, and for this reason we use the Fast Minimum Covariance Determinant estimator (FMCD) of Rousseeuw and Van Driessen (1999). A major component of the FMCD algorithm is the concentration step (C-step), which works as follows. Given the initial estimates $\hat{\mu}_{old}$ and $\hat{\Sigma}_{old}$:
● Compute the distances $d_{old}(t) = D(x_t, \hat{\mu}_{old}, \hat{\Sigma}_{old})$, $t = 1, \ldots, T$.
● Sort these distances and yield a permutation $\tau$ such that:
In Theorem 1 of Rousseeuw and Van Driessen (1999) it is proved that $\det(\hat{\Sigma}_{new}) \le \det(\hat{\Sigma}_{old})$, with equality only if $\hat{\Sigma}_{new} = \hat{\Sigma}_{old}$. Thus, if we iterate the C-step, the sequence of determinants obtained in this way converges in a finite number of iterations. The FMCD algorithm supplies a sequence of weights of length $T$, each equal to one or zero (zero for the outliers), and we repeat this sequence over $n$ columns to obtain the matrix $H_{MCD}$ with dimension $T \times n$. We multiply the data matrix $X$ element-wise by $H_{MCD}$:
$$H_{MCD} \odot X = X_{MCD}, \tag{11}$$
where $\odot$ denotes the Hadamard (element-wise) product. We use $X_{MCD}$ to extract the factors $F_{k,t}$ (Croux et al., 2003; Pison et al., 2003), as described in Section 3.1, and obtain the model FM FMCD. In this paper we set $r = 0.95T$.
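The concentration step and the weighting in Eq. (11) can be illustrated with the simplified sketch below. It is not the full FMCD algorithm of Rousseeuw and Van Driessen (1999), which draws many initial subsets and applies further refinements; here, as an assumption made for brevity, the iteration starts once from the classical mean and covariance. All function names and the simulated data are hypothetical.

```python
import numpy as np

def mahalanobis(X, mu, sigma):
    """Distance of each row of X from mu under scatter sigma, as in Eq. (7)."""
    diff = X - mu
    sol = np.linalg.solve(sigma, diff.T).T           # rows of diff times sigma^{-1}
    return np.sqrt(np.einsum("ij,ij->i", diff, sol))

def c_step(X, mu_old, sigma_old, r):
    """One C-step: keep the r observations with the smallest distances, re-estimate."""
    d_old = mahalanobis(X, mu_old, sigma_old)
    keep = np.argsort(d_old)[:r]
    mu_new = X[keep].mean(axis=0)
    sigma_new = np.cov(X[keep], rowvar=False, bias=True)
    return mu_new, sigma_new, keep

def concentrate(X, r, max_iter=100):
    """Iterate C-steps from the classical estimates until the kept subset is stable."""
    mu, sigma = X.mean(axis=0), np.cov(X, rowvar=False, bias=True)
    keep_old = None
    for _ in range(max_iter):
        mu, sigma, keep = c_step(X, mu, sigma, r)
        keep_set = set(keep.tolist())
        if keep_set == keep_old:
            break
        keep_old = keep_set
    weights = np.zeros(X.shape[0])                   # zero for discarded observations
    weights[keep] = 1.0
    return mu, sigma, weights

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                    # simulated data, illustration only
X[0] = [100.0, 100.0, 100.0]                         # one gross outlier
r = round(0.95 * X.shape[0])
mu_mcd, sigma_mcd, w = concentrate(X, r)
H_mcd = np.repeat(w[:, None], X.shape[1], axis=1)    # T x n matrix of repeated weights
X_mcd = H_mcd * X                                    # Hadamard product, Eq. (11)
```

In this example the row holding the gross outlier receives weight zero, so it is zeroed out in `X_mcd` before the factors are extracted.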
3.3. A robust factor model: the iterated reweighted least squares estimator
The Maximum-Likelihood-Type Estimator (M-Estimator) is another popular robust method for estimating the location and scale of a set of points, and its application leads to the Iterated Reweighted Least Squares (IRLS) (Bergstrom et al., 2014; Daubechies et al., 2009). Define the residual as follows:
$$\varepsilon_t = \| x_t - A_k F_{k,t} \|_2, \tag{12}$$
where $A_k$ and $F_{k,t}$ have been defined in Section 3.1, and $\| \cdot \|_2$ is the Euclidean norm for vectors. IRLS assumes continuous weights as a function of the residual:
$$w_t = \frac{\rho(\varepsilon_t)}{\varepsilon_t^2}, \tag{13}$$
for some given robust loss function $\rho(\cdot)$ from the set of the reals to the positive reals. The objective function then becomes:
$$\sum_{t=1}^{T} \rho(\varepsilon_t). \tag{14}$$
Many loss functions have been proposed in the statistics literature (Huber 1981; Barnett and Lewis 1984). When $\rho(\varepsilon_t) = \varepsilon_t^2$, all weights are equal to 1, and we obtain the standard least-squares solution, which is not robust. Other robust loss functions are described in Vidal et al. (2016); in this work we use a Geman-McClure loss (Geman and McClure, 1987):
$$\rho(\varepsilon_t) = \frac{\varepsilon_t^2}{\varepsilon_0^2 + \varepsilon_t^2}, \tag{15}$$
where $\varepsilon_0^2$ is a parameter that we set equal to the square root of the mean of $\varepsilon_t^2$. Following De la Torre and Black (2004), we use a Geman-McClure loss scaled by $\varepsilon_0^2$, which yields the following procedure. Given an initial parameter $\varepsilon_0^2$ and the factor loadings and factors, $A_k$ and $F_{k,t}$, respectively, obtained from the FM Std of Section 3.1, iterate the following steps until convergence:
1. Compute the residuals $\varepsilon_t = \| x_t - A_k F_{k,t} \|_2$.
2. Compute the weights $w_t = \dfrac{\varepsilon_0^2}{\varepsilon_0^2 + \varepsilon_t^2}$.
3. Estimate the covariance $\Sigma \leftarrow \dfrac{\sum_{t=1}^{T} w_t x_t x_t'}{\sum_{t=1}^{T} w_t}$.
4. Extract the $k$ largest eigenvalues of $\Sigma$ and collect the corresponding eigenvectors in $A_k$.
The factor matrix $F_{k,t}$ obtained as above is called FM IRLS.
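The four steps above can be sketched in a few lines of NumPy. This is an illustrative reading of the procedure rather than the authors' implementation. Three points are assumptions: $\varepsilon_0^2$ is computed once from the initial FM Std residuals, the factors are recomputed as $F_{k,t} = A_k' x_t$ after each update of $A_k$, and convergence is declared when the weights stop changing. The names `fm_irls`, `residuals` and `top_k_eigvecs` are hypothetical.

```python
import numpy as np

def top_k_eigvecs(S, k):
    """Eigenvectors of the symmetric matrix S for its k largest eigenvalues."""
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, np.argsort(vals)[::-1][:k]]

def residuals(X, A_k):
    """eps_t = ||x_t - A_k F_{k,t}||_2 with F_{k,t} = A_k' x_t (requires k < n)."""
    return np.linalg.norm(X - (X @ A_k) @ A_k.T, axis=1)

def fm_irls(X, k, max_iter=200, tol=1e-10):
    T, n = X.shape
    A_k = top_k_eigvecs(X.T @ X / T, k)                    # FM Std initialisation
    # Assumption: eps_0^2 is fixed at the square root of the mean squared residual.
    eps0_sq = np.sqrt(np.mean(residuals(X, A_k) ** 2))
    w = np.ones(T)
    for _ in range(max_iter):
        eps = residuals(X, A_k)                            # step 1
        w_new = eps0_sq / (eps0_sq + eps ** 2)             # step 2
        sigma = (X * w_new[:, None]).T @ X / w_new.sum()   # step 3
        A_k = top_k_eigvecs(sigma, k)                      # step 4
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return A_k, X @ A_k, w

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 5))                          # simulated data, illustration only
X = (X - X.mean(axis=0)) / X.std(axis=0)
A_irls, F_irls, w_irls = fm_irls(X, k=2)
```

Unlike the zero-one weights of the FMCD, the weights here are continuous and lie in the interval (0, 1], so no observation is discarded entirely.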
3.4. A forecasting model
Once the factors are extracted, a forecasting procedure is needed to predict the variables of interest. We assume that the first $k$ latent factors (determined with the three methodologies described above), $F_{k,t} = (f_{1,t}, \ldots, f_{k,t})'$, with $k < n$, follow a VAR model. Using only $k$ factors, the reconstruction of the variables derives from the approximated model:
$$X_{k,t} = A_k F_{k,t}. \tag{16}$$
By $X_{k,t}$ we mean the approximation of the vector $X_t$ obtained using the first $k$ factors $F_{k,t}$. Considering the dynamic part related to the $k$ factors, our model is thus as follows:
$$X_{k,t} = A_k F_{k,t}, \tag{17}$$
$$F_{k,t} = c_k + \Phi_k F_{k,t-1} + \varepsilon_{k,t}, \qquad \varepsilon_{k,t} \sim WN(0, \Sigma_k), \tag{18}$$
where $c_k$ has dimension $k \times 1$ and $\Phi_k$ has dimension $k \times k$. As shown in Billio et al. (2022), under the VAR assumption for the factors, the variables of interest $X_{k,t}$ follow a VAR model with restrictions. Thus, the conditional forecasts $X_{k,t+h}$ at the horizon $h$, $h = 1, \ldots, H$, are obtained as follows:
$$X_{k,t+h|t} = A_k \hat{F}_{k,t+h|t}, \tag{19}$$
where:
$$\hat{F}_{k,t+h|t} = E[F_{k,t+h} \mid X_1, \ldots, X_t] = c_k + \Phi_k \hat{F}_{k,t+h-1|t}. \tag{20}$$
To summarize, we first estimate the latent factors and then use a VAR model on the factors to forecast both the factors and the variables of interest. Afterwards, the forecasts are converted back to their original units by reversing the normalization and, for the variables that were previously differenced, by integrating the forecasts.
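A minimal sketch of the forecasting recursion in Eqs. (18)-(20) is given below. The text does not state how the VAR is estimated or which lag order is used in practice; equation-by-equation ordinary least squares on a first-order VAR is assumed here for illustration, and all function names are hypothetical.

```python
import numpy as np

def fit_var1(F):
    """OLS estimates of c_k and Phi_k in F_t = c_k + Phi_k F_{t-1} + e_t; F is (T, k)."""
    Y = F[1:]
    Z = np.hstack([np.ones((F.shape[0] - 1, 1)), F[:-1]])  # intercept and lagged factors
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)              # B has shape (k + 1, k)
    return B[0], B[1:].T                                   # c_k (k,), Phi_k (k, k)

def forecast_factors(F_last, c, Phi, H):
    """Iterate Eq. (20) for h = 1, ..., H, starting from the last observed factors."""
    path, f = [], np.asarray(F_last, dtype=float)
    for _ in range(H):
        f = c + Phi @ f
        path.append(f)
    return np.array(path)                                  # shape (H, k)

def forecast_variables(A_k, F_hat):
    """Eq. (19): row h holds (A_k F_hat_{t+h|t})'."""
    return F_hat @ A_k.T

# Tiny worked example with known parameters.
c_demo = np.array([1.0, 0.0])
Phi_demo = 0.5 * np.eye(2)
F_hat_demo = forecast_factors([2.0, 2.0], c_demo, Phi_demo, H=2)
```

With these values the two-step path is `[[2.0, 1.0], [2.0, 0.5]]`, which can be checked by hand from Eq. (20).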
4. Empirical applications
4.1. Data description
We consider a dataset of macroeconomic variables related to the US and the EU economies, provided by Bloomberg. It consists of 42 monthly variables and 2 quarterly variables, sampled from December 2001 to January 2021, and includes some key variables for policy making: core and headline prices, labour market variables, imports, exports, industrial production, consumption, sales, leading indicators of interest rates, and the term structure. See Table 1 for a more detailed description.
Table 1.
Macroeconomic variables for two major geographical regions, the US and EU, sampled either at monthly or quarterly frequency from December 2001 to January 2021.
| N | C | Definition | L | MU | F |
|---|---|---|---|---|---|
| 1 | US | Export | Ex | m/m | M |
| 2 | US | Import | Im | m/m | M |
| 3 | US | Unemployment rate | UR | % | M |
| 4 | US | Employment (Agricultural sector) | EA | thousands | M |
| 5 | US | Employment (Private sector) | EP | thousands | M |
| 6 | US | Average hourly wages | Ahw | m/m | M |
| 7 | US | PCE | PCE | y/y | M |
| 8 | US | PCE core | PCEc | y/y | M |
| 9 | US | PPI | PPI | y/y | M |
| 10 | US | Industrial Production | IP | y/y | M |
| 11 | US | Industrial Orders | IO | m/m | M |
| 12 | US | Durable goods orders | Dgo | m/m | M |
| 13 | US | Durable goods orders excluding transport | Dgoet | m/m | M |
| 14 | US | Stocks | S | m/m | M |
| 15 | US | Use of production capacity | Upc | % | M |
| 16 | US | ISM manufacturing | ISMm | level | M |
| 17 | US | Start of new construction sites | Snc | m/m | M |
| 18 | US | Constructions expenditure | Cse | m/m | M |
| 19 | US | Existing homes sale | Ehs | m/m | M |
| 20 | US | New homes sale | Nhs | m/m | M |
| 21 | US | Expenditure (real) | Er | m/m | M |
| 22 | US | Income (real) | Ir | m/m | M |
| 23 | US | Retail sales | RS | m/m | M |
| 24 | US | Conference Board | CB | level | M |
| 25 | US | Michigan Consumer Sentiment Index | MCSI | level | M |
| 26 | US | RUS10 (Int.Rate Gov.Bond 10Y US) | RUS10 | Yield | M |
| 27 | US | DeltaRUS72 (=RUS7-RUS2) | DRUS72 | Yield | M |
| 28 | EU | Export | Ex | m/m | M |
| 29 | EU | Import | Im | m/m | M |
| 30 | EU | Unemployment rate | UR | % | M |
| 31 | EU | HCPI | HCPI | y/y | M |
| 32 | EU | CPI core | CPIc | y/y | M |
| 33 | EU | PPI | PPI | y/y | M |
| 34 | EU | Industrial Production | IP | y/y | M |
| 35 | EU | Constructions expenditure | Cse | m/m | M |
| 36 | EU | PMI manufacturing index | PMImI | level | M |
| 37 | EU | ESI | ESI | level | M |
| 38 | EU | Leading indicator | LeIn | level | M |
| 39 | EU | Retail Sales | RS | y/y | M |
| 40 | EU | REMU10 (Int.Rate Gov.Bond 10Y EU) | REMU10 | Yield | M |
| 41 | EU | DeltaREMU72 (=REMU7-REMU2) | DREMU72 | Yield | M |
| 42 | EU/US | CEUUS | CEUUS | Ratio €/$ | M |
| 43 | US | Gross Domestic Product | GDPUS | q/q | Q |
| 44 | EU | Gross Domestic Product | GDPEU | q/q | Q |
Note: In the columns, the series: number (N), country (C), description (Definition), label (L), measure unit (MU), and frequency (F) that is quarterly (Q) or monthly (M).
In our dataset, the 2009 financial crisis and the COVID-19 pandemic generated outliers in many time series. For example, a graphical inspection of the US unemployment rate series reveals the dramatic impact of the COVID-19 pandemic after March 2020 (Figure 1).
Figure 1.
The US Unemployment rate (in percentage) from December 2001 to January 2021.
In the presence of outliers, the researcher can choose to trim the data, that is, to reduce or eliminate the outliers, and then run inference on a linear model, thereby overcoming the inference issues (such as bias) that the outliers can generate. Data trimming requires that the outliers be detected first. To detect the presence of outliers, a standard procedure consists in fitting the linear regression model
$$Y = X\beta + \varepsilon \tag{21}$$
by least squares and recovering the hat matrix $H$ from the fitted value of $Y$:
$$\hat{Y} = X\hat{\beta} = X(X'X)^{-1}X'Y = HY. \tag{22}$$
The hat matrix $H$ is a symmetric and idempotent $T \times T$ projection matrix; it has $n$ eigenvalues equal to one and $T - n$ equal to zero. The diagonal elements $h_{t,t}$ have the following property:
$$0 \le h_{t,t} \le 1, \qquad t = 1, \ldots, T. \tag{23}$$
Points where $h_{t,t}$ takes large values are called leverage points, and it can be proved that the presence of leverage points signals observations that might have a decisive influence on the estimation of the regression parameters. We use the leverage points as a quick proxy for the presence of outliers. Figure 2 shows that the largest values of $h_{t,t}$ are detected for the 2009 crisis and the COVID-19 pandemic.
Figure 2.
Outliers' detection. The Hat matrix diagonal values for the US Unemployment rate from December 2001 to January 2021.
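The leverage values $h_{t,t}$ in Eqs. (22)-(23) can be computed without forming the full $T \times T$ hat matrix. The sketch below uses a thin QR factorization, for which $H = QQ'$ when $X$ has full column rank; this computational route is a choice made here for illustration and is not described in the text. The function name `leverage` and the simulated regressors are hypothetical.

```python
import numpy as np

def leverage(X):
    """Diagonal of H = X (X'X)^{-1} X' for a (T, n) matrix X of full column rank."""
    Q, _ = np.linalg.qr(X)             # reduced QR: Q is (T, n) with orthonormal columns
    return np.sum(Q ** 2, axis=1)      # h_tt is the squared norm of the t-th row of Q

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))       # simulated regressors, illustration only
X[10] = [20.0, 20.0, 20.0]             # one extreme observation
h = leverage(X)
```

The values sum to $n$, the trace of $H$, and the extreme observation in row 10 has by far the largest leverage.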
The factors have been extracted from the monthly variables using the three FM methodologies (FM Std, FM FMCD, and FM IRLS) and are then forecast using a VAR model as described in Section 3.4. For the quarterly variables, we follow a nowcasting procedure: first, we derive the regression coefficients of the quarterly variables on the nowcasted factors; second, we use these coefficients and the forecasted factors to forecast the quarterly variables.
We analyze the stability of the factors and the percentage of explained variance. We follow a rolling window estimation approach and analyze the out-of-sample forecasting ability of the FM over a twelve-month horizon. There are 61 overlapping windows of 170 observations each. The first window runs from December 2001 to January 2016, the second is shifted forward by one month and runs from January 2002 to February 2016, and the 61st runs from December 2006 to January 2021. See Figure 3 for a graphical illustration of the procedure.
Figure 3.
Rolling windows. The window size is 170 observations. We consider 61 overlapping windows. The first window is from December 2001 to January 2016 (second line), the second window is from January 2002 to February 2016 (third line), and the last is from December 2006 to January 2021 (last line). The green segments identify the data used to produce the factors and the forecast. The forecast horizon is twelve months and is identified by the orange segments. The red segments indicate the 2009 crisis and the COVID-19 pandemic.
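The window arithmetic can be verified with a short script: 230 monthly observations between December 2001 and January 2021 and a window of 170 observations give 230 - 170 + 1 = 61 windows. The sketch below uses plain Python; the helper names are hypothetical, and the rule that the forecast horizon shrinks to the number of months left in the sample is an assumption about how the truncation near January 2021 is handled.

```python
def month_range(start, end):
    """All (year, month) pairs from start to end, both inclusive."""
    (y, m), out = start, []
    while (y, m) <= end:
        out.append((y, m))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out

def rolling_windows(dates, size):
    """Overlapping windows of `size` consecutive dates, shifted by one month."""
    return [dates[i:i + size] for i in range(len(dates) - size + 1)]

def horizons(dates, windows, max_h=12):
    """Out-of-sample months available after each window, capped at max_h."""
    last = len(dates) - 1
    return [min(max_h, last - dates.index(w[-1])) for w in windows]

dates = month_range((2001, 12), (2021, 1))
windows = rolling_windows(dates, size=170)
s = horizons(dates, windows)
print(len(dates), len(windows), windows[0][0], windows[0][-1], windows[-1][0], windows[-1][-1])
```

The first window ends in January 2016 and the last one starts in December 2006, in agreement with the dates given in the text.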
In our empirical applications, the $k$ factors are used to forecast the variables of interest, which are the Unemployment rate and the Harmonized Index of Consumer Prices (HCPI); with the nowcasting procedure, we also produce forecasts for GDP. We choose to explain a given proportion of variance $V_k < 1$ in Eq. (6) with a reduced number of factors in order to limit the dimension of the forecasting model (i.e., the VAR model). For example, in our application we choose to explain at least 80% of the variance, $V_k \ge 0.8$, with no more than 9 factors.
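The selection rule for $k$ can be written compactly as below. The sketch picks the smallest $k$ whose cumulative share $V_k$ reaches the target and then caps it at the maximum number of factors. How the two requirements are reconciled when 9 factors explain less than 80% is not stated in the text, so letting the cap take precedence is an assumption; the function name `choose_k` is hypothetical.

```python
import numpy as np

def choose_k(eigvals, target=0.8, k_max=9):
    """Smallest k with V_k >= target, capped at k_max; also returns all shares V_1..V_n."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # decreasing order
    share = np.cumsum(lam) / lam.sum()                       # V_k for k = 1, ..., n
    k = int(np.searchsorted(share, target)) + 1              # first index with share >= target
    return min(k, k_max), share

k_demo, share_demo = choose_k([4.0, 3.0, 2.0, 1.0], target=0.8)
```

For the eigenvalues 4, 3, 2, 1 the cumulative shares are 0.4, 0.7, 0.9 and 1.0, so three factors are needed to explain at least 80% of the variance.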
Figure 4 reports the values of the leverage $h_{t,t}$ estimated from the panel series in some relevant windows (see plot labels). The value of $h_{t,t}$ increases slowly in the observation windows where the 2009 crisis is included (e.g., see plots w1, w19, w31, and w43). When the observations associated with the COVID-19 pandemic period are included in the samples, $h_{t,t}$ reaches much larger values, about 1 (see w51, w52, w53, and w58). A double peak appears in window w61 due to the second wave of the COVID-19 pandemic.
Figure 4.
Leverage points $h_{t,t}$ for some relevant windows.
In conclusion, we consider COVID-19 the main source of outliers that researchers are facing. For this reason, we set $r = 0.95T$ as the number of observations retained by the FMCD algorithm.
Figure 5 shows the eigenvalues (left column) and the contribution of the first 9 factors to the variance (right column) for the three FMs. The graphs refer to the data of the last window in Figure 3. The scale of the eigenvalues differs across models, since the weights used in FMCD and IRLS have different sizes. The decay rate of the spectrum is similar across models, which indicates that a small number of factors explains a large proportion of the variance. The FM FMCD and FM IRLS models capture a smaller proportion of variance than the FM Std.
Figure 5.
Eigenvalues (left) and contribution of the factors (right) for the FM Std (top), FM FMCD (middle) and FM IRLS (bottom).
The green lines in Figure 6 show the weights used in the FMCD (left) and IRLS (right). Setting $r = 0.95T$ yields a weight equal to one for all windows except for the COVID-19 pandemic windows, where the weight is equal to zero. For the IRLS, the weights are strictly positive and below one for all estimation windows. The two weight sequences have a different impact on the extraction of the factors (e.g., see the first factor in the same figure and the three factors in Figure 7).
Figure 6.
Factor 1 extracted with the FM Std (magenta line), the FM FMCD (left, blue line) and FM IRLS (right, blue line), and the weights used (green line). Due to the configuration of the weights, the amplitude of the peak associated with the pandemic (see top plot in Figure 7) is reduced in the FM FMCD model, whereas both peaks are largely reduced in the IRLS model.
Figure 7.
The first three factors (blue line), their out-of-sample forecasts (magenta solid lines) with the confidence bands (magenta dashed lines), for the FM Std (top), FM FMCD (middle) and FM IRLS (bottom).
The FM Std model factors exhibit at least 2 peaks corresponding to the 2009 crisis and COVID-19 pandemic windows. In the robust FM procedures, the weight sequences substantially reduce the effects of the two sources of outliers.
In Table 2, we illustrate the bias issues induced by the presence of outliers by comparing the correlation between the variables in the dataset (columns) and the first factor of the three models (rows), estimated in the last window (w61). The first factor in the three models explains 26%, 23%, and 20% of the variance, respectively. The set of the most correlated variables in the standard FM model differs from that of the FM FMCD and FM IRLS models, which indicates that the bias in the estimation of the factor can be large in the FM models if the outliers are not treated properly.
Table 2.
The 10 most correlated variables with the first factors for the three methodologies in the 61st (December 2006–January 2021) window.
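FM Std: Correlations between Factors 1 and Variables.
| Country | USA | USA | EU | USA | EU | USA | USA | EU | USA | EU |
|---|---|---|---|---|---|---|---|---|---|---|
| Indicator | EP | EA | Ex | Ex | RS | Ahw | Er | Im | IP | LeIn |
| Measure | thousands | thousands | m/m | m/m | y/y | m/m | m/m | m/m | y/y | level |
| Factor 1 w61 | 0.835 | 0.835 | 0.798 | 0.785 | 0.726 | –0.724 | 0.721 | 0.708 | 0.686 | 0.669 |
FM FMCD: Correlations between Factors 1 and Variables.
| Country | USA | EU | EU | EU | EU | USA | USA | USA | USA | EU |
|---|---|---|---|---|---|---|---|---|---|---|
| Indicator | PPI | PMImI | ESI | LeIn | IP | Stocks | PCE | ISMm | Upc | PPI |
| Measure | y/y | level | level | level | y/y | m/m | y/y | level | % | y/y |
| Factor 1 w61 | –0.847 | –0.762 | –0.759 | –0.730 | –0.728 | –0.708 | –0.707 | –0.680 | –0.668 | –0.600 |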
FM IRLS: Correlations between Factors 1 and Variables.
FM FMCD and FM IRLS share 9 common variables, and the correlation levels are similar in the two models; the result indicates that the choice of the weights can have an impact on the results, but the economic interpretation of the factors is not affected too much.
4.3. Forecast comparison
We use the rolling window analysis introduced in the previous section for comparing the three models: FM Std, FM FMCD and FM IRLS. For each window, the models produce 12 forecasts out of sample (see Figure 3). We measure and compare sequentially the ability of the models to forecast the following variables: GDP, Unemployment rate, as well as PCE for both the EU and the US regions.
For every window, we determine the factors and compute the forecast at the horizon of 12 months for the monthly variables and 4 quarters for the quarterly variables. The rolling window of 170 observations is moved forward by one month, and the forecasts are computed again. We repeat this exercise 61 times until the end date of the observation window coincides with January 2021.
For every series, we compute the square of the difference between the forecast and the actual values, sum the squared differences, divide the sum by the total number of forecast points, and take the square root to obtain the Root Mean Square Error (RMSE). Let $s_t$ be the forecast horizon at time $t$ for monthly data. In our application, it is equal to 12 for all $t$ except when the end of the window is close to January 2021, when the horizon decreases. Moreover, let $T_{Ep} = 61$ be the number of forecasts for each one of the $s_t$ months.
At time t, we have the following error for every forecast (we omit here the identification of the variable):
$$e(t) = \sum_{i=1}^{s_t} \left[ f(t+i) - v(t+i) \right]^2 \tag{24}$$
where f(t+i) indicates the forecast for the variable v(t+i). The forecast is made at time t with forecasting horizon i. The RMSE is thus defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{\sum_{t=1}^{T_{Ep}} s_t} \sum_{j=1}^{T_{Ep}} e(j)} \tag{25}$$
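As a minimal sketch of Eqs. (24) and (25), the pooled RMSE can be computed as follows. The function names `window_error` and `pooled_rmse` and the toy numbers are hypothetical and serve only to illustrate the arithmetic; they are not taken from the paper's data.

```python
import math


def window_error(forecasts, actuals):
    """Eq. (24): sum of squared forecast errors over the steps of one window."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    return sum((f - v) ** 2 for f, v in zip(forecasts, actuals))


def pooled_rmse(windows):
    """Eq. (25): square root of the summed window errors divided by the
    total number of forecast points across all windows.

    windows is a list of (forecasts, actuals) pairs, one per rolling window.
    """
    total_points = sum(len(forecasts) for forecasts, _ in windows)
    if total_points == 0:
        raise ValueError("no forecast points available")
    total_error = sum(window_error(f, v) for f, v in windows)
    return math.sqrt(total_error / total_points)


# Toy example with two windows of different horizons.
toy_windows = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),  # 3 steps, squared errors sum to 4
    ([2.0, 2.0], [0.0, 2.0]),            # 2 steps, squared errors sum to 4
]
toy_rmse = pooled_rmse(toy_windows)      # sqrt(8 / 5)
```

Note that the denominator is the total number of forecast points, so windows with a truncated horizon contribute fewer terms to both the numerator and the denominator.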
As a first step, we show the RMSE value for the first three factors, which explain more than 52.5%, 47.8% and 46.7% of the total variance for FM Std, FM FMCD and FM IRLS, respectively.
The left column of Figure 8 shows the actual values of the three factors (solid blue lines, in the rows), the 12-step-ahead FM forecasts (dashed lines), and their envelope (solid red lines), which can be considered an approximation of the forecasting error bands. The forecast comparison includes the COVID-19 pandemic period but cannot cover the 2009 crisis, because of the chosen rolling window size (see Figure 3). Thus, in the following we focus on the forecasting ability during the COVID-19 period.
Figure 8.
In the rows, the first three factors for FM Std (left), FM FMCD (middle) and FM IRLS (right). In each plot, the actual value of the factor (blue solid), the forecasts (red dashed) and the forecast envelope (red solid).
For the first factor, the actual values lie within the envelope region in all periods except the pandemic crisis, which reveals the difficulty of predicting the effects of the pandemic. A similar behavior can be detected for the second factor. The middle and right columns in Figure 8 show the first three factors for the FMCD and IRLS methodologies, respectively; they can be compared only graphically, because the two algorithms produce factors on different scales. Figure 9 shows the forecasts and the RMSEs for our variables of interest and for the three methodologies: FM Std (left column), FM FMCD (middle) and FM IRLS (right).
Figure 9.
In the rows, the variables forecasted with the FM Std (left), FM FMCD (middle) and FM IRLS (right) models. In each plot, the actual value of the variable (blue solid), the forecasts (red dashed) and the forecast envelope (red solid).
Using the envelope (solid red lines) as a reference, the forecast performance of the models can be compared graphically. Since the actual data lie within the area delimited by the envelope of the FM FMCD and FM IRLS models, we conclude that they usually perform better than FM Std. The lower RMSE levels of the FM FMCD and FM IRLS models for both the monthly and quarterly variables confirm this result (see panel (a) in Table 3).
Table 3.
Root Mean Square Error (RMSE) for the variables of interest (rows) following different forecasting models (columns).
The effect of outliers on the most impacted variables propagates to the forecasts of the other variables through the factors, which can explain the poor performance of the standard FM model. The variables of interest that are the most difficult to predict are the GDPs and the Unemployment rate, whose values have been most affected by the crisis. On the other hand, prices maintain good predictability in both regions because this variable was not heavily penalized by the crisis. The FM FMCD and FM IRLS models can reduce the impact of outliers. Nevertheless, for the US Unemployment Rate, the effects of COVID-19 on forecasting performance have been disruptive for all three methodologies.
In Table 3(b) we can see that the RMSE of the 12-month-ahead forecast for the USA Unemployment Rate is smaller than those of the forecasts at 5 and 7 months ahead. This is mainly due to the magnitude of the error of the forecast made in April 2020. This forecast exercise includes in its horizon the first observations affected by COVID-19 and has a very large forecast error. Since the dataset ends in January 2021, the forecast horizon $s_t$ in this exercise is 9 months (see Figure 3), which implies that for this forecast the errors can be measured at 1, 5 and 7 months but not at 12.
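The mechanism behind this result can be illustrated with a toy computation. The sketch below assumes that the RMSE at horizon h is computed only over the windows whose horizon reaches h, which is our reading of the explanation above rather than a formula stated in the paper. The function name `rmse_at_horizon` and all numbers are hypothetical.

```python
import math


def rmse_at_horizon(windows, h):
    """RMSE of the h-step-ahead forecasts, using only the windows whose
    horizon is at least h (assumed definition, see the lead-in)."""
    squared = [
        (forecasts[h - 1] - actuals[h - 1]) ** 2
        for forecasts, actuals in windows
        if len(forecasts) >= h
    ]
    if not squared:
        raise ValueError("no window reaches this horizon")
    return math.sqrt(sum(squared) / len(squared))


# Toy data: a calm window with a full 3-step horizon and small errors,
# and a shock window truncated to 2 steps with very large errors.
calm = ([1.1, 1.1, 1.1], [1.0, 1.0, 1.0])
shock = ([1.0, 1.0], [6.0, 6.0])
toy = [calm, shock]

rmse_h2 = rmse_at_horizon(toy, 2)  # includes the shock window
rmse_h3 = rmse_at_horizon(toy, 3)  # the shock window drops out
```

In this toy case the RMSE at the longest horizon is much smaller than at the shorter ones, simply because the truncated shock window cannot contribute to it.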
Finally, following the guidelines provided by Eurostat (2020) on modelling outliers due to COVID-19, we monitor the forecasting errors sequentially. The RMSEs for the one-, two-, seven- and twelve-step-ahead forecasts of the three methodologies indicate that the FM FMCD and FM IRLS models perform better than the FM Std model at all horizons (see Figures 10, 11 and 12 in Appendix). The numerical results in panel (b) of Table 3 suggest that the FM IRLS model has the best forecasting ability at most horizons.
The bottom line from this section is the following:
● the sample observations during the 2009 crisis and the 2020 COVID-19 pandemic heavily affect factor estimates obtained with the standard procedure;
● consequently, standard factor models can produce significant forecasting errors in the presence of outliers, whereas robust models perform better;
● the variables most impacted by the 2009 crisis and the pandemic (such as GDP and unemployment) exhibit the most significant forecast errors in all estimation procedures;
● the sequential forecasting comparison between MCD and IRLS showed that the latter approach usually leads to superior forecasting performances.
5. Conclusions
Outliers can have disruptive effects on inference, biasing the estimates and the conclusions of the statistical analysis. Through the lens of factor models, we provide evidence of the effects of outliers due to the 2009 crisis and the COVID-19 pandemic on the forecasting ability of the models. We applied two techniques for robust factor estimation based on robust covariance matrix estimators. The robust methodologies that we chose have the advantage of avoiding data deletion or manipulation. We compare the standard factor estimation with the robust estimation approaches over an extended period and on a set of relevant variables. Including the COVID-19 pandemic period in the estimation and forecasting exercises is meant to highlight the relevance of handling outliers in periods of large shocks to the world's economies. We show that robust estimation can reduce the influence of outliers and produce good forecasts.
Conflict of interest
All authors declare no conflicts of interest in this paper.
Table 1.
Macroeconomic variables for two major geographical regions, the US and EU, sampled either at monthly or quarterly frequency from December 2001 to January 2021.
| N | C | Definition | L | MU | F |
|---|---|---|---|---|---|
| 1 | US | Export | Ex | m/m | M |
| 2 | US | Import | Im | m/m | M |
| 3 | US | Unemployment rate | UR | % | M |
| 4 | US | Employment (Agricultural sector) | EA | thousands | M |
| 5 | US | Employment (Private sector) | EP | thousands | M |
| 6 | US | Average hourly wages | Ahw | m/m | M |
| 7 | US | PCE | PCE | y/y | M |
| 8 | US | PCE core | PCEc | y/y | M |
| 9 | US | PPI | PPI | y/y | M |
| 10 | US | Industrial Production | IP | y/y | M |
| 11 | US | Industrial Orders | IO | m/m | M |
| 12 | US | Durable goods orders | Dgo | m/m | M |
| 13 | US | Durable goods orders excluding transport | Dgoet | m/m | M |
| 14 | US | Stocks | S | m/m | M |
| 15 | US | Use of production capacity | Upc | % | M |
| 16 | US | ISM manufacturing | ISMm | level | M |
| 17 | US | Start of new construction sites | Snc | m/m | M |
| 18 | US | Constructions expenditure | Cse | m/m | M |
| 19 | US | Existing homes sale | Ehs | m/m | M |
| 20 | US | New homes sale | Nhs | m/m | M |
| 21 | US | Expenditure (real) | Er | m/m | M |
| 22 | US | Income (real) | Ir | m/m | M |
| 23 | US | Retail sales | RS | m/m | M |
| 24 | US | Conference Board | CB | level | M |
| 25 | US | Michigan Consumer Sentiment Index | MCSI | level | M |
| 26 | US | RUS10 (Int.Rate Gov.Bond 10Y US) | RUS10 | Yield | M |
| 27 | US | DeltaRUS72 (=RUS7-RUS2) | DRUS72 | Yield | M |
| 28 | EU | Export | Ex | m/m | M |
| 29 | EU | Import | Im | m/m | M |
| 30 | EU | Unemployment rate | UR | % | M |
| 31 | EU | HCPI | HCPI | y/y | M |
| 32 | EU | CPI core | CPIc | y/y | M |
| 33 | EU | PPI | PPI | y/y | M |
| 34 | EU | Industrial Production | IP | y/y | M |
| 35 | EU | Constructions expenditure | Cse | m/m | M |
| 36 | EU | PMI manufacturing index | PMImI | level | M |
| 37 | EU | ESI | ESI | level | M |
| 38 | EU | Leading indicator | LeIn | level | M |
| 39 | EU | Retail Sales | RS | y/y | M |
| 40 | EU | REMU10 (Int.Rate Gov.Bond 10Y EU) | REMU10 | Yield | M |
| 41 | EU | DeltaREMU72 (=REMU7-REMU2) | DREMU72 | Yield | M |
| 42 | EU/US | CEUUS | CEUUS | Ratio €/$ | M |
| 43 | US | Gross Domestic Product | GDPUS | q/q | Q |
| 44 | EU | Gross Domestic Product | GDPEU | q/q | Q |
Note: In the columns, the series: number (N), country (C), description (Definition), label (L), measure unit (MU), and frequency (F) that is quarterly (Q) or monthly (M).
FM Std: Correlations between Factors 1 and Variables.
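| Country | Indicator | Measure | Factor 1 w61 |
|---|---|---|---|
| USA | EP | thousands | 0.835 |
| USA | EA | thousands | 0.835 |
| EU | Ex | m/m | 0.798 |
| USA | Ex | m/m | 0.785 |
| EU | RS | y/y | 0.726 |
| USA | Ahw | m/m | –0.724 |
| USA | Er | m/m | 0.721 |
| EU | Im | m/m | 0.708 |
| USA | IP | y/y | 0.686 |
| EU | LeIn | level | 0.669 |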
FM IRLS: Correlations between Factors 1 and Variables.
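| Country | Indicator | Measure | Factor 1 w61 |
|---|---|---|---|
| EU | LeIn | level | –0.877 |
| USA | PPI | y/y | –0.852 |
| EU | ESI | level | –0.840 |
| USA | Upc | % | –0.796 |
| EU | IP | y/y | –0.794 |
| EU | PPI | y/y | –0.722 |
| USA | PCE | y/y | –0.708 |
| USA | Stocks | m/m | –0.674 |
| USA | PCEc | y/y | –0.666 |
| EU | PMImI | level | –0.623 |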
Panel a. Cross-horizon overall RMSEs.
| RMSE | FM Std | FM FMCD | FM IRLS |
|---|---|---|---|
| USA GDP | 0.42 | 0.39 | 0.36 |
| USA Unemployment rate | 40.06 | 33.62 | 33.65 |
| USA PCE | 0.83 | 0.62 | 0.49 |
| EU GDP | 0.48 | 0.41 | 0.42 |
| EU Unemployment rate | 0.80 | 0.72 | 0.70 |
| EU HCPI | 0.98 | 0.81 | 0.80 |
Panel b. RMSEs at different horizons.
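| RMSE | Horizon | FM Std | FM FMCD | FM IRLS |
|---|---|---|---|---|
| USA Unempl. Rate | 1 month ahead | 3.54 | 3.11 | 3.11 |
| USA Unempl. Rate | 5 months ahead | 33.53 | 27.06 | 27.08 |
| USA Unempl. Rate | 7 months ahead | 56.97 | 47.31 | 47.35 |
| USA Unempl. Rate | 12 months ahead | 4.91 | 3.61 | 3.32 |
| USA PCE | 1 month ahead | 0.49 | 0.46 | 0.31 |
| USA PCE | 5 months ahead | 0.79 | 0.60 | 0.47 |
| USA PCE | 7 months ahead | 0.82 | 0.69 | 0.52 |
| USA PCE | 12 months ahead | 1.05 | 0.67 | 0.58 |
| EU Unempl. rate | 1 month ahead | 0.16 | 0.15 | 0.14 |
| EU Unempl. rate | 5 months ahead | 0.66 | 0.62 | 0.65 |
| EU Unempl. rate | 7 months ahead | 0.80 | 0.75 | 0.68 |
| EU Unempl. rate | 12 months ahead | 1.29 | 1.38 | 0.98 |
| EU HCPI | 1 month ahead | 0.51 | 0.49 | 0.45 |
| EU HCPI | 5 months ahead | 0.91 | 0.79 | 0.74 |
| EU HCPI | 7 months ahead | 1.00 | 0.87 | 0.82 |
| EU HCPI | 12 months ahead | 1.25 | 1.06 | 1.05 |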
Figure 1. Structure diagram of DBM with N hidden layers
Figure 2. Overall structure of DBM-PLS model
Figure 3. Model evaluation visualization results. (a) Comparison of RMSE of six models. (b) Comparison of MAE of six models. (c) Comparison of MSPE of six models. (d) Comparison of R2 of six models
Figure 4. Descent of RMSE in DBM-PLS learning process