Research article

Bayesian analysis of random effects panel interval-valued data models

  • Received: 11 March 2025 Revised: 07 May 2025 Accepted: 16 May 2025 Published: 23 May 2025
  • In the era of big data, interval-valued data are common in practice and can be used to describe the uncertainty of variables. In this paper, we introduce random effects panel interval-valued data models based on the center and range method and construct a Bayesian method for these models, covering both estimation and prediction. Simulation studies indicate that the proposed Bayesian method performs well. Finally, the proposed panel interval-valued data Bayesian models are applied to forecasting the Air Quality Index (AQI), and the evaluation on real data sets demonstrates the advantages and performance of the proposed models.

    Citation: Dengke Xu, Linlin Shen, Yuanyang Tangzhu, Shiqi Ke. Bayesian analysis of random effects panel interval-valued data models[J]. Electronic Research Archive, 2025, 33(5): 3210-3224. doi: 10.3934/era.2025141




    Nowadays, big data, artificial intelligence, 5G, and other technologies have become part of everyday life and have a huge impact on it. They have also driven exponential growth in the total amount of data. Faced with massive data, it becomes particularly important to obtain, process, mine, and integrate the required information quickly and accurately. Depending on the practical situation and purpose, data are no longer limited to precise point values but increasingly include uncertain data. Over time, many researchers in China and abroad have proposed interval numbers to describe uncertain data. Generally speaking, interval-valued data [1] arise for one of two reasons: (ⅰ) imprecise observation of a quantity, so that the measured value is recorded as an interval of possible values, and (ⅱ) information aggregation. The past three decades have witnessed enormous developments in statistical methods for interval-valued data; in particular, regression methods for interval-valued data have been extensively developed. For instance, Billard and Diday [2] proposed the center method (CM) for linear regression analysis of interval-valued data, assuming that the interval-valued data follow a uniform distribution. Billard and Diday [3] proposed the min-max method, which treats the lower and upper limits of the interval data as separate point values and fits a linear regression model to each of them. Billard and Diday [4] proposed the bivariate center and range method (BCRM), in which the explanatory variables carry information about both the center and the range in the regression. Then, Lima Neto and De Carvalho [5] proposed the center and range method (CRM), in which each interval is converted into its center and range.
Based on the CRM, Lima Neto and De Carvalho [6] introduced the constrained center and range method (CCRM), which adds non-negativity constraints on the range coefficients. Giordani [7] adapted the CRM and presented a Lasso-based method for the interval-valued regression model. In addition, some scholars have built interval-valued regression models by methods other than the CRM. Maia and De Carvalho [8] proposed an approach for interval-valued data that combines a multilayer perceptron neural network with the Holt exponential smoothing method. Souza et al. [9] considered a parametrized approach that automatically extracts the best reference points from the intervals. Kong et al. [10] proposed an interval local linear method (ILLM) for fitting a regression model with interval-valued explanatory and response variables, which places no restrictions on the form of the regression function. Kong and Gao [11] studied method of moments (MM) estimation for interval-valued regression models. However, there is almost no literature on Bayesian modeling of panel interval-valued data models.

    As is well known, panel data sets offer several advantages over pure cross-section or pure time series data sets, and statistical models that combine cross-sectional and time series real-valued data have become increasingly popular in economic research. For example, Nuroglu and Kunst [12] examined the impact of exchange rate fluctuations on international trade flows using panel data analysis and fuzzy data analysis methods. He et al. [13] constructed an interval slacks-based measure (SBM) with undesirable outputs and analyzed China's environmental technology efficiency using provincial panel data. So far, however, the data in panel data models have all been real-valued [14,15,16,17,18,19], so it is necessary to build panel data models for interval-valued data. Recently, Ji et al. [20] introduced panel data regression for interval-valued data and constructed three kinds of panel interval-valued data regression models; this was the first attempt to study panel interval-valued data models.

    In addition, owing to the dramatic recent advances in computational technology, Bayesian inference has received considerable attention in recent years. In Bayesian regression, Park and Casella [21] proposed the Bayesian Lasso for linear models. Using spline approximation, Xu and Zhang [22] introduced a Bayesian method for the partially linear model with heteroscedasticity based on the variance modeling technique. Castillo et al. [23] studied high-dimensional linear regression with a sparse prior, which is a mixture of point masses at zero and continuous distributions. Pfarrhofer and Piribauer [24] proposed two shrinkage priors for Bayesian variable selection in high-dimensional spatial autoregressive models. Wang and Tang [25] considered Bayesian inference for a quantile regression model in the presence of nonignorable missing covariates. Zhang et al. [26] considered Bayesian quantile regression analysis for semiparametric mixed-effects double regression models based on the asymmetric Laplace distribution for the errors. Tang et al. [27] were the first to use the elastic net, a penalized method, for Bayesian quantile regression of panel data; they derived the posterior distributions of all parameters based on asymmetric Laplace prior distributions and then constructed a Gibbs sampler. Tao et al. [28] proposed a Bayesian adaptive Lasso quantile regression method based on the asymmetric exponential power distribution and applied it to panel data. However, few works have constructed a Bayesian framework for interval-valued data. Zhang et al. [29] proposed Bayesian nonparametric regression models by assuming that the upper and lower bounds of the interval follow an asymmetric Laplace distribution.
Xu and Qin [30] extended the CRM for interval-valued regression models to the Bayesian framework for the first time, and proposed a bivariate Bayesian regression model based on the CRM with known and unknown covariance matrices.

    However, to the best of our knowledge, little work has been done on constructing a Bayesian framework for panel interval-valued data. Hence, in this paper we develop a Bayesian model for random effects panel interval-valued data based on the center and range method and compare it with Bayesian estimation based on the center method and Bayesian estimation based on the minimum and maximum method. Finally, the proposed panel interval-valued data models are applied to forecasting the Air Quality Index.

    The outline of the paper is as follows. In Section 2, we introduce a Bayesian model of random effects panel interval-valued data based on the center and range method, and give the likelihood based on this model. In Section 3, the prior distribution and Bayesian posterior inference of the model are given in detail. In Section 4, the results of parameter estimation are obtained through simulation and compared with other methods to illustrate the feasibility of the proposed method. In Section 5, we apply the model to real data, and in Section 6, the paper is concluded with a brief discussion.

    For a panel interval-valued data set $S=\{(X_{it},y_{it})\mid i=1,2,\dots,n;\ t=1,2,\dots,T\}$, $y_{it}=[y^l_{it},y^u_{it}]$ is the observed interval-valued dependent variable, where the superscripts $l$ and $u$ denote the lower and upper bounds of the interval, respectively, and $X_{it}=(x_{it1},x_{it2},\dots,x_{itp})^T$ is a $p\times 1$ vector of interval-valued independent variables with $x_{itj}=[x^l_{itj},x^u_{itj}]$, $i=1,2,\dots,n$; $t=1,2,\dots,T$; $j=1,2,\dots,p$.

    In the following, we consider Bayesian analysis of random effects panel interval-valued data models:

    $$y^c_{it}=\alpha^c_i+(X^c_{it})^T\beta^c+\varepsilon^c_{it},\qquad y^r_{it}=\alpha^r_i+(X^r_{it})^T\beta^r+\varepsilon^r_{it}, \tag{2.1}$$

    where $y^c_{it}=\frac{y^u_{it}+y^l_{it}}{2}$, $y^r_{it}=\frac{y^u_{it}-y^l_{it}}{2}$, $X^c_{it}=\frac{X^u_{it}+X^l_{it}}{2}$, and $X^r_{it}=\frac{X^u_{it}-X^l_{it}}{2}$; $\beta^c=(\beta^c_1,\dots,\beta^c_p)^T$ and $\beta^r=(\beta^r_1,\dots,\beta^r_p)^T$ are $p$-dimensional vectors of unknown parameters. In addition, $\varepsilon^c_{it}$ and $\varepsilon^r_{it}$ are mutually independent and identically distributed normal random variables with zero mean and variances $\sigma^2_c$ and $\sigma^2_r$, respectively. $\alpha^c_i$ and $\alpha^r_i$ denote the random effects associated with individual $i$, and we assume $\alpha^c_i\sim N(0,\phi^2_c)$ and $\alpha^r_i\sim N(0,\phi^2_r)$. Then, for time $t$, model (2.1) can be written as

    $$Y^c_t=\alpha^c+X^c_t\beta^c+\varepsilon^c_t,\qquad Y^r_t=\alpha^r+X^r_t\beta^r+\varepsilon^r_t. \tag{2.2}$$

    Then, for $s\in\{c,r\}$, the center and range models take the following form:

    $$Y^s_t=\alpha^s+X^s_t\beta^s+\varepsilon^s_t, \tag{2.3}$$

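    where $Y^s_t=(y^s_{1t},y^s_{2t},\dots,y^s_{nt})^T$, $\alpha^s=(\alpha^s_1,\alpha^s_2,\dots,\alpha^s_n)^T$, $X^s_t=(X^s_{1t},X^s_{2t},\dots,X^s_{nt})^T$ is an $n\times p$ matrix, $\beta^s=(\beta^s_1,\beta^s_2,\dots,\beta^s_p)^T$, and $\varepsilon^s_t=(\varepsilon^s_{1t},\varepsilon^s_{2t},\dots,\varepsilon^s_{nt})^T$.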
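To make the transformation in (2.1) concrete, the following Python sketch (using NumPy) maps interval bounds to centers and half-ranges and back. The helper names `to_center_range` and `to_bounds` are hypothetical and not taken from the paper. Because the range is defined as half the interval width, the map is exactly invertible: $y^l=y^c-y^r$ and $y^u=y^c+y^r$.

```python
import numpy as np

def to_center_range(lower, upper):
    """Map interval bounds to (center, half-range), as in model (2.1)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(upper < lower):
        raise ValueError("every upper bound must be >= its lower bound")
    center = (upper + lower) / 2.0      # y^c = (y^u + y^l) / 2
    half_range = (upper - lower) / 2.0  # y^r = (y^u - y^l) / 2
    return center, half_range

def to_bounds(center, half_range):
    """Invert the map: y^l = y^c - y^r and y^u = y^c + y^r."""
    center = np.asarray(center, dtype=float)
    half_range = np.asarray(half_range, dtype=float)
    return center - half_range, center + half_range

# Example: the interval [40, 90] has center 65 and half-range 25.
c, r = to_center_range([40.0], [90.0])
lo, up = to_bounds(c, r)
print(c, r, lo, up)  # [65.] [25.] [40.] [90.]
```

The same functions apply column by column to the interval-valued covariates $X^l_{it}$ and $X^u_{it}$.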

    For convenience, we use matrix and vector notation. Let $Y^s=((Y^s_1)^T,(Y^s_2)^T,\dots,(Y^s_T)^T)^T$, $X^s=((X^s_1)^T,(X^s_2)^T,\dots,(X^s_T)^T)^T$, $\varepsilon^s=((\varepsilon^s_1)^T,(\varepsilon^s_2)^T,\dots,(\varepsilon^s_T)^T)^T$, and $D^s=1_T\otimes I_n$, where $\otimes$ denotes the Kronecker product, $1_T$ is a $T\times 1$ vector with all elements equal to 1, and $I_n$ is the $n\times n$ identity matrix. Then model (2.3) can be written in matrix form as

    $$Y^s=D^s\alpha^s+X^s\beta^s+\varepsilon^s, \tag{2.4}$$

    where $\alpha^s\sim N(0,\phi^2_sI_n)$, $\varepsilon^s\sim N(0,\sigma^2_sI_N)$, and $N=n\times T$.
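The stacking order behind (2.4) is easy to get wrong, so here is a small NumPy sketch, assuming the panel is held as an array indexed first by unit $i$ and then by period $t$ (an illustrative layout, not one prescribed by the paper). It builds $D^s=1_T\otimes I_n$ with `np.kron`, stacks a panel period by period, and checks that $(D^s)^TD^s=TI_n$. The names `build_D` and `stack_by_period` are hypothetical.

```python
import numpy as np

def build_D(n, T):
    """D^s = 1_T (Kronecker) I_n: an (nT x n) matrix that copies alpha_i into every period."""
    return np.kron(np.ones((T, 1)), np.eye(n))

def stack_by_period(panel):
    """Stack an array indexed as panel[i, t, ...] period by period,
    i.e. (Y_1^T, ..., Y_T^T)^T, so that row t*n + i holds unit i at period t."""
    panel = np.asarray(panel)
    n, T = panel.shape[0], panel.shape[1]
    axes = (1, 0) + tuple(range(2, panel.ndim))
    return panel.transpose(axes).reshape((n * T,) + panel.shape[2:])

n, T = 3, 2
D = build_D(n, T)
alpha = np.array([1.0, 2.0, 3.0])
print(D @ alpha)                            # [1. 2. 3. 1. 2. 3.]
print(np.allclose(D.T @ D, T * np.eye(n)))  # True
```

The identity $(D^s)^TD^s=TI_n$ is what makes the full conditional of $\alpha^s$ in Theorem 2 cheap to evaluate.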

    Based on the above model, the joint density of the observations $Y^s$ and the random effects $\alpha^s$, which serves as the likelihood of the model parameters, is

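    $$L(Y^s,\alpha^s\mid X^s,\beta^s,\sigma^2_s,\phi^2_s)=(2\pi)^{-\frac{N+n}{2}}(\sigma^2_s)^{-\frac{N}{2}}(\phi^2_s)^{-\frac{n}{2}}\exp\left\{-\frac{1}{2\sigma^2_s}(Y^s-D^s\alpha^s-X^s\beta^s)^T(Y^s-D^s\alpha^s-X^s\beta^s)-\frac{(\alpha^s)^T\alpha^s}{2\phi^2_s}\right\}. \tag{2.5}$$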
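For checking an implementation, (2.5) can be evaluated on the log scale, which avoids numerical underflow of the exponential for realistic $N$. The following is a minimal NumPy sketch; `log_joint_density` is a hypothetical name, and the arrays are assumed to be stacked as in (2.4).

```python
import numpy as np

def log_joint_density(y, D, X, alpha, beta, sigma2, phi2):
    """Logarithm of (2.5): log p(Y^s | alpha^s, beta^s, sigma2) + log p(alpha^s | phi2)."""
    N, n = y.shape[0], alpha.shape[0]
    resid = y - D @ alpha - X @ beta
    return (-(N + n) / 2.0 * np.log(2.0 * np.pi)
            - N / 2.0 * np.log(sigma2)
            - n / 2.0 * np.log(phi2)
            - resid @ resid / (2.0 * sigma2)
            - alpha @ alpha / (2.0 * phi2))

# Small synthetic example with n = 4 units and T = 3 periods.
rng = np.random.default_rng(0)
n, T, p = 4, 3, 2
D = np.kron(np.ones((T, 1)), np.eye(n))
X = rng.uniform(0, 2, size=(n * T, p))
beta = np.array([1.0, -0.5])
alpha = rng.normal(size=n)
y = D @ alpha + X @ beta + rng.normal(size=n * T)
value = log_joint_density(y, D, X, alpha, beta, sigma2=0.7, phi2=1.3)
```

The value equals the sum of the $N$ normal log-densities of the errors plus the $n$ normal log-densities of the random effects.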

    To estimate the unknown parameters $\beta^s$, $\sigma^2_s$, and $\phi^2_s$, we adopt a Bayesian approach [31], which requires prior distributions for the model parameters. First, we assume that $\beta^s$ has a normal prior distribution, $\beta^s\sim N(\beta^s_0,\Sigma^s_0)$, where the hyperparameters $\beta^s_0$ and $\Sigma^s_0$ are known. For the other unknown parameters, we take $\sigma^2_s\sim IG(a^s_0,b^s_0)$ and $\phi^2_s\sim IG(c^s_0,d^s_0)$, where $a^s_0$, $b^s_0$, $c^s_0$, and $d^s_0$ are given hyperparameters and "IG" denotes the inverse gamma distribution. Thus, the joint prior of all unknown parameters is

    $$\pi(\beta^s,\sigma^2_s,\phi^2_s)=p(\beta^s)\,p(\sigma^2_s)\,p(\phi^2_s). \tag{3.1}$$

    From the likelihood (2.5) and the prior distributions (3.1), we obtain the following theorems. A brief proof of Theorem 1 is given; the proofs of the other theorems are similar and are omitted.

    Theorem 1. Suppose that $\beta^s$ follows the normal prior distribution $\beta^s\sim N(\beta^s_0,\Sigma^s_0)$. Then the posterior distribution of $\beta^s$ is normal:

    $$\beta^s\mid Y^s,X^s,\alpha^s,\sigma^2_s,\phi^2_s\sim N(\mu_{\beta^s},\Sigma_{\beta^s}),$$

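    where $\mu_{\beta^s}=\Sigma_{\beta^s}\left(\frac{(X^s)^T(Y^s-D^s\alpha^s)}{\sigma^2_s}+(\Sigma^s_0)^{-1}\beta^s_0\right)$ and $\Sigma_{\beta^s}=\left(\frac{1}{\sigma^2_s}(X^s)^TX^s+(\Sigma^s_0)^{-1}\right)^{-1}$.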

    Proof. Up to a normalizing constant, the conditional posterior density of $\beta^s$ can be rewritten as follows:

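    $$\begin{aligned} p(\beta^s\mid Y^s,X^s,\alpha^s,\sigma^2_s,\phi^2_s) &\propto L(Y^s,\alpha^s\mid X^s,\beta^s,\sigma^2_s,\phi^2_s)\,\pi(\beta^s,\sigma^2_s,\phi^2_s)\\ &\propto \exp\left\{-\frac{1}{2\sigma^2_s}(Y^s-D^s\alpha^s-X^s\beta^s)^T(Y^s-D^s\alpha^s-X^s\beta^s)\right\}\exp\left\{-\frac{1}{2}(\beta^s-\beta^s_0)^T(\Sigma^s_0)^{-1}(\beta^s-\beta^s_0)\right\}\\ &\propto \exp\left\{-\frac{1}{2}(\beta^s)^T\left(\frac{(X^s)^TX^s}{\sigma^2_s}+(\Sigma^s_0)^{-1}\right)\beta^s+\left(\frac{(Y^s-D^s\alpha^s)^TX^s}{\sigma^2_s}+(\beta^s_0)^T(\Sigma^s_0)^{-1}\right)\beta^s\right\}. \end{aligned}$$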

    The exponent is a quadratic form in $\beta^s$, which is the kernel of a multivariate normal density. Completing the square gives precision matrix $\Sigma_{\beta^s}^{-1}$ and mean $\mu_{\beta^s}$, so $\beta^s$ follows the normal distribution stated in Theorem 1, which completes the proof.

    Theorem 2. Under the model assumptions, the posterior distribution of $\alpha^s$ is

    $$\alpha^s\mid Y^s,X^s,\beta^s,\sigma^2_s,\phi^2_s\sim N(\mu_{\alpha^s},\Sigma_{\alpha^s}),$$

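    where $\mu_{\alpha^s}=\Sigma_{\alpha^s}\frac{1}{\sigma^2_s}(D^s)^T(Y^s-X^s\beta^s)$ and $\Sigma_{\alpha^s}=\left(\frac{(D^s)^TD^s}{\sigma^2_s}+\frac{I_n}{\phi^2_s}\right)^{-1}$.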
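Because $D^s=1_T\otimes I_n$, Theorem 2 simplifies considerably. The following short derivation uses only the mixed-product property of the Kronecker product; it is an illustrative addition rather than part of the original argument.

```latex
(D^s)^T D^s = (1_T^T \otimes I_n)(1_T \otimes I_n) = (1_T^T 1_T) \otimes I_n = T\, I_n,
\qquad
\Sigma_{\alpha^s} = \Big(\frac{T}{\sigma_s^2} + \frac{1}{\phi_s^2}\Big)^{-1} I_n
                  = \frac{\sigma_s^2 \phi_s^2}{T\phi_s^2 + \sigma_s^2}\, I_n,

(\mu_{\alpha^s})_i = \frac{T\phi_s^2}{T\phi_s^2 + \sigma_s^2}
  \cdot \frac{1}{T}\sum_{t=1}^{T}\big(y^s_{it} - (X^s_{it})^T\beta^s\big), \qquad i = 1,\dots,n.
```

So, given the other parameters, the $\alpha^s_i$ are conditionally independent, and each posterior mean is the average residual of unit $i$ shrunk toward zero by the factor $T\phi^2_s/(T\phi^2_s+\sigma^2_s)$; no $n\times n$ matrix needs to be inverted.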

    Theorem 3. Suppose that $\sigma^2_s$ has the prior distribution $\sigma^2_s\sim IG(a^s_0,b^s_0)$. Then the posterior distribution of $\sigma^2_s$ is the inverse gamma distribution

    $$\sigma^2_s\mid Y^s,X^s,\beta^s,\alpha^s,\phi^2_s\sim IG(a^s,b^s),$$

    where $a^s=\frac{N}{2}+a^s_0$ and $b^s=\frac{1}{2}(Y^s-D^s\alpha^s-X^s\beta^s)^T(Y^s-D^s\alpha^s-X^s\beta^s)+b^s_0$.

    Theorem 4. Suppose that $\phi^2_s$ has the prior distribution $\phi^2_s\sim IG(c^s_0,d^s_0)$. Then the posterior distribution of $\phi^2_s$ is the inverse gamma distribution

    $$\phi^2_s\mid Y^s,X^s,\beta^s,\alpha^s,\sigma^2_s\sim IG(c^s,d^s),$$

    where $c^s=\frac{n}{2}+c^s_0$ and $d^s=\frac{(\alpha^s)^T\alpha^s}{2}+d^s_0$.
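The proofs of Theorems 3 and 4 are omitted above; for completeness, a sketch for Theorem 3 is given below, writing $R^s$ (our notation, not the paper's) for the residual sum of squares. It multiplies the $\sigma^2_s$-dependent factors of (2.5) by the $IG(a^s_0,b^s_0)$ prior density.

```latex
p(\sigma_s^2 \mid \cdot)
  \propto (\sigma_s^2)^{-N/2} \exp\Big\{-\frac{R^s}{2\sigma_s^2}\Big\}
          \cdot (\sigma_s^2)^{-(a_0^s+1)} \exp\Big\{-\frac{b_0^s}{\sigma_s^2}\Big\}
  = (\sigma_s^2)^{-(N/2 + a_0^s) - 1} \exp\Big\{-\frac{R^s/2 + b_0^s}{\sigma_s^2}\Big\},
\qquad R^s = (Y^s - D^s\alpha^s - X^s\beta^s)^T (Y^s - D^s\alpha^s - X^s\beta^s).
```

This is the kernel $x^{-(a+1)}e^{-b/x}$ of $IG(a^s,b^s)$ with $a^s$ and $b^s$ as in Theorem 3. Theorem 4 follows in the same way, because $\phi^2_s$ enters (2.5) only through $(\phi^2_s)^{-n/2}\exp\{-(\alpha^s)^T\alpha^s/(2\phi^2_s)\}$: replacing $N$, $R^s$, $a^s_0$, $b^s_0$ by $n$, $(\alpha^s)^T\alpha^s$, $c^s_0$, $d^s_0$ gives $c^s$ and $d^s$.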

    By Theorems 1 to 4, the full conditional distributions of $\alpha^s$, $\beta^s$, $\sigma^2_s$, and $\phi^2_s$ are all standard distributions that can be sampled directly, so a Gibbs sampler can be used; the algorithm is given in Table 1. Based on this MCMC algorithm, we collect a converged posterior sample $\theta^{(sim)}=(\beta^{s(sim)},\sigma^{2(sim)}_s,\phi^{2(sim)}_s)$, $sim=1,2,\dots,M$, where $M<Sim$ is the number of draws retained after burn-in. The posterior mean estimates $(\hat\beta^s,\hat\sigma^2_s,\hat\phi^2_s)$ are then

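    $$\hat\beta^s=\frac{1}{M}\sum_{sim=1}^{M}\beta^{s(sim)},\qquad \hat\sigma^2_s=\frac{1}{M}\sum_{sim=1}^{M}\sigma^{2(sim)}_s,\qquad \hat\phi^2_s=\frac{1}{M}\sum_{sim=1}^{M}\phi^{2(sim)}_s.$$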
    Table 1.  The algorithm of Bayesian estimation for the unknown parameters.
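    Algorithm: Bayesian estimation of the unknown parameters $\theta=(\beta^s,\sigma^2_s,\phi^2_s)$
    Input: the initial value $\theta^{(0)}=(\beta^{s(0)},\sigma^{2(0)}_s,\phi^{2(0)}_s)$ and the number of iterations $Sim$ of the sampling algorithm.
    Output: the posterior sample sequence $(\theta^{(1)},\theta^{(2)},\dots,\theta^{(Sim)})$.
    for $sim=1$ to $Sim$ do:
    1. Sample $\alpha^s\mid Y^s,X^s,\beta^s,\sigma^2_s,\phi^2_s\sim N(\mu_{\alpha^s},\Sigma_{\alpha^s})$;
    2. Sample $\beta^s\mid Y^s,X^s,\alpha^s,\sigma^2_s,\phi^2_s\sim N(\mu_{\beta^s},\Sigma_{\beta^s})$;
    3. Sample $\sigma^2_s\mid Y^s,X^s,\beta^s,\alpha^s,\phi^2_s\sim IG(a^s,b^s)$;
    4. Sample $\phi^2_s\mid Y^s,X^s,\beta^s,\alpha^s,\sigma^2_s\sim IG(c^s,d^s)$;
    end for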


    In the implementation, the algorithm is repeated 100 times, and the final parameter estimates are the averages over the 100 runs.
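The four steps of Table 1 can be sketched in Python with NumPy as follows. This is a minimal single-equation sketch under stated assumptions (initial values $\beta^s=0$ and $\sigma^2_s=\phi^2_s=1$, a fixed random seed); the function name `gibbs_random_effects` is hypothetical and this is not the authors' code. It uses two facts from above: $(D^s)^TD^s=TI_n$, so $\alpha^s$ can be drawn without inverting an $n\times n$ matrix, and an inverse gamma draw is the reciprocal of a gamma draw with rate $b$.

```python
import numpy as np

def gibbs_random_effects(y, X, n, T, beta0, Sigma0, a0, b0, c0, d0,
                         n_iter=5000, burn_in=3000, seed=0):
    """Gibbs sampler for Y^s = D^s alpha^s + X^s beta^s + eps^s (one equation, s = c or r).

    y : (N,) response stacked period by period (row t*n + i is unit i at period t)
    X : (N, p) covariates stacked the same way, with N = n * T
    Returns posterior means of beta, sigma2, phi2 over the draws kept after burn-in.
    """
    rng = np.random.default_rng(seed)
    N, p = X.shape
    if N != n * T:
        raise ValueError("X must have n * T rows")
    D = np.kron(np.ones((T, 1)), np.eye(n))
    Sigma0_inv = np.linalg.inv(Sigma0)
    XtX = X.T @ X
    beta, sigma2, phi2 = np.zeros(p), 1.0, 1.0   # initial value theta^(0)
    kept = []
    for it in range(n_iter):
        # Step 1 (Theorem 2): D^T D = T I_n, so Sigma_alpha is a scalar times I_n.
        var_a = 1.0 / (T / sigma2 + 1.0 / phi2)
        mean_a = var_a * (D.T @ (y - X @ beta)) / sigma2
        alpha = mean_a + np.sqrt(var_a) * rng.standard_normal(n)
        # Step 2 (Theorem 1): beta | rest ~ N(mu_beta, Sigma_beta); symmetrize for safety.
        Sigma_b = np.linalg.inv(XtX / sigma2 + Sigma0_inv)
        Sigma_b = 0.5 * (Sigma_b + Sigma_b.T)
        mu_b = Sigma_b @ (X.T @ (y - D @ alpha) / sigma2 + Sigma0_inv @ beta0)
        beta = rng.multivariate_normal(mu_b, Sigma_b)
        # Step 3 (Theorem 3): if G ~ Gamma(shape a, rate b), then 1/G ~ IG(a, b).
        resid = y - D @ alpha - X @ beta
        sigma2 = 1.0 / rng.gamma(N / 2.0 + a0, 1.0 / (resid @ resid / 2.0 + b0))
        # Step 4 (Theorem 4): phi2 | rest ~ IG(n/2 + c0, alpha'alpha/2 + d0).
        phi2 = 1.0 / rng.gamma(n / 2.0 + c0, 1.0 / (alpha @ alpha / 2.0 + d0))
        if it >= burn_in:                          # keep the last M = n_iter - burn_in draws
            kept.append(np.concatenate([beta, [sigma2, phi2]]))
    kept = np.asarray(kept)
    return kept[:, :p].mean(axis=0), kept[:, p].mean(), kept[:, p + 1].mean()

# Usage on one synthetic center equation (values mirror the simulation settings).
rng = np.random.default_rng(1)
n, T = 50, 12
X = rng.uniform(0, 2, size=(n * T, 3))
alpha_true = rng.normal(0, np.sqrt(0.5), n)
y = np.kron(np.ones(T), alpha_true) + X @ np.array([8.0, 7.0, 6.0]) + rng.normal(0, np.sqrt(0.5), n * T)
beta_hat, s2_hat, phi2_hat = gibbs_random_effects(
    y, X, n, T, beta0=np.zeros(3), Sigma0=10 * np.eye(3),
    a0=0.01, b0=0.01, c0=0.01, d0=0.01)
```

Because the center and range equations share no parameters and their errors are assumed independent, the posterior factorizes, so the same function can be called once with $(Y^c,X^c)$ and once with $(Y^r,X^r)$.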

    In this section, we investigate the performance of the proposed model and Bayesian estimation method via Monte Carlo simulation. We compare the Bayesian analysis of random effects panel interval-valued data models based on the center and range method (BCRM) with the corresponding Bayesian analyses based on the center method (BCM) and on the minimum and maximum method (BMMM). To assess the quality of Bayesian estimation and prediction, Section 4.1 introduces the evaluation measures, Section 4.2 describes the data generation process and parameter settings, and Section 4.3 presents all simulation results and conclusions.

    Three measures are used to evaluate the performance of the different models:

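    1) The root mean-square errors of the upper and lower bounds, $RMSE_U$ and $RMSE_L$ [5], are

    $$RMSE_U=\sqrt{\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}(\hat y^u_{it}-y^u_{it})^2},$$
    $$RMSE_L=\sqrt{\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}(\hat y^l_{it}-y^l_{it})^2}.$$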

    2) The root mean-square error $RMSE_H$ [32] is

    $$RMSE_H=\sqrt{\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\left(|\hat y^u_{it}-y^u_{it}|+|\hat y^l_{it}-y^l_{it}|\right)^2}.$$

    3) The rate of different intervals (RI) defined by Hu and He [33] is

    $$RI=\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\frac{\omega(y_{it}\cap\hat y_{it})}{\omega(y_{it}\cup\hat y_{it})},$$

    where $\omega(\cdot)$ denotes the width of an interval.

    In this subsection, based on the center and range method (BCRM), four configurations C1, C2, C3, and C4 are generated, as shown in Table 2.

    Table 2.  Four configurations of simulation data.
    Variable C1 C2 C3 C4
    $X^s$ $U(0,2)$ $U(0,2)$ $U(0,2)$ $U(0,2)$
    $\alpha^s$ $N(0,0.5I_n)$ $N(0,0.5I_n)$ $N(0,0.5I_n)$ $N(0,0.5I_n)$
    $\varepsilon^c$ $N(0,0.5I_N)$ $N(0,2I_N)$ $N(0,I_N)$ $N(0,3I_N)$
    $\varepsilon^r$ $\varepsilon^c+e$ $\varepsilon^c+e$ $N(0,I_N)$ $N(0,3I_N)$
    $e$ $N(0,0.5I_N)$ $N(0,0.5I_N)$ – –


    The data sets are generated from the following models:

    $$Y^s=D^s\alpha^s+X^s\beta^s+\varepsilon^s. \tag{4.1}$$

    Following Xu and Qin [30], we first fix the regression coefficients $\beta^c=(8,7,6)^T$ and $\beta^r=(4,6,5)^T$. Second, $X^s$ is generated from the uniform distribution $U(0,2)$, and $\alpha^s$ is generated from $N(0,0.5I_n)$ and kept fixed across replications. We then generate $Y^s$ from the above model, where $\varepsilon^s\sim N(0,\sigma^2_sI_N)$. In addition, we use noninformative hyperparameter values for the priors of the unknown parameters $\beta^s$, $\sigma^2_s$, and $\phi^2_s$ in the simulation: $\beta^s_0=0_3$, $\Sigma^s_0=10\times I_3$, and $a^s_0=b^s_0=c^s_0=d^s_0=0.01$, where $0_3$ is a 3-dimensional vector of zeros. Further, we take $n=50$ and $100$ with $T=12$, so the sample sizes are $N=n\times T=600$ and $1200$.

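    For configurations C1 and C2, we assume a hidden linear relationship between $\varepsilon^c$ and $\varepsilon^r$, namely $\varepsilon^r=\varepsilon^c+e$, where $e$ is a random error generated from $N(0,\sigma^2I_N)$ with $\sigma^2=0.5$; these configurations therefore depart from the independence assumption on the errors in model (2.1). In configurations C3 and C4, $\varepsilon^c$ and $\varepsilon^r$ are assumed to have no linear relationship.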
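A single replication of the design in Table 2 can be generated with the following NumPy sketch. It assumes that $X^c$ and $X^r$ are drawn independently, and it keeps the random effects fixed across replications by giving them their own seed (`alpha_seed`, a hypothetical parameter); the dictionary `CONFIGS` and the function `simulate_panel` are illustrative names that simply transcribe the variances from Table 2.

```python
import numpy as np

# Error variances read from Table 2; "var_e" is used only when eps^r = eps^c + e (C1, C2).
CONFIGS = {
    "C1": {"var_c": 0.5, "var_r": None, "var_e": 0.5},
    "C2": {"var_c": 2.0, "var_r": None, "var_e": 0.5},
    "C3": {"var_c": 1.0, "var_r": 1.0, "var_e": None},
    "C4": {"var_c": 3.0, "var_r": 3.0, "var_e": None},
}

def simulate_panel(config, n=50, T=12, seed=0, alpha_seed=2025):
    """Draw one replication of (y^c, y^r, X^c, X^r), stacked period by period."""
    cfg = CONFIGS[config]
    N = n * T
    beta_c, beta_r = np.array([8.0, 7.0, 6.0]), np.array([4.0, 6.0, 5.0])
    # Random effects use their own seed so they stay fixed across replications.
    arng = np.random.default_rng(alpha_seed)
    alpha_c = arng.normal(0.0, np.sqrt(0.5), n)
    alpha_r = arng.normal(0.0, np.sqrt(0.5), n)
    rng = np.random.default_rng(seed)
    Xc = rng.uniform(0.0, 2.0, size=(N, 3))
    Xr = rng.uniform(0.0, 2.0, size=(N, 3))
    eps_c = rng.normal(0.0, np.sqrt(cfg["var_c"]), N)
    if cfg["var_e"] is not None:   # C1, C2: linear link between the two errors
        eps_r = eps_c + rng.normal(0.0, np.sqrt(cfg["var_e"]), N)
    else:                          # C3, C4: independent errors
        eps_r = rng.normal(0.0, np.sqrt(cfg["var_r"]), N)
    D = np.kron(np.ones((T, 1)), np.eye(n))
    yc = D @ alpha_c + Xc @ beta_c + eps_c
    yr = D @ alpha_r + Xr @ beta_r + eps_r   # not forced to be >= 0 in this sketch
    return yc, yr, Xc, Xr

yc, yr, Xc, Xr = simulate_panel("C1", n=50, T=12, seed=1)
```

Since $E[X^c_{itj}]=1$, the mean of $y^c$ should be close to $8+7+6=21$, and the mean of $y^r$ close to $4+6+5=15$, which is a quick sanity check on the generated data.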

    Based on these settings and the generated data sets, the proposed algorithm is used to compute Bayesian estimates of the unknown parameters over 100 replications. In each run, we discard the first 3000 iterations as burn-in and use the following J = 2000 draws for statistical inference.

    Tables 3 and 4 report the Bayesian estimates for n = 50 and n = 100, respectively, with T fixed at 12, based on 100 replications in all configurations. The accuracy of estimation is measured by the bias (BIAS) and standard deviation (SD), and the following conclusions are obtained:

    Table 3.  The results of Bayesian estimation when n = 50.
    Config. Para. $\beta^c_1$ $\beta^c_2$ $\beta^c_3$ $\sigma^2_c$ $\phi^2_c$ $\beta^r_1$ $\beta^r_2$ $\beta^r_3$ $\sigma^2_r$ $\phi^2_r$
    C1 BIAS -0.0053 -0.0012 -0.0020 0.0135 0.0098 -0.0110 -0.0080 0.0059 0.0197 0.0151
    SD 0.0476 0.0522 0.0525 0.0319 0.1145 0.0588 0.0602 0.0644 0.0622 0.1228
    C2 BIAS -0.0078 -0.0011 -0.0008 0.0330 -0.0019 -0.0179 -0.0113 0.0089 0.0409 0.0092
    SD 0.0931 0.0964 0.0966 0.1163 0.1350 0.0897 0.0948 0.0932 0.1510 0.1367
    C3 BIAS -0.0063 -0.0010 -0.0016 0.0212 0.0059 -0.0051 -0.0089 0.0028 0.0141 0.0153
    SD 0.0667 0.0707 0.0712 0.0605 0.1211 0.0602 0.0616 0.0706 0.0511 0.1362
    C4 BIAS -0.0090 -0.0014 -0.0005 0.0434 -0.0105 -0.0074 -0.0146 0.0048 0.0235 0.0107
    SD 0.1135 0.1158 0.1160 0.1720 0.1502 0.1021 0.1031 0.1154 0.1476 0.1767

    Table 4.  The results of Bayesian estimation when n = 100.
    Config. Para. $\beta^c_1$ $\beta^c_2$ $\beta^c_3$ $\sigma^2_c$ $\phi^2_c$ $\beta^r_1$ $\beta^r_2$ $\beta^r_3$ $\sigma^2_r$ $\phi^2_r$
    C1 BIAS -0.0012 -0.0006 -0.0041 0.0037 0.0044 0.0054 0.0041 -0.0005 0.0032 0.0003
    SD 0.0336 0.0319 0.0366 0.0229 0.0766 0.0476 0.0467 0.0439 0.0381 0.0898
    C2 BIAS -0.0025 -0.0003 -0.0084 0.0040 0.0043 0.0098 -0.0004 -0.0022 0.0004 -0.0010
    SD 0.0631 0.0600 0.0673 0.0879 0.0924 0.0731 0.0682 0.0614 0.0984 0.1042
    C3 BIAS -0.0019 -0.0007 -0.0060 0.0045 0.0043 -0.0003 0.0097 0.0006 0.0057 0.0017
    SD 0.0460 0.0437 0.0494 0.0446 0.0817 0.0465 0.0472 0.0494 0.0393 0.0936
    C4 BIAS -0.0030 0.0001 -0.0102 0.0027 0.0040 -0.0010 0.0142 0.0000 0.0056 0.0013
    SD 0.0761 0.0726 0.0806 0.1311 0.1034 0.0765 0.0788 0.0797 0.1160 0.1197


    1) Comparing configurations C1, C2, and C4 for the same n, the SDs in C1 are generally the smallest of the three, while those in C4 are generally the largest, which indicates that the smaller the variance of the error term, the smaller the estimation error.

    2) As the sample size increases, the accuracy of Bayesian estimation for the BCRM improves. For example, in configuration C1, the SD of $\beta^c_1$ is 0.0476 for n = 50 and 0.0336 for n = 100.

    Overall, the BIAS and SD of all parameters in the four configurations are very small, indicating that the Bayesian estimation performs well.

    To investigate the convergence of the proposed algorithm, we plot the values of 5000 iterations for each configuration when n = 50; the results are shown in Figure 1. The traces of all parameters are roughly flat, indicating that the algorithm converges quickly.

    Figure 1.  Parameter convergence with different configurations for n = 50.

    Next, we study the predictive performance of the models. We compare the Bayesian method based on the center and range method with the other two methods; all results are listed in Tables 5 to 8, with standard deviations in parentheses. 75% of the data are used as the training set and 25% as the test set. Each case is repeated 100 times, and the prediction errors are averaged. The following conclusions can be drawn:

    Table 5.  Prediction results of each method in C1 configuration.
    n T RMSEL RMSEU
    BCRM BCM BMMM BCRM BCM BMMM
    50 12 0.723 6.531 6.296 1.585 6.664 2.360
    (0.041) (0.194) (0.147) (0.079) (0.246) (0.131)
    100 12 0.717 6.495 6.281 1.590 6.640 2.349
    (0.027) (0.155) (0.106) (0.060) (0.178) (0.091)
    n T RMSEH RI
    BCRM BCM BMMM BCRM BCM BMMM
    50 12 1.631 7.094 6.320 0.931 0.714 0.768
    (0.076) (0.215) (0.145) (0.004) (0.007) (0.008)
    100 12 1.635 7.068 6.303 0.931 0.715 0.768
    (0.057) (0.158) (0.103) (0.003) (0.005) (0.005)

    Table 6.  Prediction results of each method in C2 configuration.
    n T RMSEL RMSEU
    BCRM BCM BMMM BCRM BCM BMMM
    50 12 0.738 6.530 6.296 2.913 7.086 3.398
    (0.044) (0.195) (0.147) (0.149) (0.312) (0.187)
    100 12 0.725 6.494 6.281 2.930 7.072 3.400
    (0.029) (0.155) (0.106) (0.108) (0.233) (0.131)
    n T RMSEH RI
    BCRM BCM BMMM BCRM BCM BMMM
    50 12 2.932 7.749 6.465 0.894 0.708 0.747
    (0.147) (0.256) (0.146) (0.007) (0.008) (0.009)
    100 12 2.947 7.731 6.450 0.894 0.709 0.747
    (0.108) (0.190) (0.102) (0.005) (0.006) (0.006)

    Table 7.  Prediction results of each method in C3 configuration.
    n T RMSEL RMSEU
    BCRM BCM BMMM BCRM BCM BMMM
    50 12 1.418 6.645 6.415 1.423 6.625 2.254
    (0.081) (0.226) (0.183) (0.070) (0.262) (0.122)
    100 12 1.424 6.605 6.394 1.423 6.594 2.240
    (0.055) (0.176) (0.129) (0.056) (0.181) (0.084)
    n T RMSEH RI
    BCRM BCM BMMM BCRM BCM BMMM
    50 12 1.817 7.323 6.440 0.916 0.714 0.770
    (0.073) (0.231) (0.181) (0.005) (0.007) (0.008)
    100 12 1.821 7.292 6.419 0.916 0.715 0.771
    (0.050) (0.160) (0.127) (0.004) (0.005) (0.006)

    Table 8.  Prediction results of each method in C4 configuration.
    n T RMSEL RMSEU
    BCRM BCM BMMM BCRM BCM BMMM
    50 12 2.451 6.940 6.723 2.459 6.915 3.015
    (0.140) (0.267) (0.231) (0.120) (0.305) (0.152)
    100 12 2.464 6.899 6.699 2.462 6.883 3.008
    (0.096) (0.210) (0.170) (0.098) (0.222) (0.118)
    n T RMSEH RI
    BCRM BCM BMMM BCRM BCM BMMM
    50 12 1.817 7.323 6.857 0.860 0.706 0.753
    (0.127) (0.264) (0.225) (0.008) (0.009) (0.010)
    100 12 1.821 7.292 6.836 0.860 0.707 0.754
    (0.087) (0.183) (0.161) (0.006) (0.007) (0.007)


    1) For the same n, the BCRM outperforms the BCM and the BMMM. For example, in configuration C1 with n = 50, the RMSEL, RMSEU, and RMSEH of the BCRM are 0.723, 1.585, and 1.631, respectively, and its RI is 0.931. The RMSEL, RMSEU, RMSEH, and RI are 6.531, 6.664, 7.094, and 0.714 for the BCM and 6.296, 2.360, 6.320, and 0.768 for the BMMM. The three error measures are all larger than those of the BCRM, while RI is smaller. This shows that the BCRM predicts better than the BCM and the BMMM.

    2) Also for n = 50, the RMSEL of the BCRM is 0.723 (standard deviation 0.041) in configuration C1 and 0.738 (0.044) in configuration C2, indicating that the smaller the variance of the random error, the higher the prediction accuracy.

    3) As the sample size increases, Bayesian prediction becomes more stable. For example, in configuration C2, the standard deviation of the RMSEL for the BCRM decreases from 0.044 to 0.029 as n increases from 50 to 100.

    In general, the prediction performance of the BCRM is satisfactory across the different configurations and sample sizes.
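The text does not spell out how interval forecasts are formed from the two fitted equations. The following NumPy sketch shows one natural construction, stated as an assumption: predict the center and half-range from the posterior means, optionally add the estimated random effect $\hat\alpha^s_i$ of a unit seen in training, and map back with $\hat y^l=\hat y^c-\hat y^r$ and $\hat y^u=\hat y^c+\hat y^r$. Clipping negative predicted half-ranges at zero and splitting the periods in time order are further assumptions; `predict_intervals` and `time_split` are hypothetical names.

```python
import numpy as np

def predict_intervals(Xc_new, Xr_new, beta_c, beta_r, alpha_c=None, alpha_r=None):
    """Interval forecast from posterior means (illustrative construction)."""
    yc_hat = Xc_new @ beta_c
    yr_hat = Xr_new @ beta_r
    if alpha_c is not None:      # add estimated random effects for units seen in training
        yc_hat = yc_hat + alpha_c
    if alpha_r is not None:
        yr_hat = yr_hat + alpha_r
    yr_hat = np.maximum(yr_hat, 0.0)   # assumption: clip negative half-ranges
    return yc_hat - yr_hat, yc_hat + yr_hat

def time_split(n_periods, train_frac=0.75):
    """Use the first train_frac of the periods for training and the rest for testing."""
    cut = int(round(train_frac * n_periods))
    periods = np.arange(n_periods)
    return periods[:cut], periods[cut:]

train_periods, test_periods = time_split(40)   # 30 training days, 10 test days
lo, up = predict_intervals(np.array([[1.0, 0.0]]), np.array([[0.5, 0.5]]),
                           beta_c=np.array([2.0, 1.0]), beta_r=np.array([2.0, 2.0]))
```

The resulting bounds can be passed directly to the evaluation measures $RMSE_L$, $RMSE_U$, $RMSE_H$, and RI.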

    This section applies the proposed model to the estimation and prediction of the AQI and compares it with other methods. Pollutant concentrations vary over space and time; panel interval-valued data can describe this variation, and this paper constructs panel interval-valued data models for the AQI. This study uses AQI-related data from four representative Chinese cities (Beijing, Tianjin, Shijiazhuang, and Chongqing) for 40 consecutive days (2023.7.20–2023.8.28). The data from the first 30 days are used to train the models, and the remaining data are used to test them.

    For the panel interval-valued data set $S=\{(X_{it},y_{it})\mid i=1,2,\dots,n;\ t=1,2,\dots,T\}$, $y_{it}=[y^l_{it},y^u_{it}]$ is the observed interval-valued dependent variable, the AQI: $y^l_{it}$ and $y^u_{it}$ are the minimum and maximum values of the AQI in the $i$th city on date $t$. $X_{it}=(X_{it1},X_{it2},\dots,X_{it6})^T$ is the vector of interval-valued independent variables, representing CO, NO2, O3, PM10, PM2.5, and SO2, respectively, with $X_{itj}=[a_{itj},b_{itj}]$, $i=1,2,3,4$, $t=1,2,\dots,40$, $j=1,2,\dots,6$, where $a_{itj}$ and $b_{itj}$ are the minimum and maximum values of the $j$th pollutant in the $i$th city on date $t$.

    First, Q-Q plots of the center and range data of the dependent variable are shown in Figure 2; they suggest that the data are approximately normally distributed.

    Figure 2.  Q-Q plot of the data.

    Here $\alpha^s_i$ is a random effect and $\beta^s=(\beta^s_1,\beta^s_2,\dots,\beta^s_6)^T$; the prior mean $\beta^s_0$ is obtained by linear least squares, and the other prior settings are the same as in the numerical simulation. Based on 5000 iterations, we take the mean of the last 2000 draws as the parameter estimate, and the results are shown in Table 9.
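The text states only that $\beta^s_0$ comes from linear least squares. The NumPy sketch below assumes pooled ordinary least squares on the training data (for example, 4 cities × 30 days = 120 rows), separately for centers and half-ranges, without an intercept and ignoring the random effects. The names `ols` and `ls_prior_means` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients, solved with np.linalg.lstsq for numerical stability."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def ls_prior_means(X_lo, X_up, y_lo, y_up):
    """Prior means beta_0^c and beta_0^r from pooled least squares on centers and half-ranges."""
    Xc, Xr = (X_up + X_lo) / 2.0, (X_up - X_lo) / 2.0
    yc, yr = (y_up + y_lo) / 2.0, (y_up - y_lo) / 2.0
    return ols(Xc, yc), ols(Xr, yr)

# Synthetic check: 120 rows, six interval-valued pollutants, exactly linear responses.
rng = np.random.default_rng(0)
X_lo = rng.uniform(0, 5, size=(120, 6))
X_up = X_lo + rng.uniform(0.1, 2, size=(120, 6))
b_c, b_r = np.arange(1.0, 7.0), np.linspace(0.5, 1.0, 6)
yc = ((X_up + X_lo) / 2) @ b_c
yr = ((X_up - X_lo) / 2) @ b_r
beta0_c, beta0_r = ls_prior_means(X_lo, X_up, yc - yr, yc + yr)
```

On noise-free synthetic data the least-squares fit recovers the generating coefficients, which is a convenient way to check the bookkeeping before plugging real pollutant intervals in.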

    Table 9.  The results of Bayesian estimation.
    Para. $\mathrm{CO}^c$ $\mathrm{NO}_2^c$ $\mathrm{O}_3^c$ $\mathrm{PM}_{10}^c$ $\mathrm{PM}_{2.5}^c$ $\mathrm{SO}_2^c$
    Mean -5.4727 0.1152 0.2858 0.4476 0.2669 -0.5015
    SD 2.6396 0.1042 0.0202 0.0820 0.1254 0.2977
    Para. $\mathrm{CO}^r$ $\mathrm{NO}_2^r$ $\mathrm{O}_3^r$ $\mathrm{PM}_{10}^r$ $\mathrm{PM}_{2.5}^r$ $\mathrm{SO}_2^r$
    Mean 12.5602 -0.5210 0.2283 0.3567 0.2533 -0.1633
    SD 2.9272 0.1192 0.0260 0.1236 0.1797 0.4666


    The center coefficients show that NO2, O3, PM10, and PM2.5 have a positive effect on the AQI, i.e., the higher the concentrations of these pollutants, the higher the AQI and the more severe the air pollution, while CO and SO2 have a negative effect. Similarly, for the range data, CO, O3, PM10, and PM2.5 have a positive effect on the AQI, while NO2 and SO2 have a negative effect. For some coefficients, such as NO2 in the center model and SO2 in the range model, the posterior SD is comparable to or larger than the posterior mean, so the signs of these effects are less certain. The first 75% of the data (the first 30 days) are used as the training set and the last 25% (the last 10 days) as the test set, and the AQI is predicted by the three methods above. The prediction results are shown in Table 10.

    Table 10.  Prediction results of the AQI.
    Models RMSEL RMSEU RMSEH RI
    BCRM 8.0494 9.6009 11.5262 0.5820
    BCM 12.3272 17.2655 18.7467 0.5109
    BMMM 6.8774 9.9299 10.9141 0.6292


    Table 10 shows that the BCRM and the BMMM perform similarly and both outperform the BCM on all four measures. The BCRM has relatively small RMSEL, RMSEU, and RMSEH values, which supports the accuracy of its predictions, and its RI of 0.5820 is relatively good. The BMMM also predicts well: it outperforms the BCRM on RMSEL, RMSEH, and RI, while the BCRM has the smaller RMSEU. This may be because the interval-valued data in this empirical study are recorded as minimum and maximum values. Figure 3 compares the real and forecast data for the different methods, where a solid line represents the predicted data, a dotted line represents the real data, the horizontal axis indexes the forecasts, and the vertical axis is the predicted AQI value. As Figure 3 shows, the BCRM forecasts generally follow the trend of the real data and stay close to it, which further supports the accuracy of the BCRM to some extent. The gap between the real and forecast data is larger for the BCM and smaller for the BMMM.

    Figure 3.  Comparison chart of different methods based on real data and forecast data.

    In this paper, a Bayesian model based on the center and range method (BCRM) for random effects panel interval-valued data is proposed and compared with the Bayesian model based on the center method (BCM) and the Bayesian model based on the minimum and maximum method (BMMM) for random effects panel interval-valued data. The results show that, on the one hand, the bias and standard deviation of the parameters estimated by the BCRM are very small; on the other hand, across configurations and sample sizes, the BCRM generally has the smallest prediction errors RMSEL, RMSEU, and RMSEH and the largest RI, i.e., the best prediction performance. The model is applied to an empirical study of AQI prediction, and the results show the effectiveness of the proposed Bayesian model.

    In addition, the model can be extended in several directions. For example, quantile regression places fewer assumptions on the distribution of the random errors and is therefore robust to non-normal errors and outliers. How to combine quantile regression with the Bayesian random effects panel interval-valued data model is thus worthy of further study.
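    The robustness of quantile regression comes from its check loss, and Bayesian quantile regression commonly rests on an asymmetric Laplace likelihood whose log-density is, up to a constant, the negative check loss. The Python sketch below illustrates both on a toy sample with one outlier. It is a generic illustration, not a model for interval-valued panel data, and the function names `check_loss` and `ald_logpdf` are hypothetical.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def ald_logpdf(u, tau, sigma=1.0):
    """Asymmetric Laplace log-density: log(tau*(1-tau)/sigma) - rho_tau(u)/sigma."""
    return np.log(tau * (1 - tau) / sigma) - check_loss(u, tau) / sigma


# Toy sample (made-up numbers) with one gross outlier.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Minimize the total check loss over a grid of candidate locations b.
# This is equivalent to maximizing sum(ald_logpdf(y - b, tau)) for a fixed sigma.
tau = 0.5
grid = np.linspace(0.0, 100.0, 10001)
losses = np.array([check_loss(y - b, tau).sum() for b in grid])
b_quantile = grid[np.argmin(losses)]

b_mean = y.mean()  # least-squares location, for comparison
print(b_quantile, b_mean)
```

    With tau = 0.5 the loss-minimizing location is the sample median, 3, whereas the least-squares location (the mean) is pulled to 22 by the single outlier.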

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research was supported by the National Social Science Fund of China (Grant No. 23BTJ069).

    The authors declare there are no conflicts of interest.



    [1] Y. Y. Sun, X. Y. Zhang, A. T. K. Wan, S. Y. Wang, Model averaging for interval-valued data, Eur. J. Oper. Res., 301 (2022), 772–784. https://doi.org/10.1016/j.ejor.2021.11.015
    [2] L. Billard, E. Diday, Regression analysis for interval-valued data, in Data Analysis, Classification and Related Methods, Berlin, Heidelberg: Springer, (2000), 369–374. https://doi.org/10.1007/978-3-642-59789-3_58
    [3] L. Billard, E. Diday, From the statistics of data to the statistics of knowledge: Symbolic data analysis, J. Am. Stat. Assoc., 98 (2003), 470–487. https://doi.org/10.1198/016214503000242
    [4] L. Billard, E. Diday, Descriptive statistics for interval-valued observations in the presence of rules, Comput. Stat., 21 (2006), 187–210. https://doi.org/10.1007/s00180-006-0259-6
    [5] E. D. Lima, F. D. T. de Carvalho, Centre and range method for fitting a linear regression model to symbolic interval data, Comput. Stat. Data Anal., 52 (2008), 1500–1515. https://doi.org/10.1016/j.csda.2007.04.014
    [6] E. D. Lima, F. D. T. de Carvalho, Constrained linear regression models for symbolic interval-valued variables, Comput. Stat. Data Anal., 54 (2010), 333–347. https://doi.org/10.1016/j.csda.2009.08.010
    [7] P. Giordani, Lasso-constrained regression analysis for interval-valued data, Adv. Data Anal. Classif., 9 (2015), 5–19. https://doi.org/10.1007/s11634-014-0164-8
    [8] A. L. S. Maia, F. D. T. de Carvalho, Holt's exponential smoothing and neural network models for forecasting interval-valued time series, Int. J. Forecasting, 27 (2011), 740–759. https://doi.org/10.1016/j.ijforecast.2010.02.012
    [9] L. C. Souza, R. M. C. R. Souza, G. J. A. Amaral, T. M. Silva, A parametrized approach for linear regression of interval data, Knowl.-Based Syst., 131 (2017), 149–159. https://doi.org/10.1016/j.knosys.2017.06.012
    [10] L. T. Kong, X. J. Song, X. M. Wang, Nonparametric regression for interval-valued data based on local linear smoothing approach, Neurocomputing, 501 (2022), 834–843. https://doi.org/10.1016/j.neucom.2022.06.073
    [11] L. T. Kong, X. W. Gao, A regularized MM estimate for interval-valued regression, Expert Syst. Appl., 238 (2024). https://doi.org/10.1016/j.eswa.2023.122044
    [12] E. Nuroglu, R. M. Kunst, The effects of exchange rate volatility on international trade flows: evidence from panel data analysis and fuzzy approach, in Proceedings of Rijeka Faculty of Economics: Journal of Economics and Business, 30 (2012), 9–31.
    [13] F. He, D. D. Ma, X. N. Xu, Interval environmental efficiency across provinces in China under the constraint of haze with SBM-undesirable interval model, J. Arid Land Res. Environ., 30 (2016), 28–33. https://doi.org/10.13581/j.cnki.rdm.20160906.004
    [14] S. N. Zhao, R. Q. Liu, Z. F. Shang, Statistical inference on panel data models: A kernel ridge regression method, J. Bus. Econ. Stat., 39 (2019), 325–337. https://doi.org/10.1080/07350015.2019.1660176
    [15] E. Aristodemou, Semiparametric identification in panel data discrete response models, J. Econom., 220 (2021), 253–271. https://doi.org/10.1016/j.jeconom.2020.04.002
    [16] L. R. Liu, H. R. Moon, F. Schorfheide, Forecasting with a panel Tobit model, Quant. Econom., 14 (2023), 117–159. https://doi.org/10.3982/QE1505
    [17] B. H. Beyaztas, S. Bandyopadhyay, Robust estimation for linear panel data models, Stat. Med., 39 (2020), 4421–4438. https://doi.org/10.1002/sim.8732
    [18] H. Liu, Y. Q. Pei, Q. F. Xu, Estimation for varying coefficient panel data model with cross-sectional dependence, Metrika, 83 (2020), 377–410. https://doi.org/10.1007/s00184-019-00739-0
    [19] S. Y. Ke, P. C. B. Phillips, L. J. Su, Robust inference of panel data models with interactive fixed effects under long memory: A frequency domain approach, J. Econom., 241 (2024). https://doi.org/10.1016/j.jeconom.2024.105761
    [20] A. B. Ji, J. J. Zhang, X. He, Y. H. Zhang, Fixed effects panel interval-valued data models and applications, Knowl.-Based Syst., 237 (2022). https://doi.org/10.1016/j.knosys.2021.107798
    [21] T. Park, G. Casella, The Bayesian lasso, J. Am. Stat. Assoc., 103 (2008), 681–686. https://doi.org/10.1198/016214508000000337
    [22] D. K. Xu, Z. Z. Zhang, A semiparametric Bayesian approach to joint mean and variance model, Stat. Probab. Lett., 83 (2013), 1624–1631. https://doi.org/10.1016/j.spl.2013.02.023
    [23] I. Castillo, J. Schmidt-Hieber, A. van der Vaart, Bayesian linear regression with sparse priors, Ann. Stat., 43 (2015), 1986–2018. https://doi.org/10.1214/15-AOS1334
    [24] M. Pfarrhofer, P. Piribauer, Flexible shrinkage in high-dimensional Bayesian spatial autoregressive models, Spatial Stat., 29 (2019), 109–128. https://doi.org/10.1016/j.spasta.2018.10.004
    [25] Z. Q. Wang, N. S. Tang, Bayesian quantile regression with mixed discrete and nonignorable missing covariates, Bayesian Anal., 15 (2020), 579–604. https://doi.org/10.1214/19-BA1165
    [26] D. Zhang, L. C. Wu, K. Y. Ye, M. Wang, Bayesian quantile semiparametric mixed-effects double regression models, Stat. Theory Relat. Fields, 5 (2021), 303–315. https://doi.org/10.1080/24754269.2021.1877961
    [27] L. Z. Tang, Y. J. Li, L. J. Zhao, Study on the Bayesian Elastic Net quantile regression for panel data: methods and applications, Stat. Res., 37 (2020), 94–113. https://doi.org/10.19343/j.cnki.11-1302/c.2020.03.008
    [28] C. Q. Tao, Y. T. Xu, Study on Bayesian adaptive lasso quantile regression using asymmetric exponential power distribution for panel data, Stat. Res., 39 (2022), 128–144. https://doi.org/10.19343/j.cnki.11-1302/c.2022.09.010
    [29] J. Zhang, M. Liu, M. Dong, Variational Bayesian inference for interval regression with an asymmetric Laplace distribution, Neurocomputing, 323 (2019), 214–230. https://doi.org/10.1016/j.neucom.2018.09.083
    [30] M. Xu, Z. F. Qin, A bivariate Bayesian method for interval-valued regression models, Knowl.-Based Syst., 235 (2022). https://doi.org/10.1016/j.knosys.2021.107396
    [31] J. H. Ding, Z. Q. Zhang, Bayesian statistical models with uncertainty variables, J. Intell. Fuzzy Syst., 39 (2020), 1109–1117. https://doi.org/10.3233/JIFS-192014
    [32] F. D. T. de Carvalho, R. M. C. R. de Souza, M. Chavent, Y. Lechevallier, Adaptive Hausdorff distances and dynamic clustering of symbolic interval data, Pattern Recognit. Lett., 27 (2006), 167–179. https://doi.org/10.1016/j.patrec.2005.08.014
    [33] C. Y. Hu, L. T. He, An application of interval methods to stock market forecasting, Reliab. Comput., 13 (2007), 423–434. https://doi.org/10.1007/s11155-007-9039-4
    © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)