Research article Special Issues

Time delay estimation of traffic congestion propagation due to accidents based on statistical causality


    The accurate estimation of time delays is crucial in traffic congestion analysis, as this information can be used to address fundamental questions regarding the origin and propagation of traffic congestion. However, the exact measurement of time delays during congestion remains a challenge owing to the complex propagation process between roads and high uncertainty regarding future behavior. To overcome this challenge, we propose a novel time delay estimation method for the propagation of traffic congestion due to accidents using lag-specific transfer entropy (TE). The proposed method adopts Markov bootstrap techniques to quantify uncertainty in the time delay estimator. To the best of our knowledge, our proposed method is the first to estimate time delays based on causal relationships between adjacent roads. We validated the method's efficacy using simulated data, as well as real user trajectory data obtained from a major GPS navigation system in South Korea.

    Citation: YongKyung Oh, JiIn Kwak, Sungil Kim. Time delay estimation of traffic congestion propagation due to accidents based on statistical causality[J]. Electronic Research Archive, 2023, 31(2): 691-707. doi: 10.3934/era.2023034




    Traffic congestion represents a universal problem for urban life owing to the dramatic growth in vehicle use, expansion of the economy and infrastructure, and proliferation of delivery services, among other factors. Traffic congestion frequently spreads into adjacent roads [1], resulting in greater damage to the overall traffic network.

    Consequently, the accurate estimation of time delays has become crucial in addressing fundamental questions regarding the origin points and propagation of traffic congestion. Figure 1 from [2] illustrates this study's objective by showing how the impact of a traffic accident propagates to incoming roads. The black solid line represents the average speed on the road segment where the accident occurs, while the orange, green, and blue solid lines represent the average speeds on the three adjacent incoming roads. The average speed on the road segment was computed from GPS trajectory data provided by the NAVER Corporation using a map-matching process [3]. We can observe a time lag when the impact of an accident propagates to incoming roads. The time delay increases in the order of blue, green, and orange.

    Figure 1.  Motivating example: traffic congestion propagation [2].

    However, certain aspects of traffic congestion propagation pose statistical challenges to the accuracy of time delay estimation. First, the average speed on an incoming road is affected by various geographic and topological characteristics, such as road length and width, as well as road network topology. The impact of a traffic accident is distributed among all incoming roads in a complex pattern according to these characteristics. Furthermore, the duration of congestion is dynamic. As shown in the figure, the road denoted in blue exhibits a longer congestion duration than the other incoming roads. That is, the time delay in traffic congestion propagation does not simply denote a temporal pattern shift, but involves complicated temporal dynamics. Finally, data uncertainty is inherent in average road speeds. Trajectory-based average road speeds may be highly volatile depending on the availability of user data over a specific period.

    To overcome the challenges outlined above, we propose a novel time delay estimation method for traffic congestion propagation between roads using lag-specific transfer entropy (TE). Our main contributions are as follows:

    ● We provide a model-free approach to estimate congestion propagation delays using a lag-specific TE estimator in complex urban road systems.

    ● We quantify uncertainty in time delay estimation using bootstrap techniques. This uncertainty quantification is employed to evaluate the reliability of time delay estimates and serves as a basis for hyperparameter optimization.

    ● We show that decomposition and nonlinear normalization with a sliding window are effective time series preprocessing methods for revealing causal relationships between traffic speed data.

    ● We validate the proposed method through numerical simulations and real user trajectory data obtained from a major GPS navigation system in South Korea.

    The remainder of this paper is organized as follows: Section 2 provides an overview of related studies. Section 3 presents crucial background information pertaining to the proposed method. Section 4 outlines our proposed time delay estimation method. Sections 5 and 6 validate the proposed method using simulated and real congestion propagation data, respectively. Finally, concluding remarks are presented in Section 7.

    Various topics have been studied regarding the estimation of time delays due to traffic accidents, including travel time delay [4], incident duration [5,6], real-time crash identification [7], and incident impact quantification [2,8]. However, unlike the aforementioned studies, we focused on the propagation delay of traffic congestion caused by accidents in a road network.

    Although our proposed method is the first to estimate time delays for congestion propagation in road traffic networks, time delay estimation (TDE) is not a new problem. In digital signal processing, TDE refers to the task of ascertaining the differences in arrival times between signals received at the sensor array. The most widespread approach for TDE is cross-correlation [9,10]. Supposing that signals are received from two sensors, the delay between the sensors can be estimated using the time lag that maximizes the cross-correlation between filtered versions of the received signal.

    Limited attempts have been made to perform cross-correlation analyses with traffic speed data. Conventional TDE methods based on cross-correlation have been applied to a real road vehicle pass-by measurement to enable traffic monitoring using passive acoustic sensors [11]. Similarly, [12] conducted a cross-correlation analysis to demonstrate a significant relationship between the current speed at a specific station and past speed values at upstream and downstream stations in a freeway traffic network. We refer to this method as time-lagged cross-correlation (TLCC). However, this approach is not applicable to nonstationary time series.

    To quantify the TLCC level between two nonstationary time series at different scales, [13] proposed a time-lagged detrended cross-correlation analysis approach. This method, referred to as DCCA, divides an entire time series into overlapping boxes to handle nonstationarity [14].

    The method proposed in the present study was compared with TLCC and DCCA for evaluation purposes.

    Various traffic causal analysis methods have been developed to identify causal relationships among congested roads and detect congestion propagation patterns. The authors of [15] forecasted future traffic flow by ranking input variables to identify a subset of the Bayesian network as the set of cause nodes using the Pearson correlation coefficient. A two-step mining architecture has been proposed to capture the origin points of road traffic congestion [16]. TE was used to reveal the delay propagation network among multiple airports with a time series of airport delays [17]. However, the aforementioned approaches primarily focus on revealing causal relationships and do not provide information regarding estimated time delays during congestion propagation.

    Suppose that $\{X_t\}_{t\ge 1}$ is a stationary Markov chain with a finite state space $S=\{s_1,\ldots,s_n\}$, where $n\in\mathbb{N}$. Let $P=(p_{ij})\in\mathbb{R}^{n\times n}$ be the transition probability matrix of the chain, and denote the stationary distribution by $\pi=(\pi_1,\ldots,\pi_n)$. Thus, for any $1\le i,j\le n$, $p_{ij}=P(X_{t+1}=s_j\mid X_t=s_i)$ and $\pi_i=P(X_t=s_i)$. Given a time series $\{X_1,\ldots,X_L\}$ of size $L$ from a stationary Markov chain, $\pi_i$ and $p_{ij}$ can be estimated as

    $$\hat\pi_i=\frac{1}{L}\sum_{t=1}^{L}\mathbb{1}(X_t=s_i),\qquad \hat p_{ij}=\frac{1}{\hat\pi_i L}\sum_{t=1}^{L}\mathbb{1}(X_t=s_i,X_{t+1}=s_j). \tag{3.1}$$

    The bootstrap observations $\{X_1,\ldots,X_L\}$ can now be generated using the estimated transition matrix and marginal distribution in Eq (3.1) [18].

    1) Generate a random variable $X_1$ from the discrete distribution on $\{1,\ldots,n\}$ that assigns mass $\hat\pi_i$ to $s_i$, $1\le i\le n$.

    2) Generate a random variable $X_{t+1}$ from the discrete distribution on $\{1,\ldots,n\}$ that assigns mass $\hat p_{ij}$ to $s_j$, $1\le j\le n$, where $s_i$ is the value of $X_t$.

    3) Repeat Step (2) until a simulated time series $\{X_1,\ldots,X_L\}$ has been obtained.
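
    The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `markov_bootstrap`, the toy series, and the fallback to the marginal distribution for a state with no observed outgoing transition are assumptions made here. Raw counts are used as sampling weights, which is equivalent to sampling from the row-normalized estimates.

```python
import random

def markov_bootstrap(series, states, rng):
    """Draw one bootstrap series of the same length from a fitted first-order chain."""
    n = len(states)
    index = {s: i for i, s in enumerate(states)}
    # Occurrence counts are proportional to the estimated marginal distribution.
    marginal = [0] * n
    for x in series:
        marginal[index[x]] += 1
    # Transition counts: row i is proportional to the estimated row (p_i1, ..., p_in).
    transitions = [[0] * n for _ in range(n)]
    for a, b in zip(series, series[1:]):
        transitions[index[a]][index[b]] += 1

    # Step 1: draw the first state from the estimated marginal distribution.
    current = rng.choices(range(n), weights=marginal)[0]
    sample = [states[current]]
    # Steps 2 and 3: repeatedly draw the next state from the row of the current state.
    while len(sample) < len(series):
        row = transitions[current]
        # Assumption: fall back to the marginal if this state was never left in the data.
        weights = row if sum(row) > 0 else marginal
        current = rng.choices(range(n), weights=weights)[0]
        sample.append(states[current])
    return sample

rng = random.Random(0)
observed = [1, 1, 2, 3, 3, 2, 1, 2, 2, 3, 1, 1]  # hypothetical encoded series
resampled = markov_bootstrap(observed, states=[1, 2, 3], rng=rng)
```

    Calling the function $B$ times with the same fitted series yields $B$ bootstrap replicates of equal length.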

    TE is a measurement of directed information flow [19] based on the concept of Shannon entropy [20] in the field of information theory. For a discrete random variable $I$ with probability distribution $p(i)$, the Shannon entropy represents the average number of bits required to optimally encode independent draws, calculated as follows:

    $$H(I)=-\sum_{i}p(i)\log_2 p(i). \tag{3.2}$$

    Equation (3.2) can be easily extended to the concept of conditional entropy using two discrete random variables $I$ and $J$:

    $$H(I\mid J)=-\sum_{i,j}p(i,j)\log_2 p(i\mid j). \tag{3.3}$$

    This equation can be used to measure information flow between two discrete random variables.
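
    As a small numerical illustration of the Shannon and conditional entropies defined above, the following sketch computes plug-in estimates from observed symbol sequences. The function names and the toy sequences are assumptions introduced here for illustration only.

```python
import math
from collections import Counter

def shannon_entropy(xs):
    """Plug-in estimate of H(I) in bits from a sequence of symbols."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def conditional_entropy(xs, ys):
    """Plug-in estimate of H(I|J) in bits from paired symbol sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (i, j)
    marginal_j = Counter(ys)       # counts of j
    # p(i, j) = c / n and p(i | j) = c / count(j)
    return -sum(c / n * math.log2(c / marginal_j[j]) for (_, j), c in joint.items())

coin = [0, 1, 0, 1]
h_coin = shannon_entropy(coin)               # one bit for a balanced binary sequence
h_given_self = conditional_entropy(coin, coin)  # no uncertainty remains given itself
```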

    Consider two discrete random variables, $I$ and $J$, with marginal probability distributions $p(i)$ and $p(j)$, and joint probability $p(i,j)$. Suppose both variables represent stationary Markov processes of orders $k$ and $l$, respectively. For the order $k$ Markov process $I$, Eq (3.2) can be extended to

    $$H^{(k)}(I)=-\sum_{i}p\left(i_t,i^{(k)}_{t-1}\right)\log p\left(i_t\mid i^{(k)}_{t-1}\right),$$

    where $i^{(k)}_{t-1}=(i_{t-1},\ldots,i_{t-k})$. Analogously, the information flow from $J$ to $I$ is measured by quantifying the deviation from the generalized Markov property $p(i_t\mid i^{(k)}_{t-1})=p(i_t\mid i^{(k)}_{t-1},j^{(l)}_{t-u})$ for an arbitrary source–target lag $u$, as follows:

    $$T^{(k,l)}_{J\to I}(t,u)=\sum p\left(i_t,i^{(k)}_{t-1},j^{(l)}_{t-u}\right)\log\frac{p\left(i_t\mid i^{(k)}_{t-1},j^{(l)}_{t-u}\right)}{p\left(i_t\mid i^{(k)}_{t-1}\right)}. \tag{3.4}$$

    Equation (3.4) preserves the computational interpretation of TE as an information transfer, which is the only relevant option in keeping with Wiener's principle of causality [21]. The transfer entropy is known to be biased in small samples [22]. To correct any bias, [22] proposed the effective transfer entropy (ETE):

    $$ETE^{(k,l)}_{J\to I}(t,u)=T^{(k,l)}_{J\to I}(t,u)-T^{(k,l)}_{J_{\text{shuffled}}\to I}(t,u), \tag{3.5}$$

    where $T^{(k,l)}_{J_{\text{shuffled}}\to I}$ indicates the transfer entropy using a shuffled version of time series $J$. The shuffling process randomly draws values from the original time series and realigns them to generate a new time series. Thus, shuffling eliminates any time series dependencies of $J$, as well as statistical dependencies between $J$ and $I$. Note that $T^{(k,l)}_{J_{\text{shuffled}}\to I}$ converges to zero as the sample size increases, and any nonzero value of $T^{(k,l)}_{J_{\text{shuffled}}\to I}(t,u)$ is a result of the small sample effect. To ensure estimation consistency, shuffling is repeated, and the average of the shuffled transfer entropy estimates across all iterations serves as an estimator for the small sample bias, which is subsequently subtracted from the Shannon or Rényi transfer entropy estimate to correct any bias.
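
    To make the lag-specific TE and its shuffle-based correction concrete, the sketch below computes plug-in estimates for the special case $k=l=1$ on already discretized sequences. It is a simplified illustration under stated assumptions, not the authors' code: logarithms are taken in base 2, shuffling is implemented as a random permutation of the source series, and the function names, the number of shuffles, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target, lag):
    """Plug-in lag-specific TE from source J to target I with k = l = 1 (bits)."""
    start = max(1, lag)
    # Each triple is (i_t, i_{t-1}, j_{t-u}).
    triples = [(target[t], target[t - 1], source[t - lag])
               for t in range(start, len(target))]
    n = len(triples)
    c_full = Counter(triples)
    c_past_src = Counter((p, s) for _, p, s in triples)   # (i_{t-1}, j_{t-u})
    c_cur_past = Counter((c, p) for c, p, _ in triples)   # (i_t, i_{t-1})
    c_past = Counter(p for _, p, _ in triples)            # i_{t-1}
    te = 0.0
    for (c, p, s), cnt in c_full.items():
        cond_with_source = cnt / c_past_src[(p, s)]
        cond_own_past = c_cur_past[(c, p)] / c_past[p]
        te += cnt / n * math.log2(cond_with_source / cond_own_past)
    return te

def effective_transfer_entropy(source, target, lag, n_shuffles=50, seed=0):
    """TE minus the average TE obtained from shuffled copies of the source."""
    rng = random.Random(seed)
    bias = 0.0
    for _ in range(n_shuffles):
        shuffled = rng.sample(source, len(source))  # assumption: permutation shuffle
        bias += transfer_entropy(shuffled, target, lag)
    return transfer_entropy(source, target, lag) - bias / n_shuffles

# Toy data: the target copies the source with a delay of three steps.
gen = random.Random(1)
src = [gen.randint(0, 1) for _ in range(500)]
tgt = [0, 0, 0] + src[:-3]
ete_true_lag = effective_transfer_entropy(src, tgt, 3)
ete_wrong_lag = effective_transfer_entropy(src, tgt, 1)
```

    In this toy setting the estimate at the true lag should be close to one bit, whereas the estimate at a wrong lag should be close to zero.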

    Suppose that congestion information is transferred from a source road to a destination road with a time delay of $u$. The objective of time delay estimation is to estimate $u$ given previously observed traffic speed data on the two roads denoted as $\{X_t\}_{t=1}^{L}$ and $\{Y_t\}_{t=1}^{L}$, respectively. Therefore, the time delay estimation task entails formulating a function $f(\cdot)$ that computes the source–target lag $u$: $[\{X_t\}_{t=1}^{L},\{Y_t\}_{t=1}^{L}]\xrightarrow{f(\cdot)}u$.

    The proposed time delay estimation algorithm comprises three steps. First, bootstrapping for each time series is performed using preprocessing methods. Next, the transfer entropy computation provides the estimated time delay lag u. Finally, the distribution of time delay lag is estimated to determine the existence of a statistical causal relationship.

    Consider a time series of congested traffic speed data, $\{X_t\}_{t=1}^{L}$, which has the properties of scale dependence, nonlinearity, and nonstationarity. To effectively identify the causal relationship within such a time series, appropriate preprocessing methods are essential.

    To this end, we first decompose the time series into a trend and its residual, as follows:

    $$\forall t,\quad X_t=T_t+R_t=\frac{1}{m}\sum_{j=0}^{m-1}X_{t-j}+R_t, \tag{4.1}$$

    where $T_t$ and $R_t$ are the trend and residual components, respectively. The trend component is a moving average of order $m$, representing the mean forefront value at time $t$. The purpose of the trend component is to smooth the time series for estimating the underlying trend. After extracting the underlying trend from $\{X_t\}_{t=1}^{L}$, the residual time series $\{R_t\}_{t=1}^{L}$ is assumed to be a stationary Markov process. The assumption of Markovian property in traffic speed is not a novel notion. Many prior traffic speed prediction and modeling studies have been conducted under this assumption [23,24,25,26,27]. Based on $\{R_t\}_{t=1}^{L}$, we can generate the bootstrap residuals $\{R^{(b)}_t\}_{t=1}^{L}$ as explained in Section 3.1. Subsequently, we can easily obtain a bootstrap time series $\{X^{(b)}_t\}_{t=1}^{L}$ by $T_t+R^{(b)}_t$ for $t=1,\ldots,L$.
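
    The decomposition in Eq (4.1) can be sketched as follows. The handling of the first $m-1$ time points, where fewer than $m$ past values exist, is not specified in the text; averaging over the available values is an assumption made here, and the function name is hypothetical.

```python
def decompose(x, m):
    """Split a series into a trailing moving-average trend of order m and a residual."""
    trend = []
    for t in range(len(x)):
        # Trailing window X_{t-m+1}, ..., X_t (shorter at the start: an assumption).
        window = x[max(0, t - m + 1): t + 1]
        trend.append(sum(window) / len(window))
    residual = [xi - ti for xi, ti in zip(x, trend)]
    return trend, residual

trend, residual = decompose([1, 3, 5, 7], m=2)
# A bootstrap series is then rebuilt as trend[t] + resampled_residual[t].
```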

    Nonlinear normalization with a sliding window is then applied to the obtained time series to address the scale-dependency, nonlinearity, and nonstationarity of traffic speed data. To ensure the data are scale-independent and close to linear [29], the standard normal cumulative distribution function $\Phi$ is applied. A sliding window technique has similarly been employed to handle a nonstationary time series in [28]. Let $X_{t,w}=\{X^{(b)}_k\}_{k=t-w+1}^{t}$ be the forefront sequence of $X^{(b)}_t$ with a sliding window size of $w$, and let $F_{25,t}$, $F_{50,t}$ and $F_{75,t}$ be the 25th, 50th and 75th percentiles of $X_{t,w}$, respectively. Note that these percentiles depend on the location of the sliding window. Then, a normalized time series $\{\tilde X^{(b)}_t\}_{t=1}^{L}$ can be obtained by

    $$\tilde X^{(b)}_t=\Phi\left(0.5\times\frac{X^{(b)}_t-F_{50,t}}{F_{75,t}-F_{25,t}}\right). \tag{4.2}$$

    To verify the effectiveness of the nonlinear normalization method expressed in Eq (4.2), we compared its performance with that of existing normalization methods [28,29], including the min-max method ($\tilde X^{(b)}_t=\frac{X^{(b)}_t}{\max X_{t,w}}$) and the z-score method ($\tilde X^{(b)}_t=\frac{X^{(b)}_t-\mu(X_{t,w})}{\sigma(X_{t,w})}$), with and without a sliding window technique (see Section 5).
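
    A minimal sketch of the sliding-window nonlinear normalization in Eq (4.2) is given below, using only the Python standard library. Several details are assumptions made for illustration: the "inclusive" percentile definition, the use of a shorter window at the start of the series, and returning 0.5 when the window has fewer than two points or a zero interquartile range.

```python
from statistics import NormalDist, quantiles

def nonlinear_normalize(x, w):
    """Apply Phi(0.5 * (x_t - median) / IQR) over a trailing window of size w."""
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    out = []
    for t in range(len(x)):
        window = x[max(0, t - w + 1): t + 1]
        if len(window) < 2:
            out.append(0.5)  # assumption: not enough data to form percentiles
            continue
        q25, q50, q75 = quantiles(window, n=4, method="inclusive")
        iqr = q75 - q25
        # Assumption: map to Phi(0) = 0.5 when the window has no spread.
        out.append(0.5 if iqr == 0 else phi(0.5 * (x[t] - q50) / iqr))
    return out

normalized = nonlinear_normalize(list(range(10)), w=5)
```

    Because $\Phi$ maps to the unit interval, the output is bounded regardless of the scale of the input speeds.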

    Using Eq (3.5), the time lag in a causal relationship $J\to I$ can be estimated by solving the following optimization problem:

    $$\hat u=\arg\max_{u\in\mathbb{N}}ETE^{(k,l)}_{J\to I}(t,u). \tag{4.3}$$

    In this study, we assume $k=l=1$.
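
    In practice, the maximization in Eq (4.3) amounts to a search over a finite set of candidate lags. The sketch below shows that search in isolation; the upper bound `max_lag` and the stand-in scoring function (a simple agreement rate, used here in place of the ETE) are assumptions made to keep the example short.

```python
def estimate_lag(score, max_lag):
    """Return the lag u in 1..max_lag that maximizes score(u); ties go to the smallest u."""
    return max(range(1, max_lag + 1), key=score)

# Toy symbol sequences: the target repeats the source pattern seven steps later.
source = [0] * 20 + [1] * 5 + [0] * 35
target = [0] * 27 + [1] * 5 + [0] * 28

def agreement(u):
    """Stand-in score: share of positions where source[t] equals target[t + u]."""
    pairs = list(zip(source[:len(source) - u], target[u:]))
    return sum(a == b for a, b in pairs) / len(pairs)

u_hat = estimate_lag(agreement, max_lag=15)
```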

    To compute the lag-specific ETE in Eq (4.3), we discretize continuous data using symbolic encoding. This discretization can be performed by partitioning the data into a finite number of bins. We denote the bounds specified for the $n$ bins by $q_1,q_2,\ldots,q_{n-1}$, where $q_1<q_2<\cdots<q_{n-1}$. For the normalized time series in Eq (4.2), we obtain the encoded time series $\{J^{(b)}_t\}_{t=1}^{L}$ by the following equation:

    $$J^{(b)}_t=\begin{cases}1 & \text{for } \tilde X^{(b)}_t\le q_1\\ 2 & \text{for } q_1<\tilde X^{(b)}_t<q_2\\ \vdots & \\ n & \text{for } \tilde X^{(b)}_t\ge q_{n-1}\end{cases}. \tag{4.4}$$

    The choice of bins depends on the distribution of data. In the case where tail observations are of particular interest, binning is typically based on empirical quantiles, such that the left and right tail observations are allocated into separate bins. In this study, we implemented symbolic encoding with n=3 based on 5% and 95% empirical quantiles, thereby emphasizing speed extremes caused by dynamic speed changes and traffic accidents.
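
    The three-bin encoding with 5% and 95% empirical quantiles can be sketched as follows. The quantile definition (the "inclusive" method of the standard library) and the function name are assumptions; the text does not state which empirical quantile estimator was used.

```python
from statistics import quantiles

def symbolic_encode(x):
    """Encode values into symbols 1, 2, 3 using the 5% and 95% empirical quantiles."""
    cuts = quantiles(x, n=20, method="inclusive")  # cut points at 5%, 10%, ..., 95%
    lower, upper = cuts[0], cuts[-1]
    codes = []
    for v in x:
        if v <= lower:
            codes.append(1)   # left tail
        elif v >= upper:
            codes.append(3)   # right tail
        else:
            codes.append(2)   # central bulk of the data
    return codes

codes = symbolic_encode(list(range(101)))
```

    Most observations fall into the middle symbol, so transitions into symbols 1 and 3 mark the speed extremes that the method emphasizes.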

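    Consequently, we obtain $\{J^{(b)}_t\}_{t=1}^{L}$ from $\{\tilde X^{(b)}_t\}_{t=1}^{L}$, and $\{I^{(b)}_t\}_{t=1}^{L}$ from $\{\tilde Y^{(b)}_t\}_{t=1}^{L}$, respectively, for $b=1,\ldots,B$. Given $\{J^{(b)}_t\}_{t=1}^{L}$ and $\{I^{(b)}_t\}_{t=1}^{L}$, $b=1,\ldots,B$, we obtain bootstrap observations of the time lag, $u^{(1)},\ldots,u^{(B)}$, using Eq (4.3).

    Suppose those bootstrap observations of the time lag follow a distribution $G$,

    $$u^{(1)},\ldots,u^{(B)}\sim G,$$

    which is unknown in practice. Let $\mu$ and $\sigma^2$ denote the mean and variance of $G$, respectively, which can be estimated by

    $$\hat\mu_B=\frac{1}{B}\sum_{b=1}^{B}u^{(b)},\qquad \hat\sigma^2_B=\frac{1}{B}\sum_{b=1}^{B}\left(u^{(b)}\right)^2-\hat\mu_B^2.$$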

    Proposition 1 implies that 1) the bootstrap estimate $\hat\mu_B$ is an unbiased estimate of $\mu$, and 2) $\frac{1}{B}\hat\sigma^2_B$ quantifies the uncertainty of $\hat\mu_B$. That is, we can evaluate the uncertainty of the bootstrap estimate $\hat\mu_B$ using $\frac{1}{B}\hat\sigma^2_B$, which is practically useful because $\mu$ is unknown. This approach can be applied to hyperparameter tuning. In this study, we used a grid search to determine a set of hyperparameters (length of time series ($L$) and sliding window size ($w$)) that minimizes $\frac{1}{B}\hat\sigma^2_B$.

    Proposition 1. Let $u^{(1)},\ldots,u^{(B)}$ be a bootstrap sample, and $E(u^{(b)})=\mu$, $\mathrm{Var}(u^{(b)})=\sigma^2$. Then, the sample mean $\hat\mu_B=\frac{1}{B}\sum_{b=1}^{B}u^{(b)}$ approximately follows $N(\mu,\frac{1}{B}\hat\sigma^2_B)$, where $\hat\sigma^2_B$ is the sample variance of the bootstrap sample.

    Proof. As $\hat\sigma^2_B\to\sigma^2$ in probability, $\frac{\sqrt{B}(\hat\mu_B-\mu)}{\hat\sigma_B}=\frac{\sigma}{\hat\sigma_B}\cdot\frac{\sqrt{B}(\hat\mu_B-\mu)}{\sigma}\xrightarrow{d}N(0,1)$ by the central limit theorem and Slutsky's theorem [30]. Thus, $\hat\mu_B$ approximately follows a normal distribution, $\hat\mu_B\sim N(\mu,\frac{1}{B}\hat\sigma^2_B)$.
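
    The bootstrap summary statistics used in Proposition 1 can be computed directly from the list of bootstrap lag estimates. The sketch below follows the variance formula given above (division by $B$); the function name and the toy lags are hypothetical.

```python
import math

def bootstrap_summary(lags):
    """Return the bootstrap mean, the bootstrap variance, and the standard error of the mean."""
    B = len(lags)
    mu_hat = sum(lags) / B
    var_hat = sum(u * u for u in lags) / B - mu_hat ** 2
    # Guard against tiny negative values caused by floating-point rounding.
    std_error = math.sqrt(max(var_hat, 0.0) / B)
    return mu_hat, var_hat, std_error

mu_hat, var_hat, std_error = bootstrap_summary([9, 10, 11, 10])
```

    A grid search over hyperparameters would then keep the setting with the smallest `var_hat / B`.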

    To determine whether the bootstrap estimate $\hat\mu_B$ is reliable, we compare $\hat\sigma^2_B$ with a predetermined threshold $\sigma^2_T$. That is, if $\hat\sigma^2_B>\sigma^2_T$, we can conclude that $\hat\mu_B$ is not reliable, and congestion is therefore not propagated on the corresponding road segment.

    To determine an appropriate value of $\sigma^2_T$, we employ the concept of the tolerance interval (TI). The TI is a statistical interval in which a specified proportion $\gamma$ of a population will fall with a certain level of confidence $(1-\alpha)$. By definition, a $(\gamma,1-\alpha)$-TI of $\hat\mu_B$,

    $$TI=\left[\mu-k_{\gamma,\alpha}\sqrt{\tfrac{1}{B}\hat\sigma^2_B},\ \mu+k_{\gamma,\alpha}\sqrt{\tfrac{1}{B}\hat\sigma^2_B}\right],$$

    satisfies

    $$P\left[P(\hat\mu_B\in TI)\ge\gamma\right]=1-\alpha,$$

    where $k_{\gamma,\alpha}$ is the tolerance factor [31]. Then, $\sigma^2_T$ can be determined by $k_{\gamma,\alpha}\sqrt{\frac{1}{B}\sigma^2_T}=1$ (min) to yield a $\pm 1$ minute TI. With $B=100$, $\gamma=0.9$, and $\alpha=0.01$, the present study uses $\sigma^2_T=\frac{100}{k^2_{0.9,0.01}}=5.05^2$.
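
    For reference, the threshold arithmetic can be written out as a short worked equation. The numerical value of the tolerance factor is not stated in the text; the value below is back-solved from the reported threshold and should be read as an inferred approximation.

```latex
k_{\gamma,\alpha}\sqrt{\frac{\sigma_T^2}{B}} = 1
\;\Longrightarrow\;
\sigma_T^2 = \frac{B}{k_{\gamma,\alpha}^2},
\qquad
B = 100,\ \sigma_T = 5.05
\;\Longrightarrow\;
k_{0.9,0.01} = \frac{\sqrt{100}}{5.05} \approx 1.98,
\quad
\sigma_T^2 = 5.05^2 \approx 25.50 .
```

    A bootstrap variance above roughly 25.5 therefore marks a time delay estimate as unreliable.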

    Consider the propagation of traffic congestion caused by accidents. Here, we define the propagation path by a sequence of incoming roads in the direction opposite to the traffic flow, where the $k$th element in the propagation path is denoted as Hop($k-1$). Let $\hat\mu_{B,k-1}$ and $\hat\sigma^2_{B,k-1}$ denote the bootstrap estimate and sample variance at Hop($k-1$), respectively. Hop0 corresponds to the road where the accident occurred. We consider Hop($k-1$) to be statistically significant if $\hat\mu_{B,k-1}$ and $\hat\sigma^2_{B,k-1}$ satisfy the following conditions: 1) $\hat\sigma^2_{B,k-1}<\sigma^2_T$ and 2) $\hat\mu_{B,k-2}<\hat\mu_{B,k-1}$. The second condition states that the time delay between Hop0 and Hop($k-1$) must exceed that between Hop0 and Hop($k-2$).
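
    The two significance conditions can be checked hop by hop, as sketched below. The example values are the "Proposed" rows for Paths 2 and 4 in Table 6, with $\sigma^2_T=5.05^2$. Treating the delay at Hop0 as zero when evaluating the second condition for Hop1 is an assumption made here, as is the function name.

```python
def significant_hops(estimates, var_threshold):
    """Flag each hop as significant given (mean, variance) pairs ordered from Hop1 onward."""
    flags = []
    previous_mean = 0.0  # assumption: the delay at Hop0 (the accident road) is zero
    for mean, variance in estimates:
        reliable = variance < var_threshold   # condition 1
        increasing = previous_mean < mean     # condition 2
        flags.append(reliable and increasing)
        previous_mean = mean
    return flags

threshold = 5.05 ** 2
path2 = [(4.15, 3.72), (10.49, 6.97), (8.64, 16.32)]
path4 = [(8.23, 7.62), (15.65, 9.01), (22.06, 3.24)]
path2_flags = significant_hops(path2, threshold)
path4_flags = significant_hops(path4, threshold)
```

    Under these assumptions, Path 2 is flagged as significant up to Hop2 because the Hop3 estimate is smaller than the Hop2 estimate, while all three hops of Path 4 are flagged.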

    The proposed method was validated using simulated data. Two time series, $\{X_t\}_{t=1}^{120}$ and $\{Y_t\}_{t=1}^{120}$, were generated by

    $$X_t=\begin{cases}100+\epsilon_{x,t} & \text{for } t<10\\ 0.95X_{t-1}+\epsilon_{x,t} & \text{for } 10\le t<95\\ 1.10X_{t-1}+\epsilon_{x,t} & \text{for } 95\le t\le 120\end{cases},\qquad Y_t=\begin{cases}70+\epsilon_{y,t} & \text{for } t<10\\ 0.5X_{t-u_0}+20+\epsilon_{y,t} & \text{for } t\ge 10\end{cases},$$

    where $\epsilon_{x,t}\sim N(0,2)$ and $\epsilon_{y,t}\sim N(0,2)$. A predetermined source–target lag ($u_0$) exists such that a significant information flow from $X$ to $Y$ is formed, but not vice versa. Figure 2(a) depicts two time series with $u_0=10$, where the black solid and red dashed lines represent $\{X_t\}_{t=1}^{120}$ and $\{Y_t\}_{t=1}^{120}$, respectively. This simulation represents a typical congestion propagation scenario between two adjacent roads $R_X$ and $R_Y$, assuming that there was a traffic accident on $R_X$ at $t=10$ and that congestion was resolved at $t=95$. With the time shift $u_0=10$, the congestion on $R_X$ propagates to $R_Y$.
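
    The data-generating process above can be reproduced approximately with the following sketch. Two points are not specified in the text and are assumptions here: $N(0,2)$ is read as variance 2, and for time points where $t-u_0<1$ the first observation $X_1$ is used in place of the undefined $X_{t-u_0}$. The function name and seed are hypothetical.

```python
import random

def simulate(u0=10, length=120, seed=0):
    """Generate the source series X and the lagged target series Y (time index 1..length)."""
    rng = random.Random(seed)
    sd = 2 ** 0.5  # assumption: N(0, 2) denotes variance 2
    x = {}
    for t in range(1, length + 1):
        eps = rng.gauss(0, sd)
        if t < 10:
            x[t] = 100 + eps              # free flow before the accident
        elif t < 95:
            x[t] = 0.95 * x[t - 1] + eps  # speed decays during congestion
        else:
            x[t] = 1.10 * x[t - 1] + eps  # speed recovers after t = 95
    y = {}
    for t in range(1, length + 1):
        eps = rng.gauss(0, sd)
        if t < 10:
            y[t] = 70 + eps
        else:
            # Assumption: clamp the index to 1 when t - u0 falls before the series start.
            y[t] = 0.5 * x[max(1, t - u0)] + 20 + eps
    return [x[t] for t in range(1, length + 1)], [y[t] for t in range(1, length + 1)]

xs, ys = simulate()
```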

    Figure 2.  Results of simulation study.

    The proposed method was applied to simulated data with $m=2$ and $w=20$, as described in Section 4. Figure 2(b) compares the distributions of bootstrap observations obtained from two time series without normalization to those obtained from two time series with nonlinear normalization. The red and blue lines in the figure denote the values of $(\hat\mu_B,\hat\sigma^2_B)$ in a distribution form, that is, $(9.54,3.94^2)$ and $(10.30,1.35^2)$, respectively. Here, a normal distribution is used for visualization purposes. We confirmed that nonlinear normalization with a sliding window improved the accuracy of the time delay estimation, as $1.35^2<3.94^2$.

    For comparison purposes, the TLCC and DCCA methods were also applied to the simulated data. The DCCA method requires a hyperparameter $n$, which indicates the size of the overlapping box [14]. Here, we used multiple values of $n$ (10, 20, 30, 40) to obtain the results of time delay estimation, as shown in Figure 2(c) for the DCCA method. Furthermore, Table 1 summarizes the results of conventional TDE methods. These results show that both the TLCC and DCCA methods generally yield reasonable time delay estimates for the simulated data, except for DCCA with $n=10$. Accordingly, it is recommended to set $n$ to at least 20.

    Table 1.  Simulation results with comparison methods.

    | | TLCC | DCCA(10) | DCCA(20) | DCCA(30) | DCCA(40) |
    |---|---|---|---|---|---|
    | $\hat u$ | 11.00 | 22.00 | 10.00 | 10.00 | 10.00 |

    To investigate the proposed method's performance, we conducted simulation experiments under various settings of (1) decomposition, (2) normalization and (3) length of the sliding window. Performance was evaluated using $\hat\mu_B$, $\hat\sigma^2_B$ and MAE, where $\mathrm{MAE}=\frac{1}{B}\sum_{i=1}^{B}|\hat u_i-u_0|$. Values of $\hat\mu_B$ closer to $u_0$ indicate a more accurate estimate. Likewise, smaller values of $\hat\sigma^2_B$ and MAE indicate better results. As summarized in Table 2, the decomposition technique improved the overall performance, and nonlinear normalization with $w=20$ generally performed better than all other normalization methods in terms of $\hat\sigma^2_B$ and MAE. This implies that nonlinear normalization with $w=20$ produced the most precise and accurate estimates among the tested schemes.

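    Table 2.  Simulation results comparison ($u_0=10$) with $B=100$. Columns 10 to 120 (all) give the window length. For rows without normalization, no sliding window is involved, so only a single value is reported.

    | Decomposition | Normalization | Metric | 10 | 20 | 30 | 40 | 120 (all) | Average |
    |---|---|---|---|---|---|---|---|---|
    | false | none | $\hat\mu_B$ | - | - | - | - | - | 11.23 |
    | false | none | $\hat\sigma^2_B$ | - | - | - | - | - | 7.03 |
    | false | none | MAE | - | - | - | - | - | 6.13 |
    | false | min-max | $\hat\mu_B$ | 12.93 | 12.64 | 13.14 | 12.71 | 14.89 | 13.26 |
    | false | min-max | $\hat\sigma^2_B$ | 7.21 | 7.49 | 7.27 | 7.39 | 6.72 | 7.21 |
    | false | min-max | MAE | 6.73 | 6.94 | 6.92 | 6.82 | 7.20 | 6.92 |
    | false | z-score | $\hat\mu_B$ | 13.06 | 13.51 | 12.97 | 13.29 | 14.84 | 13.53 |
    | false | z-score | $\hat\sigma^2_B$ | 6.98 | 6.97 | 7.00 | 7.15 | 6.73 | 6.97 |
    | false | z-score | MAE | 6.49 | 6.67 | 6.52 | 6.75 | 7.23 | 6.73 |
    | false | nonlinear | $\hat\mu_B$ | 13.29 | 12.86 | 12.28 | 13.17 | 14.27 | 13.17 |
    | false | nonlinear | $\hat\sigma^2_B$ | 7.12 | 7.24 | 7.13 | 7.07 | 6.89 | 7.09 |
    | false | nonlinear | MAE | 6.77 | 6.66 | 6.36 | 6.63 | 7.03 | 6.69 |
    | true | none | $\hat\mu_B$ | - | - | - | - | - | 9.54 |
    | true | none | $\hat\sigma^2_B$ | - | - | - | - | - | 3.94 |
    | true | none | MAE | - | - | - | - | - | 2.45 |
    | true | min-max | $\hat\mu_B$ | 14.12 | 10.57 | 8.88 | 10.28 | 16.97 | 12.16 |
    | true | min-max | $\hat\sigma^2_B$ | 4.24 | 2.81 | 2.51 | 3.53 | 4.94 | 3.61 |
    | true | min-max | MAE | 4.75 | 1.88 | 1.95 | 2.26 | 7.38 | 3.64 |
    | true | z-score | $\hat\mu_B$ | 10.09 | 10.28 | 10.63 | 11.97 | 16.83 | 11.96 |
    | true | z-score | $\hat\sigma^2_B$ | 2.87 | 3.58 | 3.90 | 4.19 | 5.90 | 4.09 |
    | true | z-score | MAE | 1.58 | 2.26 | 2.55 | 2.75 | 7.96 | 3.42 |
    | true | nonlinear | $\hat\mu_B$ | 10.78 | 10.30 | 10.78 | 10.99 | 13.38 | 11.25 |
    | true | nonlinear | $\hat\sigma^2_B$ | 2.98 | 1.35 | 1.49 | 2.04 | 6.72 | 2.91 |
    | true | nonlinear | MAE | 1.53 | 0.94 | 1.25 | 1.72 | 6.04 | 2.30 |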

    Two types of datasets were obtained from different sources: a traffic dataset provided by the NAVER Corporation navigation team, and an accident dataset provided by the Korean National Police Agency. The traffic dataset encompasses trajectory-based speed and traffic road networks of the major metropolitan area of Seoul, where nearly half of the country's population resides. The speed data are described by GPS trajectories. A GPS trajectory consists of a series of points with latitude, longitude, and timestamp information generated during travel. To align a sequence of observed user positions within the road network, we used a map-matching process [3]. Each accident record includes the reported time, information source, category, incident description, and point of origin described by both geographical coordinates and road segment ID, as described in Table 3.

    Table 3.  An example of a real accident record.

    | Field | Value |
    |---|---|
    | Event ID | 3786580 |
    | Created datetime | 2021-01-30 19:00 |
    | Information source | Korean National Police Agency |
    | Category | Accident |
    | Description | Traffic accident on the first lane from Guro IC on Nambu Beltway to Anyang Bridge |

    The proposed method was validated on 3197 real traffic accidents that occurred between September 2020 and February 2021 in Seoul. Figure 3 presents the accident locations, as denoted by red stars, where the blue lines indicate significant propagation paths. Let $\hat\mu_{B,k-1}$ and $\hat\sigma^2_{B,k-1}$ denote the bootstrap estimate and sample variance at Hop($k-1$), respectively, for $k=2,3,4$. In this study, we investigated up to $k=4$. The case $k=1$ was excluded because Hop0 refers to the road where the accident occurred.

    Figure 3.  Locations of traffic accidents and their significant propagation paths in the city of Seoul from September 2020 to February 2021 (red dot: location of the accident, blue line: significant propagation path).

    Table 4 summarizes the time delay estimation results for all 3197 traffic accidents. To ensure consistency within results, the hyperparameters $(w,L)=(60,180)$ were used for all accidents based on the grid search. There are 5036 roads at Hop1 associated with accidents, approximately 69.16% of which were revealed to be significant, with an average time delay of 8.95 minutes. For Hop2 and Hop3, 65.33% and 63.15% of the roads were revealed to be significant with average time delays of 11.10 and 11.97 minutes, respectively.

    Table 4.  Summary of time delay estimation results for 3197 traffic accidents.

    | Hop | Number of roads | Significant roads | Significance ratio (%) | Average time delay (min) |
    |---|---|---|---|---|
    | Hop1 | 5,036 | 3,483 | 69.16 | 8.95 |
    | Hop2 | 6,856 | 4,479 | 65.33 | 11.10 |
    | Hop3 | 9,721 | 6,139 | 63.15 | 11.97 |

    We selected two representative cases among the traffic data to detail how the proposed method identifies causal relationships and estimates time lag. Case 1 represents a simple road network with few propagation paths, whereas Case 2 represents a complex road network with many propagation paths. For comparison purposes, the TLCC and DCCA methods were also applied with equivalent settings to those used in the simulation study.

    In Case 1, the accident occurred on September 8, 2020, at 06:44 AM. The blue star in Figure 4 denotes the exact location of the accident. Case 1 has one propagation path [A,B,C,D]. The black, red, blue, and green line segments in the figures indicate Hop0, Hop1, Hop2, and Hop3, respectively. Time delays were estimated using average speed data recorded at one-minute intervals, covering the hour before and the two hours after the time when the accident was reported.

    Figure 4.  Traffic accident for Case 1.

    Figure 5(a), (b) show the results of time delay estimation for the propagation path [A,B,C,D]. It is apparent from the significantly smaller values of $\hat\sigma^2_B$ that the time series with nonlinear normalization produced a more consistent estimate of time delays that increased with each hop. We, therefore, conclude that the congestion effect of the accident propagated along the path to Hop1, Hop2 and Hop3 at 3.60, 7.30, and 19.97 min after the accident, respectively.

    Figure 5.  Time delay estimation for Case 1.

    Unlike the proposed method, both TLCC and DCCA failed to identify the congestion propagation effect of the accident. Furthermore, the DCCA method produced inconsistent time delay estimates for varying values of $n$. Note that neither the TLCC nor the DCCA method provides uncertainty quantification of the time delay estimates.

    In Case 2, the accident occurred on September 4, 2020, at 10:16 PM, and affected five propagation paths: [A,B,C,D], [A,E,F,G], [A,H,I,J], [A,H,K,M] and [A,H,K,L]. Figure 6 depicts the exact location of the accident, along with the five propagation paths. For each path, the previous hour and the subsequent two hours were considered for time delay estimation. Figure 7 and Table 6 present the results of time delay estimation. In Path 1, no specific causal relationship could be found, as shown in Figure 7(a). This finding is supported by the corresponding high values of $\hat\sigma^2_B$ ($>\sigma^2_T=5.05^2$) in Table 6. Similarly, we can conclude that the congestion effect of the accident on road A propagated up to the second hop of Paths 2 and 3, and up to the third hop of Paths 4 and 5. Moreover, the values of $\hat\sigma^2_B$ in Table 6 indicate that the congestion effect propagated along Path 3 up to Hop2, as depicted in Figure 7(b), and along Paths 4 and 5 up to Hop3, as depicted in Figure 7(c). For Paths 4 and 5, the time delay estimates are (8.23, 15.65, 22.06) and (8.22, 15.55, 20.75), respectively.

    Table 5.  Results of time delay estimation for Case 1.
    Hop1 Hop2 Hop3
    ˆμB ˆσ2B ˆμB ˆσ2B ˆμB ˆσ2B
    TLCC 7.00 - 7.00 - 0.00 -
    DCCA(10) 0.00 - 19.00 - 0.00 -
    DCCA(20) 0.00 - 24.00 - 17.00 -
    DCCA(30) 0.00 - 10.00 - 16.00 -
    DCCA(40) 0.00 - 2.00 - 8.00 -
    without normalization 9.62 83.70 8.07 61.31 12.08 34.83
    nonlinear normalization 3.60 2.88 7.30 13.83 19.97 8.93

     | Show Table
    DownLoad: CSV
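    Table 6.  Results of time delay estimation for Case 2.

    | Propagation Path | Method | Hop1 $\hat{\mu}_B$ | Hop1 $\hat{\sigma}_B^2$ | Hop2 $\hat{\mu}_B$ | Hop2 $\hat{\sigma}_B^2$ | Hop3 $\hat{\mu}_B$ | Hop3 $\hat{\sigma}_B^2$ |
    |---|---|---|---|---|---|---|---|
    | Path 1 [A,B,C,D] | TLCC | 24.00 | - | 24.00 | - | 23.00 | - |
    | | DCCA(30) | 8.00 | - | 6.00 | - | 6.00 | - |
    | | Proposed | 16.03 | 47.09 | 11.59 | 73.56 | 15.17 | 30.46 |
    | Path 2 [A,E,F,G] | TLCC | 22.00 | - | 12.00 | - | 4.00 | - |
    | | DCCA(30) | 2.00 | - | 5.00 | - | 5.00 | - |
    | | Proposed | 4.15 | 3.72 | 10.49 | 6.97 | 8.64 | 16.32 |
    | Path 3 [A,H,I,J] | TLCC | 24.00 | - | 24.00 | - | 1.00 | - |
    | | DCCA(30) | 8.00 | - | 12.00 | - | 14.00 | - |
    | | Proposed | 8.21 | 7.97 | 12.13 | 7.03 | 15.72 | 57.16 |
    | Path 4 [A,H,K,M] | TLCC | 24.00 | - | 24.00 | - | 24.00 | - |
    | | DCCA(30) | 8.00 | - | 2.00 | - | 19.00 | - |
    | | Proposed | 8.23 | 7.62 | 15.65 | 9.01 | 22.06 | 3.24 |
    | Path 5 [A,H,K,L] | TLCC | 24.00 | - | 24.00 | - | 24.00 | - |
    | | DCCA(30) | 8.00 | - | 2.00 | - | 2.00 | - |
    | | Proposed | 8.22 | 7.67 | 15.55 | 11.01 | 20.75 | 3.17 |
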
    Figure 6.  Traffic accident for Case 2.
    Figure 7.  Time delay estimation for Case 2.
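
The reading of Table 6 described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: it assumes the threshold is $\sigma_T^2 = 5.05^2$, and it adds a hypothetical extra check that the delay must increase from one hop to the next. The function name `propagation_depth` is likewise invented for this sketch.

```python
# Proposed-method rows of Table 6: (mean delay, bootstrap variance) for Hop1..Hop3.
TABLE6_PROPOSED = {
    "Path 1": [(16.03, 47.09), (11.59, 73.56), (15.17, 30.46)],
    "Path 2": [(4.15, 3.72), (10.49, 6.97), (8.64, 16.32)],
    "Path 3": [(8.21, 7.97), (12.13, 7.03), (15.72, 57.16)],
    "Path 4": [(8.23, 7.62), (15.65, 9.01), (22.06, 3.24)],
    "Path 5": [(8.22, 7.67), (15.55, 11.01), (20.75, 3.17)],
}

# ASSUMPTION: the tolerance threshold is read as sigma_T^2 = 5.05^2.
SIGMA_T_SQ = 5.05 ** 2


def propagation_depth(hops, var_threshold):
    """Count consecutive hops whose estimate looks reliable.

    A hop is accepted when its bootstrap variance does not exceed the
    threshold and (hypothetical extra rule) its delay is larger than the
    delay of the previous accepted hop.
    """
    depth, previous_delay = 0, 0.0
    for delay, variance in hops:
        if variance > var_threshold or delay <= previous_delay:
            break
        depth += 1
        previous_delay = delay
    return depth


depths = {p: propagation_depth(h, SIGMA_T_SQ) for p, h in TABLE6_PROPOSED.items()}
for path, depth in depths.items():
    print(path, "-> propagated up to hop", depth)
```

Under these assumptions the sketch returns depths of 0, 2, 2, 3 and 3 for Paths 1 to 5, which agrees with the interpretation given in the text. The variance threshold alone would accept all three hops of Path 2, so the increasing-delay check is what stops that path at Hop2 here.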

    As in Case 1, both the TLCC and DCCA methods failed to estimate consistent time delays. The DCCA method with 30 overlapping boxes (n=30) produced results comparable to those of the proposed method only for Path 3, as seen in Table 6. From the results of both cases, we conclude that the proposed method identifies causal relationships and estimates the time lag more accurately than the conventional TDE methods.

    Traffic congestion spreads its effects to the inflow roads, creating a causal relationship between the accident site and adjacent roads. To identify this relationship, we developed a new method for estimating differences in congestion time. The proposed method utilizes a lag-specific TE estimator with decomposition and normalization techniques. We conducted extensive performance comparisons under varying experimental settings and found that the proposed decomposition and nonlinear normalization methods yield substantial performance improvements. We also confirmed that the proposed method produces more stable and robust results than the conventional TDE methods. Thus, the proposed time delay estimation method supports a quantitative understanding of the propagation of traffic congestion.
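
To make the idea of a lag-specific TE estimator concrete, the following Python sketch computes a plug-in estimate of the conditional mutual information $I(Y_t; X_{t-u} \mid Y_{t-1})$ for discrete series and picks the lag $u$ that maximizes it. It is a simplified illustration only: the history length of one, the binary toy data, and the function names `lag_specific_te` and `estimate_delay` are assumptions of this sketch, and the decomposition and normalization steps of the proposed method are not reproduced.

```python
import math
import random
from collections import Counter


def lag_specific_te(x, y, lag):
    """Plug-in estimate (in bits) of I(y[t]; x[t-lag] | y[t-1]) for discrete series."""
    start = max(lag, 1)
    triples = [(y[t], y[t - 1], x[t - lag]) for t in range(start, len(y))]
    n = len(triples)
    c_abc = Counter(triples)                        # (y_now, y_prev, x_past)
    c_bc = Counter((b, c) for _, b, c in triples)   # (y_prev, x_past)
    c_ab = Counter((a, b) for a, b, _ in triples)   # (y_now, y_prev)
    c_b = Counter(b for _, b, _ in triples)         # y_prev
    te = 0.0
    for (a, b, c), k in c_abc.items():
        # p(a | b, c) / p(a | b), written with raw counts (sample size cancels).
        ratio = (k * c_b[b]) / (c_bc[(b, c)] * c_ab[(a, b)])
        te += (k / n) * math.log2(ratio)
    return te


def estimate_delay(x, y, max_lag):
    """Return the lag in 1..max_lag with the largest lag-specific TE."""
    return max(range(1, max_lag + 1), key=lambda u: lag_specific_te(x, y, u))


# Toy data (hypothetical): y copies x with a delay of four steps.
rng = random.Random(0)
true_lag = 4
x = [rng.randint(0, 1) for _ in range(500)]
y = [0] * true_lag + x[:-true_lag]

print("estimated delay:", estimate_delay(x, y, max_lag=10))
```

Because `y` is an exact delayed copy of `x` in this toy setting, the TE at the true lag is close to one bit while the TE at every other lag stays near zero, so the maximizing lag recovers the delay. Real speed series would first need to be discretized, which is a design choice this sketch leaves open.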

    Moreover, the bootstrap technique and its density estimation of statistical functionals enable the uncertainty quantification of time delay estimates. This uncertainty quantification allows us to evaluate the reliability of time delay estimates and serves as a basis for optimal hyperparameter tuning. Specifically, $\hat{\sigma}_B^2$ serves as a key indicator of a causal relationship between two time series. We also developed rigorous and practical guidance for decision-making based on the tolerance interval.
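
How bootstrap replicates yield a mean $\hat{\mu}_B$ and a variance $\hat{\sigma}_B^2$ for a delay estimator can be sketched as follows. Note the substitutions: this sketch uses a moving-block bootstrap rather than the Markov bootstrap adopted in the paper, and a simple agreement-based delay estimator stands in for the lag-specific TE estimator. All function names, the block length, and the toy data are hypothetical.

```python
import random
from statistics import mean, variance


def agreement_delay(x, y, max_lag):
    """Stand-in estimator: lag maximizing the share of t with y[t] == x[t - lag]."""
    def score(lag):
        hits = sum(y[t] == x[t - lag] for t in range(lag, len(y)))
        return hits / (len(y) - lag)
    return max(range(1, max_lag + 1), key=score)


def block_bootstrap_delay(x, y, max_lag, n_boot=200, block=25, seed=0):
    """Return (mean, variance) of the delay estimate over bootstrap replicates.

    Blocks of paired observations are resampled jointly so that the lag
    structure inside each block is preserved.
    """
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        xb, yb = [], []
        while len(xb) < n:
            s = rng.randrange(n - block + 1)   # random block start
            xb.extend(x[s:s + block])
            yb.extend(y[s:s + block])
        estimates.append(agreement_delay(xb[:n], yb[:n], max_lag))
    return mean(estimates), variance(estimates)


# Toy data (hypothetical): y copies x with a delay of three steps.
data_rng = random.Random(1)
x = [data_rng.randint(0, 1) for _ in range(300)]
y = [0] * 3 + x[:-3]

mu_b, var_b = block_bootstrap_delay(x, y, max_lag=10)
print("bootstrap mean:", mu_b, "bootstrap variance:", var_b)
```

A small variance across replicates indicates that the estimated delay is stable under resampling, which is the sense in which $\hat{\sigma}_B^2$ is used as a reliability indicator in the text. The block length must exceed the largest lag of interest, since dependence across block boundaries is destroyed by the resampling.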

    In this study, we considered only the estimation of accurate time delays from historical traffic data. Eventually, the proposed method will be used to predict time delays in GPS navigation systems, thereby providing users with more accurate estimated arrival times using real-time traffic data. Using our proposed method as a foundation, real-time delay prediction methods can be developed in future work.

    This work was partly supported by NAVER Corp. and the National Research Foundation of Korea (NRF) Grant funded by the Korea government (MSIT) (NRF-2021R1F1A1061038).

    The authors declare that there is no conflict of interest.



    [1] H. Nguyen, W. Liu, F. Chen, Discovering congestion propagation patterns in spatio-temporal traffic data, IEEE Trans. Big Data, 3 (2016), 169–180. https://doi.org/10.1109/TBDATA.2016.2587669
    [2] J. Y. Lee, J. I. Kwak, Y. K. Oh, S. Kim, Quantifying incident impacts and identifying influential features in urban traffic networks, Transportmetrica B Transp. Dyn., 2022 (2022), 1–22. https://doi.org/10.1080/21680566.2022.2063205
    [3] P. Newson, J. Krumm, Hidden Markov map matching through noise and sparseness, in Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, (2009), 336–343. https://doi.org/10.1145/1653771.1653818
    [4] F. G. Habtemichael, M. Cetin, K. A. Anuar, Incident-induced delays on freeways: Quantification method by grouping similar traffic patterns, Transp. Res. Record, 2484 (2015), 60–69. https://doi.org/10.3141/2484-07
    [5] A. Garib, A. E. Radwan, H. Al-Deek, Estimating magnitude and duration of incident delays, J. Transp. Eng., 123 (1997), 459–466. https://doi.org/10.1061/(ASCE)0733-947X(1997)123:6(459)
    [6] D. Nam, F. Mannering, An exploratory hazard-based analysis of highway incident duration, Transp. Res. Part A Policy Pract., 34 (2000), 85–102. https://doi.org/10.1016/S0965-8564(98)00065-2
    [7] M. Zhu, H. F. Yang, C. Liu, Z. Pu, Y. Wang, Real-time crash identification using connected electric vehicle operation data, Accid. Anal. Prev., 173 (2022), 106708. https://doi.org/10.1016/j.aap.2022.106708
    [8] D. Cao, J. Wu, X. Dong, H. Sun, X. Qu, Z. Yang, Quantification of the impact of traffic incidents on speed reduction: A causal inference based approach, Accid. Anal. Prev., 157 (2021), 106163. https://doi.org/10.1016/j.aap.2021.106163
    [9] C. Knapp, G. Carter, The generalized correlation method for estimation of time delay, IEEE Trans. Acoust. Speech Signal Process., 24 (1976), 320–327. https://doi.org/10.1109/TASSP.1976.1162830
    [10] M. Souden, J. Benesty, S. Affes, Broadband source localization from an eigenanalysis perspective, IEEE Trans. Audio Speech Lang. Process., 18 (2009), 1575–1587. https://doi.org/10.1109/TASL.2009.2038556
    [11] P. Marmaroli, X. Falourd, H. Lissek, A comparative study of time delay estimation techniques for road vehicle tracking, in Acoustics 2012, 2012.
    [12] S. R. Chandra, H. Al-Deek, Cross-correlation analysis and multivariate prediction of spatial time series of freeway traffic speeds, Transp. Res. Record, 2061 (2008), 64–76. https://doi.org/10.3141/2061-08
    [13] C. Shen, Analysis of detrended time-lagged cross-correlation between two nonstationary time series, Phys. Lett. A, 379 (2015), 680–687. https://doi.org/10.1016/j.physleta.2014.12.036
    [14] R. T. Vassoler, G. F. Zebende, DCCA cross-correlation coefficient apply in time series of air temperature and air relative humidity, Phys. A Stat. Mech. Its Appl., 391 (2012), 2438–2443. https://doi.org/10.1016/j.physa.2011.12.015
    [15] S. Sun, C. Zhang, Y. Zhang, Traffic flow forecasting using a spatio-temporal Bayesian network predictor, in International Conference on Artificial Neural Networks, (2005), 273–278. https://doi.org/10.1007/11550907_43
    [16] S. Chawla, Y. Zheng, J. Hu, Inferring the root cause in road traffic anomalies, in 2012 IEEE 12th International Conference on Data Mining, (2012), 141–150. https://doi.org/10.1109/ICDM.2012.104
    [17] Y. Xiao, Y. Zhao, G. Wu, Y. Jing, Study on delay propagation relations among airports based on transfer entropy, IEEE Access, 8 (2020), 97103–97113. https://doi.org/10.1109/ACCESS.2020.2996301
    [18] W. Härdle, J. Horowitz, J. P. Kreiss, Bootstrap methods for time series, in Handbook of Statistics, 30 (2012), 3–26. https://doi.org/10.1111/j.1751-5823.2003.tb00485.x
    [19] T. Schreiber, Measuring information transfer, Phys. Rev. Lett., 85 (2000), 461. https://doi.org/10.1103/PhysRevLett.85.461
    [20] C. E. Shannon, A mathematical theory of communication, Bell Syst. Technical J., 27 (1948), 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
    [21] M. Wibral, N. Pampu, V. Priesemann, F. Siebenhühner, H. Seiwert, M. Lindner, et al., Measuring information-transfer delays, PloS One, 8 (2013), e55809. https://doi.org/10.1371/journal.pone.0055809
    [22] R. Marschinski, H. Kantz, Analysing the information flow between financial time series, Eur. Phys. J. B Condens. Matter Complex Syst., 30 (2002), 275–281. https://doi.org/10.1140/epjb/e2002-00379-2
    [23] W. C. Hong, P. F. Pai, S. L. Yang, R. Theng, Highway traffic forecasting by support vector regression model with tabu search algorithms, in The 2006 IEEE International Joint Conference on Neural Network Proceedings, (2006), 1617–1621. https://doi.org/10.1109/IJCNN.2006.246627
    [24] S. R. Chandra, H. Al-Deek, Predictions of freeway traffic speeds and volumes using vector autoregressive models, J. Intell. Transp. Syst., 13 (2009), 53–72. https://doi.org/10.1080/15472450902858368
    [25] E. I. Vlahogianni, M. G. Karlaftis, J. C. Golias, Short-term traffic forecasting: Where we are and where we're going, Transp. Res. Part C Emerging Technol., 43 (2014), 3–19. https://doi.org/10.1016/j.trc.2014.01.005
    [26] D. Pavlyuk, Short-term traffic forecasting using multivariate autoregressive models, Procedia Eng., 178 (2017), 57–66. https://doi.org/10.1016/j.proeng.2017.01.062
    [27] Z. Song, Y. Guo, Y. Wu, J. Ma, Short-term traffic speed prediction under different data collection time intervals using a SARIMA-SDGM hybrid prediction model, PloS One, 14 (2019), e0218626. https://doi.org/10.1371/journal.pone.0218626
    [28] E. Ogasawara, L. C. Martinez, D. De Oliveira, G. Zimbrão, G. L. Pappa, M. Mattoso, Adaptive normalization: A novel data normalization approach for non-stationary time series, in The 2010 International Joint Conference on Neural Networks (IJCNN), (2010), 1–8. https://doi.org/10.1109/IJCNN.2010.5596746
    [29] J. Wang, S. Su, Y. Li, J. Chen, D. Shi, Desaturated probability integral transform for normalizing power system measurements in data-driven manipulation detection, in 2019 IEEE Power & Energy Society General Meeting (PESGM), (2019), 1–5. https://doi.org/10.1109/PESGM40551.2019.8973800
    [30] G. Casella, R. L. Berger, Statistical Inference, Cengage Learning, 2021.
    [31] V. Witkovsky, On the exact two-sided tolerance intervals for univariate normal distribution and linear regression, Austrian J. Stat., 43 (2014), 279–292.
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)