Citation: Shahid Latif, Firuza Mustafa. Trivariate distribution modelling of flood characteristics using copula function—A case study for Kelantan River basin in Malaysia[J]. AIMS Geosciences, 2020, 6(1): 92-130. doi: 10.3934/geosci.2020007
In data clustering or cluster analysis, the goal is to divide a set of objects into homogeneous groups called clusters [10,18,20,26,12,1]. For high-dimensional data, clusters are usually formed in subspaces of the original data space and different clusters may relate to different subspaces. To recover clusters embedded in subspaces, subspace clustering algorithms have been developed, see for example [2,15,19,17,9,21,16,22,3,25,7,11,13]. Subspace clustering algorithms can be classified into two categories: hard subspace clustering algorithms and soft subspace clustering algorithms.
In hard subspace clustering algorithms, the subspaces in which clusters are embedded are determined exactly. In other words, each attribute of the data is either associated with a cluster or not associated with it. For example, the subspace clustering algorithms developed in [2] and [15] are hard subspace clustering algorithms. In soft subspace clustering algorithms, the subspaces of clusters are not determined exactly. Each attribute is associated with a cluster with some probability. If an attribute is important to the formation of a cluster, then the attribute is associated with the cluster with high probability. Examples of soft subspace clustering algorithms include [19], [9], [21], [16], and [13].
In soft subspace clustering algorithms, the attribute weights associated with clusters are automatically determined. In general, the weight of an attribute for a cluster is inversely related to the dispersion of the attribute in the cluster. If the values of an attribute in a cluster are relatively compact, then the attribute will be assigned a relatively high weight. In the FSC algorithm [16], for example, the attribute weights are calculated as
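$$w_{lj} = \frac{1}{\sum_{h=1}^{d}\left(\frac{V_{lj}+\epsilon}{V_{lh}+\epsilon}\right)^{\frac{1}{\alpha-1}}}, \quad l=1,2,\ldots,k,\ j=1,2,\ldots,d, \tag{1}$$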
where
$$V_{lj} = \sum_{x\in C_l}(x_j - z_{lj})^2. \tag{2}$$
Here
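$$w_{lj} = \frac{\exp\left(-\frac{V_{lj}}{\gamma}\right)}{\sum_{s=1}^{d}\exp\left(-\frac{V_{ls}}{\gamma}\right)}, \quad l=1,2,\ldots,k,\ j=1,2,\ldots,d, \tag{3}$$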
where
One drawback of the FSC algorithm is that a positive value of
$$w_1 = \frac{e^{-10}}{e^{-10}+e^{-30}} = \frac{1}{1+e^{-20}} \approx 1, \quad w_2 = \frac{e^{-30}}{e^{-10}+e^{-30}} = \frac{1}{1+e^{20}} \approx 0.$$
If we use
$$w_1 = \frac{e^{-1}}{e^{-1}+e^{-3}} = \frac{1}{1+e^{-2}} \approx 0.88, \quad w_2 = \frac{e^{-3}}{e^{-1}+e^{-3}} = \frac{1}{1+e^{2}} \approx 0.12.$$
From the above example we see that choosing an appropriate value for the parameter
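The sensitivity illustrated by the two weight calculations above can be reproduced numerically. The following is a minimal sketch, assuming two attribute dispersions of 10 and 30 (the values implied by the exponents in the example) and the exponential weighting of Equation (3); the function name `entropy_weights` is hypothetical.

```python
import math

def entropy_weights(dispersions, gamma):
    """Exponential attribute weights in the form of Equation (3)."""
    scaled = [-v / gamma for v in dispersions]
    top = max(scaled)                      # shift by the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

V = [10.0, 30.0]                           # assumed dispersions of two attributes
w_small = entropy_weights(V, gamma=1.0)    # one attribute takes almost all the weight
w_large = entropy_weights(V, gamma=10.0)   # weights are about 0.88 and 0.12
print(w_small, w_large)
```

With the smaller parameter value, the first attribute receives essentially all of the weight; with the larger value, both attributes contribute.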
In this paper, we address the issue from a different perspective. Unlike the group feature weighting approach, the approach we employ in this paper involves using the log transformation to transform the distances so that the attribute weights are not dominated by a single attribute with the smallest dispersion. In particular, we present a soft subspace clustering algorithm called the LEKM algorithm (log-transformed entropy weighting
The remaining part of this paper is structured as follows. In Section 2, we give a brief review of the LAC algorithm [9] and the EWKM algorithm [21]. In Section 3, we present the LEKM algorithm in detail. In Section 4, we present numerical experiments to demonstrate the performance of the LEKM algorithm. Section 5 concludes the paper with some remarks.
In this section, we introduce the EWKM algorithm [21] and the LAC algorithm [9], which are soft subspace clustering algorithms using the entropy weighting.
Let
$$F(U,W,Z) = \sum_{l=1}^{k}\left[\sum_{i=1}^{n}\sum_{j=1}^{d} u_{il}\, w_{lj}\,(x_{ij}-z_{lj})^2 + \gamma\sum_{j=1}^{d} w_{lj}\ln w_{lj}\right], \tag{4}$$
where
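$$\sum_{l=1}^{k} u_{il} = 1, \quad i=1,2,\ldots,n, \tag{5a}$$
$$u_{il}\in\{0,1\}, \quad i=1,2,\ldots,n,\ l=1,2,\ldots,k, \tag{5b}$$
$$\sum_{j=1}^{d} w_{lj} = 1, \quad l=1,2,\ldots,k, \tag{5c}$$
and
$$w_{lj} > 0, \quad l=1,2,\ldots,k,\ j=1,2,\ldots,d. \tag{5d}$$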
Like the
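$$u_{il} = \begin{cases} 1, & \text{if } \sum_{j=1}^{d} w_{lj}(x_{ij}-z_{lj})^2 \le \sum_{j=1}^{d} w_{sj}(x_{ij}-z_{sj})^2 \text{ for } 1\le s\le k,\\ 0, & \text{otherwise,} \end{cases}$$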
for
$$w_{lj} = \frac{\exp\left(-\frac{V_{lj}}{\gamma}\right)}{\sum_{s=1}^{d}\exp\left(-\frac{V_{ls}}{\gamma}\right)}$$
for
$$V_{lj} = \sum_{i=1}^{n} u_{il}\,(x_{ij}-z_{lj})^2.$$
Given
$$z_{lj} = \frac{\sum_{i=1}^{n} u_{il}\, x_{ij}}{\sum_{i=1}^{n} u_{il}}$$
for
The parameter
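The three EWKM update rules above can be sketched as a single pass over the data. This is an illustrative pure-Python sketch, not the implementation used in the experiments; the function name `ewkm_step`, the toy data, and the handling of empty clusters are assumptions.

```python
import math

def ewkm_step(X, Z, W, gamma):
    """One round of the EWKM updates in the order given above: U, then W, then Z."""
    n, d, k = len(X), len(X[0]), len(Z)
    # Membership update: nearest center under the current attribute weights.
    labels = []
    for x in X:
        dist = [sum(W[l][j] * (x[j] - Z[l][j]) ** 2 for j in range(d)) for l in range(k)]
        labels.append(dist.index(min(dist)))
    new_W, new_Z = [], []
    for l in range(k):
        members = [X[i] for i in range(n) if labels[i] == l]
        if not members:                    # assumption: an empty cluster is left unchanged
            new_W.append(list(W[l]))
            new_Z.append(list(Z[l]))
            continue
        # Weight update: V_lj is the summed squared deviation along attribute j.
        V = [sum((x[j] - Z[l][j]) ** 2 for x in members) for j in range(d)]
        lo = min(V)                        # shift by the minimum for numerical stability
        e = [math.exp(-(v - lo) / gamma) for v in V]
        new_W.append([a / sum(e) for a in e])
        # Center update: plain mean of the members.
        new_Z.append([sum(x[j] for x in members) / len(members) for j in range(d)])
    return labels, new_W, new_Z

# Toy data (assumed): two groups that are tight on attribute 0 and spread on attribute 1.
X = [[0.0, 0.0], [0.1, 5.0], [0.0, -5.0], [10.0, 1.0], [10.1, 6.0], [10.0, -4.0]]
labels, W1, Z1 = ewkm_step(X, Z=[[0.0, 0.0], [10.0, 1.0]],
                           W=[[0.5, 0.5], [0.5, 0.5]], gamma=1.0)
print(labels, W1)
```

On this toy data the summed dispersions differ by about 50 between the two attributes, so with a parameter value of 1 nearly all of the weight in each cluster goes to the compact attribute.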
The LAC algorithm (Locally Adaptive Clustering) [9] and the EWKM algorithm are similar soft subspace clustering algorithms in that both discover subspace clusters via exponential weighting of attributes. However, the LAC algorithm differs from the EWKM algorithm in the definition of the objective function. Clusters found by the LAC algorithm are referred to as weighted clusters. The objective function of the LAC algorithm is defined as
$$E(C,Z,W) = \sum_{l=1}^{k}\sum_{j=1}^{d}\left(w_{lj}\frac{1}{|C_l|}\sum_{x\in C_l}(x_j-z_{lj})^2 + h\, w_{lj}\log w_{lj}\right), \tag{6}$$
where
Like the
$$S_l = \left\{x : \sum_{j=1}^{d} w_{lj}(x_j-z_{lj})^2 < \sum_{j=1}^{d} w_{sj}(x_j-z_{sj})^2,\ \forall s\neq l\right\} \tag{7}$$
for
$$w_{lj} = \frac{\exp(-V_{lj}/h)}{\sum_{s=1}^{d}\exp(-V_{ls}/h)} \tag{8}$$
for
$$V_{lj} = \frac{1}{|S_l|}\sum_{x\in S_l}(x_j-z_{lj})^2.$$
Given the set of clusters
$$z_{lj} = \frac{1}{|S_l|}\sum_{x\in S_l} x_j \tag{9}$$
for
Comparing Equation (6) with Equation (4), we see that the distances in the objective function of the LAC algorithm are normalized by the sizes of the corresponding clusters. As a result, the dispersions (i.e.,
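The effect of normalizing by the cluster size can be illustrated with a small sketch. The toy cluster of 100 points, the helper names `dispersions` and `exp_weights`, and the parameter value 1 are all assumptions chosen for illustration.

```python
import math

def exp_weights(V, scale):
    """Exponential weights exp(-V_j/scale), normalized to sum to one."""
    lo = min(V)                            # shift by the minimum for numerical stability
    e = [math.exp(-(v - lo) / scale) for v in V]
    return [a / sum(e) for a in e]

def dispersions(points, center, normalize):
    """Per-attribute squared deviations: summed (EWKM) or averaged (LAC)."""
    d = len(center)
    V = [sum((p[j] - center[j]) ** 2 for p in points) for j in range(d)]
    return [v / len(points) for v in V] if normalize else V

# Assumed toy cluster: 100 points with mean squared deviation 0.5 on
# attribute 0 and 1.0 on attribute 1 around the center (0, 0).
a = math.sqrt(0.5)
points = [[a if i % 2 else -a, 1.0 if i % 4 < 2 else -1.0] for i in range(100)]
center = [0.0, 0.0]
w_sum = exp_weights(dispersions(points, center, normalize=False), scale=1.0)
w_mean = exp_weights(dispersions(points, center, normalize=True), scale=1.0)
print(w_sum, w_mean)
```

With summed dispersions (50 and 100) the first attribute receives virtually all of the weight, whereas with averaged dispersions (0.5 and 1.0) the weights are roughly 0.62 and 0.38 for the same parameter value.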
In this section, we present the LEKM algorithm. The LEKM algorithm is similar to the EWKM algorithm [21] and the LAC algorithm [9] in that the entropy weighting is used to determine the attribute weights.
Let
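$$\begin{aligned} P(U,W,Z) &= \sum_{l=1}^{k}\sum_{i=1}^{n} u_{il}\sum_{j=1}^{d} w_{lj}\ln\left[1+(x_{ij}-z_{lj})^2\right] + \lambda\sum_{l=1}^{k}\sum_{i=1}^{n} u_{il}\sum_{j=1}^{d} w_{lj}\ln w_{lj}\\ &= \sum_{l=1}^{k}\sum_{i=1}^{n} u_{il}\left[\sum_{j=1}^{d} w_{lj}\ln\left[1+(x_{ij}-z_{lj})^2\right] + \lambda\sum_{j=1}^{d} w_{lj}\ln w_{lj}\right], \end{aligned} \tag{10}$$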
where
Similar to the EWKM algorithm, the LEKM algorithm tries to minimize the objective function given in Equation (10) iteratively by finding the optimal value of
Theorem 3.1. Let
$$u_{il} = \begin{cases} 1, & \text{if } D(x_i,z_l)\le D(x_i,z_s) \text{ for all } s=1,2,\ldots,k;\\ 0, & \text{otherwise,} \end{cases} \tag{11}$$
for
$$D(x_i,z_s) = \sum_{j=1}^{d} w_{sj}\ln\left[1+(x_{ij}-z_{sj})^2\right] + \lambda\sum_{j=1}^{d} w_{sj}\ln w_{sj}.$$
Proof. Since
$$f(u_{i1},u_{i2},\ldots,u_{ik}) = \sum_{l=1}^{k} u_{il}\, D(x_i,z_l) \tag{12}$$
is minimized. Note that
$$\sum_{l=1}^{k} u_{il} = 1.$$
The function defined in Equation (12) is minimized if Equation (11) holds. This completes the proof.
Theorem 3.2. Let
$$w_{lj} = \frac{\exp\left(-\frac{V_{lj}}{\lambda}\right)}{\sum_{s=1}^{d}\exp\left(-\frac{V_{ls}}{\lambda}\right)} \tag{13}$$
for
$$V_{lj} = \frac{\sum_{i=1}^{n} u_{il}\ln\left[1+(x_{ij}-z_{lj})^2\right]}{\sum_{i=1}^{n} u_{il}}.$$
Proof. The weight matrix
$$\sum_{j=1}^{d} w_{lj} = 1, \quad l=1,2,\ldots,k,$$
is the matrix
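$$\begin{aligned} f(W) &= P(U,W,Z) + \sum_{l=1}^{k}\beta_l\left(\sum_{j=1}^{d} w_{lj}-1\right)\\ &= \sum_{l=1}^{k}\sum_{i=1}^{n} u_{il}\left[\sum_{j=1}^{d} w_{lj}\ln\left[1+(x_{ij}-z_{lj})^2\right] + \lambda\sum_{j=1}^{d} w_{lj}\ln w_{lj}\right] + \sum_{l=1}^{k}\beta_l\left(\sum_{j=1}^{d} w_{lj}-1\right). \end{aligned} \tag{14}$$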
The weight matrix
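$$\frac{\partial f(W)}{\partial w_{lj}} = \sum_{i=1}^{n} u_{il}\left(\ln\left[1+(x_{ij}-z_{lj})^2\right] + \lambda\ln w_{lj} + \lambda\right) + \beta_l = 0$$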
for
$$\frac{\partial f(W)}{\partial \beta_l} = \sum_{j=1}^{d} w_{lj} - 1 = 0$$
for
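For readers who want the intermediate algebra, the step from the two stationarity conditions to Equation (13) can be written out as follows. This is a sketch under the assumption that cluster $l$ is non-empty; the shorthand $n_l = \sum_{i=1}^{n} u_{il}$ is introduced here for convenience and is not notation from the paper.

```latex
\begin{aligned}
&\sum_{i=1}^{n} u_{il}\ln\left[1+(x_{ij}-z_{lj})^2\right] + n_l\lambda\left(\ln w_{lj}+1\right) + \beta_l = 0\\
&\quad\Longrightarrow\quad \ln w_{lj} = -\frac{V_{lj}}{\lambda} - 1 - \frac{\beta_l}{n_l\lambda}
\quad\Longrightarrow\quad w_{lj} = \exp\left(-\frac{V_{lj}}{\lambda}\right)\exp\left(-1-\frac{\beta_l}{n_l\lambda}\right),\\
&\sum_{j=1}^{d} w_{lj} = 1
\quad\Longrightarrow\quad \exp\left(-1-\frac{\beta_l}{n_l\lambda}\right) = \frac{1}{\sum_{s=1}^{d}\exp\left(-\frac{V_{ls}}{\lambda}\right)},
\end{aligned}
```

Substituting the second line into the first gives the weights in Equation (13), with $V_{lj}$ the mean log-transformed dispersion defined in Theorem 3.2.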
From Equation (13) we see that the attribute weights of the
Theorem 3.3. Let
$$z_{lj} = \frac{\sum_{i=1}^{n} u_{il}\left[1+(x_{ij}-z_{lj})^2\right]^{-1} x_{ij}}{\sum_{i=1}^{n} u_{il}\left[1+(x_{ij}-z_{lj})^2\right]^{-1}} \tag{15}$$
for
Proof. If the set of cluster centers
$$\frac{\partial P}{\partial z_{lj}} = w_{lj}\sum_{i=1}^{n} u_{il}\left[1+(x_{ij}-z_{lj})^2\right]^{-1}\left[-2(x_{ij}-z_{lj})\right] = 0.$$
Since
$$\sum_{i=1}^{n} u_{il}\left[1+(x_{ij}-z_{lj})^2\right]^{-1}\left[-2(x_{ij}-z_{lj})\right] = 0,$$
from which Equation (15) follows.
In the standard
$$z_{lj} = \frac{\sum_{i=1}^{n} u_{il}\left[1+(x_{ij}-z^{*}_{lj})^2\right]^{-1} x_{ij}}{\sum_{i=1}^{n} u_{il}\left[1+(x_{ij}-z^{*}_{lj})^2\right]^{-1}} \tag{16}$$
for
To find the optimal values of
Algorithm 1: The LEKM algorithm.
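The iteration built from Equations (11), (13), and (16) can be sketched in pure Python as follows. This is a hedged illustration rather than the authors' Java implementation: the function name `lekm`, the parameter names `lam`, `tol`, and `max_iter`, the caller-supplied initial centers, the stopping rule based on the change in the objective of Equation (10), and the handling of empty clusters are all assumptions.

```python
import math

def lekm(X, k, init, lam=1.0, tol=1e-6, max_iter=100):
    """Sketch of LEKM: alternate Eq. (11) memberships, Eq. (13) weights, Eq. (16) centers."""
    n, d = len(X), len(X[0])
    Z = [list(z) for z in init]              # initial centers supplied by the caller
    W = [[1.0 / d] * d for _ in range(k)]    # start from uniform attribute weights
    labels, prev = [0] * n, float("inf")

    def cost(x, l):
        # D(x, z_l): weighted log-transformed distance plus the entropy term.
        dist = sum(W[l][j] * math.log1p((x[j] - Z[l][j]) ** 2) for j in range(d))
        entropy = sum(w * math.log(w) for w in W[l] if w > 0.0)
        return dist + lam * entropy

    for _ in range(max_iter):
        # Eq. (11): assign each point to the cluster with the smallest cost.
        labels = [min(range(k), key=lambda l: cost(x, l)) for x in X]
        for l in range(k):
            members = [X[i] for i in range(n) if labels[i] == l]
            if not members:
                continue                     # assumption: leave an empty cluster unchanged
            # Eq. (13): weights from the mean log-transformed dispersions V_lj.
            V = [sum(math.log1p((x[j] - Z[l][j]) ** 2) for x in members) / len(members)
                 for j in range(d)]
            lo = min(V)
            e = [math.exp(-(v - lo) / lam) for v in V]
            W[l] = [a / sum(e) for a in e]
            # Eq. (16): one fixed-point step using the previous center z*.
            new_center = []
            for j in range(d):
                r = [1.0 / (1.0 + (x[j] - Z[l][j]) ** 2) for x in members]
                new_center.append(sum(ri * x[j] for ri, x in zip(r, members)) / sum(r))
            Z[l] = new_center
        # Objective of Eq. (10); stop when it changes by less than tol (assumed criterion).
        P = sum(cost(X[i], labels[i]) for i in range(n))
        if abs(prev - P) < tol:
            break
        prev = P
    return labels, W, Z

# Toy data (assumed): two tight groups plus one outlier near the first group.
X = [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.3], [0.1, -0.2], [0.3, 0.2],
     [10.0, 10.0], [10.2, 9.9], [9.8, 10.1], [10.1, 10.3], [9.9, 9.7],
     [0.1, 6.0]]
labels, W, Z = lekm(X, k=2, init=[X[0], X[5]])
print(labels, W)
```

In this toy run the outlier lowers the weight of the second attribute in its cluster, but because the distances are log-transformed the weight does not collapse to zero.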
The LEKM algorithm requires four parameters:
Parameter | Default Value |
1 | |
100 |
In this section, we present numerical experiments based on both synthetic data and real data to demonstrate the performance of the LEKM algorithm. We also compare the LEKM algorithm with the EWKM algorithm and the LAC algorithm in terms of accuracy and runtime. We implemented all three algorithms in Java and used the same convergence criterion as shown in Algorithm 1.
In our experiments, we use the corrected Rand index [8,13] to measure the accuracy of clustering results. The corrected Rand index is calculated from two partitions of the same dataset and its value ranges from -1 to 1, with 1 indicating perfect agreement between the two partitions and 0 indicating agreement by chance. In general, the higher the corrected Rand index, the better the clustering result.
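As an illustration of how such an index can be computed, the sketch below assumes the Hubert-Arabie form of the corrected (adjusted) Rand index, evaluated from a contingency table of two partitions. The function name `corrected_rand_index` is hypothetical, and the example tables are the three confusion matrices reported in Table 3.

```python
from math import comb

def corrected_rand_index(table):
    """Corrected Rand index (Hubert-Arabie form, assumed) from a contingency table."""
    n = sum(sum(row) for row in table)
    sum_cells = sum(comb(v, 2) for row in table for v in row)
    sum_rows = sum(comb(sum(row), 2) for row in table)
    sum_cols = sum(comb(sum(col), 2) for col in zip(*table))
    expected = sum_rows * sum_cols / comb(n, 2)     # index expected by chance
    max_index = (sum_rows + sum_cols) / 2           # index of a perfect match
    # Undefined (division by zero) when both partitions put all points in one group.
    return (sum_cells - expected) / (max_index - expected)

ari_a = corrected_rand_index([[35, 25], [25, 15]])  # Table 3(a): close to chance level
ari_b = corrected_rand_index([[59, 0], [1, 40]])    # Table 3(b): one point misplaced
ari_c = corrected_rand_index([[60, 0], [0, 40]])    # Table 3(c): perfect agreement
print(ari_a, ari_b, ari_c)
```

A table with all counts on the diagonal yields the maximum value of 1, while a table whose counts are spread as expected by chance yields a value near 0.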
Since the all the three algorithms are
To test the performance of the LEKM algorithm, we generated two synthetic datasets. The first synthetic dataset is a 2-dimensional dataset with two clusters and is shown in Figure 1. From the figure we see that the cluster at the top is compact but the cluster at the bottom contains several points that are far away from the cluster center. We can consider this dataset as a dataset containing noise.
Table 2 shows the average corrected Rand index of 100 runs of the three algorithms on the first synthetic dataset. From the table we see that the LEKM algorithm produced more accurate results than the LAC algorithm and the EWKM algorithm. The EWKM algorithm produced the least accurate results. Since the dispersion of an attribute in a cluster is normalized by the size of the cluster in the LAC and LEKM algorithms, the LAC and LEKM algorithms are less sensitive to the parameter.
Parameter | EWKM | LAC | LEKM |
1 | 0.0351 (0.0582) | 0.0024 (0.0158) | 0.9154 (0.2704) |
2 | 0.0378 (0.0556) | 0.9054 (0.2322) | 0.9063 (0.2827) |
4 | 0.012 (0.031) | 0.8019 (0.2422) | 0.9067 (0.2815) |
8 | -0.0135 (0.0125) | 0.7604 (0.2406) | 0.9072 (0.2799) |
16 | -0.013 (0.0134) | 0.7527 (0.2501) | 0.9072 (0.2799) |
Table 3 shows the confusion matrices produced by the best run of the three algorithms on the first synthetic dataset. We ran the EWKM algorithm, the LAC algorithm, and the LEKM algorithm 100 times on the first synthetic dataset with parameter 2 (i.e.,
(a) | 1 | 2 | (b) | 1 | 2 | (c) | 1 | 2 |
C2 | 35 | 25 | C2 | 59 | 0 | C2 | 60 | 0 |
C1 | 25 | 15 | C1 | 1 | 40 | C1 | 0 | 40 |
Table 4 shows the attribute weights of the two clusters produced by the best runs of the three algorithms. As we can see from the table, the attribute weights produced by the EWKM algorithm are dominated by one attribute. The attribute weights of one cluster produced by the LAC algorithm are also affected by the noise in the cluster. The attribute weights of the clusters produced by the LEKM algorithm seem reasonable, as the two clusters are formed in the full space and approximately the same attribute weights are expected.
(a) | Weight | Weight | (b) | Weight | Weight | (c) | Weight | Weight |
C1 | 1 | 3.01E-36 | C1 | 0.8931 | 0.1069 | C1 | 0.5448 | 0.4552 |
C2 | 1 | 2.85E-51 | C2 | 0.5057 | 0.4943 | C2 | 0.5055 | 0.4945 |
Table 5 shows the average runtime of the 100 runs of the three algorithms on the first synthetic dataset. From the table we see that the EWKM algorithm converged the fastest. The LAC algorithm and the LEKM algorithm converged in about the same time.
Parameter | EWKM | LAC | LEKM |
1 | 0.0005 (0.0005) | 0.0021 (0.0032) | 0.0016 (0.0009) |
2 | 0.0002 (0.0004) | 0.0018 (0.0026) | 0.0013 (0.0006) |
4 | 0.0002 (0.0004) | 0.0017 (0.0025) | 0.0014 (0.0011) |
8 | 0.0003 (0.0004) | 0.0018 (0.0026) | 0.0016 (0.0017) |
16 | 0.0002 (0.0004) | 0.0018 (0.0025) | 0.0016 (0.002) |
The second synthetic dataset is a 100-dimensional dataset with four clusters. Table 6 shows the sizes and dimensions of the four clusters. This dataset was also used to test the SAP algorithm developed in [13]. Table 7 summarizes the clustering results of the three algorithms. From the table we see that the LEKM algorithm produced the most accurate results when the parameter is small. When the parameter is large, the attribute weights calculated by the LEKM algorithm become approximately the same. Since the clusters are embedded in subspaces, assigning approximately the same weight to attributes prevents the LEKM algorithm from recovering these clusters.
Cluster | Cluster Size | Subspace Dimensions |
A | 500 | 10, 15, 70 |
B | 300 | 20, 30, 80, 85 |
C | 500 | 30, 40, 70, 90, 95 |
D | 700 | 40, 45, 50, 55, 60, 80 |
Parameter | EWKM | LAC | LEKM |
1 | 0.557 (0.1851) | 0.5534 (0.1857) | 0.9123 (0.147) |
2 | 0.557 (0.1851) | 0.5572 (0.1883) | 0.928 (0.1361) |
4 | 0.557 (0.1851) | 0.5658 (0.1902) | 0.6128 (0.1626) |
8 | 0.557 (0.1851) | 0.574 (0.2028) | 0.3197 (0.1247) |
16 | 0.5573 (0.1854) | 0.6631 (0.2532) | 0.2293 (0.0914) |
Table 8 shows the confusion matrices produced by the runs of the three algorithms with the lowest objective function value. From the table we see that only three points were clustered incorrectly by the LEKM algorithm. Many points were clustered incorrectly by the EWKM algorithm and the LAC algorithm. Figures 2, 3, and 4 plot the attribute weights of the four clusters corresponding to the confusion matrices given in Table 8. From Figures 2 and 3 we can see that the attribute weights were dominated by a single attribute. Figure 4 shows that the LEKM algorithm was able to recover all the subspace dimensions correctly.
Table 9 shows the average runtime of 100 runs of the three algorithms on the second synthetic dataset. From the table we see that the LEKM algorithm is slower than the other two algorithms. Since the center calculation of the LEKM algorithm is more complicated than that of the EWKM algorithm and the LAC algorithm, it is expected that the LEKM algorithm is slower than the other two algorithms.
Parameter | EWKM | LAC | LEKM |
1 | 0.7849 (0.4221) | 1.1788 (0.763) | 10.4702 (0.1906) |
2 | 0.7687 (0.4141) | 0.8862 (0.4952) | 10.3953 (0.1704) |
4 | 0.7619 (0.4101) | 0.8412 (0.4721) | 10.5236 (0.2023) |
8 | 0.7567 (0.4074) | 0.8767 (0.4816) | 10.5059 (0.2014) |
16 | 0.7578 (0.4112) | 0.8136 (0.5069) | 10.4122 (0.189) |
In summary, the test results on synthetic datasets have shown that the LEKM algorithm is able to recover clusters from noisy data and recover clusters embedded in subspaces. The test results also show that the LEKM algorithm is less sensitive to noise and parameter values than the EWKM algorithm and the LAC algorithm. However, the LEKM algorithm is in general slower than the other two algorithms due to its complex center calculation.
To test the algorithms on real data, we obtained two cancer gene expression datasets from [8]¹. The first dataset contains gene expression data of human liver cancers and the second dataset contains gene expression data of breast tumors and colon tumors. Table 10 shows the information of the two real datasets. The two datasets have known labels, which give the sample type of each data point. The two datasets were also used to test the SAP algorithm in [13].
Dataset | Samples | Dimensions | Cluster sizes |
Chen-2002 | 179 | 85 | 104, 76 |
Chowdary-2006 | 104 | 182 | 62, 42 |
¹ The datasets are available at http://bioinformatics.rutgers.edu/Static/Supplements/CompCancer/datasets.htm
Table 11 and Table 12 summarize the average accuracy and the average runtime of 100 runs of the three algorithms on the Chen-2002 dataset, respectively. From the average corrected Rand index shown in Table 11 we see that the LEKM algorithm produced more accurate results than the EWKM algorithm and the LAC algorithm did. However, the LEKM algorithm was slower than the other two algorithms.
Parameter | EWKM | LAC | LEKM |
1 | 0.025 (0.0395) | 0.0042 (0.0617) | 0.2599 (0.2973) |
2 | 0.0203 (0.0343) | 0.0888 (0.1903) | 0.2563 (0.2868) |
4 | 0.0135 (0.0279) | 0.041 (0.1454) | 0.2743 (0.2972) |
8 | 0.0141 (0.0449) | 0.0484 (0.1761) | 0.2856 (0.2993) |
16 | 0.0002 (0.0416) | 0.0445 (0.1726) | 0.2789 (0.2984) |
Parameter | EWKM | LAC | LEKM |
1 | 0.0111 (0.0031) | 0.0162 (0.0083) | 0.102 (0.0297) |
2 | 0.0123 (0.0033) | 0.0124 (0.006) | 0.1035 (0.0286) |
4 | 0.0143 (0.006) | 0.0151 (0.0105) | 0.1046 (0.0316) |
8 | 0.0122 (0.0043) | 0.0137 (0.0089) | 0.1068 (0.0337) |
16 | 0.0144 (0.007) | 0.014 (0.0091) | 0.105 (0.0323) |
The average accuracy and runtime of 100 runs of the three algorithms on the Chowdary-2006 dataset are shown in Table 13 and Table 14, respectively. From Table 13 we see that the LEKM algorithm again produced more accurate clustering results than the other two algorithms did. When the parameter was set to 1, the LAC algorithm produced better results than the EWKM algorithm did. For the other parameter values, however, the EWKM algorithm produced better results than the LAC algorithm did. The LAC algorithm and the EWKM algorithm are much faster than the LEKM algorithm, as shown in Table 14.
Parameter | EWKM | LAC | LEKM |
1 | 0.3952 (0.3943) | 0.5197 (0.2883) | 0.5826 (0.3199) |
2 | 0.3819 (0.3825) | 0.19 (0.2568) | 0.5757 (0.3261) |
4 | 0.3839 (0.3677) | 0.0772 (0.1016) | 0.5823 (0.3221) |
8 | 0.4188 (0.3584) | 0.0595 (0.0224) | 0.5756 (0.3383) |
16 | 0.4994 (0.3927) | 0.0625 (0.0184) | 0.582 (0.3363) |
Parameter | EWKM | LAC | LEKM |
1 | 0.0115 (0.0048) | 0.0109 (0.0042) | 0.1369 (0.0756) |
2 | 0.011 (0.0046) | 0.0156 (0.0093) | 0.1446 (0.0723) |
4 | 0.0103 (0.0042) | 0.0147 (0.0076) | 0.1514 (0.0805) |
8 | 0.0107 (0.005) | 0.0141 (0.0063) | 0.1524 (0.0769) |
16 | 0.0113 (0.0047) | 0.0138 (0.0068) | 0.1542 (0.0854) |
In summary, the test results on real datasets show that the LEKM algorithm produced more accurate clustering results on average than the EWKM algorithm and the LAC algorithm did. However, the LEKM algorithm was slower than the other two algorithms.
The EWKM algorithm [21] and the LAC algorithm [9] are two soft subspace clustering algorithms that are similar to each other. In both algorithms, the attribute weights of a cluster are calculated as exponential normalizations of the negative attribute dispersions in the cluster scaled by a parameter. Setting the parameter is a challenge when the attribute dispersions in a cluster have a large range. In this paper, we proposed the LEKM (log-transformed entropy weighting
We tested the performance of the LEKM algorithm and compared it with the EWKM algorithm and the LAC algorithm. The test results on both synthetic datasets and real datasets have shown that the LEKM algorithm is able to outperform the EWKM algorithm and the LAC algorithm in terms of accuracy. However, one limitation of the LEKM algorithm is that it is slower than the other two algorithms because updating the cluster centers in each iteration of the LEKM algorithm is more complicated than in the other two algorithms.
Another limitation of the LEKM algorithm is that it is sensitive to initial cluster centers. This limitation is common to most of the
The authors would like to thank the referees for their insightful comments, which greatly improved the quality of the paper.
Parameter | Default Value |
1 | |
100 |
Parameter | EWKM | LAC | LEKM |
1 | 0.0351 (0.0582) | 0.0024 (0.0158) | 0.9154 (0.2704) |
2 | 0.0378 (0.0556) | 0.9054 (0.2322) | 0.9063 (0.2827) |
4 | 0.012 (0.031) | 0.8019 (0.2422) | 0.9067 (0.2815) |
8 | -0.0135 (0.0125) | 0.7604 (0.2406) | 0.9072 (0.2799) |
16 | -0.013 (0.0134) | 0.7527 (0.2501) | 0.9072 (0.2799) |
1 | 2 | 1 | 2 | 1 | 2 | |||||
C2 | 35 | 25 | C2 | 59 | 0 | C2 | 60 | 0 | ||
C1 | 25 | 15 | C1 | 1 | 40 | C1 | 0 | 40 | ||
(a) | (b) | (c) |
Weight | Weight | Weight | ||||||||
C1 | 1 | 3.01E-36 | C1 | 0.8931 | 0.1069 | C1 | 0.5448 | 0.4552 | ||
C2 | 1 | 2.85E-51 | C2 | 0.5057 | 0.4943 | C2 | 0.5055 | 0.4945 | ||
(a) | (b) | (c) |
Parameter | EWKM | LAC | LEKM |
1 | 0.0005 (0.0005) | 0.0021 (0.0032) | 0.0016 (0.0009) |
2 | 0.0002 (0.0004) | 0.0018 (0.0026) | 0.0013 (0.0006) |
4 | 0.0002 (0.0004) | 0.0017 (0.0025) | 0.0014 (0.0011) |
8 | 0.0003 (0.0004) | 0.0018 (0.0026) | 0.0016 (0.0017) |
16 | 0.0002 (0.0004) | 0.0018 (0.0025) | 0.0016 (0.002) |
Cluster | Cluster Size | Subspace Dimensions |
A | 500 | 10, 15, 70 |
B | 300 | 20, 30, 80, 85 |
C | 500 | 30, 40, 70, 90, 95 |
D | 700 | 40, 45, 50, 55, 60, 80 |
Parameter | EWKM | LAC | LEKM |
1 | 0.557 (0.1851) | 0.5534 (0.1857) | 0.9123 (0.147) |
2 | 0.557 (0.1851) | 0.5572 (0.1883) | 0.928 (0.1361) |
4 | 0.557 (0.1851) | 0.5658 (0.1902) | 0.6128 (0.1626) |
8 | 0.557 (0.1851) | 0.574 (0.2028) | 0.3197 (0.1247) |
16 | 0.5573 (0.1854) | 0.6631 (0.2532) | 0.2293 (0.0914) |
Parameter | EWKM | LAC | LEKM |
1 | 0.7849 (0.4221) | 1.1788 (0.763) | 10.4702 (0.1906) |
2 | 0.7687 (0.4141) | 0.8862 (0.4952) | 10.3953 (0.1704) |
4 | 0.7619 (0.4101) | 0.8412 (0.4721) | 10.5236 (0.2023) |
8 | 0.7567 (0.4074) | 0.8767 (0.4816) | 10.5059 (0.2014) |
16 | 0.7578 (0.4112) | 0.8136 (0.5069) | 10.4122 (0.189) |
Dataset | Samples | Dimensions | Cluster sizes |
Chen-2002 | 179 | 85 | 104, 75 |
Chowdary-2006 | 104 | 182 | 62, 42 |
Parameter | EWKM | LAC | LEKM |
1 | 0.025 (0.0395) | 0.0042 (0.0617) | 0.2599 (0.2973) |
2 | 0.0203 (0.0343) | 0.0888 (0.1903) | 0.2563 (0.2868) |
4 | 0.0135 (0.0279) | 0.041 (0.1454) | 0.2743 (0.2972) |
8 | 0.0141 (0.0449) | 0.0484 (0.1761) | 0.2856 (0.2993) |
16 | 0.0002 (0.0416) | 0.0445 (0.1726) | 0.2789 (0.2984) |
Parameter | EWKM | LAC | LEKM |
1 | 0.0111 (0.0031) | 0.0162 (0.0083) | 0.102 (0.0297) |
2 | 0.0123 (0.0033) | 0.0124 (0.006) | 0.1035 (0.0286) |
4 | 0.0143 (0.006) | 0.0151 (0.0105) | 0.1046 (0.0316) |
8 | 0.0122 (0.0043) | 0.0137 (0.0089) | 0.1068 (0.0337) |
16 | 0.0144 (0.007) | 0.014 (0.0091) | 0.105 (0.0323) |
Parameter | EWKM | LAC | LEKM |
1 | 0.3952 (0.3943) | 0.5197 (0.2883) | 0.5826 (0.3199) |
2 | 0.3819 (0.3825) | 0.19 (0.2568) | 0.5757 (0.3261) |
4 | 0.3839 (0.3677) | 0.0772 (0.1016) | 0.5823 (0.3221) |
8 | 0.4188 (0.3584) | 0.0595 (0.0224) | 0.5756 (0.3383) |
16 | 0.4994 (0.3927) | 0.0625 (0.0184) | 0.582 (0.3363) |
Parameter | EWKM | LAC | LEKM |
1 | 0.0115 (0.0048) | 0.0109 (0.0042) | 0.1369 (0.0756) |
2 | 0.011 (0.0046) | 0.0156 (0.0093) | 0.1446 (0.0723) |
4 | 0.0103 (0.0042) | 0.0147 (0.0076) | 0.1514 (0.0805) |
8 | 0.0107 (0.005) | 0.0141 (0.0063) | 0.1524 (0.0769) |
16 | 0.0113 (0.0047) | 0.0138 (0.0068) | 0.1542 (0.0854) |