
Clustering is a foundational research area in computer vision and machine learning. Over decades of exploration, traditional clustering methods have reached a certain level of maturity and yield promising results. However, when confronted with complex signals (e.g., images and videos), clustering methods that rely on the Euclidean distance measure often perform unsatisfactorily. Although the data observed in practical applications is complicated, the samples of each cluster generally lie near a corresponding low-dimensional subspace. Therefore, some researchers have turned to subspace clustering to characterize the geometrical structure of the original data manifold. These algorithms model the data as a union of low-dimensional subspaces and assign each data point to the subspace that best fits it, thereby producing the final clustering prediction. At present, subspace clustering methods catering to implicit subspace structures include iterative methods [1,2], algebraic methods [3,4], statistical methods [5,6], and self-representation methods [7,8,9]. Among them, self-representation has received widespread attention due to its effective utilization of the latent subspace structure and attributes of the data. Notable representatives of this approach include sparse subspace clustering (SSC) [7] and low-rank representation (LRR) [9]. These methods typically contain two steps: (1) exploring the data structure with a Euclidean distance measure to obtain a coefficient matrix; (2) leveraging the learned coefficient matrix to conduct spectral clustering [10], which assigns the data to k clusters.
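For concreteness, the two-step pipeline just described can be sketched in a few lines of Python (NumPy/scikit-learn). The ridge-regularized self-representation below is a simplified stand-in for the SSC/LRR objectives, and the function name and regularization value are illustrative assumptions rather than part of any cited method.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def self_representation_clustering(X, n_clusters, reg=1e-2):
    """Toy two-step subspace clustering.
    Step 1: learn a self-representation coefficient matrix C that reconstructs
            each sample from the others (ridge-regularized least squares here,
            a simplified stand-in for the SSC/LRR objectives).
    Step 2: run spectral clustering on the affinity induced by |C|.
    X has shape (d, N): one column per sample."""
    d, N = X.shape
    G = X.T @ X
    # C = argmin ||X - XC||_F^2 + reg * ||C||_F^2  =>  (G + reg*I) C = G
    C = np.linalg.solve(G + reg * np.eye(N), G)
    W = 0.5 * (np.abs(C) + np.abs(C).T)          # symmetric affinity matrix
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(W)
    return labels
```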
Nevertheless, previous studies reveal that image sets and video data often lie on nonlinear manifold structures [11,12]. Therefore, the Euclidean distance cannot accurately measure the similarity between two data points residing in a non-Euclidean space. Wei et al. [13] proposed a discrete metric learning method for fast image set classification, which significantly improves classification efficiency and accuracy by learning metrics and hashing techniques on the Riemannian manifold. Furthermore, video data commonly contain varying numbers of frames per clip, so vectorizing the video frames leads to excessively high data dimensions. These challenges pose difficulties for conventional clustering algorithms such as SSC and LRR.
Deep learning methods have emerged as capable learners of discriminative feature representations [14]. A number of studies propose combining metric learning and deep learning for classification [14,15]. Recently, deep learning techniques have also been applied to data clustering and have shown remarkable performance [16,17,18]. To name a few, Xie et al. [19] proposed a clustering method based on deep embedding, where a deep neural network learns feature representations and clustering assignments concurrently. Wang et al. [20] proposed a multi-level representation learning method for incomplete multiview clustering, which incorporates contrastive and adversarial regularization to improve clustering performance. Yang et al. [21] proposed a novel graph contrastive learning framework for clustering multi-layer networks, effectively combining nonnegative matrix factorization and contrastive learning to enhance the discriminative features of vertices across different network layers. Guo et al. [22] proposed an adaptive multiview subspace learning method using distributed optimization, which enhances clustering performance by capturing high-order correlations among views. Li et al. [23] introduced a deep adversarial MVC method that explores the intrinsic structure embedded in multiview data. Ma et al. [24] suggested a deep generative clustering model based on variational autoencoders, which yields a more appropriate embedding subspace for clustering complex data distributions. These methods address challenges faced by traditional data modeling (e.g., preprocessing high-dimensional data and characterizing nonlinear relationships) by nonlinearly mapping data points into a latent space through a series of encoder layers. However, existing deep clustering algorithms have an inherent limitation: they rely on Euclidean computations to analyze the semantic features generated by convolutional neural networks, which leads to unfaithful data representation. The fundamental reason is that these features inherently have a submanifold structure [25,26]. While techniques such as fine-tuning, optimization, and feature activation in deep neural networks can affect the experimental results, manifold learning provides a theoretical basis for the analysis of non-Euclidean data structures.
Recognizing the nonlinear structure of high-dimensional data, some studies [11,27,28,29,30,31] extended the traditional clustering paradigm to the context of Riemannian manifolds. Typically, a manifold represents a smooth surface embedded in Euclidean space [32]. In image analysis, covariance matrices often serve as region descriptors, and these matrices are regarded as points on the symmetric positive definite (SPD) manifold. Hu et al. [12] proposed a multi-geometry SSC method, aiming to mine complementary geometric representations. Wei et al. [33] proposed a method called sparse representation classifier guided Grassmann reconstruction metric learning (GRML), which enhances image set classification by learning a robust metric on the Grassmann manifold, making it effective in handling noisy data. Linear subspaces have attracted widespread attention in many scientific fields, as their underlying space is a Grassmannian manifold, which provides a solid theoretical foundation for the characterization of signal data (e.g., video clips, image sets, and point clouds). In addition, the Grassmannian manifold plays a crucial role in handling non-Euclidean data structures, particularly through its compact manifold structure. Wang et al. [34] extended the idea of LRR to the context of the Grassmannian manifold, facilitating more accurate representation and analysis in complicated data scenarios. Piao et al. [35] proposed a double-kernel norm low-rank representation based on the Grassmannian manifold for clustering. However, issues such as integrating manifold representation learning with clustering and preserving the subspace structure during data transformation remain challenging for such a framework.
It can be concluded that deep learning-based clustering struggles to learn effective representations from data with a submanifold structure, and that the underlying space of subspace features is a Grassmannian manifold. This motivates us to propose the DGMVCL framework to achieve a more reasonable view-invariant representation on a non-Euclidean space. The proposed framework comprises four main components: the FEM, MMM, Grassmannian manifold learning module (GMLM), and the contrastive learning module (CLM). Specifically, the FEM is responsible for mapping the original data into a feature subspace, the MMM is designed for the projection of subspace features onto a Grassmannian manifold, the GMLM is used to facilitate deep subspace learning on the Grassmannian manifold, and the CLM is focused on discriminative cluster assignments through contrastive learning. Additionally, the positive and negative samples relied upon in the contrastive learning process are constructed on the basis of the Riemannian distance on the Grassmannian manifold, which better reflects the geometric distribution of the data. Extensive experiments across five benchmarking datasets validate the effectiveness of the proposed DGMVCL.
Our main contributions are summarized as follows:
● A lightweight geometric learning model built upon the Grassmannian manifold is proposed for multiview subspace clustering in an end-to-end manner.
● A Grassmannian-level contrastive learning strategy is suggested to help improve the accuracy of cluster assignments among multiple views.
● Extensive experiments conducted on five multiview datasets demonstrate the effectiveness of the proposed DGMVCL.
In this section, we will give a brief introduction to some related works, including multiview subspace clustering, contrastive learning, and the Riemannian geometry of the Grassmannian manifold.
Although there exist a number of subspace clustering methods, such as LRR [9] and SSC [7], most of them adopt self-representation to obtain subspace features. Low-rank tensor-constrained multiview subspace clustering (LMSC) [36] is able to generate a common subspace for different views instead of individual representations. Flexible multiview representation learning for subspace clustering (FMR) [37] avoids using partial information for data reconstruction. Zhou et al. [38] proposed an end-to-end adversarial attention network to align latent feature distributions and assess the importance of different modalities. Despite the improved clustering performance, these methods do not consider the semantic label consistency across multiple views, potentially resulting in challenges when learning consistent clustering assignments.
To address these challenges, Kang et al. [39] proposed a structured graph learning method that constructs a bipartite graph to manage the relationships between samples and anchor points, effectively reducing computational complexity in large-scale data and enabling support for out-of-sample extensions. This approach is particularly advantageous in scalable subspace clustering scenarios, extending from single-view to multiview settings.
Furthermore, to enhance clustering performance by leveraging high-order structural information, Pan and Kang [40] introduced a high-order multiview clustering (HMvC) method. This approach utilizes graph filtering and an adaptive graph fusion mechanism to explore long-distance interactions between different views. By capturing high-order neighborhood relationships, HMvC effectively improves clustering results on both graph and non-graph data, addressing some of the limitations in prior methods that overlooked the intrinsic high-order information from data attributes.
Contrastive learning has made significant progress in self-supervised learning [41,42,43,44]. The methods based on contrastive learning essentially rely on a large number of pairwise feature comparisons. Specifically, they aim to maximize the similarity between positive samples in the latent feature space while simultaneously minimizing the similarity between negative samples. In the field of clustering, positive samples are constructed from the invariant representations of all multiview instances of the same sample, while negative samples are obtained by pairing representations from different samples across various views. Chen et al. [42] proposed a contrastive learning framework that maximizes consistency between differently augmented views of the same example in the latent feature space. Wang et al. [44] investigated two key properties of the contrastive loss, namely feature alignment of positive samples and uniformity of the induced feature distribution on the hypersphere, which can be used to measure the quality of the learned representations. Although these methods can learn robust representations based on data augmentation, learning invariant representations across multiple views remains challenging.
The Grassmannian manifold $G(q,d)$ comprises the collection of $q$-dimensional linear subspaces of $\mathbb{R}^d$. Each of them can be naturally represented by an orthonormal basis denoted as $Y$, of size $d\times q$ ($Y^TY=I_q$, where $I_q$ is a $q\times q$ identity matrix). Consequently, the matrix representation of each Grassmannian element is comprised of an equivalence class of this orthonormal basis:
$$[Y]=\left\{\tilde{Y}\mid \tilde{Y}=YO,\ O\in O(q)\right\}, \tag{2.1}$$
where Y represents a d×q column-wise orthonormal matrix. The definition in Eq (2.1) is commonly referred to as the orthonormal basis (ONB) viewpoint [32].
As demonstrated in [32], each point of the Grassmannian manifold can be alternatively represented as an idempotent symmetric matrix of rank q, given by Φ(Y)=YYT. This representation, known as the projector perspective (PP), signifies that the Grassmannian manifold is a submanifold of the Euclidean space of symmetric matrices. Therefore, an extrinsic distance can be induced by the ambient Euclidean space, termed the projection metric (PM) [49]. The PM is defined as:
$$d_{PM}(Y_1,Y_2)=2^{-1/2}\left\|Y_1Y_1^T-Y_2Y_2^T\right\|_F, \tag{2.2}$$
where ‖⋅‖F denotes the Frobenius norm. As evidenced in [45], the distance computed by the PM deviates from the true geodesic distance on the Grassmannian manifold by a scale factor of √2, making it a widely used Grassmannian distance.
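To make these definitions concrete, the following NumPy sketch implements the projector map Φ(Y) = YYᵀ and the PM of Eq (2.2); it is only an illustration of the formulas above, not code released with the paper.

```python
import numpy as np

def projector(Y):
    """Map an orthonormal basis Y (d x q, with Y^T Y = I_q) to its projector Y Y^T."""
    return Y @ Y.T

def projection_metric(Y1, Y2):
    """Projection metric of Eq (2.2): 2^{-1/2} * ||Y1 Y1^T - Y2 Y2^T||_F."""
    return np.linalg.norm(projector(Y1) - projector(Y2), ord="fro") / np.sqrt(2)

# Example: distance between two random 2-dimensional subspaces of R^5
rng = np.random.default_rng(0)
Y1, _ = np.linalg.qr(rng.standard_normal((5, 2)))
Y2, _ = np.linalg.qr(rng.standard_normal((5, 2)))
print(projection_metric(Y1, Y2))
```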
We are given a group of multiview data $\mathbf{X}=\{X^v\in\mathbb{R}^{d_v\times N_v}\}_{v=1}^{V}$, where $V$ denotes the number of views, $N_v$ signifies the number of instances contained in the $v$-th view, and $d_v$ represents the dimensionality of each instance in $X^v$. The proposed DGMVCL is an end-to-end neural network built upon the Grassmannian manifold, aiming to generate clustering semantic labels from the original instances across multiple views. As shown in Figure 1, the proposed framework mainly consists of four modules, namely the FEM, MMM, GMLM, and CLM. The following is a detailed introduction to them.
To obtain an effective semantic space for the subsequent computations, we exploit a CNN to transform the input images into subspace representations with lower redundancy. The proposed FEM contains two blocks, each comprising a convolutional layer, a ReLU activation layer, and a max-pooling layer. The two blocks differ in the number of convolutional kernels, i.e., 32 for the first convolutional layer and 64 for the second one. To characterize the geometric structure of the generated features, the following manifold modeling module is designed.
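Before turning to the manifold modeling module, a minimal PyTorch sketch of the FEM just described is given below. Only the channel widths (32, then 64) and the conv–ReLU–max-pooling block structure are stated in the text; the kernel size, padding, pooling window, and input channel count are assumptions made for illustration.

```python
import torch.nn as nn

class FEM(nn.Module):
    """Feature extraction module: two blocks of conv -> ReLU -> max-pool."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, x):              # x: (B, in_channels, H, W)
        f = self.blocks(x)             # (B, 64, H/4, W/4)
        # Flatten each feature map so every sample yields a c x l matrix E_i
        return f.flatten(start_dim=2)  # (B, c=64, l=(H/4)*(W/4))
```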
Let $E_i^v\in\mathbb{R}^{c\times l}$ be the $i$-th ($i\in\{1,2,...,N_v\}$) feature matrix generated by FEM with respect to the $i$-th input instance of the $v$-th view. Here, $c$ represents the number of channels while $l$ indicates the length of a vectorized feature map. Since each point of $G(q,d)$ represents a $q$-dimensional linear subspace of the $d$-dimensional vector space $\mathbb{R}^d$ (see Section 2.3), the Grassmannian manifold thus becomes a reasonable and efficient tool for parametrizing the $q$-dimensional real vector subspace embedded in $E_i^v$ [49,51].
Cov Layer: To capture complementary statistical information embodied in different channel features, a similarity matrix is computed for each Evi:
$$\tilde{E}_i^v=E_i^v\left(E_i^v\right)^T. \tag{3.1}$$
Orth Layer: Following the Cov layer, the SVD operation is applied to obtain a $q$-dimensional linear subspace spanned by an orthonormal matrix $Y_i^v\in\mathbb{R}^{c\times q}$, that is, $\tilde{E}_i^v\simeq Y_i^v\Sigma_i^v(Y_i^v)^T$, where $Y_i^v$ and $\Sigma_i^v$ consist of the $q$ leading eigenvectors and the corresponding eigenvalues, respectively. The resulting Grassmannian representation with respect to the input $X^v$ is denoted by $\Upsilon^v=[Y_1^v,Y_2^v,...,Y_{N_v}^v]$.
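The Cov and Orth layers can be prototyped as follows. Since the Gram matrix of Eq (3.1) is symmetric positive semidefinite, its SVD coincides with its eigendecomposition, so keeping the leading left singular vectors yields the desired orthonormal basis; the small ridge term is an assumed numerical safeguard, not part of the paper.

```python
import torch

def manifold_modeling(E, q):
    """Manifold modeling module (sketch): map a batch of feature matrices
    E of shape (B, c, l) to points on the Grassmannian G(q, c)."""
    gram = E @ E.transpose(1, 2)                                     # Cov layer, Eq (3.1): (B, c, c)
    gram = gram + 1e-6 * torch.eye(gram.size(-1), device=E.device)   # assumed ridge for stability
    U, S, _ = torch.linalg.svd(gram)                                 # Orth layer: eigenvectors of E E^T
    return U[:, :, :q]                                               # (B, c, q), orthonormal columns
```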
An overview of the designed GMLM is shown in Figure 1. Note that the input to this module is a series of orthonormal matrices. For simplicity, we abbreviate $Y_i^v$ as $Y_i$ in the following. Besides, since $Y_i\in G(q,c)$, we replace the symbol $c$ with the symbol $d$. To implement deep subspace learning, the following three basic layers are designed.
FRMap Layer: This layer transforms the input orthogonal matrices into new ones through a linear mapping function $f_{fr}$:
$$Y_{i,k}=f_{fr}^{(k)}\left(Y_{i,k-1};W_k\right)=W_kY_{i,k-1}, \tag{3.2}$$
where $Y_{i,k-1}\in G(q,d_{k-1})$ is the input of the $k$-th layer, $W_k\in\mathbb{R}^{d_k\times d_{k-1}}$ ($d_k<d_{k-1}$) is the to-be-learned transformation matrix (connection weights), essentially required to be a row full-rank matrix, and $Y_{i,k}\in\mathbb{R}^{d_k\times q}$ is the resulting matrix. Because the weight matrices lie in a non-compact Stiefel manifold whose geodesic distance has no upper bound [46,47], direct optimization on such a manifold is infeasible. To address this challenge, we follow [46] and impose an orthogonality constraint on each weight matrix $W_k$. As a consequence, the weight space becomes a compact Stiefel manifold $St(d_k,d_{k-1})$ [48], facilitating better optimization.
ReOrth Layer: Inspired by [46], we design the ReOrth layer to prevent the output matrices of the FRMap layer from degenerating. Specifically, we first impose QR decomposition on the input matrix $Y_{i,k-1}$:
$$Y_{i,k-1}=Q_{i,k-1}R_{i,k-1}, \tag{3.3}$$
where $Q_{i,k-1}\in\mathbb{R}^{d_{k-1}\times q}$ is an orthogonal matrix, and $R_{i,k-1}\in\mathbb{R}^{q\times q}$ is an invertible upper triangular matrix. Therefore, $Y_{i,k-1}$ can be normalized into an orthonormal basis matrix via the following transformation function $f_{ro}^{(k)}$:
$$Y_{i,k}=f_{ro}^{(k)}\left(Y_{i,k-1}\right)=Y_{i,k-1}R_{i,k-1}^{-1}=Q_{i,k-1}. \tag{3.4}$$
ProjMap Layer: As studied in [47,49,50,51,52], the PM is one of the most popular Grassmannian distance measures, providing a specific inner product structure to a concrete Riemannian manifold. In such a case, the original Grassmannian manifold reduces to a flat space, in which the Euclidean computations can be generalized to the projection domain of orthogonal matrices. Formally, by applying the projection operator [47] to the orthogonal matrix $Y_{i,k-1}$ of the $k$-th layer, the designed ProjMap layer can be formulated as:
$$Y_{i,k}=f_{pm}^{(k)}\left(Y_{i,k-1}\right)=Y_{i,k-1}Y_{i,k-1}^T. \tag{3.5}$$
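Taken together, the three GMLM layers can be prototyped in PyTorch as below. The orthonormal initialization of the FRMap weight and the batched QR and matrix-product calls are implementation choices assumed here; keeping the weight on the compact Stiefel manifold is delegated to the Riemannian optimizer described later in this section.

```python
import torch
import torch.nn as nn

class FRMapLayer(nn.Module):
    """FRMap layer, Eq (3.2): Y_k = W_k Y_{k-1} with a row full-rank weight."""
    def __init__(self, d_in, d_out):
        super().__init__()
        Q, _ = torch.linalg.qr(torch.randn(d_in, d_out))   # orthonormal columns
        self.W = nn.Parameter(Q.T)                         # (d_out, d_in), orthonormal rows

    def forward(self, Y):                                  # Y: (B, d_in, q)
        return self.W @ Y                                  # (B, d_out, q)

def reorth(Y):
    """ReOrth layer, Eqs (3.3)-(3.4): re-orthonormalize each matrix via QR."""
    Q, _ = torch.linalg.qr(Y)                              # keep the Q factor
    return Q

def projmap(Y):
    """ProjMap layer, Eq (3.5): embed G(q, d) into symmetric matrices as Y Y^T."""
    return Y @ Y.transpose(1, 2)
```

Stacking one FRMap–ReOrth–ProjMap block corresponds to the $n_x=1$ setting used in the experiments.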
Subsequently, we embed a contrastive learning module at the end of the GMLM to enhance the discriminatory power of the learned features.
For simplicity, we assume that $n_x=1$. In this case, the output data representation with respect to $\Upsilon^v$ is denoted by $\tilde{\Upsilon}^v=[Y_{1,3}^v,Y_{2,3}^v,...,Y_{N_v,3}^v]$. Then, the projection metric, defined in Eq (2.2), is applied to compute the geodesic distance between any two data points, enabling the execution of K-means clustering within each view. At the same time, we can obtain the membership degree $t_{ij}$, computed as follows:
$$t_{ij}=\frac{1/d_{PM}^2\!\left(Y_{i,3}^v,U_j\right)}{\sum_{r=1}^{K}1/d_{PM}^2\!\left(Y_{i,3}^v,U_r\right)}, \tag{3.6}$$
where $K$ represents the total number of clusters, and $U_j$ denotes the center of the $j$-th ($j\in\{1,2,...,K\}$) cluster obtained by K-means. To enhance the discriminability of the global soft assignments, we consider a unified target distribution probability $P^v\in\mathbb{R}^{N_v\times K}$, which is formulated as [53]:
$$p_{ij}^v=\frac{t_{ij}^2/\sum_{i=1}^{N_v}t_{ij}}{\sum_{j'=1}^{K}\left(t_{ij'}^2/\sum_{i=1}^{N_v}t_{ij'}\right)}, \tag{3.7}$$
where each $p_{ij}^v$ represents the soft cluster assignment of the $i$-th sample to the $j$-th cluster. Therefore, the $j$-th column $p_j^v$ of $P^v$ collects the cluster assignments belonging to the same semantic cluster.
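In code, Eqs (3.6) and (3.7) amount to an inverse-distance normalization followed by a DEC-style sharpening step; the small epsilon below is an assumed safeguard against division by zero.

```python
import torch

def soft_assignment(dist2):
    """Eq (3.6): membership degrees from squared PM distances.
    dist2: (N, K) squared distances of each sample to the K cluster centers."""
    inv = 1.0 / (dist2 + 1e-12)
    return inv / inv.sum(dim=1, keepdim=True)        # rows of T sum to 1

def target_distribution(T):
    """Eq (3.7): sharpen the soft assignments T (N, K) into the target P."""
    weight = T ** 2 / T.sum(dim=0, keepdim=True)     # t_ij^2 / sum_i t_ij
    return weight / weight.sum(dim=1, keepdim=True)  # renormalize over clusters
```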
The similarity between the two cluster assignments $p_j^{v_1}$ and $p_j^{v_2}$ for cluster $j$ is measured by the following cosine similarity [54]:
$$s\!\left(p_j^{v_1},p_j^{v_2}\right)=\frac{p_j^{v_1}\cdot p_j^{v_2}}{\left\|p_j^{v_1}\right\|\left\|p_j^{v_2}\right\|}, \tag{3.8}$$
where $v_1$ and $v_2$ represent two different views. Since the instances of a given sample share the same label in every view, the cluster assignment probabilities of instances from different views should be similar. Moreover, instances of different samples are independent of each other across views. Therefore, for $V$ views with $K$ clusters, there are $(V-1)$ positive cluster assignment pairs and $V(K-1)$ negative cluster assignment pairs.
The goal of CLM is to maximize the similarity between cluster assignments within clusters and minimize the similarity between cluster assignments across clusters. Inspired by [55], the cross-view contrastive loss between $p_k^{v_1}$ and $p_k^{v_2}$ is defined as follows:
$$\ell(v_1,v_2)=-\frac{1}{K}\sum_{k=1}^{K}\log\frac{e^{s(p_k^{v_1},p_k^{v_2})/\tau}}{\sum_{j=1,j\neq k}^{K}e^{s(p_j^{v_1},p_k^{v_1})/\tau}+\sum_{j=1}^{K}e^{s(p_j^{v_1},p_k^{v_2})/\tau}}, \tag{3.9}$$
where $\tau$ is the temperature parameter, $(p_k^{v_1},p_k^{v_2})$ is a positive cluster assignment pair between views $v_1$ and $v_2$, and $(p_j^{v_1},p_k^{v_1})$ ($j\neq k$) and $(p_j^{v_1},p_k^{v_2})$ are negative cluster assignment pairs in the views of $v_1$ and $v_2$, respectively. The introduced cross-view contrastive loss across multiple views is given below:
$$L_g=\frac{1}{2}\sum_{v_1=1}^{V}\sum_{v_2=1,v_2\neq v_1}^{V}\ell(v_1,v_2). \tag{3.10}$$
The cross-view contrastive loss explicitly pulls together cluster assignments within the same cluster and pushes apart cluster assignment pairs from different clusters. This inspiration comes from the recently proposed contrastive learning paradigm, which is applied to semantic labels to explore consistent information across multiple views.
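A sketch of Eqs (3.8)–(3.10) is given below, with the K column-wise cluster assignments of each view contrasted across views; the temperature value is an assumed default, not a setting reported in the paper.

```python
import torch
import torch.nn.functional as F

def cross_view_contrastive_loss(P1, P2, tau=0.5):
    """Eqs (3.8)-(3.9): contrast the K column-wise cluster assignments of two
    views.  P1, P2: (N, K) target distributions of views v1 and v2; the k-th
    columns form the positive pair.  tau = 0.5 is an assumed default."""
    A1 = F.normalize(P1.t(), dim=1)          # (K, N) unit-norm cluster assignments
    A2 = F.normalize(P2.t(), dim=1)
    s12 = A1 @ A2.t() / tau                  # s12[j, k] = s(p_j^{v1}, p_k^{v2}) / tau
    s11 = A1 @ A1.t() / tau                  # within-view similarities of v1
    pos = torch.diag(s12)                    # positives s(p_k^{v1}, p_k^{v2}) / tau
    within = torch.exp(s11).sum(dim=0) - torch.exp(torch.diag(s11))  # j != k terms
    cross = torch.exp(s12).sum(dim=0)        # sum_j exp(s(p_j^{v1}, p_k^{v2}) / tau)
    return -(pos - torch.log(within + cross)).mean()

def grassmannian_contrastive_loss(P_views, tau=0.5):
    """Eq (3.10): accumulate the pairwise losses over all ordered view pairs."""
    V = len(P_views)
    total = sum(cross_view_contrastive_loss(P_views[a], P_views[b], tau)
                for a in range(V) for b in range(V) if a != b)
    return total / 2
```

The Euclidean counterpart $L_c$ of Eq (3.11) follows the same pattern, with the probability matrices $Q^v$ in place of $P^v$.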
Additionally, to ensure consistency between the cluster labels on the Grassmannian manifold and those in Euclidean space, as shown in Figure 1, we append two linear layers and a softmax function to the tail of the ProjMap layer to generate a probability matrix $Q^v\in\mathbb{R}^{N_v\times K}$ for cluster assignments. Let $q_i^v$ be the $i$-th row of $Q^v$, where $q_{ij}^v$ represents the probability that the $i$-th instance belongs to the $j$-th cluster of the $v$-th view. The semantic label of the $i$-th instance is determined by the maximum value in $q_i^v$. Following steps similar to those used for $L_g$, we obtain the contrastive loss $L_c$ for different views in Euclidean space:
$$L_c=-\frac{1}{2K}\sum_{v_1=1}^{V}\sum_{v_2=1,v_2\neq v_1}^{V}\sum_{k=1}^{K}\log\frac{e^{s(q_k^{v_1},q_k^{v_2})/\tau}}{\sum_{j=1,j\neq k}^{K}e^{s(q_j^{v_1},q_k^{v_1})/\tau}+\sum_{j=1}^{K}e^{s(q_j^{v_1},q_k^{v_2})/\tau}}. \tag{3.11}$$
Inspired by [53], we introduce the following regularization term to prevent all the instances from being assigned to a specific cluster:
$$L_a=\sum_{v=1}^{V}\sum_{j=1}^{K}h_j^v\log h_j^v, \tag{3.12}$$
where $h_j^v=\frac{1}{N_v}\sum_{i=1}^{N_v}q_{ij}^v$. This term is considered as the cross-view consistency loss in DGMVCL.
The objective function of the proposed method comprises three primary components: the Grassmannian contrastive loss, the Euclidean contrastive loss, and the cross-view consistency loss, given below:
$$L_{obj}=\alpha L_g+\beta L_c+\gamma L_a, \tag{3.13}$$
where α, β, and γ are three trade-off parameters.
The goal of $L_{obj}$ is to learn common semantic labels from feature representations in multiple views. Let $p_i^v$ be the $i$-th row of $P^v$, and $p_{ij}^v$ represents the $j$-th element of $p_i^v$. Specifically, $p_i^v$ is a $K$-dimensional soft assignment probability, where $\sum_{j=1}^{K}p_{ij}^v=1$. Once the training process of the proposed network is completed, the semantic label of sample $i$ ($i\in\{1,2,...,N_v\}$) can be predicted by:
$$y_i=\arg\max_j\left(\frac{1}{V}\sum_{v=1}^{V}p_{ij}^v\right). \tag{3.14}$$
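The regularization term of Eq (3.12) and the prediction rule of Eq (3.14) translate directly into PyTorch; the clamping constant is an assumed numerical safeguard, and the final comment restates the overall objective of Eq (3.13).

```python
import torch

def entropy_regularizer(Q_views):
    """Eq (3.12): La = sum_v sum_j h_j^v log h_j^v, where h_j^v is the average
    soft assignment to cluster j in view v; penalizes collapsing onto one cluster."""
    loss = 0.0
    for Q in Q_views:                                   # each Q: (N_v, K)
        h = Q.mean(dim=0).clamp_min(1e-12)
        loss = loss + (h * h.log()).sum()
    return loss

def predict_labels(P_views):
    """Eq (3.14): average the view-wise soft assignments, then take the argmax."""
    return torch.stack(P_views, dim=0).mean(dim=0).argmax(dim=1)

# Overall objective of Eq (3.13):
#   L_obj = alpha * L_g + beta * L_c + gamma * L_a
```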
For the proposed GMLM, the composition of successive functions $f=f^{(\rho)}\circ f^{(\rho-1)}\circ\cdots\circ f^{(2)}\circ f^{(1)}$, with the parameter tuple $W=\{W_\rho,W_{\rho-1},...,W_1\}$, can be viewed as a model that embeds the data into a metric space. Here, $f^{(k)}$ and $W_k$ are, respectively, the operation function and weight parameter of the $k$-th layer, and $\rho$ denotes the number of layers contained in GMLM. The loss of the $k$-th layer can be written as $L^{(k)}=\ell\circ f^{(\rho)}\circ\cdots\circ f^{(k)}$, where $\ell$ is actually $L_{obj}$.
Due to the fact that the weight space of the FRMap layer is a compact Stiefel manifold St(dk,dk−1), we refer to the method studied in [46] to realize parameter optimization by generalizing the traditional stochastic gradient descent (SGD) settings to the context of Stiefel manifolds. The updating rule for Wk is given below:
According to Eq (3.2), we have $Y_k=f^{(k)}(W_k,Y_{k-1})=W_kY_{k-1}$. Then, the following variation of $Y_k$ can be derived:
$$\mathrm{d}Y_k=\mathrm{d}W_k\,Y_{k-1}+W_k\,\mathrm{d}Y_{k-1}. \tag{3.15}$$
Based on the invariance of the first-order differential, the following chain rule can be deduced:
$$\frac{\partial L^{(k+1)}}{\partial Y_k}:\mathrm{d}Y_k=\frac{\partial L^{(k)}}{\partial W_k}:\mathrm{d}W_k+\frac{\partial L^{(k)}}{\partial Y_{k-1}}:\mathrm{d}Y_{k-1}. \tag{3.16}$$
By replacing the left-hand side of Eq (3.16) with Eq (3.15) and exploiting the matrix inner product ":" property, the following two formulas can be derived:
$$\frac{\partial L^{(k+1)}}{\partial Y_k}:\mathrm{d}W_k\,Y_{k-1}=\frac{\partial L^{(k+1)}}{\partial Y_k}Y_{k-1}^T:\mathrm{d}W_k, \tag{3.17}$$
$$\frac{\partial L^{(k+1)}}{\partial Y_k}:W_k\,\mathrm{d}Y_{k-1}=W_k^T\frac{\partial L^{(k+1)}}{\partial Y_k}:\mathrm{d}Y_{k-1}. \tag{3.18}$$
Combining Eqs (3.16)–(3.18), the partial derivatives of L(k) with respect to Wk and Yk−1 can be computed by:
$$\frac{\partial L^{(k)}}{\partial W_k}=\frac{\partial L^{(k+1)}}{\partial Y_k}Y_{k-1}^T,\qquad \frac{\partial L^{(k)}}{\partial Y_{k-1}}=W_k^T\frac{\partial L^{(k+1)}}{\partial Y_k}. \tag{3.19}$$
At this time, the updating criterion of Wk on the Stiefel manifold is given below:
$$W_k^{t+1}=R_{W_k^t}\!\left(-\eta\,\Pi_{W_k^t}\!\left(\nabla L^{(k)}_{W_k^t}\right)\right), \tag{3.20}$$
where R signifies the retraction operation used to map the optimized parameter back onto the Stiefel manifold, η is the learning rate, and Π represents the projection operator used to convert the Euclidean gradient into the corresponding Riemannian counterpart:
$$\tilde{\nabla}L^{(k)}_{W_k^t}=\Pi_{W_k^t}\!\left(\nabla L^{(k)}_{W_k^t}\right)=\nabla L^{(k)}_{W_k^t}-W_k^t\left(\nabla L^{(k)}_{W_k^t}\right)^TW_k^t, \tag{3.21}$$
where $\nabla L^{(k)}_{W_k^t}$ is the Euclidean gradient, computed by the first term of Eq (3.19), and $\tilde{\nabla}L^{(k)}_{W_k^t}$ denotes the Riemannian gradient. After that, the weight parameter can be updated by $W_k^{t+1}=R\!\left(W_k^t-\eta\,\tilde{\nabla}L^{(k)}_{W_k^t}\right)$. For detailed information about the Riemannian geometry of a Stiefel manifold and its associated retraction operation, please kindly refer to [48].
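A compact sketch of one such update is given below, written for a column-orthonormal weight (apply it to $W_k^T$ for the row-orthonormal FRMap weights); the QR-based retraction is one common choice, whereas the paper defers the exact operator to [46,48].

```python
import torch

def stiefel_sgd_step(W, egrad, lr):
    """One Riemannian SGD update in the spirit of Eqs (3.20)-(3.21),
    for a column-orthonormal W (W^T W = I)."""
    # Tangent-space projection of the Euclidean gradient (Eq (3.21))
    rgrad = egrad - W @ egrad.t() @ W
    # Gradient step followed by a QR retraction back onto the Stiefel manifold
    Q, _ = torch.linalg.qr(W - lr * rgrad)
    return Q
```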
In this section, we conduct experiments on five benchmarking datasets to evaluate the performance of the proposed DGMVCL. All the experiments are run on a Linux workstation with a GeForce RTX 4070 GPU (12 GB caches).
The proposed DGMVCL is evaluated on five publicly available multiview datasets. The MNIST-USPS [56] dataset contains 5000 samples with two different styles of digital images. The Multi-COIL-10 dataset [57] is comprised of 720 grayscale images collected from 10 categories with the image size of 32 × 32, where different views represent different object poses. The Fashion dataset [58] is made up of 10,000 images belonging to 10 categories, where three different styles of an object are regarded as its three different views, i.e., front view, side view, and back view, respectively. The ORL dataset [59] consists of 400 face images collected from 40 volunteers, with each volunteer providing 10 images under different expressions and lighting conditions. The Scene-15 [60] dataset contains 4485 scene images belonging to 15 categories.
We evaluate the clustering performance using three metrics: clustering accuracy (ACC), normalized mutual information (NMI), and purity. Here, ACC is the ratio of correctly classified samples to the total number of samples under the best one-to-one matching between predicted clusters and ground-truth classes, NMI measures the consistency between the clustering result and the true class distribution, and purity assigns each cluster to its most frequent ground-truth class and reports the fraction of samples that fall into the class assigned to their cluster, indicating the extent to which each cluster contains samples from a single class.
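These metrics can be computed with NumPy, SciPy, and scikit-learn as follows; this reflects the standard evaluation protocol (Hungarian matching for ACC) rather than any script released with the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one mapping between clusters and classes (Hungarian)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    D = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                       # rows: predicted cluster, cols: class
    row, col = linear_sum_assignment(count.max() - count)
    return count[row, col].sum() / y_pred.size

def purity(y_true, y_pred):
    """Purity: each cluster votes for its majority class; count the hits."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    hits = sum(np.bincount(y_true[y_pred == c]).max() for c in np.unique(y_pred))
    return hits / y_pred.size

def nmi(y_true, y_pred):
    """NMI between the predicted and ground-truth label assignments."""
    return normalized_mutual_info_score(y_true, y_pred)
```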
To validate the effectiveness of the proposed method, we compare DGMVCL with several state-of-the-art methods, including augmented sparse representation (ASR) [41], deep safe incomplete multiview clustering (DSIMVC) [63], dual contrastive prediction (DCP) [64], multi-level feature learning (MFL) [65], and cross-view contrastive learning (CVCL) [55]. For DCP, we report the best clustering results obtained for each pair of individual views in each dataset. For better comparison, we include two additional baselines. Specifically, we first apply spectral clustering [61] to each individual view and report the best clustering results obtained across multiple views, termed best single view clustering (BSVC). Then, we utilize an adaptive neighborhood graph learning method [62] to generate a similarity matrix for each individual view. We aggregate all the similarity matrices into a new one for spectral clustering, denoted as SCAgg.
The clustering results of various algorithms on the five used datasets are reported in Tables 1 and 2. The best results are highlighted in bold. It can be seen that the clustering performance of the contrastive learning-based methods, including DGMVCL, CVCL, MFL, and DCP, is superior to that of the other competitors (e.g., BSVC and SCAgg) on the large-scale MNIST-USPS, Scene-15, and Fashion datasets. This is mainly attributed to the fact that the contrastive learning-based self-supervised learning mechanism is capable of learning more discriminative feature representations by maximizing the similarity between positive samples while minimizing the similarity between negative samples. Furthermore, our proposed DGMVCL is the best performer on all the used datasets, demonstrating its effectiveness. For example, on the Scene-15 dataset, DGMVCL achieves approximately 16.7%, 34.22%, and 17.68% improvements in ACC, NMI, and purity in comparison with the second-best CVCL method. The fundamental reason is that the proposed subspace-based geometric learning method can characterize and analyze the structural information of the input subspace features more faithfully. In addition, the introduced dual contrastive losses make it possible to learn a more discriminative network embedding.
| Datasets | Methods | ACC | NMI | Purity |
|---|---|---|---|---|
| MNIST-USPS | BSVC [61] | 67.98 | 74.43 | 72.34 |
| | SCAgg [62] | 89.00 | 77.12 | 89.18 |
| | ASR [41] | 97.90 | 94.72 | 97.90 |
| | DSIMVC [63] | 99.34 | 98.13 | 99.34 |
| | DCP [64] | 99.02 | 97.29 | 99.02 |
| | MFL [65] | 99.66 | 99.01 | 99.66 |
| | CVCL [55] | 99.58 | 98.79 | 99.58 |
| | DGMVCL | **99.82** | **99.52** | **99.82** |
| Fashion | BSVC [61] | 60.32 | 64.91 | 63.84 |
| | SCAgg [62] | 98.00 | 94.80 | 97.56 |
| | ASR [41] | 96.52 | 93.04 | 96.52 |
| | DSIMVC [63] | 88.21 | 83.99 | 88.21 |
| | DCP [64] | 89.37 | 88.61 | 89.37 |
| | MFL [65] | 99.20 | 98.00 | 99.20 |
| | CVCL [55] | 99.31 | 98.21 | 99.31 |
| | DGMVCL | **99.52** | **98.73** | **99.52** |
| Multi-COIL-10 | BSVC [61] | 73.32 | 76.91 | 74.11 |
| | SCAgg [62] | 68.34 | 70.18 | 69.26 |
| | ASR [41] | 84.23 | 65.47 | 84.23 |
| | DSIMVC [63] | 99.38 | 98.85 | 99.38 |
| | DCP [64] | 70.14 | 81.90 | 70.14 |
| | MFL [65] | 99.20 | 98.00 | 99.20 |
| | CVCL [55] | 99.43 | 99.04 | 99.43 |
| | DGMVCL | **100.00** | **100.00** | **100.00** |
| Datasets | Methods | ACC | NMI | Purity |
|---|---|---|---|---|
| ORL | BSVC | 61.31 | 64.91 | 61.31 |
| | SCAgg [62] | 61.65 | 77.41 | 66.22 |
| | ASR [41] | 79.49 | 78.04 | 81.49 |
| | DSIMVC [63] | 25.37 | 52.91 | 25.37 |
| | DCP [64] | 27.70 | 49.93 | 27.70 |
| | MFL [65] | 80.03 | 89.34 | 80.03 |
| | CVCL [55] | 85.50 | 93.17 | 86.00 |
| | DGMVCL | **92.25** | **98.34** | **92.25** |
| Scene-15 | BSVC | 38.05 | 38.85 | 42.08 |
| | SCAgg [62] | 38.13 | 39.31 | 44.76 |
| | ASR [41] | 42.70 | 40.70 | 45.60 |
| | DSIMVC [63] | 28.27 | 29.04 | 29.79 |
| | DCP [64] | 42.32 | 40.38 | 43.85 |
| | MFL [65] | 42.52 | 40.34 | 44.53 |
| | CVCL [55] | 44.59 | 42.17 | 47.36 |
| | DGMVCL | **61.29** | **76.39** | **65.04** |
Objective Function: To verify the impact of each loss function in Eq (3.13) on the performance of the proposed method, we conduct ablation experiments on the MNIST-USPS, ORL, and Fashion datasets as three examples. Table 3 shows the experimental results of our model under different combinations of loss functions. It can be seen that under type D, that is, when all the loss functions are used at the same time, our method achieves the best performance. However, when La is removed (type A), the clustering performance of DGMVCL decreases on the three used datasets. For the MVC task, similar samples usually exhibit a large intra-data diversity, while dissimilar samples usually demonstrate a large inter-data correlation. This makes it impossible for a single Lc to effectively learn invariant representations from such data variations. Therefore, La is introduced to prevent instances from being assigned to a particular cluster, and the experimental results confirm its effectiveness. From Table 3, we can also conclude that the introduced Grassmannian contrastive loss Lg is beneficial for ameliorating the discriminability of the learned geometric features. Another interesting observation from Table 3 is that the clustering performance of type C (Lc is removed) is visibly lower than that of type D in terms of ACC, NMI, and purity. The fundamental reason is that Lc is the most critical loss function, as it is not only used to train the proposed network, but also participates in the testing phase. This indicates that Lc is crucial for distinguishing positive and negative samples, i.e., maintaining the cross-view consistency, and plays a pivotal role in the overall clustering task. Since Lg and La act as two regularization terms to provide additional discriminative information for Lc and are mainly confined to the training phase, neither La nor Lg can be used alone. All in all, the aforementioned experimental findings demonstrate that each term in Eq (3.13) is useful.
| Types | Lg | Lc | La | MNIST-USPS ACC | MNIST-USPS NMI | MNIST-USPS Purity | ORL ACC | ORL NMI | ORL Purity | Fashion ACC | Fashion NMI | Fashion Purity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | ✔ | ✔ | | 19.96 | 42.62 | 19.96 | 69.00 | 92.67 | 70.00 | 84.78 | 93.13 | 89.38 |
| B | | ✔ | ✔ | 99.76 | 99.37 | 99.76 | 85.00 | 86.58 | 85.00 | 99.48 | 98.60 | 99.48 |
| C | ✔ | | ✔ | 10.31 | 12.54 | 10.31 | 12.50 | 26.69 | 12.50 | 25.75 | 38.72 | 25.75 |
| D | ✔ | ✔ | ✔ | 99.82 | 99.52 | 99.82 | 92.25 | 98.34 | 92.25 | 99.52 | 98.73 | 99.52 |
| E | ✔ | | | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| F | | | ✔ | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
The Components of GMLM: To verify the significance of each operation layer defined in GMLM, we conduct ablation experiments on the MNIST-USPS, Fashion, and Multi-COIL-10 datasets as three examples. The clustering results of different subarchitectures are listed in Table 4. It is evident that group A (which serves as the reference group, with only the loss function Lg removed) is superior to group D in terms of ACC, NMI, and purity. Here, group D is generated by removing the ProjMap layer from group A. This demonstrates the necessity of Riemannian computation in preserving the geometric structure of the original data manifold. From Table 4, we can also find that the performance of group C (obtained by removing the layers of FRMap, ReOrth, and ProjMap from group A) is significantly inferior to that of group D on the Fashion and Multi-COIL-10 datasets, suggesting that deep subspace learning is able to improve the effectiveness of the features. Another interesting observation from Table 4 is that after removing the ReOrth layer from group A (yielding group B), the clustering results decrease to a certain extent on the three used datasets. The basic reason is that the Grassmannian properties, e.g., orthogonality, of the input feature matrices cannot be maintained, resulting in imprecise Riemannian distances computed in Lg. All in all, the experimental results mentioned above confirm the usefulness of each part in GMLM.
| Groups | MNIST-USPS ACC | MNIST-USPS NMI | MNIST-USPS Purity | Fashion ACC | Fashion NMI | Fashion Purity | Multi-COIL-10 ACC | Multi-COIL-10 NMI | Multi-COIL-10 Purity |
|---|---|---|---|---|---|---|---|---|---|
| A | 99.76 | 99.37 | 99.76 | 99.48 | 98.60 | 99.48 | 100.00 | 100.00 | 100.00 |
| B | 99.70 | 99.11 | 99.70 | 99.02 | 97.60 | 99.02 | 99.14 | 98.71 | 99.14 |
| C | 98.00 | 97.00 | 97.00 | 81.25 | 86.24 | 81.40 | 31.70 | 31.00 | 32.00 |
| D | 89.92 | 94.51 | 89.92 | 98.49 | 96.52 | 98.49 | 99.00 | 97.31 | 99.01 |
Network Depth: Inspired by the experimental results presented in Table 4, we carry out ablation experiments on the MNIST-USPS, Fashion, and Multi-COIL-10 datasets as three examples to study the impact of nx (the number of blocks in GMLM) on the model performance. According to Table 5, we can find that nx=1 results in the best clustering results. However, increasing the network depth leads to a slight decrease in clustering performance on the three used datasets. It needs to be emphasized that the sizes of the transformation matrices are set to (49×25, 25×20) and (49×25, 25×20, 20×15) under the two settings of nx=2 and nx=3, respectively. Therefore, the loss of pivotal structural information in the process of multi-stage deep subspace learning is considered to be the fundamental reason for the degradation of model ability. In summary, these experimental findings not only reaffirm the effectiveness of the designed GMLM for subspace learning, but also reveal the degradation issue of Grassmannian networks. In the future, we plan to generalize the Euclidean residual learning mechanism to the context of Grassmannian manifolds to mitigate the above problem.
| Settings | MNIST-USPS ACC | MNIST-USPS NMI | MNIST-USPS Purity | Fashion ACC | Fashion NMI | Fashion Purity | Multi-COIL-10 ACC | Multi-COIL-10 NMI | Multi-COIL-10 Purity |
|---|---|---|---|---|---|---|---|---|---|
| nx=1 | 99.82 | 99.52 | 99.82 | 99.58 | 98.83 | 99.58 | 100.00 | 100.00 | 100.00 |
| nx=2 | 99.68 | 99.08 | 99.68 | 99.39 | 98.37 | 99.39 | 98.14 | 96.71 | 98.14 |
| nx=3 | 99.58 | 98.73 | 99.58 | 99.53 | 98.71 | 99.53 | 99.14 | 98.13 | 99.14 |
To measure the impact of the trade-off parameters (i.e., α, β, and γ) in Eq (3.13) on the clustering performance of the proposed method, we conduct experiments on the ORL and MNIST-USPS datasets as two examples. The purpose of introducing these three trade-off parameters to Eq (3.13) is to balance the magnitude of the Grassmannian contrastive learning term Lg, the Euclidean contrastive learning term Lc, and the regularization term La, so as to learn an effective network embedding for clustering. From Figure 2, we have some interesting observations. First, it is not recommended to endow β with a relatively small value. The basic reason is that Lc is not only used to learn discriminative features, but also for the final label prediction. Moreover, we can observe that when the value of β is fixed, the clustering accuracy of the proposed method exhibits a trend of first increasing and then decreasing with the change of γ. In addition, it can be found that the value of γ cannot be greater than β. Otherwise, the performance of our method will be significantly affected. The fundamental reason is that a larger γ or a smaller β will cause the gradient information related to Lc to be dramatically weakened in the network optimization process, which is not conducive to learning an effective clustering hypersphere. Besides, we can also observe that the model tends to be less sensitive to β and γ when they vary within the ranges of 0.05∼0.005 and 0.01∼0.001, respectively. These experimental comparisons support our assertion that the regularization term helps to fine-tune the clustering performance.
In this part, we further investigate the impact of α on the accuracy of the proposed method. Taking the MNIST-USPS dataset as an example, we vary the value of α while fixing β and γ at their optimal values shown in Figure 2.
It can be seen from Figure 3 that the accuracy of the proposed method generally shows a trend of first increasing and then decreasing, and reaches a peak when α is set to 0.02. Besides, α is not suggested to be endowed with a relatively large value. Otherwise, the gradient information associated with Lc will be weakened in the network optimization process, which is not conducive to the learning of a discriminative network embedding. These experimental results not only further demonstrate the criticality of Lc, but also underscore the effectiveness of α in fine-tuning the discriminability of the learned representations.
All in all, these experimental observations confirm the complementarity of these three terms in guiding the proposed model to learn more informative features for better clustering. In this paper, our guideline for choosing their values is to ensure that the orders of magnitude of the Grassmannian contrastive learning term Lg and the regularization term La do not exceed that of the Euclidean contrastive learning term Lc. With this criterion, the model can better integrate the gradient information of La and Lg regarding the data distribution with Lc to learn a more reasonable hypersphere for different views. On the MNIST-USPS dataset, the eligible values of α, β, and γ are set to 0.02, 0.05, and 0.005, respectively. We use a similar way to determine that their appropriate values are respectively configured as (0.005, 0.005, 0.005), (0.01, 0.01, 0.01), (0.01, 0.01, 0.005), and (0.1, 0.1, 0.1) on the Fashion, Multi-COIL-10, ORL, and Scene-15 datasets. For a new dataset, the aforementioned principle can help the readers quickly determine the initial value ranges of α, β, and γ.
To intuitively test the effectiveness of the proposed method, we select the Fashion and Multi-COIL-10 datasets as two examples to perform 2-D visualization experiments. The experimental results generated by the t-SNE technique [66] are presented in Figure 4, where different colors denote the labels of different clusters. It can be seen that compared to the case where GMLM is not included, the final clustering results, measured by the compactness between similar samples and the diversity between dissimilar samples, are improved by using GMLM on both of the two benchmarking datasets. This further demonstrates that the designed GMLM can help improve the discriminability of cluster assignments.
Besides, we further investigate the impact of GMLM on the computational burden of the proposed method. The experimental results are summarized in Table 6, where "w/o" means "without containing". Note that the parameters of the new architecture are kept the same as in the original model.
| Datasets | MNIST-USPS | Fashion | Multi-COIL-10 | ORL | Scene-15 |
|---|---|---|---|---|---|
| DGMVCL-"w/o" GMLM | 11.9 | 34.4 | 2.6 | 1.5 | 14.5 |
| DGMVCL | 12.3 | 36.1 | 2.7 | 1.6 | 16.9 |
From Table 6, we can see that the integration of GMLM slightly increases the training time of the proposed model across all the five used datasets. According to Section 3, it can be known that the main computational burden of GMLM comes from the QR decomposition used in the ReOrth layer. Nevertheless, as shown in Figure 4, GMLM can enhance the clustering performance of our method, demonstrating its effectiveness.
Recent advancements in robust learning, such as projected cross-view learning for unbalanced incomplete multiview clustering [67], have highlighted the importance of handling noisy data and outlier instances. In this section, we evaluate the robustness of the proposed method when subjected to various levels of noise and occlusion, choosing CVCL [55] as a representative competitor. Figure 5 illustrates the accuracy (%) of different methods as a function of the variance of Gaussian noise added to the Scene-15 dataset. It can be seen that as the variance of the Gaussian noise increases, both the proposed method and CVCL experience a decline in accuracy. However, our method consistently outperforms CVCL across all the noise levels. This shows that the suggested model is robust to the Gaussian noise, maintaining good clustering ability even under severe noise levels.
We further investigate the robustness of the proposed method when handling the data with random pixel masks, selecting the Fashion dataset as an example. Some sample images of the Fashion dataset with 15%, 30%, 45%, and 60% of the pixels masked are shown in Figure 6. According to Table 7, we can see that the performance of the proposed method is superior to that of CVCL under all the corruption levels. These experimental findings again demonstrate that the suggested modifications over the baseline model are effective to learn more robust and discriminative view-invariant representations. However, as the occlusion rate and the level of Gaussian noise increase, the performance of the proposed DGMVCL shows a noticeable decline. As discussed in Section 3.1.2, each element in the orthogonal basis matrix on the Grassmannian manifold represents the correlation between the original feature dimensions. Although the ability to encode long-range dependencies between different local feature regions enables our model to capture more useful information, it is also susceptible to the influence of local prominent noise. In such a case, the contrastive loss may fail to effectively distinguish between positive and negative samples. This is mainly attributed to the fact that the contrastive learning term treats the decision space as an explicit function of the data distribution. In the future, incorporating techniques like those studied in [67], such as multiview projections or advanced data augmentation, could improve the model's ability to handle these challenges.
| Methods | 15% | 30% | 45% | 60% |
|---|---|---|---|---|
| CVCL | 98.49 | 97.23 | 95.06 | 89.48 |
| DGMVCL | 99.16 | 98.29 | 97.28 | 92.70 |
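The corruptions used in this robustness study can be generated as in the sketch below; the value range, noise parametrization, and masking scheme are assumptions for illustration, since the paper only specifies the variance sweep and the mask ratios.

```python
import numpy as np

def add_gaussian_noise(X, variance, rng=None):
    """Corrupt images X (values assumed in [0, 1]) with zero-mean Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = X + rng.normal(0.0, np.sqrt(variance), size=X.shape)
    return np.clip(noisy, 0.0, 1.0)

def mask_random_pixels(X, ratio, rng=None):
    """Zero out a random fraction `ratio` of the pixels of each image in X."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(X.shape) >= ratio        # keep a pixel with probability 1 - ratio
    return X * mask
```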
Innovation Analysis: The novelty of our proposed DGMVCL lies not in the mere combination of several existing components but in the thoughtful way in which these components are integrated and optimized, leading to a lightweight and discriminative geometric learning framework for multiview subspace clustering. Specifically, the suggested DGMVCL introduces several pivotal innovations: ⅰ) The Grassmannian neural network designed for geometric subspace learning should not be regarded as a simple application to a new vision task, but rather as an intrinsic method for encoding the underlying submanifold structure of channel features. This is crucial for enabling the model to learn more effective subspace features; ⅱ) The proposed method introduces contrastive learning on both the Grassmannian manifold and Euclidean space. Compared with the baseline model (CVCL [55]), which utilizes only the Euclidean-based contrastive loss for network training, the additionally designed Grassmannian contrastive learning module enables our DGMVCL to characterize and learn the geometrical distribution of the subspace data points more faithfully. Therefore, such a dual-space contrastive learning mechanism improves the representational capacity of our model and is capable of extracting view-invariant representations; ⅲ) Extensive evaluations across multiple benchmarking datasets not only demonstrate the superiority of our proposed DGMVCL over the state-of-the-art methods, but also underscore the significance of each individual component and their complementarity.
The Effectiveness of Grassmannian Representation: The Grassmannian manifold is a compact representation of the covariance matrix and encodes rich subspace information, which has shown great success in many applications [11,27,28]. Inspired by this, the MMM is designed to capture and parameterize the q-dimensional real vector subspace formed by the features extracted from the FEM. However, the Grassmannian manifold is not a Euclidean space but a Riemannian manifold. We therefore adopt a Grassmannian network to respect the latent Riemannian geometry. Specifically, each network layer can preserve the Riemannian property of the input feature matrices by normalizing each one into an orthonormal basis matrix. Besides, each manifold-valued weight parameter of the FRMap layer is optimized on a compact Stiefel manifold, not only maintaining its orthogonality but also ensuring better network training. The projector perspective studied in [32] shows that the Grassmannian manifold is an embedded submanifold of the Euclidean space of symmetric matrices, allowing the use of an extrinsic distance, i.e., the projection metric (PM), for measuring the similarity between subspaces on the Grassmannian manifold. In addition, the PM approximates the true geodesic distance up to a scale factor of √2 and is more efficient than the geodesic distance. By leveraging the PM-based contrastive loss, the consistency between cluster assignments across views is intensified from the Grassmannian perspective while their underlying manifold structure is preserved.
To intuitively demonstrate the effectiveness of MMM, we conduct a new ablation study to evaluate the clustering ability of the proposed method that does not contain it. Note that the learning rate, batch size, and three trade-off parameters of the new model remain the same as the original, while the size of the input feature matrix of GMLM becomes 49×64. The experimental results on the five used datasets are summarized in Table 8, where "w/o" means "without containing". From Table 8, we can see that removing the designed MMM from DGMVCL leads to a significant decrease in its accuracy across all the five used datasets. This not only confirms the significance of MMM in capturing and parameterizing the underlying submanifold structure, but also reveals the effectiveness of our proposed model in preserving and leveraging the Riemannian geometry of the data for improved clustering performance.
| Datasets | MNIST-USPS | Fashion | Multi-COIL-10 | ORL | Scene-15 |
|---|---|---|---|---|---|
| DGMVCL-"w/o" MMM | 10.00 | 10.00 | 10.29 | 2.60 | 9.14 |
| DGMVCL | 99.82 | 99.52 | 100.00 | 92.25 | 61.29 |
Selection of Network Parameters: The selection of network parameters is based on experiments and analysis to ensure optimal outcomes. Specifically, the trade-off parameters α, β, and γ play a critical role in balancing the contributions of different loss functions in the overall objective function. The magnitude of the pivotal loss functions should be slightly higher. Based on this guideline, we can roughly determine their initial value, signified as the anchor point. Then, a candidate set can be formed around the selected anchor point. After that, we can conduct experiments to determine their optimal values. Additionally, the learning rate and batch size are crucial for the convergence and effectiveness of the proposed model. A too-high learning rate might cause the model to diverge, while a too-low one would slow down the training process [68]. Since CVCL [55] is our base model, we treat its learning rate as the initial value and adjust around it to find the suitable one. The batch size is configured to balance the memory usage and training efficiency. For the Scene-15 dataset, a batch size of 69 was chosen because this dataset contains 4485 samples, and 69 divides this number evenly, ensuring efficient utilization of data in each batch. Additionally, as shown in Figure 7, this batch size can yield good performance. However, in practice, the batch size is also related to the computing device. On the MNIST-USPS, Multi-COIL-10, Fashion, ORL, and Scene-15 datasets, the learning rate and batch size are specifically configured as (0.0002, 50), (0.0001, 50), (0.0005,100), (0.0001, 50), and (0.001, 69), respectively. Furthermore, the experimental results presented in Figure 8 suggest that it is appropriate to configure the size of the transformation matrix in the FRMap layer as 49×25. When dk is assigned to a small value, some useful geometric information will be lost during feature transformation mapping. In contrast, a relatively larger dk results in more redundant information being incorporated into the generated subspace features. Both of these two cases have a negative impact on the model performance.
In short, the choice of network parameters is supported by both theoretical considerations and empirical evidence, and it contributes to the overall performance of the proposed model.
Other Datasets: In this part, the Caltech-101 dataset [39] has been applied to further evaluate the effectiveness of the proposed model. This dataset is a challenging benchmark for object detection, which consists of 101 different object categories as well as one background category, totaling approximately 9146 images.
The experimental results achieved by different comparative models on the Caltech-101 dataset are listed in Table 9. Note that the learning rate, batch size, and the size of the weight matrix in the FRMap layer of the proposed DGMVCL are configured as 0.005, 50, and 49 × 25, respectively. According to Table 9, we can see that the clustering accuracy of our proposed DGMVCL is 8.87%, 1.62%, and 3.55% higher than that of DSIMVC, DCP, and CVCL, respectively. Additionally, under the other two validation metrics, i.e., NMI and purity, our method is still the best performer. This demonstrates that the suggested Grassmannian manifold-valued deep contrastive learning mechanism can learn compact and discriminative geometric features for MVC, even in complicated data scenarios.
| Methods | ACC | NMI | Purity |
|---|---|---|---|
| DSIMVC [63] | 20.25 | 31.43 | 23.68 |
| DCP [64] | 27.41 | 39.58 | 37.01 |
| CVCL [55] | 25.48 | 37.67 | 36.63 |
| DGMVCL | 29.03 | 45.81 | 40.02 |
In this paper, a novel framework is suggested to learn view-invariant representations for multiview clustering (MVC), called DGMVCL. Considering the submanifold structure of channel features, a Grassmannian neural network is constructed for the sake of characterizing and learning the subspace data more faithfully and effectively. Besides, the contrastive learning mechanism built upon the Grassmannian manifold and Euclidean space enables more discriminative cluster assignments. Extensive experiments and ablation studies conducted on five MVC datasets not only demonstrate the superiority of our proposed method over the state-of-the-art methods, but also confirm the usefulness of each designed component.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported in part by the National Natural Science Foundation of China (62306127, 62020106012, 62332008, U1836218), the Natural Science Foundation of Jiangsu Province (BK20231040), the Fundamental Research Funds for the Central Universities (JUSRP124015), the Key Project of Wuxi Municipal Health Commission (Z202318), and the National Key R & D Program of China (2023YFF1105102, 2023YFF1105105).
The authors declare no conflict of interest.
1. | Vassiliki Aroniadou-Anderjaska, Volodymyr I. Pidoplichko, Taiza H. Figueiredo, Maria F.M. Braga, Oscillatory Synchronous Inhibition in the Basolateral Amygdala and its Primary Dependence on NR2A-containing NMDA Receptors, 2018, 373, 03064522, 145, 10.1016/j.neuroscience.2018.01.021 | |
2. | Hiroyuki Kanayama, Takashi Tominaga, Yoko Tominaga, Nobuo Kato, Hiroshi Yoshimura, Action of GABAB receptor on local network oscillation in somatosensory cortex of oral part: focusing on NMDA receptor, 2024, 74, 1880-6562, 10.1186/s12576-024-00911-w |
Datasets | Methods | ACC | NMI | Purity |
BSVC [61] | 67.98 | 74.43 | 72.34 | |
SCAgg [62] | 89.00 | 77.12 | 89.18 | |
ASR [41] | 97.90 | 94.72 | 97.90 | |
MNIST-USPS | DSIMVC [63] | 99.34 | 98.13 | 99.34 |
DCP [64] | 99.02 | 97.29 | 99.02 | |
MFL [65] | 99.66 | 99.01 | 99.66 | |
CVCL [55] | 99.58 | 98.79 | 99.58 | |
DGMVCL | 99.82 | 99.52 | 99.82 | |
BSVC [61] | 60.32 | 64.91 | 63.84 | |
SCAgg [62] | 98.00 | 94.80 | 97.56 | |
ASR [41] | 96.52 | 93.04 | 96.52 | |
Fashion | DSIMVC [63] | 88.21 | 83.99 | 88.21 |
DCP [64] | 89.37 | 88.61 | 89.37 | |
MFL [65] | 99.20 | 98.00 | 99.20 | |
CVCL [55] | 99.31 | 98.21 | 99.31 | |
DGMVCL | 99.52 | 98.73 | 99.52 | |
BSVC [61] | 73.32 | 76.91 | 74.11 | |
SCAgg [62] | 68.34 | 70.18 | 69.26 | |
ASR [41] | 84.23 | 65.47 | 84.23 | |
Multi-COIL-10 | DSIMVC [63] | 99.38 | 98.85 | 99.38 |
DCP [64] | 70.14 | 81.90 | 70.14 |
MFL [65] | 99.20 | 98.00 | 99.20 | |
CVCL [55] | 99.43 | 99.04 | 99.43 | |
DGMVCL | 100.00 | 100.00 | 100.00 |
Datasets | Methods | ACC | NMI | Purity |
BSVC [61] | 61.31 | 64.91 | 61.31 |
SCAgg [62] | 61.65 | 77.41 | 66.22 | |
ASR [41] | 79.49 | 78.04 | 81.49 | |
ORL | DSIMVC [63] | 25.37 | 52.91 | 25.37 |
DCP [64] | 27.70 | 49.93 | 27.70 | |
MFL [65] | 80.03 | 89.34 | 80.03 | |
CVCL [55] | 85.50 | 93.17 | 86.00 | |
DGMVCL | 92.25 | 98.34 | 92.25 | |
BSVC [61] | 38.05 | 38.85 | 42.08 |
SCAgg [62] | 38.13 | 39.31 | 44.76 | |
ASR [41] | 42.70 | 40.70 | 45.60 | |
Scene-15 | DSIMVC [63] | 28.27 | 29.04 | 29.79 |
DCP [64] | 42.32 | 40.38 | 43.85 | |
MFL [65] | 42.52 | 40.34 | 44.53 | |
CVCL [55] | 44.59 | 42.17 | 47.36 | |
DGMVCL | 61.29 | 76.39 | 65.04 |
Loss | MNIST-USPS | ORL | Fashion | |||||||||
Types | Lg | Lc | La | ACC | NMI | Purity | ACC | NMI | Purity | ACC | NMI | Purity |
A | ✔ | ✔ | 19.96 | 42.62 | 19.96 | 69.00 | 92.67 | 70.00 | 84.78 | 93.13 | 89.38 | |
B | ✔ | ✔ | 99.76 | 99.37 | 99.76 | 85.00 | 86.58 | 85.00 | 99.48 | 98.60 | 99.48 | |
C | ✔ | ✔ | 10.31 | 12.54 | 10.31 | 12.50 | 26.69 | 12.50 | 25.75 | 38.72 | 25.75 | |
D | ✔ | ✔ | ✔ | 99.82 | 99.52 | 99.82 | 92.25 | 98.34 | 92.25 | 99.52 | 98.73 | 99.52 |
E | ✔ | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | ||
F | ✔ | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
MNIST-USPS | Fashion | Multi-COIL-10 | |||||||
Groups | ACC | NMI | Purity | ACC | NMI | Purity | ACC | NMI | Purity |
A | 99.76 | 99.37 | 99.76 | 99.48 | 98.60 | 99.48 | 100.00 | 100.00 | 100.00 |
B | 99.70 | 99.11 | 99.70 | 99.02 | 97.60 | 99.02 | 99.14 | 98.71 | 99.14 |
C | 98.00 | 97.00 | 97.00 | 81.25 | 86.24 | 81.40 | 31.70 | 31.00 | 32.00 |
D | 89.92 | 94.51 | 89.92 | 98.49 | 96.52 | 98.49 | 99.00 | 97.31 | 99.01 |
MNIST-USPS | Fashion | Multi-COIL-10 | |||||||
Metrics | ACC | NMI | Purity | ACC | NMI | Purity | ACC | NMI | Purity |
nx=1 | 99.82 | 99.52 | 99.82 | 99.58 | 98.83 | 99.58 | 100.00 | 100.00 | 100.00 |
nx=2 | 99.68 | 99.08 | 99.68 | 99.39 | 98.37 | 99.39 | 98.14 | 96.71 | 98.14 |
nx=3 | 99.58 | 98.73 | 99.58 | 99.53 | 98.71 | 99.53 | 99.14 | 98.13 | 99.14 |
Datasets | MNIST-USPS | Fashion | Multi-COIL-10 | ORL | Scene-15 |
DGMVCL-"w/o" GMLM | 11.9 | 34.4 | 2.6 | 1.5 | 14.5 |
DGMVCL | 12.3 | 36.1 | 2.7 | 1.6 | 16.9 |
Methods | 15% | 30% | 45% | 60% |
CVCL | 98.49 | 97.23 | 95.06 | 89.48 |
DGMVCL | 99.16 | 98.29 | 97.28 | 92.70 |
Datasets | MNIST-USPS | Fashion | Multi-COIL-10 | ORL | Scene-15 |
DGMVCL-"w/o" MMM | 10.00 | 10.00 | 10.29 | 2.60 | 9.14 |
DGMVCL | 99.82 | 99.52 | 100.00 | 92.25 | 61.29 |