



    Clustering stands as a foundational research area in the realms of computer vision and machine learning. Over decades of exploration, traditional clustering methods have reached a certain level of maturity, yielding promising results. However, when confronted with complex signals (e.g., images and videos), clustering methods that rely on the Euclidean distance measure often yield unsatisfactory performance. Although the data observed in practical applications is complicated, the clusters to which the samples belong are generally located near corresponding low-dimensional subspaces. Therefore, some researchers have turned to subspace clustering to characterize the geometric structure of the original data manifold. These algorithms model the data as a union of low-dimensional subspaces and assign each data point to the subspace that fits it best, thereby producing the final clustering prediction. At present, subspace clustering methods catering to implicit subspace structures include iterative methods [1,2], algebraic methods [3,4], statistical methods [5,6], and self-representation methods [7,8,9]. Among them, self-representation has received widespread attention due to its effective utilization of the latent subspace structure and attributes of the data. Notable representatives of this approach include sparse subspace clustering (SSC) [7] and low-rank representation (LRR) [9]. These methods typically consist of two steps: (1) exploring the data structure with a Euclidean distance measure and obtaining a coefficient matrix; (2) leveraging the learned coefficient matrix to conduct spectral clustering [10], which assigns the data to $ k $ clusters.

    Nevertheless, previous studies reveal that image sets or video data often inhabit nonlinear manifold structures [11,12]. Therefore, the Euclidean distance cannot accurately measure the similarity between two data points residing in a non-Euclidean space. Wei et al. [13] proposed a discrete metric learning method for fast image set classification, which significantly improves classification efficiency and accuracy by learning metrics and hashing techniques on the Riemannian manifold. Furthermore, video data commonly contain a varying number of frames per clip, and vectorizing the frames leads to excessively high data dimensions. These challenges pose difficulties for classical clustering algorithms such as SSC and LRR.

    Deep learning methods emerge as capable learners of discriminative feature representations [14]. A number of studies propose combining metric learning and deep learning for classification [14,15]. Recently, deep learning techniques have been applied to data clustering and have shown remarkable performance [16,17,18]. For example, Xie et al. [19] proposed a clustering method based on deep embedding, where a deep neural network learns feature representations and clustering assignments concurrently. Wang et al. [20] proposed a multi-level representation learning method for incomplete multiview clustering, which incorporates contrastive and adversarial regularization to improve clustering performance. Yang et al. [21] proposed a novel graph contrastive learning framework for clustering multi-layer networks, effectively combining nonnegative matrix factorization and contrastive learning to enhance the discriminative features of vertices across different network layers. Guo et al. [22] proposed an adaptive multiview subspace learning method using distributed optimization, which enhances clustering performance by capturing high-order correlations among views. Li et al. [23] introduced a deep adversarial MVC method that explores the intrinsic structure embedded in multiview data. Ma et al. [24] suggested a deep generative clustering model based on variational autoencoders, which yields a more appropriate embedding subspace for clustering complex data distributions. These methods address challenges faced by traditional data modeling (e.g., preprocessing high-dimensional data and characterizing nonlinear relationships) by nonlinearly mapping data points into a latent space through a series of encoder layers. However, existing deep clustering algorithms have an inherent limitation, i.e., they utilize Euclidean computations to analyze the semantic features generated by convolutional neural networks, which leads to unfaithful data representation. The fundamental reason is that these features inherently have a submanifold structure [25,26]. While techniques such as fine-tuning, optimization, and feature activation in deep neural networks can affect the experimental results, manifold learning provides a theoretical basis for the analysis of non-Euclidean data structures.

    Recognizing the nonlinear structure of high-dimensional data, some studies [11,27,28,29,30,31] extended the traditional clustering paradigm to the context of Riemannian manifolds. Typically, a manifold represents a smooth surface embedded in Euclidean space [32]. In image analysis, covariance matrices often serve as region descriptors, with these matrices regarded as points on the symmetric positive definite (SPD) manifold. Hu et al. [12] proposed a multi-geometry SSC method, aiming to mine complementary geometric representations. Wei et al. [33] proposed a method called sparse representation classifier guided Grassmann reconstruction metric learning (GRML), which enhances image set classification by learning a robust metric on the Grassmann manifold, making it effective in handling noisy data. Linear subspaces have attracted widespread attention in many scientific fields, as their underlying space is a Grassmannian manifold, which provides a solid theoretical foundation for the characterization of signal data (e.g., video clips, image sets, and point clouds). In addition, the Grassmannian manifold plays a crucial role in handling non-Euclidean data structures, particularly through its compact manifold structure. Wang et al. [34] extended the ideology of LRR to the context of the Grassmannian manifold, facilitating more accurate representation and analysis in complicated data scenarios. Piao et al. [35] proposed a double-kernel norm low-rank representation based on the Grassmannian manifold for clustering. However, issues such as integrating manifold representation learning with clustering and preserving the subspace structure during data transformation remain challenging for such a framework.

    It can be concluded that deep learning-based clustering struggles to learn effective representations from data with a submanifold structure, and that the underlying space of subspace features is a Grassmannian manifold. This motivates us to propose the DGMVCL framework to achieve a more reasonable view-invariant representation on a non-Euclidean space. The proposed framework comprises four main components: the feature extraction module (FEM), the manifold modeling module (MMM), the Grassmannian manifold learning module (GMLM), and the contrastive learning module (CLM). Specifically, the FEM is responsible for mapping the original data into a feature subspace, the MMM projects subspace features onto a Grassmannian manifold, the GMLM facilitates deep subspace learning on the Grassmannian manifold, and the CLM focuses on discriminative cluster assignments through contrastive learning. Additionally, the positive and negative samples relied upon in the contrastive learning process are constructed on the basis of the Riemannian distance on the Grassmannian manifold, which better reflects the geometric distribution of the data. Extensive experiments across five benchmarking datasets validate the effectiveness of the proposed DGMVCL.

    Our main contributions are summarized as follows:

    ● A lightweight geometric learning model built upon the Grassmannian manifold is proposed for multiview subspace clustering in an end-to-end manner.

    ● A Grassmannian-level contrastive learning strategy is suggested to help improve the accuracy of cluster assignments among multiple views.

    ● Extensive experiments conducted on five multiview datasets demonstrate the effectiveness of the proposed DGMVCL.

    In this section, we will give a brief introduction to some related works, including multiview subspace clustering, contrastive learning, and the Riemannian geometry of the Grassmannian manifold.

    Although there exist a number of subspace clustering methods, such as LRR [9] and SSC [7], most of them adopt self-representation to obtain subspace features. Low-rank tensor-constrained multiview subspace clustering (LMSC) [36] is able to generate a common subspace for different views instead of individual representations. Flexible multiview representation learning for subspace clustering (FMR) [37] avoids using partial information for data reconstruction. Zhou et al. [38] proposed an end-to-end adversarial attention network to align latent feature distributions and assess the importance of different modalities. Despite the improved clustering performance, these methods do not consider the semantic label consistency across multiple views, potentially resulting in challenges when learning consistent clustering assignments.

    To address these challenges, Kang et al. [39] proposed a structured graph learning method that constructs a bipartite graph to manage the relationships between samples and anchor points, effectively reducing computational complexity in large-scale data and enabling support for out-of-sample extensions. This approach is particularly advantageous in scalable subspace clustering scenarios, extending from single-view to multiview settings.

    Furthermore, to enhance clustering performance by leveraging high-order structural information, Pan and Kang [40] introduced a high-order multiview clustering (HMvC) method. This approach utilizes graph filtering and an adaptive graph fusion mechanism to explore long-distance interactions between different views. By capturing high-order neighborhood relationships, HMvC effectively improves clustering results on both graph and non-graph data, addressing some of the limitations in prior methods that overlooked the intrinsic high-order information from data attributes.

    Contrastive learning has made significant progress in self-supervised learning [41,42,43,44]. The methods based on contrastive learning essentially rely on a large number of pairwise feature comparisons. Specifically, they aim to maximize the similarity between positive samples in the latent feature space while minimizing the similarity between negative samples simultaneously. In the field of clustering, positive samples are constructed from the invariant representations of all multiview instances of the same sample, while negative samples are obtained by pairing representations from different samples across various views. Chen et al. [42] proposed a contrastive learning framework that maximizes consistency between differently augmented views of the same example in the latent feature space. Wang et al. [44] investigated two key properties of the contrastive loss, namely feature alignment from positive samples and uniformity of induced feature distribution on the hypersphere, which can be used to measure the quality of generated samples. Although these methods can learn robust representations based on data augmentation, learning invariant representations across multiple views remains challenging.

    The Grassmannian manifold $ \mathcal{G}(q, d) $ comprises a collection of $ q $-dimensional linear subspaces within $ \mathbb{R}^d $. Each of them can be naturally represented by an orthonormal basis denoted as $ \boldsymbol{\mathrm{Y}} $, with the size of $ d \times q $ ($ \boldsymbol{\mathrm{Y}}^{\mathrm{T}}\boldsymbol{\mathrm{Y}} = \boldsymbol{\mathrm{I}}_q $, where $ \boldsymbol{\mathrm{I}}_q $ is a $ q \times q $ identity matrix). Consequently, the matrix representation of each Grassmannian element is comprised of an equivalence class of this orthonormal basis:

    $ [\boldsymbol{\mathrm{Y}}] = \left\{\tilde{\boldsymbol{\mathrm{Y}}} \mid \tilde{\boldsymbol{\mathrm{Y}}} = \boldsymbol{\mathrm{Y}}\boldsymbol{\mathrm{O}}, \ \boldsymbol{\mathrm{O}} \in \mathrm{O}(q)\right\}, \qquad (2.1) $

    where $ \boldsymbol{\mathrm{Y}} $ represents a $ d \times q $ column-wise orthonormal matrix. The definition in Eq (2.1) is commonly referred to as the orthonormal basis (ONB) viewpoint [32].

    As demonstrated in [32], each point of the Grassmannian manifold can be alternatively represented as an idempotent symmetric matrix of rank $ q $, given by $ \Phi(\boldsymbol{\mathrm{Y}}) = \boldsymbol{\mathrm{Y}}\boldsymbol{\mathrm{Y}}^\mathrm{T} $. This representation, known as the projector perspective (PP), signifies that the Grassmannian manifold is a submanifold of the Euclidean space of symmetric matrices. Therefore, an extrinsic distance can be induced by the ambient Euclidean space, termed the projection metric (PM) [49]. The PM is defined as:

    $ d_{\mathrm{PM}}(\boldsymbol{\mathrm{Y}}_1, \boldsymbol{\mathrm{Y}}_2) = 2^{-1/2}\lVert \boldsymbol{\mathrm{Y}}_1\boldsymbol{\mathrm{Y}}_1^{\mathrm{T}} - \boldsymbol{\mathrm{Y}}_2\boldsymbol{\mathrm{Y}}_2^{\mathrm{T}} \rVert_{\mathrm{F}}, \qquad (2.2) $

    where $ \lVert \cdot \rVert_{\mathrm{F}} $ denotes the Frobenius norm. As evidenced in [45], the distance computed by the PM deviates from the true geodesic distance on the Grassmannian manifold by a scale factor of $ \sqrt{2} $, making it a widely used Grassmannian distance.
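
    As a small illustration of Eq (2.2), the following NumPy sketch maps two orthonormal bases to their projectors and computes the scaled Frobenius norm of their difference; the function name and the toy data are our own and not part of the original work.

```python
import numpy as np

def projection_metric(Y1: np.ndarray, Y2: np.ndarray) -> float:
    """Projection metric between two points on G(q, d).

    Y1 and Y2 are d x q matrices with orthonormal columns; each subspace is
    represented by its projector Y Y^T, and the distance is the scaled
    Frobenius norm of the projector difference (Eq (2.2))."""
    P1 = Y1 @ Y1.T
    P2 = Y2 @ Y2.T
    return np.linalg.norm(P1 - P2, ord="fro") / np.sqrt(2.0)

# Toy usage: two random 3-dimensional subspaces of R^8.
rng = np.random.default_rng(0)
Y1, _ = np.linalg.qr(rng.standard_normal((8, 3)))
Y2, _ = np.linalg.qr(rng.standard_normal((8, 3)))
print(projection_metric(Y1, Y2))
```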

    We are given a group of multiview data $ \mathcal{X} = \{\mathbf{X}^v \in \mathbb{R}^{d_v \times N_v}\}_{v = 1}^{V} $, where $ V $ denotes the number of views, $ N_v $ signifies the number of instances contained in the $ v $-th view, and $ d_v $ represents the dimensionality of each instance in $ \boldsymbol{\mathrm{X}}^v $. The proposed DGMVCL is an end-to-end neural network built upon the Grassmannian manifold, aiming to generate clustering semantic labels from the original instances across multiple views. As shown in Figure 1, the proposed framework mainly consists of four modules, which are the FEM, MMM, GMLM, and CLM, respectively. The following is a detailed introduction to them.

    Figure 1.  An overview of the proposed DGMVCL. It can be seen that this framework is made up of four different modules, i.e., FEM, MMM, GMLM, and CLM, respectively. Besides, $ n_x $ represents the number of Grassmannian operation blocks contained in GMLM.

    To obtain an effective semantic space for the subsequent computations, we exploit a CNN to transform the input images into subspace representations with lower redundancy. The proposed FEM contains two blocks, each comprising a convolutional layer, a ReLU activation layer, and a max-pooling layer. The two blocks differ in the number of convolutional kernels involved, i.e., 32 for the first convolutional layer and 64 for the second one. To characterize the geometric structure of the generated features, the following manifold modeling module is designed.
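
    Before detailing that module, the FEM described above might be sketched in PyTorch as follows; the kernel size, stride, padding, and pooling window are assumptions, since the text only fixes the layer types and the numbers of kernels (32 and 64).

```python
import torch
import torch.nn as nn

class FEM(nn.Module):
    """Feature extraction module: two conv blocks (conv -> ReLU -> max-pool)."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),  # 32 kernels
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),           # 64 kernels
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_channels, H, W) -> feature maps of shape (batch, 64, H/4, W/4),
        # later flattened channel-wise into E_i of size c x l for the MMM.
        return self.blocks(x)
```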

    Let $ \boldsymbol{\mathrm{E}}_i^v\in\mathbb{R}^{c\times l} $ be the $ i $-th ($ i\in \{1, 2, ..., N_v\} $) feature matrix generated by FEM with respect to the $ i $-th input instance of the $ v $-th view. Here, $ c $ represents the number of channels while $ l $ indicates the length of a vectorized feature map. Since each point of $ \mathcal{G}(q, d) $ represents a $ q $-dimensional linear subspace of the $ d $-dimensional vector space $ \mathbb{R}^d $ (see Section 2.3), the Grassmannian manifold thus becomes a reasonable and efficient tool for parametrizing the $ q $-dimensional real vector subspace embedded in $ \boldsymbol{\mathrm{E}}_i^v $ [49,51].

    Cov Layer: To capture complementary statistical information embodied in different channel features, a similarity matrix is computed for each $ \boldsymbol{\mathrm{E}}_i^v $:

    $ \tilde{\boldsymbol{\mathrm{E}}}_i^v = \boldsymbol{\mathrm{E}}_i^v(\boldsymbol{\mathrm{E}}_i^v)^{\mathrm{T}}. \qquad (3.1) $

    Orth Layer: Following the Cov layer, the SVD operation is applied to obtain a $ q $-dimensional linear subspace spanned by an orthonormal matrix $ \boldsymbol{\mathrm{Y}}_i^v\in\mathbb{R}^{c\times q} $, that is, $ {\boldsymbol{\tilde{\mathrm{E}}}}_i^v\simeq \boldsymbol{\mathrm{Y}}_i^v\boldsymbol{\mathrm{\Sigma}}_i^v(\boldsymbol{\mathrm{Y}}{_i^v})^{\mathrm{T}} $, wherein $ \boldsymbol{\mathrm{Y}}_i^v $ and $ \boldsymbol{\mathrm{\Sigma}}_{i}^v $ consist of the $ q $ leading eigenvectors and the corresponding eigenvalues, respectively. The resulting Grassmannian representation with respect to the input $ \boldsymbol{\mathrm{X}}^v $ is denoted by $ \boldsymbol{\mathrm{\Upsilon}}^v = [\boldsymbol{\mathrm{Y}}_1^v, \boldsymbol{\mathrm{Y}}_2^v, ..., \boldsymbol{\mathrm{Y}}_{N_v}^v] $.
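
    A possible implementation of the two MMM layers is sketched below: the Cov layer forms $ \boldsymbol{\mathrm{E}}_i^v(\boldsymbol{\mathrm{E}}_i^v)^{\mathrm{T}} $ as in Eq (3.1), and the Orth layer keeps the $ q $ leading eigenvectors. Using the symmetric eigendecomposition rather than a full SVD is an implementation choice; both yield the same leading subspace for a symmetric matrix.

```python
import torch

def manifold_mapping(E: torch.Tensor, q: int) -> torch.Tensor:
    """Map a batch of feature matrices E (batch, c, l) to Grassmannian points.

    Cov layer:  S = E E^T                   (batch, c, c), Eq (3.1)
    Orth layer: q leading eigenvectors of S (batch, c, q)
    """
    S = E @ E.transpose(-1, -2)
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = torch.linalg.eigh(S)
    Y = eigvecs[..., -q:]        # q leading eigenvectors -> orthonormal basis
    return Y

# Toy usage: 4 samples, 64 channels, 49-dimensional vectorized feature maps, q = 10.
E = torch.randn(4, 64, 49)
Y = manifold_mapping(E, q=10)
print(Y.shape)                   # torch.Size([4, 64, 10])
```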

    An overview of the designed GMLM is shown in Figure 1. We can note that the input to this module is a series of orthonormal matrices. For simplicity, we abbreviate $ \boldsymbol{\mathrm{Y}}_i^v $ as $ \boldsymbol{\mathrm{Y}}_i $ in the following. Besides, since $ \boldsymbol{\mathrm{Y}}_i \in \mathcal{G}(q, c) $, we replace the symbol $ c $ with the symbol $ d $. To implement deep subspace learning, the following three basic layers are designed.

    FRMap Layer: This layer transforms the input orthogonal matrices into new ones through a linear mapping function $ f_{fr} $:

    $ \mathbf{Y}_{i,k} = f_{fr}^{(k)}(\mathbf{Y}_{i,k-1}; \mathbf{W}_k) = \mathbf{W}_k\mathbf{Y}_{i,k-1}, \qquad (3.2) $

    where $ \mathbf{Y}_{i, k-1} \in \mathcal{G}(q, d_{k-1}) $ is the input data of the $ k $-th layer, $ \mathbf{W}_k \in \mathbb{R}^{d_k \times d_{k-1}} $ ($ d_k < d_{k-1} $) is the to-be-learned transformation matrix (connection weights), essentially required to be a row full-rank matrix, and $ \mathbf{Y}_{i, k} \in \mathbb{R}^{d_k \times q} $ is the resulting matrix. Due to the fact that the weight matrices lie in a non-compact Stiefel manifold and the geodesic distance has no upper bound [46,47], direct optimization on such a manifold is infeasible. To address this challenge, we follow [46] to impose an orthogonality constraint on each weight matrix $ \mathbf{W}_k $. As a consequence, the weight space $ \mathbb{R}_*^{d_k \times d_{k-1}} $ becomes a compact Stiefel manifold $ St(d_k, d_{k-1}) $ [48], facilitating better optimization.

    ReOrth Layer: Inspired by [46], we design the ReOrth layer for the sake of preventing the output matrices of the FRMap layer from degeneracy. Specifically, we first impose QR decomposition on the input matrix $ \mathbf{Y}_{i, k-1} $:

    $ \mathbf{Y}_{i,k-1} = \mathbf{Q}_{i,k-1}\mathbf{R}_{i,k-1}, \qquad (3.3) $

    where $ \mathbf{Q}_{i, k-1} \in \mathbb{R}^{d_{k-1} \times q} $ is an orthogonal matrix, and $ \mathbf{R}_{i, k-1} \in \mathbb{R}^{q \times q} $ is an invertible upper triangular matrix. Therefore, $ \mathbf{Y}_{i, k-1} $ can be normalized into an orthonormal basis matrix via the following transformation function $ f^{(k)}_{ro} $:

    $ \mathbf{Y}_{i,k} = f_{ro}^{(k)}(\mathbf{Y}_{i,k-1}) = \mathbf{Y}_{i,k-1}\mathbf{R}_{i,k-1}^{-1} = \mathbf{Q}_{i,k-1}. \qquad (3.4) $

    ProjMap Layer: As studied in [47,49,50,51,52], the PM is one of the most popular Grassmannian distance measures, providing a specific inner product structure to a concrete Riemannian manifold. In such a case, the original Grassmannian manifold reduces to a flat space, in which the Euclidean computations can be generalized to the projection domain of orthogonal matrices. Formally, by applying the projection operator [47] to the orthogonal matrix $ \mathbf{Y}_{i, k-1} $ of the $ k $-th layer, the designed ProjMap layer can be formulated as:

    $ \mathbf{Y}_{i,k} = f_{pm}^{(k)}(\mathbf{Y}_{i,k-1}) = \mathbf{Y}_{i,k-1}\mathbf{Y}_{i,k-1}^{\mathrm{T}}. \qquad (3.5) $
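
    Under the same notation, one Grassmannian operation block (FRMap, ReOrth, and ProjMap, Eqs (3.2)-(3.5)) could be sketched as follows. The orthogonality constraint on $ \mathbf{W}_k $ is assumed to be enforced separately by the Stiefel-manifold optimizer described later, so the forward pass below only applies the three mappings; the initialization and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class GrassmannBlock(nn.Module):
    """One GMLM block: FRMap -> ReOrth -> ProjMap (Eqs (3.2)-(3.5))."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        # W_k in R^{d_out x d_in}; kept (approximately) row-orthonormal by the
        # Riemannian update rule of Eqs (3.20)-(3.21) during training.
        W0, _ = torch.linalg.qr(torch.randn(d_in, d_out))
        self.W = nn.Parameter(W0.T.contiguous())      # (d_out, d_in), row full-rank

    def forward(self, Y: torch.Tensor) -> torch.Tensor:
        # Y: (batch, d_in, q) orthonormal bases.
        Y = self.W @ Y                                 # FRMap, Eq (3.2)
        Q, _ = torch.linalg.qr(Y)                      # ReOrth, Eqs (3.3)-(3.4)
        return Q @ Q.transpose(-1, -2)                 # ProjMap, Eq (3.5): projector Q Q^T

# Toy usage: reduce 64-dimensional subspace bases (q = 10) to dimension 25.
Y = torch.linalg.qr(torch.randn(4, 64, 10)).Q
print(GrassmannBlock(64, 25)(Y).shape)                 # torch.Size([4, 25, 25])
```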

    Subsequently, we embed a contrastive learning module at the end of the GMLM to enhance the discriminatory power of the learned features.

    For simplicity, we assume that $ n_x = 1 $. In this case, the output data representation with respect to $ \boldsymbol{\mathrm{\Upsilon}}^v $ is denoted by $ \boldsymbol{\tilde{\mathrm{\Upsilon}}}^v = [\boldsymbol{\mathrm{Y}}_{1, 3}^v, \boldsymbol{\mathrm{Y}}_{2, 3}^v, ..., \boldsymbol{\mathrm{Y}}_{N_v, 3}^v] $. Then, the projection metric, defined in Eq (2.2), is applied to compute the geodesic distance between any two data points, enabling the execution of K-means clustering within each view. At the same time, we can obtain the membership degree $ t_{ij} $, computed as follows:

    $ t_{ij} = \dfrac{1/d_{\mathrm{PM}}^2(\boldsymbol{\mathrm{Y}}_{i,3}^v, \boldsymbol{\mathrm{U}}_j)}{\sum_{r=1}^{K} 1/d_{\mathrm{PM}}^2(\boldsymbol{\mathrm{Y}}_{i,3}^v, \boldsymbol{\mathrm{U}}_r)}, \qquad (3.6) $

    where $ K $ represents the total number of clusters, and $ \boldsymbol{\mathrm{U}}_j $ denotes the center of the $ j $-th ($ j\in \{1, 2, ..., K\} $) cluster on the Grassmannian manifold. To enhance the discriminability of the global soft assignments, we consider a unified target distribution probability $ \mathbf{P}^{v}\in \mathbb{R}^{N_v \times K} $, which is formulated as [53]:

    $ p_{ij}^{v} = \dfrac{t_{ij}^2 / \sum_{i}^{N_v} t_{ij}}{\sum_{j'}^{K}\left(t_{ij'}^2 / \sum_{i}^{N_v} t_{ij'}\right)}, \qquad (3.7) $

    where each $ p_{ij}^{v} $ represents the soft cluster assignment of the $ i $-th sample to the $ j $-th cluster. Therefore, $ \mathbf{p}_j^{v} $ represents the cluster assignments of the same semantic cluster.
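
    The soft assignments of Eqs (3.6) and (3.7) can be computed from a matrix of squared PM distances; in the sketch below, `d2` is assumed to hold $ d_{\mathrm{PM}}^2(\boldsymbol{\mathrm{Y}}_{i,3}^v, \boldsymbol{\mathrm{U}}_j) $ for every sample-center pair of one view.

```python
import torch

def soft_assignment(d2: torch.Tensor) -> torch.Tensor:
    """Membership degrees t_ij from squared PM distances d2 of shape (N, K), Eq (3.6)."""
    inv = 1.0 / d2.clamp_min(1e-12)                  # guard against zero distances
    return inv / inv.sum(dim=1, keepdim=True)

def target_distribution(t: torch.Tensor) -> torch.Tensor:
    """Sharpened target distribution P of Eq (3.7)."""
    weight = t.pow(2) / t.sum(dim=0, keepdim=True)   # t_ij^2 / sum_i t_ij
    return weight / weight.sum(dim=1, keepdim=True)

# Toy usage: 6 samples and 3 clusters.
d2 = torch.rand(6, 3) + 0.1
P = target_distribution(soft_assignment(d2))
print(P.sum(dim=1))                                   # each row sums to 1
```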

    The similarity between the two cluster assignments $ \mathbf{p}_{j}^{v_1} $ and $ \mathbf{p}_{j}^{v_2} $ for cluster $ j $ is measured by the following cosine similarity [54]:

    $ s(\mathbf{p}_j^{v_1}, \mathbf{p}_j^{v_2}) = \dfrac{\mathbf{p}_j^{v_1}\cdot\mathbf{p}_j^{v_2}}{\lVert\mathbf{p}_j^{v_1}\rVert\,\lVert\mathbf{p}_j^{v_2}\rVert}, \qquad (3.8) $

    where $ v_1 $ and $ v_2 $ represent two different views. Since the multiview instances of the same sample share the same label, their cluster assignment probabilities across different views should be similar. Moreover, instances of different samples are independent of each other across views. Therefore, for $ V $ views with $ K $ clusters, there are $ (V - 1) $ positive cluster assignment pairs and $ V(K - 1) $ negative cluster assignment pairs.

    The goal of CLM is to maximize the similarity between cluster assignments within clusters and minimize the similarity between cluster assignments across clusters. Inspired by [55], the cross-view contrastive loss between $ \mathbf{p}_{k}^{v_1} $ and $ \mathbf{p}_{k}^{v_2} $ is defined as follows:

    $ l(v_1, v_2) = -\dfrac{1}{K}\sum_{k=1}^{K}\log\dfrac{e^{s(\mathbf{p}_k^{v_1}, \mathbf{p}_k^{v_2})/\tau}}{\sum_{j=1, j\neq k}^{K} e^{s(\mathbf{p}_j^{v_1}, \mathbf{p}_k^{v_1})/\tau} + \sum_{j=1}^{K} e^{s(\mathbf{p}_j^{v_1}, \mathbf{p}_k^{v_2})/\tau}}, \qquad (3.9) $

    where $ \tau $ is the temperature parameter, ($ \mathbf{p}_k^{v_1}, \mathbf{p}_k^{v_2} $) is a positive cluster assignment pair between views $ v_1 $ and $ v_2 $, and ($ \mathbf{p}_j^{v_1}, \mathbf{p}_k^{v_1} $) ($ j\neq k $), ($ \mathbf{p}_j^{v_1}, \mathbf{p}_k^{v_2} $) are negative cluster assignment pairs in the views of $ v_1 $ and $ v_2 $, respectively. The introduced cross-view contrastive loss across multiple views is given below:

    $ L_g = \dfrac{1}{2}\sum_{v_1=1}^{V}\sum_{v_2=1, v_2\neq v_1}^{V} l(v_1, v_2). \qquad (3.10) $
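
    A direct, non-authoritative translation of Eqs (3.8)-(3.10) into code is given below; `P` is assumed to be a list of $ V $ soft-assignment matrices of shape $ N \times K $, and the loss operates on their columns (the cluster assignments).

```python
import torch
import torch.nn.functional as F

def pairwise_contrastive_loss(P1, P2, tau=0.5):
    """Cross-view contrastive loss l(v1, v2) over cluster assignments, Eq (3.9).

    P1, P2: (N, K) soft assignments of views v1 and v2; their columns are the
    cluster assignments p_j^{v1} and p_j^{v2}."""
    K = P1.shape[1]
    A1 = F.normalize(P1.t(), dim=1)          # (K, N): unit-norm assignments of view v1
    A2 = F.normalize(P2.t(), dim=1)          # (K, N): unit-norm assignments of view v2
    s11 = A1 @ A1.t() / tau                  # s(p_j^{v1}, p_k^{v1}) / tau
    s12 = A1 @ A2.t() / tau                  # s(p_j^{v1}, p_k^{v2}) / tau
    loss = 0.0
    for k in range(K):
        pos = torch.exp(s12[k, k])                                     # positive pair
        neg_intra = torch.exp(s11[:, k]).sum() - torch.exp(s11[k, k])  # j != k, view v1
        neg_cross = torch.exp(s12[:, k]).sum()                         # all j, cross view
        loss = loss - torch.log(pos / (neg_intra + neg_cross))
    return loss / K

def grassmannian_contrastive_loss(P, tau=0.5):
    """L_g of Eq (3.10), accumulated over all ordered pairs of distinct views."""
    V = len(P)
    total = 0.0
    for v1 in range(V):
        for v2 in range(V):
            if v1 != v2:
                total = total + pairwise_contrastive_loss(P[v1], P[v2], tau)
    return total / 2.0
```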

    The cross-view contrastive loss explicitly pulls together cluster assignments within the same cluster and pushes apart cluster assignment pairs from different clusters. This inspiration comes from the recently proposed contrastive learning paradigm, which is applied to semantic labels to explore consistent information across multiple views.

    Additionally, to ensure consistency between cluster labels on the Grassmannian manifold and Euclidean space, as shown in Figure 1, we append two linear layers and a softmax function to the tail of the ProjMap layer to generate a probability matrix $ \mathbf{Q}^{v} \in \mathbb{R}^{N_v \times K} $ for cluster assignments. Let $ \mathbf{q}_i^{v} $ be the $ i $-th row in $ \mathbf{Q}^{v} $, and $ q_{ij}^{v} $ represents the probability that the $ i $-th instance belongs to the $ j $-th cluster of the $ v $-th view. The semantic label of the $ i $-th instance is determined by the maximum value in $ \mathbf{q}_i^{v} $. Following similar steps as processing $ L_g $, we get the contrastive loss $ L_c $ for different views in Euclidean space:

    $ L_c = -\dfrac{1}{2K}\sum_{v_1=1}^{V}\sum_{v_2=1, v_2\neq v_1}^{V}\sum_{k=1}^{K}\log\dfrac{e^{s(\mathbf{q}_k^{v_1}, \mathbf{q}_k^{v_2})/\tau}}{\sum_{j=1, j\neq k}^{K} e^{s(\mathbf{q}_j^{v_1}, \mathbf{q}_k^{v_1})/\tau} + \sum_{j=1}^{K} e^{s(\mathbf{q}_j^{v_1}, \mathbf{q}_k^{v_2})/\tau}}. \qquad (3.11) $

    Inspired by [53], we introduce the following regularization term to prevent all the instances from being assigned to a specific cluster:

    $ L_a = \sum_{v=1}^{V}\sum_{j=1}^{K} h_j^{v}\log h_j^{v}, \qquad (3.12) $

    where $ \mathbf{h}_j^{v} = \frac{1}{N_v}\sum_{i = 1}^{N_v} q_{ij}^{v} $. This term is considered as the cross-view consistency loss in DGMVCL.
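
    For illustration, the Euclidean branch that produces $ \mathbf{Q}^{v} $ and the regularizer of Eq (3.12) could be written as follows; the hidden width of the two linear layers and the flattening of the ProjMap outputs are assumptions not fixed by the text, and the Euclidean contrastive loss $ L_c $ reuses the same structure as the Grassmannian loss sketched above, applied to $ \mathbf{Q}^{v} $.

```python
import torch
import torch.nn as nn

class ClusterHead(nn.Module):
    """Two linear layers + softmax producing the probability matrix Q^v (N_v x K)."""
    def __init__(self, d_in: int, K: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, K),
            nn.Softmax(dim=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: flattened ProjMap outputs of one view, shape (N_v, d_in).
        return self.net(feats)

def consistency_regularizer(Q_list):
    """L_a of Eq (3.12): sum_v sum_j h_j^v log h_j^v with h_j^v = mean_i q_ij^v."""
    loss = 0.0
    for Q in Q_list:
        h = Q.mean(dim=0).clamp_min(1e-12)
        loss = loss + (h * h.log()).sum()
    return loss
```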

    The objective function of the proposed method comprises three primary components: the Grassmannian contrastive loss, the Euclidean contrastive loss, and the cross-view consistency loss, given below:

    $ L_{\mathrm{obj}} = \alpha L_g + \beta L_c + \gamma L_a, \qquad (3.13) $

    where $ \alpha $, $ \beta $, and $ \gamma $ are three trade-off parameters.

    The goal of $ L_\text{obj} $ is to learn common semantic labels from feature representations in multiple views. Let $ \mathbf{p}^{v}_i $ be the $ i $-th row of $ \mathbf{P}^{v} $, and $ {p}^{v}_{ij} $ represents the $ j $-th element of $ \mathbf{p}^{v}_i $. Specifically, $ \mathbf{p}^{v}_i $ is a $ K $-dimensional soft assignment probability, where $ \sum_{j = 1}^{K} p^{v}_{ij} = 1 $. Once the training process of the proposed network is completed, the semantic label of sample $ i $ ($ i\in\{1, 2, ..., N_v\} $) can be predicted by:

    $ y_i = \arg\max_j\left(\dfrac{1}{V}\sum_{v=1}^{V} p_{ij}^{v}\right). \qquad (3.14) $
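
    Putting the pieces together, the total objective of Eq (3.13) and the label rule of Eq (3.14) amount to the following; `L_g`, `L_c`, and `L_a` denote the quantities computed above, and the default weights shown are the MNIST-USPS settings reported later in the experiments.

```python
import torch

def total_objective(L_g, L_c, L_a, alpha=0.02, beta=0.05, gamma=0.005):
    """Weighted objective L_obj of Eq (3.13)."""
    return alpha * L_g + beta * L_c + gamma * L_a

def predict_labels(P_list):
    """Label rule of Eq (3.14): average the per-view soft assignments, then arg-max."""
    P_mean = torch.stack(P_list, dim=0).mean(dim=0)   # (N, K)
    return P_mean.argmax(dim=1)
```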

    The proposed GMLM can be viewed as a data embedding model given by the composition of successive functions $ f = f^{(\rho)}\circ f^{(\rho-1)}\circ...\circ f^{(2)}\circ f^{(1)} $, parameterized by the tuple $ \boldsymbol{{W}} = \{\boldsymbol{\mathrm{W}}_{\rho}, \boldsymbol{\mathrm{W}}_{\rho-1}, ..., \boldsymbol{\mathrm{W}}_1\} $; the resulting embedding satisfies the properties of a metric space. Here, $ f^{(k)} $ and $ \boldsymbol{\mathrm{W}}_{k} $ are, respectively, the operation function and weight parameter of the $ k $-th layer, and $ \rho $ denotes the number of layers contained in GMLM. The loss of the $ k $-th layer can be written as $ L^{(k)} = \ell\circ f^{(\rho)}\circ...\circ f^{(k)} $, where $ \ell $ is the overall objective $ L_{\text{obj}} $.

    Due to the fact that the weight space of the FRMap layer is a compact Stiefel manifold $ St(d_k, d_{k-1}) $, we refer to the method studied in [46] to realize parameter optimization by generalizing the traditional stochastic gradient descent (SGD) settings to the context of Stiefel manifolds. The updating rule for $ \boldsymbol{\mathrm{W}}_{k} $ is given below:

    According to Eq (3.2), we have $ \boldsymbol{\mathrm{Y}}_{k} = f^{(k)}(\boldsymbol{\mathrm{W}}_k, \boldsymbol{\mathrm{Y}}_{k-1}) = \boldsymbol{\mathrm{W}}_k\boldsymbol{\mathrm{Y}}_{k-1} $. Then, the following variation of $ \boldsymbol{\mathrm{Y}}_k $ can be derived:

    $ \mathrm{d}\boldsymbol{\mathrm{Y}}_k = \mathrm{d}\boldsymbol{\mathrm{W}}_k\boldsymbol{\mathrm{Y}}_{k-1} + \boldsymbol{\mathrm{W}}_k\,\mathrm{d}\boldsymbol{\mathrm{Y}}_{k-1}. \qquad (3.15) $

    Based on the invariance of the first-order differential, the following chain rule can be deduced:

    $ \dfrac{\partial L^{(k+1)}}{\partial \boldsymbol{\mathrm{Y}}_k} : \mathrm{d}\boldsymbol{\mathrm{Y}}_k = \dfrac{\partial L^{(k)}}{\partial \boldsymbol{\mathrm{W}}_k} : \mathrm{d}\boldsymbol{\mathrm{W}}_k + \dfrac{\partial L^{(k)}}{\partial \boldsymbol{\mathrm{Y}}_{k-1}} : \mathrm{d}\boldsymbol{\mathrm{Y}}_{k-1}. \qquad (3.16) $

    By replacing the left-hand side of Eq (3.16) with Eq (3.15) and exploiting the matrix inner product ":" property, the following two formulas can be derived:

    $ \dfrac{\partial L^{(k+1)}}{\partial \boldsymbol{\mathrm{Y}}_k} : \mathrm{d}\boldsymbol{\mathrm{W}}_k\boldsymbol{\mathrm{Y}}_{k-1} = \dfrac{\partial L^{(k+1)}}{\partial \boldsymbol{\mathrm{Y}}_k}\boldsymbol{\mathrm{Y}}_{k-1}^{\mathrm{T}} : \mathrm{d}\boldsymbol{\mathrm{W}}_k, \qquad (3.17) $
    $ \dfrac{\partial L^{(k+1)}}{\partial \boldsymbol{\mathrm{Y}}_k} : \boldsymbol{\mathrm{W}}_k\,\mathrm{d}\boldsymbol{\mathrm{Y}}_{k-1} = \boldsymbol{\mathrm{W}}_k^{\mathrm{T}}\dfrac{\partial L^{(k+1)}}{\partial \boldsymbol{\mathrm{Y}}_k} : \mathrm{d}\boldsymbol{\mathrm{Y}}_{k-1}. \qquad (3.18) $

    Combining Eqs (3.16)–(3.18), the partial derivatives of $ L^{(k)} $ with respect to $ \boldsymbol{\mathrm{W}}_k $ and $ \boldsymbol{\mathrm{Y}}_{k-1} $ can be computed by:

    $ \dfrac{\partial L^{(k)}}{\partial \boldsymbol{\mathrm{W}}_k} = \dfrac{\partial L^{(k+1)}}{\partial \boldsymbol{\mathrm{Y}}_k}\boldsymbol{\mathrm{Y}}_{k-1}^{\mathrm{T}}, \qquad \dfrac{\partial L^{(k)}}{\partial \boldsymbol{\mathrm{Y}}_{k-1}} = \boldsymbol{\mathrm{W}}_k^{\mathrm{T}}\dfrac{\partial L^{(k+1)}}{\partial \boldsymbol{\mathrm{Y}}_k}. \qquad (3.19) $

    At this time, the updating criterion of $ \boldsymbol{W}_k $ on the Stiefel manifold is given below:

    $ \boldsymbol{W}_k^{t+1} = \mathcal{R}_{\boldsymbol{W}_k^t}\left(-\eta\,\Pi_{\boldsymbol{W}_k^t}\left(\nabla L_{\boldsymbol{W}_k^t}^{(k)}\right)\right), \qquad (3.20) $

    where $ \mathrm{\mathcal{R}} $ signifies the retraction operation used to map the optimized parameter back onto the Stiefel manifold, $ \eta $ is the learning rate, and $ \Pi $ represents the projection operator used to convert the Euclidean gradient into the corresponding Riemannian counterpart:

    $ \tilde{\nabla} L_{\boldsymbol{W}_k^t}^{(k)} = \Pi_{\boldsymbol{W}_k^t}\left(\nabla L_{\boldsymbol{W}_k^t}^{(k)}\right) = \nabla L_{\boldsymbol{W}_k^t}^{(k)} - \boldsymbol{W}_k^t\left(\nabla L_{\boldsymbol{W}_k^t}^{(k)}\right)^{\mathrm{T}}\boldsymbol{W}_k^t, \qquad (3.21) $

    where $ \nabla{{L}}_{\boldsymbol{W}_k^t}^{(k)} $ is the Euclidean gradient, computed by the first term of Eq (3.19), and $ \tilde{\nabla}{{L}}_{\boldsymbol{W}_k^t}^{(k)} $ denotes the Riemannian gradient. After that, the weight parameter can be updated by: $ \boldsymbol{W}_k^{t+1} = \mathrm{\mathcal{R}}(\boldsymbol{W}_k^t-\eta\tilde{\nabla}{{L}}_{\boldsymbol{W}_k^t}^{(k)}) $. For detailed information about the Riemannian geometry of a Stiefel manifold and its associated retraction operation, please kindly refer to [48].
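
    A sketch of the update in Eqs (3.20) and (3.21): project the Euclidean gradient onto the tangent space, take a step along the negative Riemannian gradient, and retract back onto the Stiefel manifold. The QR-based retraction used here is one common choice and is an assumption on our part; other retractions are possible [48].

```python
import torch

def stiefel_sgd_step(W: torch.Tensor, grad: torch.Tensor, lr: float) -> torch.Tensor:
    """One Riemannian SGD step for a row-orthonormal weight W (d_k x d_{k-1}).

    1. Project the Euclidean gradient onto the tangent space (Eq (3.21)):
         grad_R = grad - W grad^T W
    2. Step in the negative Riemannian gradient direction and retract back onto
       the Stiefel manifold (Eq (3.20)); the retraction below re-orthonormalizes
       the rows via a QR decomposition of the transpose.
    """
    grad_R = grad - W @ grad.t() @ W
    W_new = W - lr * grad_R
    Q, R = torch.linalg.qr(W_new.t())         # W_new^T = Q R, Q has orthonormal columns
    sign = torch.sign(torch.diagonal(R)).unsqueeze(0)   # fix the QR sign ambiguity
    return (Q * sign).t()

# Toy usage: a 25 x 49 row-orthonormal weight and a random Euclidean gradient.
W = torch.linalg.qr(torch.randn(49, 25)).Q.t()          # rows of W are orthonormal
W = stiefel_sgd_step(W, torch.randn(25, 49), lr=1e-2)
print(torch.allclose(W @ W.t(), torch.eye(25), atol=1e-5))   # True
```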

    In this section, we conduct experiments on five benchmarking datasets to evaluate the performance of the proposed DGMVCL. All the experiments are run on a Linux workstation with a GeForce RTX 4070 GPU (12 GB of memory).

    The proposed DGMVCL is evaluated on five publicly available multiview datasets. The MNIST-USPS [56] dataset contains 5000 samples with two different styles of digital images. The Multi-COIL-10 dataset [57] is comprised of 720 grayscale images collected from 10 categories with the image size of 32 $ \times $ 32, where different views represent different object poses. The Fashion dataset [58] is made up of 10,000 images belonging to 10 categories, where three different styles of an object are regarded as its three different views, i.e., front view, side view, and back view, respectively. The ORL dataset [59] consists of 400 face images collected from 40 volunteers, with each volunteer providing 10 images under different expressions and lighting conditions. The Scene-15 [60] dataset contains 4485 scene images belonging to 15 categories.

    We evaluate the clustering performance using three metrics, i.e., clustering accuracy (ACC), normalized mutual information (NMI), and purity. Here, ACC is the ratio of correctly classified samples to the total number of samples under the best matching between clusters and classes, NMI measures the consistency between the clustering result and the true class distribution, and purity assigns each cluster to its most frequent ground-truth class and reports the fraction of samples matching this assignment, indicating whether each cluster contains samples belonging to the same class.
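
    For reference, these three metrics can be computed with SciPy and scikit-learn as follows; ACC uses the Hungarian algorithm to find the best one-to-one matching between predicted clusters and ground-truth classes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best cluster-to-class matching found by the Hungarian algorithm."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    D = max(y_pred.max(), y_true.max()) + 1
    cost = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                      # co-occurrence counts
    row, col = linear_sum_assignment(cost.max() - cost)
    return cost[row, col].sum() / y_pred.size

def purity(y_true, y_pred):
    """Purity: each cluster votes for its majority ground-truth class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0
    for c in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == c], return_counts=True)
        total += counts.max()
    return total / y_pred.size

nmi = normalized_mutual_info_score          # NMI as provided by scikit-learn
```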

    To validate the effectiveness of the proposed method, we compare DGMVCL with several state-of-the-art methods, including augmented sparse representation (ASR) [41], deep safe incomplete multiview clustering (DSIMVC) [63], dual contrastive prediction (DCP) [64], multi-level feature learning (MFL) [65], and cross-view contrastive learning (CVCL) [55]. For DCP, we report the best clustering results obtained for each pair of individual views in each dataset. For better comparison, we include two additional baselines. Specifically, we first apply spectral clustering [61] to each individual view and report the best clustering results obtained across multiple views, termed as best single view clustering (BSVC). Then, we utilize an adaptive neighborhood graph learning method [62] to generate a similarity matrix for each individual view. We aggregate all the similarity matrices into a new one for spectral clustering, denoted as $ SC_{Agg} $.

    The clustering results of various algorithms obtained on the five used datasets are reported in Tables 1 and 2, respectively. The best results are highlighted in bold. It can be seen that the clustering performance of the contrastive learning-based methods, including DGMVCL, CVCL, MFL, and DCP, is superior to that of the other competitors (e.g., BSVC and $ SC_{Agg} $) on the large-scale MNIST-USPS, Scene-15, and Fashion datasets. This is mainly attributed to the fact that the contrastive learning-based self-supervised learning mechanism is capable of learning more discriminative feature representations by maximizing the similarity between positive samples and minimizing the similarity between negative samples simultaneously. Furthermore, our proposed DGMVCL is the best performer on all the used datasets, demonstrating its effectiveness. For example, on the Scene-15 dataset, DGMVCL achieves approximately 16.7%, 34.22%, and 17.68% improvements in ACC, NMI, and purity in comparison with the second-best CVCL method. The fundamental reason is that the proposed subspace-based geometric learning method can characterize and analyze the structural information of the input subspace features more faithfully. In addition, the introduced dual contrastive losses make it possible to learn a more discriminative network embedding.

    Table 1.  Results of all methods on the MNIST-USPS, Fashion, and Multi-COIL-10 datasets.
    Methods ACC NMI Purity
    MNIST-USPS:
    BSVC [61] 67.98 74.43 72.34
    $ SC_{Agg} $ [62] 89.00 77.12 89.18
    ASR [41] 97.90 94.72 97.90
    DSIMVC [63] 99.34 98.13 99.34
    DCP [64] 99.02 97.29 99.02
    MFL [65] 99.66 99.01 99.66
    CVCL [55] 99.58 98.79 99.58
    DGMVCL 99.82 99.52 99.82
    Fashion:
    BSVC [61] 60.32 64.91 63.84
    $ SC_{Agg} $ [62] 98.00 94.80 97.56
    ASR [41] 96.52 93.04 96.52
    DSIMVC [63] 88.21 83.99 88.21
    DCP [64] 89.37 88.61 89.37
    MFL [65] 99.20 98.00 99.20
    CVCL [55] 99.31 98.21 99.31
    DGMVCL 99.52 98.73 99.52
    Multi-COIL-10:
    BSVC [61] 73.32 76.91 74.11
    $ SC_{Agg} $ [62] 68.34 70.18 69.26
    ASR [41] 84.23 65.47 84.23
    DSIMVC [63] 99.38 98.85 99.38
    DCP [64] 70.14 81.9 70.14
    MFL [65] 99.20 98.00 99.20
    CVCL [55] 99.43 99.04 99.43
    DGMVCL 100.00 100.00 100.00

    Table 2.  Results of all methods on the ORL and Scene-15 datasets.
    Methods ACC NMI Purity
    ORL:
    BSVC 61.31 64.91 61.31
    $ SC_{Agg} $ [62] 61.65 77.41 66.22
    ASR [41] 79.49 78.04 81.49
    DSIMVC [63] 25.37 52.91 25.37
    DCP [64] 27.70 49.93 27.70
    MFL [65] 80.03 89.34 80.03
    CVCL [55] 85.50 93.17 86.00
    DGMVCL 92.25 98.34 92.25
    Scene-15:
    BSVC 38.05 38.85 42.08
    $ SC_{Agg} $ [62] 38.13 39.31 44.76
    ASR [41] 42.70 40.70 45.60
    DSIMVC [63] 28.27 29.04 29.79
    DCP [64] 42.32 40.38 43.85
    MFL [65] 42.52 40.34 44.53
    CVCL [55] 44.59 42.17 47.36
    DGMVCL 61.29 76.39 65.04


    Objective Function: To verify the impact of each loss function in Eq (3.13) on the performance of the proposed method, we conduct ablation experiments on the MNIST-USPS, ORL, and Fashion datasets as three examples. Table 3 shows the experimental results of our model under different combinations of loss functions. It can be seen that under type D, that is, when all the loss functions are used at the same time, our method achieves the best performance. However, when $ L_a $ is removed (type A), the clustering performance of DGMVCL decreases on the three used datasets. For the MVC task, similar samples usually exhibit a large intra-data diversity, while dissimilar samples usually demonstrate a large inter-data correlation. This makes it impossible for a single $ L_c $ to effectively learn invariant representations from such data variations. Therefore, $ L_a $ is introduced to prevent instances from being assigned to a particular cluster, and the experimental results confirm its effectiveness. From Table 3, we can also conclude that the introduced Grassmannian contrastive loss $ L_g $ is beneficial for ameliorating the discriminability of the learned geometric features. Another interesting observation from Table 3 is that the clustering performance of type C ($ L_c $ is removed) is visibly lower than that of type D in terms of ACC, NMI, and purity. The fundamental reason is that $ L_c $ is the most critical loss function, as it is not only used to train the proposed network, but also participates in the testing phase. This indicates that $ L_c $ is crucial for distinguishing positive and negative samples, i.e., maintaining the cross-view consistency, and plays a pivotal role in the overall clustering task. Since $ L_g $ and $ L_a $ act as two regularization terms to provide additional discriminative information for $ L_c $ and are mainly confined to the training phase, neither $ L_a $ nor $ L_g $ can be used alone. All in all, the aforementioned experimental findings demonstrate that each term in Eq (3.13) is useful.

    Table 3.  Comparison under different combinations of loss functions on the MNIST-USPS, ORL, and Fashion datasets.
    Loss MNIST-USPS ORL Fashion
    Types $ L_g $ $ L_c $ $ L_a $ ACC NMI Purity ACC NMI Purity ACC NMI Purity
    A 19.96 42.62 19.96 69.00 92.67 70.00 84.78 93.13 89.38
    B 99.76 99.37 99.76 85.00 86.58 85.00 99.48 98.60 99.48
    C 10.31 12.54 10.31 12.50 26.69 12.50 25.75 38.72 25.75
    D 99.82 99.52 99.82 92.25 98.34 92.25 99.52 98.73 99.52
    E N/A N/A N/A N/A N/A N/A N/A N/A N/A
    F N/A N/A N/A N/A N/A N/A N/A N/A N/A


    The Components of GMLM: To verify the significance of each operation layer defined in GMLM, we conduct ablation experiments on the MNIST-USPS, Fashion, and Multi-COIL-10 datasets as three examples. The clustering results of different subarchitectures are listed in Table 4. It is evident that group A (serving as the reference group, with only the loss function $ L_g $ removed) is superior to group D in terms of ACC, NMI, and purity. Here, group D is generated by removing the ProjMap layer from group A. This demonstrates the necessity of Riemannian computation in preserving the geometric structure of the original data manifold. From Table 4, we can also find that the performance of group C (obtained by removing the layers of FRMap, ReOrth, and ProjMap from group A) is significantly inferior to that of group D on the Fashion and Multi-COIL-10 datasets, suggesting that deep subspace learning is able to improve the effectiveness of the features. Another interesting observation from Table 4 is that after removing the ReOrth layer from group A (this becomes group B), the clustering results are reduced to a certain extent on the three used datasets. The basic reason is that the Grassmannian properties, e.g., orthogonality, of the input feature matrices cannot be maintained, resulting in imprecise Riemannian distances computed in $ L_g $. All in all, the experimental results mentioned above confirm the usefulness of each part in GMLM.

    Table 4.  Comparison under different subarchitectures of the proposed model on the MNIST-USPS, Fashion, and Multi-COIL-10 datasets.
    MNIST-USPS Fashion Multi-COIL-10
    Groups ACC NMI Purity ACC NMI Purity ACC NMI Purity
    A 99.76 99.37 99.76 99.48 98.60 99.48 100.00 100.00 100.00
    B 99.70 99.11 99.70 99.02 97.60 99.02 99.14 98.71 99.14
    C 98.00 97.00 97.00 81.25 86.24 81.40 31.70 31.00 32.00
    D 89.92 94.51 89.92 98.49 96.52 98.49 99.00 97.31 99.01


    Network Depth: Inspired by the experimental results presented in Table 4, we carry out ablation experiments on the MNIST-USPS, Fashion, and Multi-COIL-10 datasets as three examples to study the impact of $ n_x $ (the number of blocks in GMLM) on the model performance. According to Table 5, we can find that $ n_x = 1 $ yields the best clustering results. However, the increase in network depth leads to a slight decrease in clustering performance on the three used datasets. It needs to be emphasized that the sizes of the transformation matrices are set to ($ 49\times 25 $, $ 25\times 20 $) and ($ 49\times 25 $, $ 25\times 20 $, $ 20\times 15 $) under the two settings of $ n_x = 2 $ and $ n_x = 3 $, respectively. Therefore, the loss of pivotal structural information in the process of multi-stage deep subspace learning is considered to be the fundamental reason for the degradation of model ability. In summary, these experimental findings not only reaffirm the effectiveness of the designed GMLM for subspace learning, but also reveal the degradation issue of Grassmannian networks. In the future, we plan to generalize the Euclidean residual learning mechanism to the context of Grassmannian manifolds to mitigate the above problem.

    Table 5.  Comparison under different $ n_x $ on the MNIST-USPS, Fashion, and Multi-COIL-10 datasets.
    MNIST-USPS Fashion Multi-COIL-10
    Metrics ACC NMI Purity ACC NMI Purity ACC NMI Purity
    $ n_x=1 $ 99.82 99.52 99.82 99.58 98.83 99.58 100.00 100.00 100.00
    $ n_x=2 $ 99.68 99.08 99.68 99.39 98.37 99.39 98.14 96.71 98.14
    $ n_x=3 $ 99.58 98.73 99.58 99.53 98.71 99.53 99.14 98.13 99.14


    To measure the impact of the trade-off parameters (i.e., $ \alpha $, $ \beta $, and $ \gamma $) in Eq (3.13) on the clustering performance of the proposed method, we conduct experiments on the ORL and MNIST-USPS datasets as two examples. The purpose of introducing these three trade-off parameters to Eq (3.13) is to balance the magnitudes of the Grassmannian contrastive learning term $ L_g $, Euclidean contrastive learning term $ L_c $, and regularization term $ L_a $, so as to learn an effective network embedding for clustering. From Figure 2, we have some interesting observations. First, it is not recommended to endow $ \beta $ with a relatively small value. The basic reason is that $ L_c $ is not only used to learn discriminative features, but also for the final label prediction. Moreover, we can observe that when the value of $ \beta $ is fixed, the clustering accuracy of the proposed method exhibits a trend of first increasing and then decreasing with the change of $ \gamma $. In addition, it can be found that the value of $ \gamma $ cannot be greater than $ \beta $. Otherwise, the performance of our method will be significantly affected. The fundamental reason is that a larger $ \gamma $ or a smaller $ \beta $ will cause the gradient information related to $ L_c $ to be dramatically weakened in the network optimization process, which is not conducive to learning an effective clustering hypersphere. Besides, we can also observe that the model tends to be less sensitive to $ \beta $ and $ \gamma $ when they vary within the ranges of 0.05$ \sim $0.005 and 0.01$ \sim $0.001, respectively. These experimental comparisons support our assertion that the regularization term helps to fine-tune the clustering performance.

    Figure 2.  Accuracy comparison under different values of $ \beta $ and $ \gamma $ on the MNIST-USPS and ORL datasets.

    In this part, we further investigate the impact of $ \alpha $ on the accuracy of the proposed method. We take the MNIST-USPS dataset as an example to conduct experiments, while changing the value of $ \alpha $, and we fix $ \beta $ and $ \gamma $ at their optimal values as shown in Figure 2.

    It can be seen from Figure 3 that the accuracy of the proposed method generally shows a trend of first increasing and then decreasing, and reaches a peak when $ \alpha $ is set to 0.02. Besides, $ \alpha $ is not suggested to be endowed with a relatively large value. Otherwise, the gradient information associated with $ L_c $ will be weakened in the network optimization process, which is not conducive to the learning of a discriminative network embedding. These experimental results not only further demonstrate the criticality of $ L_c $, but also underscore the effectiveness of $ \alpha $ in fine-tuning the discriminability of the learned representations.

    Figure 3.  Study the impact of $ \alpha $ on the model accuracy on the MNIST-USPS dataset.

    All in all, these experimental observations confirm the complementarity of these three terms in guiding the proposed model to learn more informative features for better clustering. In this paper, our guideline for choosing their values is to ensure that the orders of magnitude of the Grassmannian contrastive learning term $ L_g $ and the regularization term $ L_a $ do not exceed that of the Euclidean contrastive learning term $ L_c $. With this criterion, the model can better integrate the gradient information of $ L_a $ and $ L_g $ regarding the data distribution with $ L_c $ to learn a more reasonable hypersphere for different views. On the MNIST-USPS dataset, the eligible values of $ \alpha $, $ \beta $, and $ \gamma $ are set to 0.02, 0.05, and 0.005, respectively. In a similar way, we determine that their appropriate values are (0.005, 0.005, 0.005), (0.01, 0.01, 0.01), (0.01, 0.01, 0.005), and (0.1, 0.1, 0.1) on the Fashion, Multi-COIL-10, ORL, and Scene-15 datasets, respectively. For a new dataset, the aforementioned principle can help the readers quickly determine the initial value ranges of $ \alpha $, $ \beta $, and $ \gamma $.

    To intuitively test the effectiveness of the proposed method, we select the Fashion and Multi-COIL-10 datasets as two examples to perform 2-D visualization experiments. The experimental results generated by the t-SNE technique [66] are presented in Figure 4, where different colors denote the labels of different clusters. It can be seen that compared to the case where GMLM is not included, the final clustering results, measured by the compactness between similar samples and the diversity between dissimilar samples, are improved by using GMLM on both of the two benchmarking datasets. This further demonstrates that the designed GMLM can help improve the discriminability of cluster assignments.

    Figure 4.  Two-dimensional visualization of the clustering results on the Fashion and Multi-COIL-10 datasets with and without using GMLM.

    Besides, we further investigate the impact of GMLM on the computational burden of the proposed method. The experimental results are summarized in Table 6, where "w/o" means "without containing". Note that the parameters of the new architecture are kept as the originals.

    Table 6.  Comparison of training time (s/epoch) on the MNIST-USPS, Fashion, Multi-COIL-10, ORL, and Scene-15 datasets.
    Datasets MNIST-USPS Fashion Multi-COIL-10 ORL Scene-15
    DGMVCL-"w/o" GMLM 11.9 34.4 2.6 1.5 14.5
    DGMVCL 12.3 36.1 2.7 1.6 16.9


    From Table 6, we can see that the integration of GMLM slightly increases the training time of the proposed model across all the five used datasets. According to Section 3, it can be known that the main computational burden of GMLM comes from the QR decomposition used in the ReOrth layer. Nevertheless, as shown in Figure 4, GMLM can enhance the clustering performance of our method, demonstrating its effectiveness.

    Recent advancements in robust learning, such as projected cross-view learning for unbalanced incomplete multiview clustering [67], have highlighted the importance of handling noisy data and outlier instances. In this section, we evaluate the robustness of the proposed method when subjected to various levels of noise and occlusion, choosing CVCL [55] as a representative competitor. Figure 5 illustrates the accuracy (%) of different methods as a function of the variance of Gaussian noise added to the Scene-15 dataset. It can be seen that as the variance of the Gaussian noise increases, both the proposed method and CVCL experience a decline in accuracy. However, our method consistently outperforms CVCL across all the noise levels. This shows that the suggested model is robust to the Gaussian noise, maintaining good clustering ability even under severe noise levels.

    Figure 5.  Accuracy of different methods as a function of the variance of Gaussian noise on the Scene-15 dataset.

    We further investigate the robustness of the proposed method when handling the data with random pixel masks, selecting the Fashion dataset as an example. Some sample images of the Fashion dataset with 15%, 30%, 45%, and 60% of the pixels masked are shown in Figure 6. According to Table 7, we can see that the performance of the proposed method is superior to that of CVCL under all the corruption levels. These experimental findings again demonstrate that the suggested modifications over the baseline model are effective to learn more robust and discriminative view-invariant representations. However, as the occlusion rate and the level of Gaussian noise increase, the performance of the proposed DGMVCL shows a noticeable decline. As discussed in Section 3.1.2, each element in the orthogonal basis matrix on the Grassmannian manifold represents the correlation between the original feature dimensions. Although the ability to encode long-range dependencies between different local feature regions enables our model to capture more useful information, it is also susceptible to the influence of local prominent noise. In such a case, the contrastive loss may fail to effectively distinguish between positive and negative samples. This is mainly attributed to the fact that the contrastive learning term treats the decision space as an explicit function of the data distribution. In the future, incorporating techniques like those studied in [67], such as multiview projections or advanced data augmentation, could improve the model's ability to handle these challenges.

    Figure 6.  Noisy images with different mask levels of the Fashion dataset.
    Table 7.  Accuracy (%) of different methods on the Fashion dataset with random pixel masks.
    Methods 15% 30% 45% 60%
    CVCL 98.49 97.23 95.06 89.48
    DGMVCL 99.16 98.29 97.28 92.70


    Innovation Analysis: The novelty of our proposed DGMVCL lies not in the mere combination of several existing components but in the thoughtful and innovative way in which these components are integrated and optimized, leading to a lightweight and discriminative geometric learning framework for multiview subspace clustering. Specifically, the suggested DGMVCL introduces several pivotal innovations: ⅰ) The Grassmannian neural network, designed for geometric subspace learning, should not be treated as a simple attempt at a new vision application, but rather as an intrinsic method for encoding the underlying submanifold structure of channel features. This is crucial for enabling the model to learn more effective subspace features; ⅱ) The proposed method introduces contrastive learning in both the Grassmannian manifold and Euclidean space. Compared to the baseline model (CVCL [55]), which utilizes the Euclidean-based contrastive loss for network training, the additionally designed Grassmannian contrastive learning module enables our DGMVCL to characterize and learn the geometrical distribution of the subspace data points more faithfully. Therefore, such a dual-space contrastive learning mechanism is able to improve the representational capacity of our model and is capable of extracting view-invariant representations; ⅲ) Extensive evaluations across multiple benchmarking datasets not only demonstrate the superiority of our proposed DGMVCL over the state-of-the-art methods, but also underscore the significance of each individual component and their complementarity.

    The Effectiveness of Grassmannian Representation: The Grassmannian manifold provides a compact representation of the covariance matrix and encodes rich subspace information, and it has shown great success in many applications [11,27,28]. Inspired by this, the MMM is designed to capture and parameterize the $ q $-dimensional real vector subspace formed by the features extracted from the FEM. However, the Grassmannian manifold is not a Euclidean space but a Riemannian manifold, so we adopt a Grassmannian network to respect the latent Riemannian geometry. Specifically, each network layer preserves the Riemannian property of the input feature matrices by normalizing each one into an orthonormal basis matrix. In addition, each manifold-valued weight parameter of the FRMap layer is optimized on a compact Stiefel manifold, which not only maintains its orthogonality but also ensures stable network training. The projector perspective studied in [32] shows that the Grassmannian manifold is an embedded submanifold of the Euclidean space of symmetric matrices, allowing the use of an extrinsic distance, i.e., the projection metric (PM), for measuring the similarity between subspaces on the Grassmannian manifold. Moreover, the PM approximates the true geodesic distance up to a scale factor of $ \sqrt{2} $ and is more efficient to compute than the geodesic distance. By leveraging the PM-based contrastive loss, the consistency between cluster assignments across views is reinforced from the Grassmannian perspective while the underlying manifold structure is preserved.
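    The following sketch illustrates, under stated assumptions, how an FRMap-style projection followed by a QR-based re-orthonormalization keeps the representations on a Grassmannian. The class names, dimensions, and the use of a plain orthogonal initialization (in place of Riemannian optimization on the Stiefel manifold) are simplifications for illustration only.

```python
import torch
import torch.nn as nn

class FRMap(nn.Module):
    """Full-rank mapping layer: projects an orthonormal basis matrix X (n x q)
    to a lower-dimensional representation W^T X of size (d_k x q).

    In the paper the weight is optimized on a compact Stiefel manifold; here we
    only initialize it with orthonormal columns as a plain illustration.
    """
    def __init__(self, n=49, d_k=25):
        super().__init__()
        w = torch.empty(n, d_k)
        nn.init.orthogonal_(w)                 # orthonormal columns at initialization
        self.weight = nn.Parameter(w)

    def forward(self, x):                      # x: (batch, n, q)
        return self.weight.t() @ x             # (batch, d_k, q)

class ReOrth(nn.Module):
    """Re-orthonormalization layer: maps each output back onto the Grassmannian
    by keeping the Q factor of a reduced QR decomposition."""
    def forward(self, x):                      # x: (batch, d_k, q)
        q, _ = torch.linalg.qr(x, mode="reduced")
        return q                               # orthonormal basis, (batch, d_k, q)

# Example: a batch of 4 subspace representations with n = 49 and q = 8.
x = torch.linalg.qr(torch.randn(4, 49, 8)).Q   # orthonormal inputs
y = ReOrth()(FRMap(49, 25)(x))                 # points on the lower-dimensional Grassmannian
```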

    To intuitively demonstrate the effectiveness of the MMM, we conduct an additional ablation study that evaluates a variant of the proposed method without it. The learning rate, batch size, and three trade-off parameters of this variant remain the same as in the original model, while the size of the input feature matrix of the GMLM becomes $ 49\times 64 $. The experimental results on the five datasets are summarized in Table 8, where "w/o" means "without". From Table 8, we can see that removing the designed MMM from DGMVCL leads to a dramatic drop in accuracy across all five datasets. This not only confirms the significance of the MMM in capturing and parameterizing the underlying submanifold structure, but also shows the effectiveness of our proposed model in preserving and leveraging the Riemannian geometry of the data for improved clustering performance.

    Table 8.  Accuracy (%) of DGMVCL with and without the MMM on the MNIST-USPS, Fashion, Multi-COIL-10, ORL, and Scene-15 datasets.
    Datasets MNIST-USPS Fashion Multi-COIL-10 ORL Scene-15
    DGMVCL (w/o MMM) 10.00 10.00 10.29 2.60 9.14
    DGMVCL 99.82 99.52 100.00 92.25 61.29


    Selection of Network Parameters: The network parameters are selected through experiments and analysis to ensure good performance. Specifically, the trade-off parameters $ \alpha $, $ \beta $, and $ \gamma $ play a critical role in balancing the contributions of the different loss terms in the overall objective function. As a guideline, the weights of the more important loss terms should be set slightly higher than those of the others. Based on this guideline, we roughly determine an initial value for each parameter, which serves as an anchor point; a candidate set is then formed around each anchor point, and experiments are conducted to determine the optimal values.

    Additionally, the learning rate and batch size are crucial for the convergence and effectiveness of the proposed model. A learning rate that is too high may cause the model to diverge, while one that is too low slows down training [68]. Since CVCL [55] is our base model, we treat its learning rate as the initial value and adjust around it to find a suitable one. The batch size is configured to balance memory usage and training efficiency. For the Scene-15 dataset, a batch size of 69 was chosen because this dataset contains 4485 samples and 69 divides this number evenly, ensuring that every batch is fully utilized; as shown in Figure 7, this batch size also yields good performance. In practice, the batch size is further constrained by the computing device. On the MNIST-USPS, Multi-COIL-10, Fashion, ORL, and Scene-15 datasets, the learning rate and batch size are configured as (0.0002, 50), (0.0001, 50), (0.0005, 100), (0.0001, 50), and (0.001, 69), respectively. Furthermore, the experimental results presented in Figure 8 suggest that it is appropriate to set the size of the transformation matrix in the FRMap layer to $ 49 \times 25 $. When $ d_k $ is set to a small value, some useful geometric information is lost during the feature transformation mapping; in contrast, a relatively large $ d_k $ introduces more redundant information into the generated subspace features. Both cases negatively affect model performance.
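    A minimal sketch of this anchor-and-grid procedure for the trade-off parameters is given below. The candidate values and the train_and_evaluate callable are placeholders, not the exact grids or training routine used in our experiments.

```python
import itertools

# Hypothetical candidate sets formed around assumed anchor points; these values
# are illustrative and do not reproduce the grids searched in our experiments.
candidates = {
    "alpha": [0.5, 1.0, 2.0],
    "beta":  [0.05, 0.1, 0.2],
    "gamma": [0.5, 1.0, 2.0],
}

def tune_tradeoff_parameters(train_and_evaluate, candidates):
    """Grid search around the anchor points.

    train_and_evaluate: user-supplied callable mapping (alpha, beta, gamma)
    to a clustering accuracy on a validation split (placeholder here).
    """
    best_acc, best_cfg = -1.0, None
    for alpha, beta, gamma in itertools.product(
            candidates["alpha"], candidates["beta"], candidates["gamma"]):
        acc = train_and_evaluate(alpha=alpha, beta=beta, gamma=gamma)
        if acc > best_acc:
            best_acc, best_cfg = acc, (alpha, beta, gamma)
    return best_cfg, best_acc
```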

    Figure 7.  Accuracy of our proposed method under different batch sizes on the Scene-15 dataset.
    Figure 8.  Accuracy of our method under different sizes of the weight matrix in the FRMap layer on the Scene-15 dataset.

    In short, the choice of network parameters is supported by both theoretical considerations and empirical evidence, and it contributes to the overall performance of the proposed model.

    Other Datasets: In this part, the Caltech-101 dataset [39] is used to further evaluate the effectiveness of the proposed model. This dataset is a challenging benchmark for object recognition, consisting of 101 object categories plus one background category and totaling approximately 9146 images.

    The experimental results achieved by the different comparative models on the Caltech-101 dataset are listed in Table 9. Note that the learning rate, batch size, and the size of the weight matrix in the FRMap layer of the proposed DGMVCL are configured as 0.005, 50, and 49 × 25, respectively. According to Table 9, the clustering accuracy of our proposed DGMVCL is 8.78%, 1.62%, and 3.55% higher than that of DSIMVC, DCP, and CVCL, respectively. Additionally, under the other two validation metrics, i.e., NMI and purity, our method remains the best performer. This demonstrates that the suggested Grassmannian manifold-valued deep contrastive learning mechanism can learn compact and discriminative geometric features for MVC, even in complicated data scenarios.

    Table 9.  Clustering performance (%) comparison on the Caltech-101 dataset.
    Methods ACC NMI Purity
    DSIMVC [63] 20.25 31.43 23.68
    DCP [64] 27.41 39.58 37.01
    CVCL [55] 25.48 37.67 36.63
    DGMVCL 29.03 45.81 40.02


    In this paper, a novel framework, called DGMVCL, is proposed to learn view-invariant representations for multiview clustering (MVC). Considering the submanifold structure of channel features, a Grassmannian neural network is constructed to characterize and learn the subspace data more faithfully and effectively. In addition, the contrastive learning mechanism built upon both the Grassmannian manifold and Euclidean space yields more discriminative cluster assignments. Extensive experiments and ablation studies conducted on five MVC datasets not only demonstrate the superiority of our proposed method over state-of-the-art methods, but also confirm the usefulness of each designed component.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported in part by the National Natural Science Foundation of China (62306127, 62020106012, 62332008, U1836218), the Natural Science Foundation of Jiangsu Province (BK20231040), the Fundamental Research Funds for the Central Universities (JUSRP124015), the Key Project of Wuxi Municipal Health Commission (Z202318), and the National Key R & D Program of China (2023YFF1105102, 2023YFF1105105).

    The authors declare no conflict of interest.
