


In pattern recognition, cluster analysis is the formal study of methods and algorithms for dividing data into meaningful and useful clusters (groups) based on intrinsic similarity measures [1]. It is a crucial intermediate step in exploring the potential structure of massive data for subsequent analysis. In the field of internet applications, clustering is used to identify users with different interest characteristics, thereby enabling personalized service recommendation [2,3]. In the field of biology, clustering can be applied to discover patterns in genes, helping to identify subtypes of certain diseases (e.g., cancers) [4,5,6].

Nonnegative matrix factorization (NMF) models and algorithms have been widely studied as important tools for clustering. The NMF model finds wide application in various domains owing to its parts-based representation and physical interpretability. In [7,8], NMF was used to decompose an original nonnegative high-dimensional matrix X into the product of two low-dimensional nonnegative matrices, WH, ensuring that X and WH are sufficiently close under some metric. Usually, W is referred to as the basis matrix and H as the coefficient matrix. Every element of W and H is nonnegative, and their common inner dimension is much smaller than the dimensions of X, so this factorization is called a dimension-reduction decomposition. It effectively saves storage and computation, improving efficiency. In clustering analysis, the coefficient matrix H is used as a surrogate for the high-dimensional original matrix X; since H has far lower dimension than X, clustering with H saves a great deal of computation and space.

Lee and Seung [9] advanced NMF by introducing a simple algorithm based on multiplicative update rules (MUR) and demonstrating its effectiveness in learning parts of objects and discovering semantic features in text. Additionally, NMF has been successfully extended and applied in various fields, including clustering and classification [10,11,12], image analysis [13,14,15], spectral analysis [16,17], and blind source separation [18,19]. Moreover, NMF has brought a series of new theories and algorithms to clustering problems. Numerous studies have shown that by introducing different constraints, such as sparsity [12,20,21], smoothness [16,22,23], and orthogonality [24,25,26], the performance of standard NMF can usually be further improved, because these constraints bring useful properties beyond nonnegativity into the standard NMF model. In fact, many datasets do not follow linear distributions but instead lie in nonlinear spaces such as manifolds. Handling them effectively requires nonlinear strategies, for example constructing a Laplacian matrix [27,28] to preserve the nonlinear geometric structure among data points. This approach is known as manifold learning, which assumes that high-dimensional data admits a nonlinear mapping into a low-dimensional space.

Cai et al. effectively combined manifold learning with NMF by introducing a graph regularization term, yielding the graph regularized nonnegative matrix factorization (GNMF [29]) method. This algorithm aims to establish smooth connections between data representations, encode geometric information between data in linear and nonlinear spaces, and find the representation of the data on manifolds. The advantage of this method is that it not only preserves the inherent geometric structure, but also obtains a compact representation of the hidden semantic information. Building on this, Li et al. [30] present a graph-based local concept coordinate factorization (GLCF) method, which respects the intrinsic structure of the data through manifold kernel learning in a warped reproducing kernel Hilbert space. Hu et al. [31] combined the GNMF algorithm with the convex nonnegative matrix factorization (CNMF) algorithm and proposed the graph convex nonnegative matrix factorization (GCNMF) algorithm with manifold regularization terms; Meng et al. [32] introduced graph regularization terms in the sample space and the feature space, and proposed the dual graph sparse nonnegative matrix factorization (DSNMF) algorithm. In addition, algorithms such as graph regularized discriminative nonnegative matrix factorization (GDNMF) [33], dual regularization nonnegative matrix factorization (DNMF) [34], incomplete multi-view clustering (IMVC) [35], collaborative topological graph learning (CTGL) [36], multi-view deep nonnegative matrix factorization with multi-kernel learning (MvMKDNMF) [37], nonnegative matrix factorization with graph regularization for multi-view data representation (GMultiNMF) [38], and pairwise constrained graph regularized convex nonnegative matrix factorization (PGCNMF) [39] apply constraints to the data through manifold learning, construct different Laplacian regularization terms, and provide many basic methods for data analysis through the idea of manifold learning.

Recently, orthogonal nonnegative matrix factorization (ONMF [24]) has received widespread attention: by imposing orthogonality constraints on NMF, it becomes more suitable for clustering tasks, since the constrained factor matrix can be viewed as a cluster indicator matrix. This is because parts-based representation requires the basis vectors of the factor matrices to differ from each other, which means that parts-based representation is closely related to orthogonality.

The orthogonality of a given matrix A refers to A^TA = I or AA^T = I (where I is an identity matrix of appropriate size), which means that any two different columns of A, or any two different rows of A, are orthogonal.

In [20], Li et al. found that for some datasets, parts-based representation involves clustering the rows of the input data matrix, and the clustering result is described by the basis matrix W; the orthogonality constraint on W is then column orthogonality, W^TW = I. In document clustering [40,41], clustering is performed on the columns of the input data matrix and is represented by the coefficient matrix H, and orthogonality among the rows of H can improve clustering performance; the orthogonality constraint is then on the coefficient matrix H, satisfying HH^T = I. Both categories of orthogonality models have greatly improved clustering performance [42,43,44]. Owing to the excellent performance of ONMF in clustering, many update rules and variants of ONMF have been proposed [45,46,47].

    In fact, ONMF was initially proposed by Ding [24], and empirical evidence demonstrated that introducing a certain degree of orthogonality constraint into the algorithm can enhance sparsity, reduce redundancy in feature vectors, and ensure the uniqueness of the solution and interpretability of clustering results. In their method, orthogonality is achieved by solving an optimization problem with orthogonality constraints.

There, ONMF was interpreted as a continuous relaxation (without discrete constraints) of the K-means method. Research on various data mining tasks has shown that the ONMF model performs better than clustering methods based on K-means and plain NMF. However, enforcing the orthogonality of the basis vectors requires intensive computation, especially for word clustering on document matrices, which results in high computational costs. In 2010, Li et al. proposed nonnegative matrix factorization on orthogonal subspaces (NMFOS) [48], introducing a penalty function to incorporate the orthogonality constraint into the objective function. Adding orthogonality as a penalty term reduces the computational burden and, to some extent, improves the sparsity of the decomposition matrices (W or H). However, this model has two issues. First, when applied to certain document datasets, the algorithm may exhibit the zero-locking phenomenon. Second, the convergence proof given there is not rigorous (owing to an incorrect matrix differentiation step); that is, it fails to establish that the iterative algorithm actually converges.

Overall, ONMF models have shown good performance, but they still face two challenges: significant computational cost when orthogonality is imposed as a constraint, and the difficulty, or absence, of a convergence proof when the orthogonality W^TW = I or HH^T = I is introduced as a penalty term. Based on these observations, this paper studies a new ONMF-based model, graph-regularized nonnegative matrix factorization on an orthogonal subspace with an auxiliary variable (GNMFOSV). By adding an orthogonal penalty term and a graph Laplacian term, we achieve orthogonality of the factor matrix through the factorization itself, enhancing the sparsity of the coefficient matrix to some extent. Leveraging the geometric structure of the data, we enhance the algorithm's ability to recover the underlying nonlinear relationships in the data. More importantly, we introduce an intermediate variable V that is updated alternately via an auxiliary function, and we prove the convergence of our algorithm using the idea of alternating updates of the three variables. Meanwhile, experiments on a large number of datasets show that the algorithm greatly improves clustering performance and is robust to the choice of dataset; in other words, experimental results on many datasets show that our algorithm maintains overall optimality.

    The main contributions of this paper are as follows:

    The ONMF method obtains sparser results through orthogonal constraints. However, in some complex practical application scenarios, it is still difficult to interpret the physical meanings or practical implications of the basis vectors and coefficient matrices obtained after decomposition, which is not conducive to a deep understanding of the internal structure and features of the data. At the same time, it is also very difficult to prove the convergence of the algorithm. The GNMF method preserves the inherent geometric structure of the data and also acquires a compact representation of the hidden semantic information. Nevertheless, this method overemphasizes local information and relatively ignores the global structure and dependency relationships of the data. In this paper, in order to improve the interpretability of ONMF and the theoretical convergence of the algorithm, and also to retain the global structure and dependency relationships of the data in the GNMF model, the two methods are integrated, and an auxiliary function is ingeniously added, resulting in a rigorous convergence proof. The ablation experiments also fully demonstrate that integrating the graph regularization method, orthogonal constraints, and the auxiliary variable mechanism can significantly improve the clustering accuracy and stability, outperforming single methods on multiple datasets, which fully verifies the rationality and innovation of the model design.

In our algorithm, different levels of orthogonality can be achieved by adjusting the parameters \alpha_1 and \alpha_2, and various regularization strengths can be obtained by tuning the parameter \lambda, making the algorithm more flexible and applicable to a broader range of scenarios.

    Extensive experiments were conducted on 13 datasets, demonstrating the algorithm's effectiveness in terms of clustering performance. The results not only highlight its efficiency in learning and enhanced discriminative power but also indicate its ability to acquire improved partial linear representations.

    The remaining content of this paper is as follows: Section 2 reviews related work, including NMF, NMFOS, and GNMF. Section 3 provides a detailed introduction to our method, including the derivation of update rules and the corresponding algorithm, along with a proof of the algorithm's convergence. Section 4 presents experiments and analysis. Section 5 offers conclusions and future directions.

In this section, the symbols and notations used throughout the paper are summarized in Table 1 below, followed by the classic NMF, NMFOS, and GNMF models and their multiplicative update algorithms.

    Table 1.  The symbols used throughout the paper.
Symbols Annotations
\mathbb{N} The set of all natural numbers
R_{+}^{\upsilon\times\omega} The set of all nonnegative \upsilon\times\omega matrices, \upsilon, \omega\in\mathbb{N}
X The input nonnegative data matrix X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}
W The basis matrix W = [W_{ik}]\in R_{+}^{M\times K}
H The coefficient matrix H = [H_{kj}]\in R_{+}^{K\times N}
V The auxiliary matrix V = [V_{kj}]\in R_{+}^{K\times N}
M The number of data features (the data dimension)
N The number of data instances (data points)
K The number of data clusters
i Index in the set \{1, 2, \cdots, M\}
j Index in the set \{1, 2, \cdots, N\}
k Index in the set \{1, 2, \cdots, K\}
\|\cdot\|_F The Frobenius norm
Tr(\cdot) The trace operator
L The graph Laplacian of the data manifold
S The weight matrix
D The diagonal degree matrix with D_{ii} = \sum_j S_{ij}
p The number of nearest neighbors
I The identity matrix
\alpha_1, \alpha_2 The penalty parameters controlling orthogonality, \alpha_1\geq0, \alpha_2\geq0
\lambda The regularization parameter, \lambda\geq0
\psi, \phi, \theta Lagrange multipliers


NMF [9] is a classic matrix factorization algorithm that requires all elements of the factor matrices to be nonnegative. Consider the objective matrix X = [x_1, \cdots, x_N]\in R_{+}^{M\times N} and treat each x_j of X as a column vector. Nonnegative matrix factorization of X refers to finding two matrices W and H (W = [w_{ik}]\in R_{+}^{M\times K}, H = [h_{kj}]\in R_{+}^{K\times N}, i = 1, 2, \cdots, M, j = 1, 2, \cdots, N, and k = 1, 2, \cdots, K), such that the matrix product WH is as close as possible to X under a certain metric:

\begin{equation} X \approx WH. \end{equation} (2.1)

    Generally speaking, W is called the basis matrix and H is called the coefficient matrix. This is because (2.1) tells us that each column vector xj of X can be linearly represented using the column vectors of W in this factorization, where the coefficients come from matrix H. The specific representation is as follows:

\begin{equation} x_j \approx \sum\limits_{k = 1}^{K} w_k h_{kj}, \end{equation} (2.2)

where w_k denotes the k-th column vector of W, and K in (2.2) is much smaller than M and N, or satisfies (M+N)K < MN. It is precisely because of (2.2) that the factor W is called the basis matrix and H the coefficient matrix. Afterward, the matrix H is used in clustering applications, playing a role similar to the indicator matrix in K-means. Meanwhile, since K is much smaller than M and N, the factorization is a dimensionality reduction, so clustering with H (rather than X) greatly saves time and space. The mathematical model of NMF is

\begin{equation} \min\limits_{W,H}\|X-WH\|_F^2, \quad s.t.\; W\geq0,\; H\geq0, \end{equation} (2.3)

where W = [w_1, \cdots, w_K]\in R_{+}^{M\times K} and H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}. In addition, we adopt the iterative multiplicative update algorithm of Lee and Seung [50]:

\begin{equation*} W_{ik}\leftarrow W_{ik}\frac{(XH^T)_{ik}}{(WHH^T)_{ik}}, \qquad H_{kj}\leftarrow H_{kj}\frac{(W^TX)_{kj}}{(W^TWH)_{kj}}. \end{equation*}
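For concreteness, the following minimal NumPy sketch (our illustration, not code from the paper) iterates these two rules; the random initialization, the iteration count, and the small constant eps that guards the divisions are implementation choices rather than part of the original derivation.

```python
import numpy as np

def nmf_mur(X, K, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF with Frobenius loss, using the multiplicative update rules above."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K))              # random nonnegative initialization (illustrative)
    H = rng.random((K, N))
    for _ in range(n_iter):
        # W_ik <- W_ik * (X H^T)_ik / (W H H^T)_ik
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # H_kj <- H_kj * (W^T X)_kj / (W^T W H)_kj
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H
```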

Ding et al. [24] extended the traditional NMF by incorporating an orthogonality constraint on the basis or coefficient matrix into the objective function's constraints; this model is the important ONMF model. More importantly, the equivalence between ONMF and kernel K-means was also proved there. Li et al. [48] found that the Lagrange multiplier used in the nonlinear constrained optimization of ONMF is a symmetric matrix that must be updated at every iteration, resulting in a large computational workload. To overcome this limitation, they proposed a novel approach called NMFOS (using penalty terms), leading to the following improved optimization model:

\begin{equation} \min\limits_{W,H}\|X-WH\|_F^2+\alpha\|HH^T-I\|_F^2, \quad s.t.\; W\geq0,\; H\geq0, \end{equation} (2.4)

where X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}, W = [w_1, \cdots, w_K]\in R_{+}^{M\times K}, H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}, and I is the identity matrix with K rows and K columns. The nonnegative parameter \alpha\geq0 controls the degree of orthogonality among the rows of H. When \alpha = 0, NMFOS reduces to NMF. From model (2.4), the following update rules were derived:

\begin{equation} W_{ik}\leftarrow W_{ik}\frac{(XH^T)_{ik}}{(WHH^T)_{ik}}, \qquad H_{kj}\leftarrow H_{kj}\frac{(W^TX+2\alpha H)_{kj}}{(W^TWH+2\alpha HH^TH)_{kj}}. \end{equation} (2.5)

The advantage of this algorithm is that, by adjusting the penalty parameter \alpha, orthogonality can be promoted during the decomposition without additional constraints, while the computational complexity is reduced. It should also be noted that Mirzal [51] proved the convergence of the update rules in (2.5).
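An NMFOS-style sketch differs from the plain NMF loop only in the H update of (2.5); the value of alpha below is illustrative, and eps again guards the divisions.

```python
import numpy as np

def nmfos_mur(X, K, alpha=0.5, n_iter=200, eps=1e-10, seed=0):
    """NMF plus the orthogonality penalty alpha*||HH^T - I||_F^2, per the rules in (2.5)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K))
    H = rng.random((K, N))
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # H_kj <- H_kj * (W^T X + 2*alpha*H)_kj / (W^T W H + 2*alpha*H H^T H)_kj
        H *= (W.T @ X + 2 * alpha * H) / (W.T @ W @ H + 2 * alpha * H @ H.T @ H + eps)
    return W, H
```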

The extraction of the internal structure of data is an important application extension of NMF models. Graph regularization is a manifold-based data processing method widely used to analyze the intrinsic geometric structure of manifold data [49,52,53]. The starting point of this idea is that minimizing the quantity in (2.6) below enhances the smoothness of the data representation in both linear and nonlinear spaces, and by extracting geometric structure information from data points and features, it can also enhance the clustering performance of NMF (see details in [34,54,55]). The NMF model built on this minimization is the classic GNMF model [29].

Next, we briefly introduce the construction of GNMF. To effectively capture the geometric structure of the data and feature manifolds, a p-nearest-neighbor graph is constructed from the input data matrix X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}. Consider a graph with N vertices, one for each data point x_j (j = 1, 2, \cdots, N). For each data point x_j, we identify its p nearest neighbors and connect them with edges (in principle p can take any value in \{1, 2, \cdots, N\}, but in practice it is usually between 3 and 5). At the same time, a weight matrix S is defined to measure the degree to which adjacent points are similar, both in distance and in geometric structure, thereby strengthening the influence of geometric structure on the model. There are many options for defining the weight matrix S = \{S_{jl}\}\in R_{+}^{N\times N} on the graph; the three commonly used ones are given below, followed by a small construction sketch.

0–1 Weight Matrix: If N(x_j) denotes the set of p nearest neighbors of data point x_j, then

\begin{equation*} S_{jl} = \begin{cases} 1, & x_j\in N(x_l)\ \text{or}\ x_l\in N(x_j), \\ 0, & \text{otherwise}, \end{cases} \qquad j, l\in\{1, \cdots, N\}. \end{equation*}

Heat Kernel Weight Matrix: If point x_j is connected to point x_l, then

\begin{equation*} S_{jl} = e^{-\frac{\|x_j-x_l\|^2}{\sigma}}. \end{equation*}

    Dot-Product Weight Matrix: If point xj is connected to point xl, then,

\begin{equation*} S_{jl} = x_j^Tx_l. \end{equation*}

It is worth noting that if each data point x is normalized to unit length, the dot product of two vectors equals their cosine similarity.
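As a hedged illustration (the column-wise data layout and the helper name knn_graph are our own choices, not the paper's), the sketch below builds a symmetric p-nearest-neighbor weight matrix S for the three options above; it is reused in the later sketches.

```python
import numpy as np

def knn_graph(X, p=3, mode="binary", sigma=1.0):
    """Symmetric p-nearest-neighbor weight matrix S (N x N) from the data columns of X (M x N).

    mode: "binary" for 0-1 weights, "heat" for the heat kernel, "dot" for dot-product weights.
    """
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2 * (X.T @ X)   # pairwise squared Euclidean distances
    np.fill_diagonal(dist2, np.inf)                      # a point is not its own neighbor
    S = np.zeros((N, N))
    for j in range(N):
        for l in np.argsort(dist2[j])[:p]:               # the p nearest neighbors of x_j
            if mode == "binary":
                w = 1.0
            elif mode == "heat":
                w = np.exp(-dist2[j, l] / sigma)
            else:                                        # dot-product weights
                w = X[:, j] @ X[:, l]
            S[j, l] = S[l, j] = w                        # edge if either point is a neighbor of the other
    return S
```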

With the help of the weight matrix S, the Laplacian matrix L\in R^{N\times N} is defined to characterize the degree of similarity between points in geometric structure: L = D-S, where D\in R_{+}^{N\times N} is the degree matrix with D_{ii} = \sum_j S_{ij}; note that D is a diagonal matrix. Inspired by matrix factorization and manifold learning, Cai et al. [29] proposed the GNMF algorithm. The core graph regularization term is as follows:

\begin{equation} \frac{1}{2}\sum\limits_{l = 1}^{N}\sum\limits_{j = 1}^{N}\|h_l-h_j\|_2^2\, S_{lj} = \sum\limits_{j = 1}^{N} D_{jj}h_j^Th_j-\sum\limits_{l = 1}^{N}\sum\limits_{j = 1}^{N}S_{lj}h_l^Th_j = Tr(HDH^T)-Tr(HSH^T) = Tr(HLH^T), \end{equation} (2.6)

where Tr(\cdot) denotes the trace operator, and (2.6) measures the pairwise similarity of the original data points. Minimizing (2.6) yields an intuitive result: if two data points are close in terms of the intrinsic geometric structure of the original data distribution, their new representations h_l and h_j will also be close. The optimization model of GNMF is then given by:

\begin{equation} \min\limits_{W,H}\|X-WH\|_F^2+\lambda\, Tr(HLH^T), \quad s.t.\; W\geq0,\; H\geq0, \end{equation} (2.7)

where W = [w_1, \cdots, w_K]\in R_{+}^{M\times K} and H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}. The regularization parameter \lambda\geq0 controls the smoothness of the model. Using the Lagrange method, model (2.7) can be solved and the following iterative update rules are obtained:

\begin{equation} W_{ik}\leftarrow W_{ik}\frac{(XH^T)_{ik}}{(WHH^T)_{ik}}, \qquad H_{kj}\leftarrow H_{kj}\frac{(W^TX+\lambda HS)_{kj}}{(W^TWH+\lambda HD)_{kj}}. \end{equation} (2.8)

It is evident that when \lambda = 0, the update rules in (2.8) reduce to the NMF update rules. In this paper, such manifold techniques are combined with NMFOS to enhance the ability to capture the underlying nonlinear relationships in the data.
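Combining the graph helper sketched above with the rules in (2.8), a minimal GNMF loop might look as follows; lam, the iteration count, and eps are illustrative settings rather than values prescribed by [29].

```python
import numpy as np

def gnmf_mur(X, K, lam=100.0, p=3, n_iter=200, eps=1e-10, seed=0):
    """GNMF: Frobenius NMF plus lam * Tr(H L H^T) with L = D - S, updates as in (2.8)."""
    S = knn_graph(X, p=p, mode="binary")   # 0-1 p-nearest-neighbor weight matrix
    D = np.diag(S.sum(axis=1))             # diagonal degree matrix
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K))
    H = rng.random((K, N))
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # graph term contributes lam*H*S to the numerator and lam*H*D to the denominator
        H *= (W.T @ X + lam * H @ S) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H
```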

In this section, we introduce a new method called GNMFOSV. More importantly, we also provide a convergence proof for this type of model, remedying the lack of, or gaps in, existing convergence proofs. The construction proceeds as follows. First, following the idea of NMFOS, the orthogonality constraint on the coefficient matrix H is added to the objective function via a penalty function. Second, inspired by GNMF, the new model exploits the geometric structure inherent in the data and feature manifolds, turning the manifold term into an additional regularization term in the objective function for the subsequent clustering study. The experimental results show that combining orthogonality and graph regularization (the manifold term), each added to the objective function, not only accounts for both the nonnegativity and the orthogonality of the coefficient matrix, but also improves clustering performance via GNMFOSV.

    As mentioned above, we incorporate both the orthogonality constraint and the Laplacian constraint into the objective function to construct GNMFOSV:

\begin{equation} \min J(W,H) = \min\limits_{W,H}\frac{1}{2}\|X-WH\|_F^2+\frac{\alpha}{4}\|HH^T-I\|_F^2+\frac{\lambda}{2}Tr(HLH^T), \quad s.t.\; W\geq0,\; H\geq0, \end{equation} (3.1)

where X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}, W = [w_1, \cdots, w_K]\in R_{+}^{M\times K}, H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}, L\in R^{N\times N}, and I is the identity matrix with K rows and K columns. Because a nonnegative matrix with pairwise orthogonal rows can have at most one nonzero entry in each column, strictly uncorrelated feature vectors can be generated. Thus, we incorporate an orthogonal penalty term, \alpha\|HH^T-I\|_F^2, into the foundational model. However, directly imposing the orthogonality constraint on the coefficient matrix poses challenges for the convergence proof. Therefore, we introduce an auxiliary variable, denoted V\in R_{+}^{K\times N}, with the same dimensions as H. Subsequently, two new parameters, \alpha_1 and \alpha_2, are introduced to split the orthogonal penalty term \alpha\|HH^T-I\|_F^2 into two second-order terms, replacing it by \frac{\alpha_1}{2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2; note that when V = H, the second of these terms vanishes and the first reduces to \frac{\alpha_1}{2}\|I-HH^T\|_F^2, which coincides with the original penalty for \alpha = \frac{\alpha_1}{2}. That is,

\begin{equation} \alpha\|HH^T-I\|_F^2 = \frac{\alpha_1}{2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2. \end{equation} (3.2)

    Therefore, we equivalently replace model (3.1) with the following new model:

\begin{equation} \min J(W,V,H) = \min\limits_{W,H,V}\frac{1}{2}\|X-WH\|_F^2+\frac{\lambda}{2}Tr(HLH^T)+\frac{\alpha_1}{2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2, \quad s.t.\; W\geq0,\; H\geq0,\; V\geq0. \end{equation} (3.3)

This formulation incorporates the orthogonality constraint, via the auxiliary variable, as a penalty term in the objective function (see Figure 1). The advantage of introducing the auxiliary variable lies in its ability to deconstruct the complex orthogonality constraint found in NMFOS, thereby addressing the common difficulty of proving convergence encountered in traditional orthogonal-subspace NMF methods, and facilitating rapid convergence and favorable comparative performance in our experiments.

    Figure 1.  The framework of GNMFOSV.

Here, \alpha_1\geq0 and \alpha_2\geq0 are the two orthogonal penalty parameters, and \lambda\geq0 is the regularization parameter. The first two adjust the level of orthogonality in H and ensure the validity of Eq (3.2), while the latter balances the reconstruction error in the first term against the graph regularization term of the objective function. For convenience, we employ the 0–1 weight scheme to construct the p-nearest-neighbor graph. Clearly, when \alpha_1 = \alpha_2 = \lambda = 0, model (3.3) degenerates into model (2.3). Since the objective function in model (3.3) is non-convex with respect to W and H jointly, obtaining closed-form solutions is challenging, so we need to develop an effective algorithm to solve this model. We adopt an iterative optimization algorithm and utilize multiplicative update rules to obtain local optima (a similar idea can be found in [56,57,58], though not identical). We can rewrite the objective function J(W,V,H) in trace form as follows:

\begin{equation} \begin{split} J(W,V,H) & = \frac{1}{2}Tr((X-WH)^T(X-WH))+\frac{\lambda}{2}Tr(HLH^T)+\frac{\alpha_1}{2}Tr((I-HV^T)^T(I-HV^T))+\frac{\alpha_2}{2}Tr((V-H)^T(V-H)) \\ & = \frac{1}{2}Tr(X^TX-2X^TWH+H^TW^TWH)+\frac{\lambda}{2}Tr(HLH^T)+\frac{\alpha_1}{2}Tr(I-2HV^T+VH^THV^T)+\frac{\alpha_2}{2}Tr(V^TV-2V^TH+H^TH). \end{split} \end{equation} (3.4)

Let \psi, \phi, and \theta be the Lagrange multipliers associated with the constraints W\geq0, H\geq0, and V\geq0 in the constrained model (3.3). Define the matrices \Psi = [\psi_{ik}], \Phi = [\phi_{kj}], and \Theta = [\theta_{kj}]. The Lagrangian function for the constrained optimization problem can then be expressed as:

\begin{equation*} \begin{split} \mathcal{L} & = J(W,V,H)-Tr(\Psi^TW)-Tr(\Phi^TH)-Tr(\Theta^TV) \\ & = \frac{1}{2}Tr(X^TX-2X^TWH+H^TW^TWH)+\frac{\lambda}{2}Tr(HLH^T)+\frac{\alpha_1}{2}Tr(I-2HV^T+VH^THV^T)+\frac{\alpha_2}{2}Tr(V^TV-2V^TH+H^TH)-Tr(\Psi^TW)-Tr(\Phi^TH)-Tr(\Theta^TV). \end{split} \end{equation*}

Now, we obtain the following equations by taking the derivatives of the Lagrangian function \mathcal{L} with respect to W, V, and H.

\begin{equation} \frac{\partial \mathcal{L}}{\partial W} = WHH^T-XH^T-\Psi, \end{equation} (3.5)
\begin{equation} \frac{\partial \mathcal{L}}{\partial V} = \alpha_1(VH^TH-H)+\alpha_2(V-H)-\Theta, \end{equation} (3.6)

    and

\begin{equation} \frac{\partial \mathcal{L}}{\partial H} = W^TWH-W^TX+\lambda HD-\lambda HS+\alpha_1(HV^TV-V)+\alpha_2(H-V)-\Phi. \end{equation} (3.7)

    By setting Eqs (3.5)–(3.7) to zero, we can derive the expressions for the Lagrange multipliers:

\begin{equation} \Psi = WHH^T-XH^T, \end{equation} (3.8)
\begin{equation} \Theta = \alpha_1(VH^TH-H)+\alpha_2(V-H), \end{equation} (3.9)

    and

\begin{equation} \Phi = W^TWH-W^TX+\lambda HD-\lambda HS+\alpha_1(HV^TV-V)+\alpha_2(H-V). \end{equation} (3.10)

Furthermore, multiplying both sides of Eqs (3.8)–(3.10) by the factors W_{ik}, V_{kj}, and H_{kj}, respectively, and using the Karush-Kuhn-Tucker (KKT) conditions \psi_{ik}W_{ik} = 0, \theta_{kj}V_{kj} = 0, and \phi_{kj}H_{kj} = 0, we obtain the following three equations:

\begin{equation} \psi_{ik}W_{ik} = [WHH^T-XH^T]_{ik}W_{ik} = 0, \end{equation} (3.11)
\begin{equation} \theta_{kj}V_{kj} = [\alpha_1(VH^TH-H)+\alpha_2(V-H)]_{kj}V_{kj} = 0, \end{equation} (3.12)
\begin{equation} \phi_{kj}H_{kj} = [W^TWH-W^TX+\lambda HD-\lambda HS+\alpha_1(HV^TV-V)+\alpha_2(H-V)]_{kj}H_{kj} = 0. \end{equation} (3.13)

    Therefore, based on Eqs (3.11)–(3.13), we derive the iterative formulas for updating W, V, and H in the optimization of the model (3.3):

\begin{equation} W_{ik}\leftarrow W_{ik}\frac{[XH^T]_{ik}}{[WHH^T]_{ik}}, \end{equation} (3.14)
\begin{equation} V_{kj}\leftarrow V_{kj}\frac{[(\alpha_1+\alpha_2)H]_{kj}}{[\alpha_1VH^TH+\alpha_2V]_{kj}}, \end{equation} (3.15)
\begin{equation} H_{kj}\leftarrow H_{kj}\frac{[W^TX+\lambda HS+(\alpha_1+\alpha_2)V]_{kj}}{[W^TWH+\lambda HD+\alpha_1HV^TV+\alpha_2H]_{kj}}. \end{equation} (3.16)

We summarize the resulting procedure as Algorithm 1. The proof of the convergence of Algorithm 1 can be found in the Supplementary material.

    Algorithm 1 GNMFOSV
Require: Original data matrix X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}, the number of clusters K, orthogonality parameters \alpha_1\geq0, \alpha_2\geq0, regularization parameter \lambda\geq0, and the maximum number of iterations T;
Ensure: Locally optimal basis matrix W and coefficient matrix H
1: Initialization: randomly choose initial matrices W^0\in R_{+}^{M\times K}, V^0\in R_{+}^{K\times N}, and H^0\in R_{+}^{K\times N};
2: Start updating, t = 1, \cdots, T;
3: Update W according to Eq (3.14);
4: Update V according to Eq (3.15);
5: Update H according to Eq (3.16);
6: t = t+1;
7: If the termination conditions are not satisfied and t\leq T, return to step 3; otherwise terminate.
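As a hedged illustration of Algorithm 1, the sketch below implements the multiplicative updates (3.14)–(3.16) in NumPy; knn_graph is the helper sketched in Section 2, and the stopping test (relative change of the objective (3.3)) is an illustrative choice rather than the paper's exact termination condition.

```python
import numpy as np

def gnmfosv(X, K, alpha1=0.01, alpha2=1000.0, lam=100.0, p=3,
            max_iter=100, tol=1e-6, eps=1e-10, seed=0):
    """GNMFOSV: alternating multiplicative updates of W, V, H per (3.14)-(3.16)."""
    S = knn_graph(X, p=p, mode="binary")        # 0-1 p-nearest-neighbor weight matrix
    D = np.diag(S.sum(axis=1))                  # degree matrix; L = D - S
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K))
    V = rng.random((K, N))
    H = rng.random((K, N))
    prev = np.inf
    for _ in range(max_iter):
        # (3.14)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # (3.15)
        V *= ((alpha1 + alpha2) * H) / (alpha1 * V @ H.T @ H + alpha2 * V + eps)
        # (3.16)
        H *= (W.T @ X + lam * H @ S + (alpha1 + alpha2) * V) / (
            W.T @ W @ H + lam * H @ D + alpha1 * H @ V.T @ V + alpha2 * H + eps)
        # objective (3.3), used here only for the illustrative stopping test
        obj = (0.5 * np.linalg.norm(X - W @ H, "fro") ** 2
               + 0.5 * lam * np.trace(H @ (D - S) @ H.T)
               + 0.5 * alpha1 * np.linalg.norm(np.eye(K) - H @ V.T, "fro") ** 2
               + 0.5 * alpha2 * np.linalg.norm(V - H, "fro") ** 2)
        if abs(prev - obj) < tol * max(1.0, abs(prev)):
            break
        prev = obj
    return W, H

# Cluster labels are then read off column-wise from H, the usual use of an
# indicator-like coefficient matrix: label_j = argmax_k H[k, j].
```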

In this subsection, we analyze the computational complexity of Algorithm 1, that is, the amount of computation required to run the algorithm. The notation O(n) indicates a cost of the same order as n. Recall that M denotes the number of features, N the number of samples, K the number of clusters, and p the number of nearest neighbors. The computational complexity of XH^T, WHH^T, W^TX, and W^TWH is O(MNK) each. Moreover, the complexity of HD and HS is O(N^2K) each, while the complexity of VH^TH and HV^TV is O(NK^2). Besides, an additional O(N^2M) is needed to construct the p-nearest-neighbor graph. In conclusion, the computational complexity of each iteration of Algorithm 1 is O(MNK+N^2M+N^2K+NK^2). During the experiments, we found that the differences in running time among the compared algorithms on our machine were not significant, so timing data are not reported.

In this section, through extensive experiments, we verify and study the performance of our algorithm in several aspects, including parameter selection analysis, clustering comparison experiments, and convergence experiments. To this end, we compare results on the following datasets.

    To verify the performance superiority of the proposed model, we conduct comprehensive experiments on 13 real-world datasets involving face images, digit images, and documents.

    ORL*: Olivetti research laboratory dataset(ORL). This dataset contains 400 face images collected from 40 individuals and each subject has 10 grayscale images with size 32*32.

    *https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

    AR[59]: Augmented reality dataset(AR). This dataset contains images of 65 men and 55 women, with each person contributing 14 images, resulting in a total of 1680 images. These images are resized to 32*32 pixels.

    Yale: This dataset contains 165 grayscale images collected from 15 individuals. Each individual has 11 images with different facial expressions or configurations: with or without glasses, left-light, center-light, right-light, normal, happy, sad, sleepy and surprised. All image sizes are 32*32.

    http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html

    Handwritten digit: The dataset consists of 2000 samples collected from 10 numeric characters (0–9).

    https://lig-membres.imag.fr/grimal/data.html

    UCI digit§: This dataset consists of features of handwritten numerals (0–9) from University of California, Irvine(UCI) machine learning repository. They contain 2000 instances belonging to 10 classes.

    §http://archive.ics.uci.edu/dataset/72/multiple+features

Diag-Bcw: Breast Cancer Wisconsin (denoted as Diag-Bcw) dataset, where the features are computed from a digitized image of a fine needle aspirate of a breast mass. It contains 569 documents in 2 categories.

    https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic

    Caltech101 [60]: Caltech101 contains 9144 pictures of objects belonging to 101 categories, including motorbike, airplane, cannon, etc. There are about 40 to 800 images per category but most categories have about 50 images.

    Isolet||: This dataset was generated as follows: 150 subjects spoke the name of each letter of the alphabet twice. The speakers are grouped into sets of 30 speakers each, and are referred to as isolet1, isolet2, isolet3, isolet4, and isolet5. We select the subset isolet3 in this experiment. It contains 1560 samples in 26 categories.

    ||http://archive.ics.uci.edu/dataset/54/isolet

    Vote**: Congressional Voting Records (denoted as Vote) are derived from vote collections for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the congressional quarterly almanac(CQA). This dataset contains 435 documents in 2 categories.

    **https://archive.ics.uci.edu/dataset/105/congressional+voting+records

    Abalone††: This dataset corresponds to the physical measurements of abalones and was collected from Marine Resources Division. It contains 2282 documents in 4 categories.

    ††https://archive.ics.uci.edu/dataset/1/abalone

A summary of the statistical characteristics of the above datasets is provided in Table 2.

    Table 2.  Statistics characteristics of the datasets.
    Datasets Samples (N) Features (M) Classes (C) Data Types
    ORL 400 1024 40 Face image
    AR 1680 1024 120 Face image
    Yale 165 1024 15 Face image
    Handwritten digit 2000 649 10 Digit image
    UCI-fac 2000 216 10 Digit image
    UCI-fou 2000 76 10 Digit image
    UCI-kar 2000 64 10 Digit image
    Diag-Bcw 569 30 2 Digit image
    Caltech7 1474 3766 7 Object image
    Caltech20 2386 3766 20 Object image
    isolet3 1560 617 26 Spoken document
    Vote 435 16 2 Document
    Abalone 2282 8 4 Document


The proposed method is compared with the well-known K-means method and nine other popular NMF-based algorithms: NMF, ONMF, GNMF, sparse constrained nonnegative matrix factorization with matrix H (SNMF-H), sparse constrained nonnegative matrix factorization with matrix W (SNMF-W), RMNMF, KOGNMF, min-vol NMF, and NMFOS. As for the parameter setup of all compared methods, we adopt their default settings when feasible. For a fair comparison, we only consider algorithms based on the Frobenius norm. The description of each algorithm is as follows:

    K-means[61]: This algorithm operates on the original matrix without extracting or utilizing information contained in the original matrix.

    NMF[9]: The goal of this algorithm is to find two nonnegative matrices with significantly lower dimensions than the original matrix, while achieving strategies for features extraction and dimensionality reduction.

    ONMF[24]: Building upon NMF, an orthogonal constraint is introduced to the basis matrix (or coefficient matrix).

    GNMF[29]: By integrating graph theory and manifold assumptions with the original NMF, this algorithm preserves the intrinsic geometric structure of the data.

    SNMF-W and SNMF-H[62]: These two algorithms can generate sparse representations and represent the data as a linear combination of a small number of basis vectors.

RMNMF [63]: The robust manifold nonnegative matrix factorization (RMNMF) algorithm combines the \ell_{2,1}-norm with spectral clustering to enhance robustness to outliers and preserve data geometry, while an orthonormal constraint addresses the uniqueness of the solution.

    KOGNMF[64]: Kernel-based orthogonal graph regularized NMF (KOGNMF). By integrating graph constraints into the nonlinear NMF framework, this algorithm develops a kernel-based graph-regularized NMF.

    min-vol NMF[65]: The min-vol NMF algorithm incorporates volume regularization to address rank-deficient basis matrices, making it applicable to real-world problems such as multispectral image unmixing. It utilizes an alternating fast projected gradient method for optimization.

    NMFOS[48]: Building upon ONMF, this algorithm incorporates an orthogonal constraint on the coefficient matrix as a penalty term in the objective function.

To assess clustering performance, we adopt purity, normalized mutual information (NMI), accuracy (ACC), and the Rand index (RI) to compare clustering results; these metrics are widely used in the nonnegative learning literature.

    In this subsection, we will provide descriptions and definitions of these four indicators, as well as the selection of initial matrices and the stopping criteria for algorithms.

    Purity is used for measuring the extent to which each cluster contained data instances from primary one class, and it is computed by

\begin{equation} Purity = \frac{\sum_{i = 1}^{K}\max_{j}(n_{i}^{j})}{N}. \end{equation} (4.1)

The notation n_{i}^{j} represents the number of samples from the j-th input class that are assigned to the i-th cluster, and N denotes the total number of samples. The process assigns to each cluster the class whose samples appear most frequently within that cluster. The range of purity is [0,1], and a higher value implies better clustering performance.

    NMI is commonly employed in clustering to measure the similarity between two clustering results. Its range is [0,1], with higher values indicating greater similarity between the two clustering results [63]. NMI is defined by the following formula:

\begin{equation} NMI(C,C') = \frac{MI(C,C')}{\max(H(C),H(C'))}. \end{equation} (4.2)

Here, C and C' represent the set of true labels and the set of clustering labels, respectively, MI(\cdot,\cdot) denotes the mutual information, and H(\cdot) denotes the entropy function.

ACC is used to discover the one-to-one relationship between classes and clusters, and it measures the extent to which each cluster contains data instances from the corresponding class. The range of ACC is from 0 to 1, where a higher value indicates better clustering results [66]. The expression for ACC is defined as follows:

\begin{equation} ACC = \frac{\sum_{i = 1}^{N}\delta(c_{i}, \operatorname{map}(c_{j}'))}{N}, \end{equation} (4.3)

where N represents the total number of samples, and \operatorname{map}(\cdot) denotes the mapping of the computed clustering label c_{j}' to the true clustering label c_{i}. The definition of \delta is as follows:

\begin{equation} \delta(i,j) = \begin{cases} 1, & i = j, \\ 0, & \text{otherwise}. \end{cases} \end{equation} (4.4)

RI is a clustering evaluation indicator that measures the similarity between two clusterings. It determines the agreement between the pairwise relationships of data points in the true clustering and the obtained clustering. RI is calculated using the following formula:

    \begin{equation} RI = \frac{{a + b}}{{\binom{n}{2}}}, \end{equation} (4.5)

    where a represents the number of pairwise agreements between data points assigned to the same cluster in both the true and obtained clusterings, b represents the number of pairwise agreements between data points assigned to different clusters in both the true and obtained clusterings, and n denotes the total number of data points, with {\binom{n}{2}} being the combinatorial number.
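For reference, the sketch below (our illustration, not the authors' evaluation code) computes purity (4.1), ACC (4.3) with the Hungarian algorithm supplying map(·), and RI (4.5) from nonnegative integer label vectors; NMI can be obtained, for example, from scikit-learn's normalized_mutual_info_score with average_method="max" to match the normalization in (4.2).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment   # Hungarian algorithm for map() in ACC

def purity(y_true, y_pred):
    """Purity (4.1): each cluster votes for its most frequent true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    hits = sum(np.bincount(y_true[y_pred == c]).max() for c in np.unique(y_pred))
    return hits / len(y_true)

def accuracy(y_true, y_pred):
    """ACC (4.3): best one-to-one matching between cluster labels and true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cont = np.array([[np.sum((y_pred == c) & (y_true == k)) for k in classes]
                     for c in clusters])              # contingency (cluster x class) counts
    row, col = linear_sum_assignment(-cont)           # maximize total agreement
    return cont[row, col].sum() / len(y_true)

def rand_index(y_true, y_pred):
    """RI (4.5): fraction of sample pairs on which the two clusterings agree."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    same_true = y_true[:, None] == y_true[None, :]
    same_pred = y_pred[:, None] == y_pred[None, :]
    iu = np.triu_indices(len(y_true), k=1)            # each unordered pair counted once
    return np.mean(same_true[iu] == same_pred[iu])
```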

    By unifying (4.1)–(4.3) and (4.5) together, the definitions of all considered clustering indicators are given in Table 3.

    Table 3.  Definitions of all considered clustering metrics.
    Metric Definition Range
    Purity \sum_{i=1}^{K} \frac{\max _{j}\left(n_{i}^{j}\right)}{N} [0, 1]
    Normalized Mutual Information (NMI) \frac{M I\left(C, C^{\prime}\right)}{\max \left(H(C), H\left(C^{\prime}\right)\right)} [0, 1]
    Accuracy (ACC) \frac{\sum_{i=1}^{N} \delta\left(c_{i}, \operatorname{map}\left(c_{j}^{\prime}\right)\right)}{N} [0, 1]
    Rand Index(RI) \frac{{a + b}}{{\binom{n}{2}}} [0, 1]


In the clustering experiments, we set the number of clusters K equal to the number of categories C in the target dataset. Because most methods are sensitive to initial values, for experimental accuracy we ran all algorithms 10 times and report their mean and standard deviation. As for the stopping criterion, we simply set a maximum number of iterations T = 100.

    The experiments were conducted on a computing system operating under Windows 10 Home Edition, 64-bit (version 10.0, Build 19045). The hardware configuration comprises an Intel(R) Core(TM) i5-7300HQ CPU, functioning at a clock speed of 2.50 GHz with four cores, and is supplemented by 8 GB of RAM. The programming environment employed for the implementation of the algorithms is MATLAB R2015b.

    In this subsection, we will discuss the sensitivity of the parameters.

There are four parameters in our model: the regularization parameter \lambda , the orthogonal penalty parameters \alpha_1 and \alpha_2 , and the neighborhood size p . To reveal the impact of these terms on the performance of the model, we let \lambda , \alpha_1 , and \alpha_2 range over \left\{ 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\right\} . For the parameters \lambda , \alpha_1 , and \alpha_2 , we used NMI as the evaluation metric in a parameter sensitivity experiment on the ORL dataset. The number of iterations is empirically fixed at 100. As stated above, the 0–1 weighting is used for computing the weight graph in all experiments unless stated otherwise. Figure 2 plots the NMI on the ORL dataset for different parameters: the x-axis represents \alpha_2 , the y-axis \alpha_1 , and the z-axis NMI. As shown in Figure 2, algorithm performance changes as the three parameters vary. For example, the proposed GNMFOSV algorithm obtains superior performance when \lambda = 100 and \alpha_1 varies in \left\{ 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\right\} , and it achieves consistently good performance regardless of the value of \alpha_2 . A more specific analysis of Figure 2 follows.

Figure 2.  Parameter sensitivity on the ORL dataset, where the NMI of the proposed method is reported while fixing the neighborhood size p and the number of clusters K .

    First, a small \lambda and a large \alpha_1 may deteriorate the clustering performance. For example, from Figure 2(c), GNMFOSV performs poorly when \alpha_1 is greater than 100. However, when the value of \alpha_1 is in a smaller range, the performance of GNMFOSV is very stable and has significant advantages. So, the value of \alpha_1 will be set to \left\{ 10^{-2}, 10^{-1}, 10^{0}, 10^{1}\right\} in the following experiments.

Moreover, when \lambda\! = \!100 , the best results appear for \alpha_1\! = \!0.01 and \alpha_2\! = \!1000 , reflecting that a small \alpha_1 and a large \alpha_2 are needed. Accordingly, in the following experiments we use \alpha_1\! = \!0.01 , \alpha_2\! = \!1000 , and \lambda\! = \!100 .

Second, we choose the neighborhood size p with \alpha_1 = 0.01 , \alpha_2 = 1000 , and \lambda = 100 fixed. The clustering results in terms of NMI as the neighborhood size p varies from 1 to 10 are shown in Figure 3. On the ORL dataset, the GNMFOSV algorithm attains good NMI when p varies from 2 to 5. Other indicators and datasets behave similarly.

    Figure 3.  The clustering results of NMI on ORL dataset with respect to the neighborhood size p .

In summary, our experiments (over all datasets) led to the parameter selection p = 3 , \alpha_1 = 0.01 , \alpha_2 = 1000 , and \lambda = 100 . These values are used in the numerical clustering experiments throughout the paper. The robustness of the proposed method can thus be guaranteed when the parameters are selected in an appropriate range. These observations also indicate that the GNMFOSV algorithm is sensitive to the parameter values, which calls for corresponding care in practical applications.

    At the same time, we also conducted parameter sensitivity tests on 12 other datasets. We found that the model proposed in this paper demonstrates excellent robustness. Regrettably, constrained by the page limit, only the experimental results of the ORL dataset are presented.

In the ablation study of this subsection, we choose p = 3 , \alpha_1 = 0.01 , \alpha_2 = 1000 , and \lambda = 100 as the parameter setting of GNMFOSV. The ablation study examines the loss function of the proposed GNMFOSV by removing different parts in experiments. The combinations of the parts of (3.3) are denoted as follows.

\bullet Baseline (NMF): The loss function composed of only the NMF term (the first term) in (3.3) is set as the baseline, which is

    \begin{equation*} \min J(W,H) = \min\limits_{W,H} \frac{1}{2}\|X-WH\|_F^2\; s.t.\; W\geq0,H\geq0. \end{equation*}

    \bullet GNMF: The loss function composed of the NMF term and the second term in Eq (3.3), which is

    \begin{equation*} \min J(W,H) = \min\limits_{W,H} \frac{1}{2}\|X-WH\|_F^2+\frac{\lambda }{2}Tr(HLH^T) \; s.t.\; W\geq0,H\geq0. \end{equation*}

    \bullet NMFOSV: The loss function, consisting of the NMF term along with the third and fourth terms in Eq (3.3),

    \begin{equation*} \min J(W,H,V) = \min\limits_{W,H,V} \frac{1}{2}\|X-WH\|_F^2+\frac{\alpha_1} {2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2\; s.t.\; W\geq0,H\geq0,V\geq0. \end{equation*}

    \bullet GNMFOSV: The loss function of Eq (3.3), which is indeed the proposed algorithm in this paper.

Here X = [x_1, \cdots, x_N]\in R_{+}^{M\times N} , W = [w_1, \cdots, w_K]\in R_{+}^{M\times K} , H = [h_1, \cdots, h_N]\in R_{+}^{K\times N} , L \in R^{N\times N} , and I is the identity matrix with K rows and K columns. The clustering performance values presented in Tables 4-6 demonstrate that all core modules of the proposed image clustering model are indispensable. The experimental results indicate that integrating graph regularization, orthogonality constraints, and the auxiliary variable mechanism significantly enhances clustering accuracy and stability; outperforming the single-component variants on multiple datasets validates the rationality and innovativeness of the proposed model.

    Table 4.  Results (mean \pm standard deviation) on AR.
    Metric NMF GNMF NMFOSV GNMFOSV
    Purity 0.5262 \pm 0.0166 0.4824 \pm 0.0161 0.5103 \pm 0.0131 0.5718 \pm 0.0147
    NMI 0.7636 \pm 0.0090 0.7268 \pm 0.0098 0.7563 \pm 0.0101 0.8011 \pm 0.0085
    ACC 0.4904 \pm 0.0154 0.4464 \pm 0.0187 0.4562 \pm 0.0092 0.5438 \pm 0.0191

    Table 5.  Results (mean \pm standard deviation) on Yale.
    Metric NMF GNMF NMFOSV GNMFOSV
    Purity 0.4024 \pm 0.0294 0.3909 \pm 0.0103 0.3972 \pm 0.0083 0.4661 \pm 0.0153
    NMI 0.4381 \pm 0.0161 0.4329 \pm 0.0146 0.4412 \pm 0.0261 0.5024 \pm 0.0175
    ACC 0.3872 \pm 0.0355 0.3757 \pm 0.0151 0.3810 \pm 0.0113 0.6459 \pm 0.0163

    Table 6.  Results (mean \pm standard deviation) on Caltech20.
    Metric NMF GNMF NMFOSV GNMFOSV
    Purity 0.5547 \pm 0.0133 0.5428 \pm 0.0028 0.5582 \pm 0.0121 0.6114 \pm 0.0185
    NMI 0.2910 \pm 0.0110 0.2731 \pm 0.0044 0.2841 \pm 0.0076 0.3816 \pm 0.0130
    ACC 0.2767 \pm 0.0191 0.2403 \pm 0.0095 0.2811 \pm 0.0126 0.3465 \pm 0.0170


In this subsection, extensive experimental results are presented to validate the effectiveness of the proposed model. The experimental results of GNMFOSV and the comparison algorithms are given in Tables 7-19, and each result is the average of 10 runs after tuning the parameters. The clustering results of all comparison algorithms are evaluated according to the four indicators above. The best result for each indicator is marked in bold. "-" indicates that clustering results could not be obtained or that memory was insufficient.

    Table 7.  Results (mean \pm standard deviation) on ORL.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
Purity 0.6140 \pm 0.0350 0.6350 \pm 0.0302 0.6180 \pm 0.0399 0.6540 \pm 0.0022 0.6431 \pm 0.0292 0.6020 \pm 0.0205 0.5432 \pm 0.0211 0.1407 \pm 0.0039 0.5199 \pm 0.0325 0.5730 \pm 0.0309 0.7185 \pm 0.0328
    NMI 0.6082 \pm 0.0214 0.5967 \pm 0.0146 0.5927 \pm 0.0233 0.6171 \pm 0.0015 0.5937 \pm 0.0213 0.5930 \pm 0.0139 0.4951 \pm 0.0207 0.0130 \pm 0.0017 0.5417 \pm 0.0179 0.5597 \pm 0.0179 0.6572 \pm 0.0152
    ACC 0.5875 \pm 0.0518 0.6080 \pm 0.0338 0.5785 \pm 0.0444 0.6065 \pm 0.0022 0.6415 \pm 0.0368 0.5805 \pm 0.0270 0.5119 \pm 0.0146 0.1357 \pm 0.0027 0.5058 \pm 0.0406 0.5375 \pm 0.0359 0.7185 \pm 0.0411
    RI 0.9008 \pm 0.0081 0.9020 \pm 0.0041 0.8919 \pm 0.0068 0.9097 \pm 0.0005 0.9002 \pm 0.0050 0.8969 \pm 0.0039 0.4001 \pm 0.0110 0.1014 \pm 0.0008 0.8801 \pm 0.0065 0.8827 \pm 0.0055 0.9165 \pm 0.0058

    Table 8.  Results (mean \pm standard deviation) on AR.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.4222 \pm 0.0135 0.5262 \pm 0.0166 0.4876 \pm 0.0171 0.4824 \pm 0.0161 0.5258 \pm 0.0199 0.5387 \pm 0.0130 0.5593 \pm 0.0191 0.4190 \pm 0.0153 0.2874 \pm 0.0092 0.5095 \pm 0.0100 0.5718 \pm 0.0147
    NMI 0.6992 \pm 0.0059 0.7636 \pm 0.0090 0.7289 \pm 0.0131 0.7268 \pm 0.0098 0.7536 \pm 0.0125 0.7707 \pm 0.0077 0.7557 \pm 0.0127 0.6722 \pm 0.0078 0.6084 \pm 0.0081 0.7633 \pm 0.0074 0.8011 \pm 0.0085
    ACC 0.3883 \pm 0.0154 0.4904 \pm 0.0154 0.4511 \pm 0.0193 0.4464 \pm 0.0187 0.4873 \pm 0.0201 0.5011 \pm 0.0136 0.4973 \pm 0.0206 0.3860 \pm 0.0156 0.2629 \pm 0.0118 0.4666 \pm 0.0138 0.5438 \pm 0.0191
    RI 0.9867 \pm 0.0003 0.9885 \pm 0.0004 0.9874 \pm 0.0005 0.9865 \pm 0.0005 0.9879 \pm 0.0005 0.9885 \pm 0.0003 0.9865 \pm 0.0009 0.9836 \pm 0.0013 0.9843 \pm 0.0005 0.9878 \pm 0.0004 0.9896 \pm 0.0006

    Table 9.  Results (mean \pm standard deviation) on Yale.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.4079 \pm 0.0346 0.4024 \pm 0.0294 0.4055 \pm 0.0181 0.3909 \pm 0.0103 0.4018 \pm 0.0277 0.4042 \pm 0.0228 0.3285 \pm 0.0251 0.4067 \pm 0.0194 0.4430 \pm 0.0283 0.3964 \pm 0.0333 0.4661 \pm 0.0153
    NMI 0.4408 \pm 0.0392 0.4381 \pm 0.0161 0.4502 \pm 0.0206 0.4329 \pm 0.0146 0.4361 \pm 0.0188 0.4393 \pm 0.0227 0.3477 \pm 0.0259 0.4566 \pm 0.0231 0.4573 \pm 0.0213 0.4353 \pm 0.0306 0.5024 \pm 0.0175
    ACC 0.3836 \pm 0.0333 0.3872 \pm 0.0355 0.3860 \pm 0.0217 0.3757 \pm 0.0151 0.3793 \pm 0.0325 0.3800 \pm 0.0336 0.3072 \pm 0.0221 0.3903 \pm 0.0162 0.4200 \pm 0.0281 0.3793 \pm 0.0377 0.4593 \pm 0.0163
    RI 0.8785 \pm 0.0164 0.8975 \pm 0.0062 0.8980 \pm 0.0043 0.8974 \pm 0.0037 0.8956 \pm 0.0045 0.8958 \pm 0.0051 0.8751 \pm 0.0061 0.8810 \pm 0.0083 0.8963 \pm 0.0050 0.8973 \pm 0.0045 0.9014 \pm 0.0025

    Table 10.  Results (mean \pm standard deviation) on Handwritten digit.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.5461 \pm 0.0065 0.5382 \pm 0.0299 0.5358 \pm 0.0160 0.5521 \pm 0.0060 0.5030 \pm 0.0271 0.5537 \pm 0.0156 0.4924 \pm 0.0249 0.3105 \pm 0.0102 0.4701 \pm 0.0048 0.5583 \pm 0.0445 0.6010 \pm 0.0216
    NMI 0.5276 \pm 0.0098 0.5162 \pm 0.0260 0.5063 \pm 0.0118 0.5563 \pm 0.0052 0.5061 \pm 0.0298 0.5070 \pm 0.0133 0.4629 \pm 0.0158 0.2847 \pm 0.0106 0.4663 \pm 0.0041 0.5210 \pm 0.0256 0.6128 \pm 0.0205
    ACC 0.4865 \pm 0.0650 0.5450 \pm 0.0449 0.5220 \pm 0.0520 0.5900 \pm 0.0476 0.4510 \pm 0.0340 0.5125 \pm 0.0527 0.4531 \pm 0.0258 0.2970 \pm 0.0165 0.4189 \pm 0.0162 0.5015 \pm 0.0549 0.6075 \pm 0.0748
    RI 0.8679 \pm 0.0086 0.8688 \pm 0.0068 0.8786 \pm 0.0017 0.8737 \pm 0.0005 0.8578 \pm 0.0115 0.8665 \pm 0.0053 - 0.8432 \pm 0.0043 - 0.8529 \pm 0.0110 0.8913 \pm 0.0030

    Table 11.  Results (mean \pm standard deviation) on UCI-fac.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.6095 \pm 0.0233 0.5316 \pm 0.0365 0.5812 \pm 0.0235 0.5934 \pm 0.0057 0.5284 \pm 0.0204 0.5320 \pm 0.0185 0.5342 \pm 0.0256 0.1362 \pm 0.0038 0.3986 \pm 0.0692 0.5427 \pm 0.0247 0.6355 \pm 0.0232
    NMI 0.5809 \pm 0.0165 0.4674 \pm 0.0282 0.5283 \pm 0.0185 0.5349 \pm 0.0043 0.4658 \pm 0.0136 0.4682 \pm 0.0155 0.4491 \pm 0.0186 0.0097 \pm 0.0016 0.3376 \pm 0.0605 0.4695 \pm 0.0186 0.6028 \pm 0.0136
    ACC 0.5855 \pm 0.0462 0.5119 \pm 0.0432 0.5478 \pm 0.0285 0.5597 \pm 0.0057 0.4965 \pm 0.0340 0.5107 \pm 0.0196 0.5040 \pm 0.0345 0.1324 \pm 0.0045 0.3730 \pm 0.0656 0.5232 \pm 0.0267 0.6118 \pm 0.0276
    RI 0.8940 \pm 0.0057 0.8760 \pm 0.0074 0.8856 \pm 0.0059 0.8869 \pm 0.0008 0.8741 \pm 0.0036 0.8754 \pm 0.0032 0.8682 \pm 0.0076 0.8202 \pm 0.0003 0.0484 \pm 0.8464 0.8768 \pm 0.0045 0.8966 \pm 0.0052

    Table 12.  Results (mean \pm standard deviation) on UCI-fou.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.6140 \pm 0.0350 0.6350 \pm 0.0302 0.6180 \pm 0.0399 0.6540 \pm 0.0022 0.6431 \pm 0.0292 0.6020 \pm 0.0205 0.5432 \pm 0.0211 0.1407 \pm 0.0039 0.5199 \pm 0.0325 0.5730 \pm 0.0309 0.7185 \pm 0.0328
    NMI 0.6082 \pm 0.0214 0.5967 \pm 0.0146 0.5927 \pm 0.0233 0.6171 \pm 0.0015 0.5937 \pm 0.0213 0.5930 \pm 0.0139 0.4951 \pm 0.0207 0.0130 \pm 0.0017 0.5417 \pm 0.0179 0.5597 \pm 0.0179 0.6572 \pm 0.0152
    ACC 0.5875 \pm 0.0518 0.6080 \pm 0.0338 0.5785 \pm 0.0444 0.6065 \pm 0.0022 0.6415 \pm 0.0368 0.5805 \pm 0.0270 0.5119 \pm 0.0146 0.1357 \pm 0.0027 0.5058 \pm 0.0406 0.5375 \pm 0.0359 0.7185 \pm 0.0411
    RI 0.9008 \pm 0.0081 0.9020 \pm 0.0041 0.8919 \pm 0.0068 0.9097 \pm 0.0005 0.9002 \pm 0.0050 0.8969 \pm 0.0039 0.4001 \pm 0.0110 0.1014 \pm 0.0008 0.8801 \pm 0.0065 0.8827 \pm 0.0055 0.9165 \pm 0.0058

    Table 13.  Results (mean \pm standard deviation) on UCI-kar.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.6170 \pm 0.0347 0.6990 \pm 0.0240 0.6085 \pm 0.0268 0.6085 \pm 0.0172 0.6650 \pm 0.0352 0.6910 \pm 0.0474 0.5809 \pm 0.0384 0.2394 \pm 0.0123 0.6721 \pm 0.0408 0.6145 \pm 0.0507 0.7759 \pm 0.0247
    NMI 0.5258 \pm 0.0267 0.6286 \pm 0.0208 0.5186 \pm 0.0215 0.5553 \pm 0.0104 0.6438 \pm 0.0311 0.6590 \pm 0.0286 0.5110 \pm 0.0238 0.1495 \pm 0.0128 0.6244 \pm 0.0328 0.6339 \pm 0.0392 0.7212 \pm 0.0245
    ACC 0.5890 \pm 0.0469 0.6675 \pm 0.0310 0.5760 \pm 0.0318 0.5350 \pm 0.0237 0.6330 \pm 0.0436 0.6910 \pm 0.0564 0.5567 \pm 0.0443 0.2243 \pm 0.0098 0.6402 \pm 0.0609 0.5410 \pm 0.0591 0.7480 \pm 0.0317
    RI 0.8877 \pm 0.0088 0.9125 \pm 0.0054 0.8962 \pm 0.0056 0.8994 \pm 0.0029 0.8597 \pm 0.0088 0.9084 \pm 0.0087 0.8814 \pm 0.0090 0.8288 \pm 0.0023 0.9028 \pm 0.0131 0.9142 \pm 0.0098 0.9266 \pm 0.0070

    Table 14.  Results (mean \pm standard deviation) on Diag-Bcw.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.8799 \pm 0.0008 0.7374 \pm 0.0993 0.8660 \pm 0.0059 0.8365 \pm 0.0370 0.7179 \pm 0.1090 0.6769 \pm 0.1045 0.7840 \pm 0.0195 0.6274 \pm 0.0000 0.8746 \pm 0.0008 0.7079 \pm 0.1047 0.8922 \pm 0.0125
    NMI 0.4652 \pm 0.0029 0.1862 \pm 0.1602 0.4076 \pm 0.0173 0.3666 \pm 0.0233 0.1684 \pm 0.1837 0.1010 \pm 0.1776 0.2319 \pm 0.0245 0.0007 \pm 0.0007 0.4320 \pm 0.0020 0.1434 \pm 0.1796 0.4761 \pm 0.0383
    ACC 0.8822 \pm 0.0008 0.7165 \pm 0.1272 0.8660 \pm 0.0059 0.8365 \pm 0.0392 0.6869 \pm 0.1389 0.6050 \pm 0.1476 0.7840 \pm 0.0195 0.5158 \pm 0.0114 0.8746 \pm 0.0008 0.6720 \pm 0.1379 0.8957 \pm 0.0125
    RI 0.7918 \pm 0.0013 0.6222 \pm 0.1087 0.7676 \pm 0.0086 0.7260 \pm 0.0090 0.6039 \pm 0.1200 0.5605 \pm 0.1168 0.6614 \pm 0.0215 0.4998 \pm 0.0009 0.7804 \pm 0.0012 0.5927 \pm 0.1166 0.8080 \pm 0.0195

    Table 15.  Results (mean \pm standard deviation) on Caltech7.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.7818 \pm 0.0021 0.7898 \pm 0.0102 0.7794 \pm 0.0046 0.7824 \pm 0.0010 0.7875 \pm 0.0109 0.7822 \pm 0.0178 0.7787 \pm 0.0077 0.5467 \pm 0.0086 0.5559 \pm 0.0025 0.7818 \pm 0.0093 0.8110 \pm 0.0066
    NMI 0.2426 \pm 0.0062 0.2688 \pm 0.0168 0.2415 \pm 0.0075 0.2423 \pm 0.0007 0.2727 \pm 0.0181 0.2685 \pm 0.0153 0.2467 \pm 0.0146 0.0193 \pm 0.0075 0.0563 \pm 0.0013 0.2631 \pm 0.0177 0.3482 \pm 0.0094
    ACC 0.3390 \pm 0.0162 0.3968 \pm 0.0441 0.3352 \pm 0.0121 0.3215 \pm 0.0019 0.4233 \pm 0.0490 0.4119 \pm 0.0384 0.3616 \pm 0.0214 0.1978 \pm 0.0107 0.2421 \pm 0.0087 0.4216 \pm 0.0499 0.4548 \pm 0.0290
    RI 0.6620 \pm 0.0034 0.6790 \pm 0.0131 0.6618 \pm 0.0021 0.6606 \pm 0.0004 0.6834 \pm 0.0138 0.6803 \pm 0.0096 0.6544 \pm 0.0052 0.5856 \pm 0.0021 0.5890 \pm 0.0004 0.6695 \pm 0.0142 0.6812 \pm 0.0072

    Table 16.  Results (mean \pm standard deviation) on Caltech20.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.5539 \pm 0.0083 0.5547 \pm 0.0133 0.5369 \pm 0.0091 0.5428 \pm 0.0028 0.5496 \pm 0.0123 0.5636 \pm 0.0116 0.5439 \pm 0.0123 0.3388 \pm 0.0037 0.3497 \pm 0.0016 0.5579 \pm 0.0143 0.6114 \pm 0.0185
    NMI 0.2859 \pm 0.0105 0.2910 \pm 0.0110 0.2650 \pm 0.0051 0.2731 \pm 0.0044 0.2837 \pm 0.0087 0.2934 \pm 0.0107 0.2746 \pm 0.0114 0.0467 \pm 0.0041 0.0788 \pm 0.0012 0.2922 \pm 0.0157 0.3816 \pm 0.0130
    ACC 0.2485 \pm 0.0140 0.2767 \pm 0.0191 0.2481 \pm 0.0113 0.2403 \pm 0.0095 0.2772 \pm 0.0197 0.2739 \pm 0.0127 0.2623 \pm 0.0189 0.0949 \pm 0.0023 0.1099 \pm 0.0020 0.2753 \pm 0.0131 0.3465 \pm 0.0170
    RI 0.8280 \pm 0.0019 0.8300 \pm 0.0028 0.8272 \pm 0.0012 0.8272 \pm 0.0015 0.8299 \pm 0.0018 0.8306 \pm 0.0022 - 0.8042 \pm 0.0002 0.8023 \pm 0.0014 0.8302 \pm 0.0024 0.8352 \pm 0.0025

    Table 17.  Results (mean \pm standard deviation) on isolet3.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.5759 \pm 0.0210 0.5800 \pm 0.0275 0.5807 \pm 0.0188 0.5391 \pm 0.0219 0.5908 \pm 0.0216 0.5665 \pm 0.0209 0.5064 \pm 0.0260 0.4964 \pm 0.0266 0.5435 \pm 0.0338 0.5839 \pm 0.0240 0.6227 \pm 0.0239
    NMI 0.7161 \pm 0.0105 0.6989 \pm 0.0156 0.6947 \pm 0.0130 0.7129 \pm 0.0119 0.7068 \pm 0.0131 0.6879 \pm 0.0146 0.6209 \pm 0.0200 0.6304 \pm 0.0142 0.6766 \pm 0.0189 0.7008 \pm 0.0162 0.7392 \pm 0.0129
    ACC 0.5312 \pm 0.0205 0.5505 \pm 0.0301 0.5438 \pm 0.0212 0.4820 \pm 0.0284 0.5562 \pm 0.0281 0.5300 \pm 0.0236 0.4787 \pm 0.0289 0.4631 \pm 0.0213 0.5016 \pm 0.0317 0.5507 \pm 0.0285 0.5866 \pm 0.0257
    RI 0.9575 \pm 0.0013 0.9589 \pm 0.0020 0.9583 \pm 0.0017 0.9448 \pm 0.0017 0.9594 \pm 0.0028 0.9563 \pm 0.0026 0.9520 \pm 0.0020 0.9497 \pm 0.0020 0.9534 \pm 0.0035 0.9589 \pm 0.0025 0.9612 \pm 0.0020

    Table 18.  Results (mean \pm standard deviation) on Vote.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.8712 \pm 0.0022 0.8664 \pm 0.0020 0.8667 \pm 0.0000 0.8667 \pm 0.0000 0.8667 \pm 0.0000 0.6161 \pm 0.0000 0.8791 \pm 0.0022 0.7094 \pm 0.0540 0.8667 \pm 0.0000 0.6161 \pm 0.0000 0.8824 \pm 0.0016
    NMI 0.4592 \pm 0.0037 0.4393 \pm 0.0075 0.4375 \pm 0.0000 0.4375 \pm 0.0000 0.4440 \pm 0.0000 0.0032 \pm 0.0000 0.4420 \pm 0.0045 0.1426 \pm 0.0726 0.4440 \pm 0.0000 0.0032 \pm 0.0000 0.4680 \pm 0.0045
    ACC 0.8712 \pm 0.0022 0.8664 \pm 0.0020 0.8666 \pm 0.0000 0.8666 \pm 0.0000 0.8666 \pm 0.0000 0.6160 \pm 0.0000 0.8790 \pm 0.0022 0.7000 \pm 0.0729 0.8666 \pm 0.0000 0.6160 \pm 0.0000 0.8824 \pm 0.0016
    RI 0.7751 \pm 0.0033 0.7680 \pm 0.0029 0.7683 \pm 0.0000 0.7683 \pm 0.0000 0.7683 \pm 0.0000 0.5258 \pm 0.0000 0.7869 \pm 0.0033 0.5886 \pm 0.0464 0.7683 \pm 0.0000 0.5258 \pm 0.0000 0.7768 \pm 0.0024

    Table 19.  Results (mean \pm standard deviation) on Abalone.
    Metric K-means NMF ONMF GNMF SNMF-H SNMF-W RNMF KOGNMF min-vol NMF NMFOS GNMFOSV
    Purity 0.3624 \pm 0.0000 0.3537 \pm 0.0132 0.3582 \pm 0.0119 0.3501 \pm 0.0035 0.3575 \pm 0.0099 0.3511 \pm 0.0141 0.3849 \pm 0.0154 0.3053 \pm 0.0030 0.3553 \pm 0.0014 0.3588 \pm 0.0044 0.3834 \pm 0.0010
    NMI 0.0703 \pm 0.0002 0.0628 \pm 0.0140 0.0671 \pm 0.0144 0.0595 \pm 0.0010 0.0664 \pm 0.0111 0.0605 \pm 0.0141 0.0834 \pm 0.0091 0.0017 \pm 0.0006 0.0639 \pm 0.0015 0.0687 \pm 0.0069 0.0810 \pm 0.0009
    ACC 0.3312 \pm 0.0011 0.3373 \pm 0.0118 0.3413 \pm 0.0131 0.3326 \pm 0.0066 0.3405 \pm 0.0146 0.3385 \pm 0.0122 0.3526 \pm 0.0118 0.2686 \pm 0.0049 0.3242 \pm 0.0020 0.3414 \pm 0.0062 0.3745 \pm 0.0026
    RI 0.5972 \pm 0.0004 0.6099 \pm 0.0094 0.6175 \pm 0.0088 0.5629 \pm 0.0085 0.6055 \pm 0.0222 0.6081 \pm 0.0160 0.6308 \pm 0.0054 0.6198 \pm 0.0005 0.5755 \pm 0.0068 0.6078 \pm 0.0095 0.6332 \pm 0.0011


    From the above tables, we have the following observations.

    First, the proposed GNMFOSV method achieves the best Purity, NMI, ACC, and RI values among all compared algorithms in almost all cases across the 13 datasets. It should be pointed out that GNMFOSV outperforms the other compared methods by considerable margins on the ORL, AR, UCI-fou, and UCI-kar datasets. For example, it improves ACC by about 7.7 percentage points over the second-best method, SNMF-H, on the UCI-fou dataset. Second, the graph-based algorithms GNMF and GNMFOSV are clearly competitive in terms of Purity, NMI, ACC, and RI compared with the other methods, which indicates that graph-based clustering algorithms are good at modeling the intrinsic structure of the data. Furthermore, orthogonality-based learning is more promising than the original NMF learning, as can be observed from the fact that the orthogonality-based algorithms, i.e., ONMF, NMFOS, and GNMFOSV, substantially boost the clustering performance on the document datasets.

    Third, as the results of GNMF and NMFOS show, neither the graph regularization constraint alone nor the orthogonal subspace constraint alone achieves excellent clustering performance. The GNMFOSV algorithm, however, combines the advantages of graph regularization and the orthogonal subspace, and it significantly surpasses the original single-constrained NMF algorithms. In general, our proposed GNMFOSV algorithm consistently outperforms the other compared algorithms in terms of the Purity, NMI, ACC, and RI metrics, which demonstrates the clear advantage of our method.

    In conclusion, the GNMFOSV method performs well on all datasets, covering face image, digit image, object image, and document data, which further validates its robustness in the sense of steady clustering performance. The quantitative results demonstrate the effectiveness of our proposed GNMFOSV approach.
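
    For readers who wish to reproduce the evaluation protocol, the following is a minimal Python sketch, not the code used in our experiments, showing how the four reported metrics can be computed from ground-truth labels and predicted cluster labels with scikit-learn and SciPy; the function names and toy labels are illustrative only.

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.metrics import confusion_matrix, normalized_mutual_info_score, rand_score

    def purity(y_true, y_pred):
        # Each cluster is assigned to its majority class; Purity is the fraction
        # of samples that belong to the majority class of their cluster.
        cm = confusion_matrix(y_true, y_pred)
        return cm.max(axis=0).sum() / cm.sum()

    def clustering_accuracy(y_true, y_pred):
        # ACC: best one-to-one matching between clusters and classes,
        # found with the Hungarian algorithm.
        cm = confusion_matrix(y_true, y_pred)
        row, col = linear_sum_assignment(-cm)
        return cm[row, col].sum() / cm.sum()

    # Illustrative labels only.
    y_true = np.array([0, 0, 1, 1, 2, 2])
    y_pred = np.array([1, 1, 0, 0, 2, 2])
    print(purity(y_true, y_pred), clustering_accuracy(y_true, y_pred),
          normalized_mutual_info_score(y_true, y_pred), rand_score(y_true, y_pred))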

    In this subsection, we investigate how fast the proposed algorithm converges. The convergence curves of GNMFOSV on all 13 datasets are shown in Figure 4, where the x-axis denotes the number of iterations and the y-axis denotes the objective function value.

    Figure 4.  The convergence curves of the objective function value in GNMFOSV on (a) ORL, (b) AR, (c) Yale, (d) Handwritten digit, (e) UCI-fac, (f) UCI-fou, (g) UCI-kar, (h) Diag-Bcw, (i) Caltech7, (j) Caltech20, (k) isolet3, (l) Vote, (m) Abalone datasets.

    The convergence curves show that our algorithm converges rapidly on every dataset. On the AR, Handwritten digit, Diag-Bcw, Caltech7, Caltech20, and Vote datasets, GNMFOSV reaches stable convergence within 20 iterations. On the ORL, Yale, UCI-fac, UCI-fou, UCI-kar, and Abalone datasets, the objective function value of GNMFOSV drops rapidly at first, then declines gently, and finally stabilizes at about 100 iterations. On the isolet3 dataset, the convergence curve of GNMFOSV is very smooth, and convergence is reached at around 100 iterations.

    Thus, Figure 4 indicates that the GNMFOSV algorithm behaves well in the convergence experiments, and a budget of 100 iterations is sufficient in practice.
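
    The following is a minimal NumPy sketch, not our actual implementation, of the multiplicative update rules (3.14)–(3.16) derived in the appendix, recording the objective function value at every iteration so that a convergence curve like those in Figure 4 can be reproduced; the function name gnmfosv, the default parameter values, and the toy data and graph are illustrative assumptions only.

    import numpy as np

    def gnmfosv(X, S, k, lam=1.0, a1=1.0, a2=1.0, iters=100, eps=1e-10, seed=0):
        # X: (m, n) nonnegative data; S: (n, n) symmetric nonnegative similarity graph.
        rng = np.random.default_rng(seed)
        m, n = X.shape
        W = rng.random((m, k)); H = rng.random((k, n)); V = rng.random((k, n))
        D = np.diag(S.sum(axis=1)); L = D - S; I = np.eye(k)
        obj = []
        for _ in range(iters):
            W *= (X @ H.T) / (W @ H @ H.T + eps)                          # rule (3.14)
            V *= ((a1 + a2) * H) / (a1 * V @ H.T @ H + a2 * V + eps)      # rule (3.15)
            H *= (W.T @ X + lam * H @ S + (a1 + a2) * V) / (
                 W.T @ W @ H + lam * H @ D + a1 * H @ V.T @ V + a2 * H + eps)  # rule (3.16)
            J = (0.5 * np.linalg.norm(X - W @ H) ** 2
                 + 0.5 * lam * np.trace(H @ L @ H.T)
                 + 0.5 * a1 * np.linalg.norm(I - H @ V.T) ** 2
                 + 0.5 * a2 * np.linalg.norm(V - H) ** 2)
            obj.append(J)
        return W, H, V, np.array(obj)

    # Toy data and a trivial fully connected graph, for illustration only.
    X = np.random.default_rng(1).random((30, 50))
    S = np.ones((50, 50)) - np.eye(50)
    W, H, V, obj = gnmfosv(X, S, k=5)
    print("objective nonincreasing:", bool(np.all(np.diff(obj) <= 1e-6)))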

    In this paper, we have proposed a new NMF cost function for clustering problems. The derived optimization model incorporates orthogonal constraints, a graph Laplacian term, and an auxiliary-variable mechanism. The experiments show the superiority of the proposed algorithm, at least on the datasets used in this work. The proposed algorithm has the following advantages: (i) It effectively exploits the geometric structure information hidden in both the data and feature spaces. The model treats the data space as a sub-manifold embedded in a high-dimensional space; performing the matrix factorization on this sub-manifold yields a low-dimensional representation of the data points that is beneficial for clustering. The model is therefore more discriminative than algorithms based on orthogonal subspaces that only take the Euclidean structure of the data into account, and it can achieve better clustering performance than graph-based algorithms, which do not consider the orthogonalization of the vectors in the factor matrices. (ii) By introducing an auxiliary variable, it admits a complete proof of convergence, filling the gap in demonstrating convergence under orthogonality conditions for such algorithms. (iii) The 13 groups of experiments on data of different sizes and types show that the algorithm is not sensitive to changes in the size and type of the data: its performance remains largely stable when the size and type of the data change.

    Caiping Wang: Conceptualization; Wen Li: Formal analysis; Caiping Wang, Wen Li, Junjian Zhao and Yasong Chen wrote the main manuscript text. All authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The first author is partially supported by the China Tobacco Hebei Industrial Co., Ltd. Technology Project, China (Grant No. HBZY2023A034). The third and fourth authors are partially supported by the Natural Science Foundation of Tianjin City, China (Grant Nos. 24JCYBJC00430 and 18JCYBJC16300), the Natural Science Foundation of China (Grant No. 11401433), and the Ordinary Universities Undergraduate Education Quality Reform Research Project of Tianjin, China (Grant No. A231005802).

    The authors declare there is no conflict of interest.

    Convergence analysis of Algorithm 1

    In this section, we investigate the convergence of Algorithm 1. Since the objective function of the GNMFOSV model (3.3) has a lower bound, we only need to prove that (3.3) is nonincreasing under the update rules (3.14)–(3.16). Before that, we first introduce a definition given by Lee and Seung [50], as well as a lemma with a concise proof.

    Definition 1. [50] Let F:{R}_+^{\upsilon\times\omega}\to {R} be continuously differentiable with \upsilon, \omega\in\mathbb{N} , and G:{R}_+^{\upsilon\times\omega}\times {R}_+^{\upsilon\times\omega}\rightarrow {R} also be continuously differentiable. Then G is called an auxiliary function of F if it satisfies the following conditions:

    \begin{equation*} G(U,U) = F(U),\ G(U,U^t)\geq F(U), \end{equation*}

    for any U, U^t\in {R}_+^{\upsilon\times\omega} .

    Lemma 1. Let G be an auxiliary function of F . Then F is nonincreasing under the sequence of matrices \{U^{t}\}_{t = 0}^{+\infty} generated by the following update rule:

    \begin{equation} U^{t+1} = \arg\min\limits_{U\in R^{\upsilon\times\omega}}G(U,U^t). \end{equation} (5.1)

    Proof. By Definition 1 and (5.1), the following formula

    \begin{equation} G(U^{t+1},U^{t+1}) = F(U^{t+1})\leq G(U^{t+1},U^t)\leq G(U^t,U^t) = F(U^t) \end{equation} (5.2)

    holds naturally.
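
    As a concrete illustration of Lemma 1, the following toy one-dimensional Python snippet, which is purely illustrative and not part of the algorithm, iterates u^{t+1} = argmin_u G(u, u^t) for a quadratic majorizer G and checks numerically that F never increases, exactly as in (5.2); the objective F, the curvature constant c, and the starting point are arbitrary choices.

    import numpy as np

    F = lambda u: 0.25 * u**4 - u       # a toy objective
    dF = lambda u: u**3 - 1             # its derivative

    def next_iterate(u_t, c=30.0):
        # G(u, u_t) = F(u_t) + F'(u_t)(u - u_t) + (c/2)(u - u_t)^2 majorizes F
        # on the region visited here (c exceeds F'' there) and matches F at u_t;
        # its minimizer over u is the step below.
        return u_t - dF(u_t) / c

    u, values = 2.0, []
    for _ in range(50):
        values.append(F(u))
        u = next_iterate(u)
    print(bool(np.all(np.diff(values) <= 1e-12)))   # True: F is nonincreasing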

    Next, we provide the convergence proofs for the update rules (3.14)–(3.16), which is also an important innovation of this paper. To this end, we write F for the objective function in (3.4), i.e., F = J(W, V, H) . Moreover, let F_{W_{ik}} , F_{V_{kj}} , and F_{H_{kj}} denote the parts of the objective function (3.4) associated with the elements W_{ik} , V_{kj} , and H_{kj} , respectively. In addition, since F(H) , viewed as a function of H alone, is complicated to express directly, we split it as F(H) = F_1(H)+F_2(H) with

    \begin{equation} F_1(H) = \frac{1}{2}\|X-WH\|_F^2+\frac{\lambda }{2}Tr(HLH^T),\ F_2(H) = \frac{\alpha_1} {2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2. \end{equation} (5.3)

    The first and second-order derivatives of F with respect to W_{ik} are listed below:

    \begin{equation*} F_{W_{ik}}^{'} = \left(\frac{\partial J(W,V,H)}{\partial W}\right)_{ik} = (WHH^T-XH^T)_{ik},\ F_{W_{ik}}^{''} = (HH^T)_{kk}. \end{equation*}

    Meanwhile, the first and second-order derivatives of F_1(H) with respect to H_{kj} will be shown below:

    \begin{equation*} F_{1_{H_{kj}}}^{'} = \left(\frac{\partial J(W,V,H)}{\partial H}\right)_{kj} = (W^TWH-W^TX+\lambda HL)_{kj},\ F_{1_{H_{kj}}}^{''} = (W^TW)_{kk}+(\lambda L)_{jj}. \end{equation*}

    Note that we have omitted the partial derivatives of F_2 and F_{V_{kj}} because they are not required when constructing the auxiliary functions.

    We can now define the auxiliary functions. Since the update rules (3.14)–(3.16) all act elementwise on the matrices, by Lemma 1 it is sufficient to prove that F_{W_{ik}} , F_{V_{kj}} , and F_{H_{kj}} do not increase under the sequences \{W_{ik}^t\}_{t = 0}^{+\infty} , \{V_{kj}^t\}_{t = 0}^{+\infty} , and \{H_{kj}^t\}_{t = 0}^{+\infty} generated by (3.14)–(3.16), respectively. The specific expressions of the auxiliary functions are

    \begin{equation} G_W(W_{ik},W_{ik}^{t}) = F_{W_{ik}}(W_{ik}^{t})+F_{W_{ik}}^{'}(W_{ik}^{t})(W_{ik}-W_{ik}^{t})+\frac{1}{2}\frac{(W^{t}HH^T)_{ik}}{W_{ik}^{t}}(W_{ik}-W_{ik}^{t})^{2}, \end{equation} (5.4)
    \begin{equation} G_V(V_{kj},V_{kj}^{t}) = \frac{\alpha_1}2\sum\limits_{a = 1}^k\sum\limits_{b = 1}^k\frac1{(H{V^t}^T)_{ab}}\sum\limits_{r = 1}^nH_{ar}{V^t}^T_{rb}\left[\delta_{ab} -\frac{V_{br}}{V^t_{br}}(H{V^t}^T)_{ab}\right]^2+\frac{\alpha_2}2\|V-H\|_F^2, \end{equation} (5.5)
    \begin{equation} G_{H_1}(H_{kj},H_{kj}^{t}) = F_{1_{H_{kj}}}(H_{kj}^{t})+F_{1_{H_{kj}}}^{'}(H_{kj}^{t})(H_{kj}-H_{kj}^{t})+\frac{1}{2}\frac{(W^TWH^t+\lambda H^tD)_{kj}}{H_{kj}^{t}}(H_{kj}-H_{kj}^{t})^{2}, \end{equation} (5.6)
    \begin{equation} G_{H_2}(H_{kj},H_{kj}^{t}) = \frac{\alpha_1}2\sum\limits_{a = 1}^k\sum\limits_{b = 1}^k\frac1{(H^tV^T)_{ab}}\sum\limits_{r = 1}^nH^t_{ar}V^T_{rb} \left[\delta_{ab}-\frac{H_{ar}}{H^t_{ar}}(H^tV^T)_{ab}\right]^2+\frac{\alpha_2}2\|V-H\|_F^2. \end{equation} (5.7)

    We now have the following auxiliary function for H via (5.6) and (5.7):

    \begin{equation} G_H(H_{kj},H_{kj}^{t}) = G_{H_1}(H_{kj},H_{kj}^{t})+G_{H_2}(H_{kj},H_{kj}^{t}). \end{equation} (5.8)

    Next, we show, one lemma at a time, that (5.4), (5.5), and (5.8) are indeed auxiliary functions. The first step is to show that G_W(W_{ik}, W_{ik}^{t}) is an auxiliary function of F_{W_{ik}} .

    Lemma 2. Let G_W(W_{ik}, W_{ik}^{t}) be defined by the formula (5.4). Then, G_W(W_{ik}, W_{ik}^{t}) is an auxiliary function of F_{W_{ik}} .

    Proof. Expanding F_{W_{ik}}(W_{ik}) as a Taylor series about W_{ik}^t gives the following:

    \begin{equation*} F_{W_{ik}}(W_{ik}) = F_{W_{ik}}(W_{ik}^t)+F_{W_{ik}}^{'}(W_{ik}^t)(W_{ik}-W_{ik}^t)+\frac{1}{2}(HH^T)_{kk}(W_{ik}-W_{ik}^t)^2. \end{equation*}

    To prove that G_W(W_{ik}, W_{ik}^{t})\geq F_{W_{ik}}(W_{ik}) , it is sufficient to prove the following inequality:

    \begin{equation} \frac{(W^tHH^T)_{ik}}{W_{ik}^t}\geq(HH^T)_{kk}. \end{equation} (5.9)

    In fact, (5.9) is true via

    \begin{equation} (W^tHH^T)_{ik} = \sum\limits_{r = 1}^KW_{ir}^t(HH^T)_{rk}\geq W_{ik}^t(HH^T)_{kk}. \end{equation} (5.10)

    In addition, (5.10) can also lead to

    \begin{equation*} \frac{1}{2}\frac{(W^{t}HH^T)_{ik}}{W_{ik}^{t}}(W_{ik}-W_{ik}^{t})^{2} \geq \frac{1}{2}\frac{W_{ik}^t(HH^T)_{kk}}{W_{ik}^t}(W_{ik}-W_{ik}^t)^2 = \frac{1}{2}(HH^T)_{kk}(W_{ik}-W_{ik}^t)^2. \end{equation*}

    Therefore, G_W(W_{ik}, W_{ik}^t)\geq F_{W_{ik}}(W_{ik}) follows. Meanwhile, G_W(W_{ik}, W_{ik}) = F_{W_{ik}}(W_{ik}) holds automatically. In summary, one has that G_W(W_{ik}, W_{ik}^{t}) is an auxiliary function of F_{W_{ik}} .
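
    The key inequality (5.9) can also be checked numerically. The following short Python snippet, included only as an illustration and not as part of the proof, verifies (W^t H H^T)_{ik}/W^t_{ik} \geq (HH^T)_{kk} elementwise for random strictly positive matrices; the matrix sizes are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    m, K, n = 6, 4, 8
    W = rng.random((m, K)) + 1e-3      # strictly positive entries
    H = rng.random((K, n))

    lhs = (W @ H @ H.T) / W            # elementwise (W H H^T)_{ik} / W_{ik}
    rhs = np.diag(H @ H.T)[None, :]    # (H H^T)_{kk}, broadcast over the rows i
    print(bool(np.all(lhs >= rhs - 1e-12)))   # True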

    The following lemmas are needed to prove that G_V(V_{kj}, V_{kj}^{t}) is an auxiliary function of F_{V_{kj}} .

    Lemma 3. [67] Let \Omega\subseteq R^N be a convex set, F:\Omega\to\ R a convex function, and \lambda_{i}\in[0, 1] with i\in\{1, \cdots, k\} and \sum_{i = 1}^k\lambda_i = 1 . Then, for all x_{i}\in\Omega , F\left(\sum_{i = 1}^k\lambda_i x_i\right)\leq\sum_{i = 1}^k\lambda_iF(x_i).

    Lemma 4. Suppose F(X) = \frac12\|Y-KX\|_F^2 with X\in R^{p\times m}_+ . Let A\in R^{p\times m}_+ and K\in R^{n\times p}_+ . Then,

    \begin{equation*} F(X)\leq\frac12\sum\limits_{i = 1}^n\sum\limits_{j = 1}^m\frac1{(KA)_{ij}}\sum\limits_{k = 1}^pK_{ik}A_{kj}\left(Y_{ij}-\frac{X_{kj}}{A_{kj}}(KA)_{ij}\right)^2. \end{equation*}

    Proof. First, we expand the Frobenius norm in F(X) as follows:

    \begin{equation*} F(X) = \frac{1}{2}\|Y-KX\|_{F}^{2} = \frac{1}{2}\sum\limits_{i = 1}^{n}\sum\limits_{j = 1}^{m}(Y_{ij}-(KX)_{ij})^{2}. \end{equation*}

    Second, by Lemma 3, letting \lambda_k = \frac{K_{ik}A_{kj}}{(KA)_{ij}} and using the auxiliary variable A , we have that

    \begin{equation*} [Y_{ij}-(KX)_{ij}]^2 = [Y_{ij}-\sum\limits_{k = 1}^p\frac{K_{ik}A_{kj}}{(KA)_{ij}}\frac{X_{kj}(KA)_{ij}}{A_{kj}}]^2 \leq\sum\limits_{k = 1}^p\frac{K_{ik}A_{kj}}{(KA)_{ij}}\left(Y_{ij}-\frac{X_{kj}}{A_{kj}}(KA)_{ij}\right)^2. \end{equation*}

    Therefore, it leads to

    \begin{equation*} F(X) = \frac12\|Y-KX\|_F^2 \leq\frac12\sum\limits_{i = 1}^n\sum\limits_{j = 1}^m\frac1{(KA)_{ij}}\sum\limits_{k = 1}^pK_{ik}A_{kj}\left(Y_{ij}-\frac{X_{kj}}{A_{kj}}(KA)_{ij}\right)^2, \end{equation*}

    and the result holds.
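
    The bound in Lemma 4 can likewise be verified numerically. The snippet below, included only as an illustration with arbitrary dimensions and randomly generated nonnegative matrices, evaluates both sides of the inequality directly.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, m = 5, 4, 3
    K = rng.random((n, p)); X = rng.random((p, m))
    A = rng.random((p, m)) + 1e-3      # the auxiliary matrix, strictly positive
    Y = rng.random((n, m))

    F = 0.5 * np.linalg.norm(Y - K @ X) ** 2
    KA = K @ A
    bound = 0.0
    for i in range(n):
        for j in range(m):
            for k in range(p):
                term = K[i, k] * A[k, j] / KA[i, j] * (Y[i, j] - X[k, j] / A[k, j] * KA[i, j]) ** 2
                bound += 0.5 * term
    print(bool(F <= bound + 1e-12))    # True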

    We thus arrive at the following conclusion: G_V(V_{kj}, V_{kj}^{t}) is an auxiliary function of F_{V_{kj}} .

    Lemma 5. Let G_V(V_{kj}, V_{kj}^{t}) be defined by (5.5), then it serves as an auxiliary function for F_{V_{kj}} .

    Proof. According to Lemmas 3 and 4, we can estimate G_V(V_{kj}, V_{kj}^{t}) as follows:

    \begin{equation*} \begin{aligned} G_V(V_{kj},V_{kj}^{t})& = \frac{\alpha_1}2\sum\limits_{a = 1}^k\sum\limits_{b = 1}^k\frac1{(H{V^t}^T)_{ab}}\sum\limits_{r = 1}^nH_{ar}{V^t}^T_{rb}\left[\delta_{ab}-\frac{V_{br}}{V^t_{br}}(H{V^t}^T)_{ab}\right]^2+\frac{\alpha_2}2\|V-H\|_F^2\\ &\geq\frac{\alpha_1}2\sum\limits_{a = 1}^k\sum\limits_{b = 1}^k\left[\delta_{ab}-\sum\limits_{r = 1}^n\frac{H_{ar}V^{t^T}_{rb}}{(HV^{t^T})_{ab}}\frac{V_{br}}{V_{br}^t}(HV^{t^T})_{ab}\right]^2+\frac{\alpha_2}2\|V-H\|_F^2\\ & = \frac{\alpha_1}2\sum\limits_{a = 1}^k\sum\limits_{b = 1}^k\left[\delta_{ab}-(HV^T)_{ab}\right]^2+\frac{\alpha_2}2\|V-H\|_F^2\\ & = F(V). \end{aligned} \end{equation*}

    In addition, it is easy to see that G_V(V_{kj}, V_{kj}) = F_{V_{kj}}(V_{kj}) and the result holds.

    The third step is to prove that G_H(H_{kj}, H_{kj}^{t}) can serve as an auxiliary function of F_{H_{kj}} .

    Lemma 6. Let G_H(H_{kj}, H_{kj}^{t}) be defined as in (5.8), then it can serve as an auxiliary function for F_{H_{kj}} .

    Proof. According to the definitions of F_1(H) and F_2(H) in (5.3), to prove

    G_H(H_{kj},H_{kj}^{t}) = G_{H_1}(H_{kj},H_{kj}^{t})+G_{H_2}(H_{kj},H_{kj}^{t})\geq F_1(H_{kj})+F_2(H_{kj}) = F(H_{kj}),

    it is sufficient to show that

    \begin{equation} G_{H_1}(H_{kj},H_{kj}^{t})\geq F_1(H_{kj}),\ G_{H_2}(H_{kj},H_{kj}^{t})\geq F_2(H_{kj}). \end{equation} (5.11)

    The proof of the first part of (5.11) is similar to that of Lemma 2; to avoid any ambiguity, we provide it in detail. First, by a Taylor series expansion of F_{1_{H_{kj}}}(H_{kj}) , we have

    \begin{equation*} F_{1}(H_{kj}) = F_{1}(H_{kj}^{t})+F_{1}^{'}(H_{kj}^{t})(H_{kj}-H_{kj}^{t}) +\frac{1}{2}[(W^TW)_{kk}+(\lambda L)_{jj}](H_{kj}-H_{kj}^{t})^{2}. \end{equation*}

    Note that (W^TWH^t)_{kj} = \sum_{r = 1}^{K}(W^TW)_{kr}(H^t)_{rj}\geq (W^TW)_{kk}H^t_{kj} and (\lambda H^t D)_{kj} = \lambda \sum_{r = 1}^{n}H^t_{kr}D_{rj}\geq \lambda H^t_{kj}D_{jj} \geq \lambda H^t_{kj}(D-S)_{jj} = \lambda H^t_{kj}L_{jj}, so that \frac{(W^TWH^t+\lambda H^tD)_{kj}}{H_{kj}^{t}} \geq (W^TW)_{kk}+(\lambda L)_{jj}. Therefore,

    \begin{equation*} \begin{aligned} \frac{1}{2}\frac{(W^TWH^t+\lambda H^tD)_{kj}}{H_{kj}^{t}}(H_{kj}-H_{kj}^{t})^{2} &\geq \frac{1}{2}\frac{(W^TW)_{kk}(H^t)_{kj}+\lambda(H^t)_{kj}L_{jj}}{H_{kj}^{t}} (H_{kj}-H_{kj}^{t})^{2} \\ & = \frac{1}{2}[(W^TW)_{kk}+(\lambda L)_{jj}](H_{kj}-H_{kj}^{t})^{2} \end{aligned} \end{equation*}

    holds and G_{H_1}(H_{kj}, H_{kj}^t)\geq F_{1}(H_{kj}) follows.

    The second part of (5.11) is also similar to Lemma 5. By Lemmas 3 and 4,

    \begin{equation*} \begin{aligned} G_{H_2}(H_{kj},H_{kj}^{t}) & = \frac{\alpha_{1}}{2}\sum\limits_{a = 1}^{k}\sum\limits_{b = 1}^{k}\frac{1}{(H^tV^{T})_{ab}}\sum\limits_{r = 1}^{n}H^t_{ar}V^{T}_{rb}\left[\delta_{ab}-\frac{H_{ar}}{H_{ar}^{t}}(H^tV^{T})_{ab}\right]^{2}+\frac{\alpha_{2}}{2}\|V-H\|_{F}^{2}\\ &\geq\frac{\alpha_1}2\sum\limits_{a = 1}^k\sum\limits_{b = 1}^k\left[\delta_{ab}-\sum\limits_{r = 1}^n\frac{H^t_{ar}V_{rb}^T}{(H^tV^T)_{ab}}\frac{H_{ar}}{H_{ar}^t}(H^tV^T)_{ab}\right]^2+\frac{\alpha_2}2\|V-H\|_F^2\\ & = \frac{\alpha_1}2\sum\limits_{a = 1}^k\sum\limits_{b = 1}^k\left[\delta_{ab}-(HV^T)_{ab}\right]^2+\frac{\alpha_2}2\|V-H\|_F^2 = F_2(H_{kj}). \end{aligned} \end{equation*}

    Hence, G_{H_2}(H_{kj}, H_{kj}^t)\geq F_2(H_{kj}) holds, and the result follows.

    Based on the above preparations, we can now state the main convergence result.

    Theorem 1. Let X\in R_{+}^{m\times n} , W^0\in R_{+}^{m\times k} , V^0\in R_{+}^{k\times n} , and H^0\in R_{+}^{k\times n} . Then, the objective function in (3.4) is nonincreasing under the sequences \{W^{t}, V^{t}, H^{t}\}_{t = 0}^{+\infty} generated by (3.14)–(3.16).

    Proof. First, since \frac{\partial G_W(W_{ik}, W_{ik}^{t})}{\partial W_{ik}} = F_{W_{ik}}^{'}(W_{ik}^{t})+\frac{(W^{t}HH^T)_{ik}}{W_{ik}^{t}}(W_{ik}-W_{ik}^{t}) , setting \frac{\partial G_W(W_{ik}, W_{ik}^{t})}{\partial W_{ik}} = 0 gives F_{W_{ik}}^{'}(W_{ik}^{t})W_{ik}^{t}+(W^{t}HH^T)_{ik}W_{ik}-(W^{t}HH^T)_{ik}W_{ik}^{t} = 0. According to Lemma 2,

    \begin{equation*} W_{ik} = \frac{W_{ik}^{t}(W^{t}HH^T)_{ik}-W_{ik}^{t}F_{W_{ik}}^{'}(W_{ik}^{t})}{(W^{t}HH^T)_{ik}} = \frac{W_{ik}^{t}(W^{t}HH^T)_{ik}-W_{ik}^{t}(W^{t}HH^T-XH^T)_{ik}}{(W^{t}HH^T)_{ik}} = W_{ik}^{t}\frac{(XH^T)_{ik}}{(W^{t}HH^T)_{ik}}. \end{equation*}

    By Lemma 1, taking W_{ik}^{t+1} = \arg\min_{W_{ik}\in R}G_W(W_{ik}, W_{ik}^{t}) = W_{ik}^{t}\frac{(XH^T)_{ik}}{(W^{t}HH^T)_{ik}}, the objective function is nonincreasing under (3.14). Since (3.4) also has a nonnegative lower bound, (3.14) is a convergent iterative formula.

    Second, note that

    \begin{equation*} \begin{aligned} \frac{\partial G_{V}(V_{kj},V_{kj}^{t})}{\partial V_{kj}} = &\frac{\alpha_1}2\sum\limits_{a = 1}^k\frac{H_{aj}V^{t^T}_{jk}}{(HV^{t^T})_{ak}}2\left[1-\frac{V_{kj}}{V_{kj}^t}(HV^{t^T})_{ak}\right] \left[-\frac{(HV^{t^T})_{ak}}{V^{t^T}_{kj}}\right]+\frac{\alpha_2}22(V_{kj}-H_{kj})\\ = &\alpha_1\sum\limits_{a = 1}^k\left[-H_{aj}+\frac{H_{aj}}{V_{kj}^t}(HV^{t^T})_{ak}V_{kj}\right]+\alpha_2(V_{kj}-H_{kj})\\ = &\alpha_1\sum\limits_{a = 1}^kH_{aj}\left[\frac{V_{kj}}{V_{kj}^t}(HV^{t^T})_{ak}-\delta_{ak}\right]+\alpha_2(V_{kj}-H_{kj})\\ = &\alpha_1\left[\frac{V_{kj}}{V_{kj}^t}(V^tH^TH)_{kj}-H_{kj}\right]+\alpha_2(V_{kj}-H_{kj}). \end{aligned} \end{equation*}

    Setting \frac{\partial G_{V}(V_{kj}, V_{kj}^{t})}{\partial V_{kj}} = 0 yields \alpha_1\frac{V_{kj}}{V_{kj}^t}(V^tH^TH)_{kj}+\alpha_2V_{kj} = (\alpha_1+\alpha_2)H_{kj}, that is, \alpha_1V_{kj}(V^tH^TH)_{kj}+\alpha_2V_{kj}V_{kj}^t = (\alpha_1+\alpha_2)H_{kj}V_{kj}^t. According to Lemma 5, we obtain V_{kj} = V_{kj}^{t}\frac{\left[(\alpha_1+\alpha_2)H \right]_{kj}}{\left[\alpha_1V^tH^TH+\alpha_2V^t \right]_{kj}}. So, by Lemma 1, taking V_{kj}^{t+1} = \arg\min_{V_{kj}\in R}G_V(V_{kj}, V_{kj}^{t}) = V_{kj}^{t}\frac{\left[(\alpha_1+\alpha_2)H \right]_{kj}}{\left[\alpha_1V^tH^TH+\alpha_2V^t \right]_{kj}}, the objective function is nonincreasing under (3.15). Since the objective function in (3.4) has a nonnegative lower bound, it can be concluded that (3.15) is a convergent iterative formula.

    Third, it is true that \frac{\partial G_{H_1}(H_{kj}, H_{kj}^t)}{\partial H_{kj}} = F_{1_{H_{kj}}}^{'}(H_{kj}^t)+\frac{(W^TWH^t+\lambda H^tD)_{kj}}{H_{kj}^t}(H_{kj}-H_{kj}^t) and

    \begin{equation*} \begin{aligned} \frac{\partial G_{H_2}(H_{kj},H_{kj}^t)}{\partial H_{kj}} = &\frac{\alpha_{1}}{2}\sum\limits_{b = 1}^{k}\frac{H^t_{kj}V_{jb}^T}{(H^tV^T)_{kb}}2\left[1-\frac{H_{kj}}{H_{kj}^{t}}(H^tV^T)_{kb}\right]\left[-\frac{(H^tV^T)_{kb}}{H_{kj}^{t}}\right]+\frac{\alpha_{2}}{2}2(H_{kj}-V_{kj})\\ = &\alpha_{1}\sum\limits_{b = 1}^{k}\left[-V^T_{jb}+\frac{V^T_{jb}}{H_{kj}^{t}}(H^tV^T)_{kb}H_{kj}\right]+\alpha_{2}(H_{kj}-V_{kj})\\ = &\alpha_{1}\sum\limits_{b = 1}^{k}V^T_{jb}\left[\frac{H_{kj}}{H_{kj}^{t}}(H^tV^T)_{kb}-\delta_{kb}\right]+\alpha_{2}(H_{kj}-V_{kj})\\ = &\alpha_1\frac{H_{kj}}{H_{kj}^t}(H^tV^TV)_{kj}+\alpha_2H_{kj}-(\alpha_1+\alpha_2)V_{kj}. \end{aligned} \end{equation*}

    Let \frac{\partial G_{H}(H_{kj}, H_{kj}^{t})}{\partial H_{kj}} = \frac{\partial G_{H_1}(H_{kj}, H_{kj}^{t})}{\partial H_{kj}}+\frac{\partial G_{H_2}(H_{kj}, H_{kj}^{t})}{\partial H_{kj}} = 0 , then,

    \begin{equation*} \begin{aligned} &F_{1_{H_{kj}}}^{'}(H_{kj}^t)+\frac{(W^TWH^t+\lambda H^tD)_{kj}}{H_{kj}^t}(H_{kj}-H_{kj}^t)+\alpha_1\frac{H_{kj}}{H_{kj}^t}(H^tV^TV)_{kj}+\alpha_2H_{kj}-(\alpha_1+\alpha_2)V_{kj} = 0\\ &F_{1_{H_{kj}}}^{'}(H_{kj}^{t})H_{kj}^{t}+(W^{T}WH^{t}+\lambda H^{t}D)_{kj}H_{kj}-(W^{T}WH^{t}+\lambda H^{t}D)_{kj}H_{kj}^{t}+\alpha_{1}H_{kj}(H^{t}V^{T}V)_{kj}\\ &+\alpha_{2}H_{kj}H^t_{kj}-(\alpha_{1}+\alpha_{2})V_{kj}{H_{kj}^{t}} = 0. \end{aligned} \end{equation*}

    According to Lemma 6, this leads to H_{kj} = H^t_{kj}\frac{\left[W^TX+\lambda H^tS+(\alpha_1+\alpha_2)V \right]_{kj}}{\left[W^TWH^t+\lambda H^tD+\alpha_1H^tV^TV+\alpha_2H^t \right]_{kj}}. By Lemma 1, if

    \begin{equation*} H_{kj}^{t+1} = \arg\min\limits_{H_{kj}\in R}G_H(H_{kj},H_{kj}^{t}) = H^t_{kj}\frac{\left[W^TX+\lambda H^tS+(\alpha_1+\alpha_2)V \right]_{kj}}{\left[W^TWH^t+\lambda H^tD+\alpha_1H^tV^TV+\alpha_2H^t \right]_{kj}} \end{equation*}

    holds, then the objective function is nonincreasing under (3.16). Once again, since the objective function in (3.4) has a nonnegative lower bound, it can be concluded that (3.16) is a convergent iterative formula. Combining the three parts above completes the proof of Theorem 1.



    © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)