1. Introduction
In pattern recognition, cluster analysis is the formal study of methods and algorithms for dividing data into meaningful and useful clusters (groups) based on some intrinsic similarity measure [1]. It is a crucial intermediate step in exploring the potential structure of massive data for subsequent analysis. In the field of internet applications, clustering is used to identify users with different interest characteristics, thus realizing personalized service recommendation [2,3]. In the field of biology, clustering can be applied to discover patterns in genes, helping to identify subtypes of certain diseases (e.g., cancers) [4,5,6].
Nonnegative matrix factorization (NMF) models (algorithms) have been widely studied as one of the important tools for achieving clustering. The NMF model finds wide applications in various domains, owing to its parts-based representation and physical interpretability. In [7,8], NMF was used to decompose the original nonnegative high-dimensional matrix X into the product of two low-dimensional nonnegative matrices, WH, ensuring that X and WH are sufficiently close under some metric. Usually, W is referred to as the basis matrix and H is referred to as the coefficient matrix. Each element of the matrices W and H obtained above is nonnegative, and their common rank is much lower than that of X itself, so this decomposition is called a dimension-reducing decomposition. This form of decomposition effectively saves storage and computational space, improving computational efficiency. In clustering analysis, the coefficient matrix H is used as a substitute for the high-dimensional original matrix X in clustering tasks. The matrix H is of much lower dimension than the original matrix X, which saves a great deal of computation and space.
Lee and Seung [9] extended NMF by introducing a simple algorithm based on multiplicative update rules (MUR) and demonstrating its effectiveness in learning parts of objects and discovering semantic features in text. Additionally, NMF has been successfully extended and applied in various fields, including clustering and classification [10,11,12], image analysis [13,14,15], spectral analysis [16,17], and blind source separation [18,19]. Moreover, NMF has brought a series of new theories and algorithms to clustering problems. Numerous studies have shown that by introducing different constraints, such as sparsity [12,20,21], smoothness [16,22,23], and orthogonality [24,25,26], the performance of standard NMF can usually be further improved. This is because these constraints introduce other useful features beyond nonnegativity into the standard NMF model. In fact, many datasets not only do not follow linear distributions, but also lie in certain nonlinear spaces, such as manifolds. This requires the introduction of nonlinear strategies for effective processing, such as constructing a Laplacian matrix [27,28] to preserve the nonlinear geometric structure between data points. This approach is known as manifold learning, which assumes that high-dimensional data has a nonlinear mapping in a low-dimensional space.
Cai et al. effectively combined manifold learning with NMF by introducing a graph regularization term, yielding the graph regularized nonnegative matrix factorization (GNMF) [29] method. This algorithm aims to establish smooth connections between data representations, encode geometric information between data in linear and nonlinear spaces, and find the representation of the data on manifolds. The advantage of this method is that it not only preserves the inherent geometric structure, but also obtains a compact representation of hidden semantic information. Based on this, Li et al. [30] presented a graph-based local concept coordinate factorization (GLCF) method, which respects the intrinsic structure of the data through manifold kernel learning in the warped reproducing kernel Hilbert space. Hu et al. [31] combined the GNMF algorithm with the convex nonnegative matrix factorization (CNMF) algorithm and proposed the graph convex nonnegative matrix factorization (GCNMF) algorithm with manifold regularization terms; Meng et al. [32] introduced graph regularization terms in the sample space and feature space, and proposed the dual graph sparse nonnegative matrix factorization (DSNMF) algorithm. In addition, algorithms such as graph regularized discriminative nonnegative matrix factorization (GDNMF) [33], dual regularization nonnegative matrix factorization (DNMF) [34], incomplete multi-view clustering (IMVC) [35], collaborative topological graph learning (CTGL) [36], multi-view deep nonnegative matrix factorization with multi-kernel learning (MvMKDNMF) [37], nonnegative matrix factorization with graph regularization for multi-view data representation (GMultiNMF) [38], and pairwise constrained graph regularized convex nonnegative matrix factorization (PGCNMF) [39] applied constraints to the data through manifold learning, constructed different Laplacian regularization terms, and provided many basic methods for data analysis based on the idea of manifold learning.
Recently, orthogonal nonnegative matrix factorization (ONMF) [24] has received widespread attention; by imposing orthogonality constraints on NMF, it becomes more suitable for clustering tasks, as the constrained factor matrix can be viewed as an indicator matrix. This is because a parts-based representation requires the basis vectors of the factor matrices to be different from each other, which means that parts-based representation is closely related to orthogonality.
The orthogonality of a given matrix A refers to A^TA = I or AA^T = I (I is an identity matrix of appropriate size), which means that any two different columns of A are orthogonal or any two different rows of A are orthogonal.
In [20], Li et al. found that for some datasets, parts-based representation involves clustering the rows of the input data matrix, and the clustering result is described by the basis matrix W. The orthogonality constraint on W is column orthogonality, W^TW = I. In document clustering [40,41], clustering is performed on the columns of the input data matrix and is represented by the coefficient matrix H, and orthogonality among the rows of H can improve clustering performance. The orthogonality constraint is then imposed on the coefficient matrix H, which satisfies HH^T = I. The orthogonality models of the above two categories have greatly improved clustering performance [42,43,44]. Due to the excellent performance of ONMF in clustering, many update rules and variants of ONMF have been proposed [45,46,47].
In fact, ONMF was initially proposed by Ding [24], and empirical evidence demonstrated that introducing a certain degree of orthogonality constraint into the algorithm can enhance sparsity, reduce redundancy in feature vectors, and ensure the uniqueness of the solution and interpretability of clustering results. In their method, orthogonality is achieved by solving an optimization problem with orthogonality constraints.
There, ONMF was considered as a continuous relaxation (without discrete constraints) of the K-means method. Research on various data mining tasks has shown that, compared to clustering methods based on K-means and NMF, the ONMF model performs better. However, achieving the orthogonality of the basis vectors requires intensive computation, especially for word clustering in document matrices, which results in high computational costs. Later, in 2010, Li et al. proposed nonnegative matrix factorization on orthogonal subspace (NMFOS) [48], introducing the concept of penalty functions to incorporate orthogonality constraints into the objective function. This orthogonality, enforced by adding a penalty term to the objective function, is designed to reduce the computational burden and, to some extent, to improve the sparsity of the decomposition matrices (the W matrix or the H matrix). However, this model has two issues. First, when applied to certain document datasets, the algorithm may exhibit the zero-locking phenomenon; second, the convergence proof proposed in it is not rigorous enough (due to errors in the matrix differentiation), that is, it fails to effectively prove that the iterative algorithm does indeed converge.
Overall, ONMF models have shown good performance, but still face the following two challenges: significant computational costs when orthogonality is imposed as a constraint, and difficulty, or absence, of convergence proofs when the orthogonality condition W^TW = I or HH^T = I is introduced as a penalty term. Based on the above observations, this paper studies a new ONMF-based model, graph-regularized nonnegative matrix factorization on orthogonal subspace with an auxiliary variable (GNMFOSV). By adding an orthogonal penalty term and a graph Laplacian term, we achieve the orthogonality of the factor matrix through factorization, enhancing the sparsity of the coefficient matrix to some extent. Leveraging the geometric structure of the data, we enhance the algorithm's capability to recover the underlying nonlinear relationships in the data. More importantly, we introduce an intermediate variable V that is updated alternately in the auxiliary function, and prove the convergence of our algorithm using the idea of alternating updates of three variables. Meanwhile, experiments on a large number of datasets also show that this algorithm greatly improves clustering performance and has a certain degree of robustness to the dataset. In other words, experimental results on many datasets show that our algorithm maintains overall optimality.
The main contributions of this paper are as follows:
∙ The ONMF method obtains sparser results through orthogonal constraints. However, in some complex practical application scenarios, it is still difficult to interpret the physical meanings or practical implications of the basis vectors and coefficient matrices obtained after decomposition, which is not conducive to a deep understanding of the internal structure and features of the data. At the same time, it is also very difficult to prove the convergence of the algorithm. The GNMF method preserves the inherent geometric structure of the data and also acquires a compact representation of the hidden semantic information. Nevertheless, this method overemphasizes local information and relatively ignores the global structure and dependency relationships of the data. In this paper, in order to improve the interpretability of ONMF and the theoretical convergence of the algorithm, and also to retain the global structure and dependency relationships of the data in the GNMF model, the two methods are integrated, and an auxiliary function is ingeniously added, resulting in a rigorous convergence proof. The ablation experiments also fully demonstrate that integrating the graph regularization method, orthogonal constraints, and the auxiliary variable mechanism can significantly improve the clustering accuracy and stability, outperforming single methods on multiple datasets, which fully verifies the rationality and innovation of the model design.
∙ In our algorithm, different levels of orthogonality can be achieved by adjusting the parameters α1 and α2, and various regularization strengths can be obtained by tuning the parameter λ, making the algorithm more flexible and applicable to a broader range of scenarios.
∙ Extensive experiments were conducted on 13 datasets, demonstrating the algorithm's effectiveness in terms of clustering performance. The results not only highlight its efficiency in learning and enhanced discriminative power but also indicate its ability to acquire improved partial linear representations.
The remaining content of this paper is as follows: Section 2 reviews related work, including NMF, NMFOS, and GNMF. Section 3 provides a detailed introduction to our method, including the derivation of update rules and the corresponding algorithm, along with a proof of the algorithm's convergence. Section 4 presents experiments and analysis. Section 5 offers conclusions and future directions.
2. Symbols and classic models
In this section, some symbols and notations used throughout the paper are provided in Table 1 below, followed by the classic models of NMF, NMFOS, and GNMF and their multiplicative update algorithms.
2.1. NMF
NMF [9] is a classic matrix factorization algorithm that requires all elements of the factor matrices to be nonnegative. Consider the target matrix X = [x_1, \cdots, x_N]\in R_{+}^{M\times N} and treat each x_j of X as a column vector. Nonnegative matrix factorization of X refers to finding two matrices W and H (W = [w_{ik}]\in R_{+}^{M\times K}, H = [h_{kj}]\in R_{+}^{K\times N}, i = 1,2,\cdots,M, j = 1,2,\cdots,N, and k = 1,2,\cdots,K) such that the matrix product WH is as close as possible to X under a certain metric:
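(in its standard form)
X \approx WH. \quad (2.1)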
Generally speaking, W is called the basis matrix and H is called the coefficient matrix. This is because (2.1) tells us that each column vector xj of X can be linearly represented using the column vectors of W in this factorization, where the coefficients come from matrix H. The specific representation is as follows:
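(with h_j denoting the j-th column of H)
x_j \approx \sum\limits_{k = 1}^{K}w_kh_{kj} = Wh_j, \quad j = 1, 2, \cdots, N, \quad (2.2)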
where w_k represents the column vectors of W, and K in (2.2) is much smaller than M and N, or satisfies (M+N)K<MN. It is precisely because of (2.2) that the factor matrix W is called the basis matrix and the matrix H is called the coefficient matrix. Afterward, the matrix H will be used in clustering analysis applications, similarly to the indicator matrix in K-means. Meanwhile, the fact that K is much smaller than M and N makes this decomposition a dimensionality reduction, so clustering with H (rather than X) greatly saves time and space. The mathematical model of NMF is
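(in the standard Frobenius-norm form)
\min\limits_{W\geq 0,\ H\geq 0}\ \|X-WH\|_F^2, \quad (2.3)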
where W = [w_1, \cdots, w_M]\in R_{+}^{M\times K}, H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}. In addition, we will import an iterative multiplicative update algorithm from Lee and Seung [50].
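(these are the well-known multiplicative update rules for (2.3))
W_{ik}\leftarrow W_{ik}\frac{(XH^T)_{ik}}{(WHH^T)_{ik}}, \qquad H_{kj}\leftarrow H_{kj}\frac{(W^TX)_{kj}}{(W^TWH)_{kj}}.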
2.2. NMFOS
Ding et al. [24] extended the traditional NMF by incorporating an orthogonality constraint on the basis or coefficient matrix into the objective function's constraints, and this model is the important ONMF model. More importantly, the equivalence between ONMF and kernel K-means methods was also proved. Li et al. [48] found that the Lagrangian multiplier used in the nonlinear constrained optimization of ONMF is a symmetric matrix that needs to be updated at each iteration, resulting in a large computational workload. To overcome this limitation, they proposed a novel approach called NMFOS (using penalty terms). An improved optimization model was then obtained in the following form:
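(up to the scaling convention on the penalty, which can be absorbed into \alpha)
\min\limits_{W\geq 0,\ H\geq 0}\ \|X-WH\|_F^2+\frac{\alpha}{2}\|HH^T-I\|_F^2, \quad (2.4)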
where X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}, W = [w_1, \cdots, w_M]\in R_{+}^{M\times K}, H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}, and I is the identity matrix with K rows and K columns. The nonnegative parameter \alpha\geq 0 controls the degree of orthogonality of the vectors in H. When \alpha = 0, NMFOS reduces to NMF. Through model (2.4), the following update rules were derived:
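(one standard derivation gives the following form, with constants absorbed into \alpha)
W_{ik}\leftarrow W_{ik}\frac{(XH^T)_{ik}}{(WHH^T)_{ik}}, \qquad H_{kj}\leftarrow H_{kj}\frac{(W^TX+\alpha H)_{kj}}{(W^TWH+\alpha HH^TH)_{kj}}.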
The advantage of this algorithm is that by adjusting the penalty parameter α, orthogonality can be achieved during the decomposition process without the need for additional constraints, while reducing computational complexity. By the way, it should be noted that Mirzal [51] has proved the convergence of (2.5) from [24].
2.3. GNMF
The extraction of the internal structure of data is an important application extension of NMF models. Graph regularization is a manifold-based data processing method widely used to analyze the intrinsic geometric structure of manifold data [49,52,53]. The starting point of this idea is that manifold regularization can enhance the smoothness of the data representation in both linear and nonlinear spaces by minimizing the term given later in (2.6), and, by extracting geometric structure information from data points and features, it can also enhance the clustering performance of NMF (see details in [34,54,55]). The study of NMF models based on such a minimization form leads to the classic GNMF model [29].
Next, let us briefly introduce the specific construction process of GNMF. To effectively capture the geometric structure of both the data and feature manifolds, a p-nearest neighbor graph is constructed based on the input data matrix X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}. Consider a graph with N vertices, where each vertex corresponds to a data point x_j (j = 1,2,\cdots,N). For each data point x_j, we identify its p-nearest neighbors and connect them with edges (theoretically, p can take any value from \{1,2,\cdots,N\}, but in practice it is usually between 3 and 5). At the same time, a weight matrix S is defined to measure the degree to which adjacent points are similar not only in distance but also in geometric structure, thereby strengthening the influence of the geometric structure on the model. There are many options for defining the weight matrix S = \{S_{jl}\}\in R_{+}^{N\times N} on the graph; the following three are commonly used:
∙ 0–1 Weight Matrix: if N(x_j) represents the set of p-nearest neighbors of the data point x_j, then S_{jl} = 1 whenever x_l\in N(x_j) or x_j\in N(x_l), and S_{jl} = 0 otherwise.
∙ Heat Kernel Weight Matrix: if point x_j is connected to point x_l, then S_{jl} = e^{-\|x_j-x_l\|^2/\sigma}, where \sigma>0 is the kernel width; otherwise S_{jl} = 0.
∙ Dot-Product Weight Matrix: if point x_j is connected to point x_l, then S_{jl} = x_j^Tx_l; otherwise S_{jl} = 0.
It is worth noting that if the data points are normalized to unit length, the dot-product of two vectors is equal to their cosine similarity.
With the help of the weight matrix S, the graph Laplacian matrix L\in R^{N\times N} is defined to characterize the degree of similarity between points in terms of geometric structure: L = D-S, where D\in R_{+}^{N\times N} is the degree matrix with D_{ii} = \sum_{j}S_{ij}; note that D is a diagonal matrix. Inspired by matrix factorization and manifold learning, Cai et al. [29] proposed the GNMF algorithm. The core graph regularization formula is as follows:
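(in its standard form, with h_j denoting the j-th column of H)
\frac{1}{2}\sum\limits_{j,l = 1}^{N}\|h_j-h_l\|^2S_{jl} = \mathrm{Tr}(HLH^T), \quad (2.6)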
where Tr(\cdot) denotes the trace operator and S_{jl} measures the pairwise similarity of the original data points. By minimizing (2.6), an intuitive result is obtained: if two data points are close in terms of the intrinsic geometric structure of the original data distribution, their new representations h_l and h_j will also be close. Then, the optimization model of GNMF is given by:
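(in the notation above)
\min\limits_{W\geq 0,\ H\geq 0}\ \|X-WH\|_F^2+\lambda\,\mathrm{Tr}(HLH^T), \quad (2.7)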
where W = [w_1, \cdots, w_M]\in R_{+}^{M\times K}, H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}. The regularization parameter \lambda\geq 0 controls the smoothness of the model. By utilizing the Lagrange method, model (2.7) can be solved and the iterative update rules below are obtained:
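(in the notation of this paper, the standard GNMF multiplicative updates read)
W_{ik}\leftarrow W_{ik}\frac{(XH^T)_{ik}}{(WHH^T)_{ik}}, \qquad H_{kj}\leftarrow H_{kj}\frac{(W^TX+\lambda HS)_{kj}}{(W^TWH+\lambda HD)_{kj}}.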
It is evident that when \lambda = 0, the update rules in (2.8) reduce to the NMF update rules. In this paper, such manifold techniques are combined with NMFOS to enhance the capability of recovering the underlying nonlinear relationships in the data.
3. A new class of NMF-type model and algorithm
In this section, we introduce a new method called GNMFOSV. More importantly, we also provide a convergence proof for this type of model, to compensate for the lack or inadequacy of convergence proofs for such algorithms. The specific construction is as follows: First, following the idea of NMFOS, the orthogonality constraint on the coefficient matrix H is added to the objective function by using the penalty function method. Second, inspired by GNMF, our new model utilizes the geometric structures inherent in the data manifold and feature manifold, turning the manifold term into an additional regularization term added to the objective function for further clustering research. From the experimental results, it can be seen that combining orthogonality and graph regularization (manifold learning) and adding them separately to the objective function not only addresses the problem of considering both the nonnegativity and the orthogonality of the coefficient matrix, but also improves the clustering performance achieved by GNMFOSV.
3.1. Methodology
As mentioned above, we incorporate both the orthogonality constraint and the Laplacian constraint into the objective function to construct GNMFOSV:
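(up to constant scalings of the individual terms)
\min\limits_{W\geq 0,\ H\geq 0}\ \|X-WH\|_F^2+\lambda\,\mathrm{Tr}(HLH^T)+\alpha\|HH^T-I\|_F^2, \quad (3.1)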
where X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}, W = [w_1, \cdots, w_M]\in R_{+}^{M\times K}, H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}, L\in R^{N\times N}, and I is the identity matrix with K rows and K columns. Due to the property that a nonnegative matrix with pairwise orthogonal rows can have at most one nonzero entry in each column, the generation of strictly uncorrelated feature vectors is possible. Thus, we incorporate an orthogonal penalty term, \alpha\|HH^T-I\|_F^2, into the foundational model. However, directly imposing orthogonality constraints on the coefficient matrix poses challenges for the convergence proof. Therefore, we introduce an auxiliary variable, denoted as V\in R_{+}^{K\times N}, with the same dimensions as H. Subsequently, two new parameters, \alpha_1 and \alpha_2, are introduced to split the orthogonal penalty term \alpha\|HH^T-I\|_F^2 into two second-order terms. This is equivalently replaced by \frac{\alpha_1}{2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2, i.e.,
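\alpha\|HH^T-I\|_F^2\ \longrightarrow\ \frac{\alpha_1}{2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2. \quad (3.2)
Note that both sides vanish simultaneously exactly when V = H and HH^T = I, which justifies the replacement.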
Therefore, we equivalently replace model (3.1) with the following new model:
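(written with the same term scalings as above; constant factors on the first two terms can be absorbed into \lambda and the \alpha's)
\min\limits_{W\geq 0,\ V\geq 0,\ H\geq 0}\ J(W,V,H) = \|X-WH\|_F^2+\lambda\,\mathrm{Tr}(HLH^T)+\frac{\alpha_1}{2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2. \quad (3.3)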
This formulation incorporates the orthogonality constraint with an auxiliary variable as a penalty term in the objective function (see Figure 1). The advantage of introducing the auxiliary variable lies in its ability to deconstruct the complex orthogonality constraints found in NMFOS, thereby addressing the common challenges of convergence proofs encountered in traditional orthogonal subspace NMF methods, and facilitating rapid convergence and favorable comparative performance in our experiments.
Here, \alpha_1\geq 0 and \alpha_2\geq 0 are the two orthogonal penalty parameters, and \lambda\geq 0 is the regularization parameter. The first two parameters adjust the level of orthogonality in H and ensure the validity of Eq (3.2), while the latter balances the reconstruction error in the first term against the graph regularization term of the objective function. For convenience, we employ the 0–1 weight scheme to construct the p-nearest neighbor graph. Clearly, when \alpha_1 = \alpha_2 = \lambda = 0, model (3.3) degenerates into model (2.3). Since the objective function in model (3.3) is non-convex in W, V, and H jointly, obtaining closed-form solutions is challenging. We therefore need to explore how to develop an effective algorithm to solve this model. We adopt an iterative optimization algorithm and utilize multiplicative update rules to obtain local optima (a similar idea can be found in [56,57,58], but not the same). We can rewrite the objective function J(W,V,H) in trace form as follows:
Let \psi_{ik}, \phi_{kj}, and \theta_{kj} be the Lagrange multipliers associated with the constraints W\geq 0, H\geq 0, and V\geq 0 in the constrained model (3.3). Define the matrices \Psi = [\psi_{ik}], \Phi = [\phi_{kj}], and \Theta = [\theta_{kj}]. The Lagrangian function for the constrained optimization problem can then be expressed as:
Now, taking the derivatives of the Lagrangian function L with respect to W, V, and H yields Eqs (3.5)–(3.7). By setting Eqs (3.5)–(3.7) to zero, we can derive the expressions (3.8)–(3.10) for the Lagrange multipliers.
Furthermore, multiplying both sides of Eqs (3.8)–(3.10) by the factors W_{ik}, V_{kj}, and H_{kj}, respectively, and using the Karush-Kuhn-Tucker (KKT) conditions \psi_{ik}W_{ik} = 0, \theta_{kj}V_{kj} = 0, and \phi_{kj}H_{kj} = 0, we obtain the following three equations:
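(up to positive constant factors absorbed into the parameters, consistent with the update rules that follow)
\left(WHH^T-XH^T\right)_{ik}W_{ik} = 0, \quad (3.11)
\left(\alpha_1VH^TH+\alpha_2V-(\alpha_1+\alpha_2)H\right)_{kj}V_{kj} = 0, \quad (3.12)
\left(W^TWH+\lambda HD+\alpha_1HV^TV+\alpha_2H-W^TX-\lambda HS-(\alpha_1+\alpha_2)V\right)_{kj}H_{kj} = 0. \quad (3.13)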
Therefore, based on Eqs (3.11)–(3.13), we derive the iterative formulas for updating W, V, and H in the optimization of the model (3.3):
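(these coincide with the formulas derived in the convergence analysis in the Appendix)
W_{ik}^{t+1} = W_{ik}^{t}\frac{(XH^T)_{ik}}{(W^{t}HH^T)_{ik}}, \quad (3.14)
V_{kj}^{t+1} = V_{kj}^{t}\frac{\left[(\alpha_1+\alpha_2)H\right]_{kj}}{\left[\alpha_1V^{t}H^TH+\alpha_2V^{t}\right]_{kj}}, \quad (3.15)
H_{kj}^{t+1} = H_{kj}^{t}\frac{\left[W^TX+\lambda H^{t}S+(\alpha_1+\alpha_2)V\right]_{kj}}{\left[W^TWH^{t}+\lambda H^{t}D+\alpha_1H^{t}V^TV+\alpha_2H^{t}\right]_{kj}}. \quad (3.16)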
We summarize the resulting procedure as Algorithm 1 for solving model (3.3). The proof of the convergence of Algorithm 1 can be found in the Appendix.
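For concreteness, the following is a minimal NumPy sketch of the multiplicative updates (3.14)–(3.16). It is an illustrative implementation rather than the authors' reference code: the function name gnmfosv, the random initialization, the small constant eps guarding against division by zero, and the fixed iteration count are our own assumptions, and the weight matrix S is assumed to be a precomputed symmetric 0–1 p-nearest-neighbor graph.

import numpy as np

def gnmfosv(X, K, S, alpha1=0.01, alpha2=1000.0, lam=100.0,
            n_iter=100, eps=1e-10, seed=0):
    # X: (M, N) nonnegative data matrix; S: (N, N) symmetric 0-1 weight
    # matrix of the p-nearest-neighbor graph; K: number of clusters.
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K))          # basis matrix
    H = rng.random((K, N))          # coefficient matrix
    V = rng.random((K, N))          # auxiliary variable
    D = np.diag(S.sum(axis=1))      # degree matrix, so that L = D - S

    for _ in range(n_iter):
        # (3.14): W <- W * (X H^T) / (W H H^T)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # (3.15): V <- V * ((a1 + a2) H) / (a1 V H^T H + a2 V)
        V *= ((alpha1 + alpha2) * H) / (alpha1 * (V @ H.T @ H) + alpha2 * V + eps)
        # (3.16): H <- H * (W^T X + lam H S + (a1 + a2) V)
        #               / (W^T W H + lam H D + a1 H V^T V + a2 H)
        num = W.T @ X + lam * (H @ S) + (alpha1 + alpha2) * V
        den = W.T @ W @ H + lam * (H @ D) + alpha1 * (H @ V.T @ V) + alpha2 * H + eps
        H *= num / den
    return W, H, V

As in Section 2.1, a cluster label for sample j can then be read off as the index of the largest entry in the j-th column of H.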
3.2. Complexity analysis of algorithm
In this subsection, we analyze the computational complexity of Algorithm 1, that is, the amount of computation required to run the algorithm. The notation O(n) indicates that the computational cost of the algorithm is of the same order as n. Recall that M denotes the number of features, N the number of samples, K the number of clusters, and p the number of nearest neighbors. The computational complexities of XH^T, WHH^T, W^TX, and W^TWH are all O(MNK). Moreover, the computational complexities of HD and HS are both O(N^2K), while those of VH^TH and HV^TV are O(NK^2). Besides, an additional complexity of O(N^2M) is needed to construct the p-nearest neighbor graph. In conclusion, the computational complexity of each iteration of Algorithm 1 is O(MNK+N^2M+N^2K+NK^2). During the experiments, we found that the differences in time consumption among the various algorithms on our machine were not significant, so time consumption data are not reported.
4. Experimental results and analysis
In this section, through extensive experiments, we verify and study the performance of our algorithm in several respects, including parameter selection analysis, clustering comparison experiments, and convergence experiments. The comparisons are based on the following datasets.
4.1. Datasets
To verify the performance superiority of the proposed model, we conduct comprehensive experiments on 13 real-world datasets involving face images, digit images, and documents.
∙ ORL*: Olivetti Research Laboratory (ORL) dataset. This dataset contains 400 face images collected from 40 individuals, and each subject has 10 grayscale images of size 32×32.
*https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
∙ AR [59]: The AR face dataset. This dataset contains images of 65 men and 55 women, with each person contributing 14 images, resulting in a total of 1680 images. These images are resized to 32×32 pixels.
∙ Yale†: This dataset contains 165 grayscale images collected from 15 individuals. Each individual has 11 images with different facial expressions or configurations: with or without glasses, left-light, center-light, right-light, normal, happy, sad, sleepy and surprised. All images are of size 32×32.
†http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html
∙Handwritten digit‡: The dataset consists of 2000 samples collected from 10 numeric characters (0–9).
‡https://lig-membres.imag.fr/grimal/data.html
∙ UCI digit§: This dataset consists of features of handwritten numerals (0–9) from the University of California, Irvine (UCI) machine learning repository. It contains 2000 instances belonging to 10 classes.
§http://archive.ics.uci.edu/dataset/72/multiple+features
∙ Diag-Bcw¶: Breast Cancer Wisconsin Diagnostic dataset (denoted as Diag-Bcw), where the features are computed from a digitized image of a fine needle aspirate of a breast mass. It contains 569 samples in 2 categories.
¶https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic
∙Caltech101 [60]: Caltech101 contains 9144 pictures of objects belonging to 101 categories, including motorbike, airplane, cannon, etc. There are about 40 to 800 images per category but most categories have about 50 images.
∙Isolet||: This dataset was generated as follows: 150 subjects spoke the name of each letter of the alphabet twice. The speakers are grouped into sets of 30 speakers each, and are referred to as isolet1, isolet2, isolet3, isolet4, and isolet5. We select the subset isolet3 in this experiment. It contains 1560 samples in 26 categories.
||http://archive.ics.uci.edu/dataset/54/isolet
∙ Vote**: Congressional Voting Records (denoted as Vote), derived from the votes of each U.S. House of Representatives Congressman on the 16 key votes identified by the Congressional Quarterly Almanac (CQA). This dataset contains 435 documents in 2 categories.
**https://archive.ics.uci.edu/dataset/105/congressional+voting+records
∙ Abalone††: This dataset corresponds to physical measurements of abalones and was collected by the Marine Resources Division. It contains 2282 samples in 4 categories.
††https://archive.ics.uci.edu/dataset/1/abalone
A summary of the statistical characteristics of the above datasets is provided in Table 2.
4.2. Algorithms used for comparison
The proposed method is compared with the well-known K-means method and nine other popular NMF-based algorithms: NMF, ONMF, GNMF, sparse constrained nonnegative matrix factorization on the matrix H (SNMF-H), sparse constrained nonnegative matrix factorization on the matrix W (SNMF-W), RMNMF, KOGNMF, min-vol NMF, and NMFOS. As to the parameter setup of all compared methods, we adopt their default settings if feasible. For a fair comparison, we only consider algorithms based on the Frobenius norm. The description of each algorithm is as follows:
∙ K-means[61]: This algorithm operates on the original matrix without extracting or utilizing information contained in the original matrix.
∙ NMF[9]: The goal of this algorithm is to find two nonnegative matrices with significantly lower dimensions than the original matrix, while achieving strategies for features extraction and dimensionality reduction.
∙ ONMF[24]: Building upon NMF, an orthogonal constraint is introduced to the basis matrix (or coefficient matrix).
∙ GNMF[29]: By integrating graph theory and manifold assumptions with the original NMF, this algorithm preserves the intrinsic geometric structure of the data.
∙ SNMF-W and SNMF-H[62]: These two algorithms can generate sparse representations and represent the data as a linear combination of a small number of basis vectors.
∙RMNMF[63]: The Robust manifold nonnegative matrix factorization (RMNMF) algorithm combines the ℓ2,1-norm with spectral clustering to enhance robustness to outliers and preserve data geometry, while an orthonormal constraint addresses the uniqueness of the solution.
∙KOGNMF[64]: Kernel-based orthogonal graph regularized NMF (KOGNMF). By integrating graph constraints into the nonlinear NMF framework, this algorithm develops a kernel-based graph-regularized NMF.
∙min-vol NMF[65]: The min-vol NMF algorithm incorporates volume regularization to address rank-deficient basis matrices, making it applicable to real-world problems such as multispectral image unmixing. It utilizes an alternating fast projected gradient method for optimization.
∙ NMFOS[48]: Building upon ONMF, this algorithm incorporates an orthogonal constraint on the coefficient matrix as a penalty term in the objective function.
4.3. Settings
To assess clustering performance, we adopt purity, normalized mutual information (NMI), accuracy (ACC), and Rand index (RI) for comparing clustering results, which are widely used in the nonnegative learning literature.
In this subsection, we will provide descriptions and definitions of these four indicators, as well as the selection of initial matrices and the stopping criteria for algorithms.
Purity measures the extent to which each cluster contains data instances from primarily one class, and it is computed by
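\mathrm{Purity} = \frac{1}{N}\sum\limits_{i}\max\limits_{j}\,n_j^i. \quad (4.1)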
The notation n_j^i represents the number of samples of the j-th input class that are assigned to the i-th cluster, and N denotes the total number of samples. Each cluster is assigned the class whose samples appear most frequently within it. Therefore, the range of purity is [0,1], and a higher value implies better clustering performance.
NMI is commonly employed in clustering to measure the similarity between two clustering results. Its range is [0,1], with higher values indicating greater similarity between the two clustering results [63]. NMI is defined by the following formula:
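(one commonly used normalization; other variants divide by the mean of the two entropies)
\mathrm{NMI}(C, C') = \frac{\mathrm{MI}(C, C')}{\max\left(H(C), H(C')\right)},
where \mathrm{MI}(C, C') denotes the mutual information between C and C'.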
Here, C and C′ represent a set of true labels and a set of clustering labels, respectively, and H(⋅) denotes the entropy function.
ACC is used for discovering the one-to-one relationship between classes and clusters and measures the extent to which each cluster contains data instances from the corresponding class. The range of ACC is from 0 to 1, where a higher value indicates better clustering results [66]. ACC is defined as follows:
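\mathrm{ACC} = \frac{\sum_{j = 1}^{N}\delta\left(c_j, \mathrm{map}(c_j')\right)}{N}, \quad (4.3)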
where N represents the total number of samples, and map(⋅) denotes the mapping of the computed clustering label c′j to the true clustering label ci. The definition of δ is as follows:
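\delta(x, y) = \begin{cases}1, & x = y,\\ 0, & \text{otherwise}.\end{cases} \quad (4.4)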
RI is a clustering evaluation indicator that measures the similarity between two clusters. It determines the agreement between the pairwise relationships of data points in the true cluster and the obtained cluster. RI is calculated by using the following formula:
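\mathrm{RI} = \frac{a+b}{\binom{n}{2}}, \quad (4.5)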
where a represents the number of pairs of data points assigned to the same cluster in both the true and the obtained clusterings, b represents the number of pairs of data points assigned to different clusters in both the true and the obtained clusterings, and n denotes the total number of data points, with \binom{n}{2} = n(n-1)/2 being the total number of data point pairs.
By unifying (4.1)–(4.3) and (4.5) together, the definitions of all considered clustering indicators are given in Table 3.
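To make the computation of these indicators concrete, the following is a small NumPy/SciPy sketch (our own illustrative code, assuming integer label arrays y_true and y_pred of equal length): purity and RI are computed directly from their definitions, and ACC realizes the map(·) in (4.3) with the Hungarian algorithm via scipy.optimize.linear_sum_assignment; NMI can be obtained from standard libraries such as scikit-learn.

import numpy as np
from scipy.optimize import linear_sum_assignment

def purity(y_true, y_pred):
    # For each predicted cluster, count its most frequent true class.
    total = 0
    for c in np.unique(y_pred):
        members = y_true[y_pred == c]
        total += np.bincount(members).max()
    return total / len(y_true)

def accuracy(y_true, y_pred):
    # ACC with the best one-to-one mapping found by the Hungarian algorithm.
    D = max(int(y_true.max()), int(y_pred.max())) + 1
    count = np.zeros((D, D), dtype=int)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    row, col = linear_sum_assignment(-count)   # maximize matched samples
    return count[row, col].sum() / len(y_true)

def rand_index(y_true, y_pred):
    # RI = (a + b) / C(n, 2): pairs on which the two clusterings agree.
    n = len(y_true)
    same_true = y_true[:, None] == y_true[None, :]
    same_pred = y_pred[:, None] == y_pred[None, :]
    agree = (same_true == same_pred)
    # remove the diagonal and count each unordered pair once
    return (agree.sum() - n) / (n * (n - 1))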
In the clustering experiments, the number of clusters K is set equal to the number of categories C of the target dataset. Due to the sensitivity of most methods to initial values, for experimental accuracy we conducted 10 separate runs of all algorithms and report their mean and standard deviation. As for the stopping criterion, we simply set the maximum number of iterations to T = 100.
The experiments were conducted on a computing system operating under Windows 10 Home Edition, 64-bit (version 10.0, Build 19045). The hardware configuration comprises an Intel(R) Core(TM) i5-7300HQ CPU, functioning at a clock speed of 2.50 GHz with four cores, and is supplemented by 8 GB of RAM. The programming environment employed for the implementation of the algorithms is MATLAB R2015b.
4.4. Parameter sensitivity
In this subsection, we will discuss the sensitivity of the parameters.
There are four parameters in our model: the regularization parameter \lambda, the orthogonal penalty parameters \alpha_1 and \alpha_2, and the number of nearest neighbors p. To reveal the impact of these terms on the performance of the model, we select \lambda, \alpha_1, and \alpha_2 from the range \left\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\right\}. For the parameters \lambda, \alpha_1, and \alpha_2, we employed NMI as the evaluation metric to conduct a parameter sensitivity experiment on the ORL dataset. The number of iterations is empirically fixed at 100. As stated above, the 0–1 weighting is used for computing the weight graph in all experiments unless stated otherwise. Figure 2 plots the NMI on the ORL dataset with different parameters. The x-axis represents \alpha_2, the y-axis represents \alpha_1, and the z-axis represents NMI. As shown in Figure 2, the algorithm performance changes with variations in the three parameters. For example, we observe that the proposed GNMFOSV algorithm obtains superior performance when \lambda = 100 and \alpha_1 varies in \left\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}\right\}, and it achieves consistently good performance regardless of the range of \alpha_2. The specific analysis of Figure 2 is presented in the following paragraphs.
First, a small \lambda and a large \alpha_1 may deteriorate the clustering performance. For example, from Figure 2(c), GNMFOSV performs poorly when \alpha_1 is greater than 100. However, when the value of \alpha_1 is in a smaller range, the performance of GNMFOSV is very stable and has significant advantages. Therefore, the value of \alpha_1 will be restricted to \left\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}\right\} in the following experiments.
Moreover, when \lambda = 100, the best results appear for \alpha_1 = 0.01 and \alpha_2 = 1000, which reflects that a relatively large \lambda together with a small \alpha_1 and a large \alpha_2 is preferable. Accordingly, based on these experiments, we will use \alpha_1 = 0.01, \alpha_2 = 1000, and \lambda = 100.
Second, we choose the number of nearest neighbors p with \alpha_1 = 0.01, \alpha_2 = 1000, and \lambda = 100 fixed. The clustering results in terms of NMI as the neighborhood size p varies from 1 to 10 are shown in Figure 3. On the ORL dataset, the GNMFOSV algorithm achieves good NMI when p varies from 2 to 5. The behavior for the other indicators and datasets is similar.
In summary, our experiments (all datasets were analyzed) suggest the parameter selection p = 3, \alpha_1 = 0.01, \alpha_2 = 1000, and \lambda = 100. These selected parameter values will be used in the numerical clustering experiments throughout the entire paper. This means that the robustness of the proposed method can be guaranteed when the parameters are selected within an appropriate range; outside such a range, the GNMFOSV algorithm is sensitive to the parameter values, which calls for corresponding moderation in practical applications.
At the same time, we also conducted parameter sensitivity tests on 12 other datasets. We found that the model proposed in this paper demonstrates excellent robustness. Regrettably, constrained by the page limit, only the experimental results of the ORL dataset are presented.
4.5. Ablation study
In the ablation study of this subsection, we choose p = 3, \alpha_1 = 0.01, \alpha_2 = 1000, and \lambda = 100 as the parameter selection for GNMFOSV. In the ablation study, the loss function of the proposed GNMFOSV is examined through experiments in which different parts of it are removed. The combinations of the parts of (3.3) are denoted as follows.
\bullet Baseline (NMF): the loss function composed of only the initial NMF term (the first term) in (3.3) is set as the baseline; the corresponding objectives are written out after this list.
\bullet GNMF: the loss function composed of the NMF term and the second term in Eq (3.3).
\bullet NMFOSV: the loss function consisting of the NMF term along with the third and fourth terms in Eq (3.3).
\bullet GNMFOSV: The loss function of Eq (3.3), which is indeed the proposed algorithm in this paper.
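Writing J_{\mathrm{NMF}}, J_{\mathrm{GNMF}}, and J_{\mathrm{NMFOSV}} for the three ablated objectives (our notation, with the same term scalings as in (3.3)), they read
J_{\mathrm{NMF}}(W,H) = \|X-WH\|_F^2,
J_{\mathrm{GNMF}}(W,H) = \|X-WH\|_F^2+\lambda\,\mathrm{Tr}(HLH^T),
J_{\mathrm{NMFOSV}}(W,V,H) = \|X-WH\|_F^2+\frac{\alpha_1}{2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2,
each minimized subject to nonnegativity of the corresponding factor matrices.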
Here X = [x_1, \cdots, x_N]\in R_{+}^{M\times N}, W = [w_1, \cdots, w_M]\in R_{+}^{M\times K}, H = [h_1, \cdots, h_N]\in R_{+}^{K\times N}, L \in R^{N\times N}, and I is the identity matrix with K rows and K columns. The clustering performance values presented in Tables 4-6 demonstrate that all core modules of the proposed image clustering model are indispensable. The experimental results indicate that integrating the graph regularization method, the orthogonality constraints, and the auxiliary variable mechanism can significantly enhance clustering accuracy and stability. The full model outperforms the ablated variants on multiple datasets, which validates the rationality and innovativeness of the proposed model design.
4.6. Clustering results and analysis
In this subsection, extensive experimental results are displayed to validate the effectiveness of the proposed model. The experimental results of GNMFOSV and the other comparison algorithms are given in Tables 7-19, and each result is the average of 10 runs after tuning the parameters. The clustering results of all comparison algorithms are evaluated according to the four indicators above. The best result for each indicator is marked in bold. "-" indicates that clustering results could not be obtained or that there was insufficient memory.
From the above tables, we will have the following observations.
First, the proposed method GNMFOSV achieves the best Purity, ACC, NMI, and RI values among all compared algorithms across the 13 datasets. It should be pointed out that GNMFOSV outperforms the other compared methods with considerable advantages on the ORL, AR, UCI-fou, and UCI-kar datasets. For example, it achieves an improvement of about 7.7% over the second-best method, SNMF-H, on the UCI-fou dataset in terms of ACC. Second, from the data in the above tables, the graph-based clustering algorithms GNMF and GNMFOSV show clear competitiveness in Purity, NMI, ACC, and RI compared with the other methods. This indicates that graph-based clustering algorithms are good at modeling the intrinsic structure of data. Furthermore, orthogonality-based learning is promising compared with the original NMF learning, which can be observed from the fact that the orthogonality-based algorithms, i.e., ONMF, NMFOS, and GNMFOSV, substantially boost the clustering performance on the document datasets.
Third, the comparison with the GNMF and NMFOS algorithms shows that neither graph regularization alone nor the orthogonal subspace penalty alone achieves excellent clustering performance. The GNMFOSV algorithm, however, combines the advantages of graph regularity and the orthogonal subspace, and it can significantly exceed the original single-constrained NMF algorithms. In general, our proposed GNMFOSV algorithm consistently outperforms the other compared algorithms in terms of the Purity, NMI, ACC, and RI metrics, which demonstrates the clear advantage of our method.
In conclusion, the GNMFOSV method performs consistently across all dataset types, i.e., face images, digit images, object images, and documents, which further validates its robustness in terms of steady clustering performance. The quantitative results fully demonstrate the effectiveness of our proposed GNMFOSV approach.
4.7. Convergence analysis
In this subsection, we investigate the convergence speed of the algorithm. The convergence curves of GNMFOSV on all 13 datasets are demonstrated in Figure 4. The x-axis denotes the number of iterations, and the y-axis denotes the objective function value.
It can be seen from the convergence curves (one per dataset, as depicted in the figure) that our algorithm converges rapidly on every dataset. On the AR, Handwritten digit, Diag-Bcw, Caltech7, Caltech20, and Vote datasets, the GNMFOSV algorithm achieves stable convergence within 20 iterations. On the ORL, Yale, UCI-fac, UCI-fou, UCI-kar, and Abalone datasets, the objective value of GNMFOSV drops rapidly at first, then declines gently, and finally reaches stable convergence at about 100 iterations. On the isolet3 dataset, the convergence curve of GNMFOSV is very smooth, and convergence is achieved at around 100 iterations.
Thus, Figure 4 indicates that the GNMFOSV algorithm performs very well in the actual convergence experiments, and the choice of 100 iterations is sufficient in practice.
5. Conclusions
In this paper, we have proposed a new NMF cost function for clustering problems. The derived optimization model incorporates an orthogonality constraint, a graph Laplacian term, and an auxiliary variable mechanism. The experiments show the superiority of the proposed algorithm (at least on the datasets used in this work). The proposed algorithm has the following advantages: (i) It effectively leverages the geometric structure information hidden in both the data and feature spaces. Specifically, it models the data space as a sub-manifold embedded in a high-dimensional space. After conducting matrix factorization on this sub-manifold, it obtains a low-dimensional representation of the data points, which is beneficial for clustering. Therefore, this model is more discriminative than algorithms based on orthogonal subspaces that only take into account the Euclidean structure of the data. Also, it can achieve better classification performance than graph-based algorithms, because the latter do not consider the orthogonalization of the vectors in the factor matrices. (ii) By introducing an auxiliary variable, it provides a complete proof of convergence, filling the gap in demonstrating convergence under orthogonality conditions for such algorithms. (iii) After conducting 13 groups of experiments with data of different sizes and types, the results show that it is not very sensitive to changes in the size and type of the data. The performance of the proposed algorithm remains more or less stable even when the size and type of the data change.
Author contributions
Caiping Wang: Conceptualization; Wen Li: Formal analysis; Caiping Wang, Wen Li, Junjian Zhao and Yasong Chen wrote the main manuscript text. All authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.
Use of Generative-AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
The first author is partially supported by the China Tobacco Hebei Industrial Co., Ltd. Technology Project, China (Grant No. HBZY2023A034). The third and fourth authors are partially supported by the Natural Science Foundation of Tianjin City, China (Grant Nos. 24JCYBJC00430 and 18JCYBJC16300), the Natural Science Foundation of China (Grant No. 11401433), and the Ordinary Universities Undergraduate Education Quality Reform Research Project of Tianjin, China (Grant No. A231005802).
Conflict of interest
The authors declare there is no conflict of interest.
Appendix
Convergence analysis of Algorithm 1
In this section, we investigate the convergence of Algorithm 1. Since the objective function of the GNMFOSV model (3.3) has a lower bound, we only need to prove that it is nonincreasing under the update rules (3.14)–(3.16). Before that, we first introduce a definition given by Lee and Seung [50], as well as a lemma with a concise proof.
Definition 1. [50] Let F:{R}_+^{\upsilon\times\omega}\to {R} be continuously differentiable with \upsilon, \omega\in\mathbb{N} , and G:{R}_+^{\upsilon\times\omega}\times {R}_+^{\upsilon\times\omega}\rightarrow {R} also be continuously differentiable. Then G is called an auxiliary function of F if it satisfies the following conditions:
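G(U, U^{t})\geq F(U), \qquad G(U, U) = F(U),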
for any U, U^t\in {R}_+^{\upsilon\times\omega} .
Lemma 1. Let G be an auxiliary function of F . Then F is nonincreasing under the sequence of matrices \{U^{t}\}_{t = 0}^{+\infty} generated by the following update rule:
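U^{t+1} = \arg\min\limits_{U}G(U, U^{t}). \quad (5.1)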
Proof. By Definition 1 and (5.1), the following formula
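F(U^{t+1})\leq G(U^{t+1}, U^{t})\leq G(U^{t}, U^{t}) = F(U^{t})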
holds naturally. □
Next, we will provide the convergence proofs for the update rules (3.14)–(3.16), which is also an important innovation of this paper. To this end, we first present the objective function F , which is introduced to replace the symbols in the objective function J(W, V, H) in (3.4), i.e., F = J(W, V, H) . Moreover, let F_{W_{ik}} , F_{V_{kj}} , and F_{H_{kj}} represent the parts of the objective function (3.4) associated with the elements W_{ik} , V_{kj} , and H_{kj} , respectively. In addition, due to the complexity of the representation of F(H) based only on H , the function F(H) will be rewritten as F(H) = F_1(H)+F_2(H) with
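(consistent with the grouping of the terms in (3.3); constant factors may be absorbed into \lambda, \alpha_1, and \alpha_2)
F_1(H) = \|X-WH\|_F^2+\lambda\,\mathrm{Tr}(HLH^T), \qquad F_2(H) = \frac{\alpha_1}{2}\|I-HV^T\|_F^2+\frac{\alpha_2}{2}\|V-H\|_F^2.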
The first and second-order derivatives of F with respect to W_{ik} are listed below:
Meanwhile, the first and second-order derivatives of F_1(H) with respect to H_{kj} will be shown below:
Note that we have omitted the partial derivatives of F_2 and F_{V_{kj}} because they are not required when constructing the auxiliary functions.
We can now define auxiliary functions. Since the update rules (3.14)–(3.16) are all updated in the form of each element of the matrix, according to Lemma 1, it is sufficient to prove that F_{W_{ik}} , F_{V_{kj}} and F_{H_{kj}} do not increase under the sequences \{W_ {ik}^t\}_{t = 0}^{+\infty} , \{V_{kj}^t\}_{t = 0}^{+\infty} , and \{H_{kj}^t\}_{t = 0}^{+\infty} generated by (3.14)–(3.16), respectively. Then, the specific expressions of auxiliary functions are
We now have the following auxiliary function for H via (5.6) and (5.7):
Next, we will provide one by one in the form of lemmas that (5.4), (5.5), and (5.8) are indeed auxiliary functions. The first is to show that G_W(W_{ik}, W_{ik}^{t}) is an auxiliary function of F_{W_{ik}} .
Lemma 2. Let G_W(W_{ik}, W_{ik}^{t}) be defined by the formula (5.4). Then, G_W(W_{ik}, W_{ik}^{t}) is an auxiliary function of F_{W_{ik}} .
Proof. By expanding F_{W_{ik}}(W_{ik}) based on the Taylor series, we have the following expansion:
To prove that G_W(W_{ik}, W_{ik}^{t})\geq F_{W_{ik}}(W_{ik}) , it is sufficient to prove the following inequality:
In fact, (5.9) is true via
In addition, (5.10) can also lead to
Therefore, G_W(W_{ik}, W_{ik}^t)\geq F_{W_{ik}}(W_{ik})
follows. Meanwhile, G_W(W_{ik}, W_{ik}) = F_{W_{ik}}(W_{ik}) holds automatically. In summary, one has that G_W(W_{ik}, W_{ik}^{t}) is an auxiliary function of F_{W_{ik}} .
□
The following lemmas are needed to prove G_V(V_{kj}, V_{kj}^{t}) is an auxiliary function of F_{V_{kj}} .
Lemma 3. [67] Let \Omega\subseteq R^N be a convex set, F:\Omega\to\ R a convex function, and \lambda_{i}\in[0, 1] with i\in\{1, \cdots, k\} and \sum_{i = 1}^k\lambda_i = 1 . Then, for all x_{i}\in\Omega , F\left(\sum_{i = 1}^k\lambda_i x_i\right)\leq\sum_{i = 1}^k\lambda_iF(x_i).
Lemma 4. Suppose F(X) = \frac12\|Y-KX\|_F^2 with X\in R^{p\times m}_+ . Let A\in R^{p\times m}_+ and K\in R^{n\times p}_+ ; then,
Proof. First, let's expand the Frobenius norm of F(X) in the following way:
Second, by Lemma 3, letting \lambda_k = \frac{K_{ik}A_{kj}}{(KA)_{ij}}
and using the auxiliary variable A , we have that
Therefore, it leads to
and the result holds.
□
So, we have the following conclusion that G_V(V_{kj}, V_{kj}^{t}) is an auxiliary function of F_{V_{kj}} .
Lemma 5. Let G_V(V_{kj}, V_{kj}^{t}) be defined by (5.5), then it serves as an auxiliary function for F_{V_{kj}} .
Proof. According to Lemmas 3 and 4, we can estimate G_V(V_{kj}, V_{kj}^{t}) as follows:
In addition, it is easy to see that G_V(V_{kj}, V_{kj}) = F_{V_{kj}}(V_{kj}) and the result holds.
□
The third is to prove that G_H(H_{kj}, H_{kj}^{t}) can serve as an auxiliary function of F_{H_{kj}} .
Lemma 6. Let G_H(H_{kj}, H_{kj}^{t}) be defined as in (5.8), then it can serve as an auxiliary function for F_{H_{kj}} .
Proof. According to the definitions of F_1(H) and F_2(H) in (5.3), to prove
it is sufficient to show that
The proof of the first part in (5.11) is similar to Lemma 2. To eliminate unnecessary misunderstandings, we will provide it in detail. First, by a Taylor series expansion of F_{1_{H_{kj}}}(H_{kj}) , we have
Note that (W^TWH^t)_{kj} = \sum_{r = 1}^{K}(W^TW)_{kr}(H^t)_{rj}\geq (W^TW)_{kk}H^t_{kj}, (\lambda H^t D)_{kj} = \lambda \sum_{r = 1}^{K}H^t_{kr}D_{rj}\geq \lambda H^t_{kj}D_{jj} \geq \lambda H^t_{kj}(D-S)_{jj} = \lambda H^t_{kj}L_{jj}, then, \frac{(W^TWH^t+\lambda H^tD)_{kj}}{H_{kj}^{t}} \geq (W^TW)_{kk}+(\lambda L)_{jj}.
Therefore,
holds and G_{H_1}(H_{kj}, H_{kj}^t)\geq F_{1}(H_{kj}) follows.
The second part of (5.11) is also similar to Lemma 5. By Lemmas 3 and 4,
follows and G_{H_2}(H_{kj}, H_{kj}^t)\geq F_2(H_{kj})
holds. Then the result follows.
□
Based on the above preparations, we now provide the main conclusion of this paper.
Theorem 1. Let X\in R_{+}^{m\times n} , W^0\in R_{+}^{m\times k} , V^0\in R_{+}^{k\times n} , and H^0\in R_{+}^{k\times n} . Then, the objective function in (3.4) is nonincreasing under sequences \{W^{t}, V^{t}, H^{t}\}_{t = 0}^{+\infty} generated by (3.14)–(3.16).
Proof. First, since \frac{\partial G_W(W_{ik}, W_{ik}^{t})}{\partial W_{ik}} = F_{W_{ik}}^{'}(W_{ik}^{t})+\frac{(W^{t}HH^T)_{ik}}{W_{ik}^{t}}(W_{ik}-W_{ik}^{t}) and letting \frac{\partial G_W(W_{ik}, W_{ik}^{t})}{\partial W_{ik}} = 0 , it then goes to F_{W_{ik}}^{'}(W_{ik}^{t})W_{ik}^{t}+(W^{t}HH^T)_{ik}W_{ik}-(W^{t}HH^T)_{ik}W_{ik}^{t} = 0. According to Lemma 2, we can get that W_{ik} = W_{ik}^{t}\frac{(XH^T)_{ik}}{(W^{t}HH^T)_{ik}}.
By Lemma 1, if W_{ik}^{t+1} = \arg\min_{W_{ik}\in R}G_W(W_{ik}, W_{ik}^{t}) = W_{ik}^{t}\frac{(XH^T)_{ik}}{(W^{t}HH^T)_{ik}}, then (3.14) is nonincreasing. Since, in addition, the objective function in (3.4) has a nonnegative lower bound, (3.14) is a convergent iterative formula.
Second, note that \frac{\partial G_{V}(V_{kj}, V_{kj}^{t})}{\partial V_{kj}} = \alpha_1\frac{V_{kj}}{V_{kj}^t}(V^tH^TH)_{kj}+\alpha_2V_{kj}-(\alpha_1+\alpha_2)H_{kj}.
Let \frac{\partial G_{V}(V_{kj}, V_{kj}^{t})}{\partial V_{kj}} = 0 , then, \alpha_1\frac{V_{kj}}{V_{kj}^t}(V^tH^TH)_{kj}+\alpha_2V_{kj} = (\alpha_1+\alpha_2)H_{kj}, \alpha_1V_{kj}(V^tH^TH)_{kj}+\alpha_2V_{kj}V_{kj}^t = (\alpha_1+\alpha_2)H_{kj}V_{kj}^t. According to Lemma 5, we can get that V_{kj} = V_{kj}^{t}\frac{\left[(\alpha_1+\alpha_2)H \right]_{kj}}{\left[\alpha_1V^tH^TH+\alpha_2V^t \right]_{kj}}. So, by Lemma 1, if V_{kj}^{t+1} = \arg\min_{V_{kj}\in R}G_V(V_{kj}, V_{kj}^{t}) = V_{kj}^{t}\frac{\left[(\alpha_1+\alpha_2)H \right]_{kj}}{\left[\alpha_1V^tH^TH+\alpha_2V^t \right]_{kj}}, then (3.15) is nonincreasing. Besides, the objective function in (3.4) has a nonnegative lower bound, and it can be concluded that the formula in (3.15) is a convergent iterative formula.
Third, it is true that \frac{\partial G_{H_1}(H_{kj}, H_{kj}^t)}{\partial H_{kj}} = F_{1_{H_{kj}}}^{'}(H_{kj}^t)+\frac{(W^TWH^t+\lambda H^tD)_{kj}}{H_{kj}^t}(H_{kj}-H_{kj}^t) and \frac{\partial G_{H_2}(H_{kj}, H_{kj}^t)}{\partial H_{kj}} = \alpha_1\frac{H_{kj}}{H_{kj}^t}(H^tV^TV)_{kj}+\alpha_2H_{kj}-(\alpha_1+\alpha_2)V_{kj}.
Let \frac{\partial G_{H}(H_{kj}, H_{kj}^{t})}{\partial H_{kj}} = \frac{\partial G_{H_1}(H_{kj}, H_{kj}^{t})}{\partial H_{kj}}+\frac{\partial G_{H_2}(H_{kj}, H_{kj}^{t})}{\partial H_{kj}} = 0 ; then,
According to Lemma 6, it leads to H_{kj} = H^t_{kj}\frac{\left[W^TX+\lambda H^tS+(\alpha_1+\alpha_2)V \right]_{kj}}{\left[W^TWH^t+\lambda H^tD+\alpha_1H^tV^TV+\alpha_2H^t \right]_{kj}}.
By Lemma 1, if
holds, then (3.16) is nonincreasing. Once again, since the objective function in (3.4) has a nonnegative lower bound, it can be concluded that the formula (3.16) is a convergent iterative formula.
□