
Robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization


  • Non-negative matrix factorization (NMF) has been widely used in machine learning and data mining. As an extension of NMF, non-negative matrix tri-factorization (NMTF) provides more degrees of freedom than NMF. However, the standard NMTF algorithm uses the Frobenius norm to measure the residual error, which can be dramatically affected by noise and outliers. Moreover, the hidden geometric information in the feature manifold and the sample manifold is rarely learned. Hence, a novel robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization (RCHNMTF) is proposed. First, a robust capped norm is adopted to handle extreme outliers. Second, dual hyper-graph regularization is used to exploit the intrinsic geometric information in the feature manifold and the sample manifold. Third, orthogonality constraints are added to learn a unique data representation and improve clustering performance. Experiments on seven datasets verify the robustness and superiority of RCHNMTF.

    Citation: Jiyang Yu, Baicheng Pan, Shanshan Yu, Man-Fai Leung. Robust capped norm dual hyper-graph regularized non-negative matrix tri-factorization[J]. Mathematical Biosciences and Engineering, 2023, 20(7): 12486-12509. doi: 10.3934/mbe.2023556




    Data-intensive machine learning has become widely used, and as the size of training data increases, distributed methods are becoming increasingly popular. However, the performance of distributed methods is often limited by stragglers, i.e., nodes that are slow to respond or unavailable.

    Raviv et al. [11] used coding theory and graph theory to reduce the effect of stragglers in distributed synchronous gradient descent. A coding theory framework for straggler mitigation, called gradient coding, was first introduced by Tandon et al. [14]. Gradient coding consists of a system with one master node and n worker nodes, where the data are partitioned into k parts, and one or more parts are assigned to each worker. In turn, each worker computes the partial gradients on each given partition, combines the results linearly according to a predefined vector of coefficients, and sends this linear combination back to the master node. By choosing the coefficients at each node appropriately, it can be guaranteed that the master node can reconstruct the full gradient even if some workers fail to respond.
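
    To make the encoding and decoding concrete, consider the following toy sketch (our own illustration; the 3 x 3 coefficient matrix below is chosen so that every pair of its rows can linearly recover the all-ones vector, and none of the values are taken from the cited papers). It shows a master reconstructing the full gradient from n = 3 workers when any s = 1 of them straggles:

    import numpy as np

    # Toy coefficient matrix: row j holds the coefficients that worker j applies
    # to the partial gradients of its two assigned partitions.
    B = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, -1.0],
                  [1.0, 0.0, 2.0]])

    # Hypothetical partial gradients, one row per data partition.
    g = np.array([[1.0, 0.5],
                  [2.0, -1.0],
                  [0.5, 3.0]])
    full_gradient = g.sum(axis=0)

    sent = B @ g                 # message each worker would send to the master
    responders = [0, 2]          # suppose worker 1 straggles

    # Decoding vector x with x @ B[responders] = (1, 1, 1); it exists because
    # the all-ones vector lies in the span of any two rows of B.
    x, *_ = np.linalg.lstsq(B[responders].T, np.ones(3), rcond=None)
    recovered = x @ sent[responders]
    assert np.allclose(recovered, full_gradient)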

    The importance of straggler mitigation is demonstrated in [8,16]. Specifically, it was shown by Tandon et al. [14] that stragglers can run up to 5 times slower than typical workers (up to 8 times in [16]). In [11], for gradient calculations, a cyclic maximum distance separable (MDS) code is used to obtain a deterministic construction scheme that improves on existing solutions, both in the range of applicable parameters and in the complexity of the algorithms involved.

    One well-known family of MDS codes is the generalized Reed-Solomon (GRS) codes. GRS codes have a rich mathematical structure and many real-world applications, such as mass storage systems, cloud storage systems, and public-key cryptosystems. On the other hand, although more complex than cyclic codes, quasi-cyclic codes meet the Gilbert-Varshamov lower bound on the minimum distance, as shown in [6]. Quasi-cyclic codes are also equivalent to linear codes with circulant block generator matrices. Such a matrix has circulant blocks of the same size m, where m is called the co-index of the associated quasi-cyclic code. From this point of view, one way to generalize quasi-cyclic codes is to let the generator matrix have circulant blocks of different sizes. Such a code is called a generalized quasi-cyclic code with co-indices (m_1, m_2, \dots, m_k), where m_i is the size of the i-th circulant block in the generator matrix.

    In [10], generalized quasi-cyclic codes without block length limitations are studied. By relaxing the conditions on block length, several new optimal codes with small lengths were found. In addition, the code decomposition and dimension formulas given in [3,12,13] have been generalized.

    In this paper, we describe the construction of generalized quasi-cyclic GRS codes over totally real number fields, as well as their application in exact gradient coding. The construction method is derived by integrating known results from the inverse Galois problem for totally real number fields. Furthermore, methods in [2,4,11,14] will be adapted to generalized quasi-cyclic GRS codes to mitigate stragglers.

    Let \mathbb{F} be a Galois extension of \mathbb{Q}, and choose non-zero elements v_1, \dots, v_n in \mathbb{F} and pairwise distinct elements a_1, \dots, a_n in \mathbb{F}. Also, let \mathbf{v} = (v_1, \dots, v_n) and \mathbf{a} = (a_1, \dots, a_n). For 1\leq k\leq n, define the GRS codes as follows:

    GRS_{n,k}(\mathbf{a}, \mathbf{v}) = \left\{(v_1f(a_1), \dots, v_nf(a_n))\;|\;f(x)\in\mathbb{F}[x]_k\right\},

    where \mathbb{F}[x]_k is the set of all polynomials over \mathbb{F} with degree less than k. The canonical generator of GRS_{n,k}(\mathbf{a}, \mathbf{v}) is given by the following matrix:

    \begin{equation} \mathbf{G} = \begin{pmatrix} v_1 & v_2 & \cdots & v_j & \cdots & v_n \\ v_1a_1 & v_2a_2 & \cdots & v_ja_j & \cdots & v_na_n \\ v_1a_1^2 & v_2a_2^2 & \cdots & v_ja_j^2 & \cdots & v_na_n^2 \\ \vdots & \vdots & & \vdots & & \vdots \\ v_1a_1^i & v_2a_2^i & \cdots & v_ja_j^i & \cdots & v_na_n^i \\ \vdots & \vdots & & \vdots & & \vdots \\ v_1a_1^{k-1} & v_2a_2^{k-1} & \cdots & v_ja_j^{k-1} & \cdots & v_na_n^{k-1} \end{pmatrix} \end{equation} (2.1)

    Theorem 2.1. [7] Let \mathbf{v}\in\mathbb{F}^n be a tuple of non-zero elements of \mathbb{F} and \mathbf{a}\in\mathbb{F}^n be a tuple of pairwise distinct elements of \mathbb{F}; then,

    a) GRS_{n,k}(\mathbf{a}, \mathbf{v}) is an [n, k, n-k+1] code, i.e., GRS codes are MDS codes.

    b) The dual code of GRS_{n,k}(\mathbf{a}, \mathbf{v}) is as follows:

    GRS_{n,k}(\mathbf{a}, \mathbf{v})^{\perp} = GRS_{n,n-k}(\mathbf{a}, \mathbf{u}),

    where \mathbf{u} = (u_1, \dots, u_n) with

    u_i^{-1} = v_i\prod\limits_{j\neq i}(a_i-a_j).

    Proof. (a) See the proof of [7, Theorem 6.3.3]. (b) See the proof of [7, Theorem 6.5.1].
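
    As a quick sanity check of Theorem 2.1, the following sketch (a toy instance over \mathbb{Q}, with all parameters chosen purely for illustration) builds the canonical generator of Eq (2.1) with exact rational arithmetic and verifies that it is orthogonal to the canonical generator of the dual code from part (b):

    from fractions import Fraction

    n, k = 5, 2
    a = [Fraction(i) for i in range(n)]        # pairwise distinct evaluation points
    v = [Fraction(1)] * n                      # non-zero multipliers (all ones here)

    def grs_generator(a, v, k):
        # Canonical generator of GRS_{n,k}(a, v): entry (i, j) is v_j * a_j^i, as in Eq (2.1).
        return [[v[j] * a[j] ** i for j in range(len(a))] for i in range(k)]

    G = grs_generator(a, v, k)

    # Dual multipliers from Theorem 2.1(b): u_i = 1 / (v_i * prod_{j != i} (a_i - a_j)).
    u = []
    for i in range(n):
        prod = Fraction(1)
        for j in range(n):
            if j != i:
                prod *= a[i] - a[j]
        u.append(1 / (v[i] * prod))

    G_dual = grs_generator(a, u, n - k)        # canonical generator of GRS_{n,n-k}(a, u)

    # Every row of G must be orthogonal to every row of G_dual.
    for row in G:
        for dual_row in G_dual:
            assert sum(x * y for x, y in zip(row, dual_row)) == 0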

    Let \overline{\mathbb{F}} = \mathbb{F}\cup\{\infty\}, let \mathbf{a} be an n-tuple of mutually distinct elements of \overline{\mathbb{F}}, and let \mathbf{c} be an n-tuple of non-zero elements of \mathbb{F}. Also, define

    [a_i, a_j] = a_i - a_j,\quad [\infty, a_j] = 1,\quad [a_i, \infty] = 1\quad\text{for all}\;a_i, a_j\in\mathbb{F}.

    Definition 2.2. ([9]) Let B(\mathbf{a}, \mathbf{c}) be the k\times(n-k) matrix with the following entries:

    \frac{c_{j+k}}{c_i[a_{j+k}, a_i]},\quad\text{for}\;1\leq i\leq k,\;1\leq j\leq n-k.

    The generalized Cauchy code C_k(\mathbf{a}, \mathbf{c}) is an [n, k, n-k+1] code defined by the generator matrix (I_k\,|\,B(\mathbf{a}, \mathbf{c})).

    The following proposition shows that the GRS codes are also generalized Cauchy codes.

    Proposition 2.3. [9, Proposition C.2] Let \mathbf{a} be an n-tuple of mutually distinct elements of \overline{\mathbb{F}}, and let \mathbf{b} be an n-tuple of non-zero elements of \mathbb{F}. Also, let

    c_i = \begin{cases} b_i\prod_{t = 1, t\neq i}^{k}[a_i, a_t], & \text{if}\;1\leq i\leq k; \\ b_i\prod_{t = 1}^{k}[a_i, a_t], & \text{if}\;k+1\leq i\leq n. \end{cases}

    Then, GRS_{n,k}(\mathbf{a}, \mathbf{b}) = C_k(\mathbf{a}, \mathbf{c}).
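
    Proposition 2.3 can likewise be checked on a small instance; the sketch below (toy parameters over \mathbb{Q}, with \mathbf{b} taken to be all ones and all a_i finite, so that [a_i, a_j] = a_i - a_j) builds \mathbf{c} as in the proposition, forms the generator (I_k\,|\,B(\mathbf{a}, \mathbf{c})), and verifies that each of its rows is a GRS codeword, i.e., arises from evaluating a polynomial of degree less than k:

    from fractions import Fraction

    n, k = 5, 2
    a = [Fraction(i) for i in range(n)]
    b = [Fraction(1)] * n

    # c_i as in Proposition 2.3 (0-indexed: the products run over the first k points).
    c = []
    for i in range(n):
        prod = Fraction(1)
        for t in range(k):
            if t != i:
                prod *= a[i] - a[t]
        c.append(b[i] * prod)

    # B(a, c) as in Definition 2.2: entry (i, j) = c_{j+k} / (c_i * [a_{j+k}, a_i]).
    B = [[c[j + k] / (c[i] * (a[j + k] - a[i])) for j in range(n - k)] for i in range(k)]
    gen = [[Fraction(int(i == j)) for j in range(k)] + B[i] for i in range(k)]  # (I_k | B)

    def interpolate_eval(xs, ys, x):
        # Lagrange interpolation through the points (xs[i], ys[i]), evaluated at x.
        total = Fraction(0)
        for i in range(len(xs)):
            term = ys[i]
            for j in range(len(xs)):
                if j != i:
                    term *= (x - xs[j]) / (xs[i] - xs[j])
            total += term
        return total

    # Each row w of (I_k | B) must satisfy w_j = b_j * f(a_j) for a single polynomial f
    # of degree < k: fit f through the first k coordinates and check the remaining ones.
    for w in gen:
        ys = [w[j] / b[j] for j in range(k)]
        for j in range(k, n):
            assert w[j] == b[j] * interpolate_eval(a[:k], ys, a[j])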

    Let Gal(\mathbb{F}/\mathbb{Q}) be the Galois group of \mathbb{F} over \mathbb{Q}, and let P\Gamma L(2, \mathbb{F}) denote the group of semilinear fractional transformations given by

    f:\overline{\mathbb{F}}\rightarrow\overline{\mathbb{F}},\quad x\mapsto\frac{a\gamma(x)+b}{c\gamma(x)+d},

    where ad-bc\neq 0 and \gamma\in Gal(\mathbb{F}/\mathbb{Q}). Let S_n be the symmetric group on a set of n elements and Per(C) = \{\xi\in S_n\;|\;\xi(C) = C\}, where n is the length of the code C. The set Per(C) is called the permutation group of the code C. We have the following theorem relating such transformations to the permutation group of a Cauchy code.

    Theorem 2.4. [1, Corollary 2] Let C = C_k(\mathbf{a}, \mathbf{y}) be a Cauchy code over \mathbb{F}, where 2\leq k\leq n-2 and \mathbf{a} = (a_1, \dots, a_n). Also, let L = \{a_1, \dots, a_n\}. Then, the map

    \omega: \{f\in P\Gamma L(2, \mathbb{F})\;|\;f(L) = L\}\rightarrow Per(C),\quad f\mapsto\sigma,

    where a_{\sigma(i)} = f(a_i) for i = 1, \dots, n, is a surjective group homomorphism.

    A number field \mathbb{F} is a finite Galois extension of the rational field \mathbb{Q}. In this section, we describe a way to construct a number field \mathbb{F} with Gal(\mathbb{F}/\mathbb{Q})\cong\langle\sigma\rangle for \sigma\in S_n, where \langle\sigma\rangle is the cyclic subgroup generated by \sigma.

    Let \sigma = \sigma_1\sigma_2\cdots\sigma_t be a permutation in S_n, where \sigma_1, \sigma_2, \dots, \sigma_t are disjoint cycles. Also, let \langle\sigma\rangle be the cyclic group generated by \sigma. Let l(\sigma_j) be the length of the cycle \sigma_j, and define the set \mathcal{P} = \{p: p\;\text{prime and}\;\exists j\in\{1, \dots, t\}\;\text{such that}\;p\,|\,l(\sigma_j)\}. Since \mathcal{P} is finite, assume that p_1 < p_2 < \cdots < p_{|\mathcal{P}|} are all of the elements of \mathcal{P}. For any j, we have

    l(\sigma_j) = \prod\limits_{i = 1}^{|\mathcal{P}|}p_i^{\alpha_{ij}}, (3.1)

    where \alpha_{ij}\in\mathbb{Z}_{\geq 0}. Based on Eq (3.1), we have

    ord(\sigma) = |\langle\sigma\rangle| = \prod\limits_{i = 1}^{|\mathcal{P}|}p_i^{\max\limits_j\{\alpha_{ij}\}}, (3.2)

    where ord(\sigma) is the order of the permutation \sigma. Since \langle\sigma\rangle contains an element of order p_i^{\max_j\{\alpha_{ij}\}} for all i = 1, \dots, |\mathcal{P}|, by the structure theorem for finite Abelian groups, we have

    \langle\sigma\rangle\cong\prod\limits_{i = 1}^{|\mathcal{P}|}\frac{\mathbb{Z}}{p_i^{\max\limits_j\{\alpha_{ij}\}}\mathbb{Z}}\cong\frac{\mathbb{Z}}{\prod_{i = 1}^{|\mathcal{P}|}p_i^{\max\limits_j\{\alpha_{ij}\}}\mathbb{Z}}. (3.3)
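
    In other words, ord(\sigma) is simply the least common multiple of the cycle lengths, and Eq (3.2) expresses this lcm through the prime factorizations of Eq (3.1). A one-line check for the permutation used in Example 4.2 below:

    from math import lcm

    lengths = [4, 2]        # cycle lengths of sigma = (1,2,3,4)(5,6)
    print(lcm(*lengths))    # prints 4 = 2^2, matching Eq (3.2) with P = {2}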

    Let \zeta_p be a primitive p-th root of unity and \mathbb{Q}(\zeta_p) be the corresponding cyclotomic extension of \mathbb{Q}. The following theorem gives a Galois extension of \mathbb{Q} whose Galois group is isomorphic to \langle\sigma\rangle. The proof of the theorem is similar to the proof of [5, Theorem 3.1.11]. We write the proof here to give a sense of how to construct the related Galois extension.

    Theorem 3.1. There exists a totally real Galois extension K of \mathbb{Q} such that Gal(K/\mathbb{Q})\cong\langle\sigma\rangle.

    Proof. By Eq (3.3), we have

    \langle\sigma\rangle\cong\prod\limits_{i = 1}^{|\mathcal{P}|}\frac{\mathbb{Z}}{p_i^{\max\limits_j\{\alpha_{ij}\}}\mathbb{Z}}\cong\frac{\mathbb{Z}}{\prod_{i = 1}^{|\mathcal{P}|}p_i^{\max\limits_j\{\alpha_{ij}\}}\mathbb{Z}}.

    Now, choose a prime p such that

    p\equiv 1 \mod 2\prod\limits_{i = 1}^{|\mathcal{P}|}p_i^{\max\limits_j\{\alpha_{ij}\}}.

    Let \zeta_p be a primitive p -th root of unity. By [5, Theorem C.0.3], \mathbb{Q}(\zeta_p) is a Galois extension of \mathbb{Q}, with its corresponding Galois group being isomorphic to G = \left(\mathbb{Z}/p\mathbb{Z}\right)^\times, where \left(\mathbb{Z}/p\mathbb{Z}\right)^\times is the multiplicative group \left(\mathbb{Z}/p\mathbb{Z}\right)\setminus\left\{\overline{0}\right\}. Since p is a prime number, G is a cyclic group. Moreover, we can find a unique subgroup H of G such that

    |H| = {\frac{p-1}{\prod_{i = 1}^{|\mathcal{P}|}p_i^{\max_j\{\alpha_{ij}\}}}}.

    Let \mathbb{Q}(\zeta_p)^H be a subset of \mathbb{Q}(\zeta_p) which is invariant under the action of H. By the fundamental theorem of Galois theory ([15, Theorem 25]), \mathbb{Q}(\zeta_p)^H is also a Galois extension of \mathbb{Q}, with the corresponding Galois group isomorphic to G/H. Moreover, \left|G/H\right| = \prod_{i = 1}^{|\mathcal{P}|}p_i^{\max_j\{\alpha_{ij}\}}, and, as a consequence,

    G/H\cong \frac{\mathbb{Z}}{\prod_{i = 1}^{|\mathcal{P}|}p_i^{\max_j\{\alpha_{ij}\}}\mathbb{Z}} \cong\langle\sigma\rangle.

    Also, by using a similar argument as in the proof of [5, Theorem 3.1.11], we have that \mathbb{Q}(\zeta_p)^H is a totally real Galois extension of \mathbb{Q}.

    The following algorithm provides a way to construct the field \mathbb{Q}(\zeta_p)^H appearing in the proof of Theorem 3.1. The algorithm is based on Theorem 3.1 and [5, Proposition 3.3.2].

    Algorithm 3.2. Suppose that \sigma\in S_n and G = Gal(\mathbb{Q}(\zeta_p)/\mathbb{Q})\cong \left(\mathbb{Z}/p\mathbb{Z}\right)^\times, where p is a prime number such that p\equiv 1\mod 2\cdot ord(\sigma).

    1) Choose H\subseteq G, where H is the subgroup of G with order {\frac{p-1}{ord(\sigma)}}.

    2) Calculate

    \alpha = \sum\limits_{\lambda\in H}\lambda(\zeta_p).

    3) Find the minimal polynomial m_\alpha(x) of \alpha over \mathbb{Q}.

    4) Construct the splitting field \mathbb{F} of m_\alpha(x) by using Algorithm A.1.

    5) Then, \mathbb{F} = \mathbb{Q}(\zeta_p)^H.
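
    As an illustration, the sketch below carries out steps 1)-3) numerically for p = 17 and ord(\sigma) = 4, the setting of Example 4.2 in the next section; it recovers H = \{1, 4, 13, 16\} and the minimal polynomial x^4+x^3-6x^2-x+1 by expanding \prod(x-\gamma(\alpha)) over the Galois conjugates of \alpha and rounding the coefficients (which are known to be integers):

    import cmath

    p, order = 17, 4                    # p = 1 mod 2 * ord(sigma)
    g = 3                               # a primitive root mod 17
    # Step 1: the unique subgroup of (Z/17Z)^x of order 4 is generated by 3^(16/4) = 13.
    H = sorted(pow(13, e, p) for e in range(order))           # [1, 4, 13, 16]

    zeta = cmath.exp(2j * cmath.pi / p)                       # a primitive p-th root of unity
    # Step 2: alpha and its Galois conjugates are the sums of zeta^k over the cosets of H.
    conjugates = []
    for i in range(order):
        coset = [(pow(g, i, p) * h) % p for h in H]
        conjugates.append(sum(zeta ** k for k in coset))

    # Step 3: expand m_alpha(x) = prod (x - conjugate), lowest-degree coefficient first.
    coeffs = [complex(1)]
    for r in conjugates:
        new = [complex(0)] * (len(coeffs) + 1)
        for i, cf in enumerate(coeffs):
            new[i + 1] += cf            # contribution of x * P(x)
            new[i] -= r * cf            # contribution of -r * P(x)
        coeffs = new
    print([round(cf.real) for cf in coeffs])    # [1, -1, -6, 1, 1], i.e., x^4+x^3-6x^2-x+1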

    In this section, we describe a way to construct a GRS code that is invariant under a given permutation in S_n. We call this GRS code a GRS generalized quasi-cyclic (GQC) code. Let \sigma = \sigma_1\sigma_2\cdots\sigma_t be a permutation in S_n, where \sigma_1, \sigma_2, \dots, \sigma_t are disjoint cycles. Also, let G = \langle\sigma\rangle be the cyclic group generated by \sigma.

    Theorem 4.1. If \sigma is a permutation in S_n, then there exists a GQC GRS_{n, k}(\overline{\alpha}, \mathbf{b}) over \mathbb{F}, with its corresponding permutation being \sigma for some totally real number field \mathbb{F}.

    Proof. We can find the number field \mathbb{F} and its corresponding minimal polynomial m_\alpha(x) with Gal(\mathbb{F}/\mathbb{Q})\cong \langle \sigma\rangle by using Algorithm 3.2. Since Gal(\mathbb{F}/\mathbb{Q})\cong \langle \sigma\rangle, there exists \gamma\in Gal(\mathbb{F}/\mathbb{Q}) corresponding to \sigma\in \langle\sigma\rangle. Let L = \{\alpha_1, \dots, \alpha_n\} consist of the roots of m_\alpha(x), together with some additional elements formed from linear combinations of the roots, chosen so that \gamma is a permutation of L, i.e., \gamma(L) = L. Note that the orbits of L under \langle\gamma\rangle can be used to rearrange the elements of L such that

    \begin{equation} \gamma(\alpha_i) = \alpha_{\sigma(i)}, \end{equation} (4.1)

    for all i = 1, 2, \dots, n. Let \overline{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_n), and let \mathbf{b} = (b_1, b_2, \dots, b_n) be an n -tuple of non-zero elements of \mathbb{F}. Define a Cauchy code C_k(\overline{\alpha}, \mathbf{c}), where \mathbf{c} = (c_1, c_2, \dots, c_n), with

    \begin{equation} c_i = \left\{ \begin{array}{ll} b_i\prod_{t = 1,t\not = i}^k [\alpha_i,\alpha_t], & \text{if}\;1\leq i\leq k; \\ b_i\prod_{t = 1}^k [\alpha_i,\alpha_t], & \text{if}\;k+1\leq i\leq n. \end{array} \right. \end{equation} (4.2)

    Then, by Proposition 2.3, C_k(\overline{\alpha}, \mathbf{c}) is a GRS_{n, k}(\overline{\alpha}, \mathbf{b}) code. Moreover, according to Theorem 2.4 and Eq (4.1), \omega(\gamma) = \sigma is an element in Per\left(C_k(\overline{\alpha}, \mathbf{c})\right).

    Consider the following example.

    Example 4.2. Let \sigma = (1, 2, 3, 4)(5, 6) in S_6. We would like to construct a GRS code of length 6 over a totally real number field that is invariant under the action of \sigma. We can see that ord(\sigma) = 4 and \langle\sigma\rangle\cong \mathbb{Z}/4\mathbb{Z}. Choose p = 17 so that p\equiv 1\mod 2\times 4. The corresponding subgroup H of Gal(\mathbb{Q}(\zeta_{17})/\mathbb{Q}) then has order \frac{p-1}{ord(\sigma)} = 4. Since the unique subgroup of \left(\mathbb{Z}/17\mathbb{Z}\right)^\times with order 4 is \{1, 4, 13, 16\}, we have

    H = \{\lambda_k|k = 1,4,13,16\},

    where \lambda_k:\zeta_{17}\mapsto \zeta_{17}^k. Then, we have

    \alpha = \sum\limits_{\lambda\in H}\lambda(\zeta_{17}) = \zeta_{17}+\zeta_{17}^{4}+\zeta_{17}^{13}+\zeta_{17}^{16}.

    From [5, Example 3.3.3], the minimal polynomial of \alpha is as follows:

    m_\alpha(x) = x^4+x^3-6x^2-x+1.

    The roots of m_\alpha(x) are given by

    r_1 = {\frac{1}{4}\left(-1-\sqrt{17}-\sqrt{34+2\sqrt{17}}\right)},\quad r_2 = {\frac{1}{4}\left(-1-\sqrt{17}+\sqrt{34+2\sqrt{17}}\right)},
    r_3 = {\frac{1}{4}\left(-1+\sqrt{17}-\sqrt{34-2\sqrt{17}}\right)},\quad r_4 = {\frac{1}{4}\left(-1+\sqrt{17}+\sqrt{34-2\sqrt{17}}\right)}.

    Let \gamma be a map such that

    r_1\mapsto r_2, \quad r_2\mapsto r_3,\quad r_3\mapsto r_4,\quad r_4\mapsto r_1.

    We can see that \langle \gamma\rangle = Gal(\mathbb{Q}(\zeta_{17})^H/\mathbb{Q})\cong \mathbb{Z}/4\mathbb{Z}.

    Choose L = \{\alpha_1, \dots, \alpha_6\}, where \alpha_i = r_i for i = 1, 2, 3, 4, \; \alpha_5 = r_1+r_3, and \alpha_6 = r_2+r_4. We can check that

    \gamma(\alpha_i) = \alpha_{\sigma(i)},

    for all i = 1, \dots, 6. Take \overline{\alpha} = (\alpha_1, \dots, \alpha_6) , any n -tuple of non-zero elements \mathbf{b} (from the set of linear combinations of the roots of m_\alpha(x) ), and \mathbf{c} = (c_1, \dots, c_6), where c_i is as in Eq (4.2). We have that C_k(\overline{\alpha}, \mathbf{c}) is a GQC GRS code with corresponding permutation \sigma.

    In Section 4, we described the construction of a GRS code that is invariant under the action of a given permutation in S_n. Moreover, the alphabet of the corresponding codes is a totally real number field rather than a complex number field. This feature can be useful for bandwidth reduction in exact gradient coding schemes.

    Algorithm 1 describes the process of gradient coding. The algorithm is a slight modification of [11, Algorithm 1].

    Algorithm 1 Gradient coding
    Input:
      Data \mathcal{S} = \left\{z_i = (x_i, y_i)\right\}_{i = 1}^m, number of iterations t > 0, learning rates \{\eta_r\}_{r = 1}^t,
      straggler tolerance parameters \{s_r\}_{r = 1}^t, a matrix \mathbf{B}\in\mathbb{C}^{n\times n}, a function
       \Lambda:\mathcal{P}(n)\rightarrow \mathbb{C}^n, a vector of non-zero elements \overline{\beta} = (\beta_1, \dots, \beta_n)\in\mathbb{C}^n
    Initialize:
       \mathbf{w}^{(1)} \gets (0, 0, \dots, 0)
    Partition \mathcal{S} = \bigcup_{i = 1}^n\mathcal{S}_i and send \{\mathcal{S}_j|j\in supp(\mathbf{b}_i)\} to W_i for every i\in [n]
    for r = 1 to t do
       M broadcasts \mathbf{w}^{(r)} to all nodes
      Each W_j sends \sum_{i\in supp(\mathbf{b}_j)}b_{j, i}\frac{\nabla L_{\mathcal{S}_i}(\mathbf{w}^{(r)})}{\beta_i} to M
       M waits until at least n-s_r nodes have responded
       M computes \mathbf{v}_r = \Lambda\left(\mathcal{K}_r\right)\cdot\mathbf{C}, where the i -th row of \mathbf{C} is \frac{1}{n} times the response from W_i if it has responded, and 0 otherwise; also, \mathcal{K}_r is the set of non-stragglers in the current iteration r
       M updates \mathbf{w}^{(r+1)}\gets \mathbf{w}^{(r)}-\eta_r\mathbf{v}_r
    end for
    return \frac{1}{t}\sum_{r = 1}^t\mathbf{w}^{(r+1)}

    Algorithm 1 works in the following way. In order to execute the gradient descent process, the master node M distributes a particular partition of the training set \mathcal{S} to all worker nodes W_j, where j = 1, \dots, n. In the r -th iteration of the gradient descent process, the master M broadcasts the parameter \mathbf{w}^{(r)} to all worker nodes. Using the received parameter \mathbf{w}^{(r)}, the worker node W_j calculates the partial gradients \nabla L_{\mathcal{S}_i}(\mathbf{w}^{(r)}) and sends the linear combination \sum_{i\in supp(\mathbf{b}_j)}b_{j, i}\frac{\nabla L_{\mathcal{S}_i}(\mathbf{w}^{(r)})}{\beta_i} to M. The coefficients of the linear combination are the entries b_{j, i} of a particular matrix \mathbf{B}. In this work, \mathbf{B} is constructed by using GRS codes that are invariant under the action of a particular permutation. After M has received the linear combinations of partial gradients from a sufficient number of worker nodes, M updates the parameter \mathbf{w} by using the decoding vector \Lambda\left(\mathcal{K}_r\right), \; \mathbf{w}^{(r)}, and the other quantities given in the algorithm. Note that, as we will see later, the decoding vector \Lambda\left(\mathcal{K}_r\right) can be computed by using Algorithm 2 [11, Algorithm 2].
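
    Putting the pieces together, here is a compact simulation of Algorithm 1 under toy assumptions (our own illustration: n = 3 workers, s_r = 1 straggler per iteration, \overline{\beta} the all-ones vector, a least-squares loss, and the same 3 x 3 toy matrix \mathbf{B} as in Section 1; the decoding vector is obtained by solving \Lambda(\mathcal{K}_r)\,\mathbf{B} = \overline{\beta} on the responders directly, rather than via Algorithm 2):

    import numpy as np

    rng = np.random.default_rng(0)
    n, s, t, eta = 3, 1, 200, 0.1
    B = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 2.0]])
    beta = np.ones(n)

    # Toy least-squares problem, partitioned into n equal parts S_1, ..., S_n.
    X = rng.normal(size=(30, 2))
    w_true = np.array([1.5, -0.5])
    y = X @ w_true
    parts = np.array_split(np.arange(30), n)

    def partial_grad(w, idx):
        # gradient of the mean squared error over one partition
        return 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

    w = np.zeros(2)
    for _ in range(t):
        grads = np.array([partial_grad(w, idx) for idx in parts])
        sent = B @ (grads / beta[:, None])       # worker j's message to the master
        responders = rng.choice(n, size=n - s, replace=False)
        lam, *_ = np.linalg.lstsq(B[responders].T, beta, rcond=None)
        v = lam @ sent[responders] / n           # = (1/n) sum_i grad_i, the full gradient
        w = w - eta * v
    print(w)                                     # converges to w_true = [1.5, -0.5]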

    Definition 5.1. A matrix \mathbf{B}\in\mathbb{C}^{n\times n} and a function \Lambda:\mathcal{P}(n)\rightarrow \mathbb{C}^n satisfy the exact computation (EC) condition with respect to an n -tuple \overline{\beta}\in\mathbb{C}^n of non-zero elements if, for all \mathcal{K}\subseteq [n] such that |\mathcal{K}|\geq n-\max_{r\in [t]}s_r, we have that \Lambda(\mathcal{K})\cdot \mathbf{B} = \overline{\beta}.

    Note that Definition 5.1 is a slight modification of [11, Definition 2]. Let \overline{\beta} = (\beta_1, \dots, \beta_n) be an n -tuple of non-zero elements of \mathbb{C} and

    \mathbf{N}_{\overline{\beta}}(\mathbf{w}) = \frac{1}{n}\left( \begin{array}{c} \frac{\nabla L_{\mathcal{S}_1}(\mathbf{w})}{\beta_1}\\ \frac{\nabla L_{\mathcal{S}_2}(\mathbf{w})}{\beta_2}\\ \vdots\\ \frac{\nabla L_{\mathcal{S}_n}(\mathbf{w})}{\beta_n}\\ \end{array} \right).

    Lemma 5.2. If \Lambda and \mathbf{B} satisfy the EC condition with respect to \overline{\beta}, then, for all r\in [t], we have that \mathbf{v}_r = \nabla L_{\mathcal{S}}(\mathbf{w}^{(r)}).

    Proof. Given r\in [t], let \mathbf{B}' be the matrix whose i -th row \mathbf{b}_i' equals \mathbf{b}_i if i\in\mathcal{K}_r, and \mathbf{0} otherwise. The matrix \mathbf{C} in Algorithm 1 can be written as \mathbf{C} = \mathbf{B}'\cdot \mathbf{N}_{\overline{\beta}}(\mathbf{w}^{(r)}). Since supp\left(\Lambda\left(\mathcal{K}_r\right)\right)\subseteq \mathcal{K}_r, we have that \Lambda\left(\mathcal{K}_r\right)\cdot \mathbf{B}' = \Lambda\left(\mathcal{K}_r\right)\cdot \mathbf{B}. Therefore, we have

    \begin{array}{lll} \mathbf{v}_r & = & \Lambda\left(\mathcal{K}_r\right)\cdot \mathbf{C} \\ & = & \Lambda\left(\mathcal{K}_r\right)\cdot \mathbf{B}\cdot \mathbf{N}_{\overline{\beta}}(\mathbf{w}^{(r)})\\ & = & \overline{\beta}\cdot \mathbf{N}_{\overline{\beta}}(\mathbf{w}^{(r)})\\ & = & \frac{1}{n}\sum_{i = 1}^n\nabla L_{\mathcal{S}_i}\left(\mathbf{w}^{(r)}\right)\\ & = & \frac{1}{n}\sum_{i = 1}^n\frac{1}{m/n}\sum_{z\in\mathcal{S}_i}\nabla l\left(\mathbf{w}^{(r)},z\right)\\ & = & \frac{1}{m}\sum_{z\in\mathcal{S}}\nabla l\left(\mathbf{w}^{(r)},z\right)\\ & = & \nabla L_{\mathcal{S}}\left(\mathbf{w}^{(r)}\right). \end{array}

    For a given n and s, let C = GRS_{n, n-s}(\overline{\alpha}, \overline{\beta}) be a GQC code over a number field \mathbb{F} with corresponding permutation \pi of order n. Clearly, the vector \overline{\beta} is in C. Moreover, by [11, Lemma 8], there exists a codeword \mathbf{c}_1 in C whose support is \{1, 2, \dots, s+1\}. Let \mathbf{c}_i = \pi^{i-1}(\mathbf{c}_1) for i = 2, \dots, n and \mathbf{B} = \left(\mathbf{c}_1^T, \mathbf{c}_2^T, \dots, \mathbf{c}_n^T\right).
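
    As a shape illustration only (the codeword \mathbf{c}_1 below is hypothetical, not an actual GRS codeword), the following sketch shows how \mathbf{B} is assembled from \mathbf{c}_1 and the permutation \pi:

    import numpy as np

    n, s = 4, 1
    pi = np.array([1, 2, 3, 0])             # the n-cycle (1 2 3 4) in 0-indexed form

    def apply_perm(pi, v):
        # coordinate i of v is sent to coordinate pi[i] of the image
        w = np.empty_like(v)
        w[pi] = v
        return w

    c1 = np.array([1.0, 2.0, 0.0, 0.0])     # hypothetical codeword, support {1, ..., s+1}
    cols = [c1]
    for _ in range(n - 1):
        cols.append(apply_perm(pi, cols[-1]))
    B = np.column_stack(cols)               # column i is pi^{i-1}(c_1)
    print(B)
    # Every row of B is a permutation of c_1, so each row has Hamming weight s + 1,
    # as stated in Theorem 5.3(b).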

    Theorem 5.3. The matrix \mathbf{B} satisfies the following properties:

    a) Each row of \mathbf{B} is a codeword in \sigma(C), where \sigma is a permutation such that

    \begin{equation} \sigma^{-1} = \left( \begin{array}{ccccccc} 1 & 2 & 3 & \cdots & i & \cdots & n \\ n & \pi^{n-1}(n) & \pi^{n-2}(n) & \cdots & \pi^{n-(i-1)}(n) & \cdots & \pi(n) \end{array} \right). \end{equation} (5.1)

    b) w_H(\mathbf{b}) = s+1 for each row \mathbf{b} in \mathbf{B}.

    c) The column span of \mathbf{B} is the code C.

    d) Every set of n-s rows of \mathbf{B} is linearly independent over \mathbb{F}.

    Proof. (a) Let \mathbf{c}_1 = (c_1, \dots, c_n). Notice that the i -th row of \mathbf{B} is as follows:

    \left(c_i,c_{\pi^{n-1}(i)},c_{\pi^{n-2}(i)},\dots,c_{\pi(i)}\right).

    Since ord(\pi) = n, the i -th row of \mathbf{B} is a permutation of \mathbf{c}_1 for all i = 1, \dots, n. Moreover, by considering the last row of \mathbf{B}, we can see that every row of \mathbf{B} is a codeword in \sigma(C), where \sigma is the permutation given in Eq (5.1).

    (b) By part (a), every row of \mathbf{B} is a permutation of \mathbf{c}_1, so the Hamming weight of every row of \mathbf{B} is s+1.

    (c) Let \sigma = \left(1, 2, \dots, n\right) be a cyclic permutation and G_1 be a cyclic group generated by \sigma. Also, let G_2 be a cyclic group generated by \pi. Define \overline{S}_1 = span(G_1\mathbf{c}_1) and \overline{S}_2 = span(G_2\mathbf{c}_1), where G\mathbf{c}_1 = \{\lambda(\mathbf{c}_1)|\lambda\in G\}. Since ord(\sigma) = ord(\pi) = n, we have that G_1\cong G_2 by the following group isomorphism:

    \begin{array}{llll} \tau: & G_1 & \rightarrow & G_2 \\ & \sigma^i & \mapsto & \pi^i. \end{array}

    Define the following map:

    \begin{array}{llll} \overline{\tau}: & \overline{S}_1 & \rightarrow & \overline{S}_2 \\ & \sum_{i = 1}^{n}\alpha_i\sigma^{i}(\mathbf{c}_1) & \mapsto & \sum_{i = 1}^n\alpha_i\pi^{i}(\mathbf{c}_1). \end{array}

    The map \overline{\tau} is a linear map. Since it is induced by \tau, \; \overline{\tau} is a bijective map. So, \overline{S}_1\cong \overline{S}_2. By [11, Lemma 12 B3], \overline{S}_1 = C. Since \overline{S}_2\subseteq C and dim\; \overline{S}_2 = n-s, we have that \overline{S}_2 = C.

    (d) Similar to [11, Lemma 12 B4].

    Let \mathbf{G} be the canonical generator for the C = GRS_{n, n-s}(\overline{\alpha}, \overline{\beta}) GQC code, as in Eq (2.1). By Theorem 2.1(b), the canonical generator for the dual code C^\perp is \mathbf{G}^\perp = \mathbf{G}\cdot \mathbf{D}, where \mathbf{D} = diag(u_1, \dots, u_n), with

    {u_i = \frac{1}{\beta_i^2\prod_{j\not = i}\left(\alpha_i-\alpha_j\right)}}

    for all i = 1, \dots, n. Using this setting, Algorithm 2 [11, Algorithm 2] can be used to compute the decoding vector \Lambda\left(\mathcal{K}\right).

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research was funded by Hibah PPMI KK Aljabar Institut Teknologi Bandung 2023.

    The authors declare no conflict of interest.

    The following algorithm provides a way to construct the unique splitting field of a given polynomial f(x) in \mathbb{Q}[x].

    Algorithm A.1. Given a polynomial f(x) in \mathbb{Q}[x], we will construct the splitting field L of f(x) based on the construction of a chain of number fields:

    K_0 = \mathbb{Q}\subset K_1\subset K_2\subset\cdots\subset K_{s-1}\subset K_s = L

    such that K_i is an extension of K_{i-1} containing a new root of f(x).

    1) Factorize f(x) over K_i into irreducible factors f_1(x)f_2(x)\cdots f_t(x).

    2) Choose any non-linear irreducible factor g(x) = f_j(x) for some j\in\{1, \dots, t\}.

    3) Construct the field extension K_{i+1} = \frac{K_i[x]}{\langle g(x)\rangle}.

    4) Repeat the process for K_{i+1} until f(x) factors completely into linear factors.
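
    For small polynomials, the factorization in step 1) can be delegated to a computer algebra system. A minimal sketch using sympy (with an illustrative polynomial, not one from this paper) is shown below; the extension keyword factors f(x) over the field obtained by adjoining the listed algebraic numbers, which is exactly one pass of the tower construction:

    from sympy import symbols, factor, sqrt

    x = symbols('x')
    f = x**4 - 5*x**2 + 6                            # f = (x^2 - 2)(x^2 - 3) over Q

    print(factor(f))                                 # step 1 over K_0 = Q
    print(factor(f, extension=sqrt(2)))              # step 3: adjoin a root of x^2 - 2
    print(factor(f, extension=[sqrt(2), sqrt(3)]))   # repeating: f splits into linear factors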

    The following algorithm can be used to compute the decoding vector in the exact gradient coding scheme [11, Algorithm 2].

    Algorithm 2 Computing decoding vector \Lambda\left(\mathcal{K}\right)
    Data: any vector \mathbf{x}'\in\mathbb{C}^n such that \mathbf{x}'\mathbf{B} = \overline{\beta}
    Input:
      A set \mathcal{K}\subseteq [n] of n-s non-stragglers
    Output: a vector \Lambda\left(\mathcal{K}\right) such that supp\left(\Lambda\left(\mathcal{K}\right)\right)\subseteq \mathcal{K} and \Lambda\left(\mathcal{K}\right)\mathbf{B} = \overline{\beta}
    find \mathbf{f}\in\mathbb{C}^s such that \mathbf{f}\mathbf{G}_{\mathcal{K}^c} = -\mathbf{x}'_{\mathcal{K}^c}\mathbf{D}_{\mathcal{K}^c}^{-1}
    \mathbf{y}\gets \mathbf{f}\mathbf{G}\mathbf{D}
    return \Lambda\left(\mathcal{K}\right)\gets \mathbf{y}+\mathbf{x}'


