1. Introduction
The Sylvester equation

AX + XB = C    (1.1)

appears frequently in many areas of applied mathematics. We refer readers to the elegant survey by Bhatia and Rosenthal [1] and the references therein for the history of the Sylvester equation and many interesting and important theoretical results. The Sylvester equation is important in a number of applications, such as matrix eigenvalue decompositions [2,3], control theory [3,4,5], model reduction [6,7,8,9], the construction of exact solutions of nonlinear integrable equations in mathematical physics [10], feature problems of slice semi-regular functions [11] and the numerical solution of matrix differential Riccati equations [12,13,14]. There are several numerical algorithms to compute the solution of the Sylvester equation. The standard ones are the Bartels-Stewart algorithm [15] and the Hessenberg-Schur method, first described by Enright [14] but more often attributed to Golub, Nash and Van Loan [16]. Other computationally efficient approaches for the case that both A and B are stable, i.e., both A and B have all their eigenvalues in the open left half plane, are the sign function method [17], the Smith method [18] and ADI iteration methods [19,20,21,22]. All these methods are efficient when the dense matrices A and B are of small size.
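For small dense problems, such a solution can be obtained directly from library routines; for instance, recent Matlab releases (R2014a or later, an assumption beyond the Matlab 7 environment used in the experiments below) provide a built-in sylvester function based on a Bartels-Stewart/Hessenberg-Schur type solver. A minimal sketch:

```matlab
% Minimal sketch: solve AX + XB = C for small dense A, B with Matlab's
% built-in solver (requires R2014a or later; a unique solution exists
% exactly when A and -B have no common eigenvalues).
n = 50;
A = randn(n); B = randn(n); C = randn(n);
X = sylvester(A, B, C);
fprintf('Frobenius residual: %.2e\n', norm(A*X + X*B - C, 'fro'));
```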
Recent interest is directed more towards large and sparse matrices A and B, with C of low rank. For dense A and B, an approach based on the sign function method is suggested in [23] that exploits the low rank structure of C. This approach is further used in [24] to solve the large scale Sylvester equation with sparse A and B, i.e., when the matrices A and B can be represented by O(n log(n)) data. Problems concerning the sensitivity of the solution of the Sylvester equation are also widely studied; several books contain results on these problems [25,26,27].
In this paper, we focus our attention on the multiple constrained least squares solution of the Sylvester equation, that is, the following multiple constrained least squares problem:

min_{X∈SRn×n} (1/2)‖AX + XB − C‖²  subject to  L ≤ X ≤ U,  λmin(X) ≥ ε,    (1.2)
where A, B, C, L and U are given n×n real matrices, X is an n×n real symmetric matrix to be found, λmin(X) denotes the smallest eigenvalue of the symmetric matrix X, and ε is a given positive constant. The inequality X ≥ Y, for any two real matrices of the same size, means that Xij ≥ Yij for all i, j, where Xij and Yij denote the (i,j)th entries of the matrices X and Y, respectively.
Least squares estimation of matrices under multiple constraints is widely used in mathematical economics, statistical data analysis, image reconstruction, recommendation problems and so on. Such problems differ from ordinary least squares problems in that the estimated matrices are usually required to be symmetric positive definite and bounded and, sometimes, to have special construction patterns. For example, in the dynamic equilibrium model of an economy [28], one needs to estimate an aggregate demand function derived from a second order analysis of the utility function of individuals. This problem is formulated as finding the least squares solution of the matrix equation AX = B, where A and B are given, and the fitting matrix X is symmetric and bounded with smallest eigenvalue no less than a specified positive number, since, in a neighborhood of the equilibrium, the utility function is approximated by a strictly concave quadratic, whose Hessian matrix is therefore negative definite. Other examples, discussed in [29,30], are to find a symmetric positive definite patterned matrix closest to a sample covariance matrix and to find a symmetric, diagonally dominant matrix with a positive diagonal closest to a given matrix, respectively. Based on the above analysis, we have a strong motivation to study the multiple constrained least squares problem (1.2).
In this paper, we first transform the multiple constrained least squares problem (1.2) into an equivalent constrained optimization problem. Then, we give necessary and sufficient conditions for the existence of a solution to the equivalent constrained optimization problem. Noting that the alternating direction method of multipliers (ADMM) is a one-step iterative method, we propose a multi-step alternating direction method of multipliers (MSADMM) for the multiple constrained least squares problem (1.2) and analyze the global convergence of the proposed algorithm. We give some numerical examples to illustrate the effectiveness of the proposed algorithm for the multiple constrained least squares problem (1.2) and list some problems that should be studied in the near future. We also give some numerical comparisons between MSADMM, ADMM and ADMM with Anderson acceleration (ACADMM).
Throughout this paper, Rm×n, SRn×n and SRn×n0 denote the sets of m×n real matrices, n×n real symmetric matrices and n×n symmetric positive semidefinite matrices, respectively. In stands for the n×n identity matrix. A+ denotes the matrix with (i,j)th entry equal to max{0, Aij}. The inner product on Rm×n is defined as ⟨A,B⟩ = tr(ATB) = ∑ij AijBij for all A, B ∈ Rm×n, and the associated norm is the Frobenius norm, denoted by ‖A‖. PΩ(X) denotes the projection of the matrix X onto the constrained matrix set Ω, that is, PΩ(X) = argmin_{Z∈Ω} ‖Z−X‖.
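As a quick, purely illustrative check of this notation in Matlab (not part of the original development):

```matlab
% <A,B> = tr(A'*B) = sum_ij A_ij*B_ij, ||A|| is the Frobenius norm,
% and A_plus is the matrix with entries max{0, A_ij}.
A = randn(4); B = randn(4);
ip1 = trace(A'*B);         % inner product via the trace
ip2 = sum(sum(A.*B));      % the same value via the entrywise sum
nrm = norm(A, 'fro');      % the associated Frobenius norm ||A||
A_plus = max(A, 0);        % the matrix denoted A_+ in the text
```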
2. Preliminaries
In this section, we give an existence theorem for a solution of the multiple constrained least squares problem (1.2), together with some theoretical results on optimization problems that are useful for the discussions in the subsequent sections.
Theorem 2.1. The matrix ˜X is a solution of the multiple constrained least squares problem (1.2) if and only if there exist matrices ˜Λ1,˜Λ2 and ˜Λ3 such that the following conditions (2.1)–(2.4) are satisfied.
Proof. Obviously, the multiple constrained least squares problem (1.2) can be rewritten as
Then, if ˜X is a solution to the constrained optimization problem (2.5), ˜X certainly satisfies the KKT conditions of the constrained optimization problem (2.5), and hence of the multiple constrained least squares problem (1.2). That is, there exist matrices ˜Λ1, ˜Λ2 and ˜Λ3 such that conditions (2.1)–(2.4) are satisfied.
Conversely, assume that there exist matrices ˜Λ1,˜Λ2 and ˜Λ3 such that conditions (2.1)–(2.4) are satisfied. Let
then, for any matrix W∈SRn×n, we have
This implies that ˜X is a global minimizer of the function ˉF(X) with X∈SRn×n. Since
and ˉF(X)≥ˉF(˜X) holds for all X∈SRn×n, we have
Noting that ˜Λ1≥0, ˜Λ2≥0 and ˜Λ3+˜ΛT3∈SRn×n0, we see that F(X)≥F(˜X) holds for all X with X−L≥0, U−X≥0 and X−εIn∈SRn×n0. Hence, ˜X is a solution of the constrained optimization problem (2.5), that is, ˜X is a solution of the multiple constrained least squares problem (1.2). □
Lemma 2.1. [31] Assume that ˜x is a solution of the optimization problem
where f(x) is a continuously differentiable function and Ω is a closed convex set, then

⟨x − ˜x, ∇f(˜x)⟩ ≥ 0,   ∀ x ∈ Ω.
Lemma 2.2. [31] Assume that (˜x1,˜x2,...,˜xn) is a solution of the optimization problem
where fi(xi) (i=1,2,...,n) are continuously differentiable functions and Ωi (i=1,2,...,n) are closed convex sets, then
where ˜λ is a solution to the dual problem of (2.6).
Lemma 2.3. Assume that Ω={X∈Rn×n:L≤X≤U}, then, for any matrix M∈Rn×n, if Y=PΩ(Y−M), we have
Proof. Let
and noting that the optimization problem
is equivalent to the optimization problem
then ˜Z satisfies the KKT conditions for the optimization problem (2.7). That is, there exist matrices ˜Λ1≥0 and ˜Λ2≥0 such that
Since
then
So we have from the above conditions that (˜Λ1)ij(˜Λ2)ij = 0 when Lij ≠ Uij, while (˜Λ1)ij and (˜Λ2)ij can be selected arbitrarily subject to (˜Λ1)ij ≥ 0 and (˜Λ2)ij ≥ 0 when Lij = Uij. Noting that M = ˜Λ1 − ˜Λ2, the multipliers ˜Λ1 and ˜Λ2 can be selected as ˜Λ1 = (M)+ and ˜Λ2 = (−M)+. Hence, the results hold. □
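The choice ˜Λ1 = (M)+ and ˜Λ2 = (−M)+ rests on the elementary splitting M = (M)+ − (−M)+ with both parts entrywise nonnegative, which the following illustrative snippet checks numerically:

```matlab
% Check the splitting M = (M)_+ - (-M)_+ with (M)_+ >= 0 and (-M)_+ >= 0.
M = randn(5);
Lam1 = max(M, 0);                              % (M)_+
Lam2 = max(-M, 0);                             % (-M)_+
split_err = norm(M - (Lam1 - Lam2), 'fro');    % zero up to rounding
nonneg = all(Lam1(:) >= 0) && all(Lam2(:) >= 0);
```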
Lemma 2.4. Assume that Ω={X∈Rn×n:XT=X,λmin(X)≥ε>0}, then, for any matrix M∈Rn×n, if Y=PΩ(Y−M), we have
Proof. Let
and noting that the optimization problem
is equivalent to the optimization problem
then ˜Z satisfies the KKT conditions for the optimization problem (2.8). That is, there exists a matrix ˜Λ∈SRn×n0 such that
Since
then
Hence, the results hold. □
3. Accelerated ADMM
In this section we give a multi-step alternating direction method of multipliers (MSADMM) for the multiple constrained least squares problem (1.2). Obviously, the multiple constrained least squares problem (1.2) is equivalent to the following constrained optimization problem:

min_{X,Y,Z} (1/2)‖AX + XB − C‖²  subject to  X = Y,  X = Z,  L ≤ Y ≤ U,  ZT = Z,  λmin(Z) ≥ ε.    (3.1)
The Lagrange function, augmented Lagrangian function and dual problem to the constrained optimization problem (3.1) are, respectively,
where M and N are the Lagrange multipliers and α is the penalty parameter.
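For orientation, under the splitting X = Y, X = Z written in (3.1), the augmented Lagrangian has the standard form shown below; the sign conventions here are an assumption of this sketch and may differ from the paper's formulas:

Lα(X,Y,Z,M,N) = (1/2)‖AX+XB−C‖² + ⟨M, X−Y⟩ + ⟨N, X−Z⟩ + (α/2)(‖X−Y‖² + ‖X−Z‖²),

and the ordinary Lagrange function is obtained by dropping the two quadratic penalty terms.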
The alternating direction method of multipliers [32,33] to the constrained optimization problem (3.1) can be described as the following Algorithm 3.1.
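As a concrete illustration (not a verbatim transcription of Algorithm 3.1), the following Matlab sketch runs plain ADMM for the splitting X = Y, X = Z with scaled multipliers. The dense Kronecker-based X-update, the scaled dual form, the parameter values and the stopping rule are assumptions of this sketch; the paper instead solves the X-subproblem by LSQR.

```matlab
% Illustrative ADMM sketch for min 0.5*||A*X+X*B-C||^2 subject to
% X = Y, L <= Y <= U, X = Z, Z symmetric with lambda_min(Z) >= eps0.
% Scaled multipliers and the dense X-update are assumptions of this sketch.
n = 30; A = randn(n); B = randn(n); C = randn(n);
L = -ones(n); U = 3*ones(n); eps0 = 0.1;
alpha = n; maxit = 500; tol = 1e-6;
K = kron(eye(n), A) + kron(B', eye(n));       % vec(A*X + X*B) = K*vec(X)
R = chol(K'*K + 2*alpha*eye(n^2));            % factor the X-subproblem matrix once
Y = zeros(n); Z = zeros(n); M = zeros(n); N = zeros(n);
for k = 1:maxit
    % X-update: least squares with the two quadratic penalty terms
    rhs = K'*C(:) + alpha*(Y(:) - M(:)) + alpha*(Z(:) - N(:));
    X = reshape(R \ (R' \ rhs), n, n);
    % Y-update: projection of X + M onto the box [L, U]
    Y = min(max(X + M, L), U);
    % Z-update: projection of X + N onto {Z = Z', lambda_min(Z) >= eps0}
    S = (X + N + (X + N)')/2;
    [W, D] = eig(S);
    Z = W*diag(max(diag(D), eps0))*W';
    % multiplier updates (scaled form)
    M = M + X - Y;
    N = N + X - Z;
    if max(norm(X - Y, 'fro'), norm(X - Z, 'fro')) < tol, break; end
end
```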
Alternating direction method of multipliers (ADMM) has been well studied in the context of linearly constrained convex optimization. In the last few years, we have witnessed a number of novel applications arising from image processing, compressive sensing, statistics, etc. ADMM is a splitting version of the augmented Lagrangian method (ALM) in which the ALM subproblem is decomposed into multiple subproblems at each iteration, so that the variables can be solved for separately in alternating order. ADMM is, in fact, a one-step iterative method, that is, the current iterate is obtained using information only from the previous step, and the convergence rate of ADMM is only linear, as proved in [33]. In this paper we propose a multi-step alternating direction method of multipliers (MSADMM), which is more effective than ADMM, for the constrained optimization problem (3.1). The iterative pattern of MSADMM can be described as the following Algorithm 3.2.
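Relative to the ADMM sketch above, the distinctive ingredient of Algorithm 3.2 is the correction step (3.11)–(3.14): once a prediction sweep has produced the tilde quantities, the block V = (Y, Z, M, N) is relaxed towards the predictor with the correction factor γ ∈ (0,2). A minimal illustration of this update rule alone (the random predictors stand in for the output of the prediction sweep and are purely illustrative):

```matlab
% Correction step V_{k+1} = V_k - gamma*(V_k - Vtilde_k) of MSADMM.
% Yt, Zt, Mt, Nt play the role of the predictors; here they are random
% placeholders, purely to make the fragment runnable.
gamma = 1.5;                                  % correction factor in (0, 2)
n = 4;
Y = randn(n); Z = randn(n); M = randn(n); N = randn(n);      % current V_k
Yt = randn(n); Zt = randn(n); Mt = randn(n); Nt = randn(n);  % predictors
Y = Y - gamma*(Y - Yt);
Z = Z - gamma*(Z - Zt);
M = M - gamma*(M - Mt);
N = N - gamma*(N - Nt);
```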
Compared to ADMM, which produces the new iterate in the order X→Y→Z→M→N, MSADMM produces the new iterate in the order X→M→N→Y→Z. Despite this difference, MSADMM and ADMM exploit the separable structure of (3.1) equally well and are equally easy to implement. In fact, the resulting subproblems of the two methods have the same degree of decomposition and the same difficulty. We shall verify by numerical experiments that the two methods are also equally competitive numerically and that, if the correction factor γ is chosen suitably, MSADMM is more efficient than ADMM.
4. Convergence analysis
In this section, we prove the global convergence and the O(1/t) convergence rate of the proposed Algorithm 3.2. We begin with some lemmas that are useful for the analysis in the subsequent theorems.
To simplify our analysis, we use the following notations throughout this section.
Lemma 4.1. Assume that (X∗,Y∗,Z∗) is a solution of problem (3.1), (M∗,N∗) is a solution of the dual problem (3.4) to the constrained optimization problem (3.1), and that the sequences {Vk} and {˜Wk} are generated by Algorithm 3.2, then we have
Proof. By (3.6)–(3.10) and Lemma 2.1, we have, for any (X,Y,Z,M,N)∈Ω, that
or compactly,
Choosing W as W∗=(X∗V∗), then (4.3) can be rewritten as
Noting the monotonicity of the gradients of the convex functions, we have by Lemma 2.2 that
Therefore, the above two inequalities imply that
from which the assertion (4.1) is immediately derived. □
Noting that the matrix Q is symmetric and positive semi-definite, we use, for convenience, the notation ‖V‖²_Q = ⟨V, QV⟩.
Then, the assertion (4.1) can be rewritten as
Lemma 4.2. Assume that (X∗,Y∗,Z∗) is a solution of the constrained optimization problem (3.1), (M∗,N∗) is a solution of the dual problem (3.4) to the constrained optimization problem (3.1), and that the sequences {Vk},{˜Vk} are generated by Algorithm 3.2. Then, we have
Proof. By elementary manipulation, we obtain
where the inequality follows from (4.1) and (4.4). □
Lemma 4.3. The sequences {Vk} and {˜Wk} generated by Algorithm 3.2 satisfy
for any (X,Y,Z,M,N)∈Ω.
Proof. By (4.2) or its compact form (4.3), we have, for any (X,Y,Z,M,N)∈Ω, that
Thus, it suffices to show that
By using the formula 2⟨a,Qb⟩ = ‖a‖²_Q + ‖b‖²_Q − ‖a−b‖²_Q, we derive that
Moreover, since (3.11)–(3.14) can be rewritten as (Vk−Vk+1)=γ(Vk−˜Vk), we have
Combining (4.9) and (4.10), we obtain
On the other hand, we have by using (3.11)–(3.14) that
By adding (4.11) and (4.12), and again using the fact that (Vk−Vk+1)=γ(Vk−˜Vk), we obtain that
which is equivalent to (4.8). Hence, the lemma is proved. □
Theorem 4.1. The sequences {Vk} and {˜Wk} generated by Algorithm 3.2 are bounded, and furthermore, any accumulation point ˜X of the sequence {˜Xk} is a solution of problem (1.2).
Proof. The inequality (4.5) with the restriction γ∈(0,2) implies that
(ⅰ) limk→∞‖Vk−˜Vk‖Q=0;
(ⅱ) ‖Vk−V∗‖Q is bounded above.
Recall that the matrix Q is symmetric and positive semi-definite. Thus, assertion (ⅰ) implies that Q(Vk−˜Vk)→0 as k→∞, which, together with (3.7) and (3.8), implies that ˜Yk−˜Xk→0 and ˜Zk−˜Xk→0 as k→∞. Assertion (ⅱ) implies that the sequences {Yk}, {Zk}, {Mk} and {Nk} are bounded. Since Eqs. (3.11)–(3.14) hold and the sequences {Yk}, {Zk}, {Mk} and {Nk} are bounded, the sequences {˜Yk}, {˜Zk}, {˜Mk} and {˜Nk} are also bounded. Hence, since every bounded sequence has a convergent subsequence, we have together with (3.11)–(3.14) that there exist subsequences {˜Xk}K, {˜Yk}K, {˜Zk}K, {˜Mk}K, {˜Nk}K, {Yk}K, {Zk}K, {Mk}K and {Nk}K such that
Furthermore, we have by (3.7) and (3.8) that
By (3.6)–(3.8), we have
So we have
Letting k→∞ with k∈K, we have by (3.9), (3.10) and (4.13) that
Noting that α>0, we have by (4.15), Lemma 2.3 and Lemma 2.4 that
and
Let
we have by (4.14) and (4.16)–(4.18) that
Hence, we have by Theorem 2.1 that ˜X is a solution of problem (1.2). □
Theorem 4.2. Let the sequences {˜Wk} be generated by Algorithm 3.2. For an integer t>0, let
then (1/(t+1)) ∑_{k=0}^{t} (˜Xk,˜Yk,˜Zk,˜Mk,˜Nk) ∈ Ω and the inequality
holds for any (X,Y,Z,M,N)∈Ω.
Proof. First, for any integer t>0, we have (˜Xk,˜Yk,˜Zk,˜Mk,˜Nk)∈Ω for k=0,1,2,⋯,t. Since (1/(t+1)) ∑_{k=0}^{t} ˜Wk can be viewed as a convex combination of the ˜Wk's, we obtain
Second, since γ∈(0,2), it follows from Lemma 4.3 that
By combining the monotonicity of G(W) with the inequality (4.22), we have
Summing the above inequality over k=0,1,⋯,t, we derive that
By dropping the negative term, we have
which is equivalent to
The proof is completed. □
Note that problem (3.1) is equivalent to finding (X∗,Y∗,Z∗,M∗,N∗)∈Ω such that the following inequality
holds for any (X,Y,Z,M,N)∈Ω. Theorem 4.2 means that, for any initial matrices Y0,Z0,M0,N0∈Rn×n, the point ˜Wt defined in (4.20) satisfies
which means that the point ˜Wt is an approximate solution of (4.23) with accuracy O(1/t). That is, the O(1/t) convergence rate of Algorithm 3.2 is established in an ergodic sense.
5. Numerical experiments
In this section, we present some numerical examples to illustrate the convergence behavior of MSADMM for the constrained least squares problem (1.2). All tests were performed in Matlab 7 on a 64-bit Windows 7 operating system. In all tests, the constant ε = 0.1, the matrix L has all entries equal to −1 and the matrix U has all entries equal to 3. The matrices A, B and C are randomly generated, i.e., generated in Matlab style as A = randn(n,n), B = randn(n,n), C = randn(n,n). In all algorithms, the initial matrices are chosen as the null matrices. The maximum numbers of inner and outer iterations are restricted to 5000. The error tolerances are εout = εin = 10−9 in Algorithms 3.1 and 3.2. The projections PΩi(X) (i = 1,2) are computed as follows [38].
where
and W is such that (X+XT)/2 = WΔWT is the spectral decomposition, i.e., WTW = I and Δ = diag(λ1((X+XT)/2), λ2((X+XT)/2), ⋯, λn((X+XT)/2)). We use the LSQR algorithm described in [34], with necessary modifications, to solve the subproblems (3.5) in Algorithm 3.1, (3.6) in Algorithm 3.2 and (6.1) in Algorithm 6.2.
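A short Matlab sketch of these two projections (the box projection is the standard entrywise clamp, and the spectral projection follows the eigenvalue construction above; the script is illustrative and the variable names are not from the paper):

```matlab
% Projections used in the algorithms (illustrative sketch).
eps0 = 0.1; n = 5;
L = -ones(n); U = 3*ones(n); X = randn(n);
% P_Omega1: projection onto the box {L <= X <= U} is the entrywise clamp.
P1 = min(max(X, L), U);
% P_Omega2: projection onto {X = X', lambda_min(X) >= eps0}: symmetrize,
% then lift every eigenvalue below eps0 up to eps0.
S = (X + X')/2;
[W, Delta] = eig(S);                          % S = W*Delta*W', W'*W = I
P2 = W*diag(max(diag(Delta), eps0))*W';
```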
The LSQR algorithm is an effective method for solving a consistent linear matrix equation or the least squares problem associated with an inconsistent linear matrix equation. Using this iterative algorithm, for any initial matrix, a solution can be obtained within finitely many iteration steps if exact arithmetic is used. In addition, a solution with minimum Frobenius norm can be obtained by choosing a special kind of initial matrix, and a solution nearest to a given matrix in the Frobenius norm can be obtained by first finding the minimum Frobenius norm solution of a new consistent matrix equation. The LSQR algorithm for solving the subproblems (3.5), (3.6) and (6.1) can be described as follows:
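As an illustration of how LSQR can be applied in a matrix-free way, Matlab's built-in lsqr accepts a function handle that applies the operator and its adjoint to vectorized matrices. The sketch below solves only the plain problem min ‖AX+XB−C‖ and therefore omits the additional proximal terms that appear in the subproblems (3.5), (3.6) and (6.1); the function names are illustrative.

```matlab
function X = sylvester_lsqr_demo(n)
% Matrix-free LSQR sketch for min ||A*X + X*B - C||_F over X, via vec(X).
% The actual subproblems (3.5), (3.6) and (6.1) contain extra proximal
% terms that are omitted here.
if nargin < 1, n = 40; end
A = randn(n); B = randn(n); C = randn(n);
afun = @(x, flag) apply_op(x, A, B, n, flag);
xvec = lsqr(afun, C(:), 1e-9, 2000);
X = reshape(xvec, n, n);
fprintf('residual = %.2e\n', norm(A*X + X*B - C, 'fro'));
end

function y = apply_op(x, A, B, n, flag)
% Applies vec(A*V + V*B) for 'notransp' and the adjoint vec(A'*V + V*B')
% for 'transp', where V = reshape(x, n, n).
V = reshape(x, n, n);
if strcmp(flag, 'notransp')
    y = reshape(A*V + V*B, [], 1);
else
    y = reshape(A'*V + V*B', [], 1);
end
end
```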
Table 1 reports the average computing time (CPU) over 10 tests of Algorithm 3.1 (ADMM) and Algorithm 3.2 (MSADMM) with penalty parameter α = n. Figures 1–4 report the computing time of ADMM for the same problem size and different penalty parameters α. Figures 5–8 report the computing time of MSADMM for the same problem size and different correction factors γ. Figure 9 reports the computing time curves of Algorithm 3.1 (ADMM) and Algorithm 3.2 (MSADMM) for different matrix sizes.
Based on the tests reported in Table 1 and Figures 1–9, and on many other unreported tests that show similar patterns, we make the following remarks:
Remark 5.1. The convergence speed of ADMM is directly related to the penalty parameter α. In general, the penalty parameter α in this paper can be chosen as α ≈ n (see Figures 1–4). However, how to select the best penalty parameter α is an important problem that should be studied in the future.
Remark 5.2. The convergence speed of MSADMM is directly related to the penalty parameter α and the correction factor γ. The selection of the penalty parameter α is similar to that for ADMM since MSADMM is a direct extension of ADMM. For the correction factor γ, as shown in Table 1 and Figures 5–8, aggressive values such as γ ≈ 1.5 are often preferred. However, how to select the best correction factor γ is also an important problem that should be studied in the future.
Remark 5.3. As shown in Table 1 and Figure 9, MSADMM, with the correction factor γ ≈ 1.5 and the penalty parameter α chosen the same as for ADMM, is more effective than ADMM.
6. Algorithm 3.1 with Anderson acceleration
Anderson acceleration, or Anderson mixing, was initially developed in 1965 by Donald Anderson [35] as an iterative procedure to solve some nonlinear integral equations arising in physics. It turns out that Anderson acceleration is very efficient for solving other types of nonlinear equations as well; see [36,37,38] and the literature cited therein. When Anderson acceleration is applied to the equation f(x) = g(x) − x = 0, the iterative pattern can be described as the following Algorithm 6.1.
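For concreteness, a compact Matlab sketch of windowed Anderson acceleration for a generic fixed-point map g is given below; it is an illustrative implementation of the standard scheme, not a verbatim transcription of Algorithm 6.1.

```matlab
function x = anderson_fp(g, x0, m, maxit, tol)
% Anderson acceleration with window m for the fixed-point problem
% x = g(x), i.e. f(x) = g(x) - x = 0.  Illustrative sketch only.
x = x0;  gx = g(x);  fx = gx - x;
dG = [];  dF = [];                      % stored differences of g- and f-values
for k = 1:maxit
    if norm(fx) < tol, return; end
    if isempty(dF)
        xnew = gx;                      % plain fixed-point step at the start
    else
        theta = dF \ fx;                % least squares mixing coefficients
        xnew  = gx - dG*theta;          % Anderson-mixed iterate
    end
    gnew = g(xnew);  fnew = gnew - xnew;
    dG = [dG, gnew - gx];  dF = [dF, fnew - fx];
    if size(dF, 2) > m                  % keep at most m difference columns
        dG(:, 1) = [];  dF(:, 1) = [];
    end
    x = xnew;  gx = gnew;  fx = fnew;
end
end
```

For example, anderson_fp(@cos, 0, 3, 100, 1e-12) accelerates the classical iteration for the fixed point of cos.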
In this section we define the following matrix functions
where g(Y_k, Z_k, M_k, N_k) = (Y_{k+1}, Z_{k+1}, M_{k+1}, N_{k+1}) with Y_{k+1}, Z_{k+1}, M_{k+1} and N_{k+1} computed by (b)–(e) in Algorithm 3.1. Letting f_k = f(Y_k, Z_k, M_k, N_k) and g_k = g(Y_k, Z_k, M_k, N_k), Algorithm 3.1 with Anderson acceleration can be described as the following Algorithm 6.2.
Algorithm 6.2 (ACADMM) is an m-step iterative method, that is, the current iterate is obtained as a linear combination of the previous m steps. Furthermore, the coefficients of this linear combination are updated at each iteration step. In comparison, Algorithm 3.2 (MSADMM) is a two-step iterative method and its combination coefficients are fixed at each iteration step. The convergence speed of ACADMM is directly related to the penalty parameter α and the backtracking step m. The selection of the penalty parameter α is the same as for ADMM since ACADMM's iterates are corrected by ADMM's iterates. For the backtracking step m, as shown in Table 2 (the average computing time over 10 tests) and Figure 10, aggressive values such as m = 10 are often preferred (in this case, ACADMM is more efficient than MSADMM). However, how to select the best backtracking step m is an important problem which should be studied in the near future.
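To apply a fixed-point accelerator such as the sketch above to the map g of Algorithm 6.2, the four matrices are simply stacked into one long vector and unstacked again afterwards; an illustrative packing (variable names are not from the paper):

```matlab
% Pack the ADMM state (Y, Z, M, N) into one vector and unpack it again,
% so a generic fixed-point accelerator can act on the map g of Algorithm 6.2.
n = 4;
Y = randn(n); Z = randn(n); M = randn(n); N = randn(n);
v  = [Y(:); Z(:); M(:); N(:)];                 % pack
Yu = reshape(v(        1:  n^2), n, n);        % unpack Y
Zu = reshape(v(  n^2 + 1:2*n^2), n, n);        % unpack Z
Mu = reshape(v(2*n^2 + 1:3*n^2), n, n);        % unpack M
Nu = reshape(v(3*n^2 + 1:4*n^2), n, n);        % unpack N
```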
7. Conclusions
In this paper, the multiple constrained least squares solution of the Sylvester equation AX + XB = C has been discussed. Necessary and sufficient conditions for the existence of solutions to the considered problem are given (Theorem 2.1). MSADMM for solving the considered problem is proposed and some convergence results of the proposed algorithm are proved (Theorems 4.1 and 4.2). Problems which should be studied in the near future are listed. Numerical experiments show that MSADMM with a suitable correction factor γ is more effective than ADMM (see Table 1 and Figure 10), and that ACADMM with a suitable backtracking step m is the most effective among ADMM, MSADMM and ACADMM (see Table 2 and Figure 10).
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work was supported by National Natural Science Foundation of China (grant number 11961012) and Special Research Project for Guangxi Young Innovative Talents (grant number AD20297063).
Conflict of interest
The authors declare no competing interests.