
Quantile regression has been widely used in many fields because of its robustness and comprehensiveness. However, it remains challenging to perform quantile regression (QR) on streaming data with conventional methods, as they are all based on the assumption that the memory can fit all the data. To address this issue, this paper proposes a Bayesian QR approach for streaming data, in which the posterior distribution is updated by utilizing aggregated statistics of the current and historical data. In addition, theoretical results are presented to confirm that the streaming posterior distribution is theoretically equivalent to the oracle posterior distribution calculated using the entire dataset at once. Moreover, we provide an algorithmic procedure for the proposed method. The algorithm shows that our proposed method only needs to store the parameters of the historical posterior distribution of the streaming data. Thus, it is computationally simple and not storage-intensive. Both simulations and real data analysis are conducted to illustrate the good performance of the proposed method.
Citation: Zixuan Tian, Xiaoyue Xie, Jian Shi. Bayesian quantile regression for streaming data[J]. AIMS Mathematics, 2024, 9(9): 26114-26138. doi: 10.3934/math.20241276
With the development of computer technology, the finite element method emerged in the early 1960s. Research on the reliability and validity of finite element analysis promoted the development of the finite element method [3,4,5,25,27]. In applying finite element methods, the emergence of errors has captured the attention of scholars. One of the main sources of error is the discretisation of the model, and researchers have dissected it from various aspects of finite element analysis. In addition, the generation of the finite element mesh is also a concern for scholars. In early conventional FEA, scholars usually used experience, intuition, or even guesses to generate meshes and judged only roughly whether the approximation results were reasonable. If they were not, the mesh had to be redesigned, whenever necessary, for the efficiency of the analysis and the reliability of the results. The adaptive finite element method therefore emerged: after checking the current solution, the computer decides, according to the error information obtained during the adjustment process, whether the solution is accurate enough to meet the prescribed requirements.
To the best of our knowledge, the adaptive finite element method is a numerical method that automatically adjusts the mesh to improve the solution process [2]. An appropriate mesh can greatly reduce the errors arising from the discretisation in the finite element approximation process. At present, closed-form solutions of optimal control problems for nonlinear systems are usually not available. Given the complexity and diversity of nonlinear equations, it is therefore very practical to use the adaptive finite element method for solving nonlinear equations.
Adaptive finite element methods have been widely and successfully applied to various linear optimal control problems. For example, Eriksson and Johnson proposed an adaptive finite element algorithm that produces a sequence of successively refined meshes in which the final mesh satisfies a given error tolerance [11]. Gaevskaya and Hoppe et al. analysed an adaptive finite element method for a class of distributed optimal control problems with control constraints, applying the reliability and discrete local efficiency of the a posteriori estimator and the quasi-orthogonality property as basic tools [13]. Braess and Carstensen et al. investigated the residual jump contributions of a standard explicit residual-based a posteriori error estimator for an adaptive finite element method [1]. Hu and Xu et al. studied the convergence and optimality of the adaptive nonconforming linear element method for the Stokes problem [17].
However, the literature shows that research on adaptive finite element methods for nonlinear optimal control problems has not yet reached its peak.
Recently, for instance, Wriggers and Scherf proposed an adaptive finite element technique for nonlinear contact problems [29]. Eriksson and Johnson extended adaptive finite element methods for parabolic problems to a class of nonlinear scalar problems, obtaining a posteriori error estimates and designing corresponding adaptive algorithms [12]. Nowadays, nonlinear optimal control problems, like big data, are a focus of scholars worldwide. Hence it is worth investigating the adaptive finite element method for such nonlinear problems.
Many scholars have also studied a priori error estimates for the bilinear optimal control problem. For example, Yang, Demlow, and Dobrowolski et al. investigated a priori error estimates and superconvergence of optimal control problems for bilinear models and gave optimal L2-norm error estimates and almost optimal L∞-norm estimates for the state and co-state variables [7,8,31]. Lu investigated a second-order parabolic bilinear optimal control problem and provided a priori error estimates for the finite element solutions of the state equations describing the system [24]. Shen and Yang et al. investigated a quadratic optimal control problem governed by a linear hyperbolic integro-differential equation and its finite element approximation; a priori estimates were carried out using standard functional analysis techniques, and the existence and regularity of the solution were established by means of these estimates. At the same time, some scholars have analysed a posteriori error estimates of the finite element method for bilinear optimal control problems [15,26]. Lu, Chen, and Leng et al. discussed the Raviart-Thomas mixed finite element discretisation of general bilinear optimal control problems, deriving a posteriori error estimates for both the coupled state and the control solutions [18,23]. Although bilinear optimal control problems are frequently met in applications, they are much more difficult to handle than linear cases, and there is still little work on nonlinear optimal control problems.
In this paper, we focus on a nonlinear optimal control problem with integral control constraints, where we discretize the control by piecewise constants while applying continuous piecewise linear discretization to the state and the co-state, respectively. A posteriori error estimates are then derived. We prove convergence and quasi-optimality relying on quasi-orthogonality and a discrete local upper bound. Under a mild assumption on the initial grids, we obtain the proofs of convergence and quasi-optimality by means of the solution operator of nonlinear elliptic equations. Finally, some numerical simulations are provided to support our theoretical analysis.
Here are some notations that will be used in this paper. Let Ω be a bounded Lipschitz domain in R2 and ∂Ω denote the boundary of Ω. We use the standard notation Wm,q(ω) with norm ‖⋅‖m,q,ω and seminorm |⋅|m,q,ω to express the standard Sobolev space for ω⊂Ω. Moreover, we will omit the subscript if ω=Ω. For q=2, we denote Wm,2(Ω) by Hm(Ω) and ‖⋅‖m=‖⋅‖m,2. Also, for m=0 and q=2, we denote W0,2(ω)=L2(ω) and ‖⋅‖0,2,ω=‖⋅‖0,ω. Additionally, H10(Ω)={v∈H1(Ω):v=0 on ∂Ω}. Let Th0 be the initial partition of ˉΩ into disjoint triangles. By newest-vertex bisection of Th0, we can obtain a class T of conforming partitions. For Th,˜Th∈T, we use Th⊂˜Th to indicate that ˜Th is a refinement of Th, and hT=|T|1/2 denotes the size of the element T. In addition, (⋅,⋅) denotes the L2 inner product. Beyond that, let c and C be generic positive constants independent of the grid size; we use A≈B to represent cA≤B≤CA.
The rest of the paper is organized as follows. In Section 2, we introduce the optimal control problem of our interest and obtain a posteriori error estimates. The relevant algorithms are introduced in Section 3. In Section 4, we use quasi-orthogonality and discrete local upper bounds to prove the convergence of the adaptive finite element method, as well as quasi-optimality in Section 5. We provide an adaptive finite element algorithm and some numerical simulations to verify our theoretical analysis in Section 6. Finally, we summarize the results of this paper and develop a plan for future work.
In this paper we mainly discuss the following nonlinear optimal control problem governed by nonlinear elliptic equations:
minu∈Uad{12‖y−yd‖20+α2‖u‖20},−Δy+ϕ(y)=f+u,in Ω,y=0,on ∂Ω, |
where y is the state variable, u is the control variable, f is a given function, α is a constant greater than zero, yd∈L2(Ω), Uad={v:v∈L2(Ω), ∫Ωv dx≥0} is a closed convex subset of U=L2(Ω), and ϕ(⋅)∈W2,∞(−R,R) for any R>0, ϕ′(y)∈L2(Ω) for any y∈H1(Ω), ϕ′≥0. Let V=H10(Ω); the weak formulation of the state equation reads: find y∈V such that
a(y,v)+(ϕ(y),v)=(f+u,v),∀ v∈V, |
where
a(y,v)=∫Ω∇y⋅∇v dx. |
Then the nonlinear optimal control problem can be restated as follows
minu∈Uad{12‖y−yd‖20+α2‖u‖20}, | (2.1) |
a(y,v)+(ϕ(y),v)=(f+u,v),∀ v∈V. | (2.2) |
It is well known [14,20] that the nonlinear optimal control problem has at least one solution (y,u), and that if a pair (y,u) is a solution of the nonlinear optimal control problem, then there is a co-state p∈V such that the triplet (y,p,u) satisfies the following optimality conditions:
a(y,v)+(ϕ(y),v)=(f+u,v),∀ v∈V, | (2.3) |
a(q,p)+(ϕ′(y)p,q)=(y−yd,q),∀ q∈V, | (2.4) |
(αu+p,v−u)≥0,∀ v∈Uad. | (2.5) |
Due to the coercivity of a(⋅,⋅), we can define the solution operator S:L2(Ω)→H10(Ω) such that S(f+u)=y, and let S∗ be the adjoint of S such that S∗(y−yd)=p. For Th∈T, let Vh be the continuous piecewise linear finite element space with respect to the partition Th, and let Uh be the piecewise constant finite element space with respect to Th. Let Uhad={vh∈Uh:∫Ωvhdx≥0}. Then we derive the standard finite element discretization for the nonlinear optimal control problem as follows:
minuh∈Uhad{12‖yh−yd‖20+α2‖uh‖20}, | (2.6) |
a(yh,v)+(ϕ(yh),v)=(f+uh,v),∀ v∈Vh. | (2.7) |
Similarly, the nonlinear optimal control problem (2.6)–(2.7) has at least one solution (yh,uh), and if a pair (yh,uh)∈Vh×Uhad is a solution of (2.6)–(2.7), then there is a co-state ph∈Vh such that the triplet (yh,ph,uh) satisfies the following optimality conditions:
a(yh,v)+(ϕ(yh),v)=(f+uh,v),∀ v∈Vh, | (2.8) |
a(q,ph)+(ϕ′(yh)ph,q)=(yh−yd,q),∀ q∈Vh, | (2.9) |
(αuh+ph,vh−uh)≥0,∀ vh∈Uhad. | (2.10) |
Here we define the error indicators η(⋅) and the data oscillations osc(⋅) that will be used frequently in this paper. For Th∈T, T∈Th, we define
η21,Th(ph,T)=h2T‖∇ph‖20,T,η22,Th(uh,yh,T)=h2T‖f+uh−ϕ(yh)‖20,T+hT‖[∇yh]⋅n‖20,∂T∖∂Ω,η23,Th(yh,ph,T)=h2T‖yh−yd−ϕ′(yh)ph‖20,T+hT‖[∇ph]⋅n‖20,∂T∖∂Ω,osc2Th(f,T)=h2T‖f−fT‖20,T,osc2Th(yh−yd,T)=h2T‖(yh−yd)−(yh−yd)T‖20,T, |
where uh∈Uhad, yh,ph∈Vh, and fT is the L2-projection of f onto the piecewise constant space on T, i.e., fT=∫Tf dx/|T|. For ω⊂Th, we have
η21,Th(ph,ω)=∑T∈ωη21,Th(ph,T),osc2Th(f,ω)=∑T∈ωosc2Th(f,T), |
by which η22,Th(uh,yh,ω), η23,Th(yh,ph,ω) and osc2Th(yh−yd,ω) can be denoted similarly.
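To make the elementwise structure of these indicators concrete, here is a minimal NumPy sketch of how squared indicators aggregate over a subset ω by summation, using h2T=|T|. All the mesh data below (element areas, residual norms, the subset ω) are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical per-element data for a tiny mesh (illustrative only):
areas = np.array([0.5, 0.25, 0.25, 0.125])      # |T| for each element T
grad_ph_sq = np.array([0.9, 0.4, 0.3, 0.2])     # ||grad p_h||_{0,T}^2 per element

h_sq = areas                                    # h_T^2 = |T| since h_T = |T|^{1/2}
eta1_sq = h_sq * grad_ph_sq                     # eta_{1,T_h}^2(p_h, T) = h_T^2 ||grad p_h||_{0,T}^2

# Aggregating over a subset omega of T_h is just a partial sum of element terms.
omega = [0, 2]                                  # indices of elements belonging to omega
eta1_sq_omega = eta1_sq[omega].sum()
eta1_sq_total = eta1_sq.sum()
assert eta1_sq_omega <= eta1_sq_total           # a subset never exceeds the global indicator
```

The same partial-sum pattern applies verbatim to η22,Th, η23,Th, and osc2Th, since each is a sum of nonnegative elementwise contributions.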
Also, a reliable and efficient a posteriori error estimate for the nonlinear optimal control problem (2.1)–(2.2) is presented in the next theorem.
Theorem 2.1. For Th∈T, let (y,p,u) be the exact solution of problems (2.3), (2.4) and (2.5) and (yh,ph,uh) be the solution of problems (2.8), (2.9) and (2.10) with respect to Th. Then there exist constants c and C such that
‖u−uh‖20+‖y−yh‖21+‖p−ph‖21≤C(η21,Th(ph,Th)+η22,Th(uh,yh,Th)+η23,Th(yh,ph,Th)), | (2.11) |
and
c(η21,Th(ph,Th)+η22,Th(uh,yh,Th)+η23,Th(yh,ph,Th))≤‖u−uh‖20+‖y−yh‖21+‖p−ph‖21+osc2Th(f,Th)+osc2Th(yh−yd,Th). | (2.12) |
Proof. Step 1. We seek functions yh,ph∈V satisfying the following auxiliary problems
a(yh,v)+(ϕ(yh),v)=(f+uh,v),∀ v∈V, | (2.13) |
a(q,ph)+(ϕ′(yh)ph,q)=(yh−yd,q),∀ q∈V. | (2.14) |
It follows that
‖u−uh‖20≤C(αu,u−uh)−C(αuh,u−uh)≤−C(αuh,u−uh)=C(αuh,uh−u)+C(αuh−αuh,u−uh). |
With the help of the proof of Lemma 3.4 in [14] and Lemma 3.4 in [21], we have
(αuh,uh−u)=(αuh+ph,uh−u)≤C(∑T∈Th‖ph−πhph‖20,T+‖u−uh‖20)≤C(σ)η21,Th(ph,Th)+Cσ‖u−uh‖20, | (2.15) |
where η21,Th(ph,Th)=∑T∈Thη21,Th(ph,T), and
(αuh−αuh,u−uh)=(ph−ph,u−uh)≤C(σ)‖ph−ph‖20+Cσ‖u−uh‖20, | (2.16) |
where πh is the L2-projection operator onto the piecewise constant space on Th, σ is an arbitrary positive number, C(σ) is a constant depending on σ, and C is a generic constant, which may include C(σ). Obviously, from (2.15) and (2.16) there holds
‖u−uh‖20≤C(η21,Th(ph,Th)+‖ph−ph‖20). |
Let ey=yh−yh, and eyI=ˆπhey, where ˆπh is the average interpolation operator defined in Lemma 3.2 of [14], then we can obtain
c‖ey‖21≤(∇(yh−yh),∇ey)+(ϕ(yh)−ϕ(yh),ey)=(∇(yh−yh),∇(ey−eyI))+(ϕ(yh)−ϕ(yh),ey−eyI)=∑T∈Th∫T(f+uh−ϕ(yh))(ey−eyI)dx−∑T∈Th∫∂T([∇yh]⋅n)(ey−eyI)dx≤C(σ)∑T∈Thh2T∫T(f+uh−ϕ(yh))2dx+C(σ)∑∂T∖∂ΩhT∫([∇yh]⋅n)2dx+Cσ‖ey‖21=C(σ)η22,Th(uh,yh,Th)+Cσ‖ey‖21, |
where σ is an arbitrary positive number. Letting σ=c/(2C), we have
‖yh−yh‖21≤Cη22,Th(uh,yh,Th). |
Similarly, let ep=ph−ph and epI be the average interpolation of ep, then we can get
c‖ep‖21≤(∇ep,∇(ph−ph))+((ϕ′(yh)(ph−ph),ep)=(∇ep,∇(ph−ph))+(ϕ′(yh)ph−ϕ′(yh)ph,ep)+((ϕ′(yh)−ϕ′(yh))ph,ep)=(∇(ep−epI),∇(ph−ph))+(ϕ′(yh)ph−ϕ′(yh)ph,ep−epI)+(∇epI,∇(ph−ph))+(ϕ′(yh)ph−ϕ′(yh)ph,epI)+((ϕ′(yh)−ϕ′(yh))ph,ep)=∑T∈Th∫T(yh−yd−ϕ′(yh)ph)(ep−epI)dx−∑∂T∖∂Ω∫∂T([∇ph]⋅n)(ep−epI)−(yh−yh,epI)+((ϕ′(yh)−ϕ′(yh))ph,ep)dx≤C(σ)∑T∈Thh2T∫T(yh−yd−ϕ′(yh)ph)2dx+C(σ)∑∂T∖∂ΩhT∫∂T([∇ph]⋅n)2dx+Cσ∑T∈Thh2T∫T|∇ep|2dx+C‖yh−yh‖0‖ep‖0+C‖ϕ′(yh)−ϕ′(yh)‖0‖ph‖0,4‖ep‖0,4≤C(σ)η23,Th(yh,ph,Th)+C(σ)‖yh−yh‖20+C(σ)‖ph‖21‖yh−yh‖20+Cσ‖ph−ph‖21)≤C(σ)η23,Th(yh,ph,Th)+C‖yh−yh‖20+Cσ‖ph−ph‖21, |
in which we apply the embedding theorem ‖v‖0,4≤C‖v‖1 (see [6]) and the property ‖ph‖1≤C, so that C(σ)‖yh−yh‖20+C(σ)‖ph‖21‖yh−yh‖20≤C‖yh−yh‖20. Consequently, we obtain
‖ph−ph‖21≤Cη23,Th(yh,ph,Th)+C‖yh−yh‖20. |
Hence we gain
‖ph−ph‖20≤C‖ph−ph‖21+C‖yh−yh‖21≤C(η22,Th(uh,yh,Th)+η23,Th(yh,ph,Th)). |
By the triangle inequality we obtain that
‖y−yh‖1≤‖y−yh‖1+‖yh−yh‖1≤C(‖u−uh‖0+‖yh−yh‖1),‖p−ph‖1≤‖p−ph‖1+‖ph−ph‖1≤C(‖y−yh‖1+‖ph−ph‖1). |
Combining the above estimates, we have
‖u−uh‖20+‖p−ph‖21+‖y−yh‖21≤C(η21,Th(ph,Th)+η22,Th(uh,yh,Th)+η23,Th(yh,ph,Th)). |
Step 2. Now we are in a position to derive the lower bound. Referring to the proofs of Lemma 3.6 in [14] and Lemma 3.4 in [21], we conclude that
∑T∈Thh2T‖∇ph‖20,T≈∑T∈Th‖ph−πhph‖20,T≤C(‖u−uh‖20+‖p−ph‖21). | (2.17) |
Indeed, it is easily seen that
∑T∈Th‖ph−πhph‖20,T=∑T∈Th‖ph−πhph‖0,T‖ph−p+p−πhp+πhp−πhph‖0,T≤∑T∈Th‖ph−πhph‖0,T‖p−πhp‖0,T+(1/3)∑T∈Th‖ph−πhph‖20,T+C‖ph−p‖21. |
Since u+p=max(0,ˉp) is constant, we have
πh(u+p)=u+p, |
such that
∑T∈Th‖ph−πhph‖0,T‖p−πhp‖0,T=∑T∈Th‖ph−πhph‖0,T‖p+u−πh(p+u)+πhu−u‖0,T=∑T∈Th‖ph−πhph‖0,T‖πh(u−uh)−(u−uh)‖0,T≤(1/3)∑T∈Th‖ph−πhph‖20,T+C‖uh−u‖20. |
This completes the proof.
According to [28], we employ the standard bubble function technique to estimate the error indicators η2,Th(uh,yh,Th) and η3,Th(yh,ph,Th). Similar to Chapter 7.2 in [22], there exist polynomials wT∈H10(T) and w∂T∈H10(∂T∖∂Ω) such that
∫∂ThT([∇ph]⋅n)2dx=∫∂T([∇ph]⋅n)w∂Tdx, | (2.18) |
∫Th2T((yh−yd)T−ϕ′(yh)ph)2dx=∫T((yh−yd)T−ϕ′(yh)ph)wTdx, | (2.19) |
and apparently
‖w∂T‖21,∂T∖∂Ω≤C∫∂ThT([∇ph]⋅n)2dx, | (2.20) |
h−2T‖w∂T‖20,∂T∖∂Ω≤C∫∂ThT([∇ph]⋅n)2dx, | (2.21) |
‖wT‖21,T≤C∫Th2T((yh−yd)T−ϕ′(yh)ph)2dx, | (2.22) |
h−2T‖wT‖20,T≤C∫Th2T((yh−yd)T−ϕ′(yh)ph)2dx, | (2.23) |
By using the Schwarz inequality, it follows from (2.18), (2.20), and (2.21) that
∫∂ThT([∇ph]⋅n)2dx=∫∂T([∇ph]⋅n)w∂Tdx=∫∂T([∇ph]⋅n−[∇p]⋅n)w∂Tdx=∫∂T∇(ph−p)∇w∂Tdx+div(Δ(ph−p))w∂T=∫∂T∇(ph−p)∇w∂Tdx+∫∂T(y−yd)dx−ϕ′(y)p)w∂Tdx=∫∂T∇(ph−p)∇w∂Tdx+∫∂T(yh−yd−ϕ′(yh)ph)w∂Tdx+∫∂T((y−yd)−(yh−yd))w∂Tdx+∫∂T(ϕ′(y)p−ϕ′(yh)ph)w∂Tdx≤C(σ)‖ph−p‖21,∂T∖∂Ω+C(σ)‖y−yh‖20,∂T∖∂Ω+∫∂Tϕ′(y)(p−ph)w∂Tdx+∫∂T(ϕ′(y)−ϕ′(yh))phw∂Tdx+C(σ)∫∂T(yh−yd−ϕ′(yh)ph)2dx+Cσ(‖w∂T‖21,∂T∖∂Ω+h−2T‖w∂T‖20,∂T∖∂Ω)≤C(σ)‖ph−p‖21,∂T∖∂Ω+C(σ)‖y−yh‖20,∂T∖∂Ω+C(σ)‖ϕ′(y)‖20,∂T∖∂Ω‖(p−ph)‖20,∂T∖∂Ω+∫∂T˜ϕ″(yh)(y−yh)phw∂Tdx+C(σ)∫∂T(yh−yd−ϕ′(yh)ph)2dx+Cσ(‖w∂T‖21,∂T∖∂Ω+h−2T‖w∂T‖20,∂T∖∂Ω)≤C(σ)‖ph−p‖21,∂T∖∂Ω+C(σ)‖y−yh‖20,∂T∖∂Ω+C(σ)‖˜ϕ″(yh)‖20,∂T∖∂Ω‖y−yh‖20,∂T∖∂Ω‖ph‖20,∂T∖∂Ω+C(σ)∫∂T(yh−yd−ϕ′(yh)ph)2dx+Cσ(‖w∂T‖21,∂T∖∂Ω+h−2T‖w∂T‖20,∂T∖∂Ω)≤C(σ)‖ph−p‖21,∂T∖∂Ω+C(σ)‖y−yh‖20,∂T∖∂Ω+C(σ)∫∂T(yh−yd−ϕ′(yh)ph)2dx+Cσ∫∂ThT([∇ph]⋅n)2dx, |
where σ is an arbitrary positive number and ϕ(⋅)∈W2,∞(Ω) has been used. Letting σ=1/(2C), we have
∫∂ThT([∇ph]⋅n)2dx≤C‖ph−p‖21,∂T∖∂Ω+C‖y−yh‖20,∂T∖∂Ω+C∫∂T(yh−yd−ϕ′(yh)ph)2dx. | (2.24) |
Next, it follows from (2.19), (2.22), and (2.23) that
∫Th2T((yh−yd)T−ϕ′(yh)ph)2dx=∫T((yh−yd)T−ϕ′(yh)ph)wTdx=∫T(yh−yd−ϕ′(yh)ph)wTdx+∫T((yh−yd)−(yh−yd)T)wTdx≤∫T(yh−yd−ϕ′(yh)ph−(y−yd)+ϕ′(y)p)wTdx+C(σ)∫Th2T((yh−yd)−(yh−yd)T)2dx+Cσh−2T‖wT‖20,T=−∫T∇(ph−p)∇wTdx+∫T((yh−yd)−(y−yd))wTdx+∫T(ϕ′(yh)ph−ϕ′(y)p)wTdx+C(σ)∫Th2T((yh−yd)−(yh−yd)T)2dx+Cσh−2T‖wT‖20,T≤C(σ)‖ph−p‖21,T+C(σ)‖(y−yd)−(yh−yd)‖20,T+∫Tϕ′(yh)(ph−p)wTdx+∫T(ϕ′(yh)−ϕ′(y))pwTdx+C(σ)∫Th2T((yh−yd)−(yh−yd)T)2dx+Cσ(‖wT‖21,T+h−2T‖wT‖20,T)≤C(σ)‖ph−p‖21,T+C(σ)‖y−yh‖20,T+C(σ)∫Th2T((yh−yd)−(yh−yd)T)2dx+C(σ)‖ϕ′(yh)‖20,T‖ph−p‖20+C(σ)‖˜ϕ″(y)‖20,T‖yh−y‖20,T‖p‖20,T+Cσ(‖wT‖21,T+h−2T‖wT‖20,T)≤C(σ)‖ph−p‖21,T+C(σ)‖y−yh‖20,T+C(σ)∫Th2T((yh−yd)−(yh−yd)T)2dx+Cσ∫Th2T((yh−yd)T−ϕ′(yh)ph)2dx, |
where ϕ(⋅)∈W2,∞(Ω) has been used. Consequently, we can deduce that
∫Th2T((yh−yd)T−ϕ′(yh)ph)2dx≤C‖ph−p‖21,T+C‖y−yh‖20,T+C∫Th2T((yh−yd)−(yh−yd)T)2dx. |
Then we have
∫Th2T(yh−yd−ϕ′(yh)ph)2dx≤C∫Th2T((yh−yd)T−ϕ′(yh)ph)2dx+C∫Th2T((yh−yd)−(yh−yd)T)2dx≤C‖ph−p‖21,T+C‖y−yh‖20,T+C∫Th2T((yh−yd)−(yh−yd)T)2dx. | (2.25) |
Combining (2.24) and (2.25), we easily obtain
η23,Th(yh,ph,Th)=∑T∈Th∫Th2T(yh−yd−ϕ′(yh)ph)2dx+∑∂T∖∂Ω∫∂ThT([∇ph]⋅n)2dx≤C‖ph−p‖21+C‖y−yh‖20+C∑T∈Th∫Th2T((yh−yd)−(yh−yd)T)2dx≤C‖ph−p‖21+C‖y−yh‖21+Cosc2Th(yh−yd,Th). |
It can also be deduced that
η22,Th(uh,yh,Th)=∑T∈Th∫Th2T(f+uh−ϕ(yh))2dx+∑∂T∖∂Ω∫∂ThT([∇yh]⋅n)2dx≤C‖yh−y‖21+C‖u−uh‖20+C∑T∈Th∫Th2T(f−fT)2dx≤C‖yh−y‖21+C‖u−uh‖20+Cosc2Th(f,Th). |
The above results complete the proof of Theorem 2.1.
In this section, we introduce two related algorithms as follows:
Algorithm 3.1. Adaptive finite element algorithm for nonlinear optimal control problems:
(0) Given an initial grid Th0, construct the finite element spaces Uh0ad and Vh0. Select a marking parameter 0<θ≤1 and set k:=0.
(1) Solve the discrete nonlinear optimal control problem (2.8)–(2.10), then obtain approximate solution (uhk,yhk,phk) with respect to Thk.
(2) Compute the local error estimator ηThk(T) for all T∈Thk.
(3) Select a minimal subset Mhk of Thk such that
η2Thk(Mhk)≥θη2Thk(Thk), |
where η2Thk(ω)=η21,Th(ph,ω)+η22,Th(uh,yh,ω)+η23,Th(yh,ph,ω) for all ω⊂Thk.
(4) Refine Mhk by bisecting b≥1 times in passing from Thk to Thk+1; generally, additional elements are refined in the process in order to ensure that Thk+1 is conforming.
(5) Solve the discrete nonlinear optimal control problem (2.8)–(2.10), then obtain approximate solution (uhk+1,yhk+1,phk+1) with respect to Thk+1.
(6) Set k=k+1 and go to step (2).
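Steps (1)–(6) above form a standard solve–estimate–mark–refine loop. The following Python sketch shows its shape under stated assumptions: `solve`, `estimate`, and `refine` are hypothetical placeholders for a real discretization, and only the Dörfler-type marking of step (3), which picks a minimal set of elements carrying a θ-fraction of the total squared indicator, is implemented concretely.

```python
import numpy as np

def dorfler_mark(eta_sq, theta):
    """Step (3): select a minimal subset M with sum(eta_sq[M]) >= theta * sum(eta_sq),
    obtained by taking elements in decreasing order of their indicators."""
    order = np.argsort(eta_sq)[::-1]            # largest indicators first
    cumulative = np.cumsum(eta_sq[order])
    k = int(np.searchsorted(cumulative, theta * eta_sq.sum())) + 1
    return order[:k]

def adaptive_loop(mesh, solve, estimate, refine, theta=0.5, max_iter=10, tol=1e-6):
    """Skeleton of Algorithm 3.1; solve/estimate/refine are assumed interfaces."""
    for _ in range(max_iter):
        u, y, p = solve(mesh)                   # step (1): discrete optimality system
        eta_sq = estimate(mesh, u, y, p)        # step (2): per-element eta^2 values
        if eta_sq.sum() < tol:                  # stop once the estimator is small
            break
        marked = dorfler_mark(eta_sq, theta)    # step (3): Dorfler marking
        mesh = refine(mesh, marked)             # step (4): bisection + conformity closure
    return mesh
```

The minimality of the marked set is exactly what makes the quasi-optimality analysis of Section 5 possible: marking more elements than necessary would still converge, but could over-refine.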
Algorithm 3.2. Given an initial control u0h∈Uhad, seek (ykh,pkh,ukh) such that
a(ykh,wh)+(ϕ(ykh),wh)=(f+uk−1h,wh),∀ wh∈Vh,a(qh,pkh)+(ϕ′(ykh)pkh,qh)=(ykh−yd,qh),∀ qh∈Vh,(αukh+pkh,vh−ukh)≥0,∀ vh∈Uhad, |
for k=1,2,⋯, and apparently
ukh=(1/α)(−Phpkh+max(0,ˉpkh)), |
where Ph is the L2-projection from L2(Ω) to Uh and ˉpkh=∫Ωpkh dx/|Ω|.
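The control update of Algorithm 3.2 can be checked on a toy piecewise constant space. In this sketch the projection Ph reduces to elementwise means; the one-dimensional mesh, the value of α, and the co-state values are illustrative assumptions, not data from the paper.

```python
import numpy as np

# Illustrative data: a 1-D partition of Omega with |Omega| = 1.
alpha = 2.0
cell_sizes = np.array([0.25, 0.25, 0.5])     # |T| per element
p_cell = np.array([-1.0, 0.5, 0.25])         # elementwise means of p_h^k, i.e., P_h p_h^k

# p_bar = (1/|Omega|) * integral of p over Omega
p_bar = np.sum(cell_sizes * p_cell) / cell_sizes.sum()

# u_h^k = (1/alpha) * (-P_h p_h^k + max(0, p_bar)), applied elementwise
u_cell = (-p_cell + max(0.0, p_bar)) / alpha

# The update keeps u_h^k admissible: integral of u equals
# (max(0, p_bar) - p_bar) * |Omega| / alpha >= 0, so u_h^k is in U_ad^h.
integral_u = np.sum(cell_sizes * u_cell)
assert integral_u >= -1e-12
```

The final assertion illustrates why the max(0,ˉpkh) shift appears in the formula: it is precisely what enforces the integral constraint ∫Ωu dx≥0 defining Uhad.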
In this section we first consider the following nonlinear elliptic equation:
{−Δy+ϕ(y)=f,in Ω,y=0,on ∂Ω, | (4.1) |
where f∈L2(Ω). We introduce the quantity J(h) in view of the idea in [30] as follows:
J(h):=supf∈L2(Ω), ‖f‖0,Ω=1infvh∈Vh‖Sf−vh‖1,
where S is the solution operator for nonlinear elliptic equations. Obviously, J(h)≪1 for h∈(0,h0) if h0≪1. The following lemma concerning this quantity holds; see [10,30].
Lemma 4.1. For each f∈L2(Ω), there exists a constant C such that
‖Sf−Shf‖1≤CJ(h)‖f‖0, | (4.2) |
and
‖Sf−Shf‖0≤CJ(h)‖Sf−Shf‖1, | (4.3) |
where Sh is the discrete solution operator for nonlinear elliptic equations.
Here, Lemma 4.1 is a preparation for Lemma 4.4.
According to [19], the local perturbation property plays an important role in the proof of convergence, since it allows us to combine the elementwise error estimates.
Lemma 4.2. For Th∈T, T∈Th, let uh1,uh2∈Uhad, yh1,yh2,ph1,ph2∈Vh, we have
η1,Th(ph1,T)−η1,Th(ph2,T)≤ChT‖ph1−ph2‖1,T, | (4.4) |
η2,Th(uh1,yh1,T)−η2,Th(uh2,yh2,T)≤C(hT‖uh1−uh2‖0,T+‖yh1−yh2‖1,T), | (4.5) |
η3,Th(yh1,ph1,T)−η3,Th(yh2,ph2,T)≤C(hT‖yh1−yh2‖0,T+‖ph1−ph2‖1,T), | (4.6) |
oscTh(yh1−yd,T)−oscTh(yh2−yd,T)≤Ch2T‖yh1−yh2‖1,T. | (4.7) |
Proof. Step 1. According to [16,21] we have
‖v‖0,∂T∖∂Ω≤C(h−1/2T‖v‖0,T+h1/2T‖v‖1,T). | (4.8) |
By adopting the inverse estimates and (4.8) we obtain
‖[∇(yh1−yh2)]⋅n‖0,∂T∖∂Ω≤Ch−1/2T‖yh1−yh2‖1,ωT, | (4.9) |
‖[∇(ph1−ph2)]⋅n‖0,∂T∖∂Ω≤Ch−1/2T‖ph1−ph2‖1,ωT, | (4.10) |
where ωT denotes the patch of elements that share an edge with T. By the definition of η1,Th(ph,T) and (4.10) we can deduce that
η1,Th(ph1,T)≤η1,Th(ph2,T)+ChT‖[∇(ph1−ph2)]⋅n‖0,∂T∖∂Ω. |
Then we have
η1,Th(ph1,T)−η1,Th(ph2,T)≤ChT‖[∇(ph1−ph2)]⋅n‖0,∂T∖∂Ω≤ChT‖ph1−ph2‖1,T, |
where the proof of (4.4) is finished.
Step 2. By the definition of \eta_{2, \mathcal{T}_h}(u_h, y_h, T) and (4.9), we can deduce that
\begin{align*} &\eta_{2, \mathcal{T}_h}(u_{h_1}, y_{h_1}, T)-\eta_{2, \mathcal{T}_h}(u_{h_2}, y_{h_2}, T)\\ \leq&h_T\|u_{h_1}-u_{h_2}\|_{0, T}+h_T^{1/2}\|[\nabla (y_{h_1}-y_{h_2})]\cdot\mathbf{n}\|_{0, \partial T\backslash\partial\Omega}+h_T\|\phi(y_{h_1})-\phi(y_{h_2})\|_{0, T}\\ \leq&h_T\|u_{h_1}-u_{h_2}\|_{0, T}+C\|y_{h_1}-y_{h_2}\|_{1, \omega_{T}}+Ch_T\|\tilde{\phi}'(y_{h_1})\|_{0, T}\|y_{h_1}-y_{h_2}\|_{0, T}\\ \leq&C(h_T\|u_{h_1}-u_{h_2}\|_{0, T}+\|y_{h_1}-y_{h_2}\|_{1, T}), \end{align*} |
where \phi(\cdot)\in W^{2, \infty}(\Omega) has been used and the proof of (4.5) is finished.
Step 3. By the definition of \eta_{3, \mathcal{T}_h}(y_h, p_h, T) and (4.10) we can derive that
\begin{align*} &\eta_{3, \mathcal{T}_h}(y_{h_1}, p_{h_1}, T)-\eta_{3, \mathcal{T}_h}(y_{h_2}, p_{h_2}, T)\\ \leq&h_T\|y_{h_1}-y_{h_2}\|_{0, T}+h_T^{1/2}\|[\nabla (p_{h_1}-p_{h_2})]\cdot\mathbf{n}\|_{0, \partial T\backslash\partial\Omega}+h_T\|\phi'(y_{h_1})p_{h_1}-\phi'(y_{h_2})p_{h_2}\|_{0, T}\\ \leq&h_T\|y_{h_1}-y_{h_2}\|_{0, T}+C\|p_{h_1}-p_{h_2}\|_{1, \omega_{T}}+h_T\|\phi'(y_{h_1})\|_{0, T}\|p_{h_1}-p_{h_2}\|_{0, T}\\ &+h_T\|\tilde{\phi}''(y_{h_1})\|_{0, T}\|y_{h_1}-y_{h_2}\|_{0, T}\|p_{h_2}\|_{0, T}\\ \leq&h_T\|y_{h_1}-y_{h_2}\|_{0, T}+C\|p_{h_1}-p_{h_2}\|_{1, \omega_{T}}+Ch_T\|p_{h_1}-p_{h_2}\|_{0, T} \\\leq&C( h_T\|y_{h_1}-y_{h_2}\|_{0, T}+\|p_{h_1}-p_{h_2}\|_{1, T}), \end{align*} |
where \phi(\cdot)\in W^{2, \infty}(\Omega) has been used and the proof of (4.6) is finished. \\Step 4. Similarly, we have
\begin{align*} \ osc_{\mathcal{T}_h}(y_{h_1}-y_d, T)-osc_{\mathcal{T}_h}(y_{h_2}-y_d, T)\leq&h_T^2\|y_{h_1}-y_{h_2}\|_{0, T}. \end{align*} |
In brief, Lemma 4.2 is proved.
The authors in [25] demonstrate an error reduction provided the current errors are larger than the desired errors; that is to say, the errors may not be reduced during coarse-grid refinement until a node of the refined grid is introduced inside each marked element, while Dörfler proves a similar result under an analogous assumption [9].
Lemma 4.3. Let \mathcal{T}_h\subset\tilde{\mathcal{T}_h} for \mathcal{T}_h, \tilde{\mathcal{T}_h}\in\mathbb{T} , and let \mathcal{M}_h\subset\mathcal{T}_h denote the set of elements which are marked from \mathcal{T}_h to \tilde{\mathcal{T}}_h . Then for u_h\in U_{ad}^h, \ \tilde{u}_h \in U_{ad}^{\tilde{h}}, \ y_h, p_h\in V_{h}, \ \tilde{y}_h, \tilde{p}_h\in V_{\tilde{h}} and any \delta, \delta_{1}\in(0, 1] , we have
\begin{align} &\quad\ \eta_{1, \tilde{\mathcal{T}}_h}^2(\tilde{p}_h, \tilde{\mathcal{T}_h})-(1+\delta_{1})\bigg\{\eta_{1, \mathcal{T}_h}^2(p_h, \mathcal{T}_h)-(1-2^{-1/2})\eta_{1, \mathcal{T}_h}^2(p_h, \mathcal{R}_h)\bigg\}\\ &\leq C\left(1+\delta_{1}^{-1}\right)h_{0}^2\|p_h-\tilde{p}_h\|_{1}^2, \end{align} | (4.11) |
and
\begin{align} &\quad\ \eta_{2, \tilde{\mathcal{T}}_h}^2(\tilde{u}_h, \tilde{y}_h, \tilde{\mathcal{T}}_h)-(1+\delta)\bigg\{\eta_{2, \mathcal{T}}^2(u_h, y_h, \mathcal{T}_h) -\lambda\eta_{2, \mathcal{T}_h}^2(u_h, y_h, \mathcal{M}_h)\bigg\}\\ &\leq C(1+\delta^{-1})\left(h_{0}^2\|u_h-\tilde{u}_h\|_{0}^2+\|y_h-\tilde{y}_h\|_{1}^2\right), \end{align} | (4.12) |
and
\begin{align} &\quad\ \eta_{3, \tilde{\mathcal{T}}_h}^2(\tilde{y}_h, \tilde{p}_h, \tilde{\mathcal{T}}_h)-(1+\delta)\bigg\{\eta_{3, \mathcal{T}_h}^2(y_h, p_h, \mathcal{T}_h) -\lambda\eta_{3, \mathcal{T}_h}^2(u_h, y_h, \mathcal{M}_h)\bigg\}\\ &\leq C(1+\delta^{-1})\bigg(h_{0}^2\|y_h-\tilde{y}_h\|_{0}^2+\|p_h-\tilde{p}_h\|_{1}^2\bigg), \end{align} | (4.13) |
and
\begin{align} &\quad\ osc_{\mathcal{T}_h}^2(y_h-y_{d}, \mathcal{T}_h\cap\tilde{\mathcal{T}}_h)-2osc_{\tilde{\mathcal{T}}_h}^2(\tilde{y}_h-y_{d}, \mathcal{T}_h\cap\tilde{\mathcal{T}}_h)\\ &\leq 2Ch_{0}^4\|y_h-\tilde{y}_h\|_{1}^2, \end{align} | (4.14) |
where \lambda = 1-2^{-\frac{b}{2}}, \ h_{0} = \max\limits_{{T\in\mathcal{T}_{h_{0}}}}h_{T} and \mathcal{R}_h denotes the set of elements which are refined from \mathcal{T}_h to \tilde{\mathcal{T}_h} .
Proof. Step 1. Applying Young's inequality with parameter \delta_{1} and (4.4), we get
\begin{align} \eta_{1, \tilde{\mathcal{T}_h}}^2(\tilde{p}_h, \tilde{\mathcal{T}_h})-\eta_{1, \tilde{\mathcal{T}_h}}^2(p_h, \tilde{\mathcal{T}_h})\leq&Ch_0^2\|p_h-\tilde{p}_h\|_1^2+ 2\eta_{1, \tilde{\mathcal{T}_h}}(\tilde{p}_h, \tilde{\mathcal{T}_h})\cdot\eta_{1, \tilde{\mathcal{T}_h}}(p_h, \tilde{\mathcal{T}_h})\\ \leq&C(1+\delta_1^{-1})h_0^2\|p_h-\tilde{p}_h\|_1^2+\delta_1\eta_{1, \tilde{\mathcal{T}_h}}^2(p_h, \tilde{\mathcal{T}_h}) . \end{align} | (4.15) |
Note that each element T\in \mathcal{R}_h\subset\mathcal{T}_h will be bisected at least once; then we have
\sum\limits_{T'\subset T}\eta_{1, \tilde{\mathcal{T}_h}}^2(p_h, T')\leq2^{-1/2}\eta_{1, \mathcal{T}_h}^2(p_h, T).
For T\in \mathcal{T}_h\backslash\mathcal{R}_h , we gain
\eta_{1, \tilde{\mathcal{T}_h}}^2(p_h, T) = \eta_{1, \mathcal{T}_h}^2(p_h, T). |
In connection with the above estimates we demonstrate that
\begin{align*} &\eta_{1, \tilde{\mathcal{T}}_h}^2(\tilde{p}_h, \tilde{\mathcal{T}_h})-(1+\delta_{1})\bigg\{\eta_{1, \mathcal{T}_h}^2(p_h, \mathcal{T}_h)-(1-2^{-1/2})\eta_{1, \mathcal{T}_h}(p_h, \mathcal{R}_h)\bigg\}\\ = &\eta_{1, \tilde{\mathcal{T}_h}}^2(p_h, \mathcal{R}_h)+\eta_{1, \tilde{\mathcal{T}_h}}^2(p_h, \mathcal{T}_h\backslash\mathcal{R}_h)-(1+\delta_{1})\bigg\{\eta_{1, \mathcal{T}_h}^2(p_h, \mathcal{T}_h) -(1-2^{-1/2})\eta_{1, \mathcal{T}_h}(p_h, \mathcal{R}_h)\bigg\}\\ \leq&2^{-1/2}\eta_{1, \tilde{\mathcal{T}_h}}^2(p_h, \mathcal{R}_h)+\eta_{1, \tilde{\mathcal{T}_h}}^2(p_h, \mathcal{T}_h\backslash\mathcal{R}_h)-(1+\delta_{1})\bigg\{\eta_{1, \mathcal{T}_h}^2 (p_h, \mathcal{T}_h)-(1-2^{-1/2})\eta_{1, \mathcal{T}_h}(p_h, \mathcal{R}_h)\bigg\}\\ \leq&C\left(1+\delta_{1}^{-1}\right)h_{0}^2\|p_h-\tilde{p}_h\|_{1}^2, \end{align*} |
which illustrates (4.11) has been proved.
Step 2. Employing Young's inequality with parameter \delta and (4.6), we obtain
\begin{align*} &\eta_{3, \tilde{\mathcal{T}_h}}^2(\tilde{y}_h, \tilde{p}_h, \tilde{\mathcal{T}_h})-\eta_{3, \tilde{\mathcal{T}_h}}^2(y_h, p_h, \tilde{\mathcal{T}_h})\\\leq& C(1+\delta^{-1})h_T^2\bigg(\sum\limits_{T\in\mathcal{T}_h}h_{T}\|\tilde{y}_h-y_h\|_{0, T}^2+\|\tilde{p}_h-p_h\|_1^2\bigg) +\delta\eta_{3, \tilde{\mathcal{T}_h}}^2(y_h, p_h, \tilde{\mathcal{T}_h}), \end{align*} |
where the first step is similar to (4.15). Then let \tilde{\mathcal{T}}_{h_{T'}} = \{T\in \tilde{\mathcal{T}_h}:T\subset T'\} where T'\in\mathcal{M}_h is a marked element. For arbitrary p_h\in V_{\mathcal{T}_h}\subset V_{\tilde{\mathcal{T}_h}} , the jump [\nabla p_h] = 0 on the interior sides of \cup\tilde{\mathcal{T}}_{h_{T'}} . Supposing b is the number of bisections, we can deduce that
h_{T} = |T|^{1/2}\leq(2^{-b}|T'|)^{1/2}\leq 2^{-\frac{b}{2}}h_{T'}, |
due to refinement by bisection, then we obtain
\begin{align*} \sum\limits_{T\in\tilde{\mathcal{T}_h}_{T'}}\eta_{3, \tilde{\mathcal{T}_h}}^2(y_h, p_h, T)\leq2^{-\frac{b}{2}}\eta_{3, \mathcal{T}_h}^2(y_h, p_h, T'). \end{align*} |
It is easy to find that
\begin{align*} \eta_{3, \tilde{\mathcal{T}_h}}^2(y_h, p_h, T)\leq\eta_{3, \mathcal{T}_h}^2(y_h, p_h, T), \end{align*} |
for any T\in\mathcal{T}_h\backslash\mathcal{M}_h . In connection with the above estimates we expound that
\begin{align*} \eta_{3, \tilde{\mathcal{T}_h}}^2(y_h, p_h, \tilde{\mathcal{T}_h})& = \eta_{3, \tilde{\mathcal{T}_h}}^2(y_h, p_h, \mathcal{M}_h) +\eta_{3, \tilde{\mathcal{T}_h}}^2(y_h, p_h, \mathcal{T}_h\backslash\mathcal{M}_h)\\ &\leq 2^{-\frac{b}{2}}\eta_{3, {\mathcal{T}_h}}^2(y_h, p_h, \mathcal{M}_h)+\eta_{3, \mathcal{T}_h}^2(y_h, p_h, \mathcal{T}_h\backslash\mathcal{M}_h)\\ & = \eta_{3, {\mathcal{T}_h}}^2(y_h, p_h, \mathcal{T}_h)-(1-2^{-\frac{b}{2}})\eta_{3, {\mathcal{T}_h}}^2(y_h, p_h, \mathcal{M}_h). \end{align*} |
Combining the above estimates yields (4.13); the proof of (4.12) is similar.
Step 3. For arbitrary T\in\mathcal{T}_h\cap\tilde{\mathcal{T}_h} , by using (4.7) and Young's inequality we obtain
\begin{align*} &osc_{\mathcal{T}_h}(y_h-y_d, T) = osc_{\tilde{\mathcal{T}_h}}(y_h-y_d, T), \\ &osc_{\mathcal{T}_h}^2(y_h-y_d, T)-2osc_{\tilde{\mathcal{T}_h}}^2(\tilde{y}_h-y_d, T)\leq 2Ch_T^4\|\tilde{y}_h-y_h\|_{1, T}^2. \end{align*} |
Summing the above inequality over T\in\mathcal{T}_h\cap\tilde{\mathcal{T}_h} gives (4.14). This completes the proof of Lemma 4.3.
A main obstacle in the proof of convergence is the lack of orthogonality, which is vital to the argument. We therefore resort instead to quasi-orthogonality, which is widely adopted in adaptive mixed and nonconforming adaptive finite element methods [19]. The following basic relationships hold for \mathcal{T}_{h_k}, \mathcal{T}_{h_{k+1}}\in\mathbb{T} with \mathcal{T}_{h_k}\subset\mathcal{T}_{h_{k+1}} :
\begin{align} &\|u-u_{h_{k+1}}\|_{0}^2 = \|u-u_{h_k}\|_{0}^2-\|u_{h_k}-u_{h_{k+1}}\|_{0}^2-2(u-u_{h_{k+1}}, u_{h_{k+1}}-u_{h_k}), \end{align} | (4.16) |
\begin{align} &\|y-y_{h_{k+1}}\|_{1}^2 = \|y-y_{h_k}\|_{1}^2-\|y_{h_k}-y_{h_{k+1}}\|_{1}^2-2a(y-y_{h_{k+1}}, y_{h_{k+1}}-y_{h_k}), \end{align} | (4.17) |
\begin{align} &\|p-p_{h_{k+1}}\|_{1}^2 = \|p-p_{h_k}\|_{1}^2-\|p_{h_k}-p_{h_{k+1}}\|_{1}^2-2a(p-p_{h_{k+1}}, p_{h_{k+1}}-p_{h_k}), \end{align} | (4.18) |
where (u, y, p) is the solution of (2.3)–(2.5), and (u_{h_k}, y_{h_k}, p_{h_k}) and (u_{h_{k+1}}, y_{h_{k+1}}, p_{h_{k+1}}) are the solutions of (2.8)–(2.10) with respect to \mathcal{T}_{h_k} and \mathcal{T}_{h_{k+1}} , respectively. Accordingly, we have the following quasi-orthogonality.
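The relationships (4.16)–(4.18) are just the polarization expansion \|a+b\|^2 = \|a\|^2+\|b\|^2+2(a, b) in disguise. The following small numeric check (an illustration with arbitrary vectors, not the paper's data) confirms the identity of (4.16) term by term.

```python
def dot(a, b):
    """Euclidean inner product, standing in for (., .)."""
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

u  = [0.3, -1.2, 0.7]   # plays the role of u
u1 = [1.0, 0.5, -0.4]   # plays the role of u_{h_k}
u2 = [0.2, 0.1, 0.9]    # plays the role of u_{h_{k+1}}

# (4.16): ||u-u2||^2 = ||u-u1||^2 - ||u1-u2||^2 - 2(u-u2, u2-u1)
lhs = dot(sub(u, u2), sub(u, u2))
rhs = (dot(sub(u, u1), sub(u, u1))
       - dot(sub(u1, u2), sub(u1, u2))
       - 2 * dot(sub(u, u2), sub(u2, u1)))
assert abs(lhs - rhs) < 1e-12
```

The same check applies verbatim to (4.17) and (4.18) with the inner product replaced by the bilinear form a(\cdot, \cdot) .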
Lemma 4.4. For \mathcal{T}_{h_k}, \mathcal{T}_{h_{k+1}}\in\mathbb{T} and \mathcal{T}_{h_k}\subset\mathcal{T}_{h_{k+1}} , we have
\begin{align} &\quad\ (1-\delta)\|u-u_{h_{k+1}}\|_{0}^2-\|u-u_{h_k}\|_{0}^2+\|u_{h_k}-u_{h_{k+1}}\|_{0}^2\\ &\leq C\delta^{-1}\left(\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{R}_h)+\mathcal{J}^2(h_0)\big(\eta_{2, \mathcal{T}_{h_k}}^2(u_{h_k}, y_{h_k}, \mathcal{R}_h) +\eta_{3, \mathcal{T}_{h_k}}^2(y_{h_k}, p_{h_k}, \mathcal{R}_h)\big)\right), \end{align} | (4.19) |
and
\begin{align} &\quad\ (1-\delta)\|y-y_{h_{k+1}}\|_{1}^2-\|y-y_{h_k}\|_{1}^2+\|y_{h_k}-y_{h_{k+1}}\|_{1}^2-\delta\|u-u_{h_{k+1}}\|_{0}^2\\ &\leq C\delta^{-1}\left(\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{R}_h)+\mathcal{J}^2(h_0)\big(\eta_{2, \mathcal{T}_{h_k}}^2(u_{h_k}, y_{h_k}, \mathcal{R}_h) +\eta_{3, \mathcal{T}_{h_k}}^2(y_{h_k}, p_{h_k}, \mathcal{R}_h)\big)\right), \end{align} | (4.20) |
and
\begin{align} &\quad\ (1-\delta)\|p-p_{h_{k+1}}\|_{1}^2-\|p-p_{h_k}\|_{1}^2+\|p_{h_k}-p_{h_{k+1}}\|_{1}^2-\delta\|y-y_{h_{k+1}}\|_{1}^2\\ &\leq C\delta^{-1}\left(\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{R}_h)+\mathcal{J}^2(h_0)\big(\eta_{2, \mathcal{T}_{h_k}}^2(u_{h_k}, y_{h_k}, \mathcal{R}_h) +\eta_{3, \mathcal{T}_{h_k}}^2(y_{h_k}, p_{h_k}, \mathcal{R}_h)\big)\right). \end{align} | (4.21) |
Proof. Step 1. It follows from Lemma 4.3 in [19] that for U_{ad}^{h_k}\subset U_{ad}^{h_{k+1}} we have
\begin{align} \alpha\|u_{h_{k+1}}-u_{h_k}\|_0^2\leq&(p_{h_{k+1}}-p_{h_k}, u_{h_k}-u_{h_{k+1}})+(\alpha u_{h_k}+p_{h_k}, u_{h_k}-u_{h_{k+1}})\\ \leq&(S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d})-S_{h_k}^*(S_{h_k}(f+u_{h_k})-y_{d}), u_{h_k}-u_{h_{k+1}})\\ &+C\eta_{1, \mathcal{T}_{h_k}}(p_{h_k}, \mathcal{R}_h)\|u_{h_k}-u_{h_{k+1}}\|_{0}, \end{align} | (4.22) |
where \mathcal{R}_h is the set of elements which are refined from \mathcal{T}_{h_k} to \mathcal{T}_{h_{k+1}} .
For the first term on the right-hand side of (4.22), let \zeta_h\in H_{0}^1(\Omega) be the solution of the following problem, based on Lemma 4.1:
\begin{align*} &a(\zeta_h, q) = (S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d})-S_{h_k}^*(S_{h_k}(f+u_{h_k})-y_{d}), q), \quad\forall \ q\in H_{0}^1(\Omega). \end{align*} |
Hence we can get
\begin{align*} &\|S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d})-S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d})\|_{0}^2\\ = &a(\zeta_h, S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d})-S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d}))\\ = &a(\zeta_h-\zeta_{h_k}, S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d})-S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d}))\\ &+(S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k}), \zeta_{h_k}-\zeta_h)+(S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k}), \zeta_h). \end{align*} |
From the proof of Lemma 3.3 in [19], we infer that
\begin{align*} &a(\zeta_h-\zeta_{h_k}, S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d})-S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d}))\\ \leq&\mathcal{J}(h_0)\|S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d})-S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d})\|_{0} \\ &\cdot\big(\|S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k})\|_{1}+\|S_{h_{k+1}}^*(S_{h_{k}}(f+u_{h_k})-y_{d}) -S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d})\|_{1}\big), \end{align*} |
and
\begin{align*} &(S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k}), \zeta_{h_k}-\zeta_h)\\ \leq&C\mathcal{J}(h_0)\|S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k})\|_{0}\\& \cdot\|S_{h_{k+1}}^*(S_{h_{k}}(f+u_{h_k})-y_{d}) -S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d})\|_{0}, \end{align*} |
and
\begin{align*} &(S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k}), \zeta_h)\\ \leq&\|S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k})\|_{0}\cdot\|S_{h_{k+1}}^*(S_{h_{k}}(f+u_{h_k})-y_{d}) -S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d})\|_{0}. \end{align*} |
Similar to Lemma 4.6 in [4], we derive that
\begin{align*} &\|S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k})\|_{1}\leq C\eta_{2, \mathcal{T}_{h_{k}}}(u_{h_k}, y_{h_k}, \mathcal{R}_h), \\ &\|S_{h_{k+1}}^*(S_{h_{k}}(f+u_{h_k})-y_{d}) -S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d})\|_{1}\leq C\eta_{3, \mathcal{T}_{h_{k}}}(y_{h_k}, p_{h_k}, \mathcal{R}_h). \end{align*} |
Combining the above estimates, we conclude that
\begin{align*} &\quad\ \|S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d}) -S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d})\|_{0}\\ &\leq C\Big(\mathcal{J}(h_0)(\eta_{2, \mathcal{T}_{h_{k}}}(u_{h_k}, y_{h_k}, \mathcal{R}_h)+\eta_{3, \mathcal{T}_{h_{k}}}(y_{h_k}, p_{h_k}, \mathcal{R}_h)) +\|S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k})\|_{0}\Big). \end{align*} |
For the third term on the right-hand side of the above inequality, let \varphi_h\in H_{0}^1(\Omega) be the solution of the following problem:
\begin{align*} &a(q, \varphi_h) = (S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k}), q), \quad\forall\ q\in H_{0}^1(\Omega). \end{align*} |
According to the standard duality theory, we can deduce that
\begin{align*} &\|S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k})\|_{0}^2\\ = &a(S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k}), \varphi_h-\varphi_{h_{k}})\\ \leq&C\mathcal{J}(h_0)\|S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k})\|_{1}\cdot\|S_{h_{k+1}}(f+u_{h_k}) -S_{h_{k}}(f+u_{h_k})\|_{0}, \end{align*} |
where \varphi_{h_k} is the standard finite element approximation of \varphi_h in V_{\mathcal{T}_{h_k}} . So we have
\begin{align*} &\|S_{h_{k+1}}^*(S_{h_{k+1}}(f+u_{h_k})-y_{d}) -S_{h_{k}}^*(S_{h_{k}}(f+u_{h_k})-y_{d})\|_{0}\\ \leq& C\Big(\mathcal{J}(h_0)(\eta_{2, \mathcal{T}_{h_{k}}}(u_{h_k}, y_{h_k}, \mathcal{R}_h)+\eta_{3, \mathcal{T}_{h_{k}}}(y_{h_k}, p_{h_k}, \mathcal{R}_h))\Big). \end{align*} |
From the above, we get
\begin{align*} (p_{h_{k+1}}-p_{h_k}, u_{h_k}-u_{h_{k+1}})\leq& C\Big(\mathcal{J}(h_0)(\eta_{2, \mathcal{T}_{h_{k}}}(u_{h_k}, y_{h_k}, \mathcal{R}_h)\\ &+\eta_{3, \mathcal{T}_{h_{k}}}(y_{h_k}, p_{h_k}, \mathcal{R}_h))\|u_{h_k}-u_{h_{k+1}}\|_{0}\Big). \end{align*} |
Combining (4.22) and the above inequality, we deduce that
\begin{equation} \|u_{h_k}-u_{h_{k+1}}\|_{0}\leq C\Big(\eta_{1, \mathcal{T}_{h_k}}(p_{h_k}, \mathcal{R}_h)+\mathcal{J}(h_0)(\eta_{2, \mathcal{T}_{h_k}}(u_{h_k}, y_{h_k}, \mathcal{R}_h)+\eta_{3, \mathcal{T}_{h_k}}(y_{h_k}, p_{h_k}, \mathcal{R}_h))\Big). \end{equation} | (4.23) |
It is easy to derive the desired result (4.19) with the help of (4.16) and (4.23).
Step 2. We now prove (4.20); the proof of (4.21) is similar. Obviously, we have
\begin{align} \|y_{h_{k+1}}-y_{h_k}\|_{0}& = \|S_{h_{k+1}}(f+u_{h_{k+1}})-S_{h_{k}}(f+u_{h_k})\|_{0}\\ &\leq\|S_{h_{k+1}}(f+u_{h_{k+1}})-S_{h_{k+1}}(f+u_{h_k})\|_{0}+\|S_{h_{k+1}}(f+u_{h_k})-S_{h_{k}}(f+u_{h_k})\|_{0}\\ &\leq C(\|u_{h_k}-u_{h_{k+1}}\|_{0}+\mathcal{J}(h_0)\eta_{2, \mathcal{T}_{h_k}}(u_{h_k}, y_{h_k}, \mathcal{R}_h)). \end{align} | (4.24) |
By using the Cauchy inequality, we obtain
\begin{align} &2a(y-y_{h_{k+1}}, y_{h_{k+1}}-y_{h_k})\\ = &2(u-u_{h_{k+1}}, y_{h_{k+1}}-y_{h_k})-2(\phi(y)-\phi(y_{h_{k+1}}), y_{h_{k+1}}-y_{h_k})\\ \leq&\delta\|u-u_{h_{k+1}}\|_{0}^2+\frac{1}{\delta}\|y_{h_{k+1}}-y_{h_k}\|_{0}^2+(\tilde{\phi}'(y)(y-y_{h_{k+1}}), y_{h_{k+1}}-y_{h_k})\\ \leq&\delta\|u-u_{h_{k+1}}\|_{0}^2+\frac{1}{\delta}\|y_{h_{k+1}}-y_{h_k}\|_{0}^2+\|\tilde{\phi}'(y)\|_0\|y-y_{h_{k+1}}\|_0\|y_{h_{k+1}}-y_{h_k}\|_0\\ \leq&\delta\|u-u_{h_{k+1}}\|_{0}^2+\frac{1}{\delta}\|y_{h_{k+1}}-y_{h_k}\|_{0}^2+C\left(\delta\|y-y_{h_{k+1}}\|_0^2+\frac{1}{\delta}\|y_{h_{k+1}}-y_{h_k}\|_0^2\right). \end{align} | (4.25) |
It is easy to derive the desired result (4.20) with the assistance of (4.17) and (4.23)–(4.25).
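The weighted Young inequality 2ab\leq\delta a^2+\delta^{-1}b^2 used repeatedly in Steps 1 and 2 can be sanity-checked numerically; the values below are hypothetical and only illustrate that the margin is always nonnegative.

```python
# Weighted Young inequality: 2ab <= delta*a^2 + (1/delta)*b^2 for delta > 0.
delta = 0.25
pairs = [(1.0, 2.0), (0.3, -0.7), (5.0, 0.1)]

# margin = RHS - LHS; nonnegativity is exactly the inequality
margins = [delta * a * a + b * b / delta - 2 * a * b for a, b in pairs]
assert all(m >= 0.0 for m in margins)
```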
For \mathcal{T}_{h_k}\in \mathbb{T} , let U_{ad}^{h_{k}} and V_{h_{k}} denote the admissible set and the finite element space with respect to \mathcal{T}_{h_{k}} , and let (u_{h_{k}}, y_{h_{k}}, p_{h_{k}}) be the solution of (2.8)–(2.10) on \mathcal{T}_{h_{k}} . Before proving the convergence of Algorithm 3.1, we define the following notations:
\begin{align*} &e_{h_k}^2 = \|u-u_{h_k}\|_{0}^2+\|y-y_{h_k}\|_{1}^2+\|p-p_{h_k}\|_{1}^2, \\ &E_{h_k}^2 = \|u_{h_k}-u_{h_{k+1}}\|_{0}^2+\|y_{h_k}-y_{h_{k+1}}\|_{1}^2+\|p_{h_k}-p_{h_{k+1}}\|_{1}^2, \\ &\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\omega) = \eta_{2, \mathcal{T}_{h_k}}^2(u_{h_k}, y_{h_k}, \omega)+\eta_{3, \mathcal{T}_{h_k}}^2(y_{h_k}, p_{h_k}, \omega), \end{align*} |
for \omega\subset\Omega .
Theorem 4.1. Let (\mathcal{T}_{h_k}, U_{ad}^{h_k}, V_{h_k}, u_{h_k}, y_{h_k}, p_{h_k}) be the sequence of grids, finite element spaces and discrete solutions produced by Algorithm 3.1. Then there exist constants \gamma_{1} > 0, \ \gamma_{2} > 0 and \alpha\in (0, 1) , only depending on the shape regularity of the initial grid \mathcal{T}_{h_0}, \ b , and the marking parameter \theta\in(0, 1] , such that
\begin{equation} e_{h_{k+1}}^2+\gamma_{1}\eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})+\gamma_{2}\tilde{\eta}_{\mathcal{T}_{h_{k+1}}}^2(\mathcal{T}_{h_{k+1}}) \leq\alpha\left(e_{h_k}^2+\gamma_{1}\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})+\gamma_{2}\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\right), \end{equation} | (4.26) |
provided h_{0}\ll 1 .
Proof. We get the following results from Theorem 2.1, Lemma 4.3 and Lemma 4.4,
\begin{align} e_{h_k}^2&\leq C\eta_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k}), \end{align} | (4.27) |
\begin{align} \tilde{\eta}_{\mathcal{T}_{h_{k+1}}}^2(\mathcal{T}_{h_{k+1}})&\leq(1+\delta)\Big\{\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k}) -\lambda\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{M}_{h_k})\Big\}+C\left(1+\frac{1}{\delta}\right)E_{h_k}^2, \end{align} | (4.28) |
\begin{align} \eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})&\leq(1+\delta_{1})\Big\{\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k}) -(1-2^{-1/2})\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{R}_{h_k})\Big\}\\ &\quad + C\left(1+\frac{1}{\delta_{1}}\right)h_{0}^2\|p_{h_k}-p_{h_{k+1}}\|_{1}^2, \end{align} | (4.29) |
\begin{align} (1-2\delta)e_{h_{k+1}}^2&\leq e_{h_k}^2-E_{h_k}^2+C\frac{1}{\delta}\Big(\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{R}_{h_k})+\mathcal{J}^2(h_0)\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\Big), \end{align} | (4.30) |
where \mathcal{R}_{h_k} is the set of elements which are refined from \mathcal{T}_{h_k} to \mathcal{T}_{h_{k+1}} . Applying the upper bound in Theorem 2.1, (4.29) can be simplified into
\begin{align} \eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})&\leq (1+\delta_{1})\Big\{\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})- (1-2^{-1/2})\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{R}_{h_k})\Big\}\\ &\quad+C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2\Big(\eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})+\tilde{\eta}_{\mathcal{T}_{h_{k+1}}}^2(\mathcal{T}_{h_{k+1}})\\ &\quad+\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})+\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\Big). \end{align} | (4.31) |
Multiplying (4.28) and (4.31) with \tilde{\gamma_{2}} and \tilde{\gamma_{1}} , respectively, and adding the results to (4.30) yields
\begin{align*} &\ \quad(1-2\delta)e_{h_{k+1}}^2+\tilde{\gamma_{1}}\eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})+\tilde{\gamma_{2}}\tilde{\eta}_{\mathcal{T}_{h_{k+1}}}^2(\mathcal{T}_{h_{k+1}})\\ &\leq e_{h_k}^2+\tilde{\gamma_{2}}(1+\delta)\Big\{\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})-\lambda\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{M}_{h_k})\Big\}\\ &\quad +\tilde{\gamma_{2}}C\Big(1+\frac{1}{\delta}\Big)E_{h_k}^2-E_{h_k}^2-\Big(\tilde{\gamma_{1}}(1+\delta_{1})(1-2^{-1/2})-C\frac{1}{\delta}\Big)\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{R}_{h_k})\\ &\quad +\tilde{\gamma_{1}}(1+\delta_{1})\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})+C\frac{1}{\delta}\mathcal{J}^2(h_0)\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\\ &\quad +\tilde{\gamma_{1}}C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2\Big(\eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})+\tilde{\eta}_{\mathcal{T}_{h_{k+1}}}^2(\mathcal{T}_{h_{k+1}})+ \eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})+\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\Big). \end{align*} |
If \tilde{\gamma_{1}} is such that
\tilde{\gamma_{1}}(1+\delta_{1})(1-2^{-1/2})-C\frac{1}{\delta} > 0, |
and one chooses \tilde{\gamma_{2}} such that
\begin{equation} \tilde{\gamma_{2}}C\Big(1+\frac{1}{\delta}\Big) = 1, \end{equation} | (4.32) |
then we have
\begin{align*} &\quad(1-2\delta)e_{h_{k+1}}^2+\tilde{\gamma_{1}}\Big(1-C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2\Big)\eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})\\ &\quad +\Big(\tilde{\gamma_{2}}-\tilde{\gamma_{1}}C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2\Big)\tilde{\eta}_{\mathcal{T}_{h_{k+1}}}^2(\mathcal{T}_{h_{k+1}})\\ &\leq e_{h_k}^2+\tilde{\gamma_{1}}\Big((1+\delta_{1})+C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2\Big)\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})\\ &\quad +\Big(\tilde{\gamma_{2}}(1+\delta)+\tilde{\gamma_{1}}C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2+C\frac{1}{\delta}\mathcal{J}^2(h_0)\Big)\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\\ &\quad -c\Big(\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{M}_{h_k})+\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{M}_{h_k})\Big), \end{align*} |
where
c = \min\Big\{\tilde{\gamma_{2}}(1+\delta)\lambda, \tilde{\gamma_{1}}(1+\delta_{1})(1-2^{-1/2})-C\frac{1}{\delta}\Big\}. |
Using the marking strategy in Algorithm 3.1 and the upper bound in Theorem 2.1, we arrive at
\begin{align*} &\quad(1-2\delta)e_{h_{k+1}}^2+\tilde{\gamma_{1}}\Big(1-C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2\Big)\eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})\\ &\quad +\Big(\tilde{\gamma_{2}}-\tilde{\gamma_{1}}C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2\Big)\tilde{\eta}_{\mathcal{T}_{h_{k+1}}}^2(\mathcal{T}_{h_{k+1}})\\ &\leq(1-C\theta\beta)e_{h_k}^2+\Big(\tilde{\gamma_{1}}\Big((1+\delta_{1})+C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2\Big)-c\theta(1-\beta)\Big)\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})\\ &\quad +\Big(\tilde{\gamma_{2}}(1+\delta)+\tilde{\gamma_{1}}C\Big(1+\frac{1}{\delta_{1}}\Big)h_{0}^2+C\frac{1}{\delta}\mathcal{J}^2(h_{0})-c\theta(1-\beta)\Big) \tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k}), \end{align*} |
where \beta\in(0, 1) . Then we deduce that
\begin{align} &\ \quad e_{h_{k+1}}^2+\gamma_{1}\eta_{1, \mathcal{T}_{h_{k+1}}}^2(p_{h_{k+1}}, \mathcal{T}_{h_{k+1}})+\gamma_{2}\tilde{\eta}_{\mathcal{T}_{h_{k+1}}}^2(\mathcal{T}_{h_{k+1}})\\ &\leq\alpha_{1}e_{h_k}^2+\alpha_{2}\gamma_{1}\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})+\alpha_{3}\gamma_{2}\tilde{\eta}_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k}), \end{align} |
where
\begin{align*} &\alpha_{1} = \frac{1-C\theta\beta}{1-2\delta}, \\ &\gamma_{1} = \frac{\tilde{\gamma_{1}}\Big(1-C(1+\delta_{1}^{-1})h_{0}^2\Big)}{1-2\delta}, \\ &\gamma_{2} = \frac{\tilde{\gamma_{2}}-\tilde{\gamma_{1}}C(1+\delta_{1}^{-1})h_{0}^2}{1-2\delta}, \\ &\alpha_{2} = \frac{\tilde{\gamma_{1}}\Big((1+\delta_{1})+C(1+\delta_{1}^{-1})h_{0}^2\Big)-c\theta(1-\beta)}{\tilde{\gamma_{1}}\Big(1-C(1+\delta_{1}^{-1})h_{0}^2\Big)}, \\ &\alpha_{3} = \frac{\tilde{\gamma_{2}}(1+\delta)+\tilde{\gamma_{1}}C(1+\delta_{1}^{-1})h_{0}^2+C\delta^{-1}\mathcal{J}^2(h_0)-c\theta(1-\beta)} {\tilde{\gamma_{2}}-\tilde{\gamma_{1}}C(1+\delta_{1}^{-1})h_{0}^2}. \end{align*} |
As long as \delta < 1 and \beta is small enough, \alpha_{1}\in(0, 1) is guaranteed. To facilitate the verification, we rewrite the above formulas as:
\begin{align*} &\alpha_{2} = \frac{(1-C(1+\delta_{1}^{-1})h_{0}^2)+\delta_{1}+2C(1+\delta_{1}^{-1})h_{0}^2-\frac{c\theta(1-\beta)}{\tilde{\gamma_{1}}}}{1-C(1+\delta_{1}^{-1})h_{0}^2}, \\ &\alpha_{3} = \frac{\tilde{\gamma_{2}}(1+\delta)-\tilde{\gamma_{1}} C(1+\delta_{1}^{-1})h_{0}^2+2\tilde{\gamma_{1}} C(1+\delta_{1}^{-1})h_{0}^2+C\delta^{-1}\mathcal{J}^2(h_{0})-c\theta(1-\beta)}{\tilde{\gamma_{2}}-\tilde{\gamma_{1}}C(1+\delta_{1}^{-1})h_{0}^2}. \end{align*} |
It is clear that
\alpha_{2}\in(0, 1), |
if h_{0}\ll 1 and \delta_{1} is sufficiently small. Then, by (4.32), we deduce that
\gamma_{2} = \frac{\delta^2}{C(1+\delta)}, |
which implies
\alpha_{3}\in(0, 1), |
if h_{0}\ll 1 and \delta is sufficiently small. Therefore, choosing \alpha = \max\{\alpha_{1}, \alpha_{2}, \alpha_{3}\} , we derive the expected result.
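The contraction (4.26) guarantees that the combined quantity Q_k = e_{h_k}^2+\gamma_{1}\eta_{1, \mathcal{T}_{h_k}}^2+\gamma_{2}\tilde{\eta}_{\mathcal{T}_{h_k}}^2 decays at least geometrically. A minimal sketch of the decay this implies, with a hypothetical contraction factor \alpha :

```python
# Geometric decay implied by the contraction Q_{k+1} <= alpha * Q_k,
# with hypothetical alpha in (0, 1) and initial value Q0.
alpha, Q0 = 0.7, 1.0
Q = [Q0]
for _ in range(20):
    Q.append(alpha * Q[-1])   # worst case allowed by (4.26)

# after 20 steps the quantity is below alpha^20 * Q0 (up to rounding)
assert Q[-1] <= alpha ** 20 * Q0 + 1e-15
```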
Theorem 4.2. Let (\mathcal{T}_{h_k}, U_{ad}^{h_k}, V_{h_k}, u_{h_k}, y_{h_k}, p_{h_k}) be the sequence of grids, finite element spaces and discrete solutions produced by Algorithm 3.1, and let the conditions of Theorem 4.1 hold. Then we have
\begin{align*} \|u-u_{h_k}\|_0^2+\|y-y_{h_k}\|^2_1+\|p-p_{h_k}\|_1^2\rightarrow 0 \quad\mathrm{as}\quad k\rightarrow \infty. \end{align*} |
Proof. This follows directly by combining Theorem 2.1 and Theorem 4.1.
In this section we consider the quasi-optimality of the adaptive finite element method. First we explain the notation. For \mathcal{T}_h, \mathcal{T}_{h_1}, \mathcal{T}_{h_2}\in \mathbb{T} , let \#\mathcal{T}_h be the number of elements in \mathcal{T}_h , and let \mathcal{T}_{h_1}\oplus\mathcal{T}_{h_2} be the smallest common conforming refinement of \mathcal{T}_{h_1} and \mathcal{T}_{h_2} , which satisfies [4,27]
\begin{equation} \#(\mathcal{T}_{h_1}\oplus\mathcal{T}_{h_2})\leq \#\mathcal{T}_{h_1}+\#\mathcal{T}_{h_2}-\#\mathcal{T}_{h_0}. \end{equation} | (5.1) |
Following [19], we define a function approximation class
\begin{align*} \mathcal{A}^s: = &\{(u, y, p, y_{d}, f)\in L^2(\Omega)\times H_{0}^1(\Omega)\times H_{0}^1(\Omega)\times L^2(\Omega)\\ &\times L^2(\Omega):|(u, y, p, y_{d}, f)|_{s} < +\infty\}, \end{align*} |
where
\begin{align*} |(u, y, p, y_{d}, f)|_{s}: = &\sup\limits_{N > 0}N^s\inf\limits_{\mathcal{T}_h\in\mathbb{T}_{N}}\inf\limits_{(u_{h}, y_{h}, p_{h})\in U_{ad}^{h}\times V_{h}\times V_{h}}\{\|u-u_{h}\|_{0}^2\\ &+\|y-y_{h}\|_{1}^2+\|p-p_{h}\|_{1}^2+osc_{\mathcal{T}_h}^2(f, \mathcal{T}_h)+osc_{\mathcal{T}_h}^2(y_{h}-y_{d}, \mathcal{T}_h)\}^{\frac{1}{2}}, \end{align*} |
and
\mathbb{T}_{N}: = \{\mathcal{T}_h\in\mathbb{T}:\#\mathcal{T}_h-\#\mathcal{T}_{h_0}\leq N\}. |
To illustrate the quasi-optimality of the adaptive finite element method, we need a local upper bound on the distance between nested solutions [4], since the error of this method can only be estimated by using the indicators of refined elements without a buffer layer.
Lemma 5.1. For \mathcal{T}_h, \tilde{\mathcal{T}}_h\in \mathbb{T} and \mathcal{T}_h\subset\tilde{\mathcal{T}}_h , let \mathcal{R}_h bethe set of refined elements from \mathcal{T}_h to \tilde{\mathcal{T}}_h . Let (u_{h}, y_{h}, p_{h}) and (\tilde{u}, \tilde{y}, \tilde{p}) be the solutions of (2.8)–(2.10) with respect to \mathcal{T}_h and \tilde{\mathcal{T}}_h , respectively. Then there exists a constant C , depending on the shape regularity of initial grids \mathcal{T}_{h_{0}} and b such that
\begin{equation} \|u_{h}-\tilde{u}\|_{0}^2+\|y_{h}-\tilde{y}\|_{1}^2+\|p_{h}-\tilde{p}\|_{1}^2\leq C\eta_{\mathcal{T}_h}^2(\mathcal{R}_h), \end{equation} | (5.2) |
where
\eta_{\mathcal{T}_h}^2(\mathcal{R}_h) = \eta_{1, \mathcal{T}_h}^2(p_{h}, \mathcal{R}_h)+\eta_{2, \mathcal{T}_h}^2(u_{h}, y_{h}, \mathcal{R}_h)+\eta_{3, \mathcal{T}_h}^2(y_{h}, p_{h}, \mathcal{R}_h). |
Proof. From (4.23) of Lemma 4.4, we have
\begin{equation} \|u_{h}-\tilde{u}\|_{0}^2\leq\eta_{\mathcal{T}_h}^2(\mathcal{R}_h). \end{equation} | (5.3) |
By Lemma 4.6 in [4], we deduce that
\begin{align} \|y_{h}-\tilde{y}\|_{1}& = \|S_{h}(f+u_{h})-S_{\tilde{h}}(f+\tilde{u})\|_{1}\\ &\leq\|S_{h}(f+u_{h})-S_{\tilde{h}}(f+u_{h})\|_{1}+\|S_{\tilde{h}}(f+u_{h})-S_{\tilde{h}}(f+\tilde{u})\|_{1}\\ &\leq C(\eta_{2, \mathcal{T}_h}(u_{h}, y_{h}, \mathcal{R}_h)+\|u_{h}-\tilde{u}\|_{0}). \end{align} | (5.4) |
The corresponding result for the adjoint state is obtained by the same argument:
\begin{align} \|p_{h}-\tilde{p}\|_{1}& = \|S_{h}^*(S_{h}(f+u_{h})-y_{d})-S_{\tilde{h}}^*(S_{\tilde{h}}(f+\tilde{u})-y_{d})\|_{1}\\ \ &\leq C(\eta_{3, \mathcal{T}_h}(y_{h}, p_{h}, \mathcal{R}_h)+\|y_{h}-\tilde{y}\|_{1}). \end{align} | (5.5) |
In connection with (5.3), (5.4) and (5.5) we can derive (5.2).
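The combined indicator \eta_{\mathcal{T}_h}^2(\mathcal{R}_h) on the right of (5.2) is simply the elementwise sum of the three squared contributions over the refined set. A minimal sketch with hypothetical per-element values:

```python
# Hypothetical squared indicator contributions per refined element.
eta1_sq = {"T1": 0.20, "T2": 0.10}   # eta_{1,T_h}^2(p_h, .)
eta2_sq = {"T1": 0.05, "T2": 0.15}   # eta_{2,T_h}^2(u_h, y_h, .)
eta3_sq = {"T1": 0.10, "T2": 0.20}   # eta_{3,T_h}^2(y_h, p_h, .)
refined = ["T1", "T2"]               # the set R_h

# eta^2(R_h) = eta_1^2 + eta_2^2 + eta_3^2, accumulated elementwise
total = sum(eta1_sq[t] + eta2_sq[t] + eta3_sq[t] for t in refined)
assert abs(total - 0.8) < 1e-12
```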
Dörfler introduced a crucial marking strategy and proved strict energy error reduction for the Laplacian, provided the initial grid \mathcal{T}_{h_0} satisfies a mild assumption [9]. If the sum of the errors satisfies a suitable error reduction, the error indicators on the coarse grid must satisfy a Dörfler property with respect to the refined one [19].
Lemma 5.2. Assume that the marking parameter \theta\in(0, \theta^*) , where
\theta^* = \frac{C}{2C(1+h_{0}^4)+1}. |
For \mathcal{T}_h, \tilde{\mathcal{T}}_h\in\mathbb{T} and \mathcal{T}_h\subset\tilde{\mathcal{T}}_h , let (u_{h}, y_{h}, p_{h}) and (\tilde{u}, \tilde{y}, \tilde{p}) be thesolutions of (2.8)–(2.10) with respect to \mathcal{T}_h and \tilde{\mathcal{T}}_h , respectively. If
\begin{equation} e_{\tilde{\mathcal{T}}_h}^2+osc_{\tilde{\mathcal{T}}_h}^2(\tilde{\mathcal{T}}_h)\leq \mu[e_{\mathcal{T}_h}^2+osc_{\mathcal{T}_h}^2(\mathcal{T}_h)], \end{equation} | (5.6) |
is satisfied for \mu: = \frac{1}{2}\Big(1-\frac{\theta}{\theta^*}\Big) . Then, the set \mathcal{R}_h of elements which are refined from \mathcal{T}_h to \tilde{\mathcal{T}}_h satisfies the Dörfler property
\eta_{\mathcal{T}_h}^2(\mathcal{R}_h)\geq\theta\eta_{\mathcal{T}_h}^2(\mathcal{T}_h), |
where
\begin{align*} &e_{\mathcal{T}_h}^2 = \|u-u_{h}\|_{0}^2+\|y-y_{h}\|_{1}^2+\|p-p_{h}\|_{1}^2, \\ &osc_{\mathcal{T}_h}^2(\omega) = osc_{\mathcal{T}_h}^2(f, \omega)+osc_{\mathcal{T}_h}^2(y_{h}-y_{d}, \omega), \end{align*} |
for \omega\subset\mathcal{T}_h and e_{\tilde{\mathcal{T}}_h}^2 , osc_{\tilde{\mathcal{T}}_h}^2(\tilde{\mathcal{T}}_h) similarly defined.
Proof. By the lower bound in Theorem 2.1 and (5.6), we obtain
\begin{align} (1-2\mu)C\eta_{\mathcal{T}_h}^2(\mathcal{T}_h)&\leq(1-2\mu)(e_{\mathcal{T}_h}^2+osc_{\mathcal{T}_h}^2(\mathcal{T}_h))\\ &\leq e_{\mathcal{T}_h}^2-2e_{\tilde{\mathcal{T}}_h}^2+osc_{\mathcal{T}_h}^2(\mathcal{T}_h)-2osc_{\tilde{\mathcal{T}}_h}^2(\tilde{\mathcal{T}}_h). \end{align} |
It is well-known that the following fundamental relationships hold:
\begin{align} &\|u-u_{h}\|_{0}^2\leq2\|u-\tilde{u}\|_{0}^2+2\|u_{h}-\tilde{u}\|_{0}^2, \\ &\|y-y_{h}\|_{1}^2\leq2\|y-\tilde{y}\|_{1}^2+2\|y_{h}-\tilde{y}\|_{1}^2, \\ &\|p-p_{h}\|_{1}^2\leq2\|p-\tilde{p}\|_{1}^2+2\|p_{h}-\tilde{p}\|_{1}^2. \end{align} |
Hence we can get the following result from Lemma 5.1:
\begin{equation} e_{\mathcal{T}_h}^2-2e_{\tilde{\mathcal{T}}_h}^2\leq2C\eta_{\mathcal{T}_h}^2(\mathcal{R}_h). \end{equation} | (5.7) |
For T\in \mathcal{T}_h\cap\tilde{\mathcal{T}}_h , we can get the following result from (4.14) of Lemma 4.3
\begin{equation} osc_{\mathcal{T}_h}^2(y_{h}-y_{d}, \mathcal{T}_h\cap\tilde{\mathcal{T}}_h)-2osc_{\tilde{\mathcal{T}}_h}^2(\tilde{y}-y_{d}, \mathcal{T}_h\cap\tilde{\mathcal{T}}_h)\leq 2Ch_{0}^4\eta_{\mathcal{T}_h}^2(\mathcal{R}_h). \end{equation} | (5.8) |
According to Remark 2.1 in [4], since the indicator \eta_{\mathcal{T}_h}(T) dominates the oscillation osc_{\mathcal{T}_h}(T) , we have
\begin{equation} osc_{\mathcal{T}_h}^2(T)\leq\eta_{\mathcal{T}_h}^2(T), \end{equation} | (5.9) |
for all T\in\mathcal{R}_h . Then, combining (5.7), (5.8) and (5.9), one obtains
\begin{align*} (1-2\mu)C\eta_{\mathcal{T}_h}^2(\mathcal{T}_h)&\leq(2C(1+h_{0}^4)+1)\eta_{\mathcal{T}_h}^2(\mathcal{R}_h), \\ (1-2\mu)\theta^*\eta_{\mathcal{T}_h}^2(\mathcal{T}_h)&\leq\eta_{\mathcal{T}_h}^2(\mathcal{R}_h), \\ \theta\eta_{\mathcal{T}_h}^2(\mathcal{T}_h)&\leq\eta_{\mathcal{T}_h}^2(\mathcal{R}_h), \end{align*} |
where
\begin{align*} &\theta^* = \frac{C}{2C(1+h_{0}^4)+1}, \quad\mathrm{and}\quad\theta = (1-2\mu)\theta^*. \end{align*} |
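The Dörfler (bulk) property \eta_{\mathcal{T}_h}^2(\mathcal{R}_h)\geq\theta\eta_{\mathcal{T}_h}^2(\mathcal{T}_h) is realized in practice by marking a (near-)minimal set of elements whose squared indicators capture the fraction \theta of the total. A hedged sketch of such a marking routine (the element values are hypothetical; the paper's Algorithm 3.1 specifies the actual strategy):

```python
def dorfler_mark(eta_sq, theta):
    """Greedy Doerfler marking: pick elements by decreasing squared indicator
    until the marked set M satisfies eta^2(M) >= theta * eta^2(T_h)."""
    total = sum(eta_sq.values())
    marked, acc = set(), 0.0
    for elem, val in sorted(eta_sq.items(), key=lambda kv: -kv[1]):
        if acc >= theta * total:
            break
        marked.add(elem)
        acc += val
    return marked

# hypothetical squared indicators on a 4-element mesh
eta_sq = {"T1": 0.5, "T2": 0.3, "T3": 0.15, "T4": 0.05}
M = dorfler_mark(eta_sq, theta=0.6)

# the bulk criterion holds for the marked set
assert sum(eta_sq[t] for t in M) >= 0.6 * sum(eta_sq.values())
```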
Lemma 5.3. Let (u, y, p) and (\mathcal{T}_{h_k}, U_{ad}^{h_k}, V_{h_k}, u_{h_k}, y_{h_k}, p_{h_k}) be the solution of (2.3)–(2.5) and the sequence of grids, finite element spaces and discrete solutions produced by Algorithm 3.1, respectively. Assume that the marking parameter \theta satisfies the condition in Lemma 5.2; then the following estimate is valid
\begin{equation} \#\mathcal{M}_{h_k}\leq C\Big( N^{\frac{1}{2s}}|(u, y, p, y_{d}, f)|_{s}^{\frac{1}{s}}\mu^{-\frac{1}{2s}}(e_{h_k}^2+osc_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k}))^{-\frac{1}{2s}}\Big), \end{equation} | (5.10) |
if (u, y, p, y_{d}, f)\in \mathcal{A}^s .
Proof. Let \epsilon^2: = \mu N^{-1}(e_{h_k}^2+osc_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})) , where N shall be produced in the proof of (5.13) and \mu is defined in Lemma 5.2. Since (u, y, p, y_{d}, f)\in \mathcal{A}^s , there exist a \mathcal{T}_{h_\epsilon}\in\mathbb{T} and a (u_{h_\epsilon}, y_{h_\epsilon}, p_{h_\epsilon})\in U_{ad}^{h_{\epsilon}}\times V_{h_{\epsilon}}\times V_{h_{\epsilon}} such that
\begin{equation} \#\mathcal{T}_{h_\epsilon}-\#\mathcal{T}_{h_0}\leq|(u, y, p, y_{d}, f)|_{s}^{1/s}\epsilon^{-1/s}, \end{equation} | (5.11) |
and
\begin{equation} \|u-u_{h_\epsilon}\|_{0}^2+\|y-y_{h_\epsilon}\|_{1}^2+\|p-p_{h_\epsilon}\|_{1}^2+osc_{\mathcal{T}_{h_\epsilon}}^2(f, \mathcal{T}_{h_\epsilon}) +osc_{\mathcal{T}_{h_\epsilon}}^2(y_{h_\epsilon}-y_{d}, \mathcal{T}_{h_\epsilon})\leq\epsilon^2. \end{equation} | (5.12) |
Let (u_{h_*}, y_{h_*}, p_{h_*}) be the solution of (2.8)–(2.10) with respect to \mathcal{T}_{h_*} , where \mathcal{T}_{h_*} = \mathcal{T}_{h_\epsilon}\oplus\mathcal{T}_{h_k} is the smallest common refinement of \mathcal{T}_{h_\epsilon} and \mathcal{T}_{h_k} . We first prove the following inequality:
\begin{equation} e_{h_*}^2+osc_{\mathcal{T}_{h_*}}^2(\mathcal{T}_{h_*})\leq N\Big(e_{\mathcal{T}_{h_\epsilon}}^2+osc_{\mathcal{T}_{h_\epsilon}}^2(\mathcal{T}_{h_\epsilon})\Big), \end{equation} | (5.13) |
where
\begin{align*} &e_{\mathcal{T}_{h_\epsilon}}^2: = \|u-u_{h_\epsilon}\|_{0}^2+\|y-y_{h_\epsilon}\|_{1}^2+\|p-p_{h_\epsilon}\|_{1}^2, \\ &osc_{\mathcal{T}_{h_\epsilon}}^2(\mathcal{T}_{h_\epsilon}): = osc_{\mathcal{T}_{h_\epsilon}}^2(f, \mathcal{T}_{h_\epsilon}) +osc_{\mathcal{T}_{h_\epsilon}}^2(y_{h_\epsilon}-y_{d}, \mathcal{T}_{h_\epsilon}). \end{align*} |
By adding and subtracting terms, we have
\begin{align*} &\|u-u_{h_\epsilon}\|_{0}^2 = \|u-u_{h_*}\|^2_{0}+\|u_{h_\epsilon}-u_{h_*}\|_{0}^2+2(u-u_{h_*}, u_{h_*}-u_{h_\epsilon}), \\ &\|y-y_{h_\epsilon}\|_{1}^2 = \|y-y_{h_*}\|_{1}^2+\|y_{h_\epsilon}-y_{h_*}\|_{1}^2+2a(y-y_{h_*}, y_{h_*}-y_{h_\epsilon}), \\ &\|p-p_{h_\epsilon}\|_{1}^2 = \|p-p_{h_*}\|_{1}^2+\|p_{h_\epsilon}-p_{h_*}\|_{1}^2+2a(p-p_{h_*}, p_{h_*}-p_{h_\epsilon}). \end{align*} |
With the help of Young's inequality, one obtains
\begin{align*} (u-u_{h_*}, u_{h_*}-u_{h_\epsilon})& = (u-u_{h_\epsilon}, u_{h_*}-u_{h_\epsilon})-(u_{h_*}-u_{h_\epsilon}, u_{h_*}-u_{h_\epsilon})\\ &\leq (u-u_{h_\epsilon}, u_{h_*}-u_{h_\epsilon})\\ &\leq\|u-u_{h_\epsilon}\|_{0}^2+\frac{1}{4}\|u_{h_*}-u_{h_\epsilon}\|_{0}^2, \end{align*} |
and similarly for a(y-y_{h_*}, y_{h_*}-y_{h_\epsilon}) and a(p-p_{h_*}, p_{h_*}-p_{h_\epsilon}) . Hence, combining the above, we deduce that
\begin{align} &\ \quad \|u-u_{h_*}\|_{0}^2+\|u_{h_*}-u_{h_\epsilon}\|_{0}^2+\|y-y_{h_*}\|_{1}^2+\|y_{h_*}-y_{h_\epsilon}\|_{1}^2+\|p-p_{h_*}\|_{1}^2+\|p_{h_*}-p_{h_\epsilon}\|_{1}^2\\ &\leq 6\Big(\|u-u_{h_\epsilon}\|_{0}^2+\|y-y_{h_\epsilon}\|_{1}^2+\|p-p_{h_\epsilon}\|_{1}^2\Big). \end{align} | (5.14) |
From Remark 2.1 in [4] and (4.14) in Lemma 4.3 with \mathcal{T}_h = \tilde{\mathcal{T}}_h = \mathcal{T}_{h_*}, \ y = y_{h_*} and \tilde{y} = y_{h_\epsilon} , we obtain that
\begin{align} osc_{\mathcal{T}_{h_*}}^2(y_{h_*}-y_{d}, \mathcal{T}_{h_*})-2osc_{\mathcal{T}_{h_\epsilon}}^2(y_{h_\epsilon}-y_{d}, \mathcal{T}_{h_\epsilon})&\leq osc_{\mathcal{T}_{h_*}}^2(y_{h_*}-y_{d}, \mathcal{T}_{h_*})-2osc_{\mathcal{T}_{h_*}}^2(y_{h_\epsilon}-y_{d}, \mathcal{T}_{h_*})\\ &\leq2Nh_{0}^4\|y_{h_*}-y_{h_\epsilon}\|_{1}^2. \end{align} | (5.15) |
For any T'\in\mathcal{T}_{h_\epsilon} , let \mathcal{T}_{h_{T'}}: = \{T\in \mathcal{T}_{h_*}:T\subset T'\} . From the proof of Lemma 4.3 in [19], we derive that
\begin{align*} \sum\limits_{T\in\mathcal{T}_{h_{T'}}}\|f-f_{T}\|_{0, T}^2\leq N\|f-f_{T'}\|_{0, T'}^2. \end{align*} |
Then we get
\begin{equation} osc_{\mathcal{T}_{h_*}}^2(f, \mathcal{T}_{h_*})\leq N\, osc_{\mathcal{T}_{h_\epsilon}}^2(f, \mathcal{T}_{h_\epsilon}). \end{equation} | (5.16) |
Combining (5.14)–(5.16) yields (5.13). Then, using (5.12) and the definition of \epsilon^2 , we have
\begin{align} e_{h_*}^2+osc_{\mathcal{T}_{h_*}}^2(\mathcal{T}_{h_*})\leq N\Big(e_{\mathcal{T}_{h_\epsilon}}^2+osc_{\mathcal{T}_{h_\epsilon}}^2(\mathcal{T}_{h_\epsilon})\Big)\leq N\epsilon^2 = \mu\Big(e_{h_k}^2+osc_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\Big). \end{align} |
From Lemma 5.2, the following result holds:
\begin{equation} \#\mathcal{M}_{h_k}\leq\#\mathcal{R}_h\leq\#\mathcal{T}_{h_*}-\#\mathcal{T}_{h_k}\leq\#\mathcal{T}_{h_\epsilon}-\#\mathcal{T}_{h_0}. \end{equation} | (5.17) |
Combining (5.11), (5.17) and the definition of \epsilon^2 , we derive the desired result (5.10).
The following result is a consequence of the previous estimates, relating the number of elements to the error. Namely, if the exact solution can be approximated at a certain convergence rate, the iteratively constructed grid sequence achieves this rate up to a constant factor.
Theorem 5.1. Let (u, y, p) and (\mathcal{T}_{h_k}, U_{ad}^{h_k}, V_{h_k}, u_{h_k}, y_{h_k}, p_{h_k}) be the solution of (2.3)–(2.5) and the sequence of grids, finite element spaces and discrete solutions produced by Algorithm 3.1, respectively. Assume that \mathcal{T}_{h_0} satisfies condition (b) of Section 4 in [27]. Let (u, y, p, y_{d}, f)\in \mathcal{A}^{s} ; then we have
\begin{equation} \#\mathcal{T}_{h_k}-\#\mathcal{T}_{h_0}\leq C|(u, y, p, y_{d}, f)|_{s}^{\frac{1}{s}}\Big(e_{h_k}^2+osc_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\Big)^{-\frac{1}{2s}}, \tag{5.18} \end{equation}
provided h_{0}\ll 1 .
Proof. It follows from Theorem 2.1 and Lemma 5.3 that
\begin{equation} e_{h_k}^2+\gamma_{1}\eta_{1, \mathcal{T}_{h_k}}^2(p_{h_k}, \mathcal{T}_{h_k})+\gamma_{2}\tilde{\eta}_{\mathcal{T}_{h_k}}(\mathcal{T}_{h_k})\approx e_{h_k}^2+osc_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k}). \tag{5.19} \end{equation}
From Lemma 2.3 in [4], we have
\begin{equation} \#\mathcal{T}_{h_k}-\#\mathcal{T}_{h_0}\leq C\sum\limits_{i = 0}^{k-1}\#\mathcal{M}_{h_i}. \tag{5.20} \end{equation}
With the assistance of Lemma 5.3 and (5.20), we obtain
\begin{equation} \#\mathcal{T}_{h_k}-\#\mathcal{T}_{h_0}\leq C\sum\limits_{i = 0}^{k-1}\#\mathcal{M}_{h_i}\leq C M\sum\limits_{i = 0}^{k-1}\Big(e_{h_i}^2+osc_{\mathcal{T}_{h_i}}^2(\mathcal{T}_{h_i})\Big)^{-\frac{1}{2s}}, \tag{5.21} \end{equation}
where
M = N^{\frac{1}{2s}}|(u, y, p, y_{d}, f)|_{s}^{\frac{1}{s}}\alpha^{-\frac{1}{2s}}.
Then it follows from (5.19), (5.21) and Theorem 4.1 that
\begin{align} \#\mathcal{T}_{h_k}-\#\mathcal{T}_{h_0}&\leq C M\Big(e_{h_k}^2+osc_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\Big)^{-\frac{1}{2s}}\sum\limits_{i = 1}^{k}\alpha^{\frac{i}{2s}}\\ &\leq C|(u, y, p, y_{d}, f)|_{s}^{\frac{1}{s}}\Big(e_{h_k}^2+osc_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\Big)^{-\frac{1}{2s}}, \end{align}
which completes the proof of Theorem 5.1.
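As a remark (ours, not part of the original theorem), the complexity bound (5.18) can be read as a rate statement by solving for the total error:

```latex
% Rearranging (5.18): raise both sides to the power s and solve for the
% total error E_k := (e_{h_k}^2 + osc_{T_{h_k}}^2(T_{h_k}))^{1/2}, giving
\begin{equation*}
  \Big(e_{h_k}^2+osc_{\mathcal{T}_{h_k}}^2(\mathcal{T}_{h_k})\Big)^{\frac{1}{2}}
  \leq C^{s}\,|(u, y, p, y_{d}, f)|_{s}\,
  \big(\#\mathcal{T}_{h_k}-\#\mathcal{T}_{h_0}\big)^{-s}.
\end{equation*}
```

In other words, the adaptive algorithm realizes the best approximation rate s of the class \mathcal{A}_{s} up to a constant factor.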
In this section, we present two numerical examples and run the adaptive finite element iteration of Algorithm 3.1 on them, in order to provide empirical support for our theory.
Example 1. We consider the nonlinear optimal control problem governed by the nonlinear elliptic state and adjoint equations
\begin{align*} -\Delta y+y^3 = f+u, \quad-\Delta p+3y^2p = y-y_d, \end{align*}
where we choose \alpha = 1 and \Omega = [0, 1]\times[0, 1] , with the exact solution
\begin{align*} u = &\frac{1}{\alpha}(\max(0, \bar{p})-p), \\ y = &\sin(\pi x_1)+\sin(\pi x_2), \\ p = &-y. \end{align*}
By a simple calculation we have \int_\Omega p\,dx = -\frac{4}{\pi} , so that u\in U_{ad} is satisfied.
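This identity is easy to check numerically. The following sketch (the quadrature routine and grid size are our own choices, not from the paper) approximates \int_\Omega p\,dx with a composite midpoint rule:

```python
import math

def p(x1, x2):
    # Adjoint state of Example 1: p = -y = -(sin(pi x1) + sin(pi x2))
    return -(math.sin(math.pi * x1) + math.sin(math.pi * x2))

def integrate_unit_square(f, n=400):
    # Composite midpoint rule on an n x n grid over [0, 1] x [0, 1]
    h = 1.0 / n
    return h * h * sum(
        f((i + 0.5) * h, (j + 0.5) * h) for i in range(n) for j in range(n)
    )

val = integrate_unit_square(p)
print(val, -4.0 / math.pi)  # the two values agree closely
```

Since \int_0^1 \sin(\pi x)\,dx = 2/\pi in each variable, the quadrature value converges to -4/\pi \approx -1.2732 as the grid is refined.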
We run 15 adaptive loops for Example 1 and plot the profiles of the exact state and the numerical state on adaptively refined grids with \theta = 0.5 in Figure 1. The solution is smooth, but it has larger gradients in some regions; hence, compared with uniform refinement, the adaptive finite element method provides a smaller error. In Figure 2, we show the triangular grids after 6 and 12 adaptive iterations of Algorithm 3.1.
In Figure 3, we plot the convergence history of the errors, with adaptive refinement (\theta = 0.5) on the left and uniform refinement (\theta = 1) on the right. Against the reference line of slope -1 , the optimal convergence rate for linear finite elements, the expected error reduction can be observed.
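The slope reported in such convergence plots is typically estimated by a least-squares fit in log-log coordinates. Here is a small self-contained sketch of that computation (the function name and the synthetic convergence history are ours, for illustration only):

```python
import math

def loglog_slope(ndofs, errors):
    # Least-squares slope of log(error) against log(#dofs);
    # a slope near -1 matches the optimal rate for linear elements in 2D.
    xs = [math.log(n) for n in ndofs]
    ys = [math.log(e) for e in errors]
    k = len(xs)
    xbar, ybar = sum(xs) / k, sum(ys) / k
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

# Synthetic convergence history with error ~ C / ndofs
ndofs = [100, 400, 1600, 6400]
errors = [0.5 / n for n in ndofs]
slope = loglog_slope(ndofs, errors)
print(slope)  # approximately -1.0
```

On the data from Figure 3, the same fit would report the empirical convergence order of the adaptive iteration.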
Example 2. We consider the same nonlinear optimal control problem as Example 1 with \alpha = 0.1 , \Omega = (-1, 1)\times(0, 1)\cup(-1, 0)\times(-1, 0] , and the exact solution
\begin{align*} &p = \begin{cases} -5\times10^{10}e^\frac{1}{m}, \quad&m < 0, \\0, &m\geq 0, \end{cases}\\ &y = -p, \\ &m = (x_1-0.2)^2+(x_2-0.6)^2-0.04, \end{align*}
where u\in U_{ad} can be guaranteed.
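For intuition, the adjoint state of Example 2 is a smooth bump supported on the disk m < 0 , which is easy to evaluate directly (the helper names below are ours):

```python
import math

def m(x1, x2):
    return (x1 - 0.2) ** 2 + (x2 - 0.6) ** 2 - 0.04

def p(x1, x2):
    # p vanishes for m >= 0 and equals -5e10 * exp(1/m) for m < 0;
    # since exp(1/m) -> 0 as m -> 0^-, p is smooth but develops steep
    # gradients near the circle m = 0.
    mm = m(x1, x2)
    return -5e10 * math.exp(1.0 / mm) if mm < 0 else 0.0

print(p(0.0, 0.0))  # 0.0 (outside the disk, m >= 0)
print(p(0.2, 0.6))  # about -0.69 (center of the disk, m = -0.04)
```

The steep gradients near the circle m = 0 are exactly where the adaptive grids of Figure 6 concentrate.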
In analogy with Example 1, we provide some plots for Example 2 with 21 adaptive loops and \theta = 0.5 . Figure 4 shows that the solution is smooth but has large gradients in some regions, which again illustrates that adaptive refinement attains smaller errors than uniform refinement; the error estimate graphs make this explicit. In Figure 5, the left plot shows the error estimates under adaptive refinement (\theta = 0.5) and the right plot shows those under uniform refinement (\theta = 1) . With slope -1 being the expected optimal convergence rate, the error reduction is visible in Figure 5. Moreover, the convergence curves of the total error and of the error estimate indicators approach straight lines of slope -1 and are roughly parallel, which shows that the a posteriori error estimates obtained in Section 2 are reliable.
In Figure 6, we show the adaptive grids after 9 and 19 of the 21 adaptive iterations for Example 2 with \theta = 0.5 . The grids are concentrated on the regions where the solution has larger gradients. Note that reduced convergence orders are observed for uniform refinement because of the singularity of the solution.
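Algorithm 3.1 itself is not reproduced in this excerpt; the grid concentration seen above is the effect of the marking step driven by the bulk parameter \theta . As an illustration, here is a minimal sketch of Dörfler (bulk) marking, the standard strategy in AFEM convergence analyses (the function and sample data are ours, not the paper's exact procedure):

```python
def dorfler_mark(indicators, theta):
    # Greedy Dörfler marking: return indices of a minimal set M with
    #   sum_{T in M} eta_T^2 >= theta * sum_T eta_T^2,
    # obtained by taking elements in decreasing order of indicator.
    order = sorted(range(len(indicators)), key=lambda i: -indicators[i])
    total = sum(indicators)
    marked, acc = [], 0.0
    for i in order:
        if acc >= theta * total:
            break
        marked.append(i)
        acc += indicators[i]
    return marked

eta2 = [0.5, 0.3, 0.1, 0.05, 0.05]  # squared error indicators per element
print(dorfler_mark(eta2, 0.5))  # -> [0]
print(dorfler_mark(eta2, 0.7))  # -> [0, 1]
```

Larger \theta marks more elements per loop (closer to uniform refinement, \theta = 1 ), while smaller \theta refines only where the indicators, and hence the gradients, are largest.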
In this paper, we study the adaptive finite element method for nonlinear optimal control problems and give the corresponding adaptive algorithm. To evaluate the method, we obtain a posteriori error estimates for the nonlinear elliptic equations, together with upper and lower bounds, convergence, and optimality, which are also important indicators for evaluating the algorithm. We prove that the sum of the a posteriori errors of the control, state, and co-state variables is convergent, as shown in Theorem 4.2. Based on the local upper bound, we prove the quasi-optimality of the proposed adaptive algorithm; see Section 5. To verify our theoretical analysis, we finally provide some numerical simulations. Previous research studied finite element methods for linear optimal control problems; our contribution is to extend these techniques to a class of nonlinear optimal control problems.
Several problems remain open, such as the L^2-L^2 a posteriori error estimates for nonlinear elliptic equations, as well as the convergence and quasi-optimality for nonlinear parabolic equations. Furthermore, we note that the analysis in this paper can be generalized to common nonlinear parabolic problems and boundary problems, and we will work on these problems in the future.
This work is supported by National Science Foundation of China (11201510), National Social Science Fund of China (19BGL190), Chongqing Research Program of Basic Research and Frontier Technology (cstc2019jcyj-msxmX0280), General project of Chongqing Natural Science Foundation (cstc2021jcyj-msxmX0949), Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJZD-K20200120), Chongqing Key Laboratory of Water Environment Evolution and Pollution Control in Three Gorges Reservoir Area (WEPKL2018YB-04), Science and technology research project of Chongqing Education Commission (KJQN202101212), Research Center for Sustainable Development of Three Gorges Reservoir Area (2019sxxyjd07), Guangdong Basic and Applied Basic Research Foundation of Joint Fund Project (2021A1515111048), and Guangdong Province Characteristic Innovation Project (2021WTSCX120).
The authors declare that they have no competing interests.
[1] M. Hilbert, Big data for development: A review of promises and challenges, Dev. Policy Rev., 34 (2016), 135–174. https://doi.org/10.1111/dpr.12142
[2] C. Wang, J. Wu, J. Yan, Statistical methods and computing for big data, Stat. Interface, 9 (2016), 399. https://doi.org/10.4310/SII.2016.v9.n4.a1
[3] H. Wang, Y. Ma, Optimal subsampling for quantile regression in big data, Biometrika, 108 (2021), 99–112. https://doi.org/10.1093/biomet/asaa043
[4] H. Wang, R. Zhu, P. Ma, Optimal subsampling for large sample logistic regression, J. Am. Stat. Assoc., 117 (2022), 265–276. https://doi.org/10.1080/01621459.2020.1773832
[5] X. Chen, W. Liu, X. Mao, Z. Yang, Distributed high-dimensional regression under a quantile loss function, J. Mach. Learn. Res., 21 (2020), 7432–7474.
[6] A. Hu, Y. Jiao, Y. Liu, Y. Shi, Y. Wu, Distributed quantile regression for massive heterogeneous data, Neurocomputing, 448 (2021), 249–262. https://doi.org/10.1016/j.neucom.2021.03.041
[7] R. Jiang, K. Yu, Smoothing quantile regression for a distributed system, Neurocomputing, 466 (2021), 311–326. https://doi.org/10.1016/j.neucom.2021.08.101
[8] M. I. Jordan, J. D. Lee, Y. Yang, Communication-efficient distributed statistical inference, J. Am. Stat. Assoc., 114 (2019), 668–681. https://doi.org/10.1080/01621459.2018.1429274
[9] N. Lin, R. Xi, Aggregated estimating equation estimation, Stat. Interface, 4 (2011), 73–83. https://doi.org/10.4310/SII.2011.v4.n1.a8
[10] L. Luo, P. Song, Renewable estimation and incremental inference in generalized linear models with streaming data sets, J. R. Stat. Soc. B, 82 (2020), 69–97. https://doi.org/10.1111/rssb.12352
[11] C. Shi, R. Song, W. Lu, R. Li, Statistical inference for high-dimensional models via recursive online-score estimation, J. Am. Stat. Assoc., 116 (2021), 1307–1318. https://doi.org/10.1080/01621459.2019.1710154
[12] E. D. Schifano, J. Wu, C. Wang, J. Yan, M. Chen, Online updating of statistical inference in the big data setting, Technometrics, 58 (2016), 393–403. https://doi.org/10.1080/00401706.2016.1142900
[13] S. Mohamad, A. Bouchachia, Deep online hierarchical dynamic unsupervised learning for pattern mining from utility usage data, Neurocomputing, 390 (2020), 359–373. https://doi.org/10.1016/j.neucom.2019.08.093
[14] H. M. Gomes, J. Read, A. Bifet, J. Paul, J. Gama, Machine learning for streaming data: State of the art, challenges, and opportunities, ACM SIGKDD Explor. Newsl., 21 (2019), 6–22. https://doi.org/10.1145/3373464.3373470
[15] L. Lin, W. Li, J. Lu, Unified rules of renewable weighted sums for various online updating estimations, arXiv preprint, 2020. https://doi.org/10.48550/arXiv.2008.08824
[16] C. Wang, M. Chen, J. Wu, J. Yan, Y. Zhang, E. Schifano, Online updating method with new variables for big data streams, Can. J. Stat., 46 (2018), 123–146. https://doi.org/10.1002/cjs.11330
[17] J. Wu, M. Chen, Online updating of survival analysis, J. Comput. Graph. Stat., 30 (2021), 1209–1223. https://doi.org/10.1080/10618600.2020.1870481
[18] Y. Xue, H. Wang, J. Yan, E. D. Schifano, An online updating approach for testing the proportional hazards assumption with streams of survival data, Biometrics, 76 (2020), 171–182. https://doi.org/10.1111/biom.13137
[19] S. Balakrishnan, D. Madigan, A one-pass sequential Monte Carlo method for Bayesian analysis of massive datasets, Bayesian Anal., 1 (2006), 345–361. https://doi.org/10.1214/06-BA112
[20] L. N. Geppert, K. Ickstadt, A. Munteanu, J. Quedenfeld, C. Sohler, Random projections for Bayesian regression, Stat. Comput., 27 (2017), 79–101. https://doi.org/10.1007/s11222-015-9608-z
[21] R. Koenker, G. Bassett, Regression quantiles, Econometrica, 46 (1978), 33–50. https://doi.org/10.2307/1913643
[22] Y. Wei, A. Pere, R. Koenker, X. He, Quantile regression methods for reference growth charts, Stat. Med., 25 (2006), 1369–1382. https://doi.org/10.1002/sim.2271
[23] H. Wang, Z. Zhu, J. Zhou, Quantile regression in partially linear varying coefficient models, Ann. Stat., 37 (2009), 3841–3866. https://doi.org/10.1214/09-AOS695
[24] X. He, B. Fu, W. K. Fung, Median regression for longitudinal data, Stat. Med., 22 (2003), 3655–3669. https://doi.org/10.1002/sim.1581
[25] M. Buchinsky, Changes in the US wage structure 1963–1987: Application of quantile regression, Econometrica, 62 (1994), 405–458. https://doi.org/10.2307/2951618
[26] A. J. Cannon, Quantile regression neural networks: Implementation in R and application to precipitation downscaling, Comput. Geosci., 37 (2011), 1277–1284. https://doi.org/10.1016/j.cageo.2010.07.005
[27] Q. Xu, K. Deng, C. Jiang, F. Sun, X. Huang, Composite quantile regression neural network with applications, Expert Syst. Appl., 76 (2017), 129–139. https://doi.org/10.1016/j.eswa.2017.01.054
[28] X. Chen, W. Liu, Y. Zhang, Quantile regression under memory constraint, Ann. Stat., 47 (2019), 3244–3273. https://doi.org/10.1214/18-AOS1777
[29] L. Chen, Y. Zhou, Quantile regression in big data: A divide and conquer based strategy, Comput. Stat. Data Anal., 144 (2020), 106892. https://doi.org/10.1016/j.csda.2019.106892
[30] K. Wang, H. Wang, S. Li, Renewable quantile regression for streaming datasets, Knowl.-Based Syst., 235 (2022), 107675. https://doi.org/10.1016/j.knosys.2021.107675
[31] Y. Chu, Z. Yin, K. Yu, Bayesian scale mixtures of normals linear regression and Bayesian quantile regression with big data and variable selection, J. Comput. Appl. Math., 428 (2023), 115192. https://doi.org/10.1016/j.cam.2023.115192
[32] K. Lum, A. E. Gelfand, Spatial quantile multiple regression using the asymmetric Laplace process, Bayesian Anal., 7 (2012), 235–258. https://doi.org/10.1214/12-BA708
[33] M. Smith, R. Kohn, Nonparametric regression using Bayesian variable selection, J. Econometrics, 75 (1996), 317–343. https://doi.org/10.1016/0304-4076(95)01763-1
[34] M. Dao, M. Wang, S. Ghosh, K. Ye, Bayesian variable selection and estimation in quantile regression using a quantile-specific prior, Comput. Stat., 37 (2022), 1339–1368. https://doi.org/10.1007/s00180-021-01181-5
[35] K. E. Lee, N. Sha, E. R. Dougherty, M. Vannucci, B. K. Mallick, Gene selection: A Bayesian variable selection approach, Bioinformatics, 19 (2003), 90–97. https://doi.org/10.1093/bioinformatics/19.1.90
[36] R. Chen, C. Chu, T. Lai, Y. Wu, Stochastic matching pursuit for Bayesian variable selection, Stat. Comput., 21 (2011), 247–259. https://doi.org/10.1007/s11222-009-9165-4
[37] R. Jiang, K. Yu, Renewable quantile regression for streaming data sets, Neurocomputing, 508 (2022), 208–224.
[38] X. Li, The influencing factors on PM_{2.5} concentration of Lanzhou based on quantile regression, HGU J., 41 (2018), 61–68. https://doi.org/10.13937/j.cnki.hbdzdxxb.2018.06.009
[39] X. Zhang, W. Zhang, Spatial and temporal variation of PM_{2.5} in Beijing city after rain, Ecol. Environ. Sci., 23 (2014), 797–805. https://doi.org/10.3969/j.issn.1674-5906.2014.05.011
[40] R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. B, 58 (1996), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
[41] J. Fan, R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Am. Stat. Assoc., 96 (2001), 1348–1360. https://doi.org/10.1198/016214501753382273
[42] F. Emmert-Streib, M. Dehmer, High-dimensional LASSO-based computational regression models: Regularization, shrinkage, and selection, Mach. Learn. Knowl. Extr., 1 (2019), 359–383. https://doi.org/10.3390/make1010021
[43] X. Ma, L. Lin, Y. Gai, A general framework of online updating variable selection for generalized linear models with streaming datasets, J. Stat. Comput. Sim., 93 (2023), 325–340. https://doi.org/10.1080/00949655.2022.2107207
[44] A. Liu, J. Lu, F. Liu, G. Zhang, Accumulating regional density dissimilarity for concept drift detection in data streams, Pattern Recogn., 76 (2018), 256–272. https://doi.org/10.1016/j.patcog.2017.11.009
[45] J. Wang, J. Shen, P. Li, Provable variable selection for streaming features, International Conference on Machine Learning, 80 (2018), 5171–5179. Available from: https://proceedings.mlr.press/v80/wang18g.html.
[46] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, G. Zhang, Learning under concept drift: A review, IEEE T. Knowl. Data En., 31 (2018), 2346–2363. https://doi.org/10.1109/TKDE.2018.2876857
[47] R. Elwell, R. Polikar, Incremental learning of concept drift in nonstationary environments, IEEE T. Neural Networ., 22 (2011), 1517–1531. https://doi.org/10.1109/TNN.2011.2160459
[48] D. Rezende, S. Mohamed, Variational inference with normalizing flows, International Conference on Machine Learning, 37 (2015), 1530–1538. Available from: https://proceedings.mlr.press/v37/rezende15.
[49] P. Müller, F. A. Quintana, A. Jara, T. Hanson, Bayesian nonparametric data analysis, New York: Springer, 2015. https://doi.org/10.1007/978-3-319-18968-0
[50] R. Koenker, J. A. Machado, Goodness of fit and related inference processes for quantile regression, J. Am. Stat. Assoc., 94 (1999), 1296–1310. https://doi.org/10.1080/01621459.1999.10473882
[51] K. Yu, R. A. Moyeed, Bayesian quantile regression, Stat. Probab. Lett., 54 (2001), 437–447. https://doi.org/10.1016/S0167-7152(01)00124-9
[52] M. Geraci, Linear quantile mixed models: The lqmm package for Laplace quantile regression, J. Stat. Softw., 57 (2014), 1–29. https://doi.org/10.18637/jss.v057.i13
[53] M. Geraci, M. Bottai, Quantile regression for longitudinal data using the asymmetric Laplace distribution, Biostatistics, 8 (2007), 140–154. https://doi.org/10.1093/biostatistics/kxj039
[54] D. F. Benoit, D. V. den Poel, bayesQR: A Bayesian approach to quantile regression, J. Stat. Softw., 76 (2017), 1–32. https://doi.org/10.18637/jss.v076.i07