
Hardware-friendly compression and hardware acceleration for transformer: A survey


  • Received: 18 June 2022 Accepted: 27 July 2022 Published: 16 August 2022
  • The transformer model has recently become a milestone in artificial intelligence, raising the performance of tasks such as machine translation and computer vision to a previously unattainable level. This strong performance, however, comes with a large memory overhead and an enormous demand for computing power, which significantly hinders the deployment of energy-efficient transformer systems. Owing to their high parallelism, low latency, and low power consumption, field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) achieve higher energy efficiency than graphics processing units (GPUs) and central processing units (CPUs), and are therefore widely used to accelerate deep learning algorithms. Several papers have addressed deploying the transformer on dedicated hardware for acceleration, but comprehensive studies of this area are lacking. We therefore summarize hardware-oriented transformer compression algorithms and their hardware-accelerator implementations to provide a comprehensive overview of this research domain. This paper first introduces the transformer model framework and its computation process. Second, it discusses hardware-friendly compression algorithms based on self-attention and the transformer, along with a review of state-of-the-art hardware accelerator frameworks. Finally, we consider some promising topics in transformer hardware acceleration, such as high-level design frameworks and selecting the optimal device using reinforcement learning.

    Citation: Shizhen Huang, Enhao Tang, Shun Li, Xiangzhan Ping, Ruiqi Chen. Hardware-friendly compression and hardware acceleration for transformer: A survey[J]. Electronic Research Archive, 2022, 30(10): 3755-3785. doi: 10.3934/era.2022192




    In this paper, we consider the following diffusion equation on $\Omega\subset\mathbb{R}^2$:

    $$-\nabla\cdot(\alpha\nabla u)=f \quad\text{in } \Omega,\qquad u=0 \quad\text{on } \partial\Omega. \tag{1}$$

    To approximate (1), taking advantage of adaptive mesh refinement (AMR) to save valuable computational resources, the adaptive finite element method on quadtree meshes is among the most popular approaches in the engineering and scientific computing community [20]. Compared with simplicial meshes, quadtree meshes provide preferable performance in terms of accuracy and robustness. There are many mature software packages (e.g., [1,2]) on quadtree meshes. To guide the AMR, one possible way is through a posteriori error estimation: computable quantities are constructed to indicate where the mesh needs to be refined or coarsened, thus balancing the spatial distribution of the error and improving the accuracy per unit of computing power. Residual-based and recovery-based error estimators are among the most popular choices. In terms of accuracy, the recovery-based error estimator shows more appealing attributes [28,3].

    More recently, newer developments on flux recovery have been studied by many researchers, constructing a post-processed flux in a structure-preserving approximation space. Using (1) as an example, given that the data $f\in L^2(\Omega)$, the flux $-\alpha\nabla u$ is in $H(\mathrm{div};\Omega):=\{\boldsymbol{v}\in \boldsymbol{L}^2(\Omega):\ \nabla\cdot\boldsymbol{v}\in L^2(\Omega)\}$, which carries a weaker continuity constraint than the vertex-patch based recoveries in [28,3], where the recovered flux is $H^1(\Omega)$-conforming. The $H(\mathrm{div})$-flux recovery shows more robustness than vertex-patch based ones (e.g., [11,10]).

    However, these $H(\mathrm{div})$-flux recovery techniques work mainly on conforming meshes. For nonconforming discretizations on nonmatching grids, some simple treatments of hanging nodes exist by recovering the flux on a conforming mother mesh [22]. To the best of our knowledge, there is no literature on local $H(\mathrm{div})$-flux recovery on multilevel irregular quadtree meshes. One major difficulty is that it is impossible to recover a robust computable polynomial flux satisfying the $H(\mathrm{div})$-continuity constraint, that is, a flux continuous in the normal direction on edges with hanging nodes.

    More recently, a new class of methods called virtual element methods (VEM) was introduced in [4,8], which can be viewed as a polytopal generalization of tensorial/simplicial finite elements. Since then, many applications of VEM have been studied. A usual VEM workflow splits the method into a consistency (approximation) part and a stability part, and splits the finite dimensional approximation space accordingly. It allows flexible constructions of spaces that preserve structures of the continuous problems, such as higher-order continuity, exact divergence-free spaces, and many others. VEM functions are represented merely by their degree-of-freedom (DoF) functionals, not by pointwise values. In computation, if an optimal-order discontinuous approximation can be computed elementwise, then adding an appropriate parameter-free stabilization suffices to guarantee convergence under common assumptions on the geometry of the mesh.

    The adoption of polytopal elements brings many distinctive advantages; for example, treating rectangular elements with hanging nodes as polygons allows a simple construction of an $H(\mathrm{div})$-conforming finite dimensional approximation space on meshes with multilevel irregularities. We shall follow this approach to perform the flux recovery for a conforming $Q_k$ discretization of problem (1). Recently, quadtree meshes with an arbitrary level of irregularity have been studied in [21,26,15]. Analyses of the residual-based error estimator on 1-irregular (balanced) quadtree meshes can be found, e.g., in [14]. In the virtual element context, Zienkiewicz-Zhu (ZZ)-type recovery techniques are studied for linear elasticity in [18] and for diffusion problems in [24]. In [18,24], the recovered flux is in $H^1$ and associated with nodal DoFs, and thus cannot yield a robust estimate when the diffusion coefficient has a sharp contrast [11,10]. The first equilibrated flux recovery in $H(\mathrm{div})$ for virtual element methods is studied in [19]. While [19] recovers a flux by solving a mixed problem globally, we opt for a cheap and simple weighted averaging locally.

    The major ingredient in our study is an $H(\mathrm{div})$-conforming virtual element space modified from the ones used in [8,5] (Section 2.2). Afterwards, an $H(\mathrm{div})$-conforming flux is recovered by a robust weighted averaging of the numerical flux, in which some unique properties of the tensor-product type element $Q_k$ are exploited (Section 3). The a posteriori error estimator is constructed based on the projected flux elementwise. The efficiency of the local error indicator is then proved by bounding it above by the residual-based error indicator (Section 4.1). The reliability of the recovery-based error estimator is then shown under certain assumptions (Section 4.2). These estimates are verified numerically by some common AMR benchmark problems implemented in the publicly available finite element software library iFEM [16] (Section 5).

    If $\Omega$ is not a rectangle, $u$ is extended by $0$ to a rectangular $\tilde{\Omega}$; therefore, without loss of generality, we assume $\Omega$ is partitioned into a shape-regular $\mathcal{T}=\{K\}$ with rectangular elements, and $\alpha:=\alpha_K$ is assumed to be a piecewise positive constant with respect to $\mathcal{T}$. The weak form of problem (1) is then discretized in a tensor-product finite element space as follows:

    $$(\alpha\nabla u_{\mathcal{T}},\nabla v_{\mathcal{T}})=(f,v_{\mathcal{T}}),\quad \forall v_{\mathcal{T}}\in Q_k(\mathcal{T})\cap H^1_0(\Omega), \tag{2}$$

    in which standard notation is adopted: $(\cdot,\cdot)_D$ denotes the inner product on $L^2(D)$, $\|\cdot\|_D:=(\cdot,\cdot)_D^{1/2}$, and the subscript is omitted when $D=\Omega$. The discretization space is

    $$Q_k(\mathcal{T}):=\{v\in H^1(\Omega):\ v|_K\in Q_k(K),\ \forall K\in\mathcal{T}\},$$

    and on $K=[a,b]\times[c,d]$,

    $$Q_k(K):=P_{k,k}(K)=\{p(x)q(y):\ p\in P_k([a,b]),\ q\in P_k([c,d])\},$$

    where $P_k(D)$ stands for the space of polynomials of degree at most $k$ on $D$. Henceforth, we shall simply denote $Q_k(\mathcal{T})=:Q_k$ when no ambiguity arises.

    On $K$, the set of its 4 vertices and the set of the 4 edges of the same generation as $K$ are denoted by $\mathcal{N}_K$ and $\mathcal{E}_K$, respectively. The sets of nodes and edges in $\mathcal{T}$ are denoted by $\mathcal{N}:=\cup_{K\in\mathcal{T}}\mathcal{N}_K$ and $\mathcal{E}:=\cup_{K\in\mathcal{T}}\mathcal{E}_K$. A node $z\in\mathcal{N}$ is called a hanging node if it lies on $\partial K$ but is not a vertex of $K\in\mathcal{T}$, and we denote the set of hanging nodes by $\mathcal{N}_H$:

    $$\mathcal{N}_H:=\{z\in\mathcal{N}:\ \exists K\in\mathcal{T},\ z\in\partial K\setminus\mathcal{N}_K\}. \tag{3}$$

    Otherwise the node $z\in\mathcal{N}$ is a regular node. If every edge $e\in\mathcal{E}$ contains at most $l$ hanging nodes, the partition $\mathcal{T}$, as well as the elements these hanging nodes lie on, is called $l$-irregular.

    For each edge $e\in\mathcal{E}$, a unit normal vector $\boldsymbol{n}_e$ is fixed by specifying its direction: pointing rightward for vertical edges and upward for horizontal edges. If the exterior normal of an element on this edge shares the same orientation as $\boldsymbol{n}_e$, this element is denoted by $K^-$; otherwise it is denoted by $K^+$, i.e., $\boldsymbol{n}_e$ points from $K^-$ to $K^+$. The intersection of the closures of $K^+$ and $K^-$ is always an edge $e\in\mathcal{E}$. However, we note that by the definition in (3) it is possible that $e\in\mathcal{E}_{K^+}$ but not in $\mathcal{E}_{K^-}$, or vice versa, if there exists a hanging node on $e$ (see e.g., Figure 1). For any function or distribution $v$ well-defined on the two elements, define $[\![v]\!]_e:=v^--v^+$ on an edge $e\not\subset\partial\Omega$, in which $v^-$ and $v^+$ are defined in the limiting sense $v^{\pm}=\lim_{\epsilon\to 0^{\pm}}v(\boldsymbol{x}+\epsilon\boldsymbol{n}_e)$ for $\boldsymbol{x}\in e$. If $e$ is a boundary edge, the function $v$ is extended by zero outside the domain to compute $[\![v]\!]_e$. Furthermore, the following notation denotes a weighted average of $v$ on edge $e$ for a weight $\gamma\in[0,1]$:

    $$\{v\}^{\gamma}_e:=\gamma v^-+(1-\gamma)v^+.$$
    Figure 1.  For the upper right element $K\in\mathcal{T}$, $\mathcal{N}_K=\{z_2,z_4,z_5,z_6\}$. For $K\in\mathcal{T}_{\mathrm{poly}}$, $\mathcal{N}_K=\{z_i\}_{i=1}^{7}$.

    In this subsection, the quadtree mesh $\mathcal{T}$ of interest is embedded into a polygonal mesh $\mathcal{T}\hookrightarrow\mathcal{T}_{\mathrm{poly}}=\{K_{\mathrm{poly}}\}$. On a given quadrilateral element $K$, consider for example $v_{\mathcal{T}}\in Q_1(K)$: it has 4 degrees of freedom associated with the 4 nodes $\{z\}$. Its numerical flux $-\alpha\nabla v_{\mathcal{T}}\cdot\boldsymbol{n}$ is well-defined on the 4 edges $\{e\}$ locally on $K$, such that on each edge it is a polynomial defined on the whole edge, regardless of the number of hanging nodes on that edge. Using Figure 1 as an example, on the upper right element $K$, $\nabla v_{\mathcal{T}}|_K\cdot\boldsymbol{n}|_{\overline{z_2z_6}}\in P_1(\overline{z_2z_6})$ is a linear function in the $y$-variable.

    For the embedded element $K_{\mathrm{poly}}\in\mathcal{T}_{\mathrm{poly}}$, which geometrically coincides with $K$, it includes all the hanging nodes, while the set of edges is formed accordingly as the edges of the cyclic graph of the vertices. We shall denote the set of all edges on $\mathcal{T}_{\mathrm{poly}}$ by $\mathcal{E}_{\mathrm{poly}}$. Using Figure 1 as an example, it is possible to define a flux on $K$ with a piecewise linear normal component on $\overline{z_2z_6}$, which now consists of three edges on $K_{\mathrm{poly}}$.

    Subsequently, $K_{\mathrm{poly}}\in\mathcal{T}_{\mathrm{poly}}$ shall be denoted simply by $K\in\mathcal{T}_{\mathrm{poly}}$ in the context of flux recovery, and the notation $e\subset\partial K$ denotes an edge on the boundary of $K$, which takes into account the edges having one or both end points as hanging nodes.

    On $\mathcal{T}_{\mathrm{poly}}$, we consider the following Brezzi-Douglas-Marini-type virtual element modification, inspired by the ones used in [8,5]. The local space on a $K\in\mathcal{T}_{\mathrm{poly}}$ is defined, for $k\ge 1$, as

    $$\mathcal{V}_k(K):=\{\boldsymbol{\tau}\in H(\mathrm{div};K)\cap H(\mathrm{rot};K):\ \nabla\cdot\boldsymbol{\tau}\in P_{k-1}(K),\ \nabla\times\boldsymbol{\tau}=0,\ \boldsymbol{\tau}\cdot\boldsymbol{n}_e\in P_k(e),\ \forall e\subset\partial K\}. \tag{4}$$

    An H(div)-conforming global space for recovering the flux is then

    $$\mathcal{V}_k:=\{\boldsymbol{\tau}\in H(\mathrm{div};\Omega):\ \boldsymbol{\tau}|_K\in\mathcal{V}_k(K)\ \text{ on each } K\in\mathcal{T}_{\mathrm{poly}}\}. \tag{5}$$

    Next we turn to defining the degrees of freedom (DoFs) of this space. To this end, we define the set of scaled monomials $P_k(e)$ on an edge $e$: $e$ is parametrized by $[0,h_e]\ni s\mapsto \boldsymbol{a}+s\boldsymbol{t}_e$, where $\boldsymbol{a}$ is the starting point of $e$ and $\boldsymbol{t}_e$ is the unit tangential vector of $e$. The basis set for $P_k(e)$ is chosen as:

    $$P_k(e):=\mathrm{span}\Bigl\{1,\ \frac{s-m_e}{h_e},\ \Bigl(\frac{s-m_e}{h_e}\Bigr)^2,\ \dots,\ \Bigl(\frac{s-m_e}{h_e}\Bigr)^k\Bigr\}, \tag{6}$$

    where $m_e=h_e/2$ represents the midpoint under this parametrization. Similarly to the edge case, the basis set of $P_k(K)$ is chosen as follows (see e.g., [4]):

    $$P_k(K):=\mathrm{span}\Bigl\{m_{\boldsymbol{\alpha}}(\boldsymbol{x}):=\Bigl(\frac{\boldsymbol{x}-\boldsymbol{x}_K}{h_K}\Bigr)^{\boldsymbol{\alpha}},\ |\boldsymbol{\alpha}|\le k\Bigr\}. \tag{7}$$
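    To make the scaled bases (6) and (7) concrete, here is a minimal Python sketch (our own illustration, not code from the paper or from iFEM; the function names and the use of NumPy are assumptions) that evaluates both bases on a rectangle and an edge.

```python
import numpy as np

def scaled_monomials_2d(x, y, xK, yK, hK, k):
    """Evaluate the scaled monomial basis (7) of P_k(K) at points (x, y).

    Basis functions: m_alpha = ((x - x_K)/h_K)^a * ((y - y_K)/h_K)^b with a + b <= k.
    Returns an array of shape (..., n_basis)."""
    xs = (np.asarray(x, dtype=float) - xK) / hK
    ys = (np.asarray(y, dtype=float) - yK) / hK
    return np.stack([xs**a * ys**b for a in range(k + 1) for b in range(k + 1 - a)], axis=-1)

def scaled_monomials_edge(s, he, k):
    """Evaluate the scaled edge basis (6) of P_k(e) at arc-length parameters s in [0, h_e]."""
    t = (np.asarray(s, dtype=float) - he / 2.0) / he
    return np.stack([t**j for j in range(k + 1)], axis=-1)

# Example: basis of P_2(K) on K = [0, 1]^2 evaluated at the centroid (all non-constant terms vanish).
print(scaled_monomials_2d(0.5, 0.5, 0.5, 0.5, 1.0, 2))
```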

    The degrees of freedom (DoFs) are then set as follows for a $\boldsymbol{\tau}\in\mathcal{V}_k$:

    $$\begin{aligned}
    &(e)\quad k\ge 1: && \int_e (\boldsymbol{\tau}\cdot\boldsymbol{n}_e)\,m\,\mathrm{d}s,\quad \forall m\in P_k(e),\ \text{on each } e\in\mathcal{E}_{\mathrm{poly}};\\
    &(i)\quad k\ge 2: && \int_K \boldsymbol{\tau}\cdot\nabla m\,\mathrm{d}\boldsymbol{x},\quad \forall m\in P_{k-1}(K)/\mathbb{R},\ \text{on each } K\in\mathcal{T}_{\mathrm{poly}}.
    \end{aligned} \tag{8}$$

    Remark 1. We note that in our construction, the degrees of freedom determining the curl of a VEM function, originally in [8], are replaced by a curl-free constraint, thanks to the flexibility of the virtual element framework. The reason we opt for this subspace is that the true flux $-\alpha\nabla u$ is locally curl-free, since we have assumed that $\alpha$ is a piecewise constant. The unisolvency of the set of DoFs (8) including the curl part can be found in [8]. For the modified space (4), a simplified argument is given in the proof of Lemma 7.3.

    As the data $f\in L^2(\Omega)$, the true flux $\boldsymbol{\sigma}=-\alpha\nabla u\in H(\mathrm{div};\Omega)$. Consequently, we shall seek a postprocessed flux $\boldsymbol{\sigma}_{\mathcal{T}}\in\mathcal{V}_k\subset H(\mathrm{div};\Omega)$ by specifying the DoFs in (8). Throughout this section, whenever considering an element $K\in\mathcal{T}$, we treat it as a polygon $K\in\mathcal{T}_{\mathrm{poly}}$.

    Consider $-\alpha_K\nabla u_{\mathcal{T}}$, the numerical flux on $K$. We note that $-\alpha_K\nabla u_{\mathcal{T}}|_K\in P_{k-1,k}(K)\times P_{k,k-1}(K)$. The normal flux on each edge $e\in\mathcal{E}_{\mathrm{poly}}$ is in $P_k(e)$, since $\boldsymbol{n}_e=(\pm 1,0)$ and $x=\mathrm{const}$ on vertical edges, while $\boldsymbol{n}_e=(0,\pm 1)$ and $y=\mathrm{const}$ on horizontal edges. Therefore, the edge-based DoFs can be computed by a simple averaging, thanks to the matching polynomial degrees of the numerical flux and the functions in $\mathcal{V}_k$.

    On each $e=\partial K^+\cap\partial K^-$, define

    $$\{-\alpha\nabla u_{\mathcal{T}}\}^{\gamma_e}_e\cdot\boldsymbol{n}_e:=\bigl(\gamma_e(-\alpha_{K^-}\nabla u_{\mathcal{T}}|_{K^-})+(1-\gamma_e)(-\alpha_{K^+}\nabla u_{\mathcal{T}}|_{K^+})\bigr)\cdot\boldsymbol{n}_e, \tag{9}$$

    where

    $$\gamma_e:=\frac{\alpha_{K^+}^{1/2}}{\alpha_{K^+}^{1/2}+\alpha_{K^-}^{1/2}}. \tag{10}$$

    First, for both the $k=1$ and $k\ge 2$ cases, the normal component of the recovered flux is set as

    $$\boldsymbol{\sigma}_{\mathcal{T}}\cdot\boldsymbol{n}_e=\{-\alpha\nabla u_{\mathcal{T}}\}^{\gamma_e}_e\cdot\boldsymbol{n}_e. \tag{11}$$
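    As a concrete illustration of the weighted averaging (9)-(11) (our own sketch with hypothetical inputs, not the paper's code), the following Python function combines the two one-sided normal fluxes on an interior edge using the weight (10).

```python
import numpy as np

def averaged_normal_flux(alpha_minus, alpha_plus, flux_n_minus, flux_n_plus):
    """Weighted average (9) of the one-sided numerical normal fluxes on an edge e.

    flux_n_minus/plus: samples of -alpha*grad(u_T).n_e taken from K^- and K^+ (e.g., at edge
    quadrature points). By (10), gamma_e weights the K^- trace using the square root of the
    coefficient on the opposite element K^+."""
    gamma_e = np.sqrt(alpha_plus) / (np.sqrt(alpha_plus) + np.sqrt(alpha_minus))
    return gamma_e * np.asarray(flux_n_minus) + (1.0 - gamma_e) * np.asarray(flux_n_plus)

# Example: a coefficient jump alpha_- = 1, alpha_+ = 100 biases the average toward the K^- trace.
print(averaged_normal_flux(1.0, 100.0, np.array([2.0, 2.1]), np.array([1.0, 0.9])))
```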

    In the lowest order case $k=1$, $\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}}$ is a constant on $K$ by (4); thus the construction (11) alone, which constitutes the edge DoFs $(e)$ in (8), determines the divergence $\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}}$ in $K$ as follows:

    $$|K|\,\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}}=\int_K\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}}\,\mathrm{d}\boldsymbol{x}=\int_{\partial K}\boldsymbol{\sigma}_{\mathcal{T}}\cdot\boldsymbol{n}_K\,\mathrm{d}s=\sum_{e\subset\partial K}\int_e\boldsymbol{\sigma}_{\mathcal{T}}\cdot\boldsymbol{n}_K|_e\,\mathrm{d}s. \tag{12}$$
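    A minimal sketch of (12) (ours, with made-up data; not the paper's implementation): for $k=1$ the constant divergence on $K$ follows from the edge integrals of the recovered normal flux, with signs accounting for the orientation of each global edge normal relative to the outward normal of $K$.

```python
import numpy as np

def constant_divergence_from_edges(area_K, edge_flux_integrals, signs):
    """Recover the constant div(sigma_T) on K via (12).

    edge_flux_integrals[i] = integral over e_i of sigma_T . n_{e_i} ds (global orientation);
    signs[i] = +1 if n_K|_{e_i} coincides with n_{e_i}, -1 otherwise; area_K = |K|."""
    return float(np.dot(signs, edge_flux_integrals)) / area_K

# Example: a unit-area element with four boundary edges.
print(constant_divergence_from_edges(1.0,
                                     np.array([0.10, 0.05, -0.02, 0.02]),
                                     np.array([1, 1, -1, -1])))
```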

    If $k\ge 2$, after the normal component (11) is set, on each $K$ let $\Pi_{k-1}$ denote the $L^2$-projection onto $P_{k-1}(K)$, and we let

    $$\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}}=\Pi_{k-1}f+c_K. \tag{13}$$

    The reason to add $c_K$ is that we have set the normal components of the recovered flux first, without relying on the divergence information, while in general $\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}}\neq\Pi_{k-1}f$, as otherwise the divergence theorem would be violated in (12). As a result, an element-wise constant $c_K$ is added to ensure the compatibility of $\boldsymbol{\sigma}_{\mathcal{T}}$ locally on each $K$. It is straightforward to verify that $c_K$ has the following form, and later we shall show that $c_K$ affects neither the efficiency nor the reliability of the error estimates:

    $$c_K=\frac{1}{|K|}\Bigl(-\int_K\Pi_{k-1}f\,\mathrm{d}\boldsymbol{x}+\sum_{e\subset\partial K}\int_e\{-\alpha\nabla u_{\mathcal{T}}\}^{\gamma_e}_e\cdot\boldsymbol{n}_K|_e\,\mathrm{d}s\Bigr). \tag{14}$$

    Consequently, for $k\ge 2$, the set $(i)$ of DoFs can be set as: for all $q\in P_{k-1}(K)$,

    $$(\boldsymbol{\sigma}_{\mathcal{T}},\nabla q)_K=-(\Pi_{k-1}f+c_K,\,q)_K+\sum_{e\subset\partial K}\bigl(\{-\alpha\nabla u_{\mathcal{T}}\}^{\gamma_e}_e\cdot\boldsymbol{n}_K|_e,\,q\bigr)_e. \tag{15}$$

    Toward constructing a computable local error indicator, inspired by the VEM formulation [8], the recovered flux is projected onto a space with a much simpler structure. A local oblique projection $\Pi:\boldsymbol{L}^2(K)\to\nabla P_k(K)$, $\boldsymbol{\tau}\mapsto\Pi\boldsymbol{\tau}$, is defined as follows:

    $$(\Pi\boldsymbol{\tau},\nabla p)_K=(\boldsymbol{\tau},\nabla p)_K,\quad \forall p\in P_k(K)/\mathbb{R}. \tag{16}$$

    Next we show that this projection operator can be computed straightforwardly for vector fields in $\mathcal{V}_k(K)$.

    When $k=1$, we can compute the right-hand side of (16) as follows:

    $$(\boldsymbol{\tau},\nabla p)_K=-(\nabla\cdot\boldsymbol{\tau},p)_K+(\boldsymbol{\tau}\cdot\boldsymbol{n},p)_{\partial K}. \tag{17}$$

    By the definition of the space (4) when $k=1$, $\nabla\cdot\boldsymbol{\tau}$ is a constant on $K$ and can be determined by the edge DoFs $(e)$ in (8), similarly to (12). Moreover, $p|_e\in P_1(e)$; thus the boundary term can be evaluated using the DoFs $(e)$ in (8).

    When $k\ge 2$, the right-hand side of (16) can be evaluated following a procedure similar to (17): exploiting the fact that $\nabla\cdot\boldsymbol{\tau}\in P_{k-1}(K)$, we have

    $$(\boldsymbol{\tau},\nabla p)_K=-(\nabla\cdot\boldsymbol{\tau},\Pi_{k-1}p)_K+(\boldsymbol{\tau}\cdot\boldsymbol{n},p)_{\partial K}=(\boldsymbol{\tau},\nabla\Pi_{k-1}p)_K+(\boldsymbol{\tau}\cdot\boldsymbol{n},p-\Pi_{k-1}p)_{\partial K}, \tag{18}$$

    which can be evaluated using both DoF sets (e) and (i).

    Given the recovered flux $\boldsymbol{\sigma}_{\mathcal{T}}$ in Section 3, the recovery-based local error indicator $\eta_{\mathrm{flux},K}$ and the element residual $\eta_{\mathrm{res},K}$ are defined as follows:

    $$\eta_{\mathrm{flux},K}:=\|\alpha^{-1/2}(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha\nabla u_{\mathcal{T}})\|_K,\quad\text{and}\quad \eta_{\mathrm{res},K}:=\alpha_K^{-1/2}h_K\|f-\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}}\|_K, \tag{19}$$

    then

    $$\eta_K=\begin{cases}\eta_{\mathrm{flux},K} & \text{when } k=1,\\[1mm] \bigl(\eta_{\mathrm{flux},K}^2+\eta_{\mathrm{res},K}^2\bigr)^{1/2} & \text{when } k\ge 2.\end{cases} \tag{20}$$

    A computable $\hat{\eta}_{\mathrm{flux},K}$ is defined as:

    $$\hat{\eta}_{\mathrm{flux},K}:=\|\alpha_K^{-1/2}\Pi(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\|_K, \tag{21}$$

    with the oblique projection $\Pi$ defined in (16). The stabilization part $\hat{\eta}_{\mathrm{stab},K}$ is

    $$\hat{\eta}_{\mathrm{stab},K}:=|\alpha_K^{-1/2}(I-\Pi)(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})|_{S,K}. \tag{22}$$

    Here $|\cdot|_{S,K}:=\bigl(S_K(\cdot,\cdot)\bigr)^{1/2}$ is the seminorm induced by the following stabilization:

    $$S_K(\boldsymbol{v},\boldsymbol{w}):=\sum_{e\subset\partial K}h_e(\boldsymbol{v}\cdot\boldsymbol{n}_e,\boldsymbol{w}\cdot\boldsymbol{n}_e)_e+\sum_{\boldsymbol{\alpha}\in\Lambda}(\boldsymbol{v},\nabla m_{\boldsymbol{\alpha}})_K(\boldsymbol{w},\nabla m_{\boldsymbol{\alpha}})_K, \tag{23}$$

    where $\Lambda$ is the index set for the monomial basis of $P_{k-1}(K)/\mathbb{R}$, with cardinality $k(k+1)/2-1$; i.e., the second term in (23) is dropped in the $k=1$ case. We note that this is a slightly modified version of the standard stabilization for an $H(\mathrm{div})$-function in [8], as we have replaced the edge DoFs by an integral. In Section 7.1 it is shown that the integral-based stabilization still yields the crucial norm equivalence result.

    The computable error estimator ˆη is then

    $$\hat{\eta}^2=\begin{cases}\displaystyle\sum_{K\in\mathcal{T}}\bigl(\hat{\eta}_{\mathrm{flux},K}^2+\hat{\eta}_{\mathrm{stab},K}^2\bigr)=:\sum_{K\in\mathcal{T}}\hat{\eta}_K^2 & \text{when } k=1,\\[2mm] \displaystyle\sum_{K\in\mathcal{T}}\bigl(\hat{\eta}_{\mathrm{flux},K}^2+\hat{\eta}_{\mathrm{stab},K}^2+\eta_{\mathrm{res},K}^2\bigr)=:\sum_{K\in\mathcal{T}}\hat{\eta}_K^2 & \text{when } k\ge 2.\end{cases} \tag{24}$$
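    In an implementation, the local quantities in (19), (21) and (22) are scalars per element and (20)/(24) are simple sums; the following sketch (ours, with hypothetical per-element arrays) assembles $\hat{\eta}_K$ and $\hat{\eta}$.

```python
import numpy as np

def assemble_estimator(eta_flux, eta_stab, eta_res=None):
    """Combine the local pieces (21), (22) and, for k >= 2, (19) into hat-eta_K and hat-eta,
    following (24). Pass eta_res=None for the lowest order case k = 1."""
    eta_K_sq = np.asarray(eta_flux, dtype=float) ** 2 + np.asarray(eta_stab, dtype=float) ** 2
    if eta_res is not None:                       # residual part enters only when k >= 2
        eta_K_sq = eta_K_sq + np.asarray(eta_res, dtype=float) ** 2
    return np.sqrt(eta_K_sq), float(np.sqrt(eta_K_sq.sum()))

eta_K, eta_total = assemble_estimator([0.10, 0.02], [0.03, 0.01], [0.05, 0.005])
print(eta_K, eta_total)
```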

    In this section, we shall prove the proposed recovery-based estimator ˆηK is efficient by bounding it above by the residual-based error estimator. In the process of adaptive mesh refinement, only the computable ˆηK is used as the local error indicator to guide a marking strategy of choice.

    Theorem 4.1. Let $u_{\mathcal{T}}$ be the solution to problem (2), and let $\hat{\eta}_{\mathrm{flux},K}$ be the error indicator in (24). On $K\in\mathcal{T}_{\mathrm{poly}}$, $\hat{\eta}_{\mathrm{flux},K}$ can be locally bounded by the residual-based quantities:

    $$\hat{\eta}_{\mathrm{flux},K}^2\lesssim \mathrm{osc}(f;K)^2+\eta_{\mathrm{elem},K}^2+\eta_{\mathrm{edge},K}^2, \tag{25}$$

    where

    $$\mathrm{osc}(f;K)=\alpha_K^{-1/2}h_K\|f-\Pi_{k-1}f\|_K,\quad \eta_{\mathrm{elem},K}:=\alpha_K^{-1/2}h_K\|f+\nabla\cdot(\alpha\nabla u_{\mathcal{T}})\|_K,\quad\text{and}\quad \eta_{\mathrm{edge},K}:=\Bigl(\sum_{e\subset\partial K}\frac{h_e}{\alpha_K+\alpha_{K_e}}\bigl\|[\![\alpha\nabla u_{\mathcal{T}}\cdot\boldsymbol{n}_e]\!]\bigr\|_e^2\Bigr)^{1/2}.$$

    In the edge jump term, $K_e$ is the element on the opposite side of $K$ with respect to an edge $e\subset\partial K$. The constant depends on $k$ and the number of edges of $K$.

    Proof. Let $\alpha_K^{-1}\Pi(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})=:\nabla p$ on $K$; then $p\in P_k(K)/\mathbb{R}$ and we have

    $$\hat{\eta}_{\mathrm{flux},K}^2=(\Pi(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}}),\nabla p)_K=(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}},\nabla p)_K=-(\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}}),p)_K+\sum_{e\subset\partial K}\int_e(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\cdot\boldsymbol{n}_K|_e\,p\,\mathrm{d}s. \tag{26}$$

    By (11), without loss of generality we assume $K=K^-$ (the local orientation of $e$ agrees with the global one, i.e., $\boldsymbol{n}_K|_e=\boldsymbol{n}_e$) and $K_e=K^+$, the element opposite to $K$ with respect to $e$, so that $\gamma_e=\alpha_{K_e}^{1/2}/(\alpha_{K_e}^{1/2}+\alpha_K^{1/2})$; then on each edge $e\subset\partial K$ we have

    $$(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\cdot\boldsymbol{n}_e=\bigl((1-\gamma_e)\alpha_K\nabla u_{\mathcal{T}}|_K-(1-\gamma_e)\alpha_{K_e}\nabla u_{\mathcal{T}}|_{K_e}\bigr)\cdot\boldsymbol{n}_e=\frac{\alpha_K^{1/2}}{\alpha_K^{1/2}+\alpha_{K_e}^{1/2}}\,[\![\alpha\nabla u_{\mathcal{T}}\cdot\boldsymbol{n}_e]\!]_e. \tag{27}$$

    The boundary term in (26) can then be rewritten as

    $$\int_e(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\cdot\boldsymbol{n}_e\,p\,\mathrm{d}s=\int_e\frac{1}{\alpha_K^{1/2}+\alpha_{K_e}^{1/2}}\,[\![\alpha\nabla u_{\mathcal{T}}\cdot\boldsymbol{n}_e]\!]_e\,\alpha_K^{1/2}p\,\mathrm{d}s\lesssim\frac{h_e^{1/2}}{(\alpha_K+\alpha_{K_e})^{1/2}}\bigl\|[\![\alpha\nabla u_{\mathcal{T}}\cdot\boldsymbol{n}_e]\!]_e\bigr\|_e\,\alpha_K^{1/2}h_e^{-1/2}\|p\|_e. \tag{28}$$

    By a trace inequality on an edge of a polygon (Lemma 7.1) and the Poincaré inequality for $p\in P_k(K)/\mathbb{R}$, we have

    $$h_e^{-1/2}\|p\|_e\lesssim h_K^{-1}\|p\|_K+\|\nabla p\|_K\lesssim\|\nabla p\|_K.$$

    As a result,

    $$\sum_{e\subset\partial K}\int_e(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\cdot\boldsymbol{n}_e\,p\,\mathrm{d}s\lesssim\eta_{\mathrm{edge},K}\,\alpha_K^{1/2}\|\nabla p\|_K=\eta_{\mathrm{edge},K}\,\hat{\eta}_{\mathrm{flux},K}.$$

    For the bulk term on $K$ in (26), when $k=1$, by (12), the representation in (27), and the Poincaré inequality for $p\in P_k(K)/\mathbb{R}$ again with $h_K\eqsim|K|^{1/2}$, we have

    $$\begin{aligned}
    -(\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}}),p)_K&\le|\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})|\,|K|^{1/2}\|p\|_K=\frac{1}{|K|^{1/2}}\Bigl|\int_K\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\,\mathrm{d}\boldsymbol{x}\Bigr|\,\|p\|_K\\
    &=\frac{1}{|K|^{1/2}}\Bigl|\sum_{e\subset\partial K}\int_e(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\cdot\boldsymbol{n}_e\,\mathrm{d}s\Bigr|\,\|p\|_K\\
    &\lesssim\Bigl(\sum_{e\subset\partial K}\frac{\alpha_K^{1/2}}{\alpha_K^{1/2}+\alpha_{K_e}^{1/2}}\,h_e^{1/2}\bigl\|[\![\alpha\nabla u_{\mathcal{T}}\cdot\boldsymbol{n}_e]\!]_e\bigr\|_e\Bigr)\,\|\nabla p\|_K\lesssim\eta_{\mathrm{edge},K}\,\hat{\eta}_{\mathrm{flux},K}.
    \end{aligned}$$

    When $k\ge 2$, by (13),

    $$-(\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}}),p)_K=-(\Pi_{k-1}f+c_K+\nabla\cdot(\alpha_K\nabla u_{\mathcal{T}}),p)_K\lesssim\bigl(\|f-\Pi_{k-1}f\|_K+\|f+\nabla\cdot(\alpha\nabla u_{\mathcal{T}})\|_K+|c_K|\,|K|^{1/2}\bigr)\|p\|_K. \tag{29}$$

    The first two terms can be handled by combining the weights $\alpha_K^{-1/2}$ and $h_K$ coming from $\|p\|_K\lesssim h_K\|\nabla p\|_K$. For $c_K$, it can be estimated straightforwardly as follows:

    $$\begin{aligned}
    c_K|K|^{1/2}&=\frac{1}{|K|^{1/2}}\Bigl(\int_K(f-\Pi_{k-1}f)\,\mathrm{d}\boldsymbol{x}-\int_K\bigl(f+\nabla\cdot(\alpha\nabla u_{\mathcal{T}})\bigr)\,\mathrm{d}\boldsymbol{x}+\int_K\nabla\cdot(\alpha\nabla u_{\mathcal{T}})\,\mathrm{d}\boldsymbol{x}+\sum_{e\subset\partial K}\int_e\{-\alpha\nabla u_{\mathcal{T}}\}^{\gamma_e}_e\cdot\boldsymbol{n}_e\,\mathrm{d}s\Bigr)\\
    &\lesssim\|f-\Pi_{k-1}f\|_K+\|f+\nabla\cdot(\alpha\nabla u_{\mathcal{T}})\|_K+\frac{1}{|K|^{1/2}}\sum_{e\subset\partial K}\int_e\bigl(\alpha_K\nabla u_{\mathcal{T}}-\{\alpha\nabla u_{\mathcal{T}}\}^{\gamma_e}_e\bigr)\cdot\boldsymbol{n}_e\,\mathrm{d}s\\
    &\lesssim\|f-\Pi_{k-1}f\|_K+\|f+\nabla\cdot(\alpha\nabla u_{\mathcal{T}})\|_K+\sum_{e\subset\partial K}\frac{\alpha_K^{1/2}}{\alpha_K^{1/2}+\alpha_{K_e}^{1/2}}\,h_e^{-1/2}\bigl\|[\![\alpha\nabla u_{\mathcal{T}}\cdot\boldsymbol{n}_e]\!]_e\bigr\|_e.
    \end{aligned} \tag{30}$$

    The two terms on $K$ can be treated in the same way as the first two terms in (29), while the edge terms are handled similarly to the $k=1$ case. As a result, we have shown

    $$-(\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}}),p)_K\lesssim\bigl(\mathrm{osc}(f;K)+\eta_{\mathrm{elem},K}+\eta_{\mathrm{edge},K}\bigr)\,\alpha_K^{1/2}\|\nabla p\|_K,$$

    and the theorem follows.

    Theorem 4.2. Under the same setting as Theorem 4.1, let $\hat{\eta}_{\mathrm{stab},K}$ be the estimator in (22); we have

    $$\hat{\eta}_{\mathrm{stab},K}^2\lesssim \mathrm{osc}(f;K)^2+\eta_{\mathrm{elem},K}^2+\eta_{\mathrm{edge},K}^2. \tag{31}$$

    The constant depends on k and the number of edges on K.

    Proof. This theorem follows directly from the norm equivalence Lemma 7.3:

    $$|\alpha_K^{-1/2}(I-\Pi)(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})|_{S,K}\lesssim|\alpha_K^{-1/2}(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})|_{S,K},$$

    while evaluating the DoFs (e) and (i) using (11) and (15) reverts us back to the proof of Theorem 4.1.

    Theorem 4.3. Under the same setting as Theorem 4.1, on any $K\in\mathcal{T}_{\mathrm{poly}}$, with $\omega_K$ defined as the collection of elements in $\mathcal{T}$ that share at least one vertex with $K$,

    $$\hat{\eta}_K\lesssim \mathrm{osc}(f;K)+\|\alpha^{1/2}\nabla(u-u_{\mathcal{T}})\|_{\omega_K}, \tag{32}$$

    with a constant independent of $\alpha$, but dependent on $k$ and the maximum number of edges of the elements in $\mathcal{T}_{\mathrm{poly}}$.

    Proof. This is a direct consequence of Theorem 4.1 and 4.2 and the fact that the residual-based error indicator is efficient by a common bubble function argument.

    In this section, we shall prove that the computable error estimator $\hat{\eta}$ is reliable under two common assumptions in the a posteriori error estimation literature. For the convenience of the reader, we rephrase them here using a "layman" description; for more detailed and technical definitions, please refer to the literature cited.

    Assumption 1 ($\mathcal{T}$ is $l$-irregular [14]). Any given $\mathcal{T}$ is refined from a mesh with no hanging nodes by quadsecting red-refinement. For any two neighboring elements in $\mathcal{T}$, the difference in their refinement levels is at most $l$ for a uniformly bounded constant $l$; i.e., any edge $e\in\mathcal{E}$ has at most $l$ hanging nodes.

    By Assumption 1, we denote the father 1-irregular mesh of $\mathcal{T}$ by $\mathcal{T}_1$. On $\mathcal{T}_1$, a subset of all nodes is denoted by $\mathcal{N}_1$, which includes the regular nodes $\mathcal{N}_R$ on $\mathcal{T}_1$, as well as $\mathcal{N}_E$, the set of end points of edges having a hanging node as midpoint. By [14, Theorem 2.1], there exists a set of bilinear nodal basis functions $\{\phi_z\}$ associated with $z\in\mathcal{N}_1$, such that $\{\phi_z\}$ form a partition of unity and can be used to construct a Clément-type quasi-interpolation. Furthermore, the following assumption ensures that the Clément-type quasi-interpolant is robust with respect to the coefficient distribution on a vertex patch when taking nodal DoFs as a weighted average.

    Assumption 2 (Quasi-monotonicity of $\alpha$ [6]). On $\mathcal{T}$, let $\phi_z$ be the bilinear nodal basis associated with $z\in\mathcal{N}_1$, with $\omega_z:=\mathrm{supp}\,\phi_z$. For every element $K\subset\omega_z$, $K\in\mathcal{T}$, there exists a simply connected element path leading to $\omega_{m(z)}$, which is a Lipschitz domain containing the elements where the piecewise constant coefficient $\alpha$ achieves its maximum (or minimum) on $\omega_z$.

    Denote

    $$\pi_z v=\begin{cases}\dfrac{\int_{\omega_z\cap\omega_{m(z)}}v\,\phi_z}{\int_{\omega_z\cap\omega_{m(z)}}\phi_z} & \text{if } z\notin\partial\Omega,\\[2mm] 0 & \text{if } z\in\partial\Omega.\end{cases} \tag{33}$$

    We note that if $\alpha$ is a constant on $\omega_z$, then $(1,(v-\pi_z v)\phi_z)_{\omega_z}=0$. A quasi-interpolation $\mathcal{I}:L^2(\Omega)\to Q_1(\mathcal{T}_1)$ can be defined as

    $$\mathcal{I}v:=\sum_{z\in\mathcal{N}_1}(\pi_z v)\,\phi_z. \tag{34}$$

    Lemma 4.4 (Estimates for $\pi_z$ and $\mathcal{I}$). Under Assumptions 1 and 2, the following estimates hold for any $v\in H^1(\omega_K)$:

    $$\alpha_K^{1/2}h_K^{-1}\|v-\mathcal{I}v\|_K+\alpha_K^{1/2}\|\nabla\mathcal{I}v\|_K\lesssim\|\alpha^{1/2}\nabla v\|_{\omega_K}, \tag{35}$$

    and for $z\in\mathcal{N}_1$,

    $$\sum_{K\subset\omega_z}h_z^{-2}\|\alpha^{1/2}(v-\pi_z v)\phi_z\|_K^2\lesssim\|\alpha^{1/2}\nabla v\|_{\omega_z}^2, \tag{36}$$

    in which $h_z:=\max_{K\subset\omega_z}h_K$, and here $\omega_K$ denotes the union of elements in $\mathcal{T}_1$ sharing at least a node (hanging or regular) with $K$.

    Proof. The estimate for $\pi_z$ follows from [6, Lemma 2.8]. For $\mathcal{I}$, its error estimate and stability rely only on the partition of unity property of the nodal basis set $\{\phi_z\}$ (see e.g., [27]); therefore the proof follows the same argument as the one used on triangulations in [6, Lemma 2.8].

    Denote the subset of nodes $\{z\}\subset\mathcal{N}_1$ (i) on the boundary by $\mathcal{N}_{\partial\Omega}$ and (ii) whose patch $\omega_z$ carries a discontinuous coefficient $\alpha$ by $\mathcal{N}_I$. For the lowest order case, we need the following oscillation term for $f$:

    $$\mathrm{osc}(f;\mathcal{T})^2:=\sum_{z\in\mathcal{N}_1\cap(\mathcal{N}_{\partial\Omega}\cup\mathcal{N}_I)}h_z^2\|\alpha^{-1/2}f\|_{\omega_z}^2+\sum_{z\in\mathcal{N}_1\setminus(\mathcal{N}_{\partial\Omega}\cup\mathcal{N}_I)}h_z^2\|\alpha^{-1/2}(f-f_z)\|_{\omega_z}^2, \tag{37}$$

    with $f_z:=\int_{\omega_z}f\,\phi_z\,\big/\int_{\omega_z}\phi_z$.

    Theorem 4.5. Let $u_{\mathcal{T}}$ be the solution to problem (2), and let $\hat{\eta}$ be the computable error estimator in (24). Under Assumptions 1 and 2, we have for $k=1$

    $$\|\alpha^{1/2}\nabla(u-u_{\mathcal{T}})\|\lesssim\bigl(\hat{\eta}^2+\mathrm{osc}(f;\mathcal{T})^2\bigr)^{1/2}. \tag{38}$$

    For $k\ge 2$,

    $$\|\alpha^{1/2}\nabla(u-u_{\mathcal{T}})\|\lesssim\hat{\eta}, \tag{39}$$

    where the constant depends on l and k.

    Proof. Let $\varepsilon:=u-u_{\mathcal{T}}\in H^1_0(\Omega)$, and let $\mathcal{I}\varepsilon\in Q_1(\mathcal{T}_1)\subset Q_1(\mathcal{T})$ be the quasi-interpolant (34) of $\varepsilon$. Then by the Galerkin orthogonality, the fact that $\alpha\nabla u+\boldsymbol{\sigma}_{\mathcal{T}}\in H(\mathrm{div};\Omega)$, the Cauchy-Schwarz inequality, and the interpolation estimates (35), we have for $k\ge 2$:

    $$\begin{aligned}
    \|\alpha^{1/2}\nabla\varepsilon\|^2&=(\alpha\nabla(u-u_{\mathcal{T}}),\nabla(\varepsilon-\mathcal{I}\varepsilon))=(\alpha\nabla u+\boldsymbol{\sigma}_{\mathcal{T}},\nabla(\varepsilon-\mathcal{I}\varepsilon))-(\alpha\nabla u_{\mathcal{T}}+\boldsymbol{\sigma}_{\mathcal{T}},\nabla(\varepsilon-\mathcal{I}\varepsilon))\\
    &=(f-\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}},\varepsilon-\mathcal{I}\varepsilon)-(\alpha\nabla u_{\mathcal{T}}+\boldsymbol{\sigma}_{\mathcal{T}},\nabla(\varepsilon-\mathcal{I}\varepsilon))\\
    &\le\Bigl(\sum_{K\in\mathcal{T}}\alpha_K^{-1}h_K^2\|f-\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}}\|_K^2\Bigr)^{1/2}\Bigl(\sum_{K\in\mathcal{T}}\alpha_K h_K^{-2}\|\varepsilon-\mathcal{I}\varepsilon\|_K^2\Bigr)^{1/2}\\
    &\quad+\Bigl(\sum_{K\in\mathcal{T}}\alpha_K^{-1}\|\alpha\nabla u_{\mathcal{T}}+\boldsymbol{\sigma}_{\mathcal{T}}\|_K^2\Bigr)^{1/2}\Bigl(\sum_{K\in\mathcal{T}}\alpha_K\|\nabla(\varepsilon-\mathcal{I}\varepsilon)\|_K^2\Bigr)^{1/2}\\
    &\lesssim\Bigl(\sum_{K\in\mathcal{T}}\bigl(\eta_{\mathrm{res},K}^2+\eta_{\mathrm{flux},K}^2\bigr)\Bigr)^{1/2}\Bigl(\sum_{K\in\mathcal{T}}\|\alpha^{1/2}\nabla\varepsilon\|_{\omega_K}^2\Bigr)^{1/2}.
    \end{aligned}$$

    Applying the norm equivalence of η to ˆη by Lemma 7.3, as well as the fact that the number of elements in ωK is uniformly bounded by Assumption 1, yields the desired estimate.

    When $k=1$, the residual term on $K$ can be further split thanks to $\Delta Q_1(K)=\{0\}$. First we notice that, by the fact that $\{\phi_z\}$ form a partition of unity,

    $$(f,\varepsilon-\mathcal{I}\varepsilon)=\sum_{z\in\mathcal{N}_1}\sum_{K\subset\omega_z}(f,(\varepsilon-\pi_z\varepsilon)\phi_z)_K, \tag{40}$$

    in which a patch-wise constant $f_z$ (a weighted average of $f$) can be further inserted by the definition of $\pi_z$ (33) if $\alpha$ is a constant on $\omega_z$. Therefore, by the assumption of $\alpha_K$ being a piecewise constant, splitting (40), we have

    $$\begin{aligned}
    (f-\nabla\cdot\boldsymbol{\sigma}_{\mathcal{T}},\varepsilon-\mathcal{I}\varepsilon)&=(f,\varepsilon-\mathcal{I}\varepsilon)-(\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}}),\varepsilon-\mathcal{I}\varepsilon)\\
    &=\sum_{z\in\mathcal{N}_1}\sum_{K\subset\omega_z}(f,(\varepsilon-\pi_z\varepsilon)\phi_z)_K-(\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}}),\varepsilon-\mathcal{I}\varepsilon)\\
    &\lesssim\bigl(\mathrm{osc}(f;\mathcal{T})^2\bigr)^{1/2}\Bigl(\sum_{z\in\mathcal{N}_1}\sum_{K\subset\omega_z}h_z^{-2}\|\alpha^{1/2}(\varepsilon-\pi_z\varepsilon)\phi_z\|_K^2\Bigr)^{1/2}\\
    &\quad+\Bigl(\sum_{K\in\mathcal{T}}\alpha_K^{-1}h_K^2\|\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\|_K^2\Bigr)^{1/2}\Bigl(\sum_{K\in\mathcal{T}}\alpha_K h_K^{-2}\|\varepsilon-\mathcal{I}\varepsilon\|_K^2\Bigr)^{1/2}.
    \end{aligned}$$

    Applying the inverse inequality in Lemma 7.2 to $\|\nabla\cdot(\boldsymbol{\sigma}_{\mathcal{T}}+\alpha_K\nabla u_{\mathcal{T}})\|_K$ and the projection estimate (36) for $\pi_z$, the rest follows the same argument as the one used in the $k\ge 2$ case.

    The numerical experiments use the bilinear element on common AMR benchmark problems. The codes for this paper are publicly available at https://github.com/lyc102/ifem, implemented using iFEM [16]. The linear algebraic system on an $l$-irregular quadtree is implemented following the conforming prolongation approach [15] as $P^{\top}AP\,\boldsymbol{u}=P^{\top}\boldsymbol{f}$, where $A$ is the locally assembled stiffness matrix for all nodes in $\mathcal{N}$, and $\boldsymbol{u}$ and $\boldsymbol{f}$ are the solution vector associated with $\mathcal{N}_R$ and the load vector associated with $\mathcal{N}$, respectively. $P:\mathbb{R}^{\dim\mathcal{N}_R}\to\mathbb{R}^{\dim\mathcal{N}}$ is a prolongation operator mapping a conforming $H^1$-bilinear finite element function defined on the regular nodes to all nodes; it consists of an identity block for the regular nodes and a weight matrix $W$ for the hanging nodes, where $W$ is assembled locally by a recursive kNN query in $\mathcal{N}_H$, while the polygonal mesh data structure embedding is built automatically during the construction of $P$. For details we refer the readers to https://github.com/lyc102/ifem/tree/master/research/polyFEM.
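    The conforming-prolongation solve above admits a compact illustration; the following Python sketch (ours, with a made-up 3-node system in which node 2 is a hanging node constrained to the midpoint value of nodes 0 and 1) mirrors the algebra $P^{\top}AP\,\boldsymbol{u}=P^{\top}\boldsymbol{f}$, whereas the actual iFEM code assembles $W$ sparsely and locally.

```python
import numpy as np

# P maps regular-node values to all nodes: identity block for regular nodes,
# averaging weights W for hanging nodes (hypothetical values below).
n_regular = 2
W = np.array([[0.5, 0.5]])                   # hanging node = midpoint of edge (node0, node1)
P = np.vstack([np.eye(n_regular), W])        # shape: (n_all_nodes, n_regular_nodes)

A = np.array([[ 3.0, -1.0, -1.0],            # made-up SPD "stiffness" matrix on all 3 nodes
              [-1.0,  3.0, -1.0],
              [-1.0, -1.0,  3.0]])
f = np.array([1.0, 1.0, 0.0])                # made-up load vector on all nodes

A_c = P.T @ A @ P                            # condensed conforming system
f_c = P.T @ f
u_regular = np.linalg.solve(A_c, f_c)        # unknowns live on the regular nodes only
u_all = P @ u_regular                        # hanging-node value follows from the constraint
print(u_regular, u_all)
```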

    The adaptive finite element method (AFEM) iterates the standard loop

    SOLVE → ESTIMATE → MARK → REFINE.

    The linear system is solved by MATLAB mldivide. In MARK, the Dörfler $L^2$-marking is used with the local error indicator $\hat{\eta}_K$, in which a minimal subset $\mathcal{M}\subset\mathcal{T}$ is chosen such that

    $$\sum_{K\in\mathcal{M}}\hat{\eta}_K^2\ \ge\ \theta\sum_{K\in\mathcal{T}}\hat{\eta}_K^2,\quad\text{for }\theta\in(0,1).$$
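    A minimal sketch (ours) of the Dörfler marking step: sort the local indicators $\hat{\eta}_K^2$ in decreasing order and mark the smallest leading set whose cumulative sum reaches the fraction $\theta$ of the total.

```python
import numpy as np

def dorfler_mark(eta_K_sq, theta=0.3):
    """Return element indices of a minimal set M with sum_{K in M} eta_K^2 >= theta * total."""
    eta_K_sq = np.asarray(eta_K_sq, dtype=float)
    order = np.argsort(eta_K_sq)[::-1]                 # largest local indicators first
    cumulative = np.cumsum(eta_K_sq[order])
    n_marked = int(np.searchsorted(cumulative, theta * eta_K_sq.sum())) + 1
    return order[:n_marked]

# Example: with theta = 0.3 only the dominant element is marked for red-refinement.
print(dorfler_mark([0.5, 0.01, 0.2, 0.02, 0.3], theta=0.3))
```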

    Throughout all examples, we fix $\theta=0.3$. Afterwards, $\mathcal{T}$ is refined by a red-refinement quadsecting the marked elements. For comparison, we compute the standard residual-based local indicator for $K\in\mathcal{T}_{\mathrm{poly}}$:

    $$\eta_{\mathrm{Residual},K}^2:=\alpha_K^{-1}h_K^2\|f+\nabla\cdot(\alpha\nabla u_{\mathcal{T}})\|_K^2+\frac{1}{2}\sum_{e\subset\partial K}\frac{h_e}{\alpha_K+\alpha_{K_e}}\bigl\|[\![\alpha\nabla u_{\mathcal{T}}\cdot\boldsymbol{n}_e]\!]\bigr\|_e^2.$$

    Let $\eta_{\mathrm{Residual}}^2=\sum_{K\in\mathcal{T}}\eta_{\mathrm{Residual},K}^2$. The residual-based estimator $\eta_{\mathrm{Residual}}$ is computed only for comparison purposes and is not used in marking. The AFEM procedure stops when the relative error reaches a threshold. The effectivity indices of the different estimators are compared:

    $$\text{effectivity index}:=\eta\,/\,\|\alpha^{1/2}\nabla\varepsilon\|,\quad\text{where }\varepsilon:=u-u_{\mathcal{T}},\ \eta=\eta_{\mathrm{Residual}}\text{ or }\hat{\eta};$$

    i.e., the closer the effectivity index is to 1, the more accurately this estimator measures the error of interest. We use an order-5 Gaussian quadrature to compute $\|\alpha^{1/2}\nabla(u-u_{\mathcal{T}})\|$ elementwise. The orders of convergence for the various $\eta$'s and for $\|\alpha^{1/2}\nabla(u-u_{\mathcal{T}})\|$ are computed, for which $r_{\eta}$ and $r_{\mathrm{err}}$ are defined as the slopes of the linear fits of $\ln\eta_n$ and $\ln\|\alpha^{1/2}\nabla(u-u_{\mathcal{T},n})\|$ in the asymptotic regime:

    $$\ln\eta_n\approx -r_{\eta}\ln N_n+c_1,\quad\text{and}\quad\ln\|\alpha^{1/2}\nabla(u-u_{\mathcal{T},n})\|\approx -r_{\mathrm{err}}\ln N_n+c_2,$$

    where the subscript $n$ stands for the iteration number in the AFEM cycles and $N_n:=\#(\mathcal{N}_R\setminus\mathcal{N}_{\partial\Omega})$. $r_{\eta}$ and $r_{\mathrm{err}}$ are considered optimal when close to $1/2$.
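    The rates $r_{\eta}$ and $r_{\mathrm{err}}$ are obtained by a least-squares fit of the log-quantities against $\ln N_n$ in the asymptotic regime; a small sketch (ours, with made-up data) using numpy.polyfit:

```python
import numpy as np

def convergence_rate(N, quantity, skip=2):
    """Least-squares slope r such that ln(quantity) ~ -r * ln(N) + c,
    using only the asymptotic regime (skipping the first `skip` AFEM iterations)."""
    slope, _ = np.polyfit(np.log(N[skip:]), np.log(quantity[skip:]), 1)
    return -slope

# Made-up sequence decaying like N^{-1/2}, the optimal rate for the lowest order element.
N = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
eta = 2.0 * N ** (-0.5)
print(convergence_rate(N, eta))   # approximately 0.5
```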

    In this example, a standard AMR benchmark on the L-shaped domain is tested. The true solution is $u=r^{2/3}\sin(2\theta/3)$ in polar coordinates on $\Omega=(-1,1)\times(-1,1)\setminus\bigl([0,1)\times(-1,0]\bigr)$. The AFEM procedure stops when the relative error reaches 0.01. The adaptively refined mesh can be found in Figure 2a. While both estimators show the optimal rate of convergence in Figure 2b, the effectivity index is 4.52 for $\eta_{\mathrm{Residual}}$ and 2.24 for $\hat{\eta}$.

    Figure 2.  The result of the L-shape example. (a) The adaptively refined mesh with 1014 DoFs. (b) Convergence in Example 1.

    The solution $u=\tan^{-1}\bigl(\alpha(r-r_0)\bigr)$ is defined on $\Omega=(0,1)^2$ with $r:=\sqrt{(x+0.05)^2+(y+0.05)^2}$, $\alpha=100$, and $r_0=0.7$. The true solution exhibits a sharp transition layer (Figure 3a). The convergence results can be found in Figure 3b. In this example, the AFEM procedure stops when the relative error reaches 0.05. Additionally, we note that allowing $l$-irregular meshes ($l\ge 2$) makes the AMR procedure more efficient at capturing the singularity of the solution. A simple comparison can be found in Figure 4. The effectivity indices for $\eta_{\mathrm{Residual}}$ and $\hat{\eta}$ are 5.49 and 2.08, respectively.

    Figure 3.  The result of the circular wave front example. (a) uT on a 3-irregular mesh with #DoFs=1996, the relative error is 14.3%. (b) Convergence in Example 2.
    Figure 4.  Comparison of the adaptively refined meshes. (a) 1-irregular mesh, #DoFs=1083, the relative error is 21.8%. (b) 4-irregular mesh, and #DoFs=1000, the relative error is 17.8%.

    This example is a common benchmark test problem introduced in [9] (see also [17,12]) for elliptic interface problems. The true solution $u=r^{\gamma}\mu(\theta)$ is harmonic in each of the four quadrants, and $\mu(\theta)$ takes a different expression within each quadrant:

    $$\mu(\theta)=\begin{cases}\cos\bigl((\pi/2-\delta)\gamma\bigr)\,\cos\bigl((\theta-\pi/2+\rho)\gamma\bigr) & \text{if } 0\le\theta\le\pi/2,\\ \cos(\rho\gamma)\,\cos\bigl((\theta-\pi+\delta)\gamma\bigr) & \text{if } \pi/2\le\theta\le\pi,\\ \cos(\delta\gamma)\,\cos\bigl((\theta-\pi-\rho)\gamma\bigr) & \text{if } \pi\le\theta<3\pi/2,\\ \cos\bigl((\pi/2-\rho)\gamma\bigr)\,\cos\bigl((\theta-3\pi/2-\delta)\gamma\bigr) & \text{if } 3\pi/2\le\theta\le 2\pi.\end{cases}$$

    The coefficient is $\alpha=R$ in the first and third quadrants and $\alpha=1$ in the second and fourth quadrants, and the true flux $-\alpha\nabla u$ is glued together using the $H(\mathrm{div})$-continuity conditions. We choose the following set of parameters for $u$:

    $$\gamma=0.1,\quad R\approx 161.4476387975881,\quad \rho=\pi/4,\quad \delta\approx -14.92256510455152.$$

    By this choice, the function is very singular near the origin, as the maximum regularity it has is $H^{1+\gamma}_{\mathrm{loc}}(\Omega\setminus\{0\})$. Through integration by parts, it can be computed accurately that $\|\alpha^{1/2}\nabla u\|\approx 0.56501154$. For the detailed formula and more possible choices of the parameters above, we refer the reader to [17].
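    For reference, a direct transcription (ours) of the exact solution $u=r^{\gamma}\mu(\theta)$ of this Kellogg-type benchmark with the parameters listed above; $R$ and $\delta$ are the approximate values quoted in the text (with $\delta$ taken negative, as in the standard benchmark), so the snippet is illustrative rather than authoritative.

```python
import numpy as np

gamma = 0.1
rho = np.pi / 4
delta = -14.92256510455152        # approximate value quoted above

def mu(theta):
    """Angular part of the exact solution, piecewise over the four quadrants."""
    theta = np.mod(theta, 2 * np.pi)
    if theta <= np.pi / 2:
        return np.cos((np.pi / 2 - delta) * gamma) * np.cos((theta - np.pi / 2 + rho) * gamma)
    if theta <= np.pi:
        return np.cos(rho * gamma) * np.cos((theta - np.pi + delta) * gamma)
    if theta < 3 * np.pi / 2:
        return np.cos(delta * gamma) * np.cos((theta - np.pi - rho) * gamma)
    return np.cos((np.pi / 2 - rho) * gamma) * np.cos((theta - 3 * np.pi / 2 - delta) * gamma)

def u_exact(x, y):
    """u = r^gamma * mu(theta) in polar coordinates centered at the origin."""
    r = np.hypot(x, y)
    return r ** gamma * mu(np.arctan2(y, x))

print(u_exact(0.3, 0.4))
```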

    The AFEM procedure for this problem stops when the relative error reaches 0.05, and the resulting mesh and finite element approximation during the refinement can be found in Figure 5, and the AFEM procedure shows optimal rate of convergence in Figure 6. The effectivity index for ηResidual is 2.95, and 1.33 for ˆη.

    Figure 5.  The result of the Kellogg example. (a) The adaptively refined mesh with #DoFs=2001 on which the energy error is 0.0753, this number is roughly 75% of the number of DoFs needed to achieve the same accuracy if using conforming linear finite element on triangular grid (see [17,Section 4]). (b) The finite element approximation with #DoFs=1736.
    Figure 6.  The convergence result of the Kellogg example.

    A postprocessed flux with the minimum $H(\mathrm{div})$ continuity requirement is constructed for tensor-product type finite elements. The implementation can easily be ported to finite elements on quadtrees to make use of the vast existing finite element libraries in the engineering community. Theoretically, the local error indicator is efficient, and the global estimator is shown to be reliable under the assumptions that (i) the mesh has bounded irregularities, and (ii) the diffusion coefficient is a quasi-monotone piecewise constant. Numerically, we have observed that the local error indicator and the global estimator are, respectively, efficient and reliable (in the asymptotic regime). Moreover, the recovery-based estimator is more accurate than the residual-based one.

    However, we acknowledge that the technical tool involving interpolation essentially limits the reliability result to 1-irregular meshes. A simple weighted averaging has its restrictions and is hard to generalize to $hp$-finite elements, or to discretizations with curved edges/isoparametric elements. Nevertheless, we have shown that the flexibility of the virtual element framework allows further modification of the space in which we perform the flux recovery to cater to such needs.

    The author is grateful for the constructive advice from the anonymous reviewers. This work was supported in part by the National Science Foundation under grants DMS-1913080 and DMS-2136075, and no additional revenues are related to this work.

    Unlike the identity-matrix stabilization commonly used in most of the VEM literature, for $\boldsymbol{\tau}\in\mathcal{V}_k(K)$ we opt for a mass matrix/DoF hybrid stabilizer. Let $\|\boldsymbol{\tau}\|_{h,K}^2:=((\boldsymbol{\tau},\boldsymbol{\tau}))_K$ and

    $$((\boldsymbol{\sigma},\boldsymbol{\tau}))_K:=(\Pi\boldsymbol{\sigma},\Pi\boldsymbol{\tau})_K+S_K\bigl((I-\Pi)\boldsymbol{\sigma},(I-\Pi)\boldsymbol{\tau}\bigr), \tag{41}$$

    where SK(,) is defined in (23).

    To show the inverse inequality and the norm equivalence used in the reliability bound, we need to introduce some geometric measures on each element. Consider a polygonal element $K$ and an edge $e\subset\partial K$; let $l_e$ be the height measuring how far from the edge $e$ one can advance into an interior subset of $K$, and denote by $T_e\subset K$ a right triangle with height $l_e$ and base $e$.

    Proposition 1. Under Assumption 1, $\mathcal{T}_{\mathrm{poly}}$ satisfies: (1) the number of edges of every $K\in\mathcal{T}_{\mathrm{poly}}$ is uniformly bounded above; (2) for any edge $e$ on every $K$, $l_e/h_e$ is uniformly bounded below.

    Lemma 7.1 (Trace inequality on small edges [13]). If Proposition 1 holds, then for $v\in H^1(K)$ and $K\in\mathcal{T}_{\mathrm{poly}}$ we have

    $$h_e^{-1/2}\|v\|_e\lesssim h_K^{-1}\|v\|_K+\|\nabla v\|_K,\quad\text{on } e\subset\partial K. \tag{42}$$

    Proof. The proof follows essentially equation (3.9) in [13,Lemma 3.3] as a standard scaled trace inequality on e toward Te reads

    $$h_e^{-1/2}\|v\|_e\lesssim h_e^{-1}\|v\|_{T_e}+\|\nabla v\|_{T_e}\lesssim h_K^{-1}\|v\|_K+\|\nabla v\|_K.$$

    Lemma 7.2 (Inverse inequalities). Under Assumption 1, we have the following inverse estimates for $\boldsymbol{\tau}\in\mathcal{V}_k(K)$ in (4) on any $K\in\mathcal{T}_{\mathrm{poly}}$, with constants depending on $k$ and the number of edges of $K$:

    $$\|\nabla\cdot\boldsymbol{\tau}\|_K\lesssim h_K^{-1}\|\boldsymbol{\tau}\|_K,\quad\text{and}\quad\|\nabla\cdot\boldsymbol{\tau}\|_K\lesssim h_K^{-1}S_K(\boldsymbol{\tau},\boldsymbol{\tau})^{1/2}. \tag{43}$$

    Proof. The first inequality in (43) can be shown using a bubble function trick. Choose $b_K$ to be a bubble function of $T_e$, where $e$ is the longest edge of $K$. Denote $p:=\nabla\cdot\boldsymbol{\tau}\in P_{k-1}(K)$; we have

    $$\|\nabla\cdot\boldsymbol{\tau}\|_K^2\lesssim(\nabla\cdot\boldsymbol{\tau},p\,b_K)=-(\boldsymbol{\tau},\nabla(p\,b_K))\le\|\boldsymbol{\tau}\|_K\|\nabla(p\,b_K)\|_K,$$

    and then $\|\nabla(p\,b_K)\|_K$ can be estimated as follows:

    $$\|\nabla(p\,b_K)\|_K\le\|b_K\nabla p\|_K+\|p\,\nabla b_K\|_K\le\|b_K\|_{\infty,K}\|\nabla p\|_K+\|p\|_K\|\nabla b_K\|_{\infty,K}.$$

    Consequently, the first inequality in (43) follows from the above by the standard inverse estimate for polynomials $\|\nabla p\|_K\lesssim h_K^{-1}\|p\|_K$, and the properties of the bubble function $\|b_K\|_{\infty,K}=O(1)$ and $\|\nabla b_K\|_{\infty,K}=O(h_K^{-1})$.

    To prove the second inequality in (43), by integration by parts we have

    $$\|\nabla\cdot\boldsymbol{\tau}\|_K^2=(\nabla\cdot\boldsymbol{\tau},p)_K=-(\boldsymbol{\tau},\nabla p)_K+\sum_{e\subset\partial K}(\boldsymbol{\tau}\cdot\boldsymbol{n}_e,p)_e. \tag{44}$$

    Expand $\nabla\cdot\boldsymbol{\tau}=p$ in the monomial basis, $p(\boldsymbol{x})=\sum_{\boldsymbol{\alpha}\in\Lambda}p_{\boldsymbol{\alpha}}m_{\boldsymbol{\alpha}}(\boldsymbol{x})$, and denote the mass matrix $M:=\bigl((m_{\boldsymbol{\alpha}},m_{\boldsymbol{\gamma}})_K\bigr)_{\boldsymbol{\alpha}\boldsymbol{\gamma}}$ and $\boldsymbol{p}:=(p_{\boldsymbol{\alpha}})_{\boldsymbol{\alpha}\in\Lambda}$; it is straightforward to see that

    $$\|p\|_K^2=\boldsymbol{p}^{\top}M\boldsymbol{p}\gtrsim\boldsymbol{p}^{\top}\mathrm{diag}(M)\,\boldsymbol{p}\ge\min_j M_{jj}\,\|\boldsymbol{p}\|_2^2\gtrsim h_K^2\|\boldsymbol{p}\|_2^2, \tag{45}$$

    since $\int_K(x-x_K)^l(y-y_K)^m\,\mathrm{d}x\,\mathrm{d}y\ge 0$ for the off-diagonal entries of $M$, due to $K$ being geometrically a rectangle (with additional vertices). As a result, applying the trace inequality in Lemma 7.1 to (44) yields

    $$\|\nabla\cdot\boldsymbol{\tau}\|_K^2\lesssim\Bigl(\sum_{\boldsymbol{\alpha}\in\Lambda}(\boldsymbol{\tau},\nabla m_{\boldsymbol{\alpha}})_K^2\Bigr)^{1/2}\Bigl(\sum_{\boldsymbol{\alpha}\in\Lambda}p_{\boldsymbol{\alpha}}^2\Bigr)^{1/2}+\Bigl(\sum_{e\subset\partial K}h_e\|\boldsymbol{\tau}\cdot\boldsymbol{n}_e\|_e^2\Bigr)^{1/2}\Bigl(\sum_{e\subset\partial K}h_e^{-1}\|p\|_e^2\Bigr)^{1/2}\lesssim S_K(\boldsymbol{\tau},\boldsymbol{\tau})^{1/2}\bigl(\|\boldsymbol{p}\|_2+h_K^{-1}\|p\|_K+\|\nabla p\|_K\bigr).$$

    As a result, the second inequality in (43) follows upon applying an inverse inequality to $\|\nabla p\|_K$ together with the estimate (45).

    Remark 2. While the proof of Lemma 7.2 relies on $K$ being a rectangle, the result holds for a much broader class of polygons by changing the basis of $P_{k-1}(K)$ from the simple scaled monomials to the quasi-orthogonal ones in [25,7] and applying the isotropic polygon scaling result in [13].

    Lemma 7.3 (Norm equivalence). Under Assumption 1, let $\Pi$ be the oblique projection defined in (16); then the following relation holds for $\boldsymbol{\tau}\in\mathcal{V}_k(K)$ in (4) on any $K\in\mathcal{T}_{\mathrm{poly}}$:

    $$\gamma_*\|\boldsymbol{\tau}\|_K\le\|\boldsymbol{\tau}\|_{h,K}\le\gamma^*\|\boldsymbol{\tau}\|_K, \tag{46}$$

    where both $\gamma_*$ and $\gamma^*$ depend on $k$ and the number of edges of $K$.

    Proof. First we consider the lower bound, by triangle inequality,

    $$\|\boldsymbol{\tau}\|_K\le\|\Pi\boldsymbol{\tau}\|_K+\|(I-\Pi)\boldsymbol{\tau}\|_K.$$

    Since $\Pi\boldsymbol{\tau}\in\mathcal{V}_k(K)$, it suffices to establish the following in order to prove the lower bound in (46):

    $$\|\boldsymbol{\tau}\|_K^2\lesssim S_K(\boldsymbol{\tau},\boldsymbol{\tau}),\quad\text{for }\boldsymbol{\tau}\in\mathcal{V}_k(K). \tag{47}$$

    To this end, we consider the weak solution to the following auxiliary boundary value problem on K:

    $$\begin{cases}\Delta\psi=\nabla\cdot\boldsymbol{\tau} & \text{in } K,\\ \nabla\psi\cdot\boldsymbol{n}=\boldsymbol{\tau}\cdot\boldsymbol{n}_K & \text{on } \partial K.\end{cases} \tag{48}$$

    By a standard Helmholtz decomposition result (e.g., [23, Chapter 1, Proposition 3.1]), we have $\boldsymbol{\tau}-\nabla\psi=\mathbf{curl}\,\phi$. Moreover, since on $\partial K$, $0=(\boldsymbol{\tau}-\nabla\psi)\cdot\boldsymbol{n}=\mathbf{curl}\,\phi\cdot\boldsymbol{n}=\partial\phi/\partial s$, we can further choose $\phi\in H_0^1(K)$. As a result, by the assumption that $\nabla\times\boldsymbol{\tau}=0$ for $\boldsymbol{\tau}$ in the modified virtual element space (4), we can verify that

    $$\|\boldsymbol{\tau}-\nabla\psi\|_K^2=(\boldsymbol{\tau}-\nabla\psi,\mathbf{curl}\,\phi)_K=0.$$

    Consequently, we have essentially proved the unisolvency of the modified VEM space (4), and $\boldsymbol{\tau}=\nabla\psi$. We further note that $\psi$ in (48) can be chosen in $H^1(K)/\mathbb{R}$, and thus

    $$\begin{aligned}
    \|\boldsymbol{\tau}\|_K^2=(\boldsymbol{\tau},\nabla\psi)_K&=-(\nabla\cdot\boldsymbol{\tau},\psi)_K+(\boldsymbol{\tau}\cdot\boldsymbol{n}_K,\psi)_{\partial K}\\
    &\le\|\nabla\cdot\boldsymbol{\tau}\|_K\|\psi\|_K+\sum_{e\subset\partial K}\|\boldsymbol{\tau}\cdot\boldsymbol{n}_e\|_e\|\psi\|_e\\
    &\le\|\nabla\cdot\boldsymbol{\tau}\|_K\|\psi\|_K+\Bigl(\sum_{e\subset\partial K}h_e\|\boldsymbol{\tau}\cdot\boldsymbol{n}_e\|_e^2\Bigr)^{1/2}\Bigl(\sum_{e\subset\partial K}h_e^{-1}\|\psi\|_e^2\Bigr)^{1/2}.
    \end{aligned} \tag{49}$$

    Proposition 1 allows us to apply the isotropic trace inequality on an edge of a polygon (Lemma 7.1), which, combined with the Poincaré inequality for $\psi\in H^1(K)/\mathbb{R}$, gives, on every $e\subset\partial K$,

    $$h_e^{-1/2}\|\psi\|_e\lesssim h_K^{-1}\|\psi\|_K+\|\nabla\psi\|_K\lesssim\|\nabla\psi\|_K.$$

    Furthermore applying the inverse estimate in Lemma 7.2 on the bulk term above, we have

    $$\|\boldsymbol{\tau}\|_K^2\lesssim S_K(\boldsymbol{\tau},\boldsymbol{\tau})^{1/2}\|\nabla\psi\|_K,$$

    which, since $\nabla\psi=\boldsymbol{\tau}$, proves the validity of (47) and thus yields the lower bound.

    To prove the upper bound, by $\|\Pi\boldsymbol{\tau}\|_K\le\|\boldsymbol{\tau}\|_K$, it suffices to establish the reversed direction of (47) on a single edge $e$ and for a single monomial basis function $m_{\boldsymbol{\alpha}}\in P_{k-1}(K)$:

    $$h_e\|\boldsymbol{\tau}\cdot\boldsymbol{n}_e\|_e^2\lesssim\|\boldsymbol{\tau}\|_K^2,\quad\text{and}\quad|(\boldsymbol{\tau},\nabla m_{\boldsymbol{\alpha}})_K|\lesssim\|\boldsymbol{\tau}\|_K. \tag{50}$$

    To prove the first inequality, by Proposition 1 again, consider the edge bubble function $b_e$ such that $\mathrm{supp}\,b_e=\overline{T_e}$. We can let $b_e=0$ on $e'\subset\partial K$ for $e'\neq e$. It is easy to verify that:

    $$\|\nabla b_e\|_{\infty,K}=O(1/h_e),\quad\text{and}\quad\|b_e\|_{\infty,K}=O(1). \tag{51}$$

    Denote $q_e:=\boldsymbol{\tau}\cdot\boldsymbol{n}_e$, and extend it to $K$ by a constant extension in the normal direction on the rectangular strip $R_e\subset K$ with respect to $e$ (notice $\mathrm{supp}\,b_e\subset R_e$); we have

    $$\begin{aligned}
    \|\boldsymbol{\tau}\cdot\boldsymbol{n}_e\|_e^2\lesssim(\boldsymbol{\tau}\cdot\boldsymbol{n}_e,b_e q_e)_e=(\boldsymbol{\tau}\cdot\boldsymbol{n},b_e q_e)_{\partial K}&=(\nabla\cdot\boldsymbol{\tau},q_e b_e)_K+(\boldsymbol{\tau},\nabla(b_e q_e))_K\\
    &\le\|\nabla\cdot\boldsymbol{\tau}\|_K\|q_e b_e\|_{T_e}+\|\boldsymbol{\tau}\|_K\|\nabla(q_e b_e)\|_{T_e}\\
    &\lesssim\|\nabla\cdot\boldsymbol{\tau}\|_K\|q_e\|_{T_e}\|b_e\|_{\infty,K}+\|\boldsymbol{\tau}\|_K\|q_e\|_{T_e}\|\nabla b_e\|_{\infty,K}.
    \end{aligned}$$

    Now the fact that $\|q_e\|_{T_e}\lesssim h_e^{1/2}\|q_e\|_e$, the scaling of the edge bubble function in (51), and the first inverse estimate of Lemma 7.2 yield the first part of (50).

    The second inequality in (50) can be estimated straightforwardly by the scaling of the monomials in (7):

    $$|(\boldsymbol{\tau},\nabla m_{\boldsymbol{\alpha}})_K|\le\|\boldsymbol{\tau}\|_K\|\nabla m_{\boldsymbol{\alpha}}\|_K\lesssim\|\boldsymbol{\tau}\|_K\,h_K^{-1}|K|^{1/2}\lesssim\|\boldsymbol{\tau}\|_K. \tag{52}$$

    Hence, (46) is proved.



    [1] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017. https://doi.org/10.48550/arXiv.1706.03762
    [2] Q. Wang, B. Li, T. Xiao, J. Zhu, C. Li, D. F. Wong, et al., Learning deep transformer models for machine translation, preprint, arXiv: 1906.01787.
    [3] S. A. Chowdhury, A. Abdelali, K. Darwish, J. Soon-Gyo, J. Salminen, B. J. Jansen, Improving arabic text categorization using transformer training diversification, in Proceedings of the Fifth Arabic Natural Language Processing Workshop (COLING-WANLP), (2020), 226–236. https://aclanthology.org/2020.wanlp-1.21
    [4] X. Ma, P. Zhang, S. Zhang, N. Duan, Y. Hou, M. Zhou, et al., A tensorized transformer for language modeling, preprint, arXiv: 1906.09777.
    [5] J. Devlin, M. W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, preprint, arXiv: 1810.04805.
    [6] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, et al., RoBERTa: A robustly optimized BERT pretraining approach, preprint, arXiv: 1907.11692.
    [7] H. Xu, B. Liu, L. Shu, P. S. Yu, BERT post-training for review reading comprehension and aspect-based sentiment analysis, preprint, arXiv: 1904.02232.
    [8] P. Shi, J. Lin, Simple BERT models for relation extraction and semantic role labeling, preprint, arXiv: 1904.05255.
    [9] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, preprint, arXiv: 1910.01108.
    [10] Y. Cheng, D. Wang, P. Zhou, T. Zhang, Model compression and acceleration for deep neural networks: The principles, progress, and challenges, IEEE Signal Process. Mag., 35 (2018), 126–136. https://doi.org/10.1109/MSP.2017.2765695 doi: 10.1109/MSP.2017.2765695
    [11] S. Cheng, D. Lucor, J. P. Argaud, Observation data compression for variational assimilation of dynamical systems, J. Comput. Sci., 53 (2021), 101405. https://doi.org/10.1016/j.jocs.2021.101405 doi: 10.1016/j.jocs.2021.101405
    [12] S. Liu, Y. Lin, Z. Zhou, K. Nan, H. Liu, J. Du, On-demand deep model compression for mobile devices: A usage-driven model selection framework, in Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, (2018), 389–400. https://doi.org/10.1145/3210240.3210337
    [13] S. Liu, J. Du, K. Nan, Z. Zhou, H. Liu, Z. Wang, et al., AdaDeep: A usage-driven, automated deep model compression framework for enabling ubiquitous intelligent mobiles, IEEE Trans. Mob. Comput., 20 (2021), 3282–3297. https://doi.org/10.1109/TMC.2020.2999956 doi: 10.1109/TMC.2020.2999956
    [14] V. L. Tran, S. E. Kim, Efficiency of three advanced data-driven models for predicting axial compression capacity of CFDST columns, Thin-Walled Struct., 152 (2020), 106744. https://doi.org/10.1016/j.tws.2020.106744 doi: 10.1016/j.tws.2020.106744
    [15] Z. X. Hu, Y. Wang, M. F. Ge, J. Liu, Data-driven fault diagnosis method based on compressed sensing and improved multiscale network, IEEE Trans. Ind. Electron., 67 (2020), 3216–3225. https://doi.org/10.1109/TIE.2019.2912763 doi: 10.1109/TIE.2019.2912763
    [16] S. Cheng, I. C. Prentice, Y. Huang, Y. Jin, Y. K. Guo, R. Arcucci, Data-driven surrogate model with latent data assimilation: Application to wildfire forecasting, J. Comput. Phys., 464 (2022). https://doi.org/10.1016/j.jcp.2022.111302
    [17] S. Yang, Z. Zhang, C. Zhao, X. Song, S. Guo, H. Li, CNNPC: End-edge-cloud collaborative CNN inference with joint model partition and compression, IEEE Trans. Parallel Distrib. Syst., (2022), 1–1. https://doi.org/10.1109/TPDS.2022.3177782 doi: 10.1109/TPDS.2022.3177782
    [18] H. He, S. Jin, C. K. Wen, F. Gao, G. Y. Li, Z. Xu, Model-driven deep learning for physical layer communications, IEEE Wireless Commun., 26 (2019), 77–83. https://doi.org/10.1109/MWC.2019.1800447 doi: 10.1109/MWC.2019.1800447
    [19] Z. Liu, M. del Rosario, Z. Ding, A markovian model-driven deep learning framework for massive MIMO CSI feedback, IEEE Trans. Wireless Commun., 21 (2022), 1214–1228. https://doi.org/10.1109/TWC.2021.3103120 doi: 10.1109/TWC.2021.3103120
    [20] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, M. Zhou, MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers, preprint, arXiv: 2002.10957.
    [21] X. Jiao, Y. Yin, L. Shang, X. Jiang, X. Chen, L. Li, et al., TinyBERT: Distilling BERT for natural language understanding, preprint, arXiv: 1909.10351.
    [22] S. Sun, Y. Cheng, Z. Gan, J. Liu, Patient knowledge distillation for BERT model compression, preprint, arXiv: 1908.09355.
    [23] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, H. Jegou, Training data-efficient image transformers & distillation through attention, in Proceedings of the 38th International Conference on Machine Learning (ICML), (2021), 10347–10357. https://doi.org/10.48550/arXiv.2012.12877
    [24] P. Michel, O. Levy, G. Neubig, Are sixteen heads really better than one?, Adv. Neural Inf. Process. Syst., preprint, arXiv: 1905.10650.
    [25] M. A. Gordon, K. Duh, N. Andrews, Compressing BERT: Studying the effects of weight pruning on transfer learning, preprint, arXiv: 2002.08307.
    [26] T. Chen, Y. Cheng, Z. Gan, L. Yuan, L. Zhang, Z. Wang, Chasing sparsity in vision transformers: An end-to-end exploration, Adv. Neural Inf. Process. Syst., (2021), 19974–19988. https://doi.org/10.48550/arXiv.2106.04533 doi: 10.48550/arXiv.2106.04533
    [27] T. Chen, J. Frankle, S. Chang, S. Liu, Y. Zhang, Z. Wang, et al., The lottery ticket hypothesis for pre-trained BERT networks, Adv. Neural Inf. Process. Syst., (2020), 15834–15846. https://doi.org/10.48550/arXiv.2007.12223 doi: 10.48550/arXiv.2007.12223
    [28] S. Shen, Z. Dong, J. Ye, L. Ma, Z. Yao, A. Gholami, et al., Q-BERT: Hessian based ultra low precision quantization of BERT, preprint, arXiv: 1909.05840.
    [29] Z. Liu, Y. Wang, K. Han, S. Ma, W. Gao, Post-training quantization for vision transformer, preprint, arXiv: 2106.14156.
    [30] H. Bai, W. Zhang, L. Hou, L. Shang, J. Jin, X. Jiang, et al., BinaryBERT: Pushing the limit of BERT quantization, preprint, arXiv: 2012.15701.
    [31] O. Zafrir, G. Boudoukh, P. Izsak, M. Wasserblat, Q8BERT: Quantized 8Bit BERT, in the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS 2019, (2019), 36–39. https://doi.org/10.1109/EMC2-NIPS53020.2019.00016
    [32] Z. Wu, Z. Liu, J. Lin, Y. Lin, S. Han, Lite transformer with long-short range attention, preprint, arXiv: 2004.11886.
    [33] L. Hou, Z. Huang, L. Shang, X. Jiang, X. Chen, Q. Liu, DynaBERT: Dynamic BERT with adaptive width and depth, preprint, arXiv: 2004.04037.
    [34] M. Chen, H. Peng, J. Fu, H. Ling, AutoFormer: Searching transformers for visual recognition, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 12250–12260. https://doi.org/10.1109/ICCV48922.2021.01205
    [35] P. Ganesh, Y. Chen, X. Lou, M. A. Khan, Y. Yang, H. Sajjad, et al., Compressing large-scale transformer-based models: A case study on BERT, Trans. Assoc. Comput. Linguist., 9 (2021), 1061–1080. https://doi.org/10.1162/tacl_a_00413 doi: 10.1162/tacl_a_00413
    [36] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput., 9 (1997), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 doi: 10.1162/neco.1997.9.8.1735
    [37] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, preprint, arXiv: 1412.3555.
    [38] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, preprint, arXiv: 1409.0473.
    [39] B. Li, S. Pandey, H. Fang, Y. Lyv, J. Li, J. Chen, et al., FTRANS: energy-efficient acceleration of transformers using FPGA, in Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), (2020), 175–180. https://doi.org/10.1145/3370748.3406567
    [40] T. J. Ham, S. J. Jung, S. Kim, Y. H. Oh, Y. Park, Y. Song, et al., A³: Accelerating attention mechanisms in neural networks with approximation, in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), (2020), 328–341. https://doi.org/10.1109/HPCA47549.2020.00035
    [41] T. J. Ham, Y. Lee, S. H. Seo, S. Kim, H. Choi, S. J. Jung, et al., ELSA: Hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks, in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), (2021), 692–705. https://doi.org/10.1109/ISCA52012.2021.00060
    [42] X. Zhang, Y. Wu, P. Zhou, X. Tang, J. Hu, Algorithm-hardware co-design of attention mechanism on FPGA devices, ACM Trans. Embed. Comput. Syst., 20 (2021), 1–24. https://doi.org/10.1145/3477002 doi: 10.1145/3477002
    [43] S. Lu, M. Wang, S. Liang, J. Lin, Z. Wang, Hardware accelerator for multi-head attention and position-wise feed-forward in the transformer, in IEEE International SOC Conference, (2020), 84–89. https://doi.org/10.1109/ISCA52012.2021.00060
    [44] A. Parikh, O. Täckström, D. Das, J. Uszkoreit, A decomposable attention model for natural language inference, in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, (2016), 2249–2255. https://doi.org/10.48550/arXiv.1606.01933
    [45] Z. Lin, M. Feng, C. N. dos Santos, M. Yu, B. Xiang, B. Zhou, et al., A structured self-attentive sentence embedding, preprint, arXiv: 1703.03130
    [46] M. S. Charikar, Similarity estimation techniques from rounding algorithms, in Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, (2002), 380–388. https://doi.org/10.1145/509907.509965
    [47] X. Zhang, F. X. Yu, R. Guo, S. Kumar, S. Wang, S. F. Chang, Fast orthogonal projection based on kronecker product, in 2015 IEEE International Conference on Computer Vision (ICCV), (2015), 2929–2937. https://doi.org/10.1109/ICCV.2015.335
    [48] Y. Gong, S. Kumar, H. A. Rowley, S. Lazebnik, Learning binary codes for high-dimensional data using bilinear projections, in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2013), 484–491. https://doi.org/10.1109/CVPR.2013.69
    [49] M. Wang, S. Lu, D. Zhu, J. Lin, Z. Wang, A high-speed and low-complexity architecture for softmax function in deep learning, in 2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), (2018), 223–226. https://doi.org/10.1109/APCCAS.2018.8605654
    [50] R. Hu, B. Tian, S. Yin, S. Wei, Efficient hardware architecture of softmax layer in deep neural network, in 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP), (2018), 1–5. https://doi.org/10.1109/ICDSP.2018.8631588
    [51] L. Deng, G. Li, S. Han, L. Shi, Y. Xie, Model compression and hardware acceleration for neural networks: A comprehensive survey, Proc. IEEE, 108 (2020), 485–532. https://doi.org/10.1109/JPROC.2020.2976475 doi: 10.1109/JPROC.2020.2976475
    [52] C. Ding, S. Liao, Y. Wang, Z. Li, N. Liu, Y. Zhuo, et al., CirCNN: Accelerating and compressing deep neural networks using block-circulant weight matrices, in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), (2017), 395–408. https://doi.org/10.1145/3123939.3124552
    [53] S. Wang, Z. Li, C. Ding, B. Yuan, Q. Qiu, Y. Wang, et al., C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs, in Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), (2018), 11–20. https://doi.org/10.1145/3174243.3174253
    [54] L. Zhao, S. Liao, Y. Wang, Z. Li, J. Tang, B. Yuan, Theoretical properties for neural networks with weight matrices of low displacement rank, in Proceedings of the 34th International Conference on Machine Learning (ICML), (2017), 4082–4090. https://doi.org/10.48550/arXiv.1703.00144
    [55] V. Y. Pan, Structured matrices and displacement operators, in Structured Matrices and Polynomials: Unified Superfast Algorithms, Springer Science & Business Media, (2001), 117–153. https://doi.org/10.1007/978-1-4612-0129-8
    [56] J. O. Smith, Mathematics of the discrete fourier transform (DFT): with audio applications, in Mathematics of the Discrete Fourier Transform (DFT): With Audio Applications, Julius Smith, (2007), 115–164. https://ccrma.stanford.edu/~jos/st/
    [57] Z. Liu, G. Li, J. Cheng, Hardware acceleration of fully quantized BERT for efficient natural language processing, in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), (2021), 513–516. https://doi.org/10.23919/DATE51398.2021.9474043
    [58] M. Sun, H. Ma, G. Kang, Y. Jiang, T. Chen, X. Ma, et al., VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer, preprint, arXiv: 2201.06618.
    [59] Z. Liu, Z. Shen, M. Savvides, K. T. Cheng, ReActNet: Towards precise binary neural network with generalized activation functions, in Computer Vision–ECCV 2020 (ECCV), (eds. Vedaldi. A., Bischof. H., Brox. T., Frahm. J.-M.), Cham, Springer International Publishing, (2020), 143–159. https://doi.org/10.1007/978-3-030-58568-6_9
    [60] M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, XNOR-Net: ImageNet classification using binary convolutional neural networks, in Computer Vision–ECCV 2016 (ECCV), (eds. Leibe. B., Matas. J., Sebe. N., Welling. M.), Cham, Springer International Publishing, (2016), 525–542. https://doi.org/10.1007/978-3-319-46493-0_32
    [61] S. Han, H. Mao, W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, preprint, arXiv: 1510.00149.
    [62] W. Wen, C. Wu, Y. Wang, Y. Chen, H. Li, Learning structured sparsity in deep neural networks, in Advances in Neural Information Processing Systems (NeurIPS), Curran Associates, (2016). https://doi.org/10.48550/arXiv.1608.03665
    [63] X. Ma, F. M. Guo, W. Niu, X. Lin, J. Tang, K. Ma, et al., PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices, in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), (2020), 5117–5124. https://doi.org/10.1609/aaai.v34i04.5954
    [64] B. Li, Z. Kong, T. Zhang, J. Li, Z. Li, H. Liu, et al., Efficient transformer-based large scale language representations using hardware-friendly block structured pruning, preprint, arXiv: 2009.08065.
    [65] S. Cao, C. Zhang, Z. Yao, W. Xiao, L. Nie, D. Zhan, et al., Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity, in Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), (2019), 63–72. https://doi.org/10.1145/3289602.3293898
    [66] H. Peng, S. Huang, T. Geng, A. Li, W. Jiang, H. Liu, et al., Accelerating transformer-based deep learning models on FPGAs using column balanced block pruning, in 2021 22nd International Symposium on Quality Electronic Design (ISQED), (2021), 142–148. https://doi.org/10.1109/ISQED51717.2021.9424344
    [67] C. Ding, A. Ren, G. Yuan, X. Ma, J. Li, N. Liu, et al., Structured weight matrices-based hardware accelerators in deep neural networks: FPGAs and ASICs, in Proceedings of the 2018 on Great Lakes Symposium on VLSI (GLSVLSI), Chicago, IL, USA, Association for Computing Machinery, (2018), 353–358. https://doi.org/10.1145/3194554.3194625
    [68] S. Narang, E. Undersander, G. Diamos, Block-sparse recurrent neural networks, preprint, arXiv: 1711.02782.
    [69] P. Qi, E. H. M. Sha, Q. Zhuge, H. Peng, S. Huang, Z. Kong, et al., Accelerating framework of transformer by hardware design and model compression co-optimization, in 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), (2021), 1–9. https://doi.org/10.1109/ICCAD51958.2021.9643586
    [70] P. Qi, Y. Song, H. Peng, S. Huang, Q. Zhuge, E. H. M. Sha, Accommodating transformer onto FPGA: Coupling the balanced model compression and FPGA-implementation optimization, in Proceedings of the 2021 on Great Lakes Symposium on VLSI (GLSVLSI), Virtual Event, USA, Association for Computing Machinery, (2021), 163–168. https://doi.org/10.1145/3453688.3461739
    [71] D. So, Q. Le, C. Liang, The evolved transformer, in Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR, (2019), 5877–5886. https://doi.org/10.48550/arXiv.1901.11117
    [72] H. Wang, Efficient algorithms and hardware for natural language processing, Graduate Theses, Retrieved from the Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/127440.
    [73] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, V. Chandra, et al., Bit fusion: Bit-Level dynamically composable architecture for accelerating deep neural network, in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), (2018), 764–775. https://doi.org/10.1109/ISCA.2018.00069
    [74] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, et al., Templates for the solution of linear systems: Building blocks for iterative methods, in Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, Society for Industrial and Applied Mathematics, (1994), 39–55. https://doi.org/10.1137/1.9781611971538
    [75] W. Liu, B. Vinter, CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication, in Proceedings of the 29th ACM on International Conference on Supercomputing (ICS), Newport Beach, California, USA, Association for Computing Machinery, (2015), 339–350. https://doi.org/10.1145/2751205.2751209
    [76] R. Kannan, Efficient sparse matrix multiple-vector multiplication using a bitmapped format, in 20th Annual International Conference on High Performance Computing (HiPC), (2013), 286–294. https://doi.org/10.1109/HiPC.2013.6799135
    [77] W. Jiang, X. Zhang, E. H. M. Sha, L. Yang, Q. Zhuge, Y. Shi, et al., Accuracy vs. efficiency: achieving both through FPGA-implementation aware neural architecture search, in Proceedings of the 56th Annual Design Automation Conference 2019 (DAC), Las Vegas NV USA, ACM, (2019), 1–6. https://doi.org/10.1145/3316781.3317757
    [78] W. Jiang, E. H. M. Sha, X. Zhang, L. Yang, Q. Zhuge, Y. Shi, et al., Achieving super-linear speedup across multi-FPGA for real-time DNN inference, preprint, arXiv: 1907.08985.
    [79] W. Jiang, X. Zhang, E. H. M. Sha, Q. Zhuge, L. Yang, Y. Shi, et al., XFER: A novel design to achieve super-linear performance on multiple FPGAs for real-time AI, in Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), Seaside, CA, USA, Association for Computing Machinery, (2019), 305. https://doi.org/10.1145/3289602.3293988
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
