Improved YOLOv7-based steel surface defect detection algorithm

Yinghong Xie; Biao Yin; Xiaowei Han; Yan Hao

doi:10.3934/mbe.2024016

Mathematical Biosciences and Engineering

2024, Volume 21, Issue 1: 346-368. doi: 10.3934/mbe.2024016

Research article

1. School of Information Engineering, Shenyang University, Shenyang 110003, China
2. Institute for Science, Technology and Innovation, Shenyang University, Shenyang 110003, China

Academic Editor: Shangce Gao

Received: 09 October 2023 Revised: 27 November 2023 Accepted: 28 November 2023 Published: 13 December 2023

To address the limited small-target detection ability and weak generalization of the YOLOv7 algorithm, this paper proposes an improved YOLOv7 algorithm for steel surface defect detection. First, the Transformer-InceptionDWConvolution (TI) module is designed, which combines the Transformer module and InceptionDWConvolution to increase the network's ability to detect small objects. Second, the spatial pyramid pooling fast cross-stage partial channel (SPPFCSPC) structure is introduced to enhance the network training performance. Third, a global attention mechanism (GAM) is designed to optimize the network structure, weaken the irrelevant information in the defect image, and increase the algorithm's ability to detect small defects. Meanwhile, the Mish function is used as the activation function of the feature extraction network to improve the model's generalization ability and feature extraction ability. Finally, a minimum partial distance intersection over union (MPDIoU) loss function is used as the localization loss to solve the mismatch between the directions of the prediction box and the real box under the complete intersection over union (CIoU) loss. The experimental results show that on the Northeastern University Defect Detection (NEU-DET) dataset, the improved YOLOv7 network model improves the mean average precision (mAP) by 6% compared to the original algorithm, while on the VOC2012 dataset, the mAP improves by 2.6%. These results indicate that the proposed algorithm can effectively improve the detection of small defects on steel surfaces.

Citation: Yinghong Xie, Biao Yin, Xiaowei Han, Yan Hao. Improved YOLOv7-based steel surface defect detection algorithm[J]. Mathematical Biosciences and Engineering, 2024, 21(1): 346-368. doi: 10.3934/mbe.2024016

Abstract

Let $(X, \Theta)$ be a dlt pair, projective over a base scheme $S$ , and $H$ an $\mathbb{R}$ -divisor that is ample over $S$ . As we run the $(X, \Theta)$ -MMP over $S$ with scaling of $H$ as in Definition 1, at the $i$ th step there are 3 possibilities.

● (Divisorial) $X^i\stackrel{\phi_i}{\longrightarrow} Z^i = X^{i+1}$ ,

● (Flipping) $X^i\stackrel{\phi_i}{\longrightarrow}Z^i \stackrel{\phi_i^+}{\longleftarrow}(X^i)^+ = X^{i+1}$ .

● (Mixed) $X^i\stackrel{\phi_i}{\longrightarrow} Z^i$ , whose exceptional set contains a divisor, followed by a small modification $Z^i \stackrel{\psi_i}{\longleftarrow}X^{i+1}$ .

Note that the mixed case can occur only if either $X^i$ is not $\mathbb{Q}$ -factorial or $\phi_i$ contracts an extremal face of dimension $\geq 2$ . In most treatments this is avoided by working with $\mathbb{Q}$ -factorial varieties and choosing $H$ sufficiently general.

We can almost always choose the initial $X$ to be nonsingular, but frequently other considerations constrain the choice of $H$ .

Our aim is to discuss a significant special case where the $X^i$ are not $\mathbb{Q}$ -factorial and we do contract extremal faces of dimension $\geq 2$ , but still avoid the mixed case. This has several applications, some of which are discussed in Section 2.

1. Relative MMP with scaling of an exceptional divisor

Definition 1 (MMP with scaling). Let $X, S$ be Noetherian, normal schemes and $g:X\to S$ a projective morphism. Let $\Theta$ be an $\mathbb{R}$ -divisor on $X$ and $H$ an $\mathbb{R}$ -Cartier, $\mathbb{R}$ -divisor on $X$ . Assume that $K_X+\Theta+r_XH$ is $g$ -ample for some $r_X$ .

By the $(X, \Theta)$-MMP with scaling of $H$ we mean a sequence of normal, projective schemes $g_j:X^j\to S$ and birational contractions $\tau_j:X^j \dashrightarrow X^{j+1}$, together with real numbers $r_X = r_0>r_1>\cdots$, that are constructed by the following process.

$\bullet$ We start with $(X^1, \Theta^1, H^1): = (X, \Theta, H)$ and $r_0 = r_X$ . If $D$ is any $\mathbb{R}$ -divisor on $X$ , we let $D^j$ denote its birational transform on $X^j$ .

$\bullet$ If $X^j, \Theta^j, H^j$ are already defined, we let $r_j<r_{j-1}$ be the unique real number for which $K_{X^j}+\Theta^j+r_jH^j$ is $g_j$ -nef but not $g_j$ -ample. Then the $j$ th step of the MMP is a diagram

$\begin{array}{rrcll} (X^j, \Theta^j) &\stackrel{\phi_j}{\rightarrow } & Z^j & \stackrel{\psi_j}{\leftarrow} & (X^{j+1}, \Theta^{j+1})\\ g_j &\searrow &\downarrow & \swarrow &g_{j+1} \\ && S && \end{array}$

(1.1)

where

(2) $\phi_j$ is the contraction defined by $K_{X^j}+\Theta^j+r_jH^j$ ,

(3) $\psi_j$ is small, and

(4) $K_{X^{j+1}}+\Theta^{j+1}+(r_j-\epsilon)H^{j+1}$ is $g_{j+1}$ -ample for $0<\epsilon\ll 1$ .

Note that (4) implies that $H^{j+1}$ must be $\mathbb{R}$ -Cartier.
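The reason can be spelled out in one line; this is only a sketch, and $\epsilon_1, \epsilon_2$ are auxiliary symbols introduced here for illustration. Applying (4) with two different values $0<\epsilon_1<\epsilon_2\ll 1$ gives two ample, hence $\mathbb{R}$-Cartier, divisors whose difference is a nonzero multiple of $H^{j+1}$:

```latex
\begin{align*}
\bigl(K_{X^{j+1}}+\Theta^{j+1}+(r_j-\epsilon_1)H^{j+1}\bigr)
-\bigl(K_{X^{j+1}}+\Theta^{j+1}+(r_j-\epsilon_2)H^{j+1}\bigr)
= (\epsilon_2-\epsilon_1)\,H^{j+1}.
\end{align*}
```

Since $\epsilon_2-\epsilon_1\neq 0$, dividing by it shows that $H^{j+1}$ is $\mathbb{R}$-Cartier.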

In general such a diagram need not exist, but if it does, it is unique and then $X^{j+1}, \Theta^{j+1}, H^{j+1}$ satisfy the original assumptions. Thus, as far as the existence of MMP-steps is concerned, we can focus on the 1st step. In this case it is customary to drop the upper indices and write (1.1) as

$\begin{array}{rrcll} (X, \Theta) &\stackrel{\phi}{\rightarrow} & Z & \stackrel{\phi^+}{\leftarrow} & (X^+, \Theta^+)\\ g &\searrow &\downarrow & \swarrow &g^+ \\ && S && \end{array}$

(1.5)

We say that the MMP terminates with $g_j:X^j \to S$ if

(6) either $K_{X^j}+\Theta^j$ is $g_j$ -nef, in which case $(X^j, \Theta^j)$ is called a minimal model of $(X, \Theta)$ ,

(7) or $\phi_j: X^j{\to} Z^j$ exists and $\dim Z^j<\dim X^j$ ; then $\phi_j$ is called a Fano contraction.

Warning 1.8. Our terminology is slightly different from [7], where it is assumed that $X^j/Z^j$ has relative Picard number 1, and $r_j = r_{j-1}$ is allowed. In effect, we declare that the composite of all [7]-steps with the same value of $r$ is a single step for us. Thus we sometimes contract an extremal face, not just an extremal ray.

One advantage is that our MMP steps are uniquely determined by the starting data. This makes it possible to extend the theory to algebraic spaces [33].

Theorem 2 is formulated for Noetherian base schemes. We do not prove any new results about the existence of flips, but Theorem 2 says that if the MMP with scaling exists and terminates, then its steps are simpler than expected, and the end result is more controlled than expected.

On the other hand, for 3-dimensional schemes, Theorem 2 can be used to conclude that, in some important cases, the MMP runs and terminates, see Theorem 9.

Theorem 2. Let $Y$ be a Noetherian, normal scheme and $g:X\to Y$ a projective, birational morphism with reduced exceptional divisor $E = E_1+\cdots + E_n$ . Assume the following (which are frequently easy to achieve, see Paragraphs 7-8).

(i) $(X, \Theta)$ is dlt and the $E_i$ are $\mathbb{Q}$ -Cartier.

(ii) $K_X+\Theta\equiv_{g} E_\Theta$ , where $E_\Theta = \sum e_iE_i$ .

(iii) $H = \sum h_iE_i$ , where $-H$ is effective and ${\rm{supp}} H = E = {\rm{Ex(}}g{\rm{)}}$ .

(iv) $K_X+\Theta+r_XH$ is $g$ -ample for some $r_X>0$ .

(v) The $h_i$ are linearly independent over $\mathbb{Q}(e_1, \dots, e_n)$ .

We run the $(X, \Theta)$ -MMP with scaling of $H$ . Assume that we reached the $j$ th step as in (1.1). Then the following hold.

(1) ${\rm{Ex}}(\phi_j)\subset {\rm{supp}}(E^j)$ and

(a) either ${\rm{Ex}}(\phi_j)$ is an irreducible divisor and $X^{j+1} = Z^j$ ,

(b) or $\phi_j$ is small, and there are irreducible components $E^j_{i_1}, E^j_{i_2}$ of $E^j$ such that $E^j_{i_1}$ and $-E^j_{i_2}$ are both $\phi_j$ -ample.

(2) The $E^{j+1}_i$ are all $\mathbb{Q}$ -Cartier.

(3) $E_{\Theta}^{j+1}+(r_j-\epsilon)H^{j+1}$ is a $g_{j+1}$ -ample $\mathbb{R}$ -divisor supported on ${\rm{Ex}}(g_{j+1})$ for $0<\epsilon\ll 1$ .

Furthermore, if the MMP terminates with $g_m:X^m\to Y$ , then

(4) $-E^m_\Theta$ is effective, ${\rm{supp}} E^m_\Theta = g_m^{-1}\bigl(g_m({\rm{supp}} E^m_\Theta)\bigr)$ , and

(5) if $E^m_\Theta$ is effective and ${\rm{supp}} E_\Theta = E$ , then $X^m = Y$ .

Remark 2.6. In applications the following are the key points:

(a) We avoided the mixed case.

(b) In the flipping case we have both $\phi$-positive and $\phi$-negative divisors.

(d) In case (5) we end with $X^m = Y$ (not with an unknown, small modification of $Y$ ).

(e) In case (5) the last MMP step is a divisorial contraction, giving what [35] calls a Kollár component; no further flips needed.

Proof. Assertions (1-3) concern only one MMP-step, so we may as well drop the index $j$ and work with the diagram (1.5). Thus assume that $K_X+\Theta+(r+\epsilon)H$ is $g$ -ample, $K_X+\Theta+rH$ is $g$ -nef and it determines the contraction $\phi: X\to Z$ .

Let $N_1(X/Z)$ be the relative cone of curves. The $E_i$ give elements of the dual space $N^1(X/Z)$ . If $C\subset X$ is contracted by $\phi$ then we have a relation

$\sum h_i(E_i\cdot C) = -r^{-1}(E_\Theta\cdot C).$

(2.7)

By Lemma 3 this shows that the $E_i$ are proportional, as functions on $N_1(X/Z)$ . Let $C'$ be another contracted curve; set $e: = (E_\Theta\cdot C)$ and $e': = (E_\Theta\cdot C')$ . Using (2.7) for $C$ and $C'$ , we can eliminate $r$ to get that

$\sum h_i\bigl(e'(E_i\cdot C)-e(E_i\cdot C')\bigr) = 0.$

(2.8)

By the linear independence of the $h_i$ this implies that $e'(E_i\cdot C) = e(E_i\cdot C')$ for every $i$ . That is, all contracted curves are proportional, as functions on $\langle E_1, \dots, E_n\rangle_{\mathbb{R}}\cong \mathbb{R}^n$ . Informally speaking, as far as the $E_i$ are concerned, $N_1(X/Z)$ behaves as if it were 1-dimensional.
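For convenience, here is a sketch of how (2.8) follows from (2.7), using only the notation already introduced. Write (2.7) for $C$ and for $C'$, multiply the first relation by $e'$ and the second by $e$, and subtract:

```latex
\begin{align*}
e'\sum h_i(E_i\cdot C) &= -r^{-1}\,e'e,\\
e\sum h_i(E_i\cdot C') &= -r^{-1}\,ee',\\
\sum h_i\bigl(e'(E_i\cdot C)-e(E_i\cdot C')\bigr) &= -r^{-1}(e'e-ee') = 0.
\end{align*}
```

Since the $E_i$ are $\mathbb{Q}$-Cartier, the intersection numbers $(E_i\cdot C)$ and $(E_i\cdot C')$ are rational, and $e = \sum e_k(E_k\cdot C)$, $e' = \sum e_k(E_k\cdot C')$ lie in $\mathbb{Q}(e_1, \dots, e_n)$. Thus the coefficients of the $h_i$ in (2.8) lie in $\mathbb{Q}(e_1, \dots, e_n)$, which is why assumption (v) applies.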

Assume first that $\phi$ contracts some divisor, call it $E_1$. Then $(E_1\cdot C)<0$ for some contracted curve $C\subset E_1$, hence $(E_1\cdot C')<0$ for every contracted curve $C'$. Thus ${\rm{Ex}}(\phi) = E_1$. We also know that

$\phi_*(E_\Theta+rH) = \sum_{i > 1} (e_i+rh_i) \phi_*(E_i)$

is $\mathbb{R}$ -Cartier on $Z$ and $Z/Y$ -ample, where $r$ is computed by (2.7). So, by Lemma 4, the $\{e_i+rh_i: i>1\}$ are linearly independent over $\mathbb{Q}$ , hence the $\phi_*(E_i)$ are $\mathbb{Q}$ -Cartier on $Z$ by Lemma 5. Thus $\phi_*(E_\Theta) = \sum_{i>1} e_i \phi_*(E_i)$ is $\mathbb{R}$ -Cartier, hence $X^1 = Z$ . This proves (2-3) in the divisorial contraction case.

Otherwise $\phi$ is small, let $C$ be a contracted curve. Since $(H\cdot C)>0$ , we get that $(E_1\cdot C)<0$ for some $E_1$ . So $C\subset E_1$ . By [22,3.39] $E_\Theta+rH$ is anti-effective and

$g^{-1}\bigl(g({\rm{supp}}(E_\Theta+rH))\bigr) = {\rm{supp}}(E_\Theta+rH).$

(2.9)

If $E_1$ has coefficient 0 in $E_\Theta+rH$ then let $C_1\subset E_1$ be any curve contracted by $g$ and not contained in the other $E_i$ for $i>1$. Then $C_1$ is disjoint from ${\rm{supp}}(E_\Theta+rH)$ by (2.9), hence $\bigl(C_1\cdot (E_\Theta+rH)\bigr) = 0$. Varying $C_1$ shows that $E_1$ is contracted by $\phi$, a contradiction.

Thus $E_1$ appears in $E_\Theta+rH$ with negative coefficient, contributing a positive term to the intersection $\bigl((E_\Theta+rH)\cdot C\bigr) = 0$ . So there is another divisor $E_2\subset {\rm{Ex(}}g{\rm{)}}$ such that $(E_2\cdot C)>0$ . This shows (1.b).

Assume next that the flip $\phi^+:X^+\to Z$ exists. Since $\phi^+$ is small, ${\rm{supp}}(E^+_\Theta+rH^+)$ contains all $X^+/Y$ -exceptional divisors. In particular, $E^+_\Theta+(r-\epsilon)H^+$ is still anti-effective for $0<\epsilon\ll 1$ . By definition $E^+_\Theta+(r-\epsilon)H^+$ is $X^+/Y$ -ample and its support is the whole $X^+/Y$ -exceptional locus. Thus we also have (2-3) in the flipping case.

Finally, if the MMP terminates with $g_m:X^m\to Y$ then $E^m_\Theta$ is a $g_m$ -nef, exceptional $\mathbb{R}$ -divisor. Thus $-E^m_\Theta$ is effective and ${\rm{supp}} E^m_\Theta = g_m^{-1}\bigl(g_m({\rm{supp}} E^m_\Theta)\bigr)$ by [22,3.39], proving (4). In case (5) this implies that ${\rm{Ex(}}{g_m}{\rm{)}}$ does not contain any divisor, but, by (3) it supports a $g_m$ -ample divisor. Thus $\dim {\rm{Ex(}}{g_m}{\rm{)}} = 0$ , hence $X^m = Y$ .

Lemma 3. Let $V$ be a $K$ -vectorspace with vectors $v_i\in V$ . Let $L/K$ be a field extension and $h_1, \dots, h_n\in L$ linearly independent over $K$ . Assume that

$\sum\nolimits_{i = 1}^n h_iv_i = \gamma v_0 \quad\text{for some}\quad \gamma\in L.$

Then $\dim_K \langle v_1, \dots, v_n\rangle \leq 1$ .

Proof. We may assume that $\dim V = 2$ . Choose a basis and write $v_i = (a_i, b_i)$ . Then

$\sum\nolimits_{i = 1}^n h_ia_i = \gamma a_0\quad\text{and}\quad\sum\nolimits_{i = 1}^n h_ib_i = \gamma b_0.$

This gives that

$\sum\nolimits_{i = 1}^n h_i(b_0a_i-a_0b_i) = 0.$

Since the $h_i$ are linearly independent over $K$ , this implies that $b_0a_i-a_0b_i = 0$ for every $i$ . That is $v_i\cdot (b_0, -a_0)^t = 0$ for every $i$ .
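A small numerical illustration of this elimination may help; the specific values are chosen here only for illustration and do not come from the text. Take $K = \mathbb{Q}$, $L = \mathbb{R}$, $n = 2$, $h_1 = 1$, $h_2 = \sqrt{2}$, and $v_0 = (1, 2)$, $v_1 = (3, 6)$, $v_2 = (2, 4)$, so that the hypothesis holds with $\gamma = 3+2\sqrt{2}$:

```latex
\begin{align*}
h_1v_1+h_2v_2 &= (3+2\sqrt{2},\; 6+4\sqrt{2}) = (3+2\sqrt{2})\,v_0,\\
b_0a_1-a_0b_1 &= 2\cdot 3-1\cdot 6 = 0,\\
b_0a_2-a_0b_2 &= 2\cdot 2-1\cdot 4 = 0.
\end{align*}
```

Both $v_1$ and $v_2$ are orthogonal to $(b_0, -a_0) = (2, -1)$, hence lie on the line spanned by $v_0$, as the lemma predicts.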

Lemma 4. Let $L/K$ be a field extension and $h_0, \dots, h_n\in L$ linearly independent over $K$ . Let $\gamma^{-1} = \sum_{i = 0}^n r_ih_i$ for some $r_i\in K$ with $r_0\neq 0$ . Then, for any $e_i\in K$ , the $e_1+\gamma h_1, \dots, e_n+ \gamma h_n$ are linearly independent over $K$ .

Proof. Assume that $\sum_{i = 1}^n s_i (e_i+\gamma h_i) = 0$ , where $s_i\in K$ . It rearranges to

$\sum\nolimits_{i = 1}^n s_i h_i = -\bigl(\sum\nolimits_{i = 1}^n s_i e_i\bigr)\cdot \sum\nolimits_{i = 0}^n r_ih_i.$

If $\sum_{i = 1}^n s_i e_i = 0$ then the $s_1, \dots, s_n$ are all zero since the $h_i$ are linearly independent over $K$ . Otherwise we get a contradiction since the coefficient of $h_0$ is nonzero.
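The role of the assumption $r_0\neq 0$ can be seen in a small example; the values are illustrative choices, not taken from the text. Take $K = \mathbb{Q}$, $L = \mathbb{R}$, $n = 1$, $h_0 = 1$ and $h_1 = \sqrt{2}$, and compare two choices of $\gamma$:

```latex
\begin{align*}
\gamma^{-1} = h_0 = 1 \ (r_0 = 1,\ r_1 = 0):\quad
  & e_1+\gamma h_1 = e_1+\sqrt{2}\neq 0 \ \text{ for every } e_1\in\mathbb{Q},\\
\gamma^{-1} = h_1 = \sqrt{2} \ (r_0 = 0,\ r_1 = 1):\quad
  & e_1+\gamma h_1 = e_1+1 = 0 \ \text{ for } e_1 = -1.
\end{align*}
```

In the first case the single element $e_1+\gamma h_1$ is nonzero, hence linearly independent over $\mathbb{Q}$, in agreement with the lemma. In the second case the conclusion fails, so the condition $r_0\neq 0$ cannot be dropped.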

The following is a slight modification of [3,Lem.1.5.1]; see also [17,5.3].

Lemma 5. Let $X$ be a normal scheme, $D_i$ $\mathbb{Q}$ -divisors and $d_1, \dots, d_n\in \mathbb{R}$ linearly independent over $\mathbb{Q}$ . Then $\sum d_iD_i$ is $\mathbb{R}$ -Cartier iff each $D_i$ is $\mathbb{Q}$ -Cartier.

Comments on $\mathbb{Q}$ -factoriality. Theorem 2 may sound unexpected from the MMP point of view, but it is quite natural if one starts with the following conjecture, which is due, in various forms, to Srinivas and myself, cf. [26].

Conjecture 6. Let $X$ be a normal variety, $x\in X$ a closed point and $\{D^X_i: i\in I\}$ a finite set of prime divisors on $X$ . Then there is a normal variety $Y$ , a closed point $y\in Y$ and prime divisors $\{D^Y_i: i\in I\}$ on $Y$ such that the following hold.

(1) The class group of the local ring $\mathcal{O}_{y, Y}$ is generated by $K_Y$ and the $D^Y_i$ .

(2) The completion of $( X, \sum D^X_i)$ at $x$ is isomorphic to the completion of $( Y, \sum D^Y_i)$ at $y$ .

Using [30,Tag 0CAV] one can reformulate (6.2) as a finite type statement:

(3) There are elementary étale morphisms

$(x, X, \sum D^X_i) \leftarrow (u, U, \sum D^U_i) \to (y, Y, \sum D^Y_i).$

Almost all resolution methods commute with étale morphisms, thus if we want to prove something about a resolution of $X$ , it is likely to be equivalent to a statement about resolutions of $Y$ . In particular, if something holds for the $\mathbb{Q}$ -factorial case, it should hold in general. This was the reason why I believed that Theorem 2 should work out.

A positive answer to Conjecture 6 (for $I = \emptyset$ ) is given for isolated complete intersections in [26] and for normal surface singularities in [27].

(Note that [27] uses an even stronger formulation: Every normal, analytic singularity has an algebraization whose class group is generated by the canonical class. This is, however, not true, since not every normal, analytic singularity has an algebraization.)

Existence of certain resolutions.

7 (The assumptions 2.i-v). In most applications of Theorem 2 we start with a normal pair $(Y, \Delta_Y)$ where $\Delta_Y$ is a boundary, and want to find $g:X\to Y$ and $\Theta$ that satisfy the conditions (2.i-v).

Typically we choose a log resolution $g:X\to (Y, \Delta_Y)$ . That is, $g$ is birational, $X$ is regular, $\Delta_X: = g^{-1}_*\Delta_Y$ , $E = {\rm{Ex(}}g{\rm{)}}$ and $E+\Delta_X$ is a simple normal crossing divisor. Then we choose $\Delta_X\leq \Theta\leq E+\Delta_X$ ; that is, we are free to choose the coefficients of the $E_i$ in $[0, 1]$ . Then (2.i) holds and if $K_Y+\Delta_Y$ is $\mathbb{R}$ -Cartier then so does (2.ii). There are also situations where one can use the theorem to show that numerical equivalence in (2.ii) implies $\mathbb{R}$ -linear equivalence; see [6,9.12].

We want $K_X+\Theta +rH$ to be $g$ -ample for some $r$ , which is easiest to achieve if $H$ is $g$ -ample. Thus we would like $H$ to be $g$ -ample and $g$ -exceptional for (2.iii-iv) to hold. If $X$ is regular (or at least $\mathbb{Q}$ -factorial) then we can wiggle the coefficients of $H$ to achieve (2.v).

The existence of a $g$-ample and $g$-exceptional $\mathbb{Q}$-divisor is somewhat subtle; we discuss it next.

8 (Ample, exceptional divisors). Assume that we blow up an ideal sheaf $I\subset \mathcal{O}_Y$ to get $\pi_1:Y_1\to Y$ . The constant sections of $\mathcal{O}_Y$ give an isomorphism $\mathcal{O}_{Y_1}(1)\cong \mathcal{O}_{Y_1}(-E_1)$ where $E_1$ is supported on $\pi_1^{-1}{\rm{supp}} (\mathcal{O}_Y/I)$ . Thus, if $Y$ is normal and ${\rm{supp}} (\mathcal{O}_Y/I)$ has codimension $\geq 2$ , then $E_1$ is $\pi_1$ -ample and $\pi_1$ -exceptional. A composite of such blow-ups also has an ample, exceptional $\mathbb{Q}$ -divisor. Since Hironaka-type resolutions use only such blow-ups, we get the following. (See [32] for the most general case and [18] for an introduction.)

Claim 8.1. Let $Y$ be a Noetherian, quasi-excellent scheme over a field of characteristic zero. Then any proper, birational $Y'\to Y$ is dominated by a log resolution $g:X\to Y$ that has a $g$ -ample and $g$ -exceptional $\mathbb{Q}$ -divisor.

Resolution of singularities is also known for 3-dimensional excellent schemes [10], but in its original form it does not guarantee projectivity in general. Nonetheless, combining [6,2.7] and [23,Cor.3] we get the following.

Claim 8.2. Let $Y$ be a normal, integral, quasi-excellent scheme of dimension at most three that is separated and of finite type over an affine, quasi-excellent scheme $S$ . Then any proper, birational $Y'\to Y$ is dominated by a log resolution $g:X\to Y$ that has a $g$ -ample and $g$ -exceptional $\mathbb{Q}$ -divisor.

2. Applications

Next we mention some applications. In each case we use Theorem 2 to modify the previous proofs to get more general results. We give only some hints as to how this is done; we refer to the original papers for definitions and details of proofs.

The first two applications are to dlt 3-folds. In both cases Theorem 2 allows us to run MMP in a way that works in every characteristic and also for bases that are not $\mathbb{Q}$ -factorial.

Relative MMP for dlt 3-folds.

Theorem 9. Let $(Y, \Delta)$ be a 3-dimensional, normal, Noetherian, excellent pair such that $K_Y+\Delta$ is $\mathbb{R}$ -Cartier and $\Delta$ is a boundary. Let $g:X\to Y$ be a log resolution with exceptional divisor $E = \sum E_i$ . Assume that $E$ supports a $g$ -ample $\mathbb{R}$ -divisor $H$ (we can then choose its coefficients sufficiently general).

Then the MMP over $Y$ , starting with $(X^0, \Theta^0): = (X, E+g^{-1}_*\Delta)$ with scaling of $H$ runs and terminates with a minimal model $g_m:(X^m, \Theta^m)\to Y$ . Furthermore,

(1) each step $X^i \dashrightarrow X^{i+1}$ of this MMP is

(a) either a contraction $\phi_i:X^i\to X^{i+1}$ , whose exceptional set is an irreducible component of $E^i$ ,

(b) or a flip $X^i\stackrel{\phi_i}{\longrightarrow}Z^i \stackrel{\psi_i}{\longleftarrow}(X^i)^+ = X^{i+1}$, and there are irreducible components $E^i_{i_1}, E^i_{i_2}$ such that $E^i_{i_1}$ and $-E^i_{i_2}$ are both $\phi_i$-ample,

(2) ${\rm{Ex(}}{g_m}{\rm{)}}$ supports a $g_m$ -ample $\mathbb{R}$ -divisor, and

(3) if either $(Y, \Delta)$ is plt, or $(Y, \Delta)$ is dlt and $g$ is thrifty [20,2.79], then $X^m = Y$ .

Proof. Assume first that the MMP steps exist and the MMP terminates. Note that

$\begin{array}{rll} K_X+E+g^{-1}_*\Delta& \sim_{\mathbb{R}} &g^*(K_Y+\Delta)+\sum\limits_j\bigl(1+a(E_j, Y, \Delta)\bigr) E_j\\ &\sim_{g, \mathbb{R}} & \sum\limits_j\bigl(1+a(E_j, Y, \Delta)\bigr) E_j = :E_\Theta. \end{array}$

We get from Theorem 2 that (1.a-b) are the possible MMP-steps, and (2-3) follow from Theorem 2 (3-5).
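The first line of the display in this proof is the definition of the discrepancies $a(E_j, Y, \Delta)$, with $E = \sum_j E_j$ added to both sides. As a sketch:

```latex
\begin{align*}
K_X+g^{-1}_*\Delta &\sim_{\mathbb{R}} g^*(K_Y+\Delta)+\sum_j a(E_j, Y, \Delta)\,E_j,\\
K_X+E+g^{-1}_*\Delta &\sim_{\mathbb{R}} g^*(K_Y+\Delta)+\sum_j \bigl(1+a(E_j, Y, \Delta)\bigr)E_j.
\end{align*}
```

Since $g^*(K_Y+\Delta)$ is numerically $g$-trivial, assumption (2.ii) of Theorem 2 holds with $e_j = 1+a(E_j, Y, \Delta)$.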

For existence and termination, all details are given in [6,9.12].

However, I would like to note that we are in a special situation, which can be treated with the methods that are in [1,29], at least when the closed points of $Y$ have perfect residue fields.

The key point is that everything happens inside $E$ . We can thus understand the whole MMP by looking at the 2-dimensional scheme $E$ . This is easiest for termination, which follows from [1,Sec.7].

Contractions for reducible surfaces have been treated in [1,Secs.11-12], see also [12,Chap.6] and [31].

The presence of $E^i_{i_1}, E^i_{i_2}$ means that the flips are rather special; called 1-complemented flips in [29] and easy flips in [1,Sec.20]. I believe that the methods of [1,29] prove the existence of 1-complemented 3-fold flips in our case; but the details have not been written down.

The short note [34] explains how [15,3.4] gives 1-complemented 3-fold flips; see [16,3.1 and 4.3] for stronger results.

Inversion of adjunction for 3-folds. Using Theorem 9 we can remove the $\mathbb{Q}$-factoriality assumption from [15,Cor.1.5]. The characteristic 0 case, in all dimensions, was proved in [1,17.6].

Corollary 10. Let $(X, S+\Delta)$ be a 3-dimensional, normal, Noetherian, excellent pair. Assume that $X$ is normal, $S$ is a reduced divisor, $\Delta$ is effective and $K_X+ S+\Delta$ is $\mathbb{R}$ -Cartier. Let $\bar S\to S$ denote the normalization. Then $(\bar S, {\rm{Diff}}_{\bar S}\Delta)$ is klt iff $(X, S+\Delta)$ is plt near $S$ .

This implies that one direction of Reid's classification of terminal singularities using 'general elephants' [28,p.393] works in every characteristic. This could be useful in extending [2] to characteristics $\geq 5$ .

Corollary 11. Let $(X, S)$ be a 3-dimensional pair. Assume that $X$ is normal, $K_X+ S$ is Cartier, $X$ and $S$ have only isolated singularities, and the normalization $\bar S$ of $S$ has canonical singularities. Then $X$ has terminal singularities in a neighborhood of $S$ .

Divisor class group of dlt singularities. The divisor class group of a rational surface singularity is finite by [24], and [8] plus an easy argument shows that the divisor class group of a rational 3-dimensional singularity is finitely generated. Thus the divisor class group of a 3-dimensional dlt singularity is finitely generated in characteristic $\geq 7$ , using [4,Cor.1.3]. Theorem 9 leads—via [21,B.14]—to the following weaker result, which is, however, optimal in characteristics $2, 3, 5$ ; see [9] for an application.

Proposition 12. [21,B.1] Let $(y, Y, \Delta)$ be a 3-dimensional, Noetherian, excellent, dlt singularity with residue characteristic $p>0$ . Then the prime-to- $p$ parts of ${\rm{Cl}}(Y), {\rm{Cl}}(Y^{\rm h})$ and of ${\rm{Cl}}(\hat Y)$ are finitely generated, where $Y^{\rm h}$ denotes the henselisation and $\hat Y$ the completion.

It seems reasonable to conjecture that the same holds in all dimensions, see [21,B.6].

Grauert-Riemenschneider vanishing. One can prove a variant of the Grauert-Riemenschneider (abbreviated as G-R) vanishing theorem [13] by following the steps of the MMP.

Definition 13 (G-R vanishing). Let $(Y, \Delta_Y)$ be a pair, $Y$ normal, $\Delta_Y$ a boundary (that is, all coefficients are in $[0, 1]$ ) and $g:X\to Y$ a proper, birational morphism with $X$ normal. For an $\mathbb{R}$ -divisor $F$ on $X$ let ${\rm{Ex}}(F)$ denote its $g$ -exceptional part. Assume that $Y$ has a dualizing complex. We say that G-R vanishing holds for $g:X\to (Y, \Delta_Y)$ if the following is satisfied.

Let $D$ be a $\mathbb{Z}$ -divisor and $\Delta_X$ an effective $\mathbb{R}$ -divisor on $X$ . Assume that

(1) $D\sim_{g,\mathbb{R}} K_{X}+\Delta_X$ , and

(2) $g_*\Delta_X\leq \Delta_Y$ , $\left\lfloor\operatorname{Ex}\left(\Delta_{X}\right)\right\rfloor = 0$ .

Then $R^ig_*\mathcal{O}_{X}(D) = 0$ for $i>0$ .

We say that G-R vanishing holds over $(Y, \Delta_Y)$ if G-R vanishing holds for every log resolution $g:(X, E+g^{-1}_*\Delta_Y)\to (Y, \Delta_Y)$ .

By an elementary computation, if $X$ is regular, $W\subset X$ is regular and G-R vanishing holds for $X\to Y$ then it also holds for the blow-up $B_WX\to Y$ . This implies that if G-R vanishing holds for one log resolution of $(Y, \Delta_Y)$ , then it holds for every log resolution; see [5,Sec.1.3].

If $Y$ is essentially of finite type over a field of characteristic 0, then G-R vanishing is a special case of the general Kodaira-type vanishing theorems; see [22,2.68].

G-R vanishing also holds over 2-dimensional, excellent schemes by [24]; see [20,10.4]. In particular, if $Y$ is any normal, excellent scheme, then the support of $R^ig_*\mathcal{O}_{X}(D)$ has codimension $\geq 3$ for $i>0$.

However, G-R vanishing fails for 3-folds in every positive characteristic, as shown by cones over surfaces for which Kodaira's vanishing fails. Thus the following may be the type of G-R vanishing result that one can hope for.

Theorem 14. [5] Let $Y$ be a 3-dimensional, excellent, dlt pair with a dualizing complex. Assume that closed points of $Y$ have perfect residue fields of characteristic $\neq 2, 3, 5$ . Then G-R vanishing holds over $Y$ .

Proof. Let $(Y, \Delta_Y)$ be a 3-dimensional, dlt pair, and $g:X\to Y$ a log resolution. With $D$ as in Definition 13 we need to show that $R^jg_*\mathcal{O}_{X}(D) = 0$ for $j>0$ . Let $g_i:X^i\to Y$ be the MMP steps as in Theorem 9. The natural idea would be to show that the sheaves $R^j(g_i)_*\mathcal{O}_{X^i}(D^i)$ are independent of $i$ . At the end then we have an isomorphism $g_m:X^m\cong Y$ , hence $R^j(g_m)_*\mathcal{O}_{X^m}(D^m) = 0$ for $j>0$ .

A technical problem is that we seem to need various rationality properties of the singularities of the $X^i$ . Therefore, we show instead that, if G-R vanishing holds over $X^i$ and $X^i$ satisfies (15.1-3), then G-R vanishing also holds over $X^{i+1}$ . Then Theorem 15 gives that $X^{i+1}$ also satisfies (15.1-3), and the induction can go ahead.

For divisorial contractions $X^i\to X^{i+1}$ with exceptional divisor $S$ this is straightforward: the method of [14,Sec.3] shows that if Kodaira vanishing holds for $S$ then G-R vanishing holds for $X^i\to X^{i+1}$. This is where the ${\rm{char}} \neq 2, 3, 5$ assumption is used: Kodaira vanishing can fail for del Pezzo surfaces if ${\rm{char}} = 2, 3, 5$; see [4].

For flips $X^i\to Z^i\leftarrow X^{i+1}$ the argument works in any characteristic. First we show as above that G-R vanishing holds over $Z^i$. Going to $X^{i+1}$ is a spectral sequence argument involving $\psi_i:X^{i+1}\to Z^i$. For 3-folds the only nontrivial term is $R^1(\psi_i)_*\mathcal{O}_{X^{i+1}}(D^{i+1})$, and no unexpected cancellations occur; see [5,Lem.21].

From G-R vanishing one can derive various rationality properties for all excellent dlt pairs. This can be done by following the method of 2 spectral sequences as in [19] or [20,7.27]; see [5] for an improved version.

Theorem 15. [5] Let $(X, \Delta)$ be an excellent dlt pair such that G-R vanishing and resolution of singularities hold over $(X, \Delta)$ . Then

(1) $X$ has rational singularities.

(2) Every irreducible component of $\left\lfloor \Delta \right\rfloor$ is normal and has rational singularities.

(3) Let $D$ be a $\mathbb{Z}$ -divisor on $X$ such that $D+ \Delta_D$ is $\mathbb{R}$ -Cartier for some $0\leq \Delta_D\leq \Delta$ . Then $\mathcal{O}_X(D)$ is CM.

See [5,12] for the precise resolution assumptions needed. The conclusions are well known in characteristic 0, see [22,5.25], [12,Sec.3.13] and [20,7.27]. For 3-dimensional dlt varieties in ${\rm{char}} \geq 7$ , the first claim was proved in [4,14].

The next two applications are in characteristic 0.

Dual complex of a resolution. Our results can be used to remove the $\mathbb{Q}$ -factoriality assumption from [11,Thm.1.3]. We refer to [11] for the definition of a dual complex and the notion of collapsing of a regular cell complex. We start with the weaker form, Corollary 16, and then state and outline the proof of the stronger version, Theorem 17.

Corollary 16. Let $(Y, \Delta)$ be a dlt variety over a field of characteristic 0 and $g:X\to Y$ a thrifty log resolution whose exceptional set supports a $g$-ample divisor. For a closed point $y\in Y$ let $E_y\subset g^{-1}(y)$ denote the divisorial part. Then ${\mathcal D}(E_y)$ is collapsible to a point (or it is empty).

Theorem 17. Let $(Y, \Delta)$ be a dlt variety over a field of characteristic 0 and $g:X\to Y$ a projective, birational morphism with exceptional set $E = \cup_i E_i$. For $y\in Y$ let $E_y\subset g^{-1}(y)$ denote the divisorial part. Assume that

(1) $(X, E+g^{-1}_*\Delta)$ is dlt and the $E_i$ are $\mathbb{Q}$ -Cartier.

(2) $a(E_i, Y, \Delta)>-1$ for every $i$ .

(3) $E$ supports a $g$ -ample divisor.

Then ${\mathcal D}(E_y)$ is collapsible to a point (or it is empty).

Proof. Fix $y\in Y$ . We may assume that $(y, Y)$ is local and, after passing to an elementary étale neighborhood (cf. [30,Tag 02LD]) of $y\in Y$ , we may also assume that $g^{-1}(y)\cap E_i$ is connected for every irreducible exceptional divisor $E_i$ (cf. [30,Tag 04HF]).

Let us now run the $(X, E+g^{-1}_*\Delta)$ -MMP with scaling of a $g$ -ample $\mathbb{R}$ -divisor $H$ that is supported on $E$ and has sufficiently general coefficients. Theorem 2 applies, as we observed during the proof of Theorem 9.

Note that ${\mathcal D}(E_y)\subset {\mathcal D}(E)$ is a full subcomplex (that is, a simplex is in ${\mathcal D}(E_y)$ iff all of its vertices are), hence an elementary collapse of ${\mathcal D}(E)$ induces an elementary collapse (or an isomorphism) on ${\mathcal D}(E_y)$ . Thus it is enough to show that ${\mathcal D}(E)$ is collapsible to a point (or it is empty).

We claim that each MMP-step as in Theorem 2 induces either a collapse or an isomorphism of ${\mathcal D}(E)$ .

By [11,Thm.19] we get an elementary collapse (or an isomorphism) if there is a divisor $E^j_i\subset E^j$ that has positive intersection with the $\phi_j$ -contracted curves. This takes care of flips by Theorem 9.b and most divisorial contractions.

It remains to deal with the case when we contract $E^j_\ell\subset E^j$ and every other $E^j_i$ has 0 intersection number with the contracted curves. Thus $E^j_i\cap E^j_\ell$ is either empty or contains $g_j^{-1}(y)\cap E^j_\ell$ . Hence the link of $E^j_\ell$ in ${\mathcal D}(E^j)$ is a simplex and removing it is a sequence of elementary collapses.
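The combinatorial notions used in the proof (full subcomplex, elementary collapse, collapsibility to a point) can be illustrated on small examples. The following is a minimal sketch only, under a simplifying assumption: it works with abstract simplicial complexes given by vertex sets, whereas the dual complexes of [11] are regular cell complexes. All function names are hypothetical. The collapse search is greedy, so a `True` answer certifies collapsibility, while a `False` answer is conclusive only in low dimensions (in general a greedy sequence of collapses can get stuck on a collapsible complex).

```python
from itertools import combinations


def closure(facets):
    """Return all nonempty faces of the given facets as frozensets of vertices."""
    faces = set()
    for facet in facets:
        verts = sorted(facet)
        for k in range(1, len(verts) + 1):
            faces.update(frozenset(c) for c in combinations(verts, k))
    return faces


def full_subcomplex(faces, vertices):
    """Faces all of whose vertices lie in `vertices` (the full subcomplex)."""
    allowed = set(vertices)
    return {f for f in faces if f <= allowed}


def find_free_pair(faces):
    """Find (sigma, tau) where tau is the unique face properly containing sigma."""
    for sigma in faces:
        cofaces = [tau for tau in faces if sigma < tau]
        if len(cofaces) == 1:
            return sigma, cofaces[0]
    return None


def greedy_collapse(faces):
    """Apply elementary collapses until no free face remains."""
    faces = set(faces)
    while True:
        pair = find_free_pair(faces)
        if pair is None:
            return faces
        # An elementary collapse removes the free face and its unique coface.
        faces -= set(pair)


def collapses_to_point(facets):
    """True if the greedy procedure reduces the complex to a single vertex."""
    remaining = greedy_collapse(closure(facets))
    return len(remaining) == 1 and len(next(iter(remaining))) == 1


if __name__ == "__main__":
    # Two triangles glued along the edge {1, 2}: the link of vertex 3 is a simplex.
    print(collapses_to_point([{0, 1, 2}, {1, 2, 3}]))
    # A hollow triangle (a circle) has no free face at all.
    print(collapses_to_point([{0, 1}, {1, 2}, {0, 2}]))
```

In the first example the link of vertex 3 is the edge {1, 2}, which mirrors the last case of the proof: the vertex can be removed by elementary collapses and the rest is still collapsible. The hollow triangle has no free face, so the procedure stops immediately.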

Dlt modifications of algebraic spaces. By [25], a normal, quasi-projective pair $(X, \Delta)$ (over a field of characteristic 0) has both dlt and lc modifications if $K_X+\Delta$ is $\mathbb{R}$ -Cartier. (See [20,Sec.1.4] for the definitions.) The lc modification is unique and commutes with étale base change, hence local lc modifications automatically glue to give the same conclusion if $X$ is an algebraic space.

However, dlt modifications are rarely unique, thus it was not obvious that they exist when the base is not quasi-projective. [33] observed that Theorem 2 gives enough uniqueness to allow for gluing. This is not hard when $X$ is a scheme, but needs careful consideration to work for algebraic spaces.

Theorem 18 (Villalobos-Paz). Let $X$ be a normal algebraic space of finite type over a field of characteristic 0, and $\Delta$ a boundary $\mathbb{R}$ -divisor on $X$ . Assume that $K_X+\Delta$ is $\mathbb{R}$ -Cartier. Then $(X, \Delta)$ has a modification $g: (X^{\rm dlt}, \Delta^{\rm dlt})\to (X, \Delta)$ such that

(1) $(X^{\rm dlt}, \Delta^{\rm dlt})$ is dlt,

(2) $K_{X^{\rm dlt}}+ \Delta^{\rm dlt}$ is $g$ -nef,

(3) $g_*\Delta^{\rm dlt} = \Delta$ , and

(4) $g$ is projective.

$X^{\rm dlt}$ is not unique, and we can choose

(5) either $X^{\rm dlt}$ to be $\mathbb{Q}$ -factorial, or ${\rm{Ex(}}g{\rm{)}}$ to support a $g$ -ample $\mathbb{Q}$ -divisor.

Acknowledgments

I thank E. Arvidsson, F. Bernasconi, J. Carvajal-Rojas, J. Lacini, A. Stäbler, D. Villalobos-Paz and C. Xu for helpful comments, and J. Witaszek for numerous e-mails about flips.

References

[1]	S. Mei, Y. D. Wang, G. J. Wen, Automatic fabric defect detection with a multi-scale convolutional denoising autoencoder network model, Sensors, 18 (2018), 1064. http://doi.org/10.3390/S18041064 doi: 10.3390/S18041064
[2]	Z. Q. He, Q. F. Liu, Deep regression neural network for industrial surface defect detection, IEEE Access, 8 (2020), 35583–35591. http://doi.org/10.1109/ACCESS.2020.2975030 doi: 10.1109/ACCESS.2020.2975030
[3]	J. X. Luo, Z. Y. Yang, S. P. Li, Y. Wu, FPCB surface defect detection: a decoupled two-stage object detection framework, IEEE Trans. Instrum. Meas., 70 (2021). http://doi.org/10.1109/TIM.2021.3092510 doi: 10.1109/TIM.2021.3092510
[4]	L. H. Shao, E. R. Zhang, Q. R. Ma, M. Li, Pixel-wise semisupervised fabric defect detection method combined with multitask mean teacher, IEEE Trans. Instrum. Meas., 71 (2022). http://doi.org/10.1109/TIM.2022.3162286 doi: 10.1109/TIM.2022.3162286
[5]	M. Q. Chen, L. J. Yu, C. Zhi, R. Sun, S. Zhu, Z. Gao, et al., Improved faster R-CNN for fabric defect detection based on Gabor filter with genetic algorithm optimization, Comput. Ind., 134 (2022). http://doi.org/10.1016/j.compind.2021.103551 doi: 10.1016/j.compind.2021.103551
[6]	J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 27–30. http://doi.org/10.1109/CVPR.2016.91
[7]	J. Redmon, A. Farhadi, YOLO9000: Better, faster, stronger, in 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 21–26. http://doi.org/10.1109/CVPR.2017.690
[8]	J. Redmon, A. Farhadi, YOLOv3: An incremental improvement, preprint, arXiv: 1804.02767.
[9]	A. Bochkovskiy, C. Y. Wang, H. Y. M. Liao, YOLOv4: Optimal speed and accuracy of object detection, preprint, arXiv: 2004.10934.
[10]	X. H. Qian, X. Wang, S. Y. Yang, J. Lei, LFF-YOLO: A YOLO algorithm with lightweight feature fusion network for multi-scale defect detection, IEEE Access, 10 (2022), 130339–130349. http://doi.org/10.1109/ACCESS.2022.3227205 doi: 10.1109/ACCESS.2022.3227205
[11]	N. Yang, W. Guo, Application of improved YOLOv5 model for strip surface defect detection, in 2022 Global Reliability and Prognostics and Health Management (PHM-Yantai), (2022), 1–5. http://doi.org/10.1109/PHM-Yantai55411.2022.9942194
[12]	Y. Wan, H. Y. Wang, Z. H. Xin, Efficient detection model of steel strip surface defects based on YOLO-V7, IEEE Access, 10 (2022), 133936–133944. http://doi.org/10.1109/ACCESS.2022.3230894 doi: 10.1109/ACCESS.2022.3230894
[13]	X. Wang, K. Zhuang, An improved YOLOX method for surface defect detection of steel strips, in 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA), (2022), 152–157. http://doi.org/10.1109/ICPECA56706.2023.10075827
[14]	C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, et al., YOLOv6: A single-stage object detection framework for industrial applications, preprint, arXiv: 2209.02976.
[15]	C. Y. Wang, A. Bochkovskiy, H. Y. M. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2023), 7464–7475. http://doi.org/10.48550/arXiv.2207.02696
[16]	F. Akhyar, C. Y. Lin, K. Muchtar, T. Y. Wu, H. F. Ng, High efficient single-stage steel surface defect detection, in 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), (2019), 18–21. http://doi.org/10.1109/AVSS.2019.8909834
[17]	V. Nath, C. Chattopadhyay, S2D2Net: An improved approach for robust steel surface defects diagnosis with small sample learning, in IEEE International Conference on Image Processing (ICIP), (2021), 1199–1203. http://doi.org/10.26599/TST.2018.9010090
[18]	A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in Advances in Neural Information Processing Systems, 30 (2017). http://doi.org/10.1109/ICIP42928.2021.9506405
[19]	W. Yu, P. Zhou, S. Yan, X. Wang, InceptionNeXt: When Inception meets ConvNeXt, preprint, arXiv: 2303.16900.
[20]	Y. Liu, Z. Shao, N. Hoffmann, Global attention mechanism: Retain information to enhance channel-spatial interactions, preprint, arXiv: 2112.05561.

© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)