
In [13,14,15,16,19], it was proposed that insight into a probability distribution, μ, posed on a Hilbert space, H, could be obtained by finding a best fit Gaussian approximation, ν. This notion of best, or optimal, was with respect to the relative entropy, or Kullback-Leibler divergence:
\begin{equation} \mathcal{R}(\nu||\mu) = \begin{cases} \mathbb{E}^{\nu}\left[{\log \frac{d\nu}{d\mu}}\right], & \nu\ll\mu,\\ +\infty, & \text{otherwise}. \end{cases} \end{equation} | (1.1)
Having a Gaussian approximation provides qualitative insight into μ, as it provides a concrete notion of the mean and variance of the distribution. Additionally, this optimized distribution can be used in algorithms, such as random walk Metropolis, as a preconditioned proposal distribution to improve performance. Such a strategy can benefit a number of applications, including path space sampling for molecular dynamics and parameter estimation in statistical inverse problems.
Observe that in the definition of R, (1.1), there is an asymmetry in the arguments. Were we to work with R(μ||ν), our optimal Gaussian would capture the first and second moments of μ, and in some applications this is desirable. However, for a multimodal problem (consider a distribution with two well separated modes), this would be inadequate; our form attempts to match individual modes of the distribution by a Gaussian. For a recent review of the R(ν||μ) problem, see [4], where it is remarked that this choice of arguments is likely to underestimate the dispersion of the distribution of interest, μ. The other ordering of arguments has been explored, in the finite dimensional case, in [2,3,10,18].
To be of computational use, it is necessary to have an algorithm that will converge to this optimal distribution. In [15], this was accomplished by first expressing ν=N(m,C(p)), where m is the mean and p is a parameter inducing a well defined covariance operator, and then solving the problem,
\begin{equation} (m,p) \in \operatorname{argmin} \mathcal{R}(N(m, C(p))||\mu), \end{equation} | (1.2)
over an admissible set. The optimization step itself was done using the Robbins-Monro algorithm (RM), [17], by seeking a root of the first variation of the relative entropy. While the numerical results of [15] were satisfactory, being consistent with theoretical expectations, no rigorous justification for the application of RM to the examples was given.
In this work, we emphasize the study and application of RM to potentially infinite dimensional problems. Indeed, following the framework of [15,16], we assume that μ is posed on the Borel σ-algebra of a separable Hilbert space (H,⟨∙,∙⟩,‖∙‖). For simplicity, we will leave the covariance operator C fixed, and only optimize over the mean, m. Even in this case, we are seeking m∈H, a potentially infinite-dimensional space.
Given the objective function f:H→H, assume that it has a root, x⋆. In our application to relative entropy, f will be its first variation. Further, we assume that we can only observe a noisy version of f, F:H×χ→H, such that for all x∈H,
\begin{equation} f(x) = \mathbb{E}[F(x,Z)] = \int_\chi F(x,z)\,\mu_Z(dz), \end{equation} | (1.3)
where μZ is the distribution associated with the random variable (r.v.) Z, taking values in the auxiliary space χ. The naive Robbins-Monro algorithm is given by
\begin{equation} X_{n+1} = X_n - a_{n+1}F(X_n, Z_{n+1}), \end{equation} | (1.4)
where Z_n \sim \mu_Z are independent and identically distributed (i.i.d.), and a_n > 0 is a carefully chosen sequence. Subject to assumptions on f, F, and the distribution \mu_Z, it is known that X_n will converge to x_\star almost surely (a.s.) in finite dimensions, [5,6,17]. Often, one needs to assume that f grows at most linearly,
\begin{equation} \|f(x)\| \leq c_0 + c_1\|x\|, \end{equation} | (1.5)
in order to apply the results in the aforementioned papers. The analysis in the finite dimensional case has been refined tremendously over the years, including an analysis based on continuous dynamical systems. We refer the reader to the books [1,8,11] and references therein.
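To fix ideas, the following minimal sketch (our own illustration, not taken from the cited references) runs the naive iteration (1.4) on a toy scalar problem with f(x) = x and F(x, Z) = x + Z; the step sequence a_n = 1/n satisfies the summability conditions of Assumption 4 below.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x, z):
    # Noisy observation of the regression function f(x) = x, so that
    # f(x) = E[F(x, Z)] with Z ~ N(0, 1); the root is x_star = 0.
    return x + z

x = 5.0                          # initial iterate X_0
for n in range(1, 10_000):
    a = 1.0 / n                  # sum a_n = infinity, sum a_n^2 < infinity
    x -= a * F(x, rng.standard_normal())

print(x)                         # approaches the root x_star = 0
```

Here f is linear, so the growth condition (1.5) holds trivially.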
As noted, much of the analysis requires the regression function f to have, at most, linear growth. Alternatively, an a priori assumption is sometimes made that the entire sequence generated by (1.4) stays in a bounded set. Both assumptions are limiting, though, in practice, one may find that the algorithms converge.
One way of overcoming these assumptions, while still ensuring convergence, is to introduce trust regions that the sequence {Xn} is permitted to explore, along with a "truncation" which enforces the constraint. Such truncations distort (1.4) into
\begin{equation} X_{n+1} = X_n - a_{n+1}F(X_n, Z_{n+1}) + a_{n+1}P_{n+1}, \end{equation} | (1.6)
where Pn+1 is the projection keeping the sequence {Xn} within the trust region. Projection algorithms are also discussed in [1,8,11].
We consider RM on a possibly infinite dimensional separable Hilbert space. This is of particular interest as, in the context of relative entropy optimization, we may be seeking a distribution in a Sobolev space associated with a PDE model. A general analysis of RM with truncations in Hilbert spaces can be found in [20]. The main purpose of this work is to adapt the analysis of [12] to the Hilbert space setting for two versions of the truncated problem. The motivation for this is that the analysis of [12] is quite straightforward, and it is instructive to see how it can be easily adapted to the infinite dimensional setting. The key modification in the proof is that results for Banach space valued martingales must be invoked. We also adapt the results to a version of the algorithm where there is prior knowledge on the location of the root. With these results in hand, we can then verify that the relative entropy minimization problem can be solved using RM.
In some problems, one may have a priori information on the root. For instance, we may know that x⋆∈U1, some open bounded set. In this version of the truncated algorithm, we have two open bounded sets, U0⊊U1, and x⋆∈U1. Let σ0=0 and X0∈U0 be given, then (1.6) can be formulated as
\begin{equation} \tilde{X}_{n+1} = X_n - a_{n+1}F(X_n, Z_{n+1}), \end{equation} | (1.7a)
\begin{equation} X_{n+1} = \begin{cases} \tilde{X}_{n+1}, & \tilde{X}_{n+1} \in U_1,\\ X_0^{(\sigma_n)}, & \tilde{X}_{n+1} \notin U_1, \end{cases} \end{equation} | (1.7b)
\begin{equation} \sigma_{n+1} = \begin{cases} \sigma_n, & \tilde{X}_{n+1} \in U_1,\\ \sigma_n + 1, & \tilde{X}_{n+1} \notin U_1. \end{cases} \end{equation} | (1.7c)
We interpret \tilde{X}_{n+1} as the proposed move, which is either accepted or rejected depending on whether or not it will remain in the trust region. If it is rejected, the algorithm restarts at X_0^{(\sigma_n)} \in U_0 . The restart points, \{X_0^{(\sigma_n)}\} , may be random, or it may be that X_0^{(\sigma_n)} = X_0 is fixed. The essential property is that the algorithm will restart in the interior of the trust region, away from its boundary. The r.v. \sigma_n counts the number of times a truncation has occurred. Algorithm (1.7) can now be expressed as
\begin{equation} X_{n+1} = X_n - a_{n+1}F(X_n, Z_{n+1}) + P_{n+1}, \qquad P_{n+1} = \left\{{X_0^{(\sigma_n)} - \tilde{X}_{n+1}}\right\}1_{\tilde{X}_{n+1}\notin U_1}. \end{equation} | (1.8)
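For concreteness, a minimal sketch of (1.7)–(1.8) follows (our own illustration; F, sample_z, and in_U1 are placeholders for the problem at hand, and the restart point is held fixed at x0):

```python
def rm_fixed_trust_region(F, sample_z, x0, in_U1, steps, rng):
    """Sketch of (1.7): proposals leaving the trust region U_1 are
    rejected, and the chain restarts at the fixed point x0 in U_0."""
    x, sigma = x0, 0                            # sigma counts truncations
    for n in range(1, steps + 1):
        a = 1.0 / n                             # step sizes as in Assumption 4
        x_tilde = x - a * F(x, sample_z(rng))   # proposed move, (1.7a)
        if in_U1(x_tilde):                      # accept, (1.7b)
            x = x_tilde
        else:                                   # truncate and restart, (1.7c)
            x, sigma = x0, sigma + 1
    return x, sigma
```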
In the second version of truncated Robbins-Monro, define the sequence of open bounded sets, Un such that:
\begin{equation} U_0 \subsetneq U_1 \subsetneq U_2 \subsetneq \ldots, \qquad \bigcup\limits_{n = 0}^\infty U_n = \mathcal{H}. \end{equation} | (1.9)
Again, letting X0∈U0, σ0=0, the algorithm is
\begin{equation} \tilde{X}_{n+1} = X_n - a_{n+1}F(X_n, Z_{n+1}), \end{equation} | (1.10a)
\begin{equation} X_{n+1} = \begin{cases} \tilde{X}_{n+1}, & \tilde{X}_{n+1} \in U_{\sigma_n},\\ X_0^{(\sigma_n)}, & \tilde{X}_{n+1} \notin U_{\sigma_n}, \end{cases} \end{equation} | (1.10b)
\begin{equation} \sigma_{n+1} = \begin{cases} \sigma_n, & \tilde{X}_{n+1} \in U_{\sigma_n},\\ \sigma_n + 1, & \tilde{X}_{n+1} \notin U_{\sigma_n}. \end{cases} \end{equation} | (1.10c)
A consequence of this formulation is that X_n \in U_{\sigma_n} for all n. As before, the restart points may be random or fixed, and they lie in U_0. This approach would appear superior to the fixed trust region algorithm, as it does not require knowledge of the sets. However, to guarantee convergence, global (in H) assumptions on the regression function are required; see Assumption 2 below. (1.10) can be written with P_{n+1} as
\begin{equation} X_{n+1} = X_n - a_{n+1}F(X_n, Z_{n+1}) + P_{n+1}, \qquad P_{n+1} = \left\{{X_0^{(\sigma_n)} - \tilde{X}_{n+1}}\right\}1_{\tilde{X}_{n+1}\notin U_{\sigma_n}}. \end{equation} | (1.11)
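The expanding variant differs from the sketch above only in that the membership test depends on the truncation counter (again our own illustration, with in_U(k, x) testing whether x lies in U_k):

```python
def rm_expanding_trust_region(F, sample_z, x0, in_U, steps, rng):
    """Sketch of (1.10): the trust region U_{sigma_n} is enlarged
    after each truncation, and the chain restarts at x0 in U_0."""
    x, sigma = x0, 0
    for n in range(1, steps + 1):
        a = 1.0 / n
        x_tilde = x - a * F(x, sample_z(rng))   # (1.10a)
        if in_U(sigma, x_tilde):                # accept, (1.10b)
            x = x_tilde
        else:                                   # restart and expand, (1.10c)
            x, sigma = x0, sigma + 1
    return x, sigma
```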
In Section 2, we state sufficient assumptions for which we are able to prove convergence in both the fixed and expanding trust region problems, and we also establish some preliminary results. In Section 3, we focus on the relative entropy minimization problem, and identify what assumptions must hold for convergence to be guaranteed. Examples are then presented in Section 4, and we conclude with remarks in Section 5.
We first reformulate (1.8) and (1.11) in the more general form
\begin{equation} X_{n+1} = \underbrace{X_n - a_{n+1}f(X_n) - a_{n+1}\delta M_{n+1}}_{= \tilde{X}_{n+1}} + a_{n+1}P_{n+1}, \end{equation} | (2.1)
where δMn+1, the noise term, is
\begin{equation} \delta M_{n+1} = F(X_n, Z_{n+1}) - f(X_n) = F(X_n, Z_{n+1}) - \mathbb{E}[F(X_n, Z_{n+1})\mid X_n]. \end{equation} | (2.2)
A natural filtration for this problem is Fn=σ(X0,Z1,…,Zn). Xn is Fn measurable and the noise term can be expressed in terms of the filtration as δMn+1=F(Xn,Zn+1)−E[F(Xn,Zn+1)∣Fn].
We now state our main assumptions:
Assumption 1. f has a zero, x⋆. In the case of the fixed trust region problem, there exist R0<R1 such that
\begin{equation*} U_0 \subseteq B_{R_0}(x_\star) \subset B_{R_1}(x_\star) \subseteq U_1. \end{equation*}
In the case of the expanding trust region problem, the open sets are defined as Un=Brn(0) with
\begin{equation} 0 < r_0 < r_1 < r_2 < \ldots < r_n \to \infty. \end{equation} | (2.3)
These sets clearly satisfy (1.9).
Assumption 2. For any 0<a<A, there exists δ>0:
\begin{equation*} \inf\limits_{a \leq \|x - x_\star\| \leq A} \left\langle {x - x_\star},{f(x)}\right\rangle \geq \delta. \end{equation*}
In the case of the fixed truncation, this inequality is restricted to x∈U1. This is akin to a convexity condition on a functional F with f=DF.
Assumption 3. x↦E[‖F(x,Z)‖2] is bounded on bounded sets, with the restriction to U1 in the case of fixed trust regions.
Assumption 4. a_n > 0 , \sum a_n = \infty , and \sum a_n^2 < \infty .
Theorem 2.1. Under the above assumptions, for the fixed trust region problem, Xn→x⋆ a.s. and σn is a.s. finite.
Theorem 2.2. Under the above assumptions, for the expanding trust region problem, Xn→x⋆ a.s. and σn is a.s. finite.
Note the distinction between the assumptions in the two algorithms. In the fixed truncation algorithm, Assumptions 2 and 3 need only hold in the set U_1, while in the expanding truncation algorithm, they must hold in all of H. While the fixed truncation algorithm thus requires weaker assumptions, it also requires identification of sets U_0 and U_1 for which the assumptions hold. Such sets may not be readily identifiable, as we will see in our examples.
We first need some additional information about f and the noise sequence δMn.
Lemma 2.1. Under Assumption 3, f is bounded on U1, for the fixed trust region problem, and on arbitrary bounded sets, for the expanding trust region problem.
Proof. Trivially,
\begin{equation*} \|f(x)\| = \|\mathbb{E}[F(x,Z)]\| \leq \mathbb{E}[\|F(x,Z)\|] \leq \sqrt{\mathbb{E}[\|F(x,Z)\|^2]}, \end{equation*}
and the result follows from the assumption.
Proposition 2.1. For the fixed trust region problem, let
\begin{equation*} M_n = \sum\limits_{i = 1}^n a_i \delta M_i. \end{equation*}
Alternatively, in the expanding trust region problem, for r>0, let
\begin{equation*} M_n = \sum\limits_{i = 1}^n a_i \delta M_i 1_{\|X_{i-1} - x_\star\| \leq r}. \end{equation*}
Under Assumptions 3 and 4, Mn is a martingale, converging in H, a.s.
Proof. The following argument holds in both the fixed and expanding trust region problems, with appropriate modifications. We present the expanding trust region case. The proof is broken up into 3 steps:
1. Relying on Theorem 6 of [7] for Banach space valued martingales, it will be sufficient to show that Mn is a martingale, uniformly bounded in L1(P).
2. In the case of the expanding truncations,
\begin{equation*} \begin{split} \mathbb{E}\left[{\left\|{\delta M_i 1_{\|X_{i-1}-x_\star\|\leq r}}\right\|^2}\right] &\leq 2\,\mathbb{E}\left[{\left\|{F(X_{i-1},Z_i)1_{\|X_{i-1}-x_\star\|\leq r}}\right\|^2}\right] + 2\,\mathbb{E}\left[{\left\|{f(X_{i-1})1_{\|X_{i-1}-x_\star\|\leq r}}\right\|^2}\right]\\ &\leq 2\sup\limits_{\|x-x_\star\|\leq r}\mathbb{E}[\|F(x,Z)\|^2] + 2\sup\limits_{\|x-x_\star\|\leq r}\|f(x)\|^2. \end{split} \end{equation*}
Since both of these terms are bounded, independently of i, by Assumption 3 and Lemma 2.1, this is finite.
3. Next, since {δMi1‖Xi−1−x⋆‖≤r} is a martingale difference sequence, we can use the above estimate to obtain the uniform L2(P) bound,
\begin{equation*} \mathbb{E}[\|M_n\|^2] = \sum\limits_{i = 1}^n a_i^2\, \mathbb{E}\left[{\left\|{\delta M_i 1_{\|X_{i-1}-x_\star\|\leq r}}\right\|^2}\right] \leq \sup\limits_i \mathbb{E}\left[{\left\|{\delta M_i 1_{\|X_{i-1}-x_\star\|\leq r}}\right\|^2}\right] \sum\limits_{i = 1}^\infty a_i^2 < \infty. \end{equation*}
Uniform boundedness in L2 gives boundedness in L1, and this implies a.s. convergence in H.
In this section we prove results showing that only finitely many truncations will occur, in either the fixed or expanding trust region case. Recall that when a truncation occurs, the following equivalent conditions hold: P_{n+1} \neq 0 ; \sigma_{n+1} = \sigma_n + 1 ; and \tilde{X}_{n+1} \notin U_1 in the fixed trust region algorithm, while \tilde{X}_{n+1} \notin U_{\sigma_n} in the expanding trust region case.
Lemma 2.2. In the fixed trust region algorithm, if Assumptions 1, 2, 3, and 4 hold, then the number of truncations is a.s. finite; a.s., there exists N, such that for all n≥N, σn=σN.
Proof. We break the proof up into 7 steps:
1. Pick ρ and ρ′ such that
\begin{equation} R_0 < \rho' < \rho < R_1. \end{equation} | (2.4)
Let \bar{f} = \sup\|f(x)\| , with the supremum over U_1; this bound exists by Lemma 2.1. Under Assumption 2, there exists \delta > 0 such that
\begin{equation} \inf\limits_{R_0/2 \leq \|x-x_\star\|\leq R_1}\left\langle {x-x_\star},{f(x)}\right\rangle = \delta. \end{equation} | (2.5)
Having fixed ρ, ρ′, ˉf, and δ, take ϵ>0 such that:
\begin{equation} \epsilon < \min\left\{{\rho'-R_0, \frac{R_1-\rho'}{2+\bar{f}}, \frac{\rho'-R_0}{\bar{f}}, \frac{R_0}{2}, \frac{\delta}{2\bar{f}}, \frac{\delta}{\bar{f}^2}, \rho-\rho'}\right\}. \end{equation} | (2.6)
Having fixed such an \epsilon , by the assumptions of this lemma and Proposition 2.1, a.s., there exists n_\epsilon such that for any n, m \geq n_\epsilon , both
\begin{equation} \left\|{\sum\limits_{k = n}^m a_k \delta M_k}\right\| \leq \epsilon, \qquad a_n \leq \epsilon. \end{equation} | (2.7)
2. Define the auxiliary sequence
\begin{equation} X_n' = X_n - \sum\limits_{k = n+1}^\infty a_k \delta M_k. \end{equation} | (2.8)
Using (2.1), we can then write
\begin{equation} X_{n+1}' = X_n' - a_{n+1}f(X_n) + a_{n+1}P_{n+1}. \end{equation} | (2.9)
By (2.7), for any n≥nϵ,
\begin{equation} \|X_n' - X_n\| \leq \epsilon. \end{equation} | (2.10)
3. We will show X′n∈Bρ′(x⋆) for all n large enough. The significance of this is that if n≥nϵ, and X′n∈Bρ′(x⋆), then no truncation occurs. Indeed, using (2.6)
\begin{equation} \|\tilde{X}_{n+1} - x_\star\| \leq \|X_n' - x_\star\| + \|X_n - X_n'\| + a_{n+1}\bar{f} + \|a_{n+1}\delta M_{n+1}\| < \rho' + \epsilon + \epsilon\bar{f} + \epsilon < R_1 \;\Rightarrow\; \tilde{X}_{n+1} \in U_1. \end{equation} | (2.11)
Consequently, Pn+1=0, Xn+1=˜Xn+1, and σn+1=σn. Thus, establishing X′n∈Bρ′(x⋆) will yield the result.
4. Let
\begin{equation} N = \inf\left\{{n \geq n_\epsilon \mid \tilde{X}_{n+1} \notin U_1}\right\} + 1. \end{equation} | (2.12)
This corresponds to the first truncation after n_\epsilon . If the above set is empty, for that realization, no truncations occur after n_\epsilon , and we are done. In such a case, we may take N = n_\epsilon in the statement of the lemma.
5. We now prove by induction that in the case that (2.12) is finite, X′n∈Bρ′(x⋆) for all n≥N. First, note that XN∈BR0(x⋆)⊂Bρ(x⋆). By (2.6) and (2.10),
\begin{equation*} \|X_N' - x_\star\| \leq \|X_N - x_\star\| + \|X_N' - X_N\| < R_0 + \epsilon < \rho' \;\Rightarrow\; X_N' \in B_{\rho'}(x_\star). \end{equation*}
Next, assume X_N', X_{N+1}', \ldots, X_n' are all in B_{\rho'}(x_\star) . Using (2.11), we have that P_{N+1} = \ldots = P_{n+1} = 0 and \sigma_N = \ldots = \sigma_n = \sigma_{n+1} . Therefore,
\begin{equation} \begin{split} \|X_{n+1}' - x_\star\|^2 & = \|X_n' - x_\star\|^2 - 2a_{n+1}\left\langle {X_n' - x_\star},{f(X_n)}\right\rangle + a_{n+1}^2\|f(X_n)\|^2\\ &\leq \|X_n' - x_\star\|^2 - 2a_{n+1}\left\langle {X_n' - x_\star},{f(X_n)}\right\rangle + a_{n+1}\epsilon\bar{f}^2. \end{split} \end{equation} | (2.13)
We now consider two cases of (2.13) to conclude ‖X′n+1−x⋆‖<ρ′.
6. In the first case, ‖X′n−x⋆‖≤R0. By Cauchy-Schwarz and (2.6)
\begin{equation*} \|X_{n+1}' - x_\star\|^2 < R_0^2 + 2\epsilon R_0\bar{f} + \epsilon^2\bar{f}^2 = (R_0 + \epsilon\bar{f})^2 < (\rho')^2. \end{equation*}
In the second case, R0<‖X′n−x⋆‖<ρ′. Dissecting the inner product term in (2.13) and using Assumption 2 and (2.10),
\begin{equation} \left\langle {X_n' - x_\star},{f(X_n)}\right\rangle = \left\langle {X_n - x_\star},{f(X_n)}\right\rangle + \left\langle {X_n' - X_n},{f(X_n)}\right\rangle \geq \left\langle {X_n - x_\star},{f(X_n)}\right\rangle - \bar{f}\epsilon. \end{equation} | (2.14)
Conditions (2.6) and (2.10) yield the following upper and lower bounds:
\begin{equation*} \begin{split} \|X_n - x_\star\| &\geq \|X_n' - x_\star\| - \|X_n' - X_n\| \geq R_0 - \epsilon > \tfrac{1}{2}R_0,\\ \|X_n - x_\star\| &\leq \|X_n' - x_\star\| + \|X_n' - X_n\| \leq \rho' + \epsilon < \rho < R_1. \end{split} \end{equation*}
Therefore, (2.5) applies and ⟨Xn−x⋆,f(Xn)⟩≥δ. Using this in (2.14), and condition (2.6),
\begin{equation*} \left\langle {X_n' - x_\star},{f(X_n)}\right\rangle \geq \delta - \bar{f}\epsilon > \tfrac{1}{2}\delta. \end{equation*}
Substituting this last estimate back into (2.13), and using (2.6),
\begin{equation*} \|X_{n+1}' - x_\star\|^2 < (\rho')^2 - a_{n+1}(\delta - \epsilon\bar{f}^2) < (\rho')^2. \end{equation*}
This completes the inductive step.
7. Since the auxiliary sequence remains in Bρ′(x⋆) for all n≥N>nϵ, (2.11) ensures ˜Xn+1∈BR1(x⋆), Pn+1=0, and σn+1=σN, a.s.
To obtain a similar result for the expanding trust region problem, we first relate the finiteness of the number of truncations with the sequence persisting in a bounded set.
Lemma 2.3. In the expanding trust region algorithm, if Assumptions 1, 3, and 4 hold, then the sequence remains in a set of the form BR(0) for some R>0 if and only if the number of truncations is finite, a.s.
Proof. We break this proof into 4 steps:
1. If the number of truncations is finite, then there exists N such that for all n≥N, σn=σN. Consequently, the proposed moves are always accepted, and Xn∈Uσn=UσN for all n≥N. Since Xn∈Uσn⊂UσN for n<N, Xn∈UσN for all n. By Assumption 1, BR(0)=BrσN(0)=UσN is the desired set.
2. For the other direction, assume that there exists R>0 such that Xn∈BR(0) for all n. Since the rn in (2.3) tend to infinity, there exists N1, such that R<R+1<rN1. Hence, for all n≥N1,
\begin{equation} B_R(0) \subset B_{R+1}(0) \subset U_n. \end{equation} | (2.15)
Let \bar{f} = \sup\|f(x)\| , with the supremum over B_R(0) . Let \tilde{R} be sufficiently large such that B_{R+1}(0) \subset B_{\tilde{R}}(x_\star) . Lastly, using Proposition 2.1 and Assumption 4, a.s., there exists N_2 , such that for all n \geq N_2
\begin{equation} \left\|{a_n\delta M_n 1_{\|X_n - x_\star\|\leq\tilde{R}}}\right\| < \frac{1}{2}, \qquad a_n < \frac{1}{2(1+\bar{f})}. \end{equation} | (2.16)
Since Xn∈BR(0)⊂B˜R(x⋆), the indicator function in (2.16) is always one, and ‖anδMn‖<1/2.
3. Next, let
\begin{equation} N = \inf\left\{{n\geq 0\mid \sigma_n \geq \max\{N_1, N_2\}}\right\}. \end{equation} | (2.17)
If the above set is empty, then σn<max{N1,N2} for all n, and the number of truncations is a.s. finite. In this case, the proof is complete.
4. If the set in (2.17) is not empty, then N<∞. Take n≥N. As Xn∈BR(0), and since n≥σn≥max{N1,N2}, (2.16) applies. Therefore,
\begin{equation} \|\tilde{X}_{n+1}\| \leq \|X_n\| + \|\tilde{X}_{n+1} - X_n\| \leq \|X_n\| + a_{n+1}\|f(X_n)\| + \|a_{n+1}\delta M_{n+1}\| < R + \frac{1}{2} + \frac{1}{2} = R+1. \end{equation} | (2.18)
Thus, ˜Xn+1∈BR+1(0)⊂UN1, σn≥N1, and UN1⊂Uσn. Therefore, ˜Xn+1∈Uσn. No truncation occurs, and σn=σn+1. Since this holds for all n≥N, σn=σN, and the number of truncations is a.s. finite.
Next, we establish that, subject to an additional assumption, the sequence remains in a bounded set; the finiteness of the truncations is then a corollary.
Lemma 2.4. In the expanding trust region algorithm, if Assumptions 1, 2, 3, and 4 hold, and for any r>0, there a.s. exists N<∞, such that for all n≥N,
\begin{equation*} P_{n+1}1_{\|X_n - x_\star\|\leq r} = 0, \end{equation*}
then {Xn} remains in a bounded open set, a.s.
Proof. We break this proof into 7 steps:
1. We begin by setting some constants for the rest of the proof. Fix R>0 sufficiently large such that BR(x⋆)⊃U0. Next, let ˉf=sup‖f(x)‖ with the supremum taken over BR+2(x⋆). Assumption 2 ensures there exists δ>0 such that
\begin{equation} \inf\limits_{R/2 \leq \|x-x_\star\|\leq R+2}\left\langle {x-x_\star},{f(x)}\right\rangle = \delta. \end{equation} | (2.19)
Having fixed R, ˉf, and δ, take ϵ>0 such that:
\begin{equation} \epsilon < \min\left\{{1, \frac{1}{\bar{f}}, \frac{\delta}{2\bar{f}}, \frac{\delta}{\bar{f}^2}, \frac{R}{2}}\right\}. \end{equation} | (2.20)
By the assumptions of this lemma and Proposition 2.1, there exists, a.s., n_\epsilon \geq N such that for all n \geq n_\epsilon ,
\begin{equation} \left\|{\sum\limits_{i = n+1}^\infty a_i \delta M_i 1_{\|X_{i-1} - x_\star\|\leq R+2}}\right\| \leq \epsilon, \end{equation} | (2.21a)
\begin{equation} P_{n+1}1_{\|X_n - x_\star\|\leq R+2} = 0, \end{equation} | (2.21b)
\begin{equation} a_{n+1} \leq \epsilon. \end{equation} | (2.21c)
2. Define the modified sequence for n≥nϵ as
\begin{equation} X_n' = X_n - \sum\limits_{k = n+1}^\infty a_k \delta M_k 1_{\|X_{k-1} - x_\star\|\leq R+2} \;\Rightarrow\; \|X_n' - X_n\| \leq \epsilon. \end{equation} | (2.22)
Using (2.1), we have the iteration
\begin{equation} X_{n+1}' = X_n' - a_{n+1}\delta M_{n+1}1_{\|X_n - x_\star\| > R+2} - a_{n+1}f(X_n) + a_{n+1}P_{n+1}. \end{equation} | (2.23)
3. Let
\begin{equation} N = \inf\left\{{n\geq n_\epsilon \mid \sigma_{n+1} \neq \sigma_n}\right\} + 1, \end{equation} | (2.24)
the first time after nϵ that a truncation occurs.
If the above set is empty, no truncations occur after nϵ. In this case, σn=σnϵ≤nϵ<∞ for all n≥nϵ. Therefore, for all n≥nϵ, Xn∈Uσn⊂Uσnϵ. Since Uσn⊂Uσnϵ for all n<nϵ too, the proof is complete in this case.
4. Now assume that N<∞. We will show that {X′n} remains in BR+1(x⋆) for all n≥N. Were this to hold, then for n≥N,
\begin{equation} \|X_n - x_\star\| \leq \|X_n' - x_\star\| + \left\|{\sum\limits_{i = n+1}^\infty a_i \delta M_i 1_{\|X_{i-1} - x_\star\|\leq R+2}}\right\| < R + 1 + \epsilon < R+2, \end{equation} | (2.25)
having used (2.21) and (2.22). For n<N, Xn∈Uσn⊂UσN=BrN(0). Therefore, for all n, Xn∈B˜R(0) where ˜R=max{rN,‖x⋆‖+R+2}.
5. We prove X′n∈BR+1(x⋆) by induction. First, since ϵ<1 and XN∈U0⊂BR(x⋆),
\begin{equation*} \|X_N' - x_\star\| \leq \|X_N' - X_N\| + \|X_N - x_\star\| < \epsilon + R < R+1. \end{equation*}
Next, assume that X_N', X_{N+1}', \ldots, X_n' are all in B_{R+1}(x_\star) . By (2.25), X_n \in B_{R+2}(x_\star) . Since P_{n+1}1_{\|X_n - x_\star\|\leq R+2} = 0 , we conclude P_{n+1} = 0 ; and since X_n \in B_{R+2}(x_\star) , the noise term in (2.23) vanishes as well. The modified iteration (2.23) thus simplifies to
\begin{equation*} X_{n+1}' = X_n' - a_{n+1}f(X_n), \end{equation*}
and
\begin{equation} \begin{split} \|X_{n+1}' - x_\star\|^2 & = \|X_n' - x_\star\|^2 - 2a_{n+1}\left\langle {X_n' - x_\star},{f(X_n)}\right\rangle + a_{n+1}^2\|f(X_n)\|^2\\ & < \|X_n' - x_\star\|^2 - 2a_{n+1}\left\langle {X_n' - x_\star},{f(X_n)}\right\rangle + a_{n+1}\epsilon\bar{f}^2. \end{split} \end{equation} | (2.26)
6. We now consider two cases of (2.26). First, assume ‖X′n−x⋆‖≤R. Then (2.26) can immediately be bounded as
\begin{equation*} \|X_{n+1}' - x_\star\|^2 < R^2 + 2\epsilon R\bar{f} + \epsilon^2\bar{f}^2 = (R + \epsilon\bar{f})^2 < (R+1)^2, \end{equation*}
where we have used condition (2.20) in the last inequality.
7. Now consider the case R<‖X′n−x⋆‖<R+1. Using (2.20), the inner product in (2.26) can first be bounded from below:
\begin{equation*} \begin{split} \left\langle {X_n' - x_\star},{f(X_n)}\right\rangle & = \left\langle {X_n - x_\star},{f(X_n)}\right\rangle + \left\langle {X_n' - X_n},{f(X_n)}\right\rangle\\ &\geq \left\langle {X_n - x_\star},{f(X_n)}\right\rangle - \epsilon\bar{f}\\ & > \left\langle {X_n - x_\star},{f(X_n)}\right\rangle - \tfrac{1}{2}\delta. \end{split} \end{equation*}
Next, using (2.20)
\begin{equation*} \|X_n - x_\star\| \geq \|X_n' - x_\star\| - \|X_n - X_n'\| > R - \epsilon > R - \tfrac{1}{2}R = \tfrac{1}{2}R. \end{equation*}
Therefore, \tfrac{1}{2}R < \|X_n - x_\star\| < R+2 , so (2.19) ensures \left\langle {X_n - x_\star},{f(X_n)}\right\rangle \geq \delta and
\begin{equation*} \left\langle {X_n' - x_\star},{f(X_n)}\right\rangle > \delta - \tfrac{1}{2}\delta = \tfrac{1}{2}\delta. \end{equation*}
Returning to (2.26), by (2.20),
\begin{equation*} \|X_{n+1}' - x_\star\|^2 \leq (R+1)^2 - a_{n+1}(\delta - \epsilon\bar{f}^2) < (R+1)^2. \end{equation*}
This completes the proof of the inductive step in this second case, completing the proof.
Corollary 2.1. For the expanding trust region algorithm, if Assumptions 1, 2, 3, and 4 hold, then the number of truncations is a.s. finite.
Proof. The proof is by contradiction. We break the proof into 4 steps:
1. Assuming that there are infinitely many truncations, Lemma 2.3 implies that the sequence cannot remain in a bounded set. Then, continuing to assume that Assumptions 1, 2, 3, and 4 hold, the only way for the conclusion of Lemma 2.4 to fail is if the assumption on P_{n+1}1_{\|X_n - x_\star\|\leq r} is false. Therefore, there exists r > 0 and a set of positive measure on which a subsequence satisfies P_{n_k+1}1_{\|X_{n_k} - x_\star\|\leq r} \neq 0 . Hence X_{n_k} \in B_r(x_\star) , and P_{n_k+1} \neq 0 . So truncations occur at these indices, and \tilde{X}_{n_k+1} \notin U_{\sigma_{n_k}} .
2. Let ˉf=sup‖f(x)‖ with the supremum over the set Br(x⋆) and let ϵ>0 satisfy
\begin{equation} \epsilon < (\bar{f}+1)^{-1}. \end{equation} | (2.27)
By the assumptions of the lemma and Proposition 2.1, there exists n_\epsilon such that for all n \geq n_\epsilon
\begin{equation} \left\|{a_{n+1}\delta M_{n+1}1_{\|X_n - x_\star\|\leq r}}\right\| \leq \epsilon, \qquad a_{n+1} \leq \epsilon. \end{equation} | (2.28)
Along the subsequence, for all nk≥nϵ,
\begin{equation} \left\|{a_{n_k+1}\delta M_{n_k+1}1_{\|X_{n_k} - x_\star\|\leq r}}\right\| = \left\|{a_{n_k+1}\delta M_{n_k+1}}\right\| \leq \epsilon. \end{equation} | (2.29)
3. Furthermore, for nk≥nϵ:
\begin{equation} \|\tilde{X}_{n_k+1} - x_\star\| \leq \|X_{n_k} - x_\star\| + a_{n_k+1}\|f(X_{n_k})\| + \|a_{n_k+1}\delta M_{n_k+1}\| < r + \epsilon\bar{f} + \epsilon < r+1 \;\Rightarrow\; \tilde{X}_{n_k+1} \in B_{r+1}(x_\star), \end{equation} | (2.30)
where (2.27) has been used in the last inequality.
4. By the definition of the U_n , there exists an index M such that U_{M}\supset B_{r+1}(x_\star) . Let
\begin{equation} N = \inf\{n\geq n_ \epsilon\mid \sigma_n \geq M\}. \end{equation} | (2.31) |
This set is nonempty and N < \infty since we have assumed there are infinitely many truncations. Let n_k \geq N . Then \sigma_{n_k}\geq M and U_{\sigma_{n_k}}\supset B_{r+1}(x_\star) . But (2.30) then implies that \tilde{X}_{n_k+1} \in U_{\sigma_{n_k}} , and no truncation will occur; P_{n_k+1} = 0 , providing the desired contradiction.
Using the above results, we are able to prove Theorems 2.1 and 2.2. Since the proofs are quite similar, we present the more complicated expanding trust region case.
Proof. We split this proof into 6 steps:
1. First, by Corollary 2.1, only finitely many truncations occur. By Lemma 2.3, there exists R > 0 such that X_n\in B_R(0) for all n . Consequently, there is an r such that X_n\in B_r(x_\star) for all n .
2. Next, we fix constants. Let \bar f = \sup \|f(x)\| with the supremum taken over B_r(x_\star) . Fix \eta \in (0, 2R) , and use Assumption 2 to determine \delta > 0 such that
\begin{equation} \inf\limits_{\eta/2 \leq \|x-x_\star\|\leq r}\left\langle {x-x_\star},{f(x)}\right\rangle = \delta \end{equation} | (2.32) |
Take \epsilon > 0 such that:
\begin{equation} \epsilon \lt \min\left\{{1,\frac{\eta}{2}, \frac{\delta}{2\bar{f}}, \frac{\delta}{2\bar{f}^2}}\right\} \end{equation} | (2.33) |
Having set \epsilon , we again appeal to Assumption 4 and Proposition 2.1 to find n_ \epsilon such that for all n\geq n_ \epsilon :
\begin{equation} \left \|{\sum\limits_{i = n+1}^\infty a_i \delta M_i 1_{\|X_{i-1} -x_\star\|\leq r}}\right\| = \left \|{\sum\limits_{i = n+1}^\infty a_i \delta M_i }\right\| \leq \epsilon, \quad a_{n+1}\leq \epsilon \end{equation} | (2.34) |
3. Define the auxiliary sequence,
\begin{equation} X_n' = X_n - \sum\limits_{i = n+1}^\infty a_i \delta M_i 1_{\|X_{i-1} - x_\star \|\leq r} = X_n - \sum\limits_{i = n+1}^\infty a_i \delta M_i. \end{equation} | (2.35) |
Since there are only finitely many truncations, there exists N\geq n_ \epsilon , such that for all n\geq N , P_{n+1} = 0 , as the truncations have ceased. Consequently, for n\geq N ,
\begin{equation} X'_{n+1} = X'_n - a_{n+1}f(X_n) \end{equation} | (2.36) |
By (2.34) and (2.35), for n\geq N , \|X_n - X_n'\|\leq \epsilon . Since \epsilon > 0 may be arbitrarily small, it will be sufficient to prove X_n'\to x_\star .
4. To obtain convergence of X_n' , we first examine \|X_{n+1}'-x_\star\| . For n\geq N ,
\begin{equation} \begin{split} \|X_{n+1}' - x_\star\|^2 &\leq \|X_{n}' - x_\star\|^2-2a_{n+1}\left\langle {X_n' - x_\star},{f(X_n)}\right\rangle + a_{n+1} \epsilon \bar f^2, \end{split} \end{equation} | (2.37) |
Now consider two cases of this expression. First, assume \|X_n' - x_\star\|\leq\eta . In this case, using (2.33),
\begin{equation} \begin{split} -2a_{n+1}\left\langle {X_n' - x_\star},{f(X_n)}\right\rangle + a_{n+1} \epsilon \bar f^2&\leq a_{n+1}(2\eta \bar f + \epsilon \bar f^2) \\ & \lt a_{n+1}(4R\bar f + \bar f^2) = a_{n+1} B. \end{split} \end{equation} | (2.38) |
where B > 0 is a constant depending only on R and \bar f . For \|X_n' - x_\star\| > \eta , using (2.33)
\begin{equation} \begin{split} \left\langle {X'_{n}-x_\star},{f(X_n) }\right\rangle & = \left\langle {X_{n}-x_\star},{f(X_n) }\right\rangle + \left\langle {X'_{n}-X_n},{f(X_n) }\right\rangle\\ &\geq \left\langle {X_{n}-x_\star},{f(X_n) }\right\rangle - \epsilon \bar{f}\\ & \gt \left\langle {X_{n}-x_\star},{f(X_n) }\right\rangle -\tfrac{1}{2}\delta. \end{split} \end{equation} | (2.39) |
By (2.33),
\begin{equation*} \|X_{n}-x_\star\|\geq \|X'_{n}-x_\star\| -\| X_{n}-X_n'\| \gt \eta - \epsilon \gt \tfrac{1}{2}\eta \end{equation*} |
Since \|X_n - x_\star\| < r too, (2.32) and (2.39) yield the estimate
\begin{equation*} \left\langle {X'_{n}-x_\star},{f(X_n) }\right\rangle \gt \delta - \epsilon \bar{f} \gt \tfrac{1}{2}\delta \end{equation*} |
Thus, in this regime, using (2.33),
\begin{equation} \begin{split} -2a_{n+1}\left\langle {X_n' - x_\star},{f(X_n)}\right\rangle + a_{n+1} \epsilon \bar f^2&\leq -a_{n+1}(\delta - \epsilon \bar f^2)\\ & \lt -\tfrac{1}{2}\delta a_{n+1} = - A a_{n+1} \end{split} \end{equation} | (2.40) |
where A > 0 is a constant depending only on \delta .
Combining estimates (2.38) and (2.40), we can write for n\geq N
\begin{equation} \|X_{n+1}' - x_\star\|^2 \lt \|X_{n}' - x_\star\|^2 - a_{n+1} A 1_{\|X_n' - x_\star\| \gt \eta} + a_{n+1} B 1_{\|X_n' - x_\star\|\leq \eta}. \end{equation} | (2.41) |
5. We now show that \|X_n' - x_\star\|\leq \eta i.o. (infinitely often). The argument is by contradiction. Let M\geq N be such that for all n\geq M , \|X_n' - x_\star\| > \eta . For such n ,
\begin{equation} \begin{split} \eta^2 \lt \|X_{n+1}'-x_\star\|^2 & \lt \|X_{n}'-x_\star\|^2 - a_{n+1}A \\ & \lt \|X_{n-1}'-x_\star\|^2 - a_{n+1}A - a_n A\\ & \lt \ldots \lt \|X_M' - x_\star\|^2 - A \sum\limits_{i = M}^n a_{i+1}. \end{split} \end{equation} | (2.42) |
Using Assumption 4 and taking n\to \infty , we obtain a contradiction.
6. Finally, we prove convergence of X_n'\to x_\star . Since X_n'\in B_{\eta}(x_\star) i.o., let
\begin{equation} N' = \inf \{n\geq N\mid \|X_n' - x_\star\| \lt \eta \}. \end{equation} | (2.43) |
For n\geq N' , we can then define
\begin{equation} \varphi(n) = \max\left\{{p\leq n\mid \left \|{X_p' - x_\star}\right\| \lt \eta }\right\}. \end{equation} | (2.44) |
For all such n , \varphi(n)\leq n , and X_{\varphi(n)}'\in B_\eta(x_\star) .
We claim that for n\geq N' ,
\|X_{n+1}' - x_\star\|^2 \lt \|X_{\varphi(n)}' -x_\star\|^2 + B a_{\varphi(n)+1} \lt \eta^2 + Ba_{\varphi(n)+1}. |
First, if n = \varphi(n) , this holds trivially by (2.41). Suppose now that n > \varphi(n) . Then for i = \varphi(n)+1, \varphi(n)+2, \ldots n , \|X_{i}' - x_\star\| > \eta . Consequently,
\begin{equation*} \begin{split} \|X_{n+1}' - x_\star\|^2& \lt \|X_{n}' - x_\star\|^2 \lt \|X_{n-1}' - x_\star\|^2 \lt \ldots\\ & \lt \|X_{\varphi(n)+1}' - x_\star\|^2 \lt \|X_{\varphi(n)}' - x_\star\|^2 + B a_{\varphi(n)+1}\\ & \lt \eta^2 + B a_{\varphi(n)+1} \end{split} \end{equation*} |
As \varphi(n)\to \infty ,
\begin{equation*} \limsup\limits_{n\to \infty }\|X_{n+1}' - x_\star\|^2\leq \eta^2 \end{equation*} |
Since \eta may be arbitrarily small, we conclude that
\limsup\limits_{n\to \infty }\|X_{n+1}' - x_\star\| = \lim\limits_{n\to \infty }\|X_{n+1}' - x_\star\| = 0, |
completing the proof.
Recall from the introduction that our distribution of interest, \mu , is posed on the Borel subsets of the Hilbert space \mathcal{H} . We assume that \mu \ll \mu_0 , where \mu_0 = N(m_0, C_0) is some reference Gaussian. Thus, we write
\begin{equation} \frac{d\mu}{d\mu_0} = \frac{1}{Z_\mu}\exp\left\{{-\Phi_\mu(u)}\right\}, \end{equation} | (3.1) |
where \Phi_\mu: X\to \mathbb{R} is assumed to be continuous, with X a Banach space, a subspace of \mathcal{H} , of full measure with respect to the Gaussian \mu_0 on \mathcal{H} . Z_\mu = \mathbb{E}^{\mu_0}[\exp\left\{{-\Phi_\mu(u)}\right\}]\in (0, \infty) is the partition function ensuring we have a probability measure.
Let \nu = N(m, C) be another Gaussian, equivalent to \mu_0 , such that we can write
\begin{equation} \frac{d\nu}{d\mu_0} = \frac{1}{Z_\nu}\exp\left\{{-\Phi_\nu(u)}\right\}. \end{equation} | (3.2)
Assuming that \nu \ll \mu , we can write
\begin{equation} \mathcal{R}(\nu||\mu) = \mathbb{E}^{\nu}[\Phi_\mu(u) - \Phi_\nu(u)] + \log(Z_\mu) - \log(Z_\nu) \end{equation} | (3.3) |
The assumption that \nu \ll \mu implies that \nu and \mu are equivalent measures. As was proven in [16], if \mathcal{A} is a set of Gaussian measures, closed under weak convergence, such that at least one element of \mathcal{A} is absolutely continuous with respect to \mu , then any minimizing sequence over \mathcal{A} will have a weak subsequential limit.
If we assume, for this work, that C = C_0 , then, by the Cameron-Martin formula (see [9]),
\begin{equation} \Phi_\nu(u) = -\left\langle {u-m},{m -m_0}\right\rangle_{ \mathcal{H}^1} - \frac{1}{2}\left \|{m - m_0}\right\|_{ \mathcal{H}^1}^2, \quad Z_{\nu} = 1. \end{equation} | (3.4) |
Here, \left\langle {\bullet}, {\bullet}\right\rangle_{ \mathcal{H}^1} and \|\bullet\|_{ \mathcal{H}^1} are the inner product and norm of the Cameron-Martin Hilbert space, denoted \mathcal{H}^1 ,
\begin{equation} \left\langle {f},{g}\right\rangle_{ \mathcal{H}^1} = \left\langle {C_0^{-1/2} f},{C_0^{-1/2} g}\right\rangle, \quad \left \|{f}\right\|_{ \mathcal{H}^1}^2 = \left\langle {f},{f}\right\rangle_{ \mathcal{H}^1}. \end{equation} | (3.5)
Convergence to the minimizer will be established in \mathcal{H}^1 , and \mathcal{H}^1 will be the relevant Hilbert space in our application of Theorems 2.1 and 2.2 to this problem.
Letting \nu_0 = N(0, C_0) and v\sim \nu_0 , we can then rewrite (3.3) as
\begin{equation} J(m) \equiv \mathcal{R}(\nu||\mu) = \mathbb{E}^{\nu_0}[\Phi_{\mu}(v + m)] + \frac{1}{2}\left \|{m - m_0}\right\|_{ \mathcal{H}^1}^2 + \log(Z_\mu) \end{equation} | (3.6) |
The Euler-Lagrange equation associated with (3.6), and the second variation, are:
\begin{align} J'(m) & = \mathbb{E}^{\nu_0}[\Phi'_\mu(v+m)] + C_0^{-1}(m-m_0), \end{align} | (3.7) |
\begin{align} J''(m) & = \mathbb{E}^{\nu_0}[\Phi''_\mu(v+m)] + C_0^{-1}. \end{align} | (3.8) |
In [15], it was suggested that rather than try to find a root of (3.7), the equation first be preconditioned by multiplying by C_0 ,
\begin{equation} C_0 \mathbb{E}^{\nu_0}[\Phi'_\mu(v+m)] + (m-m_0), \end{equation} | (3.9) |
and a root of this mapping is sought, instead. Defining
\begin{align} f(m) & = C_0 \mathbb{E}^{\nu_0}[\Phi'_\mu(v+m)] + (m-m_0), \end{align} | (3.10) |
\begin{align} F(m,v) & = C_0\Phi'_\mu(v+m) + (m-m_0). \end{align} | (3.11) |
The Robbins-Monro formulation is then
\begin{equation} m_{n+1} = m_n - a_{n+1} F(m_n, v_{n+1}) + P_{n+1}, \end{equation} | (3.12) |
with v_n \sim \nu_0 , i.i.d.
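A single step of (3.12) can be sketched as follows (our own illustration; grad_Phi_mu, C0_apply, sample_nu0, and project are hypothetical, problem-specific callables, with project implementing the truncation term P_{n+1}):

```python
def rm_mean_step(m, n, grad_Phi_mu, C0_apply, sample_nu0, m0, project):
    """One step of (3.12): F(m, v) = C0 Phi_mu'(v + m) + (m - m0),
    evaluated at a single sample v ~ nu_0 = N(0, C0)."""
    a = 1.0 / n                                  # step sizes per Assumption 4
    v = sample_nu0()                             # draw v_{n+1} ~ N(0, C0)
    F = C0_apply(grad_Phi_mu(v + m)) + (m - m0)  # noisy preconditioned gradient
    return project(m - a * F)                    # truncate/restart if needed
```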
We thus have
Theorem 3.1. Assume:
● There exists \nu = N(m, C_0)\sim \mu_0 such that \nu\ll\mu .
● \Phi_\mu' and \Phi_\mu'' exist for all u \in \mathcal{H}^1 .
● There exists m_\star , a local minimizer of J , such that J'(m_\star) = 0 .
● The mapping
\begin{equation} m\mapsto \mathbb{E}^{\nu_0}\left[{\left \|{\sqrt{C_0}\Phi_\mu'(m+v)}\right\|^2}\right] \end{equation} | (3.13) |
is bounded on bounded subsets of \mathcal{H}^1 .
● There exists a convex neighborhood U_\star of m_\star and a constant \alpha > 0 , such that for all m\in U_\star , for all u \in \mathcal{H}^1 ,
\begin{equation} \left\langle {J''(m)u},{u}\right\rangle\geq \alpha \left \|{u}\right\|_{ \mathcal{H}^1}^2 \end{equation} | (3.14) |
Then, choosing a_n according to Assumption 4,
● If the subset U_\star can be taken to be all of \mathcal{H}^1 , for the expanding truncation algorithm, m_n \to m_\star a.s. in \mathcal{H}^1 .
● If the subset U_\star is not all of \mathcal{H}^1 , then, taking U_1 to be a bounded (in \mathcal{H}^1 ) convex subset of U_\star , with m_\star \in U_1 , and U_0 any subset of U_1 such that there exist R_0 < R_1 with
U_0 \subset B_{R_0}(m_\star) \subset B_{R_1}(m_\star)\subset U_1,
for the fixed truncation algorithm, m_n\to m_\star a.s. in \mathcal{H}^1 .
Proof. We split the proof into 2 steps:
1. By the assumptions of the theorem, we clearly satisfy Assumptions 1 and 4. To satisfy Assumption 3, we observe that
\begin{equation*} \mathbb{E}^{\nu_0}[\left \|{F(m,v)}\right\|^2_{ \mathcal{H}^1}]\leq 2 \mathbb{E}^{\nu_0}\left[{\left \|{\sqrt{C_0}\Phi_\mu'(m+v)}\right\|^2}\right] + 2\left \|{m-m_0}\right\|_{ \mathcal{H}^1}^2. \end{equation*} |
This is bounded on bounded subsets of \mathcal{H}^1 .
2. The convexity assumption (3.14) implies Assumption 2 since, by the mean value theorem in function spaces,
\begin{equation*} \begin{split} \left\langle {m-m_\star},{f(m)}\right\rangle_{ \mathcal{H}^1} & = \left\langle {m-m_\star},{C_0\left[{J'(m_\star) +J''(\tilde m)(m-m_\star) }\right]}\right\rangle_{ \mathcal{H}^1}\\ & = \left\langle {m-m_\star},{J''(\tilde m)(m-m_\star)}\right\rangle\geq \alpha\left \|{m-m_\star}\right\|_{ \mathcal{H}^1}^2 \end{split} \end{equation*} |
where \tilde m is some intermediate point between m and m_\star . This completes the proof.
While condition (3.14) is sufficient to obtain convexity, other conditions are possible. For instance, suppose there is a convex open set U_\star containing m_\star and constant \theta\in [0, 1) , such that for all m \in U_\star ,
\begin{equation} \inf\limits_{\substack{u\in \mathcal{H}\\ u\neq 0}} \frac{\left\langle { \mathbb{E}^{\nu_0}[\Phi''_\mu(v+m)]u},{u}\right\rangle}{\left \|{u}\right\|^2}\geq -\theta\lambda_1^{-1}, \end{equation} | (3.15) |
where \lambda_1 is the principal eigenvalue of C_0 . Then this would also imply Assumption 2, since
\begin{equation*} \begin{split} \left\langle {m-m_\star},{f(m)}\right\rangle_{ \mathcal{H}^1} & = \left\langle {m-m_\star},{C_0\left[{J'(m_\star) +J''(\tilde m)(m-m_\star) }\right]}\right\rangle_{ \mathcal{H}^1}\\ & = \left\langle {m-m_\star},{J''(\tilde m)(m-m_\star)}\right\rangle\\ &\geq \left \|{m-m_\star}\right\|_{ \mathcal{H}^1}^2 + \left\langle {m-m_\star},{ \mathbb{E}^{\nu_0}[\Phi''_\mu(v+\tilde m)] (m-m_\star)}\right\rangle\\ &\geq \left \|{m-m_\star}\right\|_{ \mathcal{H}^1}^2 -\theta \lambda_1^{-1} \left \|{m-m_\star}\right\|^2\\ &\geq (1-\theta)\left \|{m-m_\star}\right\|_{ \mathcal{H}^1}^2. \end{split} \end{equation*} |
We mention (3.15) as there may be cases, shown below, for which the operator \mathbb{E}^{\nu_0}[\Phi''_\mu(v+ m)] is obviously nonnegative.
To apply the Robbins-Monro algorithm to the relative entropy minimization problem, the \Phi_\mu functional of interest must be examined. In this section we present a few examples, based on those presented in [15], and examine when the assumptions hold. The one outstanding assumption that we must make is that, a priori, \mu_0 is an equivalent measure to \mu .
Taking \mu_0 = N(0, 1) , the standard unit Gaussian, let V: \mathbb{R} \to \mathbb{R} be a smooth function such that
\begin{equation} \frac{d\mu}{d\mu_0} = \frac{1}{Z_\mu} \exp\left\{{-{ \epsilon^{-1}}V(x)}\right\} \end{equation} | (4.1) |
is a probability measure on \mathbb{R} . For these scalar cases, we use x in place of v . In the above framework,
\begin{align*} F(x,\xi) & = { \epsilon^{-1}}V'(x+\xi) + x,\\ f(x) & = { \epsilon^{-1}} \mathbb{E}[V'(x+\xi)] + x, \\ \Phi_\mu'(x) & = { \epsilon^{-1}}V'(x), \\ \Phi_\mu''(x)& = { \epsilon^{-1}}V''(x), \end{align*}
and \xi\sim N(0, 1) = \nu_0 = \mu_0 .
Consider the case that
\begin{equation} V(x) = \tfrac{1}{2}x^2 + \tfrac{1}{4}x^4. \end{equation} | (4.2) |
In this case
\begin{align*} F(x,\xi)& = { \epsilon^{-1}}\left({x+\xi + (x+\xi)^3}\right) + x,\\ f(x) & = { \epsilon^{-1}}\left({4x + x^3}\right) +x, \\ \mathbb{E}[\Phi''_\mu(x+\xi)] & = { \epsilon^{-1}}(4 + 3x^2),\\ \mathbb{E}[\left |{\Phi'_\mu(x+\xi)}\right |^2] & = { \epsilon^{-2}} \left({22 + 58 x^2 + 17 x^4 + x^6}\right). \end{align*}
Since \mathbb{E}[\Phi''_\mu(x+\xi)] \geq 4 { \epsilon^{-1}} , all of our assumptions are satisfied and the expanding truncation algorithm will converge to the unique root at x_\star = 0 a.s. See Figure 1 for an example of the convergence at \epsilon = 0.1 , U_{n} = (-n -1, n+1) , and always restarting at 0.5 .
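A short script reproducing this setup (our own sketch; the step sequence a_n = 1/n is an assumption, as the text does not specify one):

```python
import numpy as np

rng = np.random.default_rng(1)
eps, x, sigma = 0.1, 0.5, 0            # restart point 0.5; U_n = (-n-1, n+1)
for n in range(1, 100_000):
    xi = rng.standard_normal()         # xi ~ N(0, 1) = nu_0 = mu_0
    F = (x + xi + (x + xi) ** 3) / eps + x
    x_tilde = x - F / n                # a_n = 1/n
    if abs(x_tilde) < sigma + 1:       # accept if x_tilde is inside U_sigma
        x = x_tilde
    else:                              # truncate: restart and expand
        x, sigma = 0.5, sigma + 1

print(x)                               # tends to the unique root x_star = 0
```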
We refer to this as a "globally convex'' problem since \mathcal{R} is globally convex about the minimizer.
In contrast to the above problem, some minimizers are only "locally'' convex. Consider the case of the double well potential
\begin{equation} V(x) = \tfrac{1}{4}(4-x^2)^2 \end{equation} | (4.3) |
Now, the expressions for RM are
\begin{align*} F(x,\xi) & = { \epsilon^{-1}}\left({(x+\xi)^3-4(x+\xi)}\right) + x,\\ f(x) & = { \epsilon^{-1}}\left({x^3-x}\right) +x, \\ \mathbb{E}[\Phi''_\mu(x+\xi)] & = { \epsilon^{-1}}\left({3x^2-1}\right),\\ \mathbb{E}[\left |{\Phi'_\mu(x+\xi)}\right |^2] & = { \epsilon^{-2}} (1 + x^2) (7 + 6 x^2 +x^4). \end{align*}
In this case, f(x) vanishes at 0 and \pm \sqrt{1- \epsilon} , and J'' changes sign from positive to negative when x enters \left({-\sqrt{(1- \epsilon)/3}, \sqrt{(1- \epsilon)/3}}\right) . We must therefore restrict to a fixed trust region if we want to ensure convergence to either of \pm\sqrt{1- \epsilon} .
We ran the problem at \epsilon = 0.1 in two cases. In the first case, U_1 = (0.6, 3.0) and the process always restarts at 2 . This guarantees convergence since the second variation will be strictly positive. In the second case, U_1 = (-0.5, 1.5) , and the process always restarts at -0.1 . Now, the second variation can change sign. The results of these two experiments appear in Figure 2. For some random number sequences the algorithm still converged to \sqrt{1- \epsilon} , even with the poor choice of trust region.
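The first of these experiments can be sketched as follows (same assumed step sequence); replacing lo, hi, restart with -0.5, 1.5, -0.1 gives the second:

```python
import numpy as np

rng = np.random.default_rng(2)
eps, lo, hi, restart = 0.1, 0.6, 3.0, 2.0   # U_1 = (0.6, 3.0), restart at 2
x = restart
for n in range(1, 100_000):
    xi = rng.standard_normal()
    F = ((x + xi) ** 3 - 4 * (x + xi)) / eps + x
    x_tilde = x - F / n
    x = x_tilde if lo < x_tilde < hi else restart

print(x)                                    # approaches sqrt(1 - eps) ~ 0.949
```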
Take \mu_0 = N(m_0(t), C_0) , with
\begin{equation} C_0 = \left({-\frac{d^2}{dt^2}}\right)^{-1}, \end{equation} | (4.4) |
equipped with Dirichlet boundary conditions on \mathcal{H} = L^2(0, 1) .* In this case the Cameron-Martin space \mathcal{H}^1 = H^1_0(0, 1) , the standard Sobolev space equipped with the Dirichlet norm. Let us assume m_0 \in H^1(0, 1) , taking values in \mathbb{R}^d .
* This is the covariance of the standard unit Brownian bridge, Y_t = B_t - t B_1 .
Consider the path space distribution on L^2(0, 1) , induced by
\begin{equation} \frac{d\mu}{d\mu_0} = \frac{1}{Z_\mu}\exp\left\{{-\Phi_\mu(v)}\right\}, \quad \Phi_\mu(v) = { \epsilon^{-1}}\int_0^1 V(v(t))dt, \end{equation} | (4.5)
where V: \mathbb{R}^d\to \mathbb{R} is a smooth function. We assume that V is such that this probability distribution exists and that \mu \sim \mu_0 , our reference measure.
We thus seek an \mathbb{R}^d valued function m(t) \in H^1(0, 1) for our Gaussian approximation of \mu , satisfying the boundary conditions
\begin{equation} m(0) = m_-,\quad m(1) = m_+. \end{equation} | (4.6) |
For simplicity, take m_0 = (1-t)m_- + t m_+ , the linear interpolant between m_\pm . As above, we work in the shifted coordinates x(t) = m(t) - m_0(t)\in H^1_0(0, 1) .
Given a path v(t)\in H^1_0 , by the Sobolev embedding, v is continuous with its L^\infty norm controlled by its H^1 norm. Also recall that for \xi \sim N(0, C_0) , in the case of \xi(t) \in \mathbb{R} ,
\begin{equation} \mathbb{E}\left[{\xi(t)^p}\right] = \begin{cases} 0, & \text{$p$ odd},\\ (p-1)!!\left[{t(1-t)}\right]^{\frac p 2}, & \text{$p$ even}. \end{cases} \end{equation} | (4.7) |
Letting \lambda_1 = 1/\pi^2 be the ground state eigenvalue of C_0 ,
\begin{equation*} \begin{split} \mathbb{E}[\|\sqrt{C_0}\Phi'_\mu(v +m_0+\xi)\|^2]&\leq {\lambda_1} \mathbb{E}[\|\Phi'_\mu(v +m_0+\xi)\|^2]\\ &\quad = {\lambda_1}{ \epsilon^{-2}}\int_0^1 \mathbb{E}[{\left |{V'(v(t)+m_0(t)+\xi(t))}\right |^2}]dt. \end{split} \end{equation*} |
The terms involving v+m_0 in the integrand can be controlled by the L^\infty norm, which in turn is controlled by the H^1 norm, while the terms involving \xi can be integrated according to (4.7). As a mapping applied to x , this expression is bounded on bounded subsets of H^1 .
Minimizers will satisfy the ODE
\begin{equation} { \epsilon}^{-1} \mathbb{E}\left[{V'(x+m_0 +\xi)}\right] -x'' = 0,\quad x(0) = x(1) = 0. \end{equation} | (4.8) |
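For reference, (4.8) can also be solved directly as a two-point boundary value problem. The sketch below (our own illustration) does so for the quartic potential (4.2) treated next, with the boundary data m_- = 0, m_+ = 2 used in the experiments, evaluating the Gaussian expectation in closed form via (4.7):

```python
import numpy as np
from scipy.integrate import solve_bvp

eps = 0.01

def rhs(t, y):
    # y[0] = x, y[1] = x'; w = x + m0 with m0(t) = 2t (m_- = 0, m_+ = 2)
    w = y[0] + 2 * t
    EVp = w + w ** 3 + 3 * t * (1 - t) * w   # E[V'(w + xi(t))], via (4.7)
    return np.vstack([y[1], EVp / eps])

def bc(ya, yb):
    return np.array([ya[0], yb[0]])          # x(0) = x(1) = 0

t = np.linspace(0, 1, 101)
sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))  # sol.sol(t)[0] is x(t)
```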
With regard to convexity about a minimizer, m_\star , if, for instance, V'' were pointwise positive definite, then the problem would satisfy (3.15), ensuring convergence. Consider the quartic potential V given by (4.2). In this case,
\begin{equation} \Phi(v) = { \epsilon}^{-1}\int_0^1 \frac{1}{2}v(t)^2 +\frac{1}{4}v(t)^4 dt, \end{equation} | (4.9) |
and
\begin{align*} \Phi'(v+m_0+ \xi) & = { \epsilon}^{-1}\left[{(v+m_0 +\xi) +(v+m_0 +\xi)^3 }\right],\\ \Phi''(v+m_0 + \xi) & = { \epsilon}^{-1}\left[{1 +3 (v+m_0+\xi)^2}\right],\\ \mathbb{E}[\Phi'(v+m_0 + \xi)]& = { \epsilon}^{-1}\left[{v+m_0 +(v+m_0)^3+ 3 t(1-t) (v+m_0)}\right],\\ \mathbb{E}[\Phi''(v+m_0 + \xi)]& = { \epsilon}^{-1}\left[{1 + 3 (v+m_0)^2 + 3 t(1-t) }\right]. \end{align*}
Since \Phi''(v+m_0+\xi)\geq \epsilon^{-1} , we are guaranteed convergence using expanding trust regions. Taking \epsilon = 0.01 , m_- = 0 and m_+ = 2 , this is illustrated in Figure 3, where we have also solved (4.8) by ODE methods for comparison. As trust regions, we take
\begin{equation} U_n = \left\{{x \in H^1_0(0,1)\mid \left \|{x}\right\|_{H^1}\leq 10+n}\right\}, \end{equation} | (4.10)
and we always restart at the zero solution. Figure 3 also shows robustness to discretization; the number of truncations is relatively insensitive to \Delta t .
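A discretized sketch of this experiment (our own illustration: finite differences for C_0, the exact Brownian bridge covariance for sampling \xi, and the assumed a_n = 1/n):

```python
import numpy as np

M, eps = 50, 0.01
dt = 1.0 / M
t = np.linspace(0, 1, M + 1)[1:-1]           # interior grid points
m0 = 2 * t                                   # linear interpolant, m_- = 0, m_+ = 2
L = (2 * np.eye(M - 1) - np.eye(M - 1, k=1) - np.eye(M - 1, k=-1)) / dt**2
K = np.minimum.outer(t, t) - np.outer(t, t)  # Brownian bridge covariance; C_0 = L^{-1}
Kchol = np.linalg.cholesky(K + 1e-12 * np.eye(M - 1))

def h1_norm(x):
    # ||x||_{H^1} = (int_0^1 x'(t)^2 dt)^{1/2}, with x = 0 at the endpoints
    xx = np.concatenate(([0.0], x, [0.0]))
    return np.sqrt(np.sum(np.diff(xx) ** 2) / dt)

rng = np.random.default_rng(3)
x, sigma = np.zeros(M - 1), 0                # restart point: the zero solution
for n in range(1, 20_000):
    xi = Kchol @ rng.standard_normal(M - 1)  # xi ~ N(0, C_0), a bridge sample
    Vp = (x + m0 + xi) + (x + m0 + xi) ** 3  # V'(y) = y + y^3
    F = np.linalg.solve(L, Vp / eps) + x     # F(x, xi) = C_0 Phi_mu' + x
    x_tilde = x - F / n                      # a_n = 1/n
    if h1_norm(x_tilde) <= 10 + sigma:       # expanding regions (4.10)
        x = x_tilde
    else:                                    # truncate: restart and expand
        x, sigma = np.zeros(M - 1), sigma + 1
```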
For many problems of interest, we do not have global convexity. Consider the double well potential (4.3), but in the case of paths,
\begin{equation} \Phi(v) = { \epsilon^{-1}}\int_0^1\frac{1}{4} (4-v(t)^2)^2dt. \end{equation} | (4.11)
Then,
\begin{align*} \Phi'(v + m_0 + \xi)& = { \epsilon}^{-1}\left[{(v + m_0 + \xi)^3 - 4 (v + m_0 + \xi)}\right]\\ \Phi''(v + m_0 + \xi) & = { \epsilon}^{-1}\left[{3 (v + m_0 + \xi)^2 - 4}\right],\\ \mathbb{E}[\Phi'(v+m_0 + \xi)]& = { \epsilon}^{-1}\left[{(v+m_0)^3 + 3 t(1-t) (v+m_0)-4(v+m_0)}\right]\\ \mathbb{E}[\Phi''(v+m_0 + \xi)]& = { \epsilon}^{-1}\left[{3(v+m_0)^2 + 3 t(1-t) -4}\right] \end{align*} |
Here, we take m_- = 0 , m_+ = 2 , and \epsilon = 0.01 . We have plotted the numerically solved ODE in Figure 4. Also plotted is \mathbb{E}[\Phi''(v_\star +m_0+ \xi)] . Note that \mathbb{E}[\Phi''(v_\star +m_0+ \xi)] is not sign definite, becoming as small as -400 . Since C_0 has \lambda_1 = 1/\pi^2 \approx 0.101 , (3.15) cannot apply.
Discretizing the Schrödinger operator
\begin{equation} J''(v_\star) = -\frac{d^2}{dt^2} + { \epsilon}^{-1}\left({3(v_\star(t)+m_0(t))^2 + 3 t(1-t) -4}\right), \end{equation} | (4.12) |
we numerically compute the eigenvalues. Plotted in Figure 5, we see that the minimal eigenvalue of J''(m_\star) is approximately \mu_1\approx 550 . Therefore,
\begin{equation} \left\langle {J''(x_\star)u},{u}\right\rangle\geq \mu_1 \left \|{u}\right\|^2_{L^2}\Rightarrow \left\langle {J''(x)u},{u}\right\rangle\geq \alpha\left \|{u}\right\|_{H^1}^2, \end{equation} | (4.13) |
for all x in some neighborhood of x_\star . For an appropriately selected fixed trust region, the algorithm will converge.
However, we can show that the convexity condition is not global. Consider the path m(t) = 2t^2 , which satisfies the boundary conditions. As shown in Figure 5, this path induces negative eigenvalues.
Despite this, we still observe convergence. Using the fixed trust region
\begin{equation} U_1 = \left\{{x\in H^1_0(0,1)\mid \left \|{x}\right\|_{H^1}\leq 100}\right\}, \end{equation} | (4.14) |
we obtain the results in Figure 6. Again, the convergence is robust to discretization.
We have shown that the Robbins-Monro algorithm, with both fixed and expanding trust regions, can be applied to Hilbert space valued problems, adapting the finite dimensional proof of [12]. We have also constructed sufficient conditions for which the relative entropy minimization problem fits within this framework.
One problem we did not address here was how to identify fixed trust regions. Indeed, that requires a tremendous amount of a priori information that is almost certainly not available. We interpret that result as a local convergence result that gives a theoretical basis for applying the algorithm. In practice, since the root is likely unknown, one might run some numerical experiments to identify a reasonable trust region, or just use expanding trust regions. The practitioner will find that the algorithm converges to a solution, though perhaps not the one originally envisioned. A more sophisticated analysis may address the convergence to a set of roots, while being agnostic as to which zero is found.
Another problem we did not address was how to optimize not just the mean, but also the covariance in the Gaussian. As discussed in [15], it is necessary to parameterize the covariance in some way, which will be application specific. Thus, while the form of the first variation of relative entropy with respect to the mean, (3.7), is quite generic, the corresponding expression for the covariance will be specific to the covariance parameterization. Additional constraints are also necessary to guarantee that the parameters always induce a covariance operator. We leave such specialization as future work.
This work was supported by US Department of Energy Award DE-SC0012733. This work was completed under US National Science Foundation Grant DMS-1818716. The authors would like to thank J. Lelong for helpful comments, along with anonymous reviewers whose reports significantly impacted our work.
The authors declare that there are no conflicts of interest in this paper.
[1] |
Z. W. Liu, L. X. Li, J. Yi, S. K. Li, Z. H. Wang, G. Wang, Influence of heat treatment conditions on bending characteristics of 6063 aluminum alloy sheets, T. Nonferr. Metal. Soc., 27 (2017), 1498–1506. doi: 10.1016/s1003-6326(17)60170-5. doi: 10.1016/s1003-6326(17)60170-5
![]() |
[2] |
S. Bingol, A. Bozaci, Experimental and Numerical Study on the Strength of Aluminum Extrusion Welding, Materials (Basel), 8 (2015), 4389-4399. doi: 10.3390/ma8074389. doi: 10.3390/ma8074389
![]() |
[3] |
L. Donati, L. Tomesani, The effect of die design on the production and seam weld quality of extruded aluminum profiles, J. Mater. Process. Technol., 164-165 (2005), 1025–1031. doi: 10.1016/j.jmatprotec.2005.02.156. doi: 10.1016/j.jmatprotec.2005.02.156
![]() |
[4] | C. T. Mgonja, A review on effects of hazards in foundries to workers and environment, IJISET: Int. J. Innov. Sci. Eng. Technol., 4 (2017), 326–334. |
[5] |
J. Ahmed, B. Gao, W. l. Woo, Sparse low-rank tensor decomposition for metal defect detection using thermographic imaging diagnostics, IEEE T. Ind. Inform., 17 (2020), 1810–1820. doi: 10.1109/TⅡ.2020.2994227. doi: 10.1109/TⅡ.2020.2994227
![]() |
[6] |
Q. Luo, B. Gao, W. l. Woo, Y. Yang, Temporal and spatial deep learning network for infrared thermal defect detection, NDT & E. Int., 108 (2019), 102164. doi: 10.1016/j.ndteint.2019.102164. doi: 10.1016/j.ndteint.2019.102164
![]() |
[7] |
B. Z. Hu, B. Gao, W. l. Woo, L. F. Ruan, J. K. Jin, A Lightweight Spatial and Temporal Multi-Feature Fusion Network for Defect Detection, IEEE T. Image Process., 30 (2020), 472–486. doi: 10.1109/TIP.2020.3036770. doi: 10.1109/TIP.2020.3036770
![]() |
[8] |
J. Ahmed, B. Gao, W. l. Woo, Y. Zhu, Ensemble Joint Sparse Low-Rank Matrix Decomposition for Thermography Diagnosis System, IEEE T. Ind. Electronics, 68 (2020), 2648–2658. doi: 10.1109/TIE.2020.2975484. doi: 10.1109/TIE.2020.2975484
![]() |
[9] |
J. Sun, C. Li, X. J. Wu, V. Palade, W. Fang, An effective method of weld defect detection and classification based on machine vision, IEEE T. Ind. Inform., 15 (2019), 6322–6333. doi: 10.1109/TⅡ.2019.2896357. doi: 10.1109/TⅡ.2019.2896357
![]() |
[10] |
Z. F. Zhang, G. R. Wen, S. B. Chen, Weld image deep learning-based on-line defects detection using convolutional neural networks for Al alloy in robotic arc welding, J. Manuf. Process., 45 (2019), 208–216. Doi: 10.1016/j.jmapro.2019.06.023. doi: 10.1016/j.jmapro.2019.06.023
![]() |
[11] |
Y. Q. Bao, K. C. Song, J. Liu, Y. Y. Wang, Y. H. Yan, H. Yu, et al., Triplet-Graph Reasoning Network for Few-shot Metal Generic Surface Defect Segmentation, IEEE Trans. Instrum. Meas., 70 (2021). doi: 10.1109/TIM.2021.3083561. doi: 10.1109/TIM.2021.3083561
![]() |
[12] |
S. Fekri-Ershad, F. Tajeripour, Multi-resolution and noise-resistant surface defect detection approach using new version of local binary patterns, Appl. Artif. Intell., 31 (2017), 395–410. doi: 10.1080/08839514.2017.1378012. doi: 10.1080/08839514.2017.1378012
![]() |
[13] |
P. Y. Jong, C. S. Woosang, K. Gyogwon, S. K. Min, L. Chungki, J. L. Sang, Automated defect inspection system for metal surfaces based on deep learning and data augmentation, J. Manuf. Syst., 55 (2020), 317–324. doi: 10.1016/j.jmsy.2020.03.009. doi: 10.1016/j.jmsy.2020.03.009
![]() |
[14] |
K. Ihor, M. Pavlo, B. Janette, B. Jakub, Steel surface defect classification using deep residual neural network, Metals, 10 (2020), 846. doi: 10.3390/met10060846. doi: 10.3390/met10060846
![]() |
[15] |
S. H. Guan, M. Lei, H. Lu, A steel surface defect recognition algorithm based on improved deep learning network model using feature visualization and quality evaluation, IEEE Access, 8 (2020), 49885–49895. doi: 10.1109/ACCESS.2020.2979755. doi: 10.1109/ACCESS.2020.2979755
![]() |
[16] |
B. Zhang, M. M. Liu, Y. Z. Tian, G. Wu, X. H. Yang, S. Y. Shi, et al., Defect inspection system of nuclear fuel pellet end faces based on machine vision, J. Nucl. Sci. Technol., 57 (2020), 617–623. doi: 10.1080/00223131.2019.1708827. doi: 10.1080/00223131.2019.1708827
![]() |
[17] |
Z. H. Liu, H. B. Shi, X. F. Zhou, Aluminum Profile Type Recognition Based on Texture Features, Appl. Mech. Mater., 556–562 (2014), 2846–2851. doi: 10.4028/www.scientific.net/AMM.556-562.2846. doi: 10.4028/www.scientific.net/AMM.556-562.2846
![]() |
[18] |
A. Chondronasios, I. Popov, I, Jordanov., Feature selection for surface defect classification of extruded aluminum profiles, Int. J. Adv. Manuf. Technol., 83 (2015), 33–41. doi: 10.1007/s00170-015-7514-3. doi: 10.1007/s00170-015-7514-3
![]() |
[19] | A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM, 60 (2017), 84–90. |
[20] |
Q. H. Li, D. Liu, Aluminum Plate Surface Defects Classification Based on the BP Neural Network, Appl. Mech. Mater., 734 (2015), 543–547. doi: 10.4028/www.scientific.net/AMM.734.543. doi: 10.4028/www.scientific.net/AMM.734.543
![]() |
[21] |
R. F. Wei, Y. B. Bi, Research on Recognition Technology of Aluminum Profile Surface Defects Based on Deep Learning, Materials (Basel), 12 (2019), 1681. doi: 10.3390/ma12101681. doi: 10.3390/ma12101681
![]() |
[22] |
F. M. Neuhauser, G. Bachmann, P. Hora, Surface defect classification and detection on extruded aluminum profiles using convolutional neural networks, Int. J. Mater. Form., 13 (2019), 591–603. doi: 10.1007/s12289-019-01496-1. doi: 10.1007/s12289-019-01496-1
![]() |
[23] |
D. F. Zhang, K. C. Song, J. Xu, Y. He, Y. H. Yan, Unified detection method of aluminium profile surface defects: Common and rare defect categories, Opt. Lasers Eng., 126 (2020), 105936. doi: 10.1016/j.optlaseng.2019.105936. doi: 10.1016/j.optlaseng.2019.105936
![]() |
[24] |
R. X. Chen, D. Y. Cai, X. L. Hu, Z. Zhan, S. Wang, Defect Detection Method of Aluminum Profile Surface Using Deep Self-Attention Mechanism under Hybrid Noise Conditions, IEEE Trans. Instrum. Meas., (2021). doi: 10.1109/TIM.2021.3109723. doi: 10.1109/TIM.2021.3109723
![]() |
[25] | J. Liu, K. C. Song, M. Z. Feng, Y. H. Yan, Z. B. Tu, L. Liu, Semi-supervised anomaly detection with dual prototypes autoencoder for industrial surface inspection, Opt. Lasers Eng., 136 (2021), 106324. doi: 10.1016/j.optlaseng.2020.106324 |
[26] | C. M. Duan, T. C. Zhang, Two-Stream Convolutional Neural Network Based on Gradient Image for Aluminum Profile Surface Defects Classification and Recognition, IEEE Access, 8 (2020), 172152–172165. doi: 10.1109/ACCESS.2020.3025165 |
[27] | Y. L. Yu, F. X. Liu, A Two-Stream Deep Fusion Framework for High-Resolution Aerial Scene Classification, Comput. Intell. Neurosci., 2018 (2018), 8639367. doi: 10.1155/2018/8639367 |
[28] | C. Khraief, F. Benzarti, H. Amiri, Elderly fall detection based on multi-stream deep convolutional networks, Multimed. Tools Appl., 79 (2020), 19537–19560. doi: 10.1007/s11042-020-08812-x |
[29] | W. Ye, J. Cheng, F. Yang, Y. Xu, Two-Stream Convolutional Network for Improving Activity Recognition Using Convolutional Long Short-Term Memory Networks, IEEE Access, 7 (2019), 67772–67780. doi: 10.1109/ACCESS.2019.2918808 |
[30] | Q. S. Yan, D. Gong, Y. N. Zhang, Two-Stream Convolutional Networks for Blind Image Quality Assessment, IEEE Trans. Image Process., 28 (2019), 2200–2211. doi: 10.1109/TIP.2018.2883741 |
[31] | T. Zhang, H. Zhang, R. Wang, Y. D. Wu, A new JPEG image steganalysis technique combining rich model features and convolutional neural networks, Math. Biosci. Eng., 16 (2019), 4069–4081. doi: 10.3934/mbe.2019201 |
[32] | M. Uno, X. H. Han, Y. W. Chen, Comprehensive Study of Multiple CNNs Fusion for Fine-Grained Dog Breed Categorization, 2018 IEEE Int. Symp. Multimedia (ISM), (2018), 198–203. doi: 10.1109/ISM.2018.000-7 |
[33] | T. Akilan, Q. J. Wu, H. Zhang, Effect of fusing features from multiple DCNN architectures in image classification, IET Image Process., 12 (2018), 1102–1110. |
[34] | D. J. Li, H. T. Guo, B. M. Zhang, C. Zhao, D. H. Yu, Double vision full convolution network for object extraction in remote sensing imagery, J. Image Graph., 25 (2020), 0535–0545. |
[35] | M. Lin, Q. Chen, S. Yan, Network In Network, arXiv preprint arXiv:1312.4400 (2013). |
[36] | K. M. He, X. Zhang, S. Q. Ren, J. Sun, Deep residual learning for image recognition, Proc. IEEE Conf. Comput. Vis. Pattern Recognit., (2016), 770–778. |
[37] | C. Szegedy, W. Liu, Y. Q. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., Going deeper with convolutions, Proc. IEEE Conf. Comput. Vis. Pattern Recognit., (2015), 1–9. |
[38] | K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv preprint arXiv:1409.1556 (2014). |
[39] | Y. LeCun, Y. Bengio, Convolutional Networks for Images, Speech, and Time-Series, The Handbook of Brain Theory and Neural Networks, 3361 (10), 1995. |
[40] | V. Suarez-Paniagua, I. Segura-Bedmar, Evaluation of pooling operations in convolutional architectures for drug-drug interaction extraction, BMC Bioinformatics, 19 (2018), 209. doi: 10.1186/s12859-018-2195-1 |
[41] | X. L. Zhang, J. F. Xu, J. Yang, L. Chen, H. B. Zhou, X. J. Liu, et al., Understanding the learning mechanism of convolutional neural networks in spectral analysis, Anal. Chim. Acta, 1119 (2020), 41–51. doi: 10.1016/j.aca.2020.03.055 |
[42] | S. W. Kwon, I. J. Choi, J. Y. Kang, W. I. Jang, G. H. Lee, M. C. Lee, Ultrasonographic Thyroid Nodule Classification Using a Deep Convolutional Neural Network with Surgical Pathology, J. Digit. Imaging, 33 (2020), 1202–1208. doi: 10.1007/s10278-020-00362-w |
[43] | G. E. Dahl, T. N. Sainath, G. E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, 2013 IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), IEEE, 2013. doi: 10.1109/ICASSP.2013.6639346 |
[44] | N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, J. Mach. Learn. Res., 15 (2014), 1929–1958. |
[45] | S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Int. Conf. Mach. Learn., PMLR, (2015), 448–456. |
[46] | V. Nair, G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, Int. Conf. Mach. Learn. (ICML), 2010. |
[47] | P. Li, X. Liu, Bilinear interpolation method for quantum images based on quantum Fourier transform, Int. J. Quantum Inf., 16 (2018), 1850031. doi: 10.1142/S0219749918500314 |
[48] | D. Y. Han, Comparison of commonly used image interpolation methods, Proc. 2nd Int. Conf. Comput. Sci. Electron. Eng. (ICCSEE 2013), 10 (2013). |
[49] | X. Wang, X. Jia, W. Zhou, et al., Correction for color artifacts using the RGB intersection and the weighted bilinear interpolation, Appl. Opt., 58 (2019), 8083–8091. doi: 10.1364/AO.58.008083 |
[50] | J. F. Dou, Q. Qin, Z. M. Tu, Image fusion based on wavelet transform with genetic algorithms and human visual system, Multimed. Tools Appl., 78 (2018), 12491–12517. doi: 10.1007/s11042-018-6756-0 |
[51] | H. M. Lu, L. F. Zhang, S. Serikawa, Maximum local energy: An effective approach for multisensor image fusion in beyond wavelet transform domain, Comput. Math. Appl., 64 (2012), 996–1003. doi: 10.1016/j.camwa.2012.03.017 |
[52] | B. Zhang, Study on image fusion based on different fusion rules of wavelet transform, 2010 3rd Int. Conf. Adv. Comput. Theory Eng. (ICACTE), Vol. 3, IEEE, 2010. doi: 10.1109/ICACTE.2010.5579586 |
[53] | S. L. Liu, Z. J. Song, M. N. Wang, WaveFuse: A Unified Deep Framework for Image Fusion with Discrete Wavelet Transform, arXiv preprint arXiv:2007.14110 (2020). |
[54] | D. Kusumoto, M. Lachmann, T. Kunihiro, S. Yuasa, Y. Kishino, M. Kimura, et al., Automated Deep Learning-Based System to Identify Endothelial Cells Derived from Induced Pluripotent Stem Cells, Stem Cell Rep., 10 (2018), 1687–1695. doi: 10.1016/j.stemcr.2018.04.007 |
[55] | P. Su, S. Guo, S. Roys, F. Maier, H. Bhat, J. Zhuo, et al., Transcranial MR Imaging-Guided Focused Ultrasound Interventions Using Deep Learning Synthesized CT, AJNR Am. J. Neuroradiol., 41 (2020), 1841–1848. doi: 10.3174/ajnr.A6758 |
[56] | S. J. Pan, Q. Yang, A Survey on Transfer Learning, IEEE Trans. Knowl. Data Eng., 22 (2010), 1345–1359. doi: 10.1109/TKDE.2009.191 |
[57] | S. Medghalchi, C. F. Kusche, E. Karimi, U. Kerzel, S. Korte-Kerzel, et al., Damage Analysis in Dual-Phase Steel Using Deep Learning: Transfer from Uniaxial to Biaxial Straining Conditions by Image Data Augmentation, JOM, 72 (2020), 4420–4430. doi: 10.1007/s11837-020-04404-0 |
[58] | X. R. Yu, X. M. Wu, C. B. Luo, P. Ren, Deep learning in remote sensing scene classification: a data augmentation enhanced convolutional neural network framework, GISci. Remote Sens., 54 (2017), 741–758. doi: 10.1080/15481603.2017.1323377 |
[59] | A. Taheri-Garavand, H. Ahmadi, M. Omid, S. S. Mohtasebi, K. Mollazade, G. M. Carlomagno, et al., An intelligent approach for cooling radiator fault diagnosis based on infrared thermal image processing technique, Appl. Therm. Eng., 87 (2015), 434–443. doi: 10.1016/j.applthermaleng.2015.05.038 |
[60] | M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, C. Pal, The Importance of Skip Connections in Biomedical Image Segmentation, Deep Learning and Data Labeling for Medical Applications, Springer, Cham, (2016), 179–187. doi: 10.1007/978-3-319-46976-8_19 |
[61] | Y.-L. Boureau, F. Bach, Y. LeCun, J. Ponce, Learning mid-level features for recognition, 2010 IEEE Computer Society Conf. Comput. Vis. Pattern Recognit., IEEE, (2010), 2559–2566. doi: 10.1109/CVPR.2010.5539963 |