It is well known that the cylindrically symmetric Navier-Stokes equations take the form
$\rho_t+\dfrac{(r\rho u)_r}{r}=0,$ | (1.1) |
$\rho(u_t+uu_r)-\dfrac{\rho v^2}{r}+P_r=\left(\dfrac{\lambda(ru)_r}{r}\right)_r-\dfrac{2u\mu_r}{r},$ | (1.2) |
$\rho(v_t+uv_r)+\dfrac{\rho uv}{r}=(\mu v_r)_r+\dfrac{2\mu v_r}{r}-\dfrac{(\mu v)_r}{r}-\dfrac{\mu v}{r^2},$ | (1.3) |
$\rho(w_t+uw_r)=(\mu w_r)_r+\dfrac{\mu w_r}{r},$ | (1.4) |
$\rho(e_t+ue_r)+\dfrac{P(ru)_r}{r}=\dfrac{(\kappa r\theta_r)_r}{r}+Q,$ | (1.5) |
where ρ(r,t) is the density; u(r,t), v(r,t), and w(r,t) are the velocities in the radial, angular, and axial directions, respectively; θ(r,t) is the temperature; and the pressure P and the internal energy e are related to the density and temperature by
$P=P(\rho,\theta)=R\rho\theta\quad\text{and}\quad e=e(\rho,\theta)=c_v\theta,$ | (1.6) |
where the specific gas constant R and the specific heat at constant volume cv are positive constants; the symbol Q denotes
$Q=\dfrac{\lambda(ru)_r^2}{r^2}-\dfrac{4\mu uu_r}{r}+\mu w_r^2+\mu\left(v_r-\dfrac{v}{r}\right)^2,$ | (1.7) |
μ and λ are viscosity coefficients, and κ is the heat conductivity coefficient.
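For later reference, note that with the standard polytropic relation $c_v=\frac{R}{\gamma-1}$ (implicit in the model above), the internal energy law in (1.6) coincides with the expression used after the Lagrangian system (1.16) below:
\[
e=c_v\theta=\frac{R\theta}{\gamma-1}.
\]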
Without loss of generality, we consider the system (1.1)–(1.5) with the following initial-boundary data:
$\begin{cases}(\rho,u,v,w,\theta)|_{t=0}=(\rho_0,u_0,v_0,w_0,\theta_0)(r), & 0<a\le r\le b<\infty,\\ (u,v,w,\partial_r\theta)|_{r=a}=(u,v,w,\partial_r\theta)|_{r=b}=0, & t\ge0.\end{cases}$ | (1.8) |
Our main goal is to show the large-time behavior of global solutions to the initial-boundary value problem (1.1)–(1.8) with large initial data. For this purpose, it is convenient to transform the initial-boundary value problem (1.1)–(1.8) into Lagrangian coordinates. We introduce the Lagrangian coordinates (t,x) and denote (˜ρ,˜u,˜v,˜w,˜θ)(t,x)=(ρ,u,v,w,θ)(t,r), where
$r=r(t,x)=r_0(x)+\displaystyle\int_0^t u(s,r(s,x))\,ds,$ | (1.9) |
and
$r_0(x):=f^{-1}(x),\qquad f(r):=\displaystyle\int_a^r y\rho_0(y)\,dy.$
Note that the function f is invertible on [a,b] provided that ρ0(y)>0 for each y∈[a,b] (which will be assumed in Theorem 1.1); indeed, f′(r)=rρ0(r)>0, so f is strictly increasing. Due to (1.1) and (1.8), we see
$\dfrac{\partial}{\partial t}\displaystyle\int_a^{r(t,x)} y\rho(t,y)\,dy=0.$
Then it is easy to check
$\displaystyle\int_a^{r(t,x)} y\rho(t,y)\,dy=f(r_0(x))=x\quad\text{and}\quad \displaystyle\int_b^{r(t,1)} y\rho(t,y)\,dy=0,$ | (1.10) |
which transforms the domain [0,T]×[a,b] into [0,T]×[0,1]. Hereafter, we denote (˜ρ,˜u,˜v,˜w,˜θ) by (ρ,u,v,w,θ) for simplicity. The identities (1.9) and (1.10) imply
$r_t(t,x)=u(t,x),\qquad r_x(t,x)=r^{-1}\tau(t,x),$ | (1.11) |
where τ:=ρ−1 is the specific volume. By means of the identities (1.11), the system (1.1)–(1.5) is transformed into
$\tau_t=(ru)_x,$ | (1.12) |
$u_t-\dfrac{v^2}{r}+rP_x=r\left(\dfrac{\lambda(ru)_x}{\tau}\right)_x-2u\mu_x,$ | (1.13) |
$v_t+\dfrac{uv}{r}=r\left(\dfrac{\mu rv_x}{\tau}\right)_x+2\mu v_x-(\mu v)_x-\dfrac{\mu\tau v}{r^2},$ | (1.14) |
$w_t=r\left(\dfrac{\mu rw_x}{\tau}\right)_x+\mu w_x,$ | (1.15) |
$e_t+P(ru)_x=\left(\dfrac{\kappa r^2\theta_x}{\tau}\right)_x+Q,$ | (1.16) |
where $t>0$, $x\in\Omega=(0,1)$, $P=\dfrac{R\theta}{\tau}$, $e=\dfrac{R\theta}{\gamma-1}$, and $Q=\dfrac{\lambda(ru)_x^2}{\tau}-4\mu uu_x+\dfrac{\mu r^2w_x^2}{\tau}+\mu\tau\left(\dfrac{rv_x}{\tau}-\dfrac{v}{r}\right)^2.$
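For the reader's convenience, here is a short verification of the identity $r_x=\tau/r$ in (1.11) and of (1.12); it is a routine computation that is not spelled out in the text. Differentiating the first identity in (1.10) with respect to x and using $r_t=u$ from (1.9),
\[
1=\partial_x\!\int_a^{r(t,x)}y\rho(t,y)\,dy=r\rho\,r_x
\quad\Longrightarrow\quad
r_x=\frac{1}{r\rho}=\frac{\tau}{r},
\]
\[
\tau_t=(r\,r_x)_t=r_t\,r_x+r\,(r_t)_x=u\,r_x+r\,u_x=(ru)_x,
\]
which is exactly (1.12).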
Throughout this paper, we assume that μ,λ, and κ are power functions of absolute temperature as follows:
$\mu=\tilde\mu\theta^\alpha,\qquad\lambda=\tilde\lambda\theta^\alpha,\qquad\kappa=\tilde\kappa\theta^\beta,$ | (1.17) |
where ˜μ, ˜λ, ˜κ, α, and β are positive constants.
The objective of this paper is to study the global existence and stability of the solutions to an initial-boundary value problem of (1.12)–(1.16) with the initial data:
$(\tau,u,v,w,\theta)(x,0)=(\tau_0,u_0,v_0,w_0,\theta_0),\qquad x\in(0,1),$ | (1.18) |
and the boundary conditions:
$(u,v,w,\theta_x)(0,t)=(u,v,w,\theta_x)(1,t)=0,\qquad t\ge0.$ | (1.19) |
Using the Navier-Stokes equations as a model for fluid motion has been widely accepted by the physics community. In recent years, significant progress has been made in the study of the Navier-Stokes equations with constant viscosity coefficients. When the initial data are suitably small and no vacuum state is present, the global existence, uniqueness, and large-time behavior of the solutions have been established [1,2,3,4,5,6,7,8]. The problem with large initial data, however, is very challenging, and the first significant breakthrough was achieved by Lions [9]. Moreover, assuming only that the initial data are sufficiently small in the energy space, Hoff [10,11] proved the existence of global weak solutions. Vacuum states often arise in the study of fluid motion and make the analysis considerably more complex. The results in [12,13] indicate that the Cauchy problem of the Navier-Stokes equations with constant coefficients is not well-posed in the presence of vacuum: the solutions of the system do not depend continuously on the initial data. Based on physical considerations, Liu-Xin-Yang [12] studied the Cauchy problem of the Navier-Stokes equations with density-dependent viscosity and proved its local well-posedness. However, real fluids can be regarded as ideal fluids (with constant viscosity coefficients) only when the temperature and density vary within a suitable range; when the temperature or density changes significantly, the viscosity of a real fluid varies greatly [14].
On the other hand, the Navier-Stokes equations can be derived via the Chapman-Enskog expansion from the Boltzmann equation, which models microscopic particle collisions; from this derivation, the viscosity depends on the temperature. However, compared with the abundant research on the classical model, studies of the physically relevant case of temperature-dependent viscosity are scarce. Because the viscosity and heat conductivity are both temperature-dependent, degeneracy and strong nonlinearity may appear. Pan-Zhang [15] and Huang-Shi [16] obtained global strong solutions and their large-time behavior in bounded domains for the one-dimensional Navier-Stokes equations when α=0 and 0<β<1. Liu-Yang-Zhao-Zhou [17] and Wan-Wang [18] also obtained global solutions of the Navier-Stokes equations in the one-dimensional and cylindrically symmetric cases, respectively, under the requirement that |γ−1| be sufficiently small. Wang-Zhao [19] removed the smallness condition on |γ−1| and established global classical solutions to the Navier-Stokes equations in the one-dimensional whole space when μ and κ satisfy:
$\mu=\tilde\mu h(\tau)\theta^\alpha,\qquad \kappa=\tilde\kappa h(\tau)\theta^\alpha,$
where α is small enough. In their calculations, the viscosity and heat-conductivity were dependent on temperature and density, and to overcome the difficulties caused by density, the following conditions could not be removed:
$\|h(\tau)^{-1}\tau^{-1}\|_{L^\infty(\Omega)}+\|h(\tau)^{-1}\tau\|_{L^\infty(\Omega)}\le C.$
This means that the estimate of ‖τx‖L2(Ω) can be obtained directly, without upper and lower bounds on the density, whenever the coefficient μ−1 or κ−1 appears. However, if h(τ) is constant, then the constants l1=l2=0, and the result in this case cannot be established using the model in [19]. Recently, Sun-Zhang-Zhao [20] considered an initial-boundary value problem of the compressible Navier-Stokes equations for one-dimensional viscous and heat-conducting ideal polytropic fluids with temperature-dependent transport coefficients, and established the global-in-time existence of strong solutions. In that paper, the initial data can be large provided that α≥0 is small, while the growth exponent β≥0 can be arbitrarily large. It is worth mentioning that the smallness of α>0 depends on the size of the initial data; unfortunately, however, [20] did not provide a specific relationship between α and the initial data. Our main results are stated as follows.
Theorem 1.1. For given positive constants M0 and V0, assume that
$\|(\tau_0,u_0,v_0,w_0,\theta_0)\|_{H^2(\Omega)}\le M_0,\qquad \inf_{x\in(0,1)}\{\tau_0,\theta_0\}\ge V_0.$ | (1.20) |
Then there exist ϵ0>0 and positive constants C0, C1, which depend only on β, M0, and V0, such that the initial-boundary value problem (1.12)–(1.19) with 0≤α≤ϵ0:=min{|α1|,|α2|} and β>0 admits a unique global-in-time strong solution (τ,u,v,w,θ) on [0,1]×[0,+∞) satisfying
$C_0^{-1}\le\tau(x,t)\le C_0,\qquad C_1^{-1}\le\theta(x,t)\le C_1,$
and
$(\tau-\bar\tau,\,u,\,v,\,w,\,\theta-E_0)\in C([0,+\infty);H^2(\Omega)),$
where α1,α2, defined below, depend only on β, M0, and V0 (see (3.2), (3.5), and (3.6) for details). Moreover, for any t>0, the solution decays exponentially:
$\|(\tau-\bar\tau,u,v,w,\theta-E_0)\|^2_{H^1}+\|r-\bar r\|^2_{H^2}\le Ce^{-\gamma_0 t},$ | (1.21) |
where
$\bar\tau=\displaystyle\int_0^1\tau\,dx,\qquad E_0=\displaystyle\int_0^1\left(\theta_0+\dfrac{u_0^2+v_0^2+w_0^2}{2c_v}\right)dx,\qquad \bar r=\left[a^2+2\bar\tau x\right]^{\frac12}.$
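A brief remark on the formula for ˉr (a short computation based on (1.11), supplied here for clarity): since $(r^2)_x=2rr_x=2\tau$ and $r(t,0)=a$,
\[
r^2(t,x)=a^2+2\int_0^x\tau(t,\xi)\,d\xi,
\]
and $\bar\tau=\int_0^1\tau\,dx$ is conserved in time by (1.12) and the boundary conditions (1.19), so replacing $\tau$ by $\bar\tau$ gives the stationary profile $\bar r(x)=[a^2+2\bar\tau x]^{1/2}$; this is also the identity used in (3.15) below.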
A few remarks are in order.
Remark 1. For k=1,2 and 1≤p≤∞, we adopt the following simplified notations for the standard Sobolev spaces:
$\|\cdot\|:=\|\cdot\|_{L^2(\Omega)},\qquad \|\cdot\|_k:=\|\cdot\|_{H^k(\Omega)},\qquad \|f\|_\infty:=\max_{x\in\Omega}|f(x)|,\qquad \|\cdot\|_{L^p}:=\|\cdot\|_{L^p(\Omega)}.$
Remark 2. We remark here that the growth exponent β∈(0,+∞) can be arbitrarily large, and the choice of ϵ0>0 depends only on β, V0, and the H2-norm of the initial data.

An outline of this paper is as follows. Section 2 is devoted to a number of a priori estimates independent of time, which are needed to extend the local solution to all time. Based on these estimates, the main result, Theorem 1.1, is proved in Section 3.
Remark 3. In this paper, c, C, and Ci(i=0,1,⋯,16) denote positive constants which depend only on β, M0, and V0, but not on the time t. Furthermore, c and C may differ from line to line.
First of all, define
X(t1,t2;m1,m2;N):={(τ,u,v,w,θ)∈C([t1,t2];H2(Ω)),τx∈L2(t1,t2;H1(Ω))(ux,vx,wx,θx)∈L2(t1,t2;H2(Ω)),τt∈C([t1,t2];H1(Ω))∩L2(t1,t2;H1(Ω)),(ut,vt,wt,θt)∈C([t1,t2];L2(Ω))∩L2(t1,t2;H1(Ω)),τ≥m1,θ≥m2,E(t1,t2)≤N2,∀(x,t)∈[0,1]×[t1,t2]}, |
where N, m1, m2, and t1,t2(t2>t1) are constants and
E(t1,t2):=supt1≤t≤t2‖(τx,ux,θx)‖21+‖θt‖2+∫t2t1‖θt‖2dt |
with
θt|t=t1:=1cv[−P(ru)x+(κr2θxτ)x+Q]|t=t1, |
θxt|t=t1:=1cv[−P(ru)x+(κr2θxτ)x+Q]x|t=t1. |
The main purpose of this section is to derive the global t-independent estimates of the solutions (τ,u,v,w,θ)∈X(0,T;m1,m2,N).
We start with the following basic energy estimate.
Lemma 2.1. Assume that the conditions listed in Theorem 1.1 hold. Then there exists a constant 0<ϵ1≤1 depending only on M0 and V0, such that if
$m_2^{-\alpha}\le2,\qquad N^\alpha\le2,\qquad \alpha H(m_1,m_2,N)\le\epsilon_1,$ | (2.1) |
where
$H(m_1,m_2,N):=(m_1+m_2+N+1)^5,$
then for T≥0,
$\displaystyle\int_0^1\eta_{\hat\theta}(\tau,u,v,w,\theta)(x,t)\,dx+\int_0^T\!\!\int_0^1\left[\dfrac{\tau u^2}{\theta}+\dfrac{u_x^2+w_x^2}{\tau\theta}+\dfrac{\theta^\beta\theta_x^2}{\tau\theta^2}+\dfrac{\tau}{\theta}\left(\dfrac{rv_x}{\tau}-\dfrac{v}{r}\right)^2\right]dx\,ds\le C,$ | (2.2) |
where
$\eta_{\hat\theta}(\tau,u,v,w,\theta):=\hat\theta\phi\!\left(\dfrac{\tau}{\bar\tau}\right)+\dfrac{u^2+v^2+w^2}{2}+c_v\hat\theta\phi\!\left(\dfrac{\theta}{\hat\theta}\right),\qquad \phi(z):=z-\log z-1.$
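For clarity, we record the elementary properties of $\phi$ that make $\eta_{\hat\theta}$ an entropy-type functional and justify the use of Jensen's inequality in (2.18) below (a standard observation, added here for completeness):
\[
\phi(1)=0,\qquad \phi'(z)=1-\frac{1}{z},\qquad \phi''(z)=\frac{1}{z^2}>0\quad(z>0),
\]
so $\phi$ is strictly convex and $\phi(z)\ge\phi(1)=0$, with equality only at $z=1$.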
Proof. Multiplying (1.12)–(1.16) by Rˆθ(ˉτ−1−τ−1), u, v, w, and (1−ˆθθ−1), respectively, integrating over [0,1], and adding them together, one obtains
$\dfrac{d}{dt}\displaystyle\int_0^1\eta_{\hat\theta}(\tau,u,v,w,\theta)\,dx+\int_0^1\left[\dfrac{\tilde\kappa r^2\theta^\beta\theta_x^2}{\tau\theta^2}+\dfrac{Q}{\theta}\right]dx=0,$ | (2.3) |
where $Q=\dfrac{\lambda(ru)_x^2}{\tau}-4\mu uu_x+\dfrac{\mu r^2w_x^2}{\tau}+\mu\tau\left(\dfrac{rv_x}{\tau}-\dfrac{v}{r}\right)^2.$
Apparently, by means of λ=2μ+λ′, one has
λ(ru)2x−4τμuux=(2μ+λ′)r2u2x+(2μ+λ′)τ2u2r2+2λ′τuux=2μr2u2x+2μτ2u2r2+2μ+3λ′3[rux+τur]2−2μ3[rux+τur]2=23μ(r2u2x+τ2u2r2)+2μ+3λ′3[rux+τur]2+2μ3[rux−τur]2 ≥23μ(r2u2x+τ2u2r2). |
Thus, one has
$Q\ge \dfrac{Cu_x^2}{\tau}+C\tau u^2+\dfrac{Cw_x^2}{\tau}+C\tau\left(\dfrac{rv_x}{\tau}-\dfrac{v}{r}\right)^2,$
which combined with (2.1) and (2.3) yields
ddt∫10ηˆθ(τ,u,v,w,θ)(t,x)dx+c∫10(τu2θ+u2x+w2xτθ+θβθ2xτθ2+τθ(rvxτ−vr)2)dx≤0. | (2.4) |
Integrating (2.4) over (0,T), we can obtain (2.2) by the initial conditions (τ0,u0,v0,θ0).
Next, by means of Lemma 2.1, we derive the upper and lower bounds of τ.
Lemma 2.2. Assume that the conditions of Lemma 2.1 hold. Then for (x,t)∈Ω×[0,∞),
$C_0^{-1}\le\tau(x,t)\le C_0.$
Proof. The proof is divided into three steps.
Step 1 (Representation of the formula for τ).
It follows from (1.13) that
(ur)t+u2−v2r2+2uμxr+Px=(λ(lnτ)t)x=λ(lnτ)xt+λx(ru)xτ. |
that is
(uλr)t+g+(λ−1P)x=(lnτ)xt, | (2.5) |
where
g:=u2−v2λr2+2uμxλr−(λ−1)xP−λx(ru)xλτ−(λ−1)tur. |
Integrating (2.5) over [0,t]×[x1(t),x], we have
∫xx1(t)(uλr−u0λ0r0)dξ+∫t0∫xx1(t)gdξds+∫t0λ−1P(x)−λ−1P(x1)ds=lnτ(x,t)−lnτ(x1(t),t)−[lnτ0(x)−lnτ(x1(t),0)], | (2.6) |
where x1(t)∈[0,1] is determined by the following process. Next, for convenience, we define
F:=(ru)xτ−λ−1P−∫x0g(ξ)dξ,φ:=∫t0F(x,s)ds+∫x0u0λ0r0dξ. |
It follows from the definitions above that
φx=uλr,φt=F. | (2.7) |
By the definition of F and (1.12), one has
∫t0[λ−1P(x1(t),s)+∫x1(t)0g(ξ,s)dξ]ds=∫t0((ru)xτ−F)(x1(t),s)ds=lnτ(x1(t),t)−lnτ(x1(t),0)−∫t0F(x1(t),s)ds. | (2.8) |
Due to (1.12) and (2.7), we have
(τφ)t−(ruφ)x=τφt−ruφx=τF−u2λ=(ru)x−τPλ−τ∫x0g(ξ)dξ−u2λ. | (2.9) |
Integrating (2.9) over [0,t]×Ω, one has
∫10φτdx=∫10τ0∫x0(u0λ0r0)(ξ)dξdx−∫t0∫10[τλP+τ∫x0gdξ+u2λ]dxds. | (2.10) |
Hence, by virtue of the mean value theorem, there exists x1(t)∈[0,1] such that φ(x1(t),t)=∫10φτdx. By the definition of φ, (2.8), and (2.10), one obtains
∫t0F(x1(t),s)ds=φ(x1(t),t)−∫x1(t)0u0λ0r0(ξ)dξ=∫10τ0∫x0u0λ0r0(ξ)dξdx−∫t0∫10(τλP+τ∫x0gdξ+u2λ)dxds−∫x1(t)0u0λ0r0(ξ)dξ. | (2.11) |
Putting (2.11) into (2.8), it follows that
∫t0(Pλ(x1(t),s)+∫x1(t)0g(ξ,s)dξ)ds=lnτ(x1(t),t)−lnτ(x1(t),0)−∫10τ0∫x0u0λ0r0(ξ)dξdx+∫x1(t)0u0λ0r0(ξ)dξ+∫t0∫10(τλP+τ∫x0gdξ+u2λ)dxds. | (2.12) |
Inserting (2.12) into (2.6), we derive
∫t0Pλds+∫t0∫x0gdξds−∫t0∫10(τλP+τ∫x0gdξ+u2λ)dxds+∫xx1(t)(uλr−u0λ0r0)dξ+∫10τ0∫x0u0λ0r0dξdx−∫x1(t)0u0λ0r0dξ=lnτ−lnτ0. | (2.13) |
Let
g=u2−v2λr2+g1, |
where
g1:=2uμxλr−(λ−1)xP−λx(ru)xλτ−(λ−1)tur. |
It follows from (2.13) that
τ=B−1AD, | (2.14) |
where
A:=exp{∫t0[Pλ(x,s)+∫x0(g1(ξ,s)+u2λr2)dξ+∫10τ∫x0(v2λr2−g1)dξdx]ds},B:=exp{∫t0[∫10(τλP+τ∫x0u2λr2(ξ)dξ+u2λ)dx+∫x0v2λr2dξ]ds},D:=τ0exp{∫10τ0∫x0u0λ0r0dξdx−∫x1(t)0u0λ0r0dξ+∫xx1(t)(uλr−u0λ0r0)(ξ)dξ}. |
By (2.14), one has
τD−1B=A. | (2.15) |
Define that
J:=Pλ(x,s)+∫x0(g1(ξ,s)+u2λr2)dξ+∫10τ∫x0(v2λr2−g1)dξdx. |
Then, multiplying (2.15) by J gives
τD−1BJ=ddtA. |
Since A(0)=1, integrating the above equality over (0,t) about time, one has
τ=DB−1+1λ∫t0B(s)B(t)D(t)D(s)τ[Pλ(x,s)+∫x0(g1(ξ,s)+u2λr2)dξ+∫10τ∫x0(v2λr2−g1)dξdx]ds. | (2.16) |
Step 2 (Lower bound for τ). First of all, by means of (2.1) and (2.2), one has
C−1≤D≤C. | (2.17) |
Next, we estimate B. Employing Jensen's inequality to the convex function ϕ, we have
$\displaystyle\int_0^1 z\,dx-\log\int_0^1 z\,dx-1\le\int_0^1\phi(z)\,dx.$ | (2.18) |
By (2.18) and Lemma 2.1, one obtains
C−1≤∫10τdx,ˉθ:=∫10θdx≤C, | (2.19) |
which means that
C−1≤∫10τλPdx≤C. | (2.20) |
Hence, by means of the definition of B and (2.20), choosing ϵ1 suitably small, there exist two constants c1 and c2 such that
ec1t≤B(t)≤ec2t. | (2.21) |
That is,
e−c1(t−s)≤B(s)B(t)≤e−c2(t−s). | (2.22) |
Apparently, by means of (2.1) and (2.19), we deduce
|τ∫10τ∫x0g1dξdx|≤C|α|‖τ‖2∞(‖θ−1‖∞‖θx‖‖u‖+‖θ−ατ−1‖∞‖θx‖+‖θ−1τ−1‖∞‖θx‖‖u‖1+‖θ−1τ−1‖∞‖θt‖‖u‖)≤Cε1. | (2.23) |
Similarly, one also has
‖∫x0g1dξ‖∞≤Cε1. | (2.24) |
Thus, for t≤t0<∞,
τ≥DB−1−Cε1∫t0e−c2(t−s)ds=DB−1−Cε1c2(1−e−c2t)≥Ce−ct0−ε2(1−e−c2t0). |
For sufficiently large t, we have
infx∈Ωτ(x,t)≥C∫t0B(s)B(t)θds−ε2(1−e−c2t). | (2.25) |
So, we need estimates of θ and B(s)/B(t). By the mean value theorem and (2.19), there exists x2(t)∈[0,1] such that
C−1≤θ(x2(t),t)≤C. | (2.26) |
By Cauchy-Schwarz's inequality and (2.19), one has
|[ln(θ+1)]β2+1−[ln(θ(x2(t),t)+1)]β2+1|=|∫xx2(ln(θ+1))β2θx√τ(θ+1)√τ(ξ)dξ|≤(∫10(ln(θ+1))βθ2xτ(θ+1)2dx)12(∫10τdx)12≤C(∫10θβθ2xτθ2dx)1/2, |
which means that
θ≥C−C∫10θβθ2xτθ2dx. | (2.27) |
By (2.16)–(2.17), (2.23)–(2.24), (2.21), Lemma 2.1, and (2.19), one has
∫10τdx≤Ce−ct+C∫t0B(s)B(t)ds, |
that is
∫t0B(s)B(t)ds≥C−Ce−ct. | (2.28) |
Putting (2.27) into (2.25), by (2.22), (2.28), and Lemma 2.1, for sufficiently large t, one has
∫t0B(s)B(t)θds≥C∫t0B(s)B(t)(1−∫10θβθ2xτθ2dx)ds≥C−Ce−ct−C(∫t/20+∫tt/2)B(s)B(t)∫10θβθ2xτθ2dxds≥C−Ce−ct−C∫t/20e−c(t−s)∫10θβθ2xτθ2dxds−C∫tt/2∫10θβθ2xτθ2dxds≥C−Ce−ct−Ce−ct/2−C∫tt/2∫10θβθ2xτθ2dxds≥C. | (2.29) |
Inserting (2.29) into (2.25), for a large enough time T0, when t>T0, it follows that
infx∈Ωτ(x,t)≥C. |
Step 3 (Upper bound for τ). By (2.17), (2.22)–(2.24), and Lemma 2.1, one obtains
‖τ‖∞≤C+C∫t0e−c2(t−s)‖τ‖∞(∫10θβθ2xτθ2dx+1)ds, | (2.30) |
where we have used the results
{‖θ‖∞≤C+C‖τ‖∞∫10θβθ2xτθ2dxwhen0<β≤1,‖θ‖∞≤C+C∫10θβθ2xτθ2dxwhen1<β<∞. | (2.31) |
In fact, by Hölder's inequality, for 0<β≤1,
|θ1/2(x,t)−θ1/2(x2(t),t)|≤∫10θ−12θxdx≤‖τ‖1/2∞(∫10θβθ2xτθ2dx)1/2(∫10θ1−βdx)1/2≤‖τ‖1/2∞(∫10θβθ2xτθ2dx)1/2. | (2.32) |
For 1<β<∞,
|θβ/2(x,t)−θβ/2(x2(t),t)|≤∫10θβ/2θxθdx≤(∫10θβθ2xτθ2dx)1/2(∫10τdx)1/2. | (2.33) |
By means of (2.26) and (2.32)–(2.33), we can obtain (2.31).
Thus, the inequality (2.30) combined with Gronwall's inequality and Lemma 2.1 yields that for any t≥0,
supt≥0‖τ(x,t)‖∞≤C. |
However, we cannot get the time-space estimate of vx in Lemma 2.1. To obtain this estimate, we need the following result.
Lemma 2.3. Assume that the conditions listed in Lemma 2.1 hold. Then for any p>0 and T≥0,
∫10θ1−pdx+∫T0∫10(θβθ2xθp+1+θα(u2+u2x+w2x)θp+θατθp(rvxτ−vr)2)dxds≤C. | (2.34) |
Proof. By Lemma 2.1, the result of (2.34) has been established for p=1. In the following steps, we do the estimate for p>0 and p≠1. Multiplying (1.16) by θ−p, integrating over [0,1], and using integration by parts gives
cvp−1ddt∫10θ1−pdx+p∫10˜κr2θβθ2xτθp+1dx+∫10Qθpdx=R∫10θ1−pτ(ru)xdx=R∫10θ1−p−E0τ(ru)xdx+RE0∫10(ru)xτdx. | (2.35) |
Apparently, there exists constant C(p) depending on p such that
|θ1−p−E0|≤C(p)|θ1/2−E1/20|(E1/20+θ12−p). | (2.36) |
By means of (2.35), (2.36), Lemma 2.2, (1.13), and (1.12), we deduce
cvp−1ddt∫10θ1−pdx+p∫10˜κr2θβθ2xτθp+1dx+∫10Qθpdx≤C(p)‖θ1/2−E1/20‖∞∫10(E1/20+θ12−p)(|u|+|ux|)dx+RE0ddt∫10lnτdx≤C(p)‖θ1/2−E1/20‖∞[(∫10u2+u2xθdx)12(∫10θdx)12+(∫10θ1−pdx)12(∫10u2+u2xτθpdx)12]+RE0ddt∫10lnτdx≤C(p)‖θ1/2−E1/20‖2∞+C(p)∫10u2+u2xθdx+δ∫10u2+u2xτθpdx+C(δ,p)‖θ1/2−E1/20‖2∞∫10θ1−pdx+RE0ddt∫10lnτdx. | (2.37) |
Thus, employing the truth of
∫t0‖θ1/2−E1/20‖2∞ds≤C, | (2.38) |
we can conclude from (2.37), Grönwall's inequality, and Lemma 2.2 that (2.34) is correct. In fact,
‖θ1/2−E1/20‖∞≤‖θ1/2−ˉθ1/2‖∞+‖ˉθ1/2−E1/20‖∞. | (2.39) |
By virtue of Lemmas 2.1–2.2 and (2.19), one has
|ˉθζ−Eζ0|=|∫10ddη{[∫10(θ+ηu2+v2+w22cv)dx]ζ}dη|=|ζ∫10[∫10(θ+ηu2+v2+w22cv)dx]ζ−1dη∫10(u,v,w)22dx|≤C‖(u,v,w)‖‖(u,v,w)‖∞≤C∫10|(ux,(vr)x,wx)|dx≤C(∫10[u2xθ+1θ(rvxτ−vr)2+w2xθdx])1/2(∫10θdx)1/2≤C(∫10[u2xθ+1θ(rvxτ−vr)2+w2xθ]dx)1/2, | (2.40) |
where we have used the fact that
$\left(\dfrac{v}{r}\right)_x=\dfrac{\tau}{r^2}\left(\dfrac{rv_x}{\tau}-\dfrac{v}{r}\right).$
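This identity can be checked directly (a one-line computation, included for the reader's convenience) from $r_x=\tau/r$ in (1.11):
\[
\left(\frac{v}{r}\right)_x=\frac{v_x}{r}-\frac{v\,r_x}{r^2}=\frac{v_x}{r}-\frac{\tau v}{r^3}
=\frac{\tau}{r^2}\left(\frac{rv_x}{\tau}-\frac{v}{r}\right).
\]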
For β<1, it follows from Lemma 2.1 and (2.19) that
‖θ1/2−ˉθ1/2‖∞≤C∫10θ−12|θx|dx≤C(∫10θβθ2xθ2dx)12(∫10θ1−βdx)12≤C(∫10θβθ2xθ2dx)12. | (2.41) |
For 1≤β<∞,
‖θ12−ˉθ12‖∞≤C‖θβ2−ˉθβ2‖∞≤C∫10θβ2−1|θx|dx≤C(∫10θβθ2xθ2dx)12. | (2.42) |
Hence, by (2.39)–(2.42) and Lemmas 2.1–2.2, we can derive (2.38). The proof of Lemma 2.3 is thus complete.
According to Lemmas 2.1–2.3, we can conclude that the following results have been established.
Corollary 2.1. Assume that the conditions listed in Lemma 2.1 hold. Then for −∞<q<1, 0<p<∞, and T≥0,
C1≤τ≤C,C−1≤∫10τdx≤C,C−1≤∫10θdx≤C,∫10(|lnτ|+|lnθ|+θq+u2+v2+w2)dx≤C3,∫T0∫10[(u2+u2x+v2+v2x+w2x+τ2t)(1+θ−p)+θβθ2xθ1+p]dxds≤C. | (2.43) |
Here, we have taken p=α in (2.34) to obtain the time-space estimates of v and vx.
Using the result above, we establish the following estimate about τx.
Lemma 2.4. Assume that the conditions listed in Lemma 2.1 hold. Then for T≥0,
∫10τ2xdx+∫T0∫10τ2x(1+θ)dxds≤C2. |
Proof. According to the chain rule, one has
(λτxτ)t=(λτtτ)x+λθτ(τxθt−τtθx). | (2.44) |
By means of (1.12), (1.13), and (2.44), we have
(λτxτ)t=utr+Px−v2r2+2uμxr+λθτ(τxθt−τtθx). | (2.45) |
Multiplying (2.45) by λτxτ, integrating over [0,1] about x, and using (1.12) and (2.44), we obtain
ddt∫10[12(λτxτ)2−λuτxrτ]dx+∫10Rλθτ2xτ3dx=∫10(ur)xλτtτdx+∫10Rλτxθxτ2dx+∫10λτx(u2−v2)τr2dx+∫102uμxλτxrτdx+∫10λθτ2(λτx−r−1uτ)(τxθt−τtθx)dx:=5∑i=1Ii. | (2.46) |
By Hölder's inequality, (2.1), (1.12), and Corollary 2.1, one has
I1=∫10(uxr−τur3)λτtτdx≤C‖(u,ux,τt)‖2≤C‖(u,ux)‖2. | (2.47) |
Using Corollary 2.1 and taking p=β, one has
∫T0∫10θ2xθdxds≤C. | (2.48) |
Hence, we argue the term I2 as the following
I2≤δ∫10τ2xθτ3dx+C(δ)∫10θ2xθdx. | (2.49) |
By means of integration by parts, Corollary 2.1, and (2.1), one can derive
I3=−∫10logτ(λr2(u2−v2))xdx≤C‖lnτ‖∞∫10|(|α|θxu2,|α|θxv2,θαu2,θαv2,θαu2x,θαv2x)|dx≤C∫10(u2,v2,u2x,v2x)dx. | (2.50) |
By virtue of (2.1), we derive
I4≤C|α|m−322N(∫10u2dx)1/2(∫10τ2xθτ3dx)1/2≤δ∫10τ2xθτ3dx+C(δ)∫10u2dx. | (2.51) |
By means of (2.1), Corollary 2.1, and (1.16), one can deduce
I5≤C∫10|α||θ−1||(τ2xθt,uτxθt,τxuθx,u2θx,τxuxθx,uuxθx)|dx≤C|α|m−22N∫10τ2xθτ3dx+C|α|m−322N(∫10u2+u2xdx)1/2(∫10τ2xθτ2dx)1/2+C|α|m−12N∫10u2+u2xdx≤ε∫10τ2xθτ3dx+C(ε)∫10u2+u2xdx. | (2.52) |
Inserting (2.47) and (2.49)–(2.52) into (2.46), and choosing ε suitably small, we obtain
ddt∫10[12(λτxτ)2−λuτxrτ]dx+c∫10θτ2xdx≤C‖(u,ux,θx/√θ,v,vx)‖2. | (2.53) |
Integrating (2.53) over [0,t], using Cauchy-Schwarz's inequality, (2.48), and Corollary 2.1, for any t≥0, one has
∫10τ2xdx+∫t0∫10τ2xθdxds≤C. | (2.54) |
By virtue of (2.54), we have
ˉθ∫10τ2xdx=∫10τ2x(ˉθ−θ)dx+∫10τ2xθdx≤ˉθ2∫10τ2xdx+12ˉθ‖θ−ˉθ‖2∞∫10τ2xdx+∫10τ2xθdx≤ˉθ2∫10τ2xdx+C‖θ−ˉθ‖2∞+∫10τ2xθdx. | (2.55) |
It follows from (2.19) and (2.48) that
∫T0‖θ−ˉθ‖2∞ds≤C∫T0∫10θ2xθdx∫10θdxdt≤C. | (2.56) |
Thus, it follows from (2.55)–(2.56) that
∫T0∫10τ2xdxdt≤C. | (2.57) |
The proof of Lemma 2.4 has been completed by (2.54) and (2.57).
Next, based on the estimate of τx, we derive the estimates on the derivatives of w.
Lemma 2.5. Assume that the conditions listed in Lemma 2.1 hold. Then for T≥0,
∫10w2xdx+∫T0∫10w2xxdxdt≤C3. |
Proof. Multiplying (1.15) by wxx and integrating over [0,1] about x, we find from (2.1) and Lemma 2.4 that
12ddt‖wx‖2+∫10μr2w2xxτdx=−∫10rwxxwx(μrτ)xdx−∫10μwxxwxdx≤C∫10|wxxwx|(|α|m−12|θx|+1+|τx|)dx≤ε‖wxx‖2+C(ε)‖wx‖2+C(ε)‖τx‖2‖wx‖2∞≤ε‖wxx‖2+C(ε)‖wx‖2. | (2.58) |
Taking ε suitably small in (2.58) finds
12ddt‖wx‖2+c∫10w2xxdx≤C‖wx‖2. | (2.59) |
The proof of Lemma 2.5 is complete by integrating (2.59) over (0,t) about time and choosing ε suitably small.
Based on the above result, we have the following uniform first-order derivatives estimates on the velocity (u,v).
Lemma 2.6. Assume that the conditions listed in Lemma 2.1 hold. Then for T≥0,
∫10(u2x+v2x+τ2t)dx+∫T0∫10(u2xx+v2xx+θ2x+u2t+v2t+w2t+τ2tx)dxdt≤C4. |
Proof. Multiplying (1.13) and (1.14) by uxx and vxx, respectively, and integrating over Ω about x, by integration by parts, one has
12ddt∫10(u2x+v2x)dx+∫10r2τ(λu2xx+μv2xx)dx=∫10uxxrPxdx+∫10(vxxuvr−uxxv2r)dx−∫10uxxr[(λ(ru)xτ)x−λruxxτ]dx+2∫10uμxuxxdx−∫10vxx[rvx(μrτ)x+2μvx−(μv)x−μτvr2]dx:=5∑i=1IIi. | (2.60) |
By Cauchy-Schwarz's inequality, one has
II1≤ε‖uxx‖2+C(ε)‖(θx,τx)‖2. | (2.61) |
It follows from Sobolev's inequality, the boundary condition of v, and Corollary 2.1, that we have
II2≤ε‖(uxx,vxx)‖2+C(ε)‖v‖2∞‖(u,v)‖2≤ε‖(uxx,vxx)‖2+C(ε)‖vx‖2. | (2.62) |
Direct computation from (2.1) yields
II3≤ε‖uxx‖2+C(ε)∫10[τ2xu2x+(1+|α|m−22θ2x)|(ux,uτx,u)|2]dx≤ε‖uxx‖2+C(ε)‖(ux,u)‖2+C(ε)‖τx‖2‖(ux,u)‖2∞≤2ε‖uxx‖2+C(ε)‖(ux,u)‖2, | (2.63) |
II4≤ε‖uxx‖2+C(ε)|α|N2m−22‖u‖2≤ε‖uxx‖2+C(ε)‖u‖2, | (2.64) |
and
II5≤ε‖vxx‖2+C(ε)∫10[v2x(1+|α|m−22θ2x+τ2x)+v2]dx≤2ε‖vxx‖2+C(ε)‖(vx,v)‖2. | (2.65) |
Putting (2.61)–(2.65) into (2.60) and taking ε suitably small gives
12ddt∫10(u2x+v2x)dx+c∫10(u2xx+v2xx)dx≤C‖(θx,τx,vx,ux,u,v)‖2. | (2.66) |
Integrating (2.66) over (0,T) about time, and using Lemma 2.4 and Corollary 2.1, we find
∫10(u2x+v2x)dx+∫T0∫10(u2xx+v2xx)dxdt≤C+C∫T0∫10θ2xdxdt. | (2.67) |
For β>1, we take p=β−1 in (2.43), and then
∫T0∫10θ2xdxdt≤C. | (2.68) |
Substituting (2.68) into (2.67), it follows for β>1 that
∫10(u2x+v2x)dx+∫T0∫10(u2xx+v2xx+θ2x)dxdt≤C. | (2.69) |
Next, we need to estimate the L2(Ω×(0,t))-norm of θx for 0<β≤1. We deduce from multiplying (1.16) by θ1−β2 and integration by parts that
2cv4−βddt∫10θ2−β2dx+2−β2∫10˜κr2θβ2θ2xτdx=−R∫10θ2−β2τ(ru)xdx+∫10θ1−β2Qdx=R∫10ˉθ2−β2−θ2−β2τ(ru)xdx−Rˉθ2−β2∫10(ru)xτdx+∫10θ1−β2Qdx≤C∫10|ˉθ2−β2−θ2−β2||(u,ux)|dx−Rˉθ2−β2ddt∫10lnτdx+∫10θ1−β2Qdx. | (2.70) |
Notice that
∫10|ˉθ2−β2−θ2−β2||(u,ux)|dx≤C‖ˉθ1−β4−θ1−β4‖∞(∫10(1+θ2−β2)dx)1/2(∫10(u2+u2x)dx)1/2≤C(∫10θ−β4|θx|dx)2+C∫10(1+θ2−β2)dx∫10(u2+u2x)dx≤C∫10θ1−β2dx∫10θ2xθdx+C∫10(1+θ2−β2)dx∫10(u2+u2x)dx≤C∫10θ2xθdx+C∫10(1+θ2−β2)dx∫10(u2+u2x)dx, | (2.71) |
and
∫10θ1−β2Qdx≤C(‖ˉθ1−β2−θ1−β2‖∞+1)∫10(u2+u2x+v2+v2x+w2x)dx≤C∫10θ−β2|θx|dx∫10(u2x+v2x+w2x)dx+C∫10(u2x+v2x+w2x)dx≤C∫10(θ−12+θβ4)|θx|dx∫10(u2x+v2x+w2x)dx+C∫10(u2x+v2x+w2x)dx≤ε∫10θβ2θ2xdx+C(ε)(∫10(u2x+v2x+w2x)dx)2+C∫10(θ2xθ+u2x+v2x+w2x)dx. | (2.72) |
We can conclude from (2.70)–(2.72) that
∫10θ2−β2dx+∫T0∫10θβ/2θ2xdxdt≤C+C∫T0(∫10(u2x+v2x+w2x)dx)2ds, |
which combined with Young's inequality and Corollary 2.1 yields
∫T0∫10θ2xdxdt≤C∫T0∫10θβθ2xθ2dxds+C∫T0∫10θβ/2θ2xdxds≤C+C∫T0(∫10(u2x+v2x+w2x)dx)2dt. | (2.73) |
By means of Lemma 2.5, (2.67), and (2.73), we find for 0<β≤1,
∫10(u2x+v2x)dx+∫T0∫10(u2xx+v2xx+θ2x)dxdt≤C. | (2.74) |
By virtue of (1.12)–(1.16), (2.1), Corollary 2.1, Lemma 2.4, (2.69), and (2.74), it follows that
∫10τ2tdx+∫T0∫10(u2t+v2t+w2t+τ2tx)dxds≤C. | (2.75) |
To obtain the first-order derivative estimate of the temperature, we first need to establish the uniform upper and lower bounds of θ.
Lemma 2.7. Assume that the conditions listed in Lemma 2.1 hold. Then for T≥0,
C−11≤θ≤C1. |
Proof. First of all, multiplying (1.16) by θ, and integrating over [0,1] about x, yields
cv2ddt∫10θ2dx+∫10˜κr2θβθ2xτdx=∫10θQdx−R∫10θ2(ru)xτdx≤C‖(u,ux,v,vx,wx)‖2∞∫10θdx+‖ux‖2∞∫10θ2dx. | (2.76) |
Applying Gronwall's inequality to (2.76), we can obtain
∫10θ2dx+∫T0∫10θβθ2xdxdt≤C. | (2.77) |
Based on the estimate above, we can get the bound of ∫10θβθ2xdx which will be used to obtain the upper bound of θ. Multiplying (1.16) by θβθt and integrating over (0,1) about x, it follows that
cv∫10θβθ2tdx+R∫10θβ+1θt(ru)xτdx−∫10θβθtQdx=∫10(˜κr2θβθxτ)xθβθtdx. | (2.78) |
By integration by parts, one has
∫10(˜κr2θβθxτ)xθβθtdx=−∫10˜κr2θβθxτ(θβθx)tdx=−˜κ2ddt∫10r2τ(θβθx)2dx+˜κ2∫10(2ruτ−ruτ−r3uxτ2)(θβθx)2dx. | (2.79) |
Inserting (2.79) into (2.78), we can deduce that
˜κ2ddt∫10r2τ(θβθx)2dx+cv∫10θβθ2tdx=−R∫10θβ+1θt(ru)xτdx+∫10θβθtQdx+˜κ2∫10(ruτ−r3uxτ2)(θβθx)2dx≤cv2∫10θβθ2tdx+C∫10θβ+2(u2+u2x)dx+C∫10θβ(u4+u4x+v4+v4x+w4x)dx+C‖(u,ux)‖∞∫10(θβθx)2dx≤cv2∫10θβθ2tdx+C‖(u2,u2x,u4,u4x,v4,v4x,w4x)‖∞+C(∫10θβθ2tdx)2. | (2.80) |
By Sobolev's inequality, Corollary 2.1, and Lemmas 2.5–2.6, one can find that
∫T0‖(u2,u2x,u4,u4x,v4,v4x,w4x)‖∞ds≤C. | (2.81) |
By virtue of (2.80), Grönwall's inequality, and (2.81), we can obtain
∫10(θβθx)2dx+∫T0∫10θβθ2tdxds≤C. | (2.82) |
Thanks to (2.82), it follows that
‖θβ+1−ˉθβ+1‖∞≤C(∫10(θβθx)2dx)12≤C. | (2.83) |
That is, for t≥0,
‖θ‖∞≤C. | (2.84) |
Thanks to (2.77) and (2.84), one has
∫T0∫10(θβ+1−ˉθβ+1)2dxdt≤∫T0∫10θ2βθ2xdxdt≤Csup0≤t≤T‖θ‖β∞∫T0∫10θβθ2xdxdt≤C. | (2.85) |
Combining (2.83) and (2.84), one has
∫T0|ddt∫10(θβ+1−ˉθβ+1)2dx|dt≤C∫T0∫10(θβ+1−ˉθβ+1)2dxdt+C∫T0‖θβθt‖2dt≤Csup0≤t≤T‖θ‖β∞∫T0∫10θβθ2xdxdt≤C. | (2.86) |
So, from (2.83), (2.85), and (2.86), one has
limt→+∞∫10(θβ+1−ˉθβ+1)2dx=0. |
From (2.83), when t→+∞,
‖(θβ+1−ˉθβ+1)‖2∞≤C‖(θβ+1−ˉθβ+1)‖‖θβθx‖→0, |
and we can obtain that there exists some time T0≫1 such that when t>T0,
θ(x,t)≥γ12. | (2.87) |
Fixing T0 in (2.87), multiplying (1.16) by θ−p, p>2, and integrating over [0,1] about x yield
cvp−1ddt‖θ−1‖p−1p−1+p∫10˜κr2θβθ2xτθp+1dx+∫10Qθpdx=R∫10θτθp(ru)xdx≤12∫10u2+u2xτθpdx+C‖θ−1‖p−2Lp−1. |
Hence,
ddt‖θ−1‖Lp−1≤C, |
where C is a generic positive constant independent of p. Thus, integrating the above inequality over (0,t) and letting p→∞, we conclude that
θ−1(x,t)≤C(T0+1)↔θ(x,t)≥[C(T0+1)]−1,∀(x,t)∈[0,1]×[T0,+∞). |
The proof of Lemma 2.7 is complete.
Lemma 2.8. Assume that the conditions listed in Lemma 2.1 hold. Then for T≥0,
∫10θ2xdx+∫T0∫10(θ2xx+θ2t)dxds≤C5. |
Proof. Multiplying (1.16) by θxx, integrating over [0,1] on x, and by Hölder's, Poincaré's, and Cauchy-Schwarz's inequalities, Corollary 2.1, Lemma 2.4, and Lemma 2.7, we have
cv2ddt∫10θ2xdx+∫10κr2θ2xxτdx=∫10θxx[Rθτ(ru)x−θx(κr2τ)x−Q]dx≤ε∫10θ2xxdx+C(ε)∫10[θ2(ru)2x−θ2x(κr2τ)2x−Q2]dx≤ε∫10θ2xxdx+C(ε)‖ux‖2‖θ‖2∞+C(ε)‖θx‖2+C(ε)‖θx‖2∞‖τx‖2+C(ε)∫10(u4+v4+u4x+v4x+w4x)dx≤ε‖θxx‖2+C(ε)(‖ux‖2+‖θx‖2+‖θx‖‖θxx‖+‖u‖2‖u‖2+‖v‖2∞‖v‖2)+C(ε)(‖ux‖2∞‖ux‖2+‖vx‖2∞‖vx‖2+‖wx‖2∞‖wx‖2)≤ε‖θxx‖2+C(ε)‖(ux,vx,wx,uxx,vxx,wxx)‖2+C(ε)‖θx‖2. | (2.88) |
Choosing ε suitably small in (2.88) gives
cv2ddt∫10θ2xdx+c∫10θ2xxdx≤C‖(ux,vx,wx)‖21+C‖θx‖2. | (2.89) |
Integrating (2.89) and using Lemmas 2.5–2.6, one has
‖θx(t)‖2+∫T0‖θxx‖2ds≤C. | (2.90) |
Hence, similar to (2.75), by means of (1.16), Corollary 2.1, Lemmas 2.4–2.7, and (2.90), one can deduce that
∫T0∫10θ2tdxdt≤C. |
Next, we derive the second-order derivatives estimates of (τ,u,v,w,θ).
Lemma 2.9. Assume that the conditions listed in Lemma 2.1 hold. Then for T≥0,
∫10(u2t+v2t+w2t+θ2t+u2xx+v2xx+w2xx+θ2xx+τ2xt)dx+∫T0∫10(u2xt+τ2tt+v2xt+w2xt+θ2xt)dxds≤C6. |
Proof. Applying ∂t to (1.13) and multiplying by ut in L2, one has
12ddt∫10u2tdx+∫10˜λr2θαu2xtτdx=∫10ruxt[Pt−(λτ)t(ru)x−λτ((ru)xt−ruxt)]dx−∫10rxut[λ(ru)xτ]tdx+∫10ut[(v2r)t−rtPx+rxPt+rt(λ(ru)xτ)x−2(uμx)t]dx:=3∑i=1IIIi. | (2.91) |
Applying ∂t to (1.14) and multiplying by vt in L2, one has
12ddt∫10v2tdx+∫10˜μr2θαv2xtτdx=∫10{vt[2(μvx)t−rx(μrvxτ)t−(μv)xt]−rvxtvx[μrτ]t}dx+∫10vt[rt(μrvxτ)x−(uvr)t−(μτvr2)t]dx:=5∑i=4IIIi. | (2.92) |
Applying ∂t to (1.15) and multiplying by wt in L2, one has
12ddt∫10w2tdx+∫10˜μr2θαw2xtτdx=∫10wtrt(μrwxτ)xdx−∫10{rwxwxt(μrτ)t+wt[rx(μrwxτ)t−(μwx)t]}dx:=7∑i=6IIIi. | (2.93) |
Adding (2.91)–(2.93) together, we get
12ddt∫10(u2t+v2t+w2t)dx+∫10˜λr2θαu2xt+˜μr2θαv2xt+˜μr2θαw2xtτdx=7∑i=1IIIi. | (2.94) |
Before the computations of III1 to III7, we need to keep in mind the following facts:
‖(u,v,w)‖∞≤C,a≤r≤b,rx=r−1τ,rt=u,rtx=ux,C−1≤τ≤C,C−1≤θ≤C,|(ru)x|≤C|(u,ux)|,|(ru)xt−ruxt|≤C|(u2,ut,uux)|,|(ru)xt|≤C|(u2,ut,uux,uxt)|. |
Then, by Hölder's, Sobolev's, and Cauchy-Schwarz's inequalities, one has
III1≤C∫10|uxt||(θt,τt,θtux,τtux,u,ut,ux)|dx≤ε‖uxt‖2+C(ε)‖(θt,τt,u,ut)‖2+C(ε)‖(θt,τt)‖2∞‖ux‖2≤ε‖uxt‖2+δ‖θxt‖2+C(ε,δ)‖(θt,τt,u,ut,τxt)‖2, | (2.95) |
and
III2≤C∫10|ut||(θt,θtux,u2,ut,ux,uxt,τt,τ2t)|dx≤ε‖uxt‖2+C(ε)‖(ut,θt,u,ux,τt)‖2+ε‖θt‖2∞‖ux‖2+C(ε)‖τt‖2∞‖τt‖2≤ε‖uxt‖2+δ‖θxt‖2+C(ε,δ)‖(ut,θt,u,ux,τt,τxt)‖2. | (2.96) |
By virtue of (1.13), one has
|(λ(ru)xτ)x|≤C|(ut,v2,θx,τx)|. |
Thus, it follows from Hölder's, Sobolev's, and Cauchy-Schwarz's inequalities that
III3≤C∫10|ut||(vt,v2,θx,τx,τt,θt,ut,utθx,θxθt,θxt)|dx≤ε‖θxt‖2+C(ε)‖(vt,ut,v,θx,θt,θt,τx,τt)‖2+ε‖(ut,θt)‖2∞‖θx‖2≤ε‖(uxt,θxt)‖2+C(ε)‖(vt,ut,θt,τt,v,θx,τx)‖2, | (2.97) |
and
III4≤C∫10|vt||(θtvx,vx,vxt,vxτt,θxθt,θxt,θxvt)|dx+C∫10|vxtvx||(θt,v,τt)|dx≤ε2‖(vxt,θxt)‖2+C(ε)‖(vt,vx)‖2+C(ε)‖(θt,τt,vt)‖2∞‖(vx,θx)‖2≤ε‖(vxt,θxt)‖2+C(ε)‖(vt,vx,θt,τt,τtx)‖2. | (2.98) |
It follows from (1.14) that
|(μrvxτ)x|≤C|(vt,v,vx,θx)|. |
Then
III5≤C∫10|vt||(vt,v,θx,vx,ut,θt,τt)|dx≤C‖(vt,v,θx,vx,ut,θt,τt)‖2. | (2.99) |
By virtue of (1.15), we can obtain
III6≤C∫10|wt||u||(μrwxτ)x|dx≤C∫10|wt||(wt,wx)|dx≤C‖(wt,wx)‖2, | (2.100) |
and
III7≤C∫10|wxt||(wxθt,wxτt,wx)|+|wt||(θtwx,τtwx,wx,wxt)|dx≤ε‖wxt‖2+C(ε)‖(wx,wt)‖2+C(ε)‖(θt,τt)‖2∞‖wx‖2≤ε‖(wxt,θxt)‖2+C(ε)‖(wx,wt,θt,τt,τxt)‖2. | (2.101) |
Putting (2.95)–(2.101) into (2.94) gives
12ddt‖(ut,vt,wt)‖2+c‖(uxt,vxt,wxt)‖2≤ε‖(uxt,vxt,wxt,θxt)‖2+C(ε)‖(ut,vt,wt,θt,wx,τx,θx)‖2+C(ε)‖(τt,u,v)‖21. | (2.102) |
Applying ∂t to (1.16) and multiplying by θt in L2, it follows that
cv2ddt∫10θ2tdx+∫10˜κθβr2θ2xtτdx=∫10θt[Qt−(P(ru)x)t]−θxθxt(κr2τ)tdx. | (2.103) |
First of all, by means of the definition of Q, one has
|θtQt|≤C|θt||(u,τt,ut,ux,uxt,uxτt,uxut,u2x,uxuxt)|+C|θt||(θt,θtu2x,τtu2x,θtw2x,w2x,τtw2x,wxwxt,θtv2x,τtv2x)|+C|θt||(v2x,vxt,τtvx,vt,v,vx,vxvxt,vxvt)|≤C(ε)|(θt,u,τt,ut,ux,wx,vx,vt,v,vx)|2+ε|(uxt,wxt,vxt)|2+C(ε)|(τt,ut,ux)|2|(ux,wx,vx)|2+C(ε)|θt|2|(ux,vx,wx)|2. | (2.104) |
Using (2.104) and Sobolev's inequality, we can derive from (2.103) that
cv2ddt∫10θ2tdx+∫10˜κθβr2θ2xtτdx≤C∫10(|θt||(Qt,θt,τt,θtux,τtux,u,ut,ux,uxt)|+|θx||(θxtθt,θxt,θxtτt)|)dx≤C(ε)‖(θt,u,τt,ut,ux,θx,wx,vx,vt,v,vx)‖2+ε‖(uxt,wxt,vxt,θxt)‖2+C(ε)‖(τt,θt,ut,ux)‖2∞‖(ux,wx,vx,θx)‖2+C(ε)‖τt‖2∞‖θt‖2≤C(ε)‖(θt,u,τt,ut,ux,wx,vx,vt,v,τtx,uxx)‖2+ε‖(uxt,wxt,vxt,θxt)‖2+C(ε)‖(ux,vx,wx)‖21‖θt‖2+C(ε)‖τt‖21‖θt‖2. | (2.105) |
Adding (2.102) to (2.105) and choosing ε>0 suitably small, it follows that
12ddt‖(√cvθt,ut,vt,wt)‖2+c‖(uxt,vxt,wxt,θxt)‖2≤C‖(ut,vt,wt,θt,wx,τx,θx)‖2+C‖(ux,τt,v)‖21+C‖(ux,vx,wx)‖21‖θt‖2+C‖τt‖21‖θt‖2. | (2.106) |
By means of (2.106) and Grönwall's inequality, we deduce
‖(ut,vt,wt,θt)‖2+∫T0‖(uxt,vxt,wxt,θxt)‖2ds≤C. | (2.107) |
According to (1.13), one has
λr2uxxτ=ut−v2r+rPx+2uμx−r[(λ(ru)xτ)x−λruxxτ], |
which means that
|uxx|≤C|(ut,v,θx,τx,θxux,τxux,u,ux)|. |
Hence, by means of (2.107), we obtain
‖uxx‖2≤C. |
Similarly, using the equations (1.12)–(1.16), we can also derive
‖(vxx,wxx,θxx,τtx)‖2+∫T0‖τtt‖2ds≤C7. | (2.108) |
Here, we omit the details of (2.108). The proof of Lemma 2.9 is complete.
Lemma 2.10. Assume that the conditions listed in Lemma 2.1 hold. Then for T≥0,
∫10τ2xxdx+∫T0∫10(τ2xx+τ2xxt+u2xxx+v2xxx+w2xxx+θ2xxx)dxds≤C7. |
Proof. Apply ∂x to (2.45) and multiply by (λτx/τ)x in L2 to get
12ddt∫10(λτxτ)2xdx+∫10Rθλτ(λτxτ)2xdx=∫10(λτxτ)x[λθτ(τxθt−τtθx)]xdx+∫10(λτxτ)x(utr−v2r2+2uμxr)xdx−R∫10(λτxτ)x[2θxτxτ2−θxxτ−2θτ2xτ3−θτxλτ(λτ)x]dx≤C(ε)∫10[|(τx,θx)|2|(τxθt,τtθx)|2+|(τxxθt,τxθxt,τtxθx,τtθxx)|2]dx+C(ε)∫10[|(uxt,ut,vx,v,uxθx,θxx,θ2x,θx)|2+|(θxx,τ2x,θxτx)|2]dx+ε∫10(λτxτ)2xdx:=9∑i=8IIIi+ε∫10(λτxτ)2xdx, | (2.109) |
where the following fact has been used:
(θτ)xx=θxxτ−2θxτxτ2+2θτ2xτ3−θτxxτ2=θxxτ−2θxτxτ2+2θτ2xτ3−θλτ[(λτxτ)x−λxτxτ+λτ2xτ2]. |
By Sobolev's inequality and Lemmas 2.6–2.9, we have
III8≤C(ε)(‖(τx,θx)‖4∞‖(τt,θt)‖2+‖θt‖2∞‖τxx‖2+‖τx‖2∞‖θxt‖2+‖τt‖2∞‖θxx‖2+‖θx‖2∞‖τxt‖2)≤C(ε)(‖(τx,θx)‖4+‖(τx,θx)‖2‖(τxx,θxx)‖2+‖θt‖21‖τxx‖2+‖θxt‖2‖τx‖2+‖τt‖21+‖θx‖21)≤C(ε)(‖(τx,θx,τt)‖2+‖θt‖2‖τxx‖2), | (2.110) |
and
III9≤C(ε)‖(uxt,ut,vx,v,θxx,θx)‖2+C(ε)‖(ux,θx,τx)‖2∞‖(θx,τx)‖2≤ε‖τxx‖2+C(ε)‖(uxt,ut,vx,v,θxx,θx,τx)‖2. | (2.111) |
Noting that
|τxx|≤C|(λτxτ)x|+C|(θxτx,τ2x)|, |
we can derive from Sobolev's inequality and Lemma 2.4 that
‖τxx‖2≤C‖(λτxτ)x‖2+C‖θx‖2∞‖τx‖2+C‖τx‖44≤C‖(λτxτ)x‖2+C‖θx‖21+C‖τx‖4+C‖τx‖3‖τxx‖. |
So, it follows from Cauchy-Schwarz's inequality and Lemma 2.4 that
‖τxx‖2≤C‖θx‖21+C‖τx‖2+C‖(λτxτ)x‖2. | (2.112) |
Taking ε suitably small, putting (2.110)–(2.111) into (2.109), and using Lemmas 2.4, 2.8, and 2.9, we find
12ddt∫10(λτxτ)2xdx+c∫10(λτxτ)2xdx≤C‖(τt,θt,θx,ut,v)‖21+C‖τx‖2+C‖θt‖21‖(λτxτ)x‖2. | (2.113) |
By (2.113), Grönwall's inequality, Corollary 2.1, and Lemmas 2.6 and 2.8–2.9, one obtains
∫10(λτxτ)2xdx+∫T0∫10(λτxτ)2xdxds≤C. | (2.114) |
It follows from (2.112) and (2.114) that
‖τxx‖2+∫T0‖τxx‖2ds≤C. | (2.115) |
Letting ∂x act on (1.13) gives
˜λθ2r2uxxxτ+rx(λτtτ)x+r[(λτ)xτt]x+r(λτ)x(ru)xx=uxt−(v2r)x+(rPx)x+2(uμx)x−rλτ[(ru)xxx−ruxxx]. | (2.116) |
It follows from (2.115) and (2.116) that
∫T0‖uxxx‖2ds≤C∫T0‖(θxτt,τxt,τtτx)‖2ds+C∫T0‖(θ2xτt,θxxτt,θxτtx,θxτtτx)‖2ds+C∫T0‖(τxxτt,τxτtx,τ2xτt)‖2ds+C∫T0‖(θx,θxτx,θxux,θxuxx)‖2ds+C∫T0‖(uxt,vx,v,θx,τx,θxx,θxτx,τxx)‖2ds+C∫T0‖(uxθx,θ2x,θxx)‖2ds+C∫T0‖(u,τx,τxx,ux,τxux,uxx)‖2ds≤C∫T0‖(τt,τxx,τtx,θx,ux,uxt,vx,v,θxx,u,τx,uxx)‖2ds≤C, |
where the following fact has been used:
‖(θx,τx,θ2x,τt,θxτt,τ2x)‖∞≤C+C‖(θx,τt,τx)‖21≤C. |
Similarly, using (1.14)–(1.15), we also have
∫T0‖(vxxx,wxxx)‖2ds≤C. |
Letting ∂x act on (1.16) gives
˜κθβr2θxxxτ=cvθxt+(Pτt)x−(κr2τ)xxθx−2(κr2τ)xθxx−Qx. | (2.117) |
It follows from (2.114) and (2.117) that
∫T0‖θxxx‖2ds≤C∫T0‖(θxt,θxτt,τxτt,τtx)‖2ds+C∫T0‖(θ3x,θxxθx,θ2x,θ2xτx,θxτx,θxτ2x,θxτxx)‖2ds+C∫T0‖(θxx,τxθxx)‖2+‖Qx‖2ds. | (2.118) |
By the definition of Q, one has
∫T0‖Qx‖2ds≤C∫T0‖(θx,θxux,u,τx,ux,uxx,uxτx,u2x,uxuxx)‖2ds+C∫T0‖(θxw2x,w2x,wxwxx,w2xτx)‖2ds+C∫T0‖(θxv2x,τxv2x,v2x,vxvxx,v2xτx,vx,vxx,vxτx,v)‖2ds. | (2.119) |
Since the following estimates have been obtained:
‖(θx,τx,ux,wx,vx)‖∞≤C‖(θx,τx,ux,wx,vx)‖1≤C, |
putting (2.119) into (2.118) yields
∫T0‖θxxx‖2ds≤C∫T0‖(θxt,τt,τxt,θx,θxx,τx,τxx,ux,u,uxx,wx,wxx,vx,vxx,v)‖2ds≤C. |
The proof of Lemma 2.10 is complete.
With all a priori estimates from Section 2 at hand, we can complete the proof of Theorem 1.1. For this purpose, it will be shown that the existence and uniqueness of local solutions to the initial-boundary value problem (1.12)–(1.19) can be obtained by using the Banach fixed point theorem and the contractivity of the operator defined by the linearization of the problem on a small time interval.
Lemma 3.1. Let (1.20) hold. Then there exists T0=T0(V0,V0,M0)>0, depending only on β, V0, and M0, such that the initial-boundary value problem (1.12)–(1.19) has a unique solution (τ,u,v,w,θ)∈X(0,T0;12V0,12V0,2M0).
Proof of Theorem 1.1. First, according to (1.20), one has
τ0≥V0,θ0≥V0,∀x∈Ω,‖(τ0,u0,v0,w0,θ0)‖H2≤M0. |
Combined with Lemma 3.1, there exists t1=T0(V0,V0,M0) such that (τ,u,v,w,θ)∈X(0,t1;12V0,12V0,2M0).
We find the positive constant |α|≤α1, where α1 satisfies
(12V0)−|α1|≤2,(2M0)|α1|≤2,|α1|H(12V0,12V0,2M0)≤ϵ1, | (3.1) |
where ϵ1 is chosen in Lemma 2.1. That means that one can choose
|α1|:=min{ln2|ln2−lnV0|,ln2|ln2+lnM0|,ϵ1H−1(12V0,12V0,2M0)}. | (3.2) |
One deduces from Lemmas 2.1–2.10 with T=t1 that for each t∈[0,t1], the local solution (τ,u,v,w,θ) satisfies
$C_0^{-1}\le\tau(x,t)\le C_0,\qquad C_1^{-1}\le\theta(x,t)\le C_1,\qquad x\in(0,1),$ | (3.3) |
and
sup0≤t≤t1‖(τ,u,v,w,θ)‖22+∫t10‖θt‖2dt≤C28, | (3.4) |
where Ci(i=2,⋯,7) is chosen in Section 2 and C28:=∑7i=2Ci. It follows from Lemma 2.9 and Lemma 2.10 that (τ,u,v,w,θ)∈C([0,T);H2). If one takes (τ,u,v,w,θ)(⋅,t1) as the initial data and applies Lemma 3.1 again, the local solution (τ,u,v,w,θ) can be extended to the time interval [t1,t1+t2] with t2(C0,C1,C8) such that (τ,u,v,w,θ)∈X(t1,t1+t2;12C0,12C1,12C8). Moreover, for all (x,t)∈[0,1]×[0,t1+t2], one gets
$\dfrac{1}{2C_0}\le\tau(x,t),\qquad \dfrac{1}{2C_1}\le\theta(x,t),$
and
supt1≤t≤t1+t2‖(τ,u,v,w,θ)‖22+∫t1+t2t1‖θt‖2dt≤4C28, |
which combined with (3.3) and (3.4) implies that for all t∈[0,t1+t2],
$\dfrac{1}{2C_0}\le\tau(x,t),\qquad \dfrac{1}{2C_1}\le\theta(x,t),$
sup0≤t≤t1+t2‖(τ,u,v,w,θ)‖22+∫t1+t20‖θt‖2dt≤5C28. |
Take α≤min{α1,α2}, where αi(i=1,2) are positive constants satisfying (3.1) and
(12C0)−α2≤2,(√5C8)α2≤2,α2H(12C0,12C1,√5C8)≤ϵ1, |
where the value of ϵ1 is chosen in Lemma 2.1. That means that we can choose
|α2|:=min{ln2|ln2−lnC0|,ln2|ln√5+lnC8|,ϵ1H−1(12C0,12C1,√5C8)}. | (3.5) |
Then one can employ Lemmas 2.1–2.10 with T=t1+t2 to infer the local solution (τ,u,v,w,θ) satisfying (3.3) and (3.4).
Choosing
ϵ0=min{α1,α2}, | (3.6) |
and repeating the above procedure, one can extend the solution (τ,u,v,w,θ) step-by-step to a global one provided that |α|≤ϵ0. Furthermore,
‖(τ,u,v,w,θ)‖2H2+∫+∞0[‖(ux,vx,wx,θx)‖2+‖τ‖2]dt≤C29, |
from which we derive that the solution (τ,u,v,w,θ)∈X(0,+∞;C0,C1,C9).
The large-time behavior (1.21) follows from Lemmas 2.4–2.10 by using a standard argument [21].
First, thanks to (1.15), (2.1), (2.43), (2.55), (2.62), (2.73), Corollary 2.1, and Lemmas 2.4–2.10, taking ˆθ=E0, one has
ddt∫10ηE0(τ,u,v,w,θ)dx+c1‖(u,v)‖21+c1‖(wx,θx)‖2≤0, | (3.7) |
ddt∫10[12(λτxτ)2−λuτxrτ]dx+c2‖τx‖2≤C10‖(u,ux,θx,v,vx)‖2, | (3.8) |
ddt‖(ux,vx,wx)‖2+c3‖(uxx,vxx,wxx)‖2≤C11‖(θx,τx,vx,ux,u,v,wx)‖2, | (3.9) |
ddt‖θx‖2+c4‖θxx‖2≤C12‖(ux,vx,wx)‖21+C12‖θx‖2. | (3.10) |
By Cauchy-Schwarz's inequality, one has
|λuτxrτ|≤14(λτxτ)2+C‖u‖2. | (3.11) |
Hence, by means of (3.11), Poincaré's inequalities, Corollary 2.1, and Lemma 2.7, one can deduce
c‖τx‖2−C13‖u‖2≤∫10[12(λτxτ)2−λuτxrτ]dx≤C‖(τx,ux)‖2. |
Multiplying (3.7)–(3.9) by C14, C15, and C16, respectively, and adding them together with (3.10), one has
ddtA+c‖(ux,vx,wx,θx)‖2H1+c‖τx‖2≤0, | (3.12) |
where we have defined
A:=∫10C14ηE0(τ,u,v,w,θ)+C15[12(λτxτ)2−λuτxrτ]dx+C16‖(ux,vx,wx)‖2+‖θx‖2, |
and chosen constants C14>C15>C16>0 suitably large such that
c1C14−C10C15−C11C16−C12>0, |
c2C15−C11C16−C12>0, |
c3C16−C12>0. |
Taking $\frac{C_{14}}{2}>C_{13}$ and using Poincaré's inequality gives
c‖(τ−ˉτ,u,v,w,θ−E0)‖2≤A≤C‖(ux,vx,wx,θx)‖21+C‖τx‖2, | (3.13) |
where we have used the facts
‖θ−E0‖2≤C∫10|θ−ˉθ|2dx+C‖(u,v,w)‖2≤C‖(θx,ux,vx,wx)‖2. |
By means of (3.12) and (3.13), we can derive that
$\|(\tau-\bar\tau,u,v,w,\theta-E_0)(t)\|^2_{H^1(\Omega)}\le Ce^{-ct}.$ | (3.14) |
By the definition of ˉr, one has
$r^2-\bar r^2=2\displaystyle\int_0^x(\tau-\bar\tau)\,d\xi.$ | (3.15) |
By means of (3.14) and (3.15), we have
$\|r-\bar r\|_2^2\le Ce^{-ct}.$
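To make the last step explicit (a sketch under the bounds already established): since $r,\bar r\ge a>0$, the identity (3.15) gives
\[
|r-\bar r|=\frac{|r^2-\bar r^2|}{r+\bar r}\le\frac{1}{a}\int_0^1|\tau-\bar\tau|\,d\xi\le\frac{1}{a}\|\tau-\bar\tau\|,
\]
and the spatial derivatives are controlled in the same way through $r_x=\tau/r$ and $\bar r_x=\bar\tau/\bar r$, so the $H^1$-decay of $\tau-\bar\tau$ in (3.14) yields the stated exponential decay of $\|r-\bar r\|_2$.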
The proof is thus complete.
Dandan Song: Writing-original draft, Writing-review & editing, Supervision, Formal Analysis; Xiaokui Zhao: Writing-review & editing, Methodology, Supervision.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The authors are grateful to the referees for their helpful suggestions and comments on the manuscript. This work was supported by the NNSFC (Grant No. 12101200), the Doctoral Scientific Research Foundation of Henan Polytechnic University (No. B2021-53) and the China Postdoctoral Science Foundation (Grant No. 2022M721035).
The authors declare there is no conflict of interest.