
Accelerating reinforcement learning with a Directional-Gaussian-Smoothing evolution strategy

  • The objective of reinforcement learning (RL) is to find an optimal strategy for solving a dynamical control problem. Evolution strategies (ES) have shown great promise in many challenging RL tasks where the underlying dynamical system is accessible only as a black box, so that adjoint methods cannot be used. However, existing ES methods have two limitations that hinder their applicability in RL. First, most existing methods rely on Monte Carlo based gradient estimators to generate search directions. Because Monte Carlo estimators have low accuracy, RL training converges slowly and requires more iterations to reach the optimal solution. Second, the landscape of the reward function can be deceptive and may contain many local maxima, causing ES algorithms to converge prematurely without exploring other parts of the parameter space that may offer greater rewards. In this work, we employ a Directional Gaussian Smoothing Evolutionary Strategy (DGS-ES) to accelerate RL training. The method is well suited to these two challenges because it can (i) provide highly accurate gradient estimates and (ii) find nonlocal search directions that emphasize large-scale variation of the reward function and disregard local fluctuations. Through several benchmark RL tasks, we show that the DGS-ES method is highly scalable, achieves superior wall-clock time, and obtains reward scores competitive with other popular policy gradient and ES approaches.

    Citation: Jiaxin Zhang, Hoang Tran, Guannan Zhang. Accelerating reinforcement learning with a Directional-Gaussian-Smoothing evolution strategy[J]. Electronic Research Archive, 2021, 29(6): 4119-4135. doi: 10.3934/era.2021075
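
    Point (i) of the abstract can be illustrated with a minimal sketch. It assumes that the smoothing is applied along each coordinate direction separately and that the resulting one dimensional Gaussian integral is evaluated with a 3-point Gauss-Hermite rule; the abstract does not specify either choice. The function name `dgs_gradient`, the parameter `sigma` and the test function are hypothetical.

```python
import math

# 3-point Gauss-Hermite rule for the weight exp(-t^2): nodes and weights.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)


def dgs_gradient(F, x, sigma=0.5):
    """Estimate the gradient of F at x by smoothing along each coordinate axis.

    For axis i, the derivative of the 1D Gaussian-smoothed function is
    (1 / sigma) * E[v * F(x + sigma * v * e_i)] with v ~ N(0, 1).
    Substituting v = sqrt(2) * t turns it into a Gauss-Hermite integral.
    This is an illustrative sketch, not the authors' implementation.
    """
    grad = []
    for i in range(len(x)):
        total = 0.0
        for t, w in zip(GH_NODES, GH_WEIGHTS):
            shifted = list(x)
            shifted[i] += math.sqrt(2.0) * sigma * t  # step along axis i
            total += w * math.sqrt(2.0) * t * F(shifted)
        grad.append(total / (math.sqrt(math.pi) * sigma))
    return grad


if __name__ == "__main__":
    quadratic = lambda x: sum(v * v for v in x)
    print(dgs_gradient(quadratic, [1.0, -2.0, 0.5]))
```

    For a quadratic reward the quadrature is exact, so the estimate agrees with the true gradient up to rounding. A Monte Carlo estimator with the same number of function evaluations would carry sampling noise instead. A larger `sigma` averages the function over a wider window, which is the sense in which the search direction becomes nonlocal.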




    Consider the semilinear elliptic equation

    $$-\Delta u = f(u) \quad \text{in } \mathbb{R}^n. \tag{1.1}$$

    A solution u is (linearized) stable if

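    $$\int_{\mathbb{R}^n} \left[ |\nabla\varphi|^2 - f'(u)\varphi^2 \right] \ge 0, \quad \forall\, \varphi \in C_0^\infty(\mathbb{R}^n). \tag{1.2}$$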

    The Morse index of a solution is defined to be the maximal dimension of the negative space for this quadratic form. A solution with finite Morse index is therefore not too unstable.
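
    The quadratic form in (1.2) can be read as a second variation. The following formal computation is a sketch under the assumption that (1.1) is the Euler-Lagrange equation of the energy $E$ written below, with $F' = f$; the test function $\varphi$ has compact support, so only the difference of energies is used.

```latex
E(u) = \int_{\mathbb{R}^n} \Big[ \tfrac{1}{2}|\nabla u|^2 - F(u) \Big], \qquad F' = f,
\\[4pt]
\frac{d}{dt} E(u + t\varphi)\Big|_{t=0}
  = \int_{\mathbb{R}^n} \big[ \nabla u \cdot \nabla\varphi - f(u)\varphi \big] = 0
  \quad\Longleftrightarrow\quad -\Delta u = f(u),
\\[4pt]
\frac{d^2}{dt^2} E(u + t\varphi)\Big|_{t=0}
  = \int_{\mathbb{R}^n} \big[ |\nabla\varphi|^2 - f'(u)\varphi^2 \big].
```

    In this reading, stability says that the second variation is nonnegative in every compactly supported direction, and the Morse index counts the independent directions in which it is negative.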

    In the 2000s, Dancer wrote a series of papers on stable and finite Morse index solutions, [13,14,15,16,17,18,19,21] (see also his survey [20] at the 2010 ICM and the summary in Du [35,Section 8]). He obtained various classification results about these solutions and applied them to the study of equations with small parameters in bounded domains and global bifurcation problems. Many results on stable and finite Morse index solutions have appeared since then. We refer the reader to the monograph of Dupaigne [40] for an extensive list of results and references up to 2010.

    In this paper we review some recent results about stable and finite Morse index solutions, mostly from 2010 to 2020. We will restrict our attention to the Liouville (-Bernstein-De Giorgi) type results on stable and finite Morse index solutions defined on the entire $\mathbb{R}^n$, although there are many studies devoted to problems on bounded or unbounded domains in $\mathbb{R}^n$ and on complete Riemannian manifolds. As in Dancer's work, by a blow up procedure, these Liouville (-Bernstein-De Giorgi) type results should be useful in many other problems. We also refer the reader to a recent survey of Takahashi [83], which focuses on the a priori estimate aspect for stable solutions.

    In [20], Dancer proposed the following conjecture:

    Assume $n \le 8$ and $u$ is a bounded, linearized stable solution of (1.1).

    Then either $u$ is constant on $\mathbb{R}^n$ or, after a rotation of axes, $u = u(x_n)$.

    If $u = u(x_n)$, the solution is called one dimensional. This is equivalent to saying that all level sets of $u$ are hyperplanes. In the following, we say the dimension $N$ is critical if for $n < N$ there are only trivial (constant or one dimensional) stable solutions on $\mathbb{R}^n$, while for $n \ge N$ there exist nontrivial stable solutions. This critical dimension always exists and it is usually finite.

    For nonnegative nonlinearities $f \ge 0$, in a recent breakthrough work [7], Cabre et al. obtained several a priori estimates on stable solutions, and used them to settle Brezis's conjecture on the $L^\infty$ regularity of "extremal solutions" [5]. In [41], Dupaigne and Farina used these estimates to establish the following Liouville theorem:

    Theorem 2.1. Assume that $f \ge 0$ and it is locally Lipschitz. Suppose $u$ is a stable solution of (1.1) and it is bounded below. If $n \le 10$, then $u$ is constant.

    Hence in this case, the critical dimension is 11. This theorem can be trivially extended to nonpositive nonlinearities $f \le 0$ (subharmonic functions) if the solution is bounded from above. But in the subharmonic case, in view of results on the singular elliptic equation

    $$\Delta u = u^{-p} \tag{2.1}$$

    (see Esposito-Ghoussoub-Guo [42], Meadows [72], Ma-Wei [71] and Du-Guo [36]), a more natural class in which to consider the Liouville property for stable solutions consists of solutions with a suitable polynomial growth at infinity (or even without any growth condition). In this case, it seems that the critical dimension should be 7. (In fact, (2.1) with $p = 1$ is almost the worst case.) More precisely, we want to know if the following Liouville property holds.

    Problem 1. Assume $f \le 0$ is locally Lipschitz on $(0, +\infty)$, and it satisfies

    $$\int_0^1 f(u)\,du = -\infty. \tag{2.2}$$

    Then there is no positive, stable solution on $\mathbb{R}^n$, provided that $n < 7$.

    Here the condition (2.2) implies that no one dimensional stable solution exists. Note that, by the above remark, no assumption on the growth of solutions is added.
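
    As a worked example of condition (2.2), take the model nonlinearity $f(u) = -u^{-p}$ with $p > 0$. This is the nonlinearity of (2.1) written in the form (1.1), which is an assumption about the intended signs. The integral can be computed directly:

```latex
\int_\varepsilon^1 \big( -u^{-p} \big)\,du =
\begin{cases}
  -\dfrac{1 - \varepsilon^{1-p}}{1-p}, & p \neq 1, \\[8pt]
  \ln \varepsilon, & p = 1,
\end{cases}
\qquad\text{so}\qquad
\int_0^1 f(u)\,du =
\begin{cases}
  -\infty, & p \ge 1, \\[4pt]
  -\dfrac{1}{1-p}, & 0 < p < 1.
\end{cases}
```

    Under this assumption, (2.2) holds exactly when $p \ge 1$, and $p = 1$ is the borderline case. For $0 < p < 1$ the integral is finite.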

    If $f$ is sign-changing but there is no one dimensional stable solution of (1.1), little progress has been made so far on this conjecture in dimensions $n \ge 3$.

    If there exists a one dimensional stable solution of (1.1), then a double well structure is associated to this equation, that is, there exist two constants $a_- < a_+$ such that a primitive function $F$ of $f$ satisfies

    $$F(a_\pm) = 0, \qquad F(t) < 0 \ \text{in } (a_-, a_+).$$

    Here $a_\pm = \lim_{x_n \to \pm\infty} g(x_n)$, where $g$ is a one dimensional stable (equivalently, monotone) solution.

    Under this assumption, Dancer's conjecture is closely related to the De Giorgi conjecture (De Giorgi [28]) and the stable De Giorgi conjecture about the Allen-Cahn equation

    $$-\Delta u = u - u^3. \tag{2.3}$$

    By the way, based on the result of Pacard-Wei [76], in this case, the critical dimension should be 8.
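
    The double well structure can be checked numerically for the Allen-Cahn nonlinearity $f(u) = u - u^3$. The sketch below assumes the standard one dimensional profile $g(x_n) = \tanh(x_n/\sqrt{2})$ and the primitive $F(t) = -(1 - t^2)^2/4$, and tests the equation with a central finite difference. The function names are hypothetical.

```python
import math


def g(x):
    """One dimensional profile g(x) = tanh(x / sqrt(2))."""
    return math.tanh(x / math.sqrt(2.0))


def residual(x, h=1e-3):
    """Finite-difference residual of -g'' - (g - g^3) at the point x."""
    second = (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)
    return -second - (g(x) - g(x) ** 3)


def F(t):
    """Primitive of f(t) = t - t^3, normalized so that F(-1) = F(1) = 0."""
    return -((1.0 - t * t) ** 2) / 4.0


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
        print(x, residual(x))       # small, limited by the step size h
    print(F(-1.0), F(0.0), F(1.0))  # zero at the wells, negative in between
    print(g(-20.0), g(20.0))        # limits a_- = -1 and a_+ = 1
```

    Here $a_\pm = \pm 1$, the profile is monotone, and $F$ vanishes at the two wells and is negative between them, matching the double well condition.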

    Recall that the De Giorgi conjecture states that

    Suppose $u$ is a solution of the Allen-Cahn equation (2.3) in $\mathbb{R}^n$. If $\partial_{x_n} u > 0$ and $n \le 8$, then after a rotation of axes, $u = u(x_n)$.

    The stable De Giorgi conjecture, in turn, states that

    Suppose $u$ is a stable solution of the Allen-Cahn equation (2.3) in $\mathbb{R}^n$. If $n \le 7$, then after a rotation of axes, $u = u(x_n)$.

    In the Allen-Cahn equation, the standard double well potential $(1 - u^2)^2/4$ contains only one critical point in the interval $(-1, 1)$. But Dancer's conjecture, besides stating that stable solutions must be constant if there is no one dimensional stable profile, also covers the case where $F$ has more than one critical point in the interval $(a_-, a_+)$. Here it is possible that a sub-double-well structure exists in $(a_-, a_+)$, which makes the situation more complicated.

    The De Giorgi conjecture has been solved by Ghoussoub-Gui [55] in dimension 2, by Ambrosio-Cabre [4] in dimension 3 and, under an additional assumption, in dimensions $4 \le n \le 8$ by Savin [78]. It was also shown to be false for $n \ge 9$ by Del Pino-Kowalczyk-Wei [32].

    By an observation of Dancer, it is known that the method introduced in Ghoussoub-Gui [55] and Ambrosio-Cabre [4] can be used to prove the stable De Giorgi conjecture in dimension 2. At the time of writing, this is still the only proven case of the stable De Giorgi conjecture. The main difficulty seems to be that, so far, the only tool to tackle such problems is the Liouville property for the degenerate equation

    $$\operatorname{div}(\sigma^2 \nabla\varphi) = 0, \tag{2.4}$$

    but by Gazzola [54], there is no hope of using this in dimensions $n \ge 3$.

    Even the following weaker version of the stable De Giorgi conjecture is still open.

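    Problem 2. Suppose $u$ is a stable solution of the Allen-Cahn equation (2.3) in $\mathbb{R}^n$, satisfying the natural energy growth bound

    $$\int_{B_R(0)} \left[ \frac{1}{2}|\nabla u|^2 + \frac{1}{4}(1 - u^2)^2 \right] \le C R^{n-1}, \quad \forall\, R > 0. \tag{2.5}$$

    If $4 \le n \le 7$, then $u$ is one dimensional.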

    We say the energy growth condition (2.5) is natural because it is satisfied by minimizing solutions. (Whether this condition holds for stable solutions is another unknown point in the stable De Giorgi conjecture.) If $n = 3$, the energy growth bound is quadratic and we can repeat the above proof of the stable De Giorgi conjecture in dimension 2. This fact has been used by Chodosh-Mantoulidis in [12] to establish curvature estimates for the Allen-Cahn equation on three dimensional manifolds. But as explained above, this approach does not work without such a quadratic bound. A possible way to solve Problem 2 is to establish a sheeting theorem similar to the ones for stable minimal hypersurfaces (see Schoen-Simon [80] and Wickramasekera [93]), but a geometric estimate (i.e. the oscillation estimate in [80]) is not known in this semilinear setting.

    Another missing geometric estimate for semilinear elliptic equations is the one corresponding to the famous Simons inequality (Simons [82]) for minimal hypersurfaces, which is a fundamental tool in the study of stable minimal hypersurfaces (see e.g. [81], [79]). This difficulty is also encountered in the study of the Bernstein property for the Alt-Caffarelli one phase free boundary problem (see Alt-Caffarelli [2]) and nonlocal minimal surfaces (see Caffarelli-Roquejoffre-Savin [8]).

    Recall that the Alt-Caffarelli one phase free boundary problem is

    $$\begin{cases} \Delta u = 0 & \text{in } \{u > 0\}, \\ |\nabla u| = 1 & \text{on } \partial\{u > 0\}. \end{cases} \tag{2.6}$$

    There are some variants, such as the problem studied by Phillips [77] and Alt-Phillips [3]

    $$\Delta u = u^{-p} \chi_{\{u > 0\}}, \quad 0 < p < 1, \tag{2.7}$$

    as well as various approximations to these problems, e.g.

    $$\Delta u = f_\varepsilon(u),$$

    where $f_\varepsilon$ is an approximate Dirac at 0 (see Fernández-Real and Ros-Oton [51]). These problems are similar to the ones considered in Problem 1, but with an integrability condition rather than the divergence condition (2.2). This integrability condition implies that there is a one dimensional monotone solution with a non-empty dead core $\{u = 0\}$, leading to a free boundary condition on $\partial\{u = 0\}$.

    Although most studies on these free boundary problems focus on minimizing solutions or viscosity solutions, there has recently been some interest in higher energy critical points, see Jerison-Perera [64]. To understand these solutions, the stability condition should play an important role. Indeed, even for minimizing solutions, to prove the optimal partial regularity of free boundaries, one needs the classification of stable, homogeneous solutions just as in the Bernstein problem for minimal hypersurfaces, see Weiss [92]. For the Alt-Caffarelli one phase free boundary problem, it is conjectured that the critical dimension is 7. Currently, the Bernstein property is known to be true if $n \le 4$ (by Caffarelli-Jerison-Kenig [10] and Jerison-Savin [65]), and not true if $n \ge 7$ (by De Silva-Jerison [29]).

    In recent years, there has also been much progress on the De Giorgi conjecture for the fractional Allen-Cahn equation

    $$(-\Delta)^s u = u - u^3. \tag{2.8}$$

    In particular, Figalli-Serra [52] solved the stable De Giorgi conjecture for the s=1/2 (half Laplacian) case in dimension 3 and the corresponding De Giorgi conjecture in dimension 4. Later their energy estimates were generalized to other fractional Laplacians by Gui-Li [56]. These estimates are optimal for s<1/2. In this case, the fractional Allen-Cahn equation is related to the nonlocal minimal surfaces. But for nonlocal minimal surfaces, it is still not known in general what is the critical dimension for the stable Bernstein property. Only some perturbative results have been obtained for those s sufficiently close to 1/2, see Caffarelli-Valdinoci [9] and Cabre-Cinti-Serra [6].

    For equations enjoying a scaling invariance, much progress has been obtained in the last decade. This is because in this case, usually there exists a monotonicity formula. As in the Bernstein problem for minimal hypersurfaces (see Fleming [53]), we can use the blowing down analysis and then the classification of homogeneous solutions to prove Liouville type results.

    This approach was first undertaken by the author in [86] to study the partial regularity of stable solutions to the Lane-Emden equation (see also Davila-Dupaigne-Farina [25] for related results)

    $$-\Delta u = |u|^{p-1} u, \quad p > 1. \tag{3.1}$$

    The scaling invariance for this equation says that if $u$ is a solution of (3.1), then for any $\lambda > 0$,

    $$u^\lambda(x) := \lambda^{\frac{2}{p-1}} u(\lambda x)$$

    is also a solution of (3.1).
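
    The scaling invariance can be verified by a direct computation, sketched here for a classical solution $u$ of (3.1), with the rescaled function taken as $\lambda^{2/(p-1)} u(\lambda x)$. It uses the chain rule and the identity $\frac{2}{p-1} + 2 = \frac{2p}{p-1}$.

```latex
\begin{aligned}
-\Delta u^\lambda(x)
  &= \lambda^{\frac{2}{p-1}+2}\,(-\Delta u)(\lambda x)
   = \lambda^{\frac{2p}{p-1}}\,|u(\lambda x)|^{p-1} u(\lambda x) \\
  &= \big|\lambda^{\frac{2}{p-1}} u(\lambda x)\big|^{p-1}\,\lambda^{\frac{2}{p-1}} u(\lambda x)
   = |u^\lambda(x)|^{p-1} u^\lambda(x).
\end{aligned}
```

    The exponent $\frac{2}{p-1}$ is the only power of $\lambda$ for which the two sides scale in the same way.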

    The optimal Liouville theorem for stable and finite Morse index solutions to this equation was established in Farina [43] by a Moser type iteration argument. The blowing down analysis can be used to give another proof. By employing Pacard's monotonicity formula ([74,75]) and Federer's dimension reduction principle ([50]), a sharp dimension estimate on the singular set of stable solutions is given in [86]. In this approach, usually we need only an estimate up to the energy level.

    This approach was further developed in Davila et al. [26] and Du-Guo-Wang [39]. In [26], a monotonicity formula is derived for the fourth order Lane-Emden equation

    $$\Delta^2 u = |u|^{p-1} u. \tag{3.2}$$

    Then by the blowing down analysis and the classification of stable, homogeneous solutions, an optimal Liouville theorem for stable solutions of (3.2) is established. In [39], a similar result is obtained for the weighted equation

    $$-\operatorname{div}(|x|^\theta \nabla u) = |x|^\ell |u|^{p-1} u. \tag{3.3}$$

    For this equation, an ε-regularity theorem is needed to analyse the convergence of blowing down sequences. See also Dancer-Du-Guo [22], Du-Guo [37,38] and Wang-Ye [84] for related results on this equation.

    By now this approach has been applied to many other problems, for example, fourth order weighted equations or weighted systems [60,61], polyharmonic equations [70], nonlinear elliptic system [94], Toda system [89] and various elliptic equations involving fractional Laplacians [27,48,45,46,47,49,63].

    One may be tempted to believe that this approach works well once the equation enjoys a scaling invariance. However, there are several important exceptions.

    Problem 3. What is the optimal dimension for the Liouville theorem for stable solutions to the equation

    $$\Delta^2 u = -u^{-p}, \quad u \ge 0. \tag{3.4}$$

    This equation arises from the MEMS problem, see Esposito-Ghoussoub-Guo [42]. For some p, a monotonicity formula was given in [26], but it is not known whether there exists a monotonicity formula for all p. See Guo-Wei [59], Huang-Ye-Zhou [62], Lai [67], Lai-Ye [68] for some recent results on this problem.

    We also encounter the same difficulty with the possible lack of a monotonicity formula in some other problems, which include the equation with p-Laplacians

    $$-\Delta_p u = |u|^{m-1} u \tag{3.5}$$

    and its fourth order version

    $$\Delta\left( |\Delta u|^{m-1} \Delta u \right) = |u|^{p-1} u. \tag{3.6}$$

    When both $u$ and $-\Delta u$ are positive, (3.6) is equivalent to the Lane-Emden system

    $$\begin{cases} -\Delta u = v^q, \\ -\Delta v = u^p, \end{cases} \tag{3.7}$$

    see Mtiri-Ye [73]. For these problems, a mysterious issue is the role of homogeneous (or radial) solutions in the classification of stable solutions. In particular, is the radial, homogeneous solution the most unstable (in a suitable sense) among all solutions?

    Finally, if the blowing down analysis approach works, usually we could obtain a radial symmetry result about stable solutions when the space dimension is critical. By the moving plane method, this claim can be reduced to the classification of stable, homogeneous solutions in the critical dimension. This is similar to the classification of stable minimal hypercones in $\mathbb{R}^8$. For example, the following problem is still not completely solved.

    Problem 4. Suppose $u$ is a stable, homogeneous solution of (3.1) in $\mathbb{R}^n$, where $n$ is the critical dimension. Is $u$ radially symmetric (after a translation)?

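    By the radial symmetry criterion of Guo [58], this is reduced to the classification of solutions $w \in C^2(\mathbb{S}^{n-1})$ to the equation

    $$-\Delta_{\mathbb{S}^{n-1}} w + \frac{2}{p-1}\left( n - 2 - \frac{2}{p-1} \right) w = |w|^{p-1} w, \tag{3.8}$$

    satisfying the stability condition

    $$\int_{\mathbb{S}^{n-1}} \left[ |\nabla\varphi|^2 + \frac{(n-2)^2}{4}\varphi^2 \right] \ge p \int_{\mathbb{S}^{n-1}} |w|^{p-1}\varphi^2, \quad \forall\, \varphi \in C^1(\mathbb{S}^{n-1}). \tag{3.9}$$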

    If n is critical, we need to show that w must be constant.

    In [86], this is wrongly claimed to be true. But the proof therein works only for a small range $p \in [p_{JL}(n), p_{JL}(n) + \varepsilon(n))$. (Here $p_{JL}$ is the Joseph-Lundgren exponent, see [66].) The remaining case is still not known.

    In Dancer-Guo-Wei [24], infinitely many solutions to (3.8) are constructed. However, it seems difficult to verify (3.9) because it involves a spectral bound condition. To the best knowledge of the author, there are still no known smooth stable (in the sense of (3.9)) solutions of (3.8) other than the constant solutions.
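
    For the constant solutions, both (3.8) and (3.9) reduce to arithmetic. A positive constant $w$ solves (3.8) when $w^{p-1} = \beta := \frac{2}{p-1}\big(n - 2 - \frac{2}{p-1}\big)$, and for constant $w$ the condition (3.9) holds for all test functions exactly when $p\beta \le (n-2)^2/4$ (take $\varphi$ constant for necessity). The sketch below only checks this inequality; the function name is hypothetical and nonconstant solutions are not covered.

```python
def constant_profile_is_stable(n, p):
    """Check the stability inequality (3.9) for the constant solution of (3.8).

    alpha = 2 / (p - 1) is the homogeneity exponent and beta = w^(p-1) is
    the value forced by (3.8) for a positive constant w.
    """
    alpha = 2.0 / (p - 1.0)
    beta = alpha * (n - 2.0 - alpha)
    if beta <= 0.0:
        raise ValueError("no positive constant solution of (3.8) for this (n, p)")
    # With w constant, (3.9) reduces to p * beta <= (n - 2)^2 / 4.
    return p * beta <= (n - 2.0) ** 2 / 4.0


if __name__ == "__main__":
    for n, p in ((11, 7.0), (11, 6.5), (10, 100.0)):
        print(n, p, constant_profile_is_stable(n, p))
```

    In these samples, the inequality holds for $n = 11$, $p = 7$, and fails for $n = 11$, $p = 6.5$ and for $n = 10$, $p = 100$. The open question in the text concerns nonconstant $w$, where (3.9) becomes a genuine spectral condition.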

    Problem 5. Take $k, \ell \ge 1$ so that $k + \ell = n$. Does there exist a nontrivial, stable solution of (3.1) invariant under the group $SO(k) \times SO(\ell)$?

    Concerning finite Morse index solutions on $\mathbb{R}^n$, for nonnegative nonlinearities $f \ge 0$, Dupaigne and Farina in [41], under the same assumptions as Theorem 2.1 (plus some technical conditions), proved that finite Morse index solutions to (1.1) are radially symmetric.

    Next, for those scaling invariant equations discussed in Section 3, the blowing down analysis still works. So if the Liouville theorem holds for stable solutions, then it also holds for finite Morse index solutions, except in an exceptional dimension. (This is the "Sobolev" critical dimension. For example, for (3.1), it is well known that when $p = (n+2)/(n-2)$, there exist infinitely many solutions with finite Morse index.)

    In the rest of this section we review some recent results on finite Morse index solutions of the Allen-Cahn equation (2.3). The author in [88], and later jointly with Wei in [90], studied the structure of finite Morse index solutions in $\mathbb{R}^2$. For example, the main result in [90] states that

    Theorem 4.1. A finite Morse index solution of the Allen-Cahn equation (2.3) in $\mathbb{R}^2$ has finitely many ends.

    Here an end is a connected component of the nodal set {u=0} at infinity. But in fact more information such as the refined asymptotic behavior of u at infinity follows from the proof.

    By applying a reverse version of the infinite dimensional Lyapunov-Schmidt reduction method (see Del Pino et al. [30], [31], [32], [33], [34]), when the interfaces in the Allen-Cahn equation are clustering, we were able to reduce the stability condition in the Allen-Cahn equation to a corresponding one for the Toda system

    $$\Delta f_k = e^{-\sqrt{2}(f_k - f_{k-1})} - e^{-\sqrt{2}(f_{k+1} - f_k)}. \tag{4.1}$$

    With such a connection between Allen-Cahn equation and Toda system, various results about stable solutions of Toda system can be transferred to the Allen-Cahn equation. For example, in Wang-Wei [90,91], Farina's integral estimate in [44] and the ε-regularity theorem in Wang [85,87], both for the Liouville equation

    $$-\Delta u = e^u, \tag{4.2}$$

    were used to establish a curvature estimate for level sets of solutions to the singularly perturbed Allen-Cahn equation. (More precisely, we need the corresponding results for the Toda system (4.1), which are direct generalizations of the results about the Liouville equation, see [89].) In Gui-Wang-Wei [57], the Liouville theorem in Dancer-Farina [23] about finite Morse index solutions to (4.2) was used to establish the following result.

    Theorem 4.2. Suppose $u$ is an entire solution of the Allen-Cahn equation (2.3) in $\mathbb{R}^{n+1}$, and it is axially symmetric (i.e. $u$ depends only on $x_1^2 + \cdots + x_n^2$ and $x_{n+1}$) and stable outside a cylinder $\{x_1^2 + \cdots + x_n^2 < R^2\}$. If $3 \le n \le 9$, then $u = u(x_{n+1})$.

    Here we do not state the $n = 2$ case (i.e. for the Allen-Cahn equation in $\mathbb{R}^3$), which can be viewed as the "Sobolev" critical case for the Allen-Cahn equation. It was proved in [57] that in this case we have "finite Morse index implies finite ends" as in Theorem 4.1.

    The "finite Morse index implies finite ends" property is expected to be true in any dimension, but by now only this axially symmetric case has been proven. Note that, although Theorem 4.2 is a result in high dimensions, the axial symmetry makes the problem essentially two dimensional. This allows us to prove the quadratic decay for curvatures of {u=0} as in [90].

    In Cao-Shen-Zhu [11], it was shown that stable minimal hypersurfaces in $\mathbb{R}^n$ have only one end, that is, they are connected at infinity. Later in [69], Li-Wang also showed that minimal hypersurfaces with finite Morse index have finitely many ends (still in the topological sense). We want to know whether the corresponding results hold for stable or finite Morse index solutions of the Allen-Cahn equation (2.3).

    Problem 6. Can we prove a topological finiteness result for ends of finite Morse index solutions to Allen-Cahn equation (2.3)?

    Since we only want a topological finiteness, this should hold for any dimension $n$. However, the one end result for stable solutions is not true for the Allen-Cahn equation, at least in dimensions $n \ge 11$. In fact, the examples constructed in [1] are stable solutions of the Allen-Cahn equation (2.3) in $\mathbb{R}^n$, where $n \ge 11$, but they have two ends. Whether this one end result holds for $n \le 10$ is open.

    In general, our understanding of finite Morse index solutions in higher dimensions is very limited. Indeed, we do not know much even about stable solutions in dimensions $n \ge 3$. But even assuming the stable De Giorgi conjecture, we cannot deduce that "finite Morse index implies finite ends", that is,

    Problem 7. Assuming the stable De Giorgi conjecture in dimensions $4 \le n \le 7$, can we prove that finite Morse index solutions of (2.3) in these dimensions have finitely many ends?

    The three dimensional case can be proved as in [90] and [57], but this approach does not work in dimensions $4 \le n \le 7$. This is because the techniques used in [90] rely on a finiteness result for nodal domains of directional derivatives $\partial_\xi u$, which in turn needs the Liouville property for the degenerate elliptic equation (2.4), and as explained before, it cannot be applied in high dimensions. (In dimension 3, as in [57], a localization around each end allows us to use this technique, so we could expect the above result.) Note that the proof of Theorem 4.2 in [57] relies crucially on the axially symmetric assumption and the idea in Dancer-Farina [23]. Here the stability condition is used only in a reductive way as in Wang-Wei [90,91]. Therefore the power of the stability condition is not fully utilized. It is still not clear how to use the stability condition more efficiently in high dimensions.

    The author's interest in stable and finite Morse index solutions was largely sparked by N. Dancer and Y. Du about ten years ago. They taught me a lot about stable and finite Morse index solutions when I was a postdoc at Sydney University. Several problems collected in this paper were communicated to the author on various occasions by Juan Davila, Louis Dupaigne, Zongming Guo, Xia Huang, Yong Liu, Yoshihiro Tonegawa, Juncheng Wei and Dong Ye over a long period. I am grateful to them for sharing their insights on these problems. My research has been supported by the National Natural Science Foundation of China (No. 11871381).



    [1] M. Abramowitz and I. Stegun (eds.), Handbook of Mathematical Functions, Dover, New York, 1972.
    [2] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel and W. Zaremba, Hindsight experience replay, in Advances in Neural Information Processing Systems, (2017), 5048–5058.
    [3] A. S. Berahas, L. Cao, K. Choromanski and K. Scheinberg, A theoretical and empirical comparison of gradient approximations in derivative-free optimization, arXiv: 1905.01332.
    [4] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang and W. Zaremba, OpenAI Gym, arXiv preprint arXiv: 1606.01540.
    [5] K. Choromanski, A. Pacchiano, J. Parker-Holder and Y. Tang, From complexity to simplicity: Adaptive es-active subspaces for blackbox optimization, NeurIPS.
    [6] K. Choromanski, A. Pacchiano, J. Parker-Holder and Y. Tang, Provably robust blackbox optimization for reinforcement learning, arXiv: 1903.02993.
    [7] E. Conti, V. Madhavan, F. P. Such, J. Lehman, K. O. Stanley and J. Clune, Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents, NIPS.
    [8] E. Coumans and Y. Bai, Pybullet, a python module for physics simulation for games, robotics and machine learning, GitHub Repository.
    [9] P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, Y. Wu and P. Zhokhov, Openai Baselines, https://github.com/openai/baselines, 2017.
    [10] A. D. Flaxman, A. T. Kalai and H. B. McMahan, Online convex optimization in the bandit setting: Gradient descent without a gradient, Proceedings of the 16th Annual ACM-SIAM symposium on Discrete Algorithms, 385–394, ACM, New York, (2005).
    [11] S. Fujimoto, H. Van Hoof and D. Meger, Addressing function approximation error in actor-critic methods, arXiv preprint, arXiv: 1802.09477.
    [12] N. Hansen, The CMA evolution strategy: A comparing review, in Towards a new Evolutionary Computation, Springer, 192 (2006), 75–102. doi: 10.1007/3-540-32494-1_4
    [13] Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation (2001) 9: 159-195.
    [14] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver and D. Wierstra, Continuous control with deep reinforcement learning, ICLR.
    [15] N. Maheswaranathan, L. Metz, G. Tucker, D. Choi and J. Sohl-Dickstein, Guided evolutionary strategies: Augmenting random search with surrogate gradients, Proceedings of the 36th International Conference on Machine Learning.
    [16] F. Meier, A. Mujika, M. M. Gauy and A. Steger, Improving gradient estimation in evolutionary strategies with past descent directions, Optimization Foundations for Reinforcement Learning Workshop at NeurIPS.
    [17] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver and K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, ICML, 1928–1937.
    [18] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature, 518 (2015), 529-533. doi: 10.1038/nature14236
    [19] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan et al., Ray: A distributed framework for emerging AI applications, in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), (2018), 561–577.
    [20] Random gradient-free minimization of convex functions. Found. Comput. Math. (2017) 17: 527-566.
    [21] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga and A. Lerer, Automatic differentiation in PyTorch.
    [22] A. Quarteroni, R. Sacco and F. Saleri, Numerical Mathematics, Texts in Applied Mathematics, 37. Springer-Verlag, Berlin, 2007. doi: 10.1007/b98885
    [23] T. Salimans, J. Ho, X. Chen, S. Sidor and I. Sutskever, Evolution strategies as a scalable alternative to reinforcement learning, arXiv preprint, arXiv: 1703.03864.
    [24] J. Schulman, S. Levine, P. Abbeel, M. I. Jordan and P. Moritz, Trust region policy optimization, ICML, 1889–1897.
    [25] J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, Proximal policy optimization algorithms, arXiv preprint, arXiv: 1707.06347.
    [26] Parameter-exploring policy gradients. Neural Networks (2010) 23: 551-559.
    [27] Robot skill learning: From reinforcement learning to evolution strategies. Paladyn Journal of Behavioral Robotics (2013) 4: 49-61.
    [28] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. v. d. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, et al., Mastering the game of go with deep neural networks and tree search, Nature, 529 (2016), 484-489. doi: 10.1038/nature16961
    [29] F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley and J. Clune, Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning, arXiv preprint, arXiv: 1712.06567.
    [30] R. S. Sutton and A. G. Barto (eds.), Reinforcement Learning: An introduction, Second edition. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 2018.
    [31] Fast buckling load numerical prediction method for imperfect shells under axial compression based on pod and vibration correlation technique. Composite Structures (2020) 252: 112721.
    [32] K. Tian, Z. Li, L. Huang, K. Du, L. Jiang and B. Wang, Enhanced variable-fidelity surrogate-based optimization framework by gaussian process regression and fuzzy clustering, Comput. Methods Appl. Mech. Engrg., 366 (2020), 113045, 19 pp. doi: 10.1016/j.cma.2020.113045
    [33] J. Zhang, H. Tran, D. Lu and G. Zhang, Enabling long-range exploration in minimization of multimodal functions, Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI).
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)