The objective of reinforcement learning (RL) is to find an optimal strategy for solving a dynamical control problem. Evolution strategy (ES) has shown great promise in many challenging RL tasks, where the underlying dynamical system is only accessible as a black box, so that adjoint methods cannot be used. However, existing ES methods have two limitations that hinder their applicability in RL. First, most existing methods rely on Monte Carlo based gradient estimators to generate search directions. Because Monte Carlo estimators have low accuracy, RL training suffers from slow convergence and requires more iterations to reach the optimal solution. Second, the landscape of the reward function can be deceptive and may contain many local maxima, causing ES algorithms to converge prematurely and fail to explore other parts of the parameter space with potentially greater rewards. In this work, we employ a Directional Gaussian Smoothing Evolutionary Strategy (DGS-ES) to accelerate RL training. The method is well suited to address these two challenges through its ability to (i) provide gradient estimates with high accuracy, and (ii) find nonlocal search directions that emphasize large-scale variation of the reward function and disregard local fluctuations. Through several benchmark RL tasks, we show that the DGS-ES method is highly scalable, achieves superior wall-clock time, and attains reward scores competitive with other popular policy gradient and ES approaches.
Citation: Jiaxin Zhang, Hoang Tran, Guannan Zhang. Accelerating reinforcement learning with a Directional-Gaussian-Smoothing evolution strategy[J]. Electronic Research Archive, 2021, 29(6): 4119-4135. doi: 10.3934/era.2021075
Consider the semilinear elliptic equation
$$ -\Delta u = f(u) \quad \text{in } \mathbb{R}^n. \tag{1.1} $$
A solution
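$$ \int_{\mathbb{R}^n} \left[ |\nabla\varphi|^2 - f'(u)\varphi^2 \right] \geq 0, \quad \forall \varphi \in C_0^\infty(\mathbb{R}^n). \tag{1.2} $$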
The Morse index of a solution is defined to be the maximal dimension of the negative space for this quadratic form. A solution with finite Morse index is therefore not too unstable.
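The count of negative directions can be made concrete numerically. The following is a minimal sketch, not taken from the source: it assumes the one dimensional case, truncates the real line to an interval $(-L, L)$ with Dirichlet boundary conditions, discretizes the quadratic form by finite differences, and uses the Allen-Cahn nonlinearity $f(u) = u - u^3$ as an example. The function names `morse_index_1d` and `f_prime_allen_cahn` are hypothetical.

```python
import numpy as np


def f_prime_allen_cahn(u):
    """Derivative of the example nonlinearity f(u) = u - u^3 (an assumption)."""
    return 1.0 - 3.0 * u**2


def morse_index_1d(u, f_prime, half_width=10.0):
    """Count negative eigenvalues of -d^2/dx^2 - f'(u) on (-L, L).

    `u` holds the values of the solution at the interior points of a
    uniform grid; Dirichlet conditions are imposed at x = -L and x = L.
    This is a finite-difference stand-in for the quadratic form
    restricted to functions supported in the interval.
    """
    n = len(u)
    h = 2.0 * half_width / (n + 1)
    # Tridiagonal matrix of the discretized linearized operator.
    main_diag = 2.0 / h**2 - f_prime(u)
    off_diag = -np.ones(n - 1) / h**2
    operator = np.diag(main_diag) + np.diag(off_diag, 1) + np.diag(off_diag, -1)
    eigenvalues = np.linalg.eigvalsh(operator)
    return int(np.sum(eigenvalues < 0.0))


if __name__ == "__main__":
    n_points = 400
    # Constant solutions u = 0 and u = 1 of -u'' = u - u^3.
    print(morse_index_1d(np.zeros(n_points), f_prime_allen_cahn))
    print(morse_index_1d(np.ones(n_points), f_prime_allen_cahn))
```

With these settings the constant solution $u \equiv 0$ has six negative directions on $(-10, 10)$, and this number grows as the interval is enlarged, while $u \equiv 1$ has none, in line with the idea that a stable solution has Morse index zero.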
In the 2000s, Dancer wrote a series of papers on stable and finite Morse index solutions, [13,14,15,16,17,18,19,21] (see also his survey [20] at the 2010 ICM and the summary in Du [35, Section 8]). He obtained various classification results about these solutions and applied them to the study of equations with small parameters in bounded domains and global bifurcation problems. Many results on stable and finite Morse index solutions have appeared since then. We refer the reader to the monograph of Dupaigne [40] for an extensive list of results and references up to 2010.
In this paper we review some recent results about stable and finite Morse index solutions, mostly from 2010 to 2020. We will restrict our attention to the Liouville (-Bernstein-De Giorgi) type results on stable and finite Morse index solutions defined on the entire
In [20], Dancer proposed the following conjecture:
Assume
Then either
If
For nonnegative nonlinearities
Theorem 2.1. Assume that
Hence in this case, the critical dimension is
$$ \Delta u = u^{-p} \tag{2.1} $$
(see Esposito-Ghoussoub-Guo [42], Meadows [72], Ma-Wei [71] and Du-Guo [36]), to consider the Liouville property for stable solutions, a more natural class consists of solutions with a suitable polynomial growth at infinity (or even without any growth condition). In this case, it seems that the critical dimension should be
Problem 1. Assume
$$ \int_0^1 f(u)\,du = -\infty. \tag{2.2} $$
Then there is no positive, stable solution on
Here the condition (2.2) implies that there does not exist a one dimensional stable solution. Note that, by the above remark, no assumption on the growth of solutions is added.
If
If there exists a one dimensional stable solution of (1.1), then a double well structure is associated to this equation, that is, there exist two constants
$$ F(a_\pm) = 0, \qquad F(t) < 0 \ \text{ in } (a_-, a_+). $$
Here
Under this assumption, Dancer's conjecture is closely related to the De Giorgi conjecture (De Giorgi [28]) and the stable De Giorgi conjecture for the Allen-Cahn equation
$$ -\Delta u = u - u^3. \tag{2.3} $$
By the way, based on the result of Pacard-Wei [76], in this case, the critical dimension should be
Recall that the De Giorgi conjecture states that
Suppose
The stable De Giorgi conjecture states that
Suppose
In the Allen-Cahn equation, the standard double well potential
The De Giorgi conjecture has been solved by Ghoussoub-Gui [55] in dimension
By an observation of Dancer, it is known that the method introduced in Ghoussoub-Gui [55] and Ambrosio-Cabre [4] can be used to prove the stable De Giorgi conjecture in dimension
$$ -\operatorname{div}\left(\sigma^2 \nabla\varphi\right) = 0, \tag{2.4} $$
but by Gazzola [54], there is no hope to use this in dimensions
Even the following weaker version of the stable De Giorgi conjecture is still open.
Problem 2. Suppose
$$ \int_{B_R(0)} \left[ \frac{1}{2}|\nabla u|^2 + \frac{1}{4}\left(1-u^2\right)^2 \right] \leq C R^{n-1}, \quad \forall R > 0. \tag{2.5} $$
If
We say the energy growth condition (2.5) is natural because it is satisfied by minimizing solutions. (Whether this condition holds for stable solutions is another unknown point in the stable De Giorgi conjecture.) If
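As an illustration of the growth rate in (2.5), here is a worked computation that is not taken from the source. It assumes the one dimensional profile $g(t) = \tanh(t/\sqrt{2})$, which solves (2.3) because $g' = (1-g^2)/\sqrt{2}$ and hence $-g'' = g - g^3$. For $u(x) = g(x_n)$ the two terms of the energy density coincide, so

$$ \frac{1}{2}|\nabla u|^2 + \frac{1}{4}\left(1-u^2\right)^2 = \frac{1}{2}\left(1 - g(x_n)^2\right)^2, \qquad \sigma_0 := \int_{-\infty}^{+\infty} \frac{1}{2}\left(1-g(t)^2\right)^2 dt = \frac{2\sqrt{2}}{3}. $$

Since $B_R(0)$ is contained in the cylinder $\{|x'| < R\} \times \mathbb{R}$, where $x' = (x_1, \dots, x_{n-1})$, integrating first in $x_n$ gives

$$ \int_{B_R(0)} \left[ \frac{1}{2}|\nabla u|^2 + \frac{1}{4}\left(1-u^2\right)^2 \right] \leq \sigma_0\, \omega_{n-1} R^{n-1}, $$

where $\omega_{n-1}$ denotes the volume of the unit ball in $\mathbb{R}^{n-1}$. So the one dimensional solution satisfies (2.5) with $C = \sigma_0\,\omega_{n-1}$.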
Another missing geometric estimate for semilinear elliptic equations is the one corresponding to the famous Simons inequality (Simons [82]) for minimal hypersurfaces, which is a fundamental tool in the study of stable minimal hypersurfaces (see e.g. [81], [79]). This difficulty is also encountered in the study of the Bernstein property for the Alt-Caffarelli one phase free boundary problem (see Alt-Caffarelli [2]) and nonlocal minimal surfaces (see Caffarelli-Roquejoffre-Savin [8]).
Recall that the Alt-Caffarelli one phase free boundary problem is
$$ \begin{cases} \Delta u = 0 & \text{in } \{u > 0\}, \\ |\nabla u| = 1 & \text{on } \partial\{u > 0\}. \end{cases} \tag{2.6} $$
There are some variants, such as the problem studied by Phillips [77] and Alt-Phillips [3]
$$ \Delta u = u^{-p}\chi_{\{u>0\}}, \quad 0 < p < 1, \tag{2.7} $$
as well as various approximations to these problems, e.g.
$$ \Delta u = f_\varepsilon(u), $$
where
Although most studies on these free boundary problems focus on minimizing solutions or viscosity solutions, there has recently been some interest in higher energy critical points, see Jerison-Perera [64]. To understand these solutions, the stability condition should play an important role. Indeed, even for minimizing solutions, to prove the optimal partial regularity of free boundaries, one needs the classification of stable, homogeneous solutions just as in the Bernstein problem for minimal hypersurfaces, see Weiss [92]. For the Alt-Caffarelli one phase free boundary problem, it is conjectured that the critical dimension is
In recent years, there has also been much progress on the De Giorgi conjecture for the fractional Allen-Cahn equation
$$ (-\Delta)^s u = u - u^3. \tag{2.8} $$
In particular, Figalli-Serra [52] solved the stable De Giorgi conjecture for the
For equations enjoying a scaling invariance, much progress has been obtained in the last decade. This is because in this case, usually there exists a monotonicity formula. As in the Bernstein problem for minimal hypersurfaces (see Fleming [53]), we can use the blowing down analysis and then the classification of homogeneous solutions to prove Liouville type results.
This approach was first undertaken by the author in [86] to study the partial regularity of stable solutions to the Lane-Emden equation (see also Davila-Dupaigne-Farina [25] for related results)
$$ -\Delta u = |u|^{p-1}u, \quad p > 1. \tag{3.1} $$
The scaling invariance for this equation says, if
$$ u^\lambda(x) := \lambda^{\frac{2}{p-1}} u(\lambda x) $$
is also a solution of (3.1).
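This invariance can be checked directly. The following short computation is a sketch added for illustration; it assumes that $u$ is a classical solution of (3.1), that $\lambda > 0$, and that the rescaled function is $\lambda^{\frac{2}{p-1}} u(\lambda x)$, written here as $u^\lambda$. By the chain rule,

$$ -\Delta u^\lambda(x) = \lambda^{\frac{2}{p-1}+2} \left(-\Delta u\right)(\lambda x) = \lambda^{\frac{2p}{p-1}} |u(\lambda x)|^{p-1} u(\lambda x) = |u^\lambda(x)|^{p-1} u^\lambda(x), $$

where the last equality uses $\frac{2p}{p-1} = p \cdot \frac{2}{p-1}$. The exponent $\frac{2}{p-1}$ is therefore exactly the one that balances the two extra factors of $\lambda$ produced by the Laplacian against the power nonlinearity.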
The optimal Liouville theorem for stable and finite Morse index solutions to this equation was established in Farina [43] by a Moser type iteration argument. The blowing down analysis can be used to give another proof. By employing Pacard's monotonicity formula ([74,75]) and Federer's dimension reduction principle ([50]), a sharp dimension estimate on the singular set of stable solutions is given in [86]. In this approach, usually we need only an estimate up to the energy level.
This approach was further developed in Davila et al. [26] and Du-Guo-Wang [39]. In [26], a monotonicity formula is derived for the fourth order Lane-Emden equation
$$ \Delta^2 u = |u|^{p-1}u. \tag{3.2} $$
Then by the blowing down analysis and the classification of stable, homogeneous solutions, an optimal Liouville theorem for stable solutions of (3.2) is established. In [39], a similar result is obtained for the weighted equation
$$ -\operatorname{div}\left(|x|^\theta \nabla u\right) = |x|^\ell |u|^{p-1}u. \tag{3.3} $$
For this equation, an
By now this approach has been applied to many other problems, for example, fourth order weighted equations or weighted systems [60,61], polyharmonic equations [70], nonlinear elliptic system [94], Toda system [89] and various elliptic equations involving fractional Laplacians [27,48,45,46,47,49,63].
One may be tempted to believe that this approach works well once the equation enjoys a scaling invariance. However, there are several important exceptions.
Problem 3. What is the optimal dimension for the Liouville theorem for stable solutions to the equation
$$ -\Delta^2 u = u^{-p}, \quad u \geq 0. \tag{3.4} $$
This equation arises from the MEMS problem, see Esposito-Ghoussoub-Guo [42]. For some
We also encounter the same difficulty with the possible lack of a monotonicity formula in some other problems, which include the equation with
$$ -\Delta_p u = |u|^{m-1}u \tag{3.5} $$
and its fourth order version
$$ \Delta\left(|\Delta u|^{m-1}\Delta u\right) = |u|^{p-1}u. \tag{3.6} $$
When both
$$ \begin{cases} -\Delta u = v^q, \\ -\Delta v = u^p, \end{cases} \tag{3.7} $$
see Mtiri-Ye [73]. For these problems, a mysterious point is the role of homogeneous (or radial) solutions in the classification of stable solutions. In particular, is the radial, homogeneous solution the most unstable (in a suitable sense) among all solutions?
Finally, if the blowing down analysis approach works, usually we could obtain a radial symmetry result about stable solutions when the space dimension is critical. By the moving plane method, this claim can be reduced to the classification of stable, homogeneous solutions in the critical dimension. This is similar to the classification of stable minimal hypercones in
Problem 4. Suppose
By the radial symmetry criterion of Guo [58], this is reduced to the classification of solutions
$$ -\Delta_{\mathbb{S}^{n-1}} w + \frac{2}{p-1}\left(n - 2 - \frac{2}{p-1}\right) w = |w|^{p-1}w, \tag{3.8} $$
satisfying the stability condition
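$$ \int_{\mathbb{S}^{n-1}} \left[ |\nabla\varphi|^2 + \frac{(n-2)^2}{4}\varphi^2 \right] \geq p \int_{\mathbb{S}^{n-1}} |w|^{p-1}\varphi^2, \quad \forall \varphi \in C^1(\mathbb{S}^{n-1}). \tag{3.9} $$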
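The form of (3.8) can be recovered by a direct computation, sketched here under an assumption that is not spelled out in the surviving text: the homogeneous solutions of (3.1) in question are taken to be $u(x) = r^{-\alpha} w(\theta)$ with $r = |x|$, $\theta = x/|x|$ and $\alpha = \frac{2}{p-1}$. Writing the Laplacian in polar coordinates,

$$ \Delta = \partial_{rr} + \frac{n-1}{r}\partial_r + \frac{1}{r^2}\Delta_{\mathbb{S}^{n-1}}, \qquad -\Delta u = r^{-\alpha-2}\left[ -\Delta_{\mathbb{S}^{n-1}} w + \alpha\,(n-2-\alpha)\, w \right]. $$

On the other hand $|u|^{p-1}u = r^{-\alpha p}|w|^{p-1}w$ and $\alpha p = \alpha + 2$, so the powers of $r$ cancel and (3.1) reduces to

$$ -\Delta_{\mathbb{S}^{n-1}} w + \frac{2}{p-1}\left(n-2-\frac{2}{p-1}\right) w = |w|^{p-1}w, $$

which is (3.8). The constant $\frac{(n-2)^2}{4}$ in (3.9) is the optimal constant in Hardy's inequality on $\mathbb{R}^n$.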
If
In [86], this is wrongly claimed to be true. But the proof therein works only for a small range
In Dancer-Guo-Wei [24], infinitely many solutions to (3.8) are constructed. However, it seems difficult to verify (3.9) because it involves a spectral bound condition. To the best of the author's knowledge, there are still no known smooth stable (in the sense of (3.9)) solutions of (3.8) other than the constant solutions.
Problem 5. Take
Concerning finite Morse index solutions on
Next, for those scaling invariant equations discussed in Section 3, the blowing down analysis still works. So if the Liouville theorem holds for stable solutions, then it also holds for finite Morse index solutions, except in an exceptional dimension. (This is the "Sobolev" critical dimension. For example, for (3.1), it is well known that when
In the rest of this section we review some recent results on finite Morse index solutions of the Allen-Cahn equation (2.3). The author in [88], and later jointly with Wei in [90], studied the structure of finite Morse index solutions in
Theorem 4.1. A finite Morse index solution of the Allen-Cahn equation (2.3) in
Here an end is a connected component of the nodal set
By applying a reverse version of the infinite dimensional Lyapunov-Schmidt reduction method (see Del Pino et al. [30], [31], [32], [33], [34]), when the interfaces in the Allen-Cahn equation are clustering, we were able to reduce the stability condition in the Allen-Cahn equation to a corresponding one for the Toda system
$$ \Delta f_k = e^{-\sqrt{2}\left(f_k - f_{k-1}\right)} - e^{-\sqrt{2}\left(f_{k+1} - f_k\right)}. \tag{4.1} $$
With such a connection between the Allen-Cahn equation and the Toda system, various results about stable solutions of the Toda system can be transferred to the Allen-Cahn equation. For example, in Wang-Wei [90, 91], Farina's integral estimate in [44] and the
$$ -\Delta u = e^u, \tag{4.2} $$
were used to establish a curvature estimate for level sets of solutions to the singularly perturbed Allen-Cahn equation. (More precisely, we need the corresponding results for the Toda system (4.1), which are direct generalizations of the results about the Liouville equation, see [89].) In Gui-Wang-Wei [57], the Liouville theorem in Dancer-Farina [23] about finite Morse index solutions to (4.2) was used to establish the following result.
Theorem 4.2. Suppose
Here we do not state the
The "finite Morse index implies finite ends" property is expected to be true in any dimension, but by now only this axially symmetric case has been proven. Note that, although Theorem 4.2 is a result in high dimensions, the axial symmetry makes the problem essentially two dimensional. This allows us to prove the quadratic decay for curvatures of
In Cao-Shen-Zhu [11], it was shown that stable minimal hypersurfaces in
Problem 6. Can we prove a topological finiteness result for ends of finite Morse index solutions to the Allen-Cahn equation (2.3)?
Since we only want a topological finiteness, this should hold for any dimension
In general, our understanding of finite Morse index solutions in higher dimensions is very limited. Indeed, we do not know much about stable solutions in dimensions
Problem 7. Assuming the stable De Giorgi conjecture in dimensions
The three dimensional case can be proved as in [90] and [57], but this approach does not work in dimensions
The author's interest in stable and finite Morse index solutions was largely sparked by N. Dancer and Y. Du about ten years ago. They taught me a lot about stable and finite Morse index solutions when I was a postdoc at Sydney University. Several problems collected in this paper were communicated to the author on various occasions by Juan Davila, Louis Dupaigne, Zongming Guo, Xia Huang, Yong Liu, Yoshihiro Tonegawa, Juncheng Wei and Dong Ye over a long period. I am grateful to them for sharing their insights on these problems. My research has been supported by the National Natural Science Foundation of China (No. 11871381).
[1] | M. Abramowitz and I. Stegun (eds.), Handbook of Mathematical Functions, Dover, New York, 1972. |
[2] | M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, O. P. Abbeel and W. Zaremba, Hindsight experience replay, in Advances in Neural Information Processing Systems, (2017), 5048–5058. |
[3] | A. S. Berahas, L. Cao, K. Choromanski and K. Scheinberg, A theoretical and empirical comparison of gradient approximations in derivative-free optimization, arXiv: 1905.01332. |
[4] | G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang and W. Zaremba, OpenAI Gym, arXiv preprint arXiv: 1606.01540. |
[5] | K. Choromanski, A. Pacchiano, J. Parker-Holder and Y. Tang, From complexity to simplicity: Adaptive es-active subspaces for blackbox optimization, NeurIPS. |
[6] | K. Choromanski, A. Pacchiano, J. Parker-Holder and Y. Tang, Provably robust blackbox optimization for reinforcement learning, arXiv: 1903.02993. |
[7] | E. Conti, V. Madhavan, F. P. Such, J. Lehman, K. O. Stanley and J. Clune, Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents, NIPS. |
[8] | E. Coumans and Y. Bai, Pybullet, a python module for physics simulation for games, robotics and machine learning, GitHub Repository. |
[9] | P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, Y. Wu and P. Zhokhov, Openai Baselines, https://github.com/openai/baselines, 2017. |
[10] | A. D. Flaxman, A. T. Kalai and H. B. McMahan, Online convex optimization in the bandit setting: Gradient descent without a gradient, Proceedings of the 16th Annual ACM-SIAM symposium on Discrete Algorithms, 385–394, ACM, New York, (2005). |
[11] | S. Fujimoto, H. Van Hoof and D. Meger, Addressing function approximation error in actor-critic methods, arXiv preprint, arXiv: 1802.09477. |
[12] | N. Hansen, The CMA evolution strategy: A comparing review, in Towards a new Evolutionary Computation, Springer, 192 (2006), 75–102. doi: 10.1007/3-540-32494-1_4 |
[13] | Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation (2001) 9: 159-195. |
[14] | T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver and D. Wierstra, Continuous control with deep reinforcement learning, ICLR. |
[15] | N. Maheswaranathan, L. Metz, G. Tucker, D. Choi and J. Sohl-Dickstein, Guided evolutionary strategies: Augmenting random search with surrogate gradients, Proceedings of the 36th International Conference on Machine Learning. |
[16] | F. Meier, A. Mujika, M. M. Gauy and A. Steger, Improving gradient estimation in evolutionary strategies with past descent directions, Optimization Foundations for Reinforcement Learning Workshop at NeurIPS. |
[17] | V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver and K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, ICML, 1928–1937. |
[18] | V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature, 518 (2015), 529-533. doi: 10.1038/nature14236 |
[19] | P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan et al., Ray: A distributed framework for emerging AI applications, in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), (2018), 561–577. |
[20] | Random gradient-free minimization of convex functions. Found. Comput. Math. (2017) 17: 527-566. |
[21] | A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga and A. Lerer, Automatic differentiation in PyTorch. |
[22] | A. Quarteroni, R. Sacco and F. Saleri, Numerical Mathematics, Texts in Applied Mathematics, 37. Springer-Verlag, Berlin, 2007. doi: 10.1007/b98885 |
[23] | T. Salimans, J. Ho, X. Chen, S. Sidor and I. Sutskever, Evolution strategies as a scalable alternative to reinforcement learning, arXiv preprint, arXiv: 1703.03864. |
[24] | J. Schulman, S. Levine, P. Abbeel, M. I. Jordan and P. Moritz, Trust region policy optimization, ICML, 1889–1897. |
[25] | J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, Proximal policy optimization algorithms, arXiv preprint, arXiv: 1707.06347. |
[26] | Parameter-exploring policy gradients. Neural Networks (2010) 23: 551-559. |
[27] | Robot skill learning: From reinforcement learning to evolution strategies. Paladyn Journal of Behavioral Robotics (2013) 4: 49-61. |
[28] | D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. v. d. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, et al., Mastering the game of Go with deep neural networks and tree search, Nature, 529 (2016), 484-489. doi: 10.1038/nature16961 |
[29] | F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley and J. Clune, Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning, arXiv preprint, arXiv: 1712.06567. |
[30] | R. S. Sutton and A. G. Barto (eds.), Reinforcement Learning: An introduction, Second edition. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 2018. |
[31] | Fast buckling load numerical prediction method for imperfect shells under axial compression based on POD and vibration correlation technique. Composite Structures (2020) 252: 112721. |
[32] | K. Tian, Z. Li, L. Huang, K. Du, L. Jiang and B. Wang, Enhanced variable-fidelity surrogate-based optimization framework by Gaussian process regression and fuzzy clustering, Comput. Methods Appl. Mech. Engrg., 366 (2020), 113045, 19 pp. doi: 10.1016/j.cma.2020.113045 |
[33] | J. Zhang, H. Tran, D. Lu and G. Zhang, Enabling long-range exploration in minimization of multimodal functions, Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI). |