Reinforcement learning in optimization problems. Applications to geophysical data inversion

Paolo Dell'Aversana

doi:10.3934/geosci.2022027

AIMS Geosciences

2022, Volume 8, Issue 3: 488-502. doi: 10.3934/geosci.2022027

Research article

Paolo Dell'Aversana

Eni S.p.A., San Donato Milanese, Milan, Italy

Received: 06 May 2022 Revised: 18 July 2022 Accepted: 31 July 2022 Published: 08 August 2022

Abstract: In this paper, we introduce a novel inversion methodology that combines the benefits of Reinforcement Learning techniques with the advantages of the Epsilon-Greedy method for an expanded exploration of the model space. Among the various Reinforcement Learning approaches, we applied algorithms from the Q-Learning family. We show that the Temporal Difference algorithm offers an effective iterative approach for finding an optimal solution in geophysical inverse problems. Furthermore, the Epsilon-Greedy method, properly coupled with the Reinforcement Learning workflow, expands the exploration of the model space, which helps minimize the misfit between observed and predicted responses and limits the problem of local minima of the cost function. To demonstrate the feasibility of our methodology, we tested it on synthetic geo-electric data and on a seismic refraction data set available in the public domain.
Keywords: reinforcement learning, geophysical inversion, optimization, refraction seismic, electric tomography, machine learning
Citation: Paolo Dell'Aversana. Reinforcement learning in optimization problems. Applications to geophysical data inversion[J]. AIMS Geosciences, 2022, 8(3): 488-502. doi: 10.3934/geosci.2022027
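
The abstract names three ingredients: Q-Learning, the Temporal Difference update, and Epsilon-Greedy exploration of the model space. The following is a minimal sketch of how these pieces can fit together on a toy one-parameter inverse problem. It is not the author's implementation. The forward operator `forward`, the model grid, the reward definition (misfit reduction per move), and all hyperparameter values are assumptions chosen only for illustration. The toy misfit has a local minimum near m = -1 and a global minimum at m = 1.5, so random exploration matters.

```python
import random

def forward(m):
    # Hypothetical toy forward operator: two "data" values from one model parameter.
    return [m, m * m]

def misfit(m, d_obs):
    # Sum of squared residuals between predicted and observed responses.
    return sum((p - o) ** 2 for p, o in zip(forward(m), d_obs))

def epsilon_greedy(q_row, epsilon, rng):
    # Explore with probability epsilon, otherwise exploit the best-known action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)

def td_update(q, s, a, reward, s_next, alpha, gamma):
    # Q-learning temporal-difference update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def invert(grid, d_obs, episodes=200, steps=30, alpha=0.5, gamma=0.9,
           epsilon=0.2, seed=0):
    # States are indices into the model grid; all defaults are illustrative.
    rng = random.Random(seed)
    moves = (-1, 0, 1)  # actions: step down, stay, step up on the grid
    q = [[0.0] * len(moves) for _ in grid]
    best = rng.randrange(len(grid))
    for _episode in range(episodes):
        s = rng.randrange(len(grid))  # random starting model
        for _step in range(steps):
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next = min(max(s + moves[a], 0), len(grid) - 1)
            # Assumed reward: the misfit reduction achieved by the move.
            reward = misfit(grid[s], d_obs) - misfit(grid[s_next], d_obs)
            td_update(q, s, a, reward, s_next, alpha, gamma)
            s = s_next
            # Keep track of the lowest-misfit model visited so far.
            if misfit(grid[s], d_obs) < misfit(grid[best], d_obs):
                best = s
    return grid[best], q

# Usage: synthetic "observed" data generated from a known model, m = 1.5.
grid = [i * 0.1 for i in range(-30, 31)]
d_obs = forward(1.5)
m_best, q = invert(grid, d_obs)
print(m_best, misfit(m_best, d_obs))
```

In this sketch, setting `epsilon` to 0 makes the agent purely greedy with respect to the current Q-table, while larger values trade exploitation for a wider random search of the model grid.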

© 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)