Research article

An improved version of Polak-Ribière-Polyak conjugate gradient method with its applications in neural networks training

  • Published: 19 August 2025
  • Due to their simplicity, low memory requirements, strong convergence properties, and ability to solve high-dimensional problems, conjugate gradient (CG) methods are widely used to solve linear and nonlinear unconstrained optimization problems. The Polak-Ribière-Polyak (PRP) method is considered one of the most efficient CG methods in practical computation; theoretically, however, its convergence properties are poor. Therefore, many variants of PRP with good numerical results and good convergence properties have been developed, such as the Gilbert and Nocedal method (PRP$ ^+ $), the Wei-Yao-Liu method (WYL), and the Yousif et al. method (OPRP). In this paper, based on the PRP$ ^+ $ and OPRP methods, we propose another modified version of PRP that inherits all the convergence properties of PRP$ ^+ $ and OPRP and delivers improved numerical results. To show the efficiency and robustness of the new modified method in practice, it was compared with PRP$ ^+ $, WYL, and OPRP, all applied under the strong Wolfe line search. In addition, the new method was applied in deep learning to obtain ideal parameters of some neural network (NN) models during the training process.

    Citation: Osman Omer Osman Yousif, Raouf Ziadi, Abdulgader Z. Almaymuni, Mohammed A. Saleh. An improved version of Polak-Ribière-Polyak conjugate gradient method with its applications in neural networks training[J]. Electronic Research Archive, 2025, 33(8): 4799-4815. doi: 10.3934/era.2025216

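For readers unfamiliar with the formulas named in the abstract, the classical PRP parameter is $ \beta_k^{PRP} = g_k^{T}(g_k - g_{k-1})/\|g_{k-1}\|^2 $; the PRP$ ^+ $ method of Gilbert and Nocedal truncates it to $ \max\{\beta_k^{PRP}, 0\} $, and the WYL method replaces $ g_{k-1} $ with the rescaled gradient $ (\|g_k\|/\|g_{k-1}\|)\,g_{k-1} $, which keeps the parameter nonnegative by the Cauchy-Schwarz inequality. A strong Wolfe line search accepts a step size $ \alpha_k $ along the direction $ d_k $ satisfying $ f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^{T} d_k $ and $ |g(x_k + \alpha_k d_k)^{T} d_k| \le \sigma |g_k^{T} d_k| $ with $ 0 < \delta < \sigma < 1 $. The following is a minimal Python sketch of this generic CG loop with the classical PRP, PRP$ ^+ $, and WYL parameters; it does not reproduce the paper's new modified parameter, and the Rosenbrock test function and the constants $ \delta = 10^{-4} $ and $ \sigma = 0.1 $ are illustrative assumptions only.

```python
# Minimal sketch of a nonlinear CG loop under a strong Wolfe line search.
# Only the classical PRP, PRP+, and WYL parameters from the abstract are
# implemented; the paper's new modified parameter is NOT reproduced here.
import numpy as np
from scipy.optimize import line_search

def beta(variant, g_new, g_old):
    """Conjugacy parameter beta_k for the named classical variant."""
    prp = g_new @ (g_new - g_old) / (g_old @ g_old)
    if variant == "PRP":
        return prp
    if variant == "PRP+":    # Gilbert-Nocedal: truncate negative values
        return max(prp, 0.0)
    if variant == "WYL":     # Wei-Yao-Liu: rescale the previous gradient
        scaled = (np.linalg.norm(g_new) / np.linalg.norm(g_old)) * g_old
        return g_new @ (g_new - scaled) / (g_old @ g_old)
    raise ValueError(f"unknown variant: {variant}")

def cg(f, grad, x0, variant="PRP+", tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # first step: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        # strong Wolfe search; delta=1e-4, sigma=0.1 are common CG choices
        alpha = line_search(f, grad, x, d, gfk=g, c1=1e-4, c2=0.1)[0]
        if alpha is None:                    # search failed: restart downhill
            d, alpha = -g, 1e-4
        x_new = x + alpha * d
        g_new = grad(x_new)
        d = -g_new + beta(variant, g_new, g) * d
        x, g = x_new, g_new
    return x

# Illustrative test problem (an assumption, not from the paper): Rosenbrock.
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(cg(f, grad, [-1.2, 1.0], variant="PRP+"))  # converges near (1, 1)
```

Truncating $ \beta_k $ at zero is what restores global convergence in the Gilbert-Nocedal analysis, while the WYL rescaling guarantees $ \beta_k \ge 0 $ automatically; the abstract indicates that the paper's new parameter combines ideas from PRP$ ^+ $ and OPRP to the same end.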





  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
