The WuC-Adam algorithm based on joint improvement of Warmup and cosine annealing algorithms

Can Zhang; Yichuan Shao; Haijing Sun; Lei Xing; Qian Zhao; Le Zhang

doi:10.3934/mbe.2024054

Mathematical Biosciences and Engineering

2024, Volume 21, Issue 1: 1270-1285. doi: 10.3934/mbe.2024054

Research article

1. School of Information Engineering, Shenyang University, Shenyang 110044, China
2. School of Intelligent Science & Engineering, Shenyang University, Shenyang 110044, China
3. School of Chemistry and Chemical Engineering, University of Surrey, GU2 7XH, United Kingdom
4. School of Science, Shenyang University of Technology, Shenyang 110044, China

Academic Editor: Shangce Gao

Received: 13 October 2023 Revised: 04 December 2023 Accepted: 13 December 2023 Published: 26 December 2023

The Adam algorithm is a common choice for optimizing neural network models. However, its application often brings challenges, such as susceptibility to local optima, overfitting and convergence problems caused by unstable learning rate behavior. In this article, we introduce an enhanced Adam optimization algorithm, WuC-Adam, that integrates Warmup and cosine annealing techniques to alleviate these challenges. By integrating Warmup into the traditional Adam algorithm, we gradually raise the learning rate during the initial training phase, which avoids early instability. In addition, we adopt a dynamic cosine annealing strategy to adaptively adjust the learning rate, which helps the model escape local optima and enhances its generalization ability. To validate the proposed method, we conducted comparative experiments against traditional Adam and other optimization algorithms on several standard datasets. On the MNIST, CIFAR10 and CIFAR100 datasets, the proposed algorithm achieved accuracies of 98.87%, 87.67% and 58.88%, respectively, with significant improvements over the other algorithms. The experimental results indicate that the joint enhancement of the Adam algorithm improves both model convergence speed and generalization performance. These results point to the potential of the enhanced Adam algorithm in a wide range of deep learning tasks.
Keywords: deep learning; Adam algorithm; Warmup; cosine annealing strategy; local optimum
Citation: Can Zhang, Yichuan Shao, Haijing Sun, Lei Xing, Qian Zhao, Le Zhang. The WuC-Adam algorithm based on joint improvement of Warmup and cosine annealing algorithms[J]. Mathematical Biosciences and Engineering, 2024, 21(1): 1270-1285. doi: 10.3934/mbe.2024054
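
The abstract describes the learning-rate behavior only in words: a Warmup phase at the start of training, followed by cosine annealing. A minimal sketch of one common way to combine the two is given here for illustration. It assumes a linear warmup and a single cosine decay; the function name `warmup_cosine_lr` and the parameters `base_lr`, `warmup_steps`, `total_steps` and `min_lr` are hypothetical and are not taken from the paper, whose exact schedule may differ.

```python
import math


def warmup_cosine_lr(step, base_lr=1e-3, warmup_steps=500,
                     total_steps=10_000, min_lr=0.0):
    """Learning rate at a 0-indexed training step.

    Illustrative assumption: linear warmup to base_lr, then one cosine
    decay down to min_lr. This is not the paper's stated formula.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to at most 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine annealing: the factor falls smoothly from 1 to 0.
    cosine_factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine_factor


if __name__ == "__main__":
    # Print the rate at a few points across both phases.
    for s in (0, 249, 499, 500, 5250, 10_000):
        print(s, warmup_cosine_lr(s))
```

In a training loop, the value returned for each step would be assigned to the Adam optimizer's learning rate before the parameter update. In PyTorch, for example, a schedule of this shape can be expressed as a multiplicative factor passed to `torch.optim.lr_scheduler.LambdaLR`.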



© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)