As the scale of optimization problems grows, the performance of the alternating direction method of multipliers (ADMM) degrades significantly. In this paper, aiming at solving large-scale nonconvex and nonsmooth optimization problems with linear constraints, we propose a unified framework of novel stochastic inexact ADMMs incorporating inertial terms and Bregman distances. By fusing Bregman distances with inertial acceleration techniques, the framework not only covers stochastic gradient descent and existing variance-reduced gradient estimators such as the stochastic variance-reduced gradient (SVRG) and the stochastic recursive gradient (SARAH), but also allows a more flexible double-step strategy in the convergence analysis. Without relying on the Kurdyka–Łojasiewicz property, and under suitable mild conditions, we establish the global convergence of this unified framework and show that it achieves a sublinear convergence rate of $ \mathcal{O}(1/{\mathbb K}) $, where $ {\mathbb K} $ is the number of iterations. Furthermore, under error bound conditions, we establish the linear convergence rate of the stochastic inexact ADMMs. Finally, numerical experiments on graph-guided fused LASSO problems verify the effectiveness of the stochastic inexact ADMMs for solving nonsmooth and nonconvex problems.
Citation: Yi-xin Yang, Heng-you Lan, Lin-cheng Jiang. Novel inertial stochastic Bregman inexact ADMMs for solving large-scale nonconvex and nonsmooth optimization without relying on the Kurdyka–Łojasiewicz property[J]. AIMS Mathematics, 2025, 10(10): 24804-24835. doi: 10.3934/math.20251099
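The abstract describes the framework only at a high level. As a rough, non-authoritative illustration of the kind of iteration such methods perform, the sketch below implements one plausible inertial stochastic inexact ADMM step for $ \min_{x,y} f(x)+g(y) $ subject to $ Ax+By=b $, where $ f(x)=\frac{1}{n}\sum_{i=1}^{n}f_i(x) $ is a finite sum and $ g $ is possibly nonsmooth. It uses the Euclidean Bregman kernel $ \phi=\frac{1}{2}\|\cdot\|^2 $, for which the Bregman distance $ D_\phi(u,v)=\phi(u)-\phi(v)-\langle\nabla\phi(v),u-v\rangle $ reduces to $ \frac{1}{2}\|u-v\|^2 $. All names, step sizes, and the particular update order here are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (an example choice of prox_g)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def minibatch_grad(grad_fi, x, n, batch, rng):
    """Plain mini-batch stochastic gradient of f(x) = (1/n) * sum_i f_i(x)."""
    idx = rng.choice(n, size=batch, replace=False)
    return np.mean([grad_fi(i, x) for i in idx], axis=0)

def inertial_stochastic_admm(grad_fi, prox_g, A, B, b, x0, y0, n,
                             rho=1.0, alpha=0.3, eta=0.05, tau=None,
                             batch=8, iters=200, seed=0):
    """One plausible inertial stochastic inexact ADMM iteration for
    min f(x) + g(y)  s.t.  Ax + By = b, with a Euclidean Bregman kernel."""
    rng = np.random.default_rng(seed)
    if tau is None:
        # A common safe step size for the proximal-linearized y-update.
        tau = rho * np.linalg.norm(B, 2) ** 2 + 1e-8
    x, x_prev = x0.astype(float), x0.astype(float).copy()
    y = y0.astype(float)
    lam = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        # 1) Inertial (heavy-ball) extrapolation of the x-variable.
        x_bar = x + alpha * (x - x_prev)
        # 2) Inexact x-update: a single stochastic-gradient step on the
        #    augmented Lagrangian at the extrapolated point; with the
        #    Euclidean kernel the Bregman-proximal step is a plain
        #    gradient step of length eta.
        g_hat = minibatch_grad(grad_fi, x_bar, n, batch, rng)
        grad_x = g_hat + A.T @ (lam + rho * (A @ x_bar + B @ y - b))
        x_prev, x = x, x_bar - eta * grad_x
        # 3) Proximal-linearized y-update via the prox operator of g.
        grad_y = B.T @ (lam + rho * (A @ x + B @ y - b))
        y = prox_g(y - grad_y / tau, 1.0 / tau)
        # 4) Dual ascent on the multiplier.
        lam = lam + rho * (A @ x + B @ y - b)
    return x, y, lam
```

For instance, with $ g=\mu\|\cdot\|_1 $ one can pass `prox_g=lambda z, t: soft_threshold(z, mu * t)`, which recovers a LASSO-type regularizer of the kind appearing in the paper's experiments.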
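The abstract also notes that the framework covers variance-reduced gradient estimators such as SVRG and SARAH. As a hedged sketch, a generic SVRG-style estimator that could replace the plain mini-batch oracle `minibatch_grad` above looks roughly as follows; the class name and parameters are hypothetical. A SARAH-style estimator would instead update the estimate recursively, $ v_k=\nabla f_{i_k}(x_k)-\nabla f_{i_k}(x_{k-1})+v_{k-1} $, rather than correcting against a fixed snapshot.

```python
import numpy as np

class SVRGEstimator:
    """SVRG-style variance-reduced estimator of the gradient of
    f(x) = (1/n) * sum_i f_i(x): periodically recompute a full gradient
    at a snapshot point, then correct mini-batch gradients against it."""

    def __init__(self, grad_fi, n, batch=8, snapshot_every=50, seed=0):
        self.grad_fi, self.n, self.batch = grad_fi, n, batch
        self.snapshot_every = snapshot_every
        self.rng = np.random.default_rng(seed)
        self.t, self.x_snap, self.full_grad = 0, None, None

    def __call__(self, x):
        if self.t % self.snapshot_every == 0:
            # Snapshot: full (deterministic) gradient at the current point.
            self.x_snap = x.copy()
            self.full_grad = np.mean(
                [self.grad_fi(i, x) for i in range(self.n)], axis=0)
        self.t += 1
        idx = self.rng.choice(self.n, size=self.batch, replace=False)
        g_x = np.mean([self.grad_fi(i, x) for i in idx], axis=0)
        g_snap = np.mean([self.grad_fi(i, self.x_snap) for i in idx], axis=0)
        # Control-variate correction: unbiased, and its variance shrinks
        # as the iterate approaches the snapshot point.
        return g_x - g_snap + self.full_grad
```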