With the rapid proliferation of big data and high-dimensional datasets, distributed estimation methods have become indispensable for large-scale statistical inference. However, traditional distributed computing frameworks with centralized coordination often suffer from communication bottlenecks and privacy risks. To address these challenges, we propose a novel distributed stochastic optimization algorithm, termed gradient-based Markov subsampling Gradient Tracking with Variance Reduction (GMS-GT-VR). The algorithm integrates gradient tracking and variance reduction techniques with a data-driven gradient-based Markov subsampling (GMS) strategy to solve large-scale distributed optimization problems over multi-agent networks efficiently. By leveraging a lightweight coordination mechanism for adaptive sampling, GMS-GT-VR reduces both communication overhead and computational complexity while accelerating convergence. We establish a theoretical convergence rate of $ \mathcal{O}(1/k) $ and provide a detailed complexity analysis. To validate the theoretical findings, we conduct extensive experiments on large-scale datasets. The results consistently show that GMS-GT-VR outperforms existing variance-reduction methods in communication efficiency and convergence speed, requiring significantly fewer iterations.
Citation: Hao Zan, Dan Chen. Distributed stochastic optimization algorithm based on Markov sampling[J]. AIMS Mathematics, 2026, 11(2): 4123-4146. doi: 10.3934/math.2026166
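To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' GMS-GT-VR implementation: decentralized gradient tracking combined with an SVRG-style variance-reduced gradient estimator, where each agent draws its minibatch indices from a simple Metropolis-type Markov chain biased toward samples with large gradient norms (a crude stand-in for gradient-based Markov subsampling). The synthetic least-squares data, the four-agent ring mixing matrix, the step size, the refresh period, and the helper names `grad` and `gms_step` are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' GMS-GT-VR): gradient tracking + SVRG-style
# variance reduction + a Markov-chain subsampling rule, on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_local, dim = 4, 200, 10
# Synthetic local least-squares data: agent i holds (A[i], b[i]).
A = rng.normal(size=(n_agents, n_local, dim))
x_true = rng.normal(size=dim)
b = A @ x_true + 0.1 * rng.normal(size=(n_agents, n_local))

# Doubly stochastic mixing matrix for a 4-agent ring network (assumed topology).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

def grad(i, x, idx):
    """Minibatch gradient of agent i's local least-squares loss at x."""
    Ai, bi = A[i, idx], b[i, idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def gms_step(i, x, current, batch=10):
    """One Markov-chain subsampling move: propose a fresh index set and accept
    it with probability proportional to its per-sample gradient norms
    (a crude stand-in for gradient-based Markov subsampling)."""
    proposal = rng.integers(n_local, size=batch)

    def score(idx):
        r = A[i, idx] @ x - b[i, idx]
        return np.linalg.norm(A[i, idx] * r[:, None]) + 1e-12

    if rng.random() < min(1.0, score(proposal) / score(current)):
        return proposal
    return current

# Initialization: at x_0 = x_ref the variance-reduced estimator equals the
# full local gradient, so both v_prev and the trackers y start there.
x = np.zeros((n_agents, dim))
idx = [rng.integers(n_local, size=10) for _ in range(n_agents)]
x_ref = x.copy()
g_ref = np.stack([grad(i, x_ref[i], np.arange(n_local)) for i in range(n_agents)])
v_prev = g_ref.copy()
y = v_prev.copy()
step = 0.05

for k in range(1, 501):
    x_new = W @ x - step * y                     # consensus mixing + tracked descent
    if k % 50 == 0:                              # SVRG-style reference refresh
        x_ref = x_new.copy()
        g_ref = np.stack([grad(i, x_ref[i], np.arange(n_local)) for i in range(n_agents)])
    idx = [gms_step(i, x_new[i], idx[i]) for i in range(n_agents)]
    v_new = np.stack([grad(i, x_new[i], idx[i]) - grad(i, x_ref[i], idx[i]) + g_ref[i]
                      for i in range(n_agents)])
    y = W @ y + v_new - v_prev                   # gradient tracking recursion
    x, v_prev = x_new, v_new

print("mean estimation error:", np.linalg.norm(x.mean(axis=0) - x_true))
```

In this sketch the tracker recursion $y_{k+1} = W y_k + v_{k+1} - v_k$ keeps the network average of the trackers equal to the network average of the variance-reduced gradient estimates, which is the property gradient-tracking schemes exploit to drive all agents to a common stationary point; the Markov subsampling step only changes which minibatch each agent uses to build $v_k$.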