With the rapid proliferation of big data and high-dimensional datasets, distributed estimation methods have become indispensable for large-scale statistical inference. However, traditional distributed computing frameworks with centralized coordination often suffer from communication bottlenecks and privacy risks. To address these challenges, we propose a novel distributed stochastic optimization algorithm, termed gradient-based Markov subsampling Gradient Tracking with Variance Reduction (GMS-GT-VR). The algorithm integrates gradient tracking and variance reduction techniques with a data-driven gradient-based Markov subsampling (GMS) strategy to solve large-scale distributed optimization problems over multi-agent networks efficiently. By leveraging a lightweight coordination mechanism for adaptive sampling, GMS-GT-VR reduces both communication overhead and computational complexity while accelerating convergence. We establish a theoretical convergence rate of $ \mathcal{O}(1/k) $ and provide a detailed complexity analysis. To validate the theoretical findings, we conduct extensive experiments on large-scale datasets. The results consistently show that GMS-GT-VR outperforms existing variance-reduction methods in communication efficiency and convergence speed, requiring significantly fewer iterations.
Citation: Hao Zan, Dan Chen. Distributed stochastic optimization algorithm based on Markov sampling[J]. AIMS Mathematics, 2026, 11(2): 4123-4146. doi: 10.3934/math.2026166
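To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' GMS-GT-VR implementation: decentralized gradient tracking combined with an SVRG-style variance-reduced gradient estimator, where each agent draws its minibatch indices from a simple Metropolis-type Markov chain biased toward samples with large gradient norms (a crude stand-in for gradient-based Markov subsampling). The synthetic least-squares data, the four-agent ring mixing matrix, the step size, the refresh period, and the helper names `grad` and `gms_step` are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' GMS-GT-VR): gradient tracking + SVRG-style
# variance reduction + a Markov-chain subsampling rule, on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_local, dim = 4, 200, 10
# Synthetic local least-squares data: agent i holds (A[i], b[i]).
A = rng.normal(size=(n_agents, n_local, dim))
x_true = rng.normal(size=dim)
b = A @ x_true + 0.1 * rng.normal(size=(n_agents, n_local))

# Doubly stochastic mixing matrix for a 4-agent ring network (assumed topology).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

def grad(i, x, idx):
    """Minibatch gradient of agent i's local least-squares loss at x."""
    Ai, bi = A[i, idx], b[i, idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def gms_step(i, x, current, batch=10):
    """One Markov-chain subsampling move: propose a fresh index set and accept
    it with probability proportional to its per-sample gradient norms
    (a crude stand-in for gradient-based Markov subsampling)."""
    proposal = rng.integers(n_local, size=batch)

    def score(idx):
        r = A[i, idx] @ x - b[i, idx]
        return np.linalg.norm(A[i, idx] * r[:, None]) + 1e-12

    if rng.random() < min(1.0, score(proposal) / score(current)):
        return proposal
    return current

# Initialization: at x_0 = x_ref the variance-reduced estimator equals the
# full local gradient, so both v_prev and the trackers y start there.
x = np.zeros((n_agents, dim))
idx = [rng.integers(n_local, size=10) for _ in range(n_agents)]
x_ref = x.copy()
g_ref = np.stack([grad(i, x_ref[i], np.arange(n_local)) for i in range(n_agents)])
v_prev = g_ref.copy()
y = v_prev.copy()
step = 0.05

for k in range(1, 501):
    x_new = W @ x - step * y                     # consensus mixing + tracked descent
    if k % 50 == 0:                              # SVRG-style reference refresh
        x_ref = x_new.copy()
        g_ref = np.stack([grad(i, x_ref[i], np.arange(n_local)) for i in range(n_agents)])
    idx = [gms_step(i, x_new[i], idx[i]) for i in range(n_agents)]
    v_new = np.stack([grad(i, x_new[i], idx[i]) - grad(i, x_ref[i], idx[i]) + g_ref[i]
                      for i in range(n_agents)])
    y = W @ y + v_new - v_prev                   # gradient tracking recursion
    x, v_prev = x_new, v_new

print("mean estimation error:", np.linalg.norm(x.mean(axis=0) - x_true))
```

In this sketch the tracker recursion $y_{k+1} = W y_k + v_{k+1} - v_k$ keeps the network average of the trackers equal to the network average of the variance-reduced gradient estimates, which is the property gradient-tracking schemes exploit to drive all agents to a common stationary point; the Markov subsampling step only changes which minibatch each agent uses to build $v_k$.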