In this manuscript, we consider well-known multi-task learning (MTL) models from the literature for linear regression problems, such as clustered MTL or weakly constrained MTL. We propose novel reformulations of the training problem for these models, based on mixed-integer quadratic programming (MIQP) techniques. We show that our approach allows to drive the optimization process up to certified global optimality, exploiting popular off-the-shelf software solvers. By computational experiments on both synthetic and real-world datasets, we show that this strategy generally leads to improvements in terms of the predictive performance of the models, if compared to the classical local optimization techniques, based on alternating minimization strategies, that are usually employed. We also suggest a number of possible extensions of our model that should further improve the quality of the obtained regressors, introducing, for example, sparsity and features selection elements.
Citation: Matteo Lapucci, Davide Pucci. Mixed-integer quadratic programming reformulations of multi-task learning models[J]. Mathematics in Engineering, 2023, 5(1): 1-16. doi: 10.3934/mine.2023020
[1] | Abdennasser Chekroun, Mohammed Nor Frioui, Toshikazu Kuniya, Tarik Mohammed Touaoula . Global stability of an age-structured epidemic model with general Lyapunov functional. Mathematical Biosciences and Engineering, 2019, 16(3): 1525-1553. doi: 10.3934/mbe.2019073 |
[2] | C. Connell McCluskey . Global stability of an epidemic model with delay and general nonlinear incidence. Mathematical Biosciences and Engineering, 2010, 7(4): 837-850. doi: 10.3934/mbe.2010.7.837 |
[3] | Cunjuan Dong, Changcheng Xiang, Wenjin Qin, Yi Yang . Global dynamics for a Filippov system with media effects. Mathematical Biosciences and Engineering, 2022, 19(3): 2835-2852. doi: 10.3934/mbe.2022130 |
[4] | Hui Cao, Yicang Zhou, Zhien Ma . Bifurcation analysis of a discrete SIS model with bilinear incidence depending on new infection. Mathematical Biosciences and Engineering, 2013, 10(5&6): 1399-1417. doi: 10.3934/mbe.2013.10.1399 |
[5] | Pan Yang, Jianwen Feng, Xinchu Fu . Cluster collective behaviors via feedback pinning control induced by epidemic spread in a patchy population with dispersal. Mathematical Biosciences and Engineering, 2020, 17(5): 4718-4746. doi: 10.3934/mbe.2020259 |
[6] | Yoichi Enatsu, Yukihiko Nakata . Stability and bifurcation analysis of epidemic models with saturated incidence rates: An application to a nonmonotone incidence rate. Mathematical Biosciences and Engineering, 2014, 11(4): 785-805. doi: 10.3934/mbe.2014.11.785 |
[7] | Fang Wang, Juping Zhang, Maoxing Liu . Dynamical analysis of a network-based SIR model with saturated incidence rate and nonlinear recovery rate: an edge-compartmental approach. Mathematical Biosciences and Engineering, 2024, 21(4): 5430-5445. doi: 10.3934/mbe.2024239 |
[8] | Thomas Torku, Abdul Khaliq, Fathalla Rihan . SEINN: A deep learning algorithm for the stochastic epidemic model. Mathematical Biosciences and Engineering, 2023, 20(9): 16330-16361. doi: 10.3934/mbe.2023729 |
[9] | Ke Guo, Wanbiao Ma . Global dynamics of an SI epidemic model with nonlinear incidence rate, feedback controls and time delays. Mathematical Biosciences and Engineering, 2021, 18(1): 643-672. doi: 10.3934/mbe.2021035 |
[10] | Cheng-Cheng Zhu, Jiang Zhu . Spread trend of COVID-19 epidemic outbreak in China: using exponential attractor method in a spatial heterogeneous SEIQR model. Mathematical Biosciences and Engineering, 2020, 17(4): 3062-3087. doi: 10.3934/mbe.2020174 |
In this manuscript, we consider well-known multi-task learning (MTL) models from the literature for linear regression problems, such as clustered MTL or weakly constrained MTL. We propose novel reformulations of the training problem for these models, based on mixed-integer quadratic programming (MIQP) techniques. We show that our approach allows to drive the optimization process up to certified global optimality, exploiting popular off-the-shelf software solvers. By computational experiments on both synthetic and real-world datasets, we show that this strategy generally leads to improvements in terms of the predictive performance of the models, if compared to the classical local optimization techniques, based on alternating minimization strategies, that are usually employed. We also suggest a number of possible extensions of our model that should further improve the quality of the obtained regressors, introducing, for example, sparsity and features selection elements.
[1] | A. Agarwal, S. Gerber, H. Daume, Learning multiple tasks using manifold regularization, In: Proceedings of the 23rd International Conference on Neural Information Processing Systems, 2010, 46–54. |
[2] | A. Ahmed, A. Das, A. J. Smola, Scalable hierarchical multitask learning algorithms for conversion optimization in display advertising, In: Proceedings of the 7th ACM international conference on Web search and data mining, 2014,153–162. |
[3] | R. K. Ando, T. Zhang, P. Bartlett, A framework for learning predictive structures from multiple tasks and unlabeled data, J. Mach. Learn. Res., 6 (2005), 1817–1853. |
[4] |
A. Argyriou, T. Evgeniou, M. Pontil, Convex multi-task feature learning, Mach. Learn., 73 (2008), 243–272. http://dx.doi.org/10.1007/s10994-007-5040-8 doi: 10.1007/s10994-007-5040-8
![]() |
[5] | J. Bai, K. Zhou, G. Xue, H. Zha, G. Sun, B. Tseng, et al., Multi-task learning for learning to rank in web search, In: Proceedings of the 18th ACM conference on Information and knowledge management, 2009, 1549–1552. http://dx.doi.org/10.1145/1645953.1646169 |
[6] | A. Barzilai, K. Crammer, Convex multi-task learning by clustering, In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015, 65–73. |
[7] | D. P. Bertsekas, J. N. Tsitsiklis, Parallel and distributed computation: numerical methods, NJ: Prentice hall Englewood Cliffs, 1989. |
[8] |
D. Bertsimas, A. King, Logistic regression: From art to science, Statist. Sci., 32 (2017), 367–384. http://dx.doi.org/10.1214/16-STS602 doi: 10.1214/16-STS602
![]() |
[9] |
D. Bertsimas, A. King, R. Mazumder, Best subset selection via a modern optimization lens, Ann. Statist., 44 (2016), 813–852. http://dx.doi.org/10.1214/15-AOS1388 doi: 10.1214/15-AOS1388
![]() |
[10] | S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Now Publishers Inc, 2011. http://dx.doi.org/10.1561/2200000016 |
[11] |
R. H. Byrd, P. Lu, J. Nocedal, C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16 (1995), 1190–1208. http://dx.doi.org/10.1137/0916069 doi: 10.1137/0916069
![]() |
[12] |
R. Caruana, Multitask learning, Mach. Learn., 28 (1997), 41–75. http://dx.doi.org/10.1023/A:1007379606734 doi: 10.1023/A:1007379606734
![]() |
[13] | J. Chen, J. Zhou, J. Ye, Integrating low-rank and group-sparse structures for robust multi-task learning, In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011, 42–50. |
[14] |
E. Civitelli, M. Lapucci, F. Schoen, A. Sortino, An effective procedure for feature subset selection in logistic regression based on information criteria, Comput. Optim. Appl., 80 (2001), 1–32. http://dx.doi.org/10.1007/s10589-021-00288-1 doi: 10.1007/s10589-021-00288-1
![]() |
[15] |
L. Di Gangi, M. Lapucci, F. Schoen, A. Sortino, An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series, Comput. Optim. Appl., 74 (2019), 919–948. http://dx.doi.org/10.1007/s10589-019-00134-5 doi: 10.1007/s10589-019-00134-5
![]() |
[16] | T. Evgeniou, C. A. Micchelli, M. Pontil, Learning multiple tasks with kernel methods, J. Mach. Learn. Res., 6 (2005), 615–637. |
[17] | T. Evgeniou, M. Pontil, Regularized multi–task learning, In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004,109–117. http://dx.doi.org/10.1145/1014052.1014067 |
[18] |
H. Goldstein, Multilevel modelling of survey data, Journal of the Royal Statistical Society. Series D (The Statistician), 40 (1991), 235–244. http://dx.doi.org/10.2307/2348496 doi: 10.2307/2348496
![]() |
[19] |
A. Gómez, O. A. Prokopyev, A mixed-integer fractional optimization approach to best subset selection, INFORMS J. Comput., 33 (2021), 551–565. http://dx.doi.org/10.1287/ijoc.2020.1031 doi: 10.1287/ijoc.2020.1031
![]() |
[20] | Gurobi Optimization, LLC, Gurobi optimizer reference manual. Available from: https://www.gurobi.com/documentation/9.5/refman/index.html. |
[21] | L. Han, Y. Zhang, Learning multi-level task groups in multi-task learning, In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, 2638–2644. |
[22] | Q. Hu, Z. Wu, K. Richmond, J. Yamagishi, Y. Stylianou, R. Maia, Fusion of multiple parameterisations for dnn-based sinusoidal speech synthesis with multi-task learning, In: Sixteenth annual conference of the international speech communication association, 2015. |
[23] | A. Jalali, S. Sanghavi, C. Ruan, P. Ravikumar, A dirty model for multi-task learning, In: Proceedings of the 23rd International Conference on Neural Information Processing Systems, 2010,964–972. |
[24] | Z. Kang, K. Grauman, F. Sha, Learning with whom to share in multi-task feature learning, In: Proceedings of the 28 th International Conference on Machine Learning, 2011, 1–8. |
[25] |
Q. Liu, Q. Xu, V. W. Zheng, H. Xue, Z. Cao, Q. Yang, Multi-task learning for cross-platform sirna efficacy prediction: an in-silico study, BMC Bioinformatics, 11 (2010), 181. http://dx.doi.org/10.1186/1471-2105-11-181 doi: 10.1186/1471-2105-11-181
![]() |
[26] |
X. Lu, Y. Wang, X. Zhou, Z. Zhang, Z. Ling, Traffic sign recognition via multi-modal tree-structure embedded multi-task learning, IEEE Trans. Intell. Transp. Syst., 18 (2016), 960–972. http://dx.doi.org/10.1109/TITS.2016.2598356 doi: 10.1109/TITS.2016.2598356
![]() |
[27] |
R. Miyashiro, Y. Takano, Mixed integer second-order cone programming formulations for variable selection in linear regression, Eur. J. Oper. Res., 247 (2015), 721–731. http://dx.doi.org/10.1016/j.ejor.2015.06.081 doi: 10.1016/j.ejor.2015.06.081
![]() |
[28] |
T. Sato, Y. Takano, R. Miyashiro, A. Yoshise, Feature subset selection for logistic regression via mixed integer optimization, Comput. Optim. Appl., 64 (2016), 865–880. http://dx.doi.org/10.1007/s10589-016-9832-2 doi: 10.1007/s10589-016-9832-2
![]() |
[29] | C. Su, F. Yang, S. Zhang, Q. Tian, L. S. Davis, W. Gao, Multi-task learning with low rank attribute embedding for person re-identification, In: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, 3739–3747. http://dx.doi.org/10.1109/ICCV.2015.426 |
[30] | J. Torres, G. Bai, J. Wang, L. Zhao, C. Vaca, C. Abad, Sign-regularized multi-task learning, 2021, arXiv: 2102.11191. |
[31] |
A. Tsanas, M. Little, P. McSharry, L. Ramig, Accurate telemonitoring of parkinson's disease progression by noninvasive speech tests, IEEE Trans. Biomed. Eng., 57 (2010), 884–893. http://dx.doi.org/10.1109/TBME.2009.2036000 doi: 10.1109/TBME.2009.2036000
![]() |
[32] | H. Wang, F. Nie, H. Huang, S. Risacher, C. Ding, A. J. Saykin, et al., Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance, In: 2011 International Conference on Computer Vision, 2011,557–562. http://dx.doi.org/10.1109/ICCV.2011.6126288 |
[33] | H. Wang, F. Nie, H. Huang, J. Yan, S. Kim, S. Risacher, et al., High-order multi-task feature learning to identify longitudinal phenotypic markers for alzheimer's disease progression prediction, In: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012, 1277–1285. |
[34] | J. Wang, L. Zhao, L. Wu, Multi-convex inequality-constrained alternating direction method of multipliers, 2019, arXiv: 1902.10882. |
[35] | C. Williams, E. V. Bonilla, K. M. Chai, Multi-task gaussian process prediction, In: Proceedings of the 20th International Conference on Neural Information Processing Systems, 2007,153–160. |
[36] | Z. Wu, C. Valentini-Botinhao, O. Watts, S. King, Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis, In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), 2015, 4460–4464. http://dx.doi.org/10.1109/ICASSP.2015.7178814 |
[37] |
Q. Xu, S. J. Pan, H. H. Xue, Q. Yang, Multitask learning for protein subcellular location prediction, IEEE/ACM Trans. Comput. Bi, 8 (2010), 748–759. http://dx.doi.org/10.1109/TCBB.2010.22 doi: 10.1109/TCBB.2010.22
![]() |
[38] | T. Zhang, B. Ghanem, S. Liu, N. Ahuja, Robust visual tracking via multi-task sparse learning, In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, 2042–2049. http://dx.doi.org/10.1109/CVPR.2012.6247908 |
[39] | Y. Zhang, Q. Yang, A survey on multi-task learning, IEEE Trans. Knowl. Data Eng., in press. http://dx.doi.org/10.1109/TKDE.2021.3070203 |
[40] |
Y. Zhang, D.-Y. Yeung, A regularization approach to learning task relationships in multitask learning, ACM Trans. Knowl. Discov. Data, 8 (2014), 1–31. http://dx.doi.org/10.1145/2538028 doi: 10.1145/2538028
![]() |
[41] | J. Zheng, L. M. Ni, Time-dependent trajectory regression on road networks via multi-task learning, In: Proceedings of the twenty-seventh AAAI conference on artificial intelligence, 2013, 1048–1055. |
[42] | J. Zhou, J. Chen, J. Ye, Clustered multi-task learning via alternating structure optimization, Adv. Neural Inf. Process. Syst., 24 (2011), 702–710. |
[43] |
Q. Zhou, Q. Zhao, Flexible clustered multi-task learning by learning representative tasks, IEEE Trans. Pattern Anal., 38 (2015), 266–278. http://dx.doi.org/10.1109/TPAMI.2015.2452911 doi: 10.1109/TPAMI.2015.2452911
![]() |
1. | Hai-Feng Huo, Fang-Fang Cui, Hong Xiang, Dynamics of an SAITS alcoholism model on unweighted and weighted networks, 2018, 496, 03784371, 249, 10.1016/j.physa.2018.01.003 | |
2. | Shouying Huang, Fengde Chen, Yanhong Zhang, Global analysis of epidemic spreading with a general feedback mechanism on complex networks, 2019, 2019, 1687-1847, 10.1186/s13662-019-2095-3 | |
3. | Xinxin Cheng, Yi Wang, Gang Huang, Dynamics of a competing two-strain SIS epidemic model with general infection force on complex networks, 2021, 59, 14681218, 103247, 10.1016/j.nonrwa.2020.103247 | |
4. | Hong Xiang, Fang-Fang Cui, Hai-Feng Huo, Analysis of the SAITS alcoholism model on scale-free networks with demographic and nonlinear infectivity, 2019, 13, 1751-3758, 621, 10.1080/17513758.2019.1683629 | |
5. | Yucui Wu, Zhipeng Zhang, Limei Song, Chengyi Xia, Global stability analysis of two strains epidemic model with imperfect vaccination and immunity waning in a complex network, 2024, 179, 09600779, 114414, 10.1016/j.chaos.2023.114414 |