Research article

A machine learning framework for QSPR modeling of drug-like compounds using graph invariants

  • Published: 28 October 2025
  • MSC : 05C10, 05C90

  • Quantitative structure-property relationship (QSPR) modeling is a computational approach that correlates the chemical structure of compounds with their physicochemical or biological properties. Accurate estimation of the physicochemical and biological parameters of drug molecules is a critical factor in drug discovery. In the present work, we developed a graph-based QSPR model that uses molecular structural invariants as predictive features. Degree-based and distance-based topological indices were derived from molecular graphs and combined with random forest (RF), gradient boosting, and multiple linear regression (MLR) models, and their predictive performance was evaluated on diverse drug datasets. The proposed RF model obtained an improvement of approximately 18–25% in $ R^2 $ and a reduction of about 30% in RMSE over the classical linear regression models, indicating better generalization performance. In the context of drug screening, the model accurately predicted early physicochemical properties, including molar refractivity and polarizability, making it a tool for rapidly assessing compounds for neurological and anticancer therapeutics. In addition, the computational model was about 15 times faster on average than existing QSPR approaches, indicating high efficiency and applicability. The final results demonstrated that molecular structural invariants are good descriptors for generating reliable, interpretable, and predictive QSPR models relevant to early-stage drug discovery.
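
    To make the descriptor step concrete, the following is a minimal sketch of how degree-based and distance-based topological indices can be computed from a hydrogen-suppressed molecular graph. The abstract does not say which indices were used, so the three shown here (first Zagreb, Randić, and Wiener) are assumptions chosen for illustration. The example molecule (2-methylbutane) and all function names are likewise hypothetical and are not taken from the article.

```python
from collections import deque
from math import sqrt

# Hypothetical example: hydrogen-suppressed graph of 2-methylbutane.
# Vertices are carbon atoms, edges are C-C bonds (an illustrative choice,
# not a molecule from the article's datasets).
ISOPENTANE_EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]


def adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def first_zagreb(adj):
    """Degree-based index: sum of squared vertex degrees."""
    return sum(len(nbrs) ** 2 for nbrs in adj.values())


def randic(edges, adj):
    """Degree-based index: sum over edges of 1 / sqrt(deg(u) * deg(v))."""
    return sum(1.0 / sqrt(len(adj[u]) * len(adj[v])) for u, v in edges)


def wiener(adj):
    """Distance-based index: sum of shortest-path lengths over all
    unordered vertex pairs, using a breadth-first search per vertex."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted twice


adj = adjacency(ISOPENTANE_EDGES)
features = {
    "first_zagreb": first_zagreb(adj),          # 16 for this graph
    "randic": randic(ISOPENTANE_EDGES, adj),
    "wiener": wiener(adj),                      # 18 for this graph
}
print(features)
```

    In a workflow of the kind the abstract describes, one such feature vector per molecule would form a row of the design matrix passed to a regression model such as RF or MLR. The model-fitting step is omitted here because it depends on the article's datasets.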

    Citation: Ebraheem Alzahrani, Muhammad Farhan Hanif. A machine learning framework for QSPR modeling of drug-like compounds using graph invariants[J]. AIMS Mathematics, 2025, 10(10): 24651-24690. doi: 10.3934/math.20251093

  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)