
With the rise of multi-modal methods, multi-modal knowledge graphs have become a better choice for storing human knowledge. However, knowledge graphs often suffer from incompleteness because knowledge is unbounded and constantly updated, which has motivated the task of knowledge graph completion. Existing multi-modal knowledge graph completion methods mostly rely on either embedding-based representations or graph neural networks, and there is still room for improvement in terms of interpretability and the ability to handle multi-hop tasks. Therefore, we propose a new method for multi-modal knowledge graph completion. Our method aims to learn multi-level graph structural features to fully explore hidden relationships within the knowledge graph and to improve reasoning accuracy. Specifically, we first use a Transformer architecture to separately learn data representations for the image and text modalities. Then, with the help of multi-modal gating units, we filter out irrelevant information and perform feature fusion to obtain a unified encoding of knowledge representations. Furthermore, we extract multi-level path features using a width-adjustable sliding window and learn structural feature information in the knowledge graph using graph convolutional operations. Finally, we use a scoring function to estimate the probability that encoded triplets are true and to complete the prediction task. To demonstrate the effectiveness of the model, we conduct experiments on two publicly available datasets, FB15K-237-IMG and WN18-IMG, and achieve improvements of 1.8% and 0.7%, respectively, on the Hits@1 metric.
Citation: Hanming Zhai, Xiaojun Lv, Zhiwen Hou, Xin Tong, Fanliang Bu. MLSFF: Multi-level structural features fusion for multi-modal knowledge graph completion[J]. Mathematical Biosciences and Engineering, 2023, 20(8): 14096-14116. doi: 10.3934/mbe.2023630
The continuous development of deep learning technology has had a significant impact on research in various fields. For instance, in the field of biomedicine, automatic diagnostic techniques based on deep learning have emerged, enabling image recognition and assisting healthcare professionals in diagnosis and subsequent procedures [1,2,3,4,5]. Furthermore, deep learning has demonstrated superior performance in scenarios with larger datasets, such as multi-view clustering [6,7,8,9,10]. Knowledge graphs (KGs) have emerged as a way to store and learn from this vast amount of information.
A knowledge graph can be conceptualized as a large-scale semantic integration network that represents entities as nodes and relationships as directed edges; thus, it stores a vast amount of human knowledge in the form of a directed graph. The resource description framework (RDF) provides a standard framework for KG representation, wherein fact triples (head, relationship, tail) are employed to describe knowledge [11]. A KG can store rich information about real-world entities and their relationships and can support a range of reasoning processes across the graph. Compared to traditional structured data, the graph-based approach to data processing has demonstrated superior performance in tasks such as information retrieval, question-answering systems, and recommendation systems [12,13]. However, because real-world knowledge is unbounded and constantly evolving, KGs are inherently incomplete, which has given rise to the task of knowledge graph completion (KGC).
In the field of natural language processing (NLP), KGC techniques can be broadly categorized into three types: rule-based models, path-based models, and embedding-based models. Rule-based models tend to retain the original semantic information more completely and therefore offer better interpretability. Path-based models make better use of the graph structure, enabling guided reasoning through various path-searching mechanisms. Both of these approaches are more interpretable, though their expressiveness is limited by model constraints and their time and space complexity is higher. Compared to the first two types, embedding-based models typically offer greater expressiveness. With the development of graph neural networks (GNNs), GNN-based models have shown great potential in various graph-based tasks, providing additional ideas for KGC. In recent years, KGs have also been studied in computer vision, such as in the context of scene graphs and the integration of language and images.
In recent years, multi-modal knowledge graphs (MKG) have gained significant attention as an extension to traditional knowledge graphs based on a single modality. MKGs typically augment semantic KGs with additional modality data, such as visual and audio attributes, to provide more physically rich representations of the world [14,15,16], as illustrated in Figure 1. For a given entity in the knowledge graph, we can use both image and text descriptions to supplement more detailed information that cannot be captured solely by the graph structure. Unfortunately, due to the lack of accumulated multi-modal corpora, existing MKGs often suffer from more severe incompleteness compared to traditional KGs, which greatly reduces their utility and effectiveness. In the task of multi-modal knowledge graph completion (MKGC), we must consider both the issues of multi-modal information fusion and the accuracy and interpretability of knowledge graph completion. In terms of multi-modal information fusion, we need to address issues such as semantic alignment, noise reduction or attenuation, and the realization of unified embeddings. In the process of link prediction, we must not only leverage the semantic richness of multi-modal information to improve accuracy but also enhance the logicality of the algorithm and improve its interpretability [17,18].
Despite the abundance of existing image-text embedding pre-training models, these models often focus on a single pair of corresponding images and text and fail to consider the distinctive structural features of KGs. Therefore, our research builds upon MKGs that contain image-text feature information. In addition to integrating embeddings from different modalities, we also retain local graph features and introduce path features to enhance the interpretability of the reasoning model. Specifically, we propose a method that first utilizes separate modality encoders to learn image and text embeddings, followed by an irrelevant-information filtering layer that further selects semantically relevant key features. Next, we fuse and encode information from different modalities to obtain a multi-modal representation. We then use graph convolution algorithms and path features to extract structural features, and use a scoring function to predict missing triples. Our contributions can be summarized as follows:
1) We design a structure that extracts image-text information through single-modality encoding followed by interaction fusion, and improve semantic similarity through an irrelevant-information filtering module, thereby enhancing the fused understanding of different modalities;
2) We propose a structural feature learning scheme that combines graph convolution and path embedding, thereby enhancing interpretability during the reasoning process;
3) We achieve better results on two public datasets, FB15K-237-IMG and WN18-IMG.
The task of knowledge graph completion has been widely studied, with typical sub-tasks including link prediction, entity prediction, and relation prediction, aimed at predicting missing triples (head, relation, tail) in the knowledge graph. Rule-based models such as AMIE and RLvLR utilize symbolic features to perform reasoning through either rule mining or rule searching algorithms [19,20]. NeuralLP introduced dynamic programming and further optimized rule mining through attention mechanisms and auxiliary memory [21]. Path-based models focus more on the paths between queried head and tail entities, and algorithms such as the path ranking algorithm (PRA) and random walks have been applied and further explored in such models. RNNPRA uses recurrent neural networks (RNN) to better learn path features for reasoning tasks [22]. DIVA proposed a unified reasoning framework that divides multi-hop reasoning into a path search and path inference steps [23]. The continuous development of deep reinforcement learning (DRL) techniques has enabled more effective multi-hop reasoning in sparse graphs. A series of models such as DeepPath and MultiHop have achieved more effective path exploration by designing new reward mechanisms [24,25].
Currently, the more mainstream methods for solving KGC problems are focused on embedding-based models. Translation-based models such as TransE, TransR, and TransH embed entities and their relations by projection, and use a distance function to score the factual triplets [26,27,28]. Tensor factorization models such as RESCAL, Tucker, and LowFER use vectors to capture latent semantics through tensor decomposition and continuously improve model efficiency while reducing model size [29,30,31]. With the continuous improvement in neural networks (NN) in learning and expressing knowledge, additional embedding-based models choose to use neural network architectures to implement KGC. NTN uses neural tensor networks for relation reasoning in KG [32]. ConvE learns deeper features using two-dimensional convolutional layers [33]. InteractE processes more complex semantic information and KG interactions through multiple operations such as feature reshaping, feature permutation, and recurrent convolution [34]. Although CNN-based KGR models generally perform better than traditional NN models, the feature information contained in the graph structure itself has not been well utilized. Therefore, GNNs have been introduced into the KGC field to perform more complex reasoning tasks based on graph structure features. RGCN encodes each entity into a vector, uses specific transformations to aggregate neighborhood information for different relationship categories, and then reproduces facts through a decoder [35]. SACN uses weighted graph convolutional networks (WGCN) to implement the encoder, and then inputs the encoded information into a convolutional network for decoding [36]. NBF-Net and RED-GNN improve on traditional algorithms, choosing Bellman-Ford algorithms and dynamic programming to optimize the propagation strategy in previous GNN models, and achieve efficiency improvements [37,38].
The traditional tasks in the two major fields of computer vision (CV) and natural language processing (NLP) have been extensively discussed, and more recent research has focused on cross-modal problems. The optimization and development of the Transformer model has led to a series of explorations into visual-text pre-training frameworks. VisualBERT is considered to be the first image-text pre-training model, which uses Faster R-CNN to extract visual features and connects them with text embeddings, which are then input into a transformer initialized by BERT [39]. Inspired by the feature extraction and architecture in the VisualBERT model, more pre-training models have been proposed by adjusting the pre-training tasks and datasets. CLIP uses a dataset of 400 million image-text pairs for pre-training, learning representations by directly matching raw text and corresponding images [40]. METER further explores single-modal feature extraction and processes multi-modal fusion using a dual-stream architecture model, achieving excellent performance on many downstream tasks [41].
Numerous excellent multi-modal pretraining models have adopted the masked language modeling (MLM), masked visual modeling (MVM), and visual-linguistic matching (VLM) tasks as pretraining objectives; their corresponding downstream tasks are mainly focused on works that deal with the meaning and relationships between text and images, such as visual question answering (VQA), visual commonsense reasoning (VCR), and visual captioning (VC). However, for KGs, their distinguishing feature from semantically structured information is their graph structure. Recently, some studies have recognized the importance of structural features for handling KG-related tasks. DRAGON proposes a deep bidirectional, self-supervised pretraining method for language knowledge models from text and KGs [42]. Knowledge-CLIP takes entities and relations in KGs as inputs and extracts the original features of these entities and relations [43]. Entities can be in the form of images/text, while relations are described using language tokens. These pretraining models with structural features provide better options for MKG-related tasks.
As an emerging research field, related work in MKGC is not yet systematic, and early MKGC tasks often directly added image information to the input of the original KGR model, which usually led to a suboptimal performance. To address this issue, many studies have made more attempts and explorations in the field of image-text feature fusion in MKG.
IKRL first proposed an attention-based neural network to consider visual information in entity images [44]. TransAE introduced a KG representation learning method that integrates multi-channel (visual and language) information in a translation-based framework, and extended the definition of triple energy to consider new multi-channel representations [45]. MKBE and MRCGN integrated different neural encoders and decoders with relation models to embed learning and multi-modal data for inference [14,46]. MarT constructed a multi-channel analogical reasoning framework based on structural mapping theory to improve model interpretability [47]. MMKGR used a unified gate attention network to perform an attention interaction and to filter noise for generating more effective and reliable multi-modal complementary feature encoding, and designed a new reinforcement learning framework to predict missing elements in multi-hop reasoning processes [16]. MM-RNS proposed a multi-channel relation-enhanced negative sampling framework that provides bidirectional attention between visual and text features by integrating relation embeddings, and combined it with contrastive learning to construct an effective contrastive semantic sampler to improve MKGC performance [48].
We have conducted a brief overview of the related models in traditional and multimodal KGs, as shown in Table 1.
| Category | Subcategory | Knowledge Graph Completion | Multi-Modal Knowledge Graph Completion |
| --- | --- | --- | --- |
| Rule-based Models | - | AMIE, RLvLR, NeuralLP | - |
| Path-based Models | - | RNNPRA, DIVA, DeepPath, MultiHop | - |
| Embedding-based Models | Translational Models | TransE, TransR, TransH | TransAE |
| Embedding-based Models | Tensor Decompositional Models | RESCAL, Tucker, LowFER | - |
| Embedding-based Models | Neural Network Models | NTN, ConvE, InteractE, RGCN, SACN, NBF-Net, RED-GNN | IKRL, MKBE, MRCGN, MarT, MMKGR, MM-RNS |
In order to provide a clearer demonstration of the effectiveness of the aforementioned work, we have provided a more detailed comparative analysis of selected algorithms in Table 2.
| Models | Dataset | Technique | Performance (Hits@10) |
| --- | --- | --- | --- |
| RLvLR | FB75K | Logic rule | 43.4 |
| MultiHop | FB15k-237 | Relation path | 56.4 |
| TransE | FB15k-237 | Translational | 47.1 |
| LowFER | FB15k-237 | Tensor decompositional | 54.4 |
| RED-GNN | FB15k-237 | GNN | 55.8 |
| TransAE | WN9-IMG | Translational | 94.84 |
| MMKGR | WN9-IMG | Attention | 92.8 |
The knowledge graph G={E,R,F} is a directed graph, where E is the entity set, R is the relation set, and F={(h,r,t)|h∈E,t∈E,r∈R} is the fact set consisting of fact triples (h,r,t). The head entity h∈E and tail entity t∈E are connected by a relation r∈R. For a multi-modal knowledge graph G, the entity e includes two modalities, namely textual information et and visual information ev.
The purpose of multi-modal KGC is to infer incomplete triplets T={(h,r,t)|h∈E,t∈E,r∈R,(h,r,t)∉F} based on known fact triplets (h,r,t). In practice, the incomplete triplets that may appear in our prediction task can take three forms, namely (h,r,?), (h,?,t), and (?,r,t). In the implementation, we input the feature information of entities e and relations r into an encoder to obtain the corresponding embedding vectors h, r, t. Then, we use a scoring function f(h,r,t) to evaluate the probability that an inferred triplet is true: f(h,r,t) scores 1 when the triplet (h,r,t)∈G holds, and 0 when (h,r,t)∉G. Taking a missing triplet of the form (h,?,t) as an example, suppose a relationship rpd exists between the head entity h and the tail entity t, yielding the complete triplet (h,rpd,t) with unknown truthfulness. To evaluate the probability of its actual occurrence, we employ the scoring function, obtaining the output f(h,rpd,t). The basic terminology definitions are shown in Table 3.
| Notation | Explanation |
| --- | --- |
| G | Multi-modal knowledge graph |
| E | Entity set |
| R | Relation set |
| F | Fact set |
| T | Incomplete fact set |
| (h,r,t) | Fact triplet of the head, relation, tail |
| h | Embedding of head entity |
| t | Embedding of tail entity |
| r | Embedding of relation |
The model we proposed, MLSFF, has an overall architecture shown in Figure 2, which consists of three components: 1) single-modality encoders for image and text embedding; 2) a multi-modal feature fusion mechanism with irrelevant filtering to discard interfering information and to reduce noise when the image and text features interact with each other; 3) a reasoning framework that combines the graph structure and path features, introduces a new scoring function containing multi-hop path features, and uses multi-modal features to predict incomplete triplets in KGC processes.
The emergence of the Transformer model has caused a huge revolution in the NLP field, and it has been widely used in various tasks. Introducing the Transformer into the CV field has not only been successful but has even produced astonishing results: when the pre-training data are large enough, a Transformer's performance in CV is significantly better than that of a CNN, as its few inductive biases cease to be a limitation, and it transfers better to downstream tasks. We use independent image and text encoders based on the Transformer architecture to extract features from the raw inputs. For a given triple, the entity and relation are sent to the corresponding encoder based on their modality (image or text). Relations, which are represented by language tokens, are sent to the text encoder in the same way as textual entities. The main architecture of our single-modality encoder is illustrated in Figure 3.
Visual Encoder For image feature extraction, we adopt the embedding layer and Transformer encoder of the pre-trained ViT model as the main architecture [49]. Let $C$ be the number of channels in the image (for RGB images, $C=3$) and let each image patch have resolution $(P,P)$. First, we scale the input image $I$ to a unified resolution $(A,B)$ and then divide it into $N = AB/P^2$ patches. We use a linear mapping (i.e., an FC layer) to transform each patch into a one-dimensional vector, which yields the patch embedding $X^v_{pat}$ of the original image. Subsequently, we feed the obtained image embedding and the position embedding $X^v_{pos}$ into the Transformer encoder as input. The overall forward calculation process is as follows:
$$X^v_0 = X^v_{pat} + X^v_{pos} \tag{4.1}$$

$$\hat{X}^v_l = \mathrm{MSA}(\mathrm{LN}(X^v_{l-1})) + X^v_{l-1}, \quad l = 1, 2, \ldots, L_v \tag{4.2}$$

$$X^v_l = \mathrm{FFN}(\mathrm{LN}(\hat{X}^v_l)) + \hat{X}^v_l, \quad l = 1, 2, \ldots, L_v \tag{4.3}$$
The MSA block consists of a multi-head self-attention mechanism, layer normalization, and a skip connection (Layer Norm & Add); it is repeated $L_v$ times, and the output of the $l$-th block is $\hat{X}^v_l$. The MLP block consists of a feed-forward network, layer normalization, and a skip connection (Layer Norm & Add); it is likewise repeated $L_v$ times, and the output of the $l$-th block is $X^v_l$.
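The following is a minimal PyTorch-style sketch of this visual encoder, not the authors' implementation: the patch embedding follows Eq. (4.1) and each block follows the pre-LN structure of Eqs. (4.2) and (4.3). The class names and default hyperparameters (224×224 images, 16×16 patches, 768-dimensional embeddings, 12 layers, matching ViT-B/16) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """One visual encoder block: LN -> MSA -> residual, then LN -> FFN -> residual (Eqs. 4.2-4.3)."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.msa(h, h, h, need_weights=False)[0]   # Eq. (4.2)
        x = x + self.ffn(self.ln2(x))                      # Eq. (4.3)
        return x

class VisualEncoder(nn.Module):
    """Patch embedding (Eq. 4.1) followed by L_v pre-LN Transformer blocks."""
    def __init__(self, img_size=224, patch=16, dim=768, layers=12):
        super().__init__()
        n_patches = (img_size // patch) ** 2                     # N = AB / P^2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches, dim))
        self.blocks = nn.ModuleList(PreLNBlock(dim) for _ in range(layers))

    def forward(self, images):                                   # images: (batch, 3, A, B)
        x = self.patch_embed(images).flatten(2).transpose(1, 2)  # X^v_pat: (batch, N, dim)
        x = x + self.pos_embed                                   # Eq. (4.1)
        for blk in self.blocks:
            x = blk(x)
        return x
```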
Textual Encoder In NLP tasks, a large number of pre-training models based on the Transformer architecture have emerged, such as BERT, which has recently been widely applied and has demonstrated great success in various downstream tasks [50,51]. In this paper, we use BERT to perform language modeling and feature extraction. Specifically, we divide the complete sentence into a word sequence and perform word embedding to obtain the word embeddings $X^t_{word}$. In order to preserve sentence-level features, we also embed the entire sentence and align it with the word embeddings to obtain the sentence embeddings $X^t_{sen}$. Then, we send the word embeddings $X^t_{word}$, position embeddings $X^t_{pos}$, and sentence embeddings $X^t_{sen}$ to the encoder.
$$X^t_0 = X^t_{word} + X^t_{sen} + X^t_{pos} \tag{4.4}$$

$$\hat{X}^t_l = \mathrm{LN}(\mathrm{MSA}(X^t_{l-1})) + X^t_{l-1}, \quad l = 1, 2, \ldots, L_t \tag{4.5}$$

$$X^t_l = \mathrm{LN}(\mathrm{FFN}(\hat{X}^t_l)) + \hat{X}^t_l, \quad l = 1, 2, \ldots, L_t \tag{4.6}$$
The difference between the text encoder and the visual encoder is that layer normalization (LN) is applied after the multi-head self-attention (MSA) and feed-forward network (FFN) layers. Similarly, the output of the $l$-th MSA block is denoted as $\hat{X}^t_l$ and the output of the $l$-th MLP block is denoted as $X^t_l$. We denote the number of MSA and MLP blocks in the text encoder as $L_t$.
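For contrast with the pre-LN visual block above, the sketch below shows one textual encoder block written exactly as Eqs. (4.5) and (4.6) state, with LN applied to the sub-layer output before the residual addition. In practice the text branch is initialized from a pre-trained BERT; this block only illustrates the layer ordering and is not the authors' implementation.

```python
import torch.nn as nn

class TextBlock(nn.Module):
    """One textual encoder block mirroring Eqs. (4.5)-(4.6):
    LN is applied to each sub-layer's output before the residual is added."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln2 = nn.LayerNorm(dim)

    def forward(self, x):
        x = self.ln1(self.msa(x, x, x, need_weights=False)[0]) + x   # Eq. (4.5)
        x = self.ln2(self.ffn(x)) + x                                # Eq. (4.6)
        return x
```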
In the multi-modal fusion module, we fuse the separately encoded text and image information. Relations form a separate data category that carries label information; although they are usually described with text, their semantic relevance to the textual and visual descriptions of entities is relatively low. Therefore, we fuse and filter the image and text information of entities separately from the relations, and introduce the encoded relation attributes later, when learning the path features.
To enhance the efficiency of the semantic interaction between the two different modalities of image and text, we adopt an intermediate representation to unify the multimodal information. On one hand, we aim to achieve a more fine-grained interaction between different modal feature information; on the other hand, since images often contain semantically irrelevant information, directly using the complete image embedding in the feature fusion process may introduce noise. Therefore, we feed the learned image and text vectors into a multimodal gated unit for weight learning to achieve the intermediate feature representation.
$$g_f = \sigma(X^v W^v \odot X^t W^t) \tag{4.7}$$

$$\hat{X}^m = g_f X^v + (1 - g_f) X^t \tag{4.8}$$
In this equation, $\sigma$ represents the sigmoid function, $X^v$ and $X^t$ denote the feature vectors output by the image and text encoders, respectively, $W^v$ and $W^t$ are parameter matrices, $g_f$ is a scalar within the range $[0,1]$, $\hat{X}^m$ represents the multi-modal embedding vector obtained through the filtering layer, and $\odot$ denotes element-wise multiplication (i.e., the Hadamard product).
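Below is a minimal sketch of the multimodal gating unit in Eqs. (4.7) and (4.8). It computes the gate element-wise from the Hadamard product, which is how the formula reads; the text describes $g_f$ as a scalar in $[0,1]$, which could instead be obtained by pooling the gated product to a single value, so the gate's granularity here is an assumption.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Multimodal gating unit of Eqs. (4.7)-(4.8): a sigmoid gate decides how much
    of the visual embedding to keep versus the textual embedding."""
    def __init__(self, dim=768):
        super().__init__()
        self.w_v = nn.Linear(dim, dim, bias=False)   # W^v
        self.w_t = nn.Linear(dim, dim, bias=False)   # W^t

    def forward(self, x_v, x_t):
        # Eq. (4.7): element-wise gate (a scalar gate could be obtained by pooling this product)
        g = torch.sigmoid(self.w_v(x_v) * self.w_t(x_t))
        # Eq. (4.8): convex combination of visual and textual features
        return g * x_v + (1 - g) * x_t
```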
Next, we feed the fused embeddings $\hat{X}^m$ into the multi-modal encoder to further learn the semantic features.

$$X = \mathrm{Tran}(\hat{X}^m) \tag{4.9}$$
We have obtained the multi-modal feature embedding of a given fact description through the preceding structure, but this is insufficient for large-scale and complex KGs. Hence, we aim to further learn path features to better accomplish the KGC task. The overall approach to learning structural features and performing completion can be summarized as follows. First, we extract a path existing in the MKG, concatenate the relations along the path, and then divide the path into several shorter components using a sliding window. Then, we select one of the components and use a recurrent attention unit to embed it, obtaining a relation vector represented as a weighted combination of existing relations. We recursively merge the divided components of the path and finally use a scoring function to determine the truthfulness of unknown triplets. The overall process of the prediction block is shown in Algorithm 1.
Algorithm 1 Prediction block
Input: the path body $r_p = [r_{p1}, \ldots, r_{pn}]$
Output: the score of the triplet $f(h,r,t)$
1: Initialize the window size $w \in \{1, 2, 3\}$
2: for all $i = 1, 2, \ldots, n+1-w$ do
3:   extract the $i$-th path segment $w_i$ and encode it with an LSTM: $[\hat{y}_i, \hat{y}_{i+1}] = \mathrm{LSTM}(w_i)$
4:   $y_i = \hat{y}_{i+1}$
5: end for
6: $\mu = \mathrm{softmax}([\mathrm{FC}(y_1), \mathrm{FC}(y_2), \ldots, \mathrm{FC}(y_{n+1-w})])$
7: $Y = \sum_{i=1}^{n+1-w} \mu_i y_i$
8: $f(h,r,t) = \sigma(\mathrm{vec}(([X_h; Y] * \omega) W) X_t)$
9: return $f(h,r,t)$
Sliding Window Segmentation To extract fine-grained features from sampled paths, we decompose them into combinations of different sizes using sliding windows of varying lengths. In the implementation, we use windows of size $w \in \{1, 2, 3\}$. Given the window size, the generated sliding windows traverse the path body $r_p = [r_{p1}, \ldots, r_{pn}]$. Then, we use a long short-term memory (LSTM) network as a sequence encoder to encode the information within each sliding window. Taking the sliding window of length 2 as an example,
$$[\hat{y}_i, \hat{y}_{i+1}] = \mathrm{LSTM}(w_i) \tag{4.10}$$
Since the final state $\hat{y}_{i+1}$ usually contains the complete information of the sequence, we select $y_i = \hat{y}_{i+1}$. If the relation segments in the $i$-th sliding window always appear together in some combination, they are more likely to represent a real "long-distance" relation, and $y_i$ is then meaningful for learning the relationship within the window. To incorporate this observation into our model, we calculate the probability value of these relation segments by:
$$\mu = \mathrm{softmax}([\mathrm{FC}(y_1), \mathrm{FC}(y_2), \ldots, \mathrm{FC}(y_{n+1-w})]) \tag{4.11}$$
where $\mathrm{FC}(\cdot)$ represents a fully connected layer, which is used to learn the probability that the $i$-th window representation $y_i$ corresponds to a meaningful relation fragment. Finally, we calculate the weighted sum of the information from the different windows to represent the complete path features:
$$Y = \sum_{i=1}^{n+1-w} \mu_i y_i \tag{4.12}$$
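A minimal sketch of this sliding-window path encoding (Eqs. (4.10)-(4.12)) is shown below, assuming a single-layer LSTM and a relation embedding dimension of 200 (the graph embedding dimension reported later); class and variable names are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class PathEncoder(nn.Module):
    """Sliding-window path encoding of Eqs. (4.10)-(4.12): relation segments of width w
    are encoded with an LSTM, then combined by a softmax-weighted sum."""
    def __init__(self, rel_dim=200, window=2):
        super().__init__()
        self.window = window
        self.lstm = nn.LSTM(rel_dim, rel_dim, batch_first=True)
        self.fc = nn.Linear(rel_dim, 1)   # FC(.) in Eq. (4.11)

    def forward(self, path):              # path: (n, rel_dim), embeddings of r_p1 .. r_pn
        n, w = path.size(0), self.window
        # Stack the n+1-w overlapping segments of width w
        segments = torch.stack([path[i:i + w] for i in range(n + 1 - w)])  # (n+1-w, w, rel_dim)
        _, (h_last, _) = self.lstm(segments)      # final hidden state of each segment (Eq. 4.10)
        y = h_last.squeeze(0)                     # y_i = \hat{y}_{i+1}, shape (n+1-w, rel_dim)
        mu = torch.softmax(self.fc(y), dim=0)     # Eq. (4.11): per-segment weights
        return (mu * y).sum(dim=0)                # Eq. (4.12): path embedding Y
```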
Scoring Function Considering the excellent performance of graph convolutional models in handling KGC problems, we choose the following scoring function:
$$f(h,r,t) = \sigma(\mathrm{vec}(([X_h; Y] * \omega) W) X_t) \tag{4.13}$$
In the proposed scoring function, $X_h$ and $X_t$ represent the multi-modal embeddings of the head and tail entities, respectively, $Y$ represents the embedding of their relationship, $*$ and $\omega$ denote the convolution operation and the convolution kernel, respectively, $\mathrm{vec}(\cdot)$ represents the projection from the feature map to the vector space, and $W$ is a parameter matrix. With the above method, we can compute whether a fact constructed from a given relationship between two entities is true.
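The sketch below illustrates one plausible, ConvE-style reading of Eq. (4.13): the head embedding and the path embedding are reshaped into 2D maps, stacked, convolved with $\omega$, projected back to the embedding space by $W$, and matched against the tail embedding. The reshape size, kernel size, and channel count are assumptions, not values reported in the paper.

```python
import torch
import torch.nn as nn

class ConvScorer(nn.Module):
    """ConvE-style scoring sketched after Eq. (4.13): stack [X_h; Y], convolve,
    project (vec + W), and take the inner product with X_t."""
    def __init__(self, dim=200, shape=(10, 20), channels=32, kernel=3):
        super().__init__()
        self.shape = shape                                   # dim = shape[0] * shape[1]
        self.conv = nn.Conv2d(1, channels, kernel)           # omega in Eq. (4.13)
        h_out = 2 * shape[0] - kernel + 1                    # stacked height after convolution
        w_out = shape[1] - kernel + 1
        self.proj = nn.Linear(channels * h_out * w_out, dim) # W: feature map -> vector space

    def forward(self, x_h, y, x_t):                          # each input: (batch, dim)
        b = x_h.size(0)
        stacked = torch.cat([x_h.reshape(b, 1, *self.shape),
                             y.reshape(b, 1, *self.shape)], dim=2)  # [X_h; Y]
        feat = torch.relu(self.conv(stacked)).reshape(b, -1)        # convolution + vec(.)
        score = (self.proj(feat) * x_t).sum(dim=-1)                 # inner product with X_t
        return torch.sigmoid(score)                                  # probability the triple holds
```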
For ease of reference, we summarize the main symbol notations used in this section in Table 4.

| Notation | Explanation |
| --- | --- |
| $X$ | Embedding entity vector |
| $\hat{X}$ | Intermediate state of the entity embedding |
| $\hat{y}$ | Intermediate state of the path embedding |
| $y_i$ | Embedding path vector |
| $Y$ | Encoded complete path vector |
| $W$ | Parameter matrix |
We evaluate the effectiveness of the MLSFF model on two publicly available datasets: (ⅰ) FB15K-237-IMG, a subset of the large-scale knowledge graph Freebase that is commonly used in KGC tasks, in which each entity is paired with 10 images; and (ⅱ) WN18-IMG, an extension of WN18, a knowledge graph extracted from WordNet, in which each entity is likewise paired with 10 images [52]. Both datasets are available as the FB15k-WN18-images collection. Table 5 shows the statistical information of the datasets.
| Datasets | #Entities | #Relations | #Train | #Dev | #Test |
| --- | --- | --- | --- | --- | --- |
| FB15k-237-IMG | 14,541 | 237 | 272,115 | 17,535 | 20,466 |
| WN18-IMG | 40,943 | 18 | 141,442 | 5,000 | 5,000 |
Evaluation Metrics: We adopted classic knowledge graph completion evaluation metrics, including Hits@k and mean rank (MR), as shown in Table 6.
| Evaluation metrics | Calculation formula |
| --- | --- |
| Hits@k | $\mathrm{Hits@}k = \frac{\sum_i \mathbb{1}(rank_i < k)}{Q}$ |
| MR | $\mathrm{MR} = \frac{1}{Q}\sum_i rank_i$ |
Hits@k: The Hits@k metric is defined as the proportion of true entities that appear in the top-k ranked list of entities. It is calculated as follows:
$$\mathrm{Hits@}k = \frac{\sum_i \mathbb{1}(rank_i < k)}{Q} \tag{5.1}$$

where $rank_i$ represents the rank of the expected entity for the $i$-th incomplete fact triple, and $Q$ represents the total number of incomplete fact triples.
Mean Rank (MR): Mean Rank is the arithmetic average of the individual entity ranks, defined as:
$$\mathrm{MR} = \frac{1}{Q}\sum_i rank_i \tag{5.2}$$
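As a small worked example of Eqs. (5.1) and (5.2), the sketch below computes both metrics from a list of ranks; the example ranks are hypothetical, and Hits@k is computed with the conventional rank-within-top-k reading (rank ≤ k).

```python
import numpy as np

def hits_at_k(ranks, k):
    """Hits@k (Eq. 5.1): fraction of queries whose true entity is ranked within the top k."""
    ranks = np.asarray(ranks)
    return float((ranks <= k).mean())

def mean_rank(ranks):
    """MR (Eq. 5.2): arithmetic mean of the ranks of the true entities."""
    return float(np.mean(ranks))

# Hypothetical ranks of the true entity for five incomplete triples
ranks = [1, 3, 12, 2, 47]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 10), mean_rank(ranks))  # 0.2 0.6 13.0
```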
Parameter Configuration Considering the model's scale and computational efficiency, we choose the ViT-B/16 pre-trained model for the image encoder. We set the embedding dimensions for both text and image to 768. The number of layers for both the image and text encoders is set to 12, while the number of layers for the modality encoder is set to 3. The graph embedding dimension is set to 200, and the batch size is set to 64. We utilize a warm-up schedule and the Adam optimizer to adjust the learning rate of the model parameters. The initial learning rate is set to 0.0005, and the dropout rate is set to 0.1.
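A minimal sketch of this optimizer configuration is given below. The warm-up step count and the linear warm-up shape are assumptions (the paper only states that warm-up and Adam are used with an initial learning rate of 0.0005); the parameter group is a placeholder for the real model parameters.

```python
import torch

# Hypothetical parameter group; the real model parameters would go here.
params = [torch.nn.Parameter(torch.zeros(10))]

optimizer = torch.optim.Adam(params, lr=5e-4)   # initial learning rate 0.0005

warmup_steps = 1000   # assumed value; not specified in the paper
def lr_lambda(step):
    """Linear warm-up to the base learning rate, then constant (one common warm-up variant)."""
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(5):   # training-loop placeholder: forward/backward would precede these calls
    optimizer.step()
    scheduler.step()
```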
Baseline Setup We selected four unimodal methods and four multi-modal methods as baselines to compare with our proposed model. The unimodal methods include the following: 1) TransE [26], a classic translation-based model that encodes entities and relationships into a linear space; 2) DistMult [53], which uses a linear neural network to encode a multi-relation graph for multi-relation learning; 3) ComplEx [54], which solves both symmetric and asymmetric relations by introducing complex methods; and 4) RotatE [55], which defines relations as rotations from the head entity to the tail entity in a complex space to achieve multi-class reasoning. The multi-modal methods include the following: (ⅰ) IKRL (UNION) [44], which extends TransE to learn about visual representations of entities and structural features of KGs; (ⅱ) TransAE [56], which combines multi-modal encoders with TransE to achieve unified representation of visual and textual features; (ⅲ) RSME [57], which uses a forget gate to learn about valuable images for MKG embedding; and (ⅳ) MKGformer [52], which proposes an MKG pre-training model based on a hybrid transformer structure.
The experimental results on the two datasets are shown in Table 7, which shows that our model generally outperforms the 8 baseline methods.
| Model | FB15k-237-IMG Hits@1↑ | FB15k-237-IMG Hits@3↑ | FB15k-237-IMG Hits@10↑ | FB15k-237-IMG MR↓ | WN18-IMG Hits@1↑ | WN18-IMG Hits@3↑ | WN18-IMG Hits@10↑ | WN18-IMG MR↓ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TransE | 0.198 | 0.376 | 0.441 | 323 | 0.40 | 0.745 | 0.923 | 357 |
| DistMult | 0.199 | 0.301 | 0.466 | 512 | 0.335 | 0.876 | 0.940 | 655 |
| ComplEx | 0.194 | 0.297 | 0.450 | 546 | 0.936 | 0.945 | 0.947 | - |
| RotatE | 0.241 | 0.375 | 0.533 | 177 | 0.942 | 0.950 | 0.957 | 254 |
| IKRL (UNION) | 0.194 | 0.284 | 0.458 | 298 | 0.127 | 0.796 | 0.928 | 596 |
| TransAE | 0.199 | 0.317 | 0.463 | 431 | 0.323 | 0.835 | 0.934 | 352 |
| RSME | 0.242 | 0.344 | 0.467 | 417 | 0.943 | 0.951 | 0.957 | 223 |
| MKGformer | 0.256 | 0.367 | 0.504 | 221 | 0.944 | 0.961 | 0.972 | 28 |
| MLSFF (ours) | 0.274 | 0.411 | 0.552 | 193 | 0.951 | 0.973 | 0.980 | 22 |
Firstly, in all works, the scores on FB15k-237-IMG are generally lower than those on WN18-IMG. The fundamental reason is that the dataset FB15k-237-IMG is more sparse and complex than the dataset WN18-IMG, with a greater variety of relationships between different entities. In addition, our model performs better on Hits@1 than on Hits@3 or Hits@10, indicating a superior discriminative ability in predicting unknown entities. In the MLSFF model, we use two single-modal encoders to extract image and text information, followed by a multi-modal layer for interaction, which enables full learning of semantic information for entity description. We introduce a sliding window in learning the link features, which realizes "scalable" path sampling and to some extent solves the problem of complex graph structures.
Secondly, some traditional single-modal methods, such as RotatE, even outperform architectures that use multi-modal features in overall performance. This suggests that a well-designed relationship decomposition and learning rule are effective in solving complex graph problems, and fully utilizing structural features can improve prediction accuracy. Therefore, after obtaining multi-modal encoding information, our model not only uses the traditional graph convolutional method to obtain neighbor node information, but also incorporates long-distance path features and borrows from recurrent neural network structures used in processing text information to extract left and right node information from selected paths. By adding certain "vertical" features during the convolution process, our prediction model can have better interpretability.
Finally, our model achieved significant improvements of 4.8 and 1.2% on the two datasets, respectively. However, in the FB15k-237-IMG dataset, our model's MR metric results were slightly inferior to those of the RotatE model. This could be attributed to the FB15k-237-IMG dataset containing a larger number of entities and a more diverse set of relationships, resulting in a sparser and more complex knowledge graph. While our model has improved its ability to learn about multi-hop path relationships to some extent, it lacks similar operations on negative samples, as seen in the RotatE model. As a result, this has impacted the overall accuracy. Overall, the experimental results demonstrate that our model outperforms existing methods on most evaluation metrics, with even more significant improvements observed on more complex knowledge graphs. This is because the MLSFF model learns more comprehensive semantic features by fusing information from both image and text modalities, enabling more comprehensive knowledge extraction from the graph. In addition, we employed convolutional operations that capture neighborhood information and an LSTM structure that learns path-level features to achieve a more comprehensive and three-dimensional feature encoding structure for learning graph structural features, which is highly effective for processing large-scale knowledge graphs.
To investigate the actual effects of each component in the MLSFF model, we conducted ablation studies by removing some of the components.
w/o SinE: To investigate the effect of the single-modal encoders on understanding image and text semantics, we aligned the one-dimensional vectorized image patches and text embeddings, calculated their Hadamard product, and directly fed them into the multi-modal encoder for learning.
w/o Flt: To further investigate the actual effect of the irrelevant-information filtering layer, we also examined the contribution of the multi-modal fusion module by directly fusing the encoded image and text features without the filtering layer.
w/o Swin: To demonstrate the positive effect of extracting path information on learning graph structure features, we removed the sliding window encoding module and only used graph convolution operations to obtain structural embeddings.
From Figure 4, it can be seen that using single-modality encoders to extract image and text features can effectively enhance semantic understanding and better learn human knowledge, thereby promoting and improving the performance in KGC tasks. Although image features can assist in text understanding, there is still some noise interference. Filtering out irrelevant information can further enhance the fusion effect between multi-modal features and improve accuracy. In addition, when facing large-scale and complex knowledge graphs, although graph convolutional operations can already fully learn structural information and capture neighbor features, the introduction of path and rule features can further improve model interpretability and prediction ability. Specifically, when dealing with sparse graphs, simple convolutional operations may lead to a certain decrease in accuracy, and learning path features can also help improve model efficiency.
Our link prediction module is mainly implemented based on the GNN algorithm, which aggregates neighbor information into the target node and then updates the target node based on the integrated information. However, this approach is prone to over-smoothing, where the representations of different nodes tend to become similar as the number of GNN layers increases during training. To address this issue, we introduce "longer-distance" path embeddings, which combine depth and breadth features to extract complex graph structure information.
We further explore effective graph processing structures by adjusting the number of convolutional layers and the size of the sliding window. In this work, considering memory and computational capacity, we conduct experiments with sliding window widths ranging from 1 to 3. As shown in Figure 5, the model performs better when the sliding window width is set to 2.
When the sliding window width is set to 2, our model can learn more layers of graph structural features and neighbor information. When the sliding window width is too small, that is, when the number of subgraphs learned is too few, the information in the knowledge graph cannot be fully aggregated to learn the structural information of the knowledge graph. In addition, some useful high-order neighbors cannot be captured. When the number of subgraphs is too large, the node representation is overly smoothed due to excessive noise.
MLSFF: Denote the entity embedding dimension as $d_e$, the structural embedding dimension as $d_r$, and the number of channels as $T$. The final output dimension of the triplet encoding is denoted as $m \times n$. The main complexity of our model can be represented as $O(|E|d_e + |R|d_r + Tmn + Td(2d_m - m + 1)(d_n - n + 1))$.

TransE: The scoring function of the TransE model is $\|h + r - t\|$; as a result, its algorithmic complexity can be represented as $O(|E|d + |R|d)$.

RED-GNN: As a GNN model for the traditional knowledge graph completion task, the RED-GNN model has an algorithmic complexity of $O(d \cdot \min(\bar{D}^L, |F|L))$, where $\bar{D}$ represents the average degree of the $r$-directed graph per layer. It can be observed that our model has a slightly higher computational complexity. This is attributed to two main factors: first, the inherent complexity of multimodal knowledge graphs; and second, the decision to incorporate a more extensive graph feature learning scheme to enhance the interpretability of paths.
Despite the promising results and contributions of our study, there are some limitations that should be acknowledged:
While our model aims to enhance interpretability by incorporating graph features and multi-hop paths, the interpretability of the model's predictions may still be limited. Explaining the reasoning behind specific predictions or understanding the underlying decision-making processes can be challenging, especially in complex multimodal knowledge graphs.
In addition, the proposed model in this paper exhibits high complexity, which results in increased demands for computational resources and significant time consumption. Furthermore, our model does not consider the possibility of negative samples during the sampling process, which has an impact on the overall accuracy of the prediction task.
We propose the MLSFF model, which first uses two independent single-modality encoders to obtain pre-trained embeddings for both image and text information. Then, after filtering out irrelevant information, the multi-modal features are fused to obtain a unified encoding vector. We utilize graph convolutional algorithms to learn the structural information in the knowledge graph. In addition, we introduce path-based feature information into the graph structural features to obtain richer relationship representations. Our experimental results demonstrate that our model achieves better performance on MKGC tasks. To address the issues of high complexity and the omission of negative samples in our model, we will focus on the following areas in future research: (ⅰ) designing simpler, more streamlined, and more computationally efficient scoring functions; (ⅱ) considering negative-sample interference, thereby mitigating its impact on the accuracy of the prediction task; and (ⅲ) incorporating additional modalities, such as numerical features, to achieve a more comprehensive and diverse multimodal fusion and to enhance the overall performance of the model.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported by the National Natural Science Foundation of China-China State Railway Group Co., Ltd. Railway Basic Research Joint Fund (Grant No.U2268217) and the Scientific Funding for China Academy of Railway Sciences Corporation Limited (No.2021YJ183).
The authors declare there is no conflict of interest.
[1] A. Shoeibi, N. Ghassemi, M. Khodatars, P. Moridian, A. Khosravi, A. Zare, et al., Automatic diagnosis of schizophrenia and attention deficit hyperactivity disorder in rs-fmri modality using convolutional autoencoder model and interval type-2 fuzzy regression, Cognit. Neurodyn., (2022), 1–23. https://doi.org/10.1007/s11571-022-09897-w
[2] A. Shoeibi, M. Khodatars, M. Jafari, N. Ghassemi, P. Moridian, R. Alizadesani, et al., Diagnosis of brain diseases in fusion of neuroimaging modalities using deep learning: A review, Inf. Fusion, 2022. https://doi.org/10.1016/j.inffus.2022.12.010
[3] A. Shoeibi, M. Rezaei, N. Ghassemi, Z. Namadchian, A. Zare, J. M. Gorriz, Automatic diagnosis of schizophrenia in eeg signals using functional connectivity features and cnn-lstm model, in Artificial Intelligence in Neuroscience: Affective Analysis and Health Applications: 9th International Work-Conference on the Interplay Between Natural and Artificial Computation, (2022), 63–73. https://doi.org/10.1007/978-3-031-06242-1_7
[4] P. Moridian, N. Ghassemi, M. Jafari, S. Salloum-Asfar, D. Sadeghi, M. Khodatars, et al., Automatic autism spectrum disorder detection using artificial intelligence methods with mri neuroimaging: A review, Front. Mol. Neurosci., 15 (2022), 999605. https://doi.org/10.3389/fnmol.2022.999605
[5] M. Khodatars, A. Shoeibi, D. Sadeghi, N. Ghaasemi, M. Jafari, P. Moridian, et al., Deep learning for neuroimaging-based diagnosis and rehabilitation of autism spectrum disorder: a review, Comput. Biol. Med., 139 (2021), 104949. https://doi.org/10.1016/j.compbiomed.2021.104949
[6] S. Wang, Z. Chen, S. Du, Z. Lin, Learning deep sparse regularizers with applications to multi-view clustering and semi-supervised classification, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2021), 5042–5055. https://doi.org/10.1109/TPAMI.2021.3082632
[7] S. Du, Z. Liu, Z. Chen, W. Yang, S. Wang, Differentiable bi-sparse multi-view co-clustering, IEEE Trans. Signal Process., 69 (2021), 4623–4636. https://doi.org/10.1109/TSP.2021.3101979
[8] Z. Chen, L. Fu, J. Yao, W. Guo, C. Plant, S. Wang, Learnable graph convolutional network and feature fusion for multi-view learning, Inf. Fusion, 95 (2023), 109–119. https://doi.org/10.1016/j.inffus.2023.02.013
[9] Z. Fang, S. Du, X. Lin, J. Yang, S. Wang, Y. Shi, Dbo-net: Differentiable bi-level optimization network for multi-view clustering, Inf. Sci., 626 (2023), 572–585. https://doi.org/10.1016/j.ins.2023.01.071
[10] S. Xiao, S. Du, Z. Chen, Y. Zhang, S. Wang, Dual fusion-propagation graph neural network for multi-view clustering, IEEE Trans. Multimedia, 2023. https://doi.org/10.1109/TMM.2023.3248173
[11] K. Liang, Y. Liu, S. Zhou, X. Liu, W. Tu, Relational symmetry based knowledge graph contrastive learning, preprint, arXiv: 2211.10738. https://doi.org/10.48550/arXiv.2211.10738
[12] S. Di, Q. Yao, Y. Zhang, L. Chen, Efficient relation-aware scoring function search for knowledge graph embedding, in 2021 IEEE 37th International Conference on Data Engineering (ICDE), IEEE, (2021), 1104–1115. https://doi.org/10.1109/ICDE51399.2021.00100
[13] Y. Zhang, Q. Yao, W. Dai, L. Chen, Autosf: Searching scoring functions for knowledge graph embedding, in 2020 IEEE 36th International Conference on Data Engineering (ICDE), IEEE, (2020), 433–444. https://doi.org/10.1109/ICDE48307.2020.00044
[14] P. Pezeshkpour, L. Chen, S. Singh, Embedding multimodal relational data for knowledge base completion, in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (2018), 3208–3218. https://doi.org/10.18653/v1/D18-1359
[15] Y. Zhao, X. Cai, Y. Wu, H. Zhang, Y. Zhang, G. Zhao, et al., Mose: Modality split and ensemble for multimodal knowledge graph completion, preprint, arXiv: 2210.08821. https://doi.org/10.48550/arXiv.2210.08821
[16] S. Zheng, W. Wang, J. Qu, H. Yin, W. Chen, L. Zhao, Mmkgr: Multi-hop multi-modal knowledge graph reasoning, preprint, arXiv: 2209.01416. https://doi.org/10.48550/arXiv.2209.01416
[17] Z. Cao, Q. Xu, Z. Yang, Y. He, X. Cao, Q. Huang, Otkge: Multi-modal knowledge graph embeddings via optimal transport, Adv. Neural Inf. Process. Syst., 35 (2022), 39090–39102.
[18] S. Liang, A. Zhu, J. Zhang, J. Shao, Hyper-node relational graph attention network for multi-modal knowledge graph completion, ACM Trans. Multimedia Comput. Commun. Appl., 19 (2023), 1–21. https://doi.org/10.1145/3545573
[19] L. A. Galárraga, C. Teflioudi, K. Hose, F. Suchanek, Amie: association rule mining under incomplete evidence in ontological knowledge bases, in Proceedings of the 22nd international conference on World Wide Web, (2013), 413–422. https://doi.org/10.1145/2488388.2488425
[20] P. G. Omran, K. Wang, Z. Wang, An embedding-based approach to rule learning in knowledge graphs, IEEE Trans. Knowl. Data Eng., 33 (2019), 1348–1359. https://doi.org/10.1109/TKDE.2019.2941685
[21] F. Yang, Z. Yang, W. W. Cohen, Differentiable learning of logical rules for knowledge base reasoning, Adv. Neural Inf. Process. Syst., 30 (2017).
[22] A. Neelakantan, B. Roth, A. McCallum, Compositional vector space models for knowledge base completion, preprint, arXiv: 1504.06662. https://doi.org/10.48550/arXiv.1504.06662
[23] W. Chen, W. Xiong, X. Yan, W. Wang, Variational knowledge graph reasoning, preprint, arXiv: 1803.06581. https://doi.org/10.48550/arXiv.1803.06581
[24] X. V. Lin, C. Xiong, R. Socher, Multi-hop knowledge graph reasoning with reward shaping, preprint, arXiv: 1808.10568. https://doi.org/10.48550/arXiv.1808.10568
[25] W. Xiong, T. Hoang, W. Y. Wang, DeepPath: A reinforcement learning method for knowledge graph reasoning, in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Copenhagen, Denmark, (2017), 564–573. https://doi.org/10.18653/v1/D17-1060
[26] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, Adv. Neural Inf. Process. Syst., 26 (2013).
[27] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in Proceedings of the AAAI Conference on Artificial Intelligence, 29 (2015). https://doi.org/10.1609/aaai.v29i1.9491
[28] Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in Proceedings of the AAAI Conference on Artificial Intelligence, 28 (2014). https://doi.org/10.1609/aaai.v28i1.8870
[29] S. Amin, S. Varanasi, K. A. Dunfield, G. Neumann, Lowfer: Low-rank bilinear pooling for link prediction, in International Conference on Machine Learning, PMLR, (2020), 257–268.
[30] I. Balažević, C. Allen, T. Hospedales, Tucker: Tensor factorization for knowledge graph completion, in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), (2019), 5185–5194. https://doi.org/10.18653/v1/D19-1522
[31] M. Nickel, V. Tresp, H. P. Kriegel, A three-way model for collective learning on multi-relational data, in Icml, 11 (2011), 3104482–3104584.
[32] R. Socher, D. Chen, C. D. Manning, A. Ng, Reasoning with neural tensor networks for knowledge base completion, Adv. Neural Inf. Process. Syst., 26 (2013).
[33] T. Dettmers, P. Minervini, P. Stenetorp, S. Riedel, Convolutional 2d knowledge graph embeddings, in Proceedings of the AAAI Conference on Artificial Intelligence, 32 (2018). https://doi.org/10.1609/aaai.v32i1.11573
[34] S. Vashishth, S. Sanyal, V. Nitin, N. Agrawal, P. Talukdar, Interacte: Improving convolution-based knowledge graph embeddings by increasing feature interactions, in Proceedings of the AAAI Conference on Artificial Intelligence, 34 (2020), 3009–3016. https://doi.org/10.1609/aaai.v34i03.5694
[35] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, M. Welling, Modeling relational data with graph convolutional networks, in The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15, Springer, (2018), 593–607. https://doi.org/10.1007/978-3-319-93417-4_38
[36] C. Shang, Y. Tang, J. Huang, J. Bi, X. He, B. Zhou, End-to-end structure-aware convolutional networks for knowledge base completion, in Proceedings of the AAAI Conference on Artificial Intelligence, 33 (2019), 3060–3067. https://doi.org/10.1609/aaai.v33i01.33013060
[37] Z. Zhu, Z. Zhang, L. P. Xhonneux, J. Tang, Neural bellman-ford networks: A general graph neural network framework for link prediction, Adv. Neural Inf. Process. Syst., 34 (2021), 29476–29490.
[38] Y. Zhang, Q. Yao, Knowledge graph reasoning with relational digraph, in Proceedings of the ACM Web Conference 2022, (2022), 912–924. https://doi.org/10.1145/3485447.3512008
[39] L. H. Li, M. Yatskar, D. Yin, C. J. Hsieh, K. W. Chang, Visualbert: A simple and performant baseline for vision and language, preprint, arXiv: 1908.03557. https://doi.org/10.48550/arXiv.1908.03557
[40] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, et al., Learning transferable visual models from natural language supervision, in International Conference on Machine Learning, PMLR, (2021), 8748–8763.
[41] Z. Y. Dou, Y. Xu, Z. Gan, J. Wang, S. Wang, L. Wang, et al., An empirical study of training end-to-end vision-and-language transformers, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022), 18166–18176. https://doi.org/10.48550/arXiv.2111.02387
[42] M. Yasunaga, A. Bosselut, H. Ren, X. Zhang, C. D. Manning, P. S. Liang, et al., Deep bidirectional language-knowledge graph pretraining, Adv. Neural Inf. Process. Syst., 35 (2022), 37309–37323.
[43] X. Pan, T. Ye, D. Han, S. Song, G. Huang, Contrastive language-image pre-training with knowledge graphs, preprint, arXiv: 2210.08901. https://doi.org/10.48550/arXiv.2210.08901
[44] R. Xie, Z. Liu, H. Luan, M. Sun, Image-embodied knowledge representation learning, in Proceedings of the 26th International Joint Conference on Artificial Intelligence, preprint, arXiv: 1609.07028. https://doi.org/10.48550/arXiv.1609.07028
[45] Z. Wang, L. Li, Q. Li, D. Zeng, Multimodal data enhanced representation learning for knowledge graphs, in 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, (2019), 1–8. https://doi.org/10.1109/IJCNN.2019.8852079
[46] W. Wilcke, P. Bloem, V. de Boer, R. van t Veer, F. van Harmelen, End-to-end entity classification on multimodal knowledge graphs, preprint, arXiv: 2003.12383. https://doi.org/10.48550/arXiv.2003.12383
[47] N. Zhang, L. Li, X. Chen, X. Liang, S. Deng, H. Chen, Multimodal analogical reasoning over knowledge graphs, preprint, arXiv: 2210.00312. https://doi.org/10.48550/arXiv.2210.00312
[48] D. Xu, T. Xu, S. Wu, J. Zhou, E. Chen, Relation-enhanced negative sampling for multimodal knowledge graph completion, in Proceedings of the 30th ACM International Conference on Multimedia, (2022), 3857–3866. https://doi.org/10.1145/3503161.3548388
[49] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16x16 words: Transformers for image recognition at scale, preprint, arXiv: 2010.11929. https://doi.org/10.48550/arXiv.2010.11929
[50] J. Devlin, M. W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, preprint, arXiv: 1810.04805. https://doi.org/10.48550/arXiv.1810.04805
[51] G. Jawahar, B. Sagot, D. Seddah, What does bert learn about the structure of language?, in ACL 2019-57th Annual Meeting of the Association for Computational Linguistics, 2019.
[52] X. Chen, N. Zhang, L. Li, S. Deng, C. Tan, C. Xu, et al., Hybrid transformer with multi-level fusion for multimodal knowledge graph completion, in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, (2022), 904–915. https://doi.org/10.1145/3477495.3531992
[53] B. Yang, W. Yih, X. He, J. Gao, L. Deng, Embedding entities and relations for learning and inference in knowledge bases, preprint, arXiv: 1412.6575. https://doi.org/10.48550/arXiv.1412.6575
[54] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in International Conference on Machine Learning, PMLR, (2016), 2071–2080.
[55] Z. Sun, Z. H. Deng, J. Y. Nie, J. Tang, Rotate: Knowledge graph embedding by relational rotation in complex space, preprint, arXiv: 1902.10197. https://doi.org/10.48550/arXiv.1902.10197
[56] H. Mousselly-Sergieh, T. Botschen, I. Gurevych, S. Roth, A multimodal translation-based approach for knowledge graph representation learning, in Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, (2018), 225–234. https://doi.org/10.18653/v1/S18-2027
[57] M. Wang, S. Wang, H. Yang, Z. Zhang, X. Chen, G. Qi, Is visual context really helpful for knowledge graph? a representation learning perspective, in Proceedings of the 29th ACM International Conference on Multimedia, (2021), 2735–2743. https://doi.org/10.1145/3474085.3475470