Linguistic summarisation of multiple entities in RDF graphs

Elizaveta Zimina; Kalervo Järvelin; Jaakko Peltonen; Aarne Ranta; Kostas Stefanidis; Jyrki Nummenmaa

doi:10.3934/aci.2024001

Applied Computing and Intelligence

2024, Volume 4, Issue 1: 1-18. doi: 10.3934/aci.2024001

Research article

1. Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
2. Department of Computer Science and Engineering, University of Gothenburg, Sweden

Received: 27 November 2023; Revised: 23 January 2024; Accepted: 25 January 2024; Published: 02 February 2024

Methods for producing summaries from structured data have gained interest owing to the huge volume of data available on the Web. Simultaneously, there have been advances in natural language generation from Resource Description Framework (RDF) data. However, no efforts have been made to generate natural language summaries for groups of multiple RDF entities. This paper describes the first algorithm for summarising the information of a set of RDF entities in the form of human-readable text. The paper also proposes an experimental design for evaluating the summaries in a human task context. Experiments were carried out comparing machine-made summaries with summaries written by humans, both with and without the help of machine-made summaries. We developed criteria for evaluating the content and text quality of both types of summary, as well as a function measuring the agreement between machine-made and human-written summaries. The experiments indicated that machine-made natural language summaries can substantially help humans write their own textual descriptions of entity sets within a limited time.
Keywords: entity summarisation, linguistic summarisation, linked data, RDF, text generation, natural language
Citation: Elizaveta Zimina, Kalervo Järvelin, Jaakko Peltonen, Aarne Ranta, Kostas Stefanidis, Jyrki Nummenmaa. Linguistic summarisation of multiple entities in RDF graphs. Applied Computing and Intelligence, 2024, 4(1): 1-18. doi: 10.3934/aci.2024001
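
The abstract does not spell out the summarisation algorithm, so the following is only a rough illustration of the task it names: describing a set of RDF entities in text. It is a minimal sketch under stated assumptions, not the authors' method: it simply counts which property-value pairs are shared across the entity set and verbalises the frequent ones with a fixed template. The toy triples, the function name `summarise`, and the `min_share` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (subject, predicate, object) triples, not taken from the paper.
TRIPLES = [
    ("Tampere", "type", "city"),
    ("Tampere", "country", "Finland"),
    ("Turku", "type", "city"),
    ("Turku", "country", "Finland"),
    ("Gothenburg", "type", "city"),
    ("Gothenburg", "country", "Sweden"),
]


def summarise(triples, entities, min_share=0.5):
    """Verbalise property-value pairs held by at least `min_share` of the entities.

    This is a simple frequency-based aggregation chosen for illustration;
    the paper's actual algorithm is not described in the abstract.
    """
    wanted = set(entities)
    # Deduplicate so each (predicate, object) pair counts once per entity.
    seen = {(s, p, o) for s, p, o in triples if s in wanted}
    counts = Counter((p, o) for _, p, o in seen)

    sentences = []
    # Most widely shared pairs first; ties broken alphabetically for determinism.
    for (p, o), n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n / len(wanted) >= min_share:
            sentences.append(f"{n} of {len(wanted)} entities have {p} {o}.")
    return sentences


if __name__ == "__main__":
    for sentence in summarise(TRIPLES, ["Tampere", "Turku", "Gothenburg"]):
        print(sentence)
```

A fixed template like this produces stilted text; it only shows the aggregation step, and any real system would need proper linguistic realisation on top of it.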

© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
