Most current automatic summarization methods target English text. In Chinese text, words differ greatly from one another, parts of speech are numerous and complex, and polysemous or ambiguous words appear frequently. As a result, extracting useful feature words is harder for Chinese text than for English text. Because of the complex syntax of Chinese, relatively few automatic summarization methods exist for Chinese text. Earlier methods could only select the important sentences in the original text and arrange them simply, which yields summaries with disordered sentences and insufficient coherence. Moreover, Chinese short text usually contains more redundant information and its sentence structure is irregular. We therefore propose a topic-based automatic summarization method for Chinese short text. First, a key sentence selection method combining topic words and TF-IDF is proposed to obtain the score of each text corresponding to the topic in the original text data. The sentence with the highest score is then selected as the topic sentence of that topic. Considering that short Weibo texts may contain a lot of irrelevant information and sometimes even lack important components of the topic, three retouching mechanisms are proposed to improve the conciseness, richness and readability of the extracted topic sentences. We validate our approach on natural disaster and social hot event datasets from Sina Weibo. The experimental results show that the polished topic summary not only reflects the exact relationship between topic sentences and natural disasters or social hot events, but also carries rich semantic information. More importantly, the basic elements of a natural disaster or social hot event can largely be grasped from the topic sentence, which can help the government guide disaster relief or meet users' need to quickly obtain information about social hot events.
Citation: Tinghuai Ma, Hongmei Wang, Yuwei Zhao, Yuan Tian, Najla Al-Nabhan. Topic-based automatic summarization algorithm for Chinese short text[J]. Mathematical Biosciences and Engineering, 2020, 17(4): 3582-3600. doi: 10.3934/mbe.2020202
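The key sentence selection step described in the abstract (topic words combined with TF-IDF, highest-scoring text chosen as the topic sentence) might be sketched as follows. This is only an illustration under stated assumptions: the texts are assumed to be already segmented into word lists, the score of a text is assumed to be the plain sum of the TF-IDF weights of its topic words, and the function names `tfidf_scores` and `pick_topic_sentence` are hypothetical. The exact scoring formula of the method is not given here.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf weight} dict per tokenized text."""
    n_docs = len(docs)
    # Document frequency: number of texts in which each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty texts
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def pick_topic_sentence(docs, topic_words):
    """Assumed scoring: sum the TF-IDF weights of the topic words in each text.

    Returns the index of the highest-scoring text and all scores.
    """
    topic = set(topic_words)
    per_doc = tfidf_scores(docs)
    totals = [sum(w for word, w in s.items() if word in topic) for s in per_doc]
    best = max(range(len(docs)), key=totals.__getitem__)
    return best, totals


# Toy, pre-segmented texts (English placeholders for segmented Weibo posts).
texts = [
    ["earthquake", "sichuan", "rescue"],
    ["weather", "sunny"],
    ["earthquake", "news", "news", "news"],
]
best_index, all_scores = pick_topic_sentence(texts, ["earthquake", "rescue"])
```

In this toy input the first text contains both topic words, so it receives the largest score. A real pipeline would need a Chinese word segmenter before this step.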
Globally, the search for “clean, socially acceptable methods of generating power” has intensified [1,2]. Two interrelated factors drive this: (1) the global demand for electricity, which is expected to increase over the next 15 years, and (2) the pledges of different countries to reduce their CO2 emissions in the same time frame [8]. One measure that countries have taken to increase the share of renewable energy in their energy consumption is the setting of feed-in tariffs (FITs). While FITs are targeted at wind and solar energy, the generation of electricity from waves, tidal currents and tides has received renewed interest [3].
Ocean energy takes several forms: tides and currents, waves, salinity gradients, and thermal gradients [4,5]. Tidal energy from waves can be converted into electrical energy [6]. The generation of tidal energy depends on the tide height and tide velocity; higher tides and higher velocities lead to higher electricity generation [7]. The kinetic and potential energy associated with ocean waves can be harnessed at onshore or offshore sites using different wave-energy converter technologies [5,6].
The outflow rate of the water contained in the chamber was then measured with an accurately calibrated velocity gauge. The experimental results were then used to identify a chamber design and geometry that provides optimal energy output.
The experimental setup consisted of an open channel measuring 18 m × 0.5 m × 0.4 m (Figure 1), equipped with a controllable wave generator placed at one end of the channel (Figure 2). An artificial seashore protected from returning waves was situated at the other end. An oscillating water column (OWC) was made from acrylic within an iron frame.
Suitable sealing material was used to prevent air and water leakage through the iron frame. As indicated, a tube was mounted at the side of the chamber to guide the air exiting the chamber toward the intake of the turbine.
All experimental readings were taken under steady-state conditions. Water velocity readings were registered by a velocity gauge with a digital manometer (both instruments were accurately calibrated in the fluid mechanics laboratory).
The wave generator is based on Airy’s linear wave theory [Airy-1845].
The free-surface elevation can be defined as follows:
$$\eta = A\cos(kx - \omega t + \varphi) \tag{1}$$
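As a small illustration, Equation (1) can be evaluated numerically. The sketch below assumes the usual reading of the symbols in linear wave theory (A as wave amplitude, k as wave number, the coefficient of t as angular frequency, φ as phase), which the text does not spell out; the function name `free_surface_elevation` and the sample values are hypothetical.

```python
import math


def free_surface_elevation(x, t, amplitude, wavenumber, omega, phase=0.0):
    """Evaluate eta = A * cos(k*x - omega*t + phi) from Equation (1).

    Assumed units: x in m, t in s, amplitude in m, wavenumber in rad/m,
    omega in rad/s, phase in rad.
    """
    return amplitude * math.cos(wavenumber * x - omega * t + phase)


# Illustrative values only, not taken from the experiment.
eta_crest = free_surface_elevation(0.0, 0.0, amplitude=0.025, wavenumber=2.0, omega=8.0)
eta_trough = free_surface_elevation(math.pi / 2.0, 0.0, amplitude=0.025, wavenumber=2.0, omega=8.0)
```

With zero phase the crest sits at x = 0, t = 0, and the trough sits half a wavelength away, where the argument of the cosine equals π.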
The power of the outputted air is calculated from the following relationship:
$$P_w = \left(P + \tfrac{1}{2}\rho V^2\right) V A \tag{2}$$
where $P_w$ is the power, $P$ is the water pressure inside the channel, $\rho$ is the water density, $V$ is the internal flowing water velocity, and $A$ is the area of the internal water model.
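Equation (2) is a direct product of the total pressure (static plus dynamic), the velocity and the area, so it is easy to check by hand or in code. The sketch below assumes SI units and treats P as a static pressure in pascals; the function name `owc_power` and the input values are hypothetical and are not measurements from the experiment.

```python
def owc_power(pressure, density, velocity, area):
    """Evaluate Equation (2): power = (P + 0.5 * rho * V**2) * V * A.

    Assumed units: pressure in Pa, density in kg/m^3, velocity in m/s,
    area in m^2, so the result is in W.
    """
    dynamic_pressure = 0.5 * density * velocity ** 2
    return (pressure + dynamic_pressure) * velocity * area


# Illustrative inputs only: 100 Pa, water at 1000 kg/m^3, 0.1 m/s, 0.02 m^2.
example_power = owc_power(pressure=100.0, density=1000.0, velocity=0.1, area=0.02)
```

Because the velocity enters both the dynamic-pressure term and the flow-rate term, the kinetic contribution grows with the cube of the velocity.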
The water surface elevation varies with location and, within a wave period, differs among the front wall, center and rear wall of the OWC. The wave height in the chamber is greater than the incident wave amplitude, as shown in Figures 4(a) and 4(h). The maximum height of the wave amplitude in the chamber is shown in Figures 4(b) and 6(e). The lowest run-down wave amplitude in the chamber is shown in Figures 4(c) and 4(g). The water level at the chamber wall begins to rise, as shown in Figure 4(d). Wave energy can be estimated from the wave height, which can be computed from the wave peak (Figure 4(f)).
As clearly shown in Figure 5, when the frequency increases, the energy remains the same.
From Table 1 it can be further deduced that the inclined wave height may be adjusted to augment the wave energy, and thus the overall efficiency of an OWC; however, the values at wave heights of 0.03 m and 0.05 m are small because the incident wave is blocked outside the OWC. From Table 2 it can be further deduced that the inclined wave frequency may be adjusted to augment the wave energy, and thus the overall efficiency of an OWC.
Table 1. Average power for different wave heights (wave frequency = 0.78 sec, depth = 0.25 m).

| Wave height (m) | 0.01 | 0.03 | 0.05 | 0.07 | 0.09 |
| --- | --- | --- | --- | --- | --- |
| Average power (W) | 0.000092 | 0.2015034219 | 0.000059 | 0.000281 | 0.000763 |

Table 2. Average power for different wave frequencies (wave height = 0.05 m, depth = 0.25 m).

| Wave frequency (sec) | 0.58 | 0.66 | 0.78 | 1.01 | 1.74 |
| --- | --- | --- | --- | --- | --- |
| Average power (W) | 0.000152 | 0.000332 | 0.000052 | 0.000083 | 0.000021 |
The use of renewable energy is becoming indispensable to support people's way of life, lower greenhouse gas emissions, and reduce the consumption of natural resources. The water flow in the OWC chamber was determined using pressure and velocity measurements. The water velocities in the chamber during the upward motion of the water, caused by wave impact in the column, are always larger than those during the downward motion. The data acquired from the video sequences were used in conjunction with an energy model of the OWC to give useful insight into its performance. Wave-energy technologies are still, to a lesser or greater degree, experimental. However, the potential for improving wave-energy conversion technologies is large. The survivability and reliability of many devices, especially for offshore operation, are yet to be demonstrated. Average power is high when the wave frequency is small but the wave height is high.
Authors state no conflicts of interest.
[1] S. L. Lo, R. Chiong, D. Cornforth, An unsupervised multilingual approach for online social media topic identification, Expert Syst. Appl., 81 (2017), 282-298. doi: 10.1016/j.eswa.2017.03.029
[2] J. F. Yeh, Y. S. Tan, C. H. Lee, Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation, Neurocomputing, 216 (2016), 310-318. doi: 10.1016/j.neucom.2016.08.017
[3] J. Christensen, Mausam, S. Soderland, O. Etzioni, Towards coherent multi-document summarization, Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies, 2013, 1163-1173. Available from: https://www.aclweb.org/anthology/N13-1136/.
[4] E. Lloret, M. Palomar, Towards automatic tweet generation: A comparative study from the text summarization perspective in the journalism genre, Expert Syst. Appl., 40 (2013), 6624-6630. doi: 10.1016/j.eswa.2013.06.021
[5] G. Yang, D. Wen, Kinshuk, N. S. Chen, E. Sutinen, A novel contextual topic model for multidocument summarization, Expert Syst. Appl., 42 (2015), 1340-1352. doi: 10.1016/j.eswa.2014.09.015
[6] I. Mani, M. T. Maybury, Advances in Automatic Text Summarization, (MITRE Corporation) Cambridge, The MIT Press, (1999).
[7] J. M. Torres-Moreno, Automatic Text Summarization, John Wiley and Sons, 2014.
[8] A. Nenkova, K. McKeown, A survey of text summarization techniques, Min. Text Data, 2012 (2012), 43-76.
[9] T. Ma, Y. Zhao, H. Zhou, Y. Tian, A. Al-Dhelaan, M. Al-Rodhaan, Natural disaster topic extraction in sina microblogging based on graph analysis, Expert Syst. Appl., 115 (2019), 346-355. doi: 10.1016/j.eswa.2018.08.010
[10] T. Ma, Q. Liu, J. Cao, Y. Tian, A. Al-Dhelaan, M. Al-Rodhaan, LGIEM: Global and local node influence based community detection, Future Gener. Comput. Syst., 105 (2020), 533-546. doi: 10.1016/j.future.2019.12.022
[11] T. Ma, H. Rong, Y. Hao, J. Cao, Y. Tian, M. A. Al-Rodhaan, A Novel Sentiment Polarity Detection Framework for Chinese, IEEE Trans. Affective Comput., 2019.
[12] A. Kazantseva, S. Szpakowicz, Summarizing short stories, Comput. Linguist., 36 (2010), 71-109. doi: 10.1162/coli.2010.36.1.36102
[13] M. T. Khan, M. Durrani, S. Khalid, F. Aziz, Online knowledge-based model for big data topic extraction, Comput. Intell. Neurosci., 2016 (2016), 1-10.
[14] Indra, E. Winarko, R. Pulungan, Trending topics detection of Indonesian tweets using BN-grams and Doc-p, J. King Saud Univ. Comput. Inf. Sci., 31 (2019), 266-274.
[15] W. M. Wang, Z. Li, J. W. Wang, Z. H. Zheng, How far we can go with extractive text summarization? Heuristic methods to obtain near upper bounds, Expert Syst. Appl., 90 (2017), 439-463. doi: 10.1016/j.eswa.2017.08.040
[16] M. Moradi, N. Ghadiri, Different approaches for identifying important concepts in probabilistic biomedical text summarization, Artif. Intell. Med., 84 (2018), 101-116. doi: 10.1016/j.artmed.2017.11.004
[17] R. Yan, L. Kong, C. Huang, X. Wan, X. Li, Y. Zhang, Timeline generation through evolutionary trans-temporal summarization, In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011, 433-443. Available from: https://www.aclweb.org/anthology/D11-1040/.
[18] W. Liu, X. Luo, J. Zhang, R. Xue, R. Xu, Semantic summary automatic generation in news event, Concurrency Comput. Pract. Exp., 29 (2017), e4287. doi: 10.1002/cpe.4287
[19] D. Zhou, D. Zhong, A semi-supervised learning framework for biomedical event extraction based on hidden topics, Artif. Intell. Med., 64 (2015), 51-58. doi: 10.1016/j.artmed.2015.03.004
[20] W. Xiong, D. Litman, Empirical analysis of exploiting review helpfulness for extractive summarization of online reviews, In Proceedings of coling 2014, the 25th international conference on computational linguistics: Technical papers, 2014, 1985-1995. Available from: https://www.aclweb.org/anthology/C14-1187/.
[21] Z. Wu, L. Lei, G. Li, H. Huang, C. Zheng, E. Chen, et al., A topic modeling based approach to novel document automatic summarization, Expert Syst. Appl., 84 (2017), 12-23. doi: 10.1016/j.eswa.2017.04.054
[22] A. Barrera, R. Verma, Combining syntax and semantics for automatic extractive single-document summarization, In International Conference on Intelligent Text Processing and Computational Linguistics, 2012, 366-377. Available from: https://link.springer.com/chapter/10.1007/978-3-642-28601-8_31.
[23] F. Barrios, F. López, L. Argerich, R. Wachenchauzer, Variations of the similarity function of textrank for automated summarization, preprint, arXiv:1602.03606, 2016.
[24] C. Fang, D. Mu, Z. Deng, Z. Wu, Word-sentence co-ranking for automatic extractive text summarization, Expert Syst. Appl., 72 (2017), 189-195. doi: 10.1016/j.eswa.2016.12.021
[25] M. Schinas, S. Papadopoulos, Y. Kompatsiaris, P. A. Mitkas, Mgraph: Multimodal event summarization in social media using topic models and graph-based ranking, Int. J. Multimedia Inf. Retr., 5 (2016), 51-69. doi: 10.1007/s13735-015-0089-9
[26] F. Ye, X. Xu, Automatic multi-document summarization based on keyword density and sentenceword graphs, J. Shanghai Jiaotong Univ. Sci., 23 (2018), 584-592. doi: 10.1007/s12204-018-1957-2
[27] W. Xie, F. Zhu, J. Jiang, E. P. Lim, K. Wang, Topicsketch: Real-time bursty topic detection from twitter, IEEE Trans. Knowl. Data Eng., 28 (2016), 2216-2229. doi: 10.1109/TKDE.2016.2556661
[28] X. Yang, P. Jin, X. Chen, The construction of a kind of chat corpus in chinese word segmentation, In 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015, 168-172. Available from: https://ieeexplore.ieee.org/document/7397448.
[29] D. Yan, E. Hua, B. Hu, An improved single-pass algorithm for chinese microblog topic detection and tracking, In 2016 IEEE International Congress on Big Data (BigData Congress), 2016, 251-258. Available from: https://ieeexplore.ieee.org/abstract/document/7584945.
[30] C. C. Birant, O. Aktas, Rule-based turkish text summarizer (RB-TTS), Adv. Electr. Comput. Eng., 18 (2018), 113-119.
[31] A. Abdi, N. Idris, R. M. Alguliev, R. M. Aliguliyev, Automatic summarization assessment through a combination of semantic and syntactic information for intelligent educational systems, Inf. Process. Manage., 51 (2015), 340-358. doi: 10.1016/j.ipm.2015.02.001
[32] H. Rong, T. Ma, J. Cao, Y. Tian, A. Al-Dhelaan, M. Al-Rodhaan, Deep Rolling: A Novel Emotion Prediction Model for a Multi-Participant Communication Context, Inf. Sci., 488 (2019), 158-180. doi: 10.1016/j.ins.2019.03.023