ACC

Article DOI: http://dx.doi.org/10.26855/acc.2024.04.008

Mongolian Automatic Text Summarization Method Based on Pre-trained Model and Improved TextRank

Yongshun Han, Qintu Si*, Siriguleng Wang

College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot, Inner Mongolia, China.

*Corresponding author: Qintu Si

Published: May 16, 2024

Abstract

At present, research on automatic Mongolian text summarization is limited, especially research applying mainstream methods. The standard TextRank algorithm considers only the similarity between sentences and ignores the intrinsic features of the sentences themselves. This paper proposes IMNUBERT-mnTextRank, a Mongolian automatic text summarization method based on a pre-trained model and an improved TextRank algorithm. Information from a Mongolian external knowledge base is incorporated into TextRank in the form of sentence vectors, improving the accuracy of inter-sentence similarity calculations. The sentence-weight calculation is further refined with sentence features such as sentence position, similarity to the title, keyword coverage, and Mongolian conjunctions. The weight of each sentence is then obtained through iteration of the algorithm; after ranking, the top two sentences are selected as the summary. Experimental results show that, compared with the baseline TextRank algorithm, the proposed method improves the Rouge-1, Rouge-2, and Rouge-L scores by 0.183, 0.179, and 0.199, respectively, enhancing the quality of the generated Mongolian summaries.
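The weighting scheme the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sentence vectors (which in the paper come from the Mongolian pre-trained model and external knowledge base) are replaced with toy vectors, and the features (position, title similarity, keyword coverage, conjunctions) are collapsed into a single hypothetical per-sentence multiplier `feature_w`.

```python
def cosine(u, v):
    """Cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def improved_textrank(vectors, feature_weights, d=0.85, iterations=50):
    """TextRank-style scoring where each sentence's rank is scaled by a
    feature weight (position, title similarity, keyword coverage, ...)."""
    n = len(vectors)
    # Similarity graph over sentence vectors (zero self-similarity).
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) or 1.0 for row in sim]  # normalizers for outgoing edges
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - d) + d * feature_weights[i] *
                  sum(sim[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if j != i)
                  for i in range(n)]
    return scores

# Toy example: four "sentences" as 3-dimensional vectors.
sentence_vecs = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
# Hypothetical feature multipliers, e.g. sentence 0 boosted for lead position.
feature_w = [1.3, 1.0, 1.0, 1.1]
scores = improved_textrank(sentence_vecs, feature_w)
# As in the paper, rank the sentences and keep the top two as the summary.
summary = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]
```

With these toy inputs, the boosted lead sentence and the sentence most similar to the others end up in the two-sentence summary, illustrating how the feature multipliers tilt the ranking relative to plain similarity-based TextRank.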

How to cite this paper

How to cite this paper: Yongshun Han, Qintu Si, Siriguleng Wang. (2024) Mongolian Automatic Text Summarization Method Based on Pre-trained Model and Improved TextRank. Advances in Computer and Communication, 5(2), 141-147.

DOI: http://dx.doi.org/10.26855/acc.2024.04.008