
Automatic Thai news summarization using deep learning

Mendeley Data. Updated 2024-01-31. Indexed 2024-06-28.
Download link:
http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/TU.the.2021.556
Resource description:
A large and continuously growing amount of textual data is available on the internet, and much of it is redundant. Finding information across similar documents manually is therefore time-consuming and laborious, and a better mechanism is needed to extract useful and significant information quickly and effectively. Text summarization is one method that can address this problem.

This research proposes automatic Thai news summarization for a single document based on a hybrid approach. We combined extractive and abstractive summarization to improve the performance of the model. In addition, we augmented the dataset in a simple way, using the original documents of the training dataset to produce additional input documents and summaries. The augmented economic dataset is unlabeled data. We added the unlabeled data to the training dataset so that the model could learn a language model and thereby improve the grammatical structure of the output summary. We also studied how document length and word position affect the performance of deep learning models.

Our proposed model obtained ROUGE-1 = 0.6456, ROUGE-2 = 0.4108, and ROUGE-L = 0.6372, and it can generate output summaries that are readable and grammatically correct. Regarding document length, we found that the deep learning models summarize short documents better than long ones. Regarding word position, the models work well on original documents in which the important words appear at the beginning.
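For readers unfamiliar with the reported metrics, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L can be computed from token lists. It is not the evaluation code used in this research: the abstract does not say whether the scores are precision, recall, or F1, so F1 is assumed here, and the function names and toy English tokens are illustrative only. Thai text has no spaces between words, so a separate word-segmentation step is assumed to have produced the tokens.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 (assumed variant): clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # per-n-gram minimum of the two counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 (assumed variant): based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Toy, pre-tokenized example (illustrative tokens, not from the dataset).
reference = ["the", "cat", "sat", "on", "the", "mat"]
candidate = ["the", "cat", "is", "on", "the", "mat"]

print(round(rouge_n(candidate, reference, 1), 4))
print(round(rouge_n(candidate, reference, 2), 4))
print(round(rouge_l(candidate, reference), 4))
```

Scores computed this way depend heavily on the tokenizer, so they are only comparable across systems that share the same word segmentation.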
Created:
2024-01-31