TN-SUM: A Tibetan Text Summarization Dataset
Source: Science Data Bank. Updated: 2024-12-30. Indexed: 2026-04-23.
Download link:
https://www.scidb.cn/detail?dataSetId=93cb8e2aa1034be58b098bef334a3e3e
Description:
Automatic text summarization is an important research direction in natural language processing. It helps address information overload and makes textual data more accessible and easier to understand. Tibetan, one of China's minority languages, is a low-resource language with a distinctive writing system and grammatical structure. Compared with major languages such as Chinese and English, research on Tibetan text summarization lags behind, primarily because large-scale datasets are not available. To bridge this gap, we used web scraping to collect 20,000 authentic Tibetan news articles from various Tibetan news portals. Each article's headline serves as its summary, yielding a diverse Tibetan text summarization dataset named TN-SUM. The dataset is intended to meet the needs of researchers and to advance Tibetan text summarization.
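The headline-as-summary construction described above can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the field names `headline` and `body`, the `SummaryPair` type, and the filtering of incomplete or duplicate articles are all assumptions, since the description does not specify the scraper output or any cleaning steps.

```python
from dataclasses import dataclass


@dataclass
class SummaryPair:
    text: str     # article body, used as the source document
    summary: str  # article headline, used as the reference summary


def build_pairs(articles):
    """Turn scraped articles into (text, summary) pairs.

    `articles` is assumed to be an iterable of dicts with hypothetical
    "headline" and "body" keys. Articles missing either field, and
    articles whose body was already seen, are skipped (an assumed
    cleaning step, not one stated in the dataset description).
    """
    pairs = []
    seen_bodies = set()
    for article in articles:
        headline = (article.get("headline") or "").strip()
        body = (article.get("body") or "").strip()
        if not headline or not body:
            continue  # incomplete article
        if body in seen_bodies:
            continue  # duplicate article body
        seen_bodies.add(body)
        pairs.append(SummaryPair(text=body, summary=headline))
    return pairs
```

Called on a list of scraped records, `build_pairs` returns one `SummaryPair` per usable article, with the headline in `summary` and the body in `text`.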
Provided by:
Minzu University of China; 中央民族大学
Created:
2024-01-02



