Vietnamese News Dataset for Multi-task Learning on Keyword Extraction and Summarization (Version 1.0)

Source: Mendeley Data (indexed 2026-04-18)

Download link:

https://data.mendeley.com/datasets/dvmw3fj5j7

Description:

This dataset contains 32,521 Vietnamese news articles curated for multi-task learning (MTL) applications in natural language processing (NLP), specifically targeting abstractive summarization and keyword extraction. The dataset is available in JSON, CSV, and XLS formats and contains six fields: id, title, content, summary, keywords, and topic. Each record provides:

- A short title of the article.
- The full news content, in cleaned raw-text form (not tokenized), ranging from 100 to 1,500 words, with an average of 662 words.
- A human-written abstractive summary of the article, averaging 31 words and typically ranging from 20 to 60 words.
- A list of 1 to 10 manually selected keywords, with an average of 4.2 keywords per article.
- A list of one or more topics indicating the thematic domain (e.g., education, healthcare, politics).

This dataset enables benchmarking and development of multi-task models that jointly learn summarization and keyword extraction.
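As a rough illustration of how the six fields might be used, the sketch below checks a record against the description and derives one training pair per task. It rests on several assumptions not stated in the description: that the JSON file is a single top-level array of objects, that `keywords` is stored as a list of strings, and that the two tasks are distinguished by text prefixes. The function names and the prefixes `summarize: ` and `keywords: ` are hypothetical, chosen only for illustration.

```python
import json

# Field names as listed in the dataset description.
EXPECTED_FIELDS = ("id", "title", "content", "summary", "keywords", "topic")


def load_records(path):
    """Load records from a JSON file (assumed: one top-level array of objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_record(record):
    """Return a list of problems found in one record; empty if none."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS
                if name not in record]
    keywords = record.get("keywords")
    # The description states 1 to 10 keywords per article.
    if isinstance(keywords, list) and not 1 <= len(keywords) <= 10:
        problems.append(f"unexpected keyword count: {len(keywords)}")
    return problems


def to_task_examples(record):
    """Turn one record into (input, target) pairs, one per task."""
    # Task prefixes are hypothetical, not part of the dataset.
    return [
        ("summarize: " + record["content"], record["summary"]),
        ("keywords: " + record["content"], "; ".join(record["keywords"])),
    ]
```

Sharing the same `content` across both pairs is one simple way to feed a single sequence-to-sequence model with both tasks; other multi-task setups (for example, separate output heads) would need a different record layout.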

Created:

2025-07-28
