
Vietnamese News Dataset for Multi-task Learning on Keyword Extraction and Summarization (Version 1.0)

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/dvmw3fj5j7
Description:
This dataset contains 32,521 Vietnamese news articles curated for multi-task learning (MTL) applications in natural language processing (NLP), specifically targeting abstractive summarization and keyword extraction. It is distributed in JSON, CSV, and XLS formats and contains six fields: id, title, content, summary, keywords, and topic. Each record provides:

- A short title of the article.
- The full news content, in cleaned raw-text form (not tokenized), ranging from 100 to 1,500 words, with an average of 662 words.
- A human-written abstractive summary of the article, averaging 31 words and typically ranging from 20 to 60 words.
- A list of 1 to 10 manually selected keywords, with an average of 4.2 keywords per article.
- A list of one or more topics indicating the thematic domain (e.g., education, healthcare, politics).

This dataset enables benchmarking and development of multi-task models that jointly learn summarization and keyword extraction.
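As a rough illustration of the record layout described above, the following Python sketch checks a single record against the stated field list and size ranges. Only the six field names and the numeric ranges come from the description; the helper name check_record, the sample record, the assumption that keywords and topic are stored as lists, and the use of whitespace splitting to count words are all hypothetical and may not match the actual files.

```python
# Field names are taken from the dataset description.
EXPECTED_FIELDS = ("id", "title", "content", "summary", "keywords", "topic")


def check_record(record):
    """Return a list of problems found in one record (empty if none).

    Assumptions: `keywords` and `topic` are lists, and word counts are
    approximated by whitespace splitting. The description does not say
    how words were actually counted.
    """
    missing = [name for name in EXPECTED_FIELDS if name not in record]
    if missing:
        return ["missing fields: " + ", ".join(missing)]

    problems = []

    # Content is described as ranging from 100 to 1,500 words.
    n_words = len(record["content"].split())
    if not 100 <= n_words <= 1500:
        problems.append("content: %d words, expected 100 to 1500" % n_words)

    # Each article is described as having 1 to 10 keywords.
    keywords = record["keywords"]
    if not isinstance(keywords, list) or not 1 <= len(keywords) <= 10:
        problems.append("keywords: expected a list of 1 to 10 items")

    # Each article is described as having one or more topics.
    if not record["topic"]:
        problems.append("topic: expected at least one topic")

    return problems


# Hypothetical record built only to exercise the checks; not real data.
sample = {
    "id": 1,
    "title": "Tiêu đề ví dụ",
    "content": " ".join(["từ"] * 120),
    "summary": "Tóm tắt ví dụ.",
    "keywords": ["giáo dục", "học sinh"],
    "topic": ["education"],
}

print(check_record(sample))
```

The summary length is not checked because the description gives only a typical range (20 to 60 words), not a hard limit.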
Created:
2025-07-28