CNN/DailyMail News Articles Dataset

HyperAI (超神经) · updated 2024-08-08 · indexed 2024-12-14

Download link:

https://hyper.ai/cn/datasets/33251

Resource description:

This dataset contains more than 300,000 unique news articles written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, whereas the original version was created for machine reading comprehension and abstractive question answering. The dataset is intended to aid the development of models that can summarize a long passage of text in one or two sentences, a task that is valuable for efficiently conveying the information contained in large volumes of text.
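To illustrate what "extractive summarization" means on news data like this, the sketch below implements Lead-3, a simple baseline commonly reported on CNN/DailyMail: the summary is just the first three sentences of the article. This is a minimal sketch, not part of the dataset distribution. The record layout (fields named `article` and `highlights`) is an assumption for illustration, the sample article is made up, and the regex sentence splitter is deliberately naive; real pipelines typically use a proper sentence tokenizer.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty fragments, e.g. when the input is an empty string.
    return [p for p in parts if p]


def lead_n_summary(article: str, n: int = 3) -> str:
    """Extractive baseline: return the first n sentences of the article."""
    return " ".join(split_sentences(article)[:n])


# Hypothetical record; the field names "article" and "highlights" are assumed.
example = {
    "article": (
        "The city council approved a new budget on Monday. "
        "The plan raises spending on public transit. "
        "Critics say it ignores housing. "
        "A final vote is expected next month."
    ),
    "highlights": "Council approves budget with more transit spending.",
}

print(lead_n_summary(example["article"]))
```

An abstractive model would instead generate new wording, as in the reference `highlights` field of the hypothetical record; extractive methods such as Lead-3 can only copy sentences that already appear in the article.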

Created:

2024-08-06

Compiled overview

Dataset introduction

Background and challenges

Background overview

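The CNN/DailyMail News Articles Dataset contains more than 300,000 news articles from CNN and the Daily Mail. It is intended to support machine reading, question answering, and summarization, in particular the development of models that condense long texts into a few concise sentences. The dataset is divided into training, validation, and test splits, the articles span 2007 to 2015, and the current version provides non-anonymized data to better support summarization.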

The above content was collected and summarized by 遇见数据集.