The AQUAINT Corpus of English News Text

Name: The AQUAINT Corpus of English News Text
Creator: OpenDataLab
Published: 2026-05-24 11:30:42
License: No description available

OpenDataLab | Updated 2026-05-24 | Indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/The_AQUAINT_Corpus_of_English_News_etc

Resource overview:

The AQUAINT Corpus of English News Text consists of English newswire text and is used primarily for text summarization. It contains approximately 375 million words drawn from three sources: Xinhua News Agency, the New York Times News Service, and the Associated Press Worldstream News Service. The Linguistic Data Consortium (LDC) prepared the corpus for the AQUAINT project, for use in official benchmark evaluations conducted by the National Institute of Standards and Technology (NIST). The corpus was released in September 2002 through the University of Pennsylvania, with David Graff credited as its author.

Provider:

OpenDataLab

Created:

2023-04-20

Collection summary

Dataset introduction