CACAPO dataset

Name: CACAPO dataset
Creator: DataverseNL
Published: 2025-07-03 00:40:07
License: Not specified

Source: DataCite Commons (updated 2025-07-03, indexed 2025-04-09)

Download link: https://dataverse.nl/citation?persistentId=doi:10.34894/LIBYHP

Description:

The Combinations of Aligned data-sentenCes from nAturally PrOduced texts (hereafter: CACAPO) dataset is a dataset for data-to-text generation. It contains over 20,000 sentences from automatically scraped news reports in the sports, weather, stock, and incidents domains, in English and Dutch, aligned with relevant attribute-value paired data. To our knowledge, this is the first dataset based on “naturally occurring” human-written texts (i.e., texts that were not collected in a task-based setting) that covers various domains as well as multiple languages.
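
To make the idea of a sentence aligned with attribute-value pairs concrete, here is a minimal Python sketch. It is illustrative only: the class name, the field names, the `linearize` helper, and the example sentence are all assumptions made for this sketch, and they are not taken from the CACAPO files or their actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedSentence:
    """One sentence paired with attribute-value data (hypothetical schema)."""
    domain: str      # e.g. "sports", "weather", "stock", "incidents"
    language: str    # "en" or "nl"
    text: str        # the human-written sentence
    attributes: dict[str, str] = field(default_factory=dict)

    def linearize(self) -> str:
        # Flatten the attribute-value pairs into a single string, a common
        # way to feed structured input to a data-to-text model.
        return " | ".join(f"{k}={v}" for k, v in self.attributes.items())


# Invented example record, not an entry from the dataset.
example = AlignedSentence(
    domain="weather",
    language="en",
    text="Temperatures will reach 21 degrees on Friday.",
    attributes={"temperature": "21 degrees", "day": "Friday"},
)

print(example.linearize())
```

In a data-to-text setup, the linearized attributes would serve as the model input and `text` as the target output.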

Provider: DataverseNL

Created: 2022-07-14
