
CACAPO dataset

Source: DataverseNL. Updated: 2022-08-02. Indexed: 2026-05-11.
Download link:
https://dataverse.nl/citation?persistentId=doi:10.34894/LIBYHP
Description:
The Combinations of Aligned data-sentenCes from nAturally PrOduced texts (hereafter: CACAPO) dataset is a dataset for data-to-text generation. It contains over 20,000 sentences from automatically scraped news reports in the sports, weather, stock, and incidents domains, in English and Dutch, aligned with relevant attribute-value paired data. To our knowledge, this is the first dataset based on “naturally occurring” human-written texts (i.e., texts that were not collected in a task-based setting) that covers various domains as well as multiple languages.
Provider:
Tilburg University, Tilburg School of Humanities and Digital Sciences, Department of Cognitive Science and Artificial Intelligence; Tilburg University, Tilburg School of Humanities and Digital Sciences, Department of Communication and Cognition
Created:
2022-01-01