CACAPO dataset
Source: DataCite Commons. Updated: 2025-07-03. Indexed: 2025-04-09.
Download link:
https://dataverse.nl/citation?persistentId=doi:10.34894/LIBYHP
Description:
The Combinations of Aligned data-sentenCes from nAturally PrOduced texts (hereafter: CACAPO) dataset is a dataset for data-to-text generation. It contains over 20,000 sentences from automatically scraped news reports in the sports, weather, stock, and incidents domains, in English and Dutch, aligned with relevant attribute-value paired data. To our knowledge, this is the first dataset based on “naturally occurring” human-written texts (i.e., texts that were not collected in a task-based setting) that covers various domains as well as multiple languages.
Provider:
DataverseNL
Created:
2022-07-14



