TOTTO

Source: arXiv. Updated 2020-10-06. Indexed 2024-06-21.
Download link:
https://github.com/google-research-datasets/totto
Overview:
TOTTO is an open-domain English table-to-text generation dataset created by Google Research, with over 120,000 training examples. It defines a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To build the targets, annotators directly revised existing candidate sentences from Wikipedia, so that each final sentence is both natural and faithful to the source table. The dataset spans a wide range of domains and is suited to high-precision conditional text generation. It is particularly useful for studying how models generate text under controlled conditions and for addressing faithfulness problems in model-generated text.
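
The task setup described above can be made concrete with a small sketch. The record below is invented toy data, not an entry from the dataset, and the field names (`table_page_title`, `table_section_title`, `table`, `highlighted_cells`, `value`, `is_header`) are assumptions that should be checked against the files in the repository. The `linearize` helper and its `<page_title>`, `<section_title>`, `<cell>`, and `<col_header>` markers are hypothetical; they show one plausible way to turn the highlighted cells into an input string for a text generation model.

```python
from typing import Any, Dict, List

# Toy record. Field names are assumed for illustration and the values are
# invented; this is not an actual example from the dataset.
example: Dict[str, Any] = {
    "table_page_title": "Example City Marathon",
    "table_section_title": "Winners",
    "table": [
        [
            {"value": "Year", "is_header": True},
            {"value": "Winner", "is_header": True},
            {"value": "Time", "is_header": True},
        ],
        [
            {"value": "2018", "is_header": False},
            {"value": "A. Runner", "is_header": False},
            {"value": "2:10:05", "is_header": False},
        ],
        [
            {"value": "2019", "is_header": False},
            {"value": "B. Sprinter", "is_header": False},
            {"value": "2:09:40", "is_header": False},
        ],
    ],
    # Each pair is (row index, column index) into "table".
    "highlighted_cells": [[2, 1], [2, 2]],
}


def column_header(table: List[List[Dict[str, Any]]], col: int) -> str:
    """Return the first header value found in the given column, or ''."""
    for row in table:
        if col < len(row) and row[col]["is_header"]:
            return row[col]["value"]
    return ""


def linearize(ex: Dict[str, Any]) -> str:
    """Flatten the page title, section title, and highlighted cells into one string."""
    parts = [
        f"<page_title> {ex['table_page_title']}",
        f"<section_title> {ex['table_section_title']}",
    ]
    for r, c in ex["highlighted_cells"]:
        cell = ex["table"][r][c]
        header = column_header(ex["table"], c)
        parts.append(f"<cell> {cell['value']} <col_header> {header}")
    return " ".join(parts)


source_text = linearize(example)
print(source_text)
```

Only the highlighted cells reach the model input in this sketch, which is what makes the generation controlled: a faithful target sentence for this toy record would mention the 2019 winner and time and nothing from the 2018 row.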
Provided by:
Google Research, New York, NY
Created:
2020-04-30