AltGen: 1.3M Plausible Alternatives From Neural Text Generators

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10006412
Description:
The AltGen dataset contains 1.3 million English texts generated by neural language generators conditioned on contexts from three corpora of acceptability judgements and two corpora of reading times. For each corpus, each text generator, and each sampling algorithm, 100 generations are sampled, for a total of 1,257,300 generations.

Details about the language generators and the corpora are presented in a paper published at EMNLP 2023 (in particular, Section 4). Please cite this paper if you use any version of the dataset in your work:

Mario Giulianelli, Sarenne Wallbridge, and Raquel Fernández. 2023. Information Value: Measuring Utterance Predictability as Distance from Plausible Alternatives. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.

The files are in JSON Lines (jsonl) format. Each entry includes a context_id field, which allows retrieving the relevant entry from the original corpus, and an alternatives field, which contains the language model generations. Please note that the alternatives are not post-processed (see the code and footnote 2 in the paper for further details).

Filenames are built as follows: DecodingAlgorithm_DecodingParameter-nNumAlternatives-maxlen_MaxGenerationLength-sep_Separator.jsonl.
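The filename convention and the two documented fields can be handled with a few lines of Python. The following is a minimal sketch, not the authors' loading code. The regular expression is an assumption derived only from the pattern stated above, the function names parse_filename and read_alternatives are hypothetical, and the sample filename in the usage note is invented for illustration. It also assumes the decoding parameter contains no underscore and that alternatives holds a list of generated strings.

```python
import json
import re

# Assumed regex for the documented pattern:
# DecodingAlgorithm_DecodingParameter-nNumAlternatives-maxlen_MaxGenerationLength-sep_Separator.jsonl
FILENAME_RE = re.compile(
    r"^(?P<algorithm>.+?)_(?P<parameter>[^_]+?)"
    r"-n(?P<num_alternatives>\d+)"
    r"-maxlen_(?P<max_length>\d+)"
    r"-sep_(?P<separator>.+)\.jsonl$"
)


def parse_filename(name):
    """Split a dataset filename into its decoding settings (hypothetical helper)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"filename does not follow the documented pattern: {name}")
    info = match.groupdict()
    # The two counts are numeric; the other parts are kept as strings.
    info["num_alternatives"] = int(info["num_alternatives"])
    info["max_length"] = int(info["max_length"])
    return info


def read_alternatives(path):
    """Yield (context_id, alternatives) pairs from one jsonl file (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Only the two fields named in the description are accessed.
            yield record["context_id"], record["alternatives"]
```

For example, parse_filename("nucleus_0.95-n100-maxlen_100-sep_space.jsonl") would return the algorithm "nucleus", the parameter "0.95", and the integer counts 100 and 100. That filename is made up; the actual algorithm names, parameter values, and separator labels should be checked against the files in the Zenodo record. Because the alternatives are not post-processed, any trimming or filtering is left to the caller.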
Created:
2023-10-20