LAMBADA
arXiv | Updated: 2016-06-20 | Indexed: 2024-06-21
Download link:
http://clic.cimec.unitn.it/lambada/
Description:
The LAMBADA dataset was created by CIMeC (Center for Mind/Brain Sciences) to evaluate how well computational models understand text, using a word prediction task. It contains 10,022 narrative passages, each chosen so that human subjects can guess the final word when they read the whole passage but not when they see only the final sentence. LAMBADA covers a wide range of semantic phenomena and serves as a challenging test set, encouraging the development of models that can genuinely understand broad context in natural language text. The data is drawn from the Book Corpus, a collection of unpublished novels, and is used to train and test models on broad-context understanding.
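The word prediction task described above can be sketched as follows. This is a minimal illustration, assuming each passage is a whitespace-tokenized string whose last token is the target word. The helper names `split_passage` and `last_word_accuracy`, the toy passages, and the `always_tea` predictor are all hypothetical; they are not part of the dataset or any official evaluation script.

```python
from typing import Callable, Iterable, Tuple


def split_passage(passage: str) -> Tuple[str, str]:
    """Split a passage into (context, target), where target is the final word."""
    context, _, target = passage.strip().rpartition(" ")
    return context, target


def last_word_accuracy(passages: Iterable[str],
                       predict: Callable[[str], str]) -> float:
    """Fraction of passages whose final word the predictor guesses exactly."""
    total = 0
    correct = 0
    for passage in passages:
        context, target = split_passage(passage)
        total += 1
        if predict(context) == target:
            correct += 1
    return correct / total if total else 0.0


# Toy passages invented for illustration; they are NOT taken from LAMBADA.
toy_passages = [
    "anna put the kettle on and waited . when it whistled she poured the tea",
    "the dog ran to the door and barked at the mailman",
]


def always_tea(context: str) -> str:
    """Trivial stand-in for a model: ignores the context entirely."""
    return "tea"


accuracy = last_word_accuracy(toy_passages, always_tea)
print(accuracy)
```

A real evaluation would replace `always_tea` with a model that conditions on the full context, since the passages are selected so that the final sentence alone is not enough to guess the target.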
Provided by:
CIMeC - Center for Mind/Brain Sciences
Created:
2016-06-20



