ePiC

Source: arXiv. Updated 2022-05-17. Indexed 2024-06-21.
Download link:
https://epic-benchmark.github.io
Description:
ePiC is a high-quality crowdsourced dataset designed to evaluate abstract language understanding. It contains 2,500 narratives paired with English proverbs, with 10 narratives per proverb (250 proverbs in total). The annotation keeps lexical overlap between each proverb and its paired narratives minimal, so models cannot rely on surface-level matching. To build the dataset, candidate proverbs were scraped from websites, and diverse narratives for each proverb were collected through Amazon Mechanical Turk. Applications include proverb recommendation, narrative generation, and identifying thematically similar narratives, with the goal of advancing natural language processing systems that are more socially grounded.
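The "minimal lexical overlap" property can be made concrete with a simple word-level measure. The sketch below is an illustration only, not the metric used by the dataset's authors: it assumes a crude regex tokenizer and Jaccard similarity over word sets, and the helper names (`tokens`, `jaccard_overlap`) and the example narrative are made up for this illustration rather than taken from ePiC.

```python
import re


def tokens(text: str) -> set[str]:
    """Return the set of lowercase word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard_overlap(proverb: str, narrative: str) -> float:
    """Jaccard similarity between the word sets of a proverb and a narrative."""
    a, b = tokens(proverb), tokens(narrative)
    if not a and not b:
        return 0.0  # avoid division by zero when both texts are empty
    return len(a & b) / len(a | b)


# Hypothetical pair: the narrative illustrates the proverb without reusing its words.
proverb = "A stitch in time saves nine"
narrative = "Maya fixed the small leak right away, so the ceiling never collapsed."
score = jaccard_overlap(proverb, narrative)
print(f"lexical overlap: {score:.2f}")
```

A score near zero means the pair shares almost no surface vocabulary, so a model has to match the narrative to the proverb by meaning rather than by shared words.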
Provided by:
University of North Carolina at Chapel Hill
Created:
2021-09-15