RISeC Dataset
Source: paperswithcode.com, indexed 2025-01-21
Download link:
https://paperswithcode.com/dataset/risec
Description:
We propose a newly annotated dataset for information extraction on recipes. Unlike previous approaches to machine comprehension of procedural texts, we avoid defining a priori the domain-specific predicates to recognize (e.g., the primitive instructions in MILK) and focus on a basic understanding of the expressed semantics rather than directly reducing them to a simplified state representation.
We thus frame the semantic comprehension of procedural text, such as recipes, as fairly generic NLP subtasks, covering (i) entity recognition (ingredients, tools and actions), (ii) relation extraction (what ingredients and tools are involved in the actions), and (iii) zero anaphora resolution (link actions to implicit arguments, e.g., results from previous recipe steps).
Further, our Recipe Instruction Semantic Corpus (RISeC) dataset includes textual descriptions for the zero anaphora, to facilitate language generation thereof. Besides the dataset itself, we contribute a pipeline neural architecture that addresses entity and relation extraction as well as identification of zero anaphora.
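To make the three annotation layers concrete, here is a minimal Python sketch of how entities, relations, and zero anaphora for a two-step recipe could be represented together. It is an illustration only, not the dataset's actual file format: the class names, label strings (`ACTION`, `INGREDIENT`, `TOOL`), role names, and the example recipe are all assumptions made for this sketch.

```python
from dataclasses import dataclass

# Hypothetical two-step recipe; step 1 has an implicit object
# ("Pour [the whisked eggs] into a hot pan").
STEPS = ["Whisk the eggs in a bowl.", "Pour into a hot pan."]


@dataclass(frozen=True)
class Entity:
    id: str
    step: int    # index into STEPS
    label: str   # assumed label set: ACTION, INGREDIENT, TOOL
    start: int   # character offsets within the step text
    end: int


@dataclass(frozen=True)
class Relation:
    action_id: str
    argument_id: str
    role: str    # assumed role names: "ingredient", "tool"


@dataclass(frozen=True)
class ZeroAnaphora:
    action_id: str
    antecedent_step: int  # earlier step whose result is the implicit argument
    description: str      # textual description of the implicit argument


def mark(entity_id, step, surface, label):
    """Build an Entity by locating `surface` in the given step's text."""
    start = STEPS[step].index(surface)
    return Entity(entity_id, step, label, start, start + len(surface))


def surface(entity):
    """Return the text span that an entity covers."""
    return STEPS[entity.step][entity.start:entity.end]


ENTITIES = [
    mark("e1", 0, "Whisk", "ACTION"),
    mark("e2", 0, "eggs", "INGREDIENT"),
    mark("e3", 0, "bowl", "TOOL"),
    mark("e4", 1, "Pour", "ACTION"),
    mark("e5", 1, "pan", "TOOL"),
]
RELATIONS = [
    Relation("e1", "e2", "ingredient"),
    Relation("e1", "e3", "tool"),
    Relation("e4", "e5", "tool"),
]
ZERO_ANAPHORA = [ZeroAnaphora("e4", 0, "the whisked eggs")]


def arguments_of(action_id):
    """Collect explicit arguments, then implicit ones, for an action."""
    by_id = {e.id: e for e in ENTITIES}
    explicit = [(r.role, surface(by_id[r.argument_id]))
                for r in RELATIONS if r.action_id == action_id]
    implicit = [("implicit", z.description)
                for z in ZERO_ANAPHORA if z.action_id == action_id]
    return explicit + implicit


if __name__ == "__main__":
    for entity in ENTITIES:
        if entity.label == "ACTION":
            print(surface(entity), arguments_of(entity.id))
```

In this sketch, the action "Pour" has only a tool as an explicit argument; the ingredient being poured is recovered through the zero anaphora link to step 0, and the attached description is the kind of text that could support language generation.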
Provided by:
Papers with Code



