DEFT corpus
Dataset overview
Dataset name
DEFT corpus
Dataset description
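The DEFT corpus is a large, expert-annotated corpus designed for complex definition extraction tasks. It is associated with SemEval 2020 Task 6 (DeftEval). The training and development data have been released; the test data will be made available after the SemEval evaluation period ends on February 2, 2020. The data are drawn from the corresponding textbooks at https://cnx.org.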
Dataset version updates
The latest version was updated on September 4, 2019.
Data format
The data use a format similar to CoNLL 2003, with one token per line and the following columns:
TOKEN TXT_SOURCE_FILE START_CHAR END_CHAR TAG TAG_ID ROOT_ID RELATION
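To illustrate how the eight columns above might be read, here is a minimal Python sketch. It is not the official loader. It assumes that fields are tab-separated, that blank lines separate sentences (as in CoNLL 2003), and that START_CHAR and END_CHAR are integer offsets. The names `DeftToken` and `parse_deft` are hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple


class DeftToken(NamedTuple):
    """One row of a DEFT-style file (hypothetical container for illustration)."""
    token: str
    source_file: str
    start_char: int
    end_char: int
    tag: str
    tag_id: str
    root_id: str
    relation: str


def parse_deft(text: str) -> List[List[DeftToken]]:
    """Parse DEFT-style text into a list of sentences, each a list of tokens.

    Assumptions (not confirmed by the dataset description):
    fields are tab-separated and blank lines separate sentences.
    """
    sentences: List[List[DeftToken]] = []
    current: List[DeftToken] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 8:
            raise ValueError(
                f"line {line_no}: expected 8 columns, got {len(fields)}"
            )
        token, source, start, end, tag, tag_id, root_id, relation = fields
        current.append(
            DeftToken(token, source, int(start), int(end),
                      tag, tag_id, root_id, relation)
        )
    # Flush the last sentence when the text does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences
```

Calling `parse_deft(open(path, encoding="utf-8").read())` on a file from the corpus would then give one list of `DeftToken` rows per sentence, provided the separator assumptions hold for that file.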
License information
The dataset is released under the CC BY-NC-SA 4.0 license. For commercial use, contact the authors.
Citation information
If you use this dataset in a publication, please cite the following paper:
@inproceedings{spala-etal-2019-deft,
    title = "{DEFT}: A corpus for definition extraction in free- and semi-structured text",
    author = "Spala, Sasha and Miller, Nicholas A. and Yang, Yiming and Dernoncourt, Franck and Dockhorn, Carl",
    booktitle = "Proceedings of the 13th Linguistic Annotation Workshop",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W19-4015",
    pages = "124--131",
    abstract = "Definition extraction has been a popular topic in NLP research for well more than a decade, but has been historically limited to well-defined, structured, and narrow conditions. In reality, natural language is messy, and messy data requires both complex solutions and data that reflects that reality. In this paper, we present a robust English corpus and annotation schema that allows us to explore the less straightforward examples of term-definition structures in free and semi-structured text.",
}




