Extended data set for training the Stanford coreference component

Source: NIAID Data Ecosystem, indexed 2026-03-13

Download link:

https://data.mendeley.com/datasets/y8vnr5n7mk


Description:

The extended data set for training the Stanford coreference component is merged from the following open source data sets:

CoNLL-2012: https://cemantix.org/conll/2012/data.html
GUM: https://github.com/amir-zeldes/gum/tree/master/dep
WikiCoref: http://rali.iro.umontreal.ca/rali/?q=en/wikicoref
Phrase Detectives Corpus 2.1.4: https://github.com/dali-ambiguity/Phrase-Detectives-Corpus-2.1.4
Emailcoref: https://github.com/paragdakle/emailcoref
NP4E: http://clg.wlv.ac.uk/projects/NP4E/

The layout of the archive is as follows. The top level is divided into the directories development, test and train, which contain the development, test and training sets for the coreference component. Each of these directories is divided into data (containing the English branch of CoNLL), detectCorp (for Phrase Detectives Corpus 2.1.4), email (for Emailcoref), gum, np4e and wikipedia (for WikiCoref).

The whole training process is documented at https://github.com/clarkkev/deep-coref

While being merged, these data sets were reviewed, analysed and prepared for this research in accordance with the GDPR (and with the Lithuanian and German law related to the GDPR and the ethics requirements of the EU).
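The archive layout described above can be checked after unpacking with a short script. The following is a minimal sketch, not part of the data set: the directory names are taken from the description as written (including the spelling detectCorp), while the function names and the root path are hypothetical.

```python
from pathlib import Path

# Top-level splits and per-corpus subdirectories, as named in the description.
SPLITS = ("train", "development", "test")
CORPORA = ("data", "detectCorp", "email", "gum", "np4e", "wikipedia")


def expected_dirs(root):
    """Return every split/corpus directory the description says should exist."""
    root = Path(root)
    return [root / split / corpus for split in SPLITS for corpus in CORPORA]


def missing_dirs(root):
    """Return the expected directories that are absent under root."""
    return [path for path in expected_dirs(root) if not path.is_dir()]
```

Calling missing_dirs("path/to/unpacked/archive") with the real location of the unpacked archive returns an empty list when all 18 directories are present. If the actual archive uses different names or casing, the two tuples at the top need adjusting.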

Created:

2022-06-27
