Extended data set for training the Stanford coreference component
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://data.mendeley.com/datasets/y8vnr5n7mk
Description:
The extended data set for training the Stanford coreference component is merged from the following open-source data sets:
CoNLL-2012 https://cemantix.org/conll/2012/data.html
GUM https://github.com/amir-zeldes/gum/tree/master/dep
WikiCoref http://rali.iro.umontreal.ca/rali/?q=en/wikicoref
Phrase Detectives Corpus 2.1.4 https://github.com/dali-ambiguity/Phrase-Detectives-Corpus-2.1.4
Emailcoref https://github.com/paragdakle/emailcoref
NP4E http://clg.wlv.ac.uk/projects/NP4E/
The layout of the archive is as follows:
The top level of the archive is divided into the directories development, test and train, which contain the development, test and training sets, respectively, for training the coreference component.
Each of these directories is divided into data (containing the English branch of CoNLL), detectCorp (for Phrase Detectives Corpus 2.1.4), email (for Emailcoref), gum, np4e and wikipedia (for WikiCoref).
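The layout described above can be checked after unpacking with a short script. The following is a minimal sketch, assuming the directory names are spelled exactly as in this description and that each corpus directory sits directly under its split directory; the helper names expected_dirs and missing_dirs are hypothetical and are not part of the data set or of any Stanford tooling.

```python
from pathlib import Path

# Split directories at the top level of the archive (names taken from the description).
SPLITS = ("train", "development", "test")

# Per-corpus subdirectories expected inside every split (names taken from the description).
SOURCES = ("data", "detectCorp", "email", "gum", "np4e", "wikipedia")


def expected_dirs(root):
    """Return every split/corpus directory the description says should exist under root."""
    root = Path(root)
    return [root / split / source for split in SPLITS for source in SOURCES]


def missing_dirs(root):
    """Return the expected directories that are absent under root (empty list if complete)."""
    return [d for d in expected_dirs(root) if not d.is_dir()]
```

Calling missing_dirs on the folder where the archive was unpacked returns an empty list when all 18 split/corpus directories are present, and otherwise lists the ones that are absent.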
The whole training process is documented at https://github.com/clarkkev/deep-coref
During the merge, these data sets were reviewed, analysed and prepared for this research in accordance with the GDPR (and in accordance with the Lithuanian and German laws related to the GDPR and with the ethics requirements of the EU).
Created:
2022-06-27



