
Extended data set for training the Stanford coreference component

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://data.mendeley.com/datasets/y8vnr5n7mk
Description:
The extended data set for training the Stanford coreference component is merged from the following open source data sets:

CoNLL-2012: https://cemantix.org/conll/2012/data.html
GUM: https://github.com/amir-zeldes/gum/tree/master/dep
WikiCoref: http://rali.iro.umontreal.ca/rali/?q=en/wikicoref
Phrase Detectives Corpus 2.1.4: https://github.com/dali-ambiguity/Phrase-Detectives-Corpus-2.1.4
Emailcoref: https://github.com/paragdakle/emailcoref
NP4E: http://clg.wlv.ac.uk/projects/NP4E/

The layout of the archive is as follows. The top level is divided into the directories development, test and train, which contain the development, test and training sets, respectively, for training the coreference component. Each of these directories is divided into data (containing the English branch of CoNLL), detectCorp (for Phrase Detectives Corpus 2.1.4), email (for Emailcoref), gum, np4e and wikipedia (for WikiCoref).

The whole training process is documented at https://github.com/clarkkev/deep-coref

While being merged, these datasets were reviewed, analysed and prepared for this research in accordance with the GDPR (and with the Lithuanian and German laws related to the GDPR and to the ethics requirements of the EU).
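
The directory layout described above can be checked after unpacking the archive. The following is a minimal sketch, not part of the dataset itself: the function name count_files and the idea of counting files are illustrative assumptions, and no particular file format or extension is assumed. Only the directory names come from the description.

```python
from pathlib import Path

# Directory names taken from the dataset description.
SPLITS = ("train", "development", "test")
CORPORA = {
    "data": "CoNLL-2012 (English branch)",
    "detectCorp": "Phrase Detectives Corpus 2.1.4",
    "email": "Emailcoref",
    "gum": "GUM",
    "np4e": "NP4E",
    "wikipedia": "WikiCoref",
}


def count_files(root):
    """Count regular files under each <split>/<corpus> directory.

    Hypothetical helper: returns a dict keyed by (split, corpus).
    The value is the number of files found recursively, or None
    if the expected directory does not exist under root.
    """
    root = Path(root)
    counts = {}
    for split in SPLITS:
        for corpus in CORPORA:
            directory = root / split / corpus
            if directory.is_dir():
                counts[(split, corpus)] = sum(
                    1 for path in directory.rglob("*") if path.is_file()
                )
            else:
                counts[(split, corpus)] = None
    return counts
```

Calling count_files on the unpacked archive root returns one entry per split and corpus pair (18 in total); a value of None marks a directory that is missing from the local copy.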
Created:
2022-06-27