Chinese Event Extraction Dataset (DuEE 1.0)
Source: arXiv. Indexed 2025-09-30.
Download link:
https://aistudio.baidu.com/aistudio/competition/detail/32/0/introduction
Resource overview:
DuEE 1.0 is a Chinese event extraction dataset curated from Baidu Hot Search rankings, so its content reflects the varied interests of a broad Chinese audience. It defines 65 event types and contains 17,000 sentences in total: a training set of 12,000, a development set of 1,500, and two test sets that together hold 3,500. Before running experiments, the dataset must be preprocessed with Stanford NLP tools for named entity recognition (NER), part-of-speech (POS) tagging, and dependency parsing (DP).
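The split sizes and the preprocessing step described above can be sketched as follows. This is a minimal illustration, not the dataset authors' pipeline: the description only says "Stanford NLP tools", so the use of the Stanza library here is an assumption, and the names `SPLITS`, `split_fractions`, `build_pipeline`, and `annotate` are hypothetical helpers introduced for this example. Stanza is imported lazily so the split check runs even when the library and its Chinese models are not installed.

```python
# Split sizes as stated in the dataset description (sentences per split).
SPLITS = {"train": 12_000, "dev": 1_500, "test": 3_500}


def split_fractions(splits):
    """Return each split's share of the total number of sentences."""
    total = sum(splits.values())
    return {name: size / total for name, size in splits.items()}


def build_pipeline():
    """Build a Chinese annotation pipeline.

    Assumption: Stanza stands in for the unspecified "Stanford NLP tools".
    The models must be fetched once beforehand with stanza.download("zh").
    """
    import stanza  # lazy import: only needed for the preprocessing step

    return stanza.Pipeline(
        lang="zh",
        processors="tokenize,pos,lemma,depparse,ner",
    )


def annotate(nlp, text):
    """Collect POS tags, dependency arcs, and named entities for one text."""
    doc = nlp(text)
    tokens = [
        {
            "text": word.text,
            "upos": word.upos,      # part-of-speech tag
            "head": word.head,      # index of the head word within its sentence (0 = root)
            "deprel": word.deprel,  # dependency relation to the head
        }
        for sentence in doc.sentences
        for word in sentence.words
    ]
    entities = [{"text": ent.text, "type": ent.type} for ent in doc.ents]
    return {"tokens": tokens, "entities": entities}


if __name__ == "__main__":
    # Sanity check that the stated splits add up to the stated total.
    print("total sentences:", sum(SPLITS.values()))
    for name, share in split_fractions(SPLITS).items():
        print(f"{name}: {share:.1%}")
```

To annotate the data, one would call `nlp = build_pipeline()` once and then `annotate(nlp, sentence)` for each sentence. Building the pipeline once matters because model loading is far slower than annotating a single sentence.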
Provided by:
Language and Intelligent Technology Competition