
DuEE

OpenDataLab · Updated 2026-05-17 · Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/DuEE
Resource description:
DuEE is a large-scale, general-purpose Chinese dataset for event extraction. It consists of about 17,000 sentences containing about 20,000 event instances across 65 event types, together with the corresponding manually annotated arguments. The event types were selected from the Baidu Hot Search board. They include types common in traditional event extraction evaluations, such as "marriage", "resignation", and "earthquake", as well as types with a distinctly contemporary character, such as "like". The sentences come from Baijiahao news and are written in a freer style than traditional formal news, which makes extraction more challenging. Of the roughly 17,000 sentences, about 12,000 are in the training set, 1,500 in the development set, and 3,500 in the test set. The training and development sets are intended for model training and can be downloaded freely. The test set is divided into two parts: Test1 is used by participants for self-evaluation on the competition platform, while Test2 is released one week before the end of the competition and serves as the test data for the final ranking.
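The split figures quoted in the description can be checked with a few lines of arithmetic. The sketch below is only an illustration based on the rounded numbers given on this page; the names `SPLITS`, `EVENT_INSTANCES`, and `split_shares` are hypothetical, and the actual file sizes in the download may differ slightly.

```python
# Rounded figures taken from the description above (not exact file counts).
SPLITS = {"train": 12_000, "dev": 1_500, "test": 3_500}
EVENT_INSTANCES = 20_000  # approximate number of annotated events


def split_shares(splits):
    """Return each split's share of the total sentence count as a fraction."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total_sentences = sum(SPLITS.values())
shares = split_shares(SPLITS)
events_per_sentence = EVENT_INSTANCES / total_sentences

for name, share in shares.items():
    print(f"{name}: {SPLITS[name]} sentences ({share:.1%})")
print(f"total: {total_sentences} sentences")
print(f"events per sentence (approx.): {events_per_sentence:.2f}")
```

Under these rounded figures the three splits add up to the stated 17,000 sentences, and each sentence carries slightly more than one event on average.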
Provider:
OpenDataLab
Created:
2023-03-30
Collected summary
Dataset introduction
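Background and challenges
Background overview
DuEE is a large-scale, general-purpose Chinese dataset for event extraction, containing 17,000 sentences and 65 event types. The event types were selected from the Baidu Hot Search board and cover both traditional types and types characteristic of the current era. The sentences come from Baijiahao news and are written in a free-form style, which makes event extraction more challenging. The dataset is divided into training, development, and test sets and is suited to training and evaluating event extraction models.
The content above was collected and summarized by 遇见数据集.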