MINION
Source: arXiv; updated 2022-11-18; indexed 2024-08-06
Download link:
http://arxiv.org/abs/2211.05958v2
Resource description:
MINION is a large-scale multilingual event detection dataset created by the Department of Computer and Information Science at the University of Oregon. It focuses on identifying and classifying the trigger words of event mentions in text. The dataset covers 8 languages: English, Spanish, Portuguese, Polish, Turkish, Hindi, Japanese, and Korean, 5 of which are not supported by existing multilingual datasets. It is annotated over Wikipedia articles and inherits the annotation schema and guidelines of ACE 2005 to ensure data quality. MINION aims to address the challenges of event detection in multilingual settings and supports research on cross-lingual knowledge transfer and model generalization.
Provider:
Department of Computer and Information Science, University of Oregon
Created:
2022-11-11



