MultiTACRED

Name: MultiTACRED
Creator: German Research Center for Artificial Intelligence (DFKI)
Published: 2023-05-15 15:24:58
License: No description available

Source: arXiv. Updated: 2023-05-15. Indexed: 2024-08-06.

Download link:

http://arxiv.org/abs/2305.04582v2

Resource overview:

MultiTACRED is a multilingual version of the TACRED relation extraction dataset, created by the German Research Center for Artificial Intelligence (DFKI). It covers 12 languages from 9 language families, with a total of approximately 106,000 sentences. The dataset was built by machine-translating TACRED instances and automatically projecting their entity annotations onto the translations, with the aim of supporting research on multilingual relation extraction. It exhibits a range of linguistic phenomena, such as compounding, inflection, and pronoun dropping, and is suited to training multilingual models and to cross-lingual learning scenarios.
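The "translate, then project entity annotations" idea described above can be illustrated with a minimal sketch. It assumes a marker-based approach: the head and tail entity spans are wrapped in inline tags before translation, and the spans are read back from the translated text afterwards. The tag names (`<H>`, `<T>`) and the function names `mark_entities` and `project_entities` are hypothetical and chosen for illustration only; the translation step itself is left out, and this is not presented as the dataset authors' exact pipeline.

```python
import re

# Hypothetical marker tags (illustrative assumption, not taken from the source).
HEAD_OPEN, HEAD_CLOSE = "<H>", "</H>"
TAIL_OPEN, TAIL_CLOSE = "<T>", "</T>"


def mark_entities(tokens, head_span, tail_span):
    """Wrap the head and tail spans in marker tags.

    Spans are (start, end) token offsets with an exclusive end and are
    assumed not to overlap.
    """
    out = []
    for i, tok in enumerate(tokens):
        if i == head_span[0]:
            out.append(HEAD_OPEN)
        if i == tail_span[0]:
            out.append(TAIL_OPEN)
        out.append(tok)
        if i == head_span[1] - 1:
            out.append(HEAD_CLOSE)
        if i == tail_span[1] - 1:
            out.append(TAIL_CLOSE)
    return " ".join(out)


def project_entities(translated):
    """Recover tokens and entity spans from a marker-tagged translation.

    Returns (tokens, head_span, tail_span), or None when a marker is
    missing, duplicated, unbalanced, or encloses no tokens, so that the
    instance can be discarded.
    """
    # Split into marker tags, whitespace-delimited tokens, and stray "<".
    pieces = re.findall(r"</?[HT]>|[^\s<]+|<", translated)
    tokens, spans, open_at = [], {}, {}
    for piece in pieces:
        m = re.fullmatch(r"<(/?)([HT])>", piece)
        if m is None:
            tokens.append(piece)
            continue
        closing, kind = m.group(1) == "/", m.group(2)
        if not closing:
            if kind in open_at or kind in spans:
                return None  # duplicated opening marker
            open_at[kind] = len(tokens)
        else:
            if kind not in open_at:
                return None  # closing marker without an opening one
            start = open_at.pop(kind)
            if start == len(tokens):
                return None  # empty entity span
            spans[kind] = (start, len(tokens))
    if open_at or set(spans) != {"H", "T"}:
        return None
    return tokens, spans["H"], spans["T"]
```

In use, the output of `mark_entities` would be passed to a machine translation system, and `project_entities` would be applied to whatever comes back. Returning `None` on damaged markers is a design choice of this sketch: it drops instances whose annotations cannot be recovered rather than guessing at span boundaries.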

Provider:

German Research Center for Artificial Intelligence (DFKI)

Created:

2023-05-08

Collected summary

Dataset introduction