MultiTACRED
Source: arXiv | Updated: 2023-05-15 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2305.04582v2
Resource overview:
MultiTACRED is a multilingual version of the TACRED dataset, created by the German Research Center for Artificial Intelligence (DFKI). It covers 12 languages from 9 language families, with approximately 106,000 sentences in total. The dataset was built by machine-translating TACRED instances and automatically projecting their entity annotations onto the translations, with the goal of advancing research on multilingual relation extraction. It exhibits a range of linguistic phenomena, such as compounding, inflection, and pronoun dropping, and is suited to multilingual model training and cross-lingual transfer learning.
Provided by:
German Research Center for Artificial Intelligence (DFKI)
Created:
2023-05-08
Curated Summary
Dataset Introduction

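Background and Challenges
Background Overview
MultiTACRED is a multilingual extension of the TACRED dataset, developed by the German Research Center for Artificial Intelligence (DFKI). It covers 12 languages from 9 language families, with approximately 106,000 sentences in total. It was built through machine translation and automatic projection of entity annotations, and is intended to support research on multilingual relation extraction. It covers a range of linguistic phenomena and is suited to multilingual model training and cross-lingual learning.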
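As a rough illustration of what "machine translation plus automatic projection of entity annotations" can look like, the Python sketch below uses a marker-based approach: the head and tail entity spans are wrapped in inline tags before translation, and their token offsets are read back from the translated text. This is a minimal sketch under stated assumptions, not the dataset's actual pipeline. The `<H>`/`<T>` tag names, the function names, and whitespace tokenization are all hypothetical, and the translation step itself is left out (a hand-written German string stands in for the output of a translation system).

```python
import re
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive

# Hypothetical marker tags for the head (H) and tail (T) entity.
MARKER = re.compile(r"(</?[HT]>)")


def mark_entities(tokens: List[str], head: Span, tail: Span) -> str:
    """Wrap the head and tail spans in inline markers and join into one string.

    Assumes the two spans do not overlap.
    """
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i == head[0]:
            out.append("<H>")
        if i == tail[0]:
            out.append("<T>")
        out.append(tok)
        if i == head[1] - 1:
            out.append("</H>")
        if i == tail[1] - 1:
            out.append("</T>")
    return " ".join(out)


def project_entities(translated: str) -> Optional[Tuple[List[str], Span, Span]]:
    """Recover tokens and entity spans from a translated, marked-up string.

    Returns None when markers are missing, duplicated, or enclose no tokens,
    so that such instances can be filtered out.
    """
    tokens: List[str] = []
    starts: Dict[str, int] = {}
    spans: Dict[str, Span] = {}
    for piece in MARKER.split(translated):
        if MARKER.fullmatch(piece):
            tag = piece.strip("</>")
            if piece.startswith("</"):
                if tag not in starts or tag in spans:
                    return None
                spans[tag] = (starts[tag], len(tokens))
            else:
                if tag in starts:
                    return None
                starts[tag] = len(tokens)
        else:
            # Simplification: whitespace tokenization of the translated text.
            tokens.extend(piece.split())
    if set(spans) != {"H", "T"}:
        return None
    if any(start >= end for start, end in spans.values()):
        return None
    return tokens, spans["H"], spans["T"]


source = ["Bill", "Gates", "founded", "Microsoft", "."]
marked = mark_entities(source, head=(0, 2), tail=(3, 4))
# Stand-in for the output of a machine translation system (written by hand).
translated = "<T> Microsoft </T> wurde von <H> Bill Gates </H> gegründet ."
result = project_entities(translated)
```

Here `result` holds the German tokens together with the head span `(3, 5)` and the tail span `(0, 1)`, which shows how the entity order can change under translation. Returning `None` for damaged markers reflects one plausible design choice: discard instances whose annotations cannot be recovered rather than guess at span boundaries.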



