moha/ANETAC
Hugging Face | Updated 2025-10-06 | Indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/moha/ANETAC
Description:
ANETAC is an English-Arabic named entity transliteration and classification dataset that we built from freely available parallel translation corpora. The dataset contains 79,924 instances. Each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class: Person, Location, or Organization. ANETAC is aimed mainly at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification.
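
To make the (e, a, c) triplet structure concrete, here is a minimal Python sketch of how such instances might be parsed and counted per class. The on-disk layout is an assumption: the description does not say how the files are formatted, so the code assumes one tab-separated triplet per line with the class written out as Person, Location, or Organization. The names `Triplet`, `parse_triplets`, and `SAMPLE` are hypothetical, and the sample rows are illustrative only, not taken from the dataset.

```python
import csv
import io
from collections import Counter
from typing import NamedTuple


class Triplet(NamedTuple):
    english: str  # e: English named entity
    arabic: str   # a: Arabic transliteration
    label: str    # c: entity class


# Class names as given in the description; the actual files may use other codes.
VALID_LABELS = {"Person", "Location", "Organization"}


def parse_triplets(text: str) -> list[Triplet]:
    """Parse tab-separated (e, a, c) lines, skipping malformed rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    triplets = []
    for row in reader:
        if len(row) != 3:
            continue  # blank line or wrong number of fields
        e, a, c = (field.strip() for field in row)
        if c not in VALID_LABELS:
            continue  # unknown class label
        triplets.append(Triplet(e, a, c))
    return triplets


# Illustrative rows only (assumed format), plus one malformed line.
SAMPLE = (
    "London\tلندن\tLocation\n"
    "Mohamed\tمحمد\tPerson\n"
    "UNESCO\tاليونسكو\tOrganization\n"
    "broken line\n"
)

triplets = parse_triplets(SAMPLE)
label_counts = Counter(t.label for t in triplets)
print(label_counts)
```

For transliteration work, the `(english, arabic)` pairs would serve as source and target strings. For classification, either string could be paired with `label`.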
Provided by:
moha



