
moha/ANETAC

Hugging Face | Updated 2025-10-06 | Indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/moha/ANETAC
Description:
ANETAC is an English-Arabic named entity transliteration and classification dataset built from freely available parallel translation corpora. It contains 79,924 instances. Each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class: Person, Location, or Organization. The dataset is aimed mainly at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification.
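
The triplet structure (e, a, c) described above could be modeled roughly as follows. This is a minimal sketch, not the dataset's official loader: the tab-separated layout, the field order, the class label strings, and the names `Instance`, `EntityClass`, and `parse_line` are all assumptions made for illustration, and the two sample lines are illustrative rather than taken from the dataset. Check the actual files for the real format before relying on this.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class EntityClass(Enum):
    # The three classes named in the description; the string values are assumed.
    PERSON = "Person"
    LOCATION = "Location"
    ORGANIZATION = "Organization"


@dataclass(frozen=True)
class Instance:
    english: str               # e: the English named entity
    arabic: str                # a: its Arabic transliteration
    entity_class: EntityClass  # c: Person, Location, or Organization


def parse_line(line: str) -> Instance:
    """Parse one line, ASSUMING a tab-separated 'e<TAB>a<TAB>c' layout."""
    e, a, c = line.rstrip("\n").split("\t")
    return Instance(english=e, arabic=a, entity_class=EntityClass(c))


# Illustrative lines only, not copied from the dataset.
sample_lines = [
    "London\tلندن\tLocation",
    "Muhammad\tمحمد\tPerson",
]

instances = [parse_line(line) for line in sample_lines]

# Count instances per class, e.g. to inspect class balance for classification.
class_counts = Counter(inst.entity_class for inst in instances)
```

For transliteration work, the (english, arabic) pairs would serve as source and target strings; for classification, either string can be paired with `entity_class` as the label.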
Provider:
moha