moha/ANETAC

Name: moha/ANETAC
Creator: moha
Published: 2025-10-06 08:27:17
License: No description available

Hugging Face, updated 2025-10-06, indexed 2025-10-25

Download link:

https://hf-mirror.com/datasets/moha/ANETAC

Overview:

ANETAC is an English-Arabic named entity transliteration and classification dataset that we built from freely available parallel translation corpora. The dataset contains 79,924 instances. Each instance is a triplet (e, a, c), where e is the English named entity, a is its Arabic transliteration, and c is its class, which is one of Person, Location, or Organization. ANETAC is aimed mainly at researchers working on Arabic named entity transliteration, but it can also be used for named entity classification.
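The triplet structure described above can be sketched in Python as follows. This is only an illustration: the listing does not state the on-disk file format or column names, so the `Instance` class, the `EntityClass` labels, and the helper functions are hypothetical, and the three sample rows are made up for the example rather than copied from ANETAC.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class EntityClass(Enum):
    # The three classes named in the description; the label strings
    # actually used in the dataset files are an assumption.
    PERSON = "Person"
    LOCATION = "Location"
    ORGANIZATION = "Organization"


@dataclass(frozen=True)
class Instance:
    e: str          # English named entity
    a: str          # Arabic transliteration of e
    c: EntityClass  # class of the entity


# Made-up rows for illustration only; not taken from ANETAC.
SAMPLE = [
    Instance("London", "لندن", EntityClass.LOCATION),
    Instance("Adam", "آدم", EntityClass.PERSON),
    Instance("Unesco", "يونسكو", EntityClass.ORGANIZATION),
]


def transliteration_pairs(instances):
    """Return (e, a) pairs: the view needed for transliteration work."""
    return [(x.e, x.a) for x in instances]


def classification_pairs(instances):
    """Return (e, c) pairs: the view needed for classification work."""
    return [(x.e, x.c) for x in instances]


def class_counts(instances):
    """Count how many instances fall into each class."""
    return Counter(x.c for x in instances)
```

Keeping all three fields on one record lets the same data serve both uses mentioned in the description: dropping c gives transliteration pairs, and dropping a gives classification pairs.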

Provided by:

moha
