ANETAC

Name: ANETAC
Creator: Department of Computer Science, University of Algiers
Published: 2019-07-06 18:37:18
License: Not specified

Source: arXiv (2019-07-06); updated 2024-06-21

Download link:

https://github.com/MohamedHadjAmeur/ANETAC

Description:

ANETAC is an English-Arabic named entity transliteration and classification dataset developed by the Department of Computer Science, University of Algiers. It contains 79,924 instances, each consisting of an English named entity, its Arabic transliteration, and its class (person, location, or organization). The dataset was built from publicly available parallel translation corpora to support research on Arabic named entity transliteration, and it can also be used for named entity classification. It is split into training, development, and test sets and can be used to train machine translation models and to handle proper nouns in tasks such as cross-lingual information retrieval.
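The description implies a simple record structure: one English entity, one Arabic transliteration, and one of three class labels per instance. The sketch below shows how such records might be parsed and their class distribution counted. It assumes a hypothetical tab-separated layout (`English<TAB>Arabic<TAB>Category`); the actual file names and column format in the ANETAC repository have not been checked here, and the sample rows are made up for illustration, not taken from the dataset. `Instance`, `parse_instances`, and `class_distribution` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple

# Label set taken from the dataset description; the lowercase spelling is an assumption.
CATEGORIES = {"person", "location", "organization"}


class Instance(NamedTuple):
    english: str   # English named entity
    arabic: str    # its Arabic transliteration
    category: str  # normalized (lowercased) class label


def parse_instances(lines: Iterable[str]) -> List[Instance]:
    """Parse hypothetical tab-separated lines: English<TAB>Arabic<TAB>Category."""
    instances = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        english, arabic, category = line.split("\t")
        category = category.strip().lower()
        if category not in CATEGORIES:
            raise ValueError(f"unexpected category: {category!r}")
        instances.append(Instance(english.strip(), arabic.strip(), category))
    return instances


def class_distribution(instances: Iterable[Instance]) -> Counter:
    """Count how many instances fall into each named entity class."""
    return Counter(inst.category for inst in instances)


if __name__ == "__main__":
    # Made-up sample rows, not taken from ANETAC.
    sample = [
        "London\tلندن\tLocation",
        "Obama\tأوباما\tPerson",
        "",
        "UNESCO\tاليونسكو\tOrganization",
    ]
    data = parse_instances(sample)
    print(len(data))  # 3
    print(class_distribution(data))
```

Raising on an unknown label, rather than silently skipping it, makes any mismatch between the assumed format and the real files surface immediately.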

Provider:

Department of Computer Science, University of Algiers

Created:

2019-07-06
