ANETAC

Source: arXiv | Updated: 2019-07-06 | Indexed: 2024-06-21
Download link:
https://github.com/MohamedHadjAmeur/ANETAC
Overview:
ANETAC is an English-Arabic named entity transliteration and classification dataset developed by the Department of Computer Science at the University of Algiers. It contains 79,924 instances, each consisting of an English named entity, its Arabic transliteration, and its class (person, location, or organization). The dataset was built from publicly available parallel translation corpora to support research on Arabic named entity transliteration, and it can also be used for named entity classification. It is split into training, development, and test sets, and is suited to tasks such as training machine translation models and handling proper nouns in cross-lingual information retrieval.
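
The instance structure described above (an English named entity, its Arabic transliteration, and a class) could be represented roughly as follows. This is only a minimal sketch: the tab-separated layout, the column order, the label strings `PER`/`LOC`/`ORG`, and the names `Instance` and `parse_instances` are assumptions made for illustration and are not taken from the dataset's documentation. The sample rows are invented and do not come from ANETAC.

```python
import csv
import io
from collections import Counter
from typing import List, NamedTuple


class Instance(NamedTuple):
    """One instance: English entity, Arabic transliteration, class label."""
    english: str
    arabic: str
    label: str  # assumed label strings: "PER", "LOC", "ORG"


# Invented rows in an ASSUMED layout: label <TAB> English <TAB> Arabic.
# Check the files in the repository for the real format before reusing this.
SAMPLE = (
    "PER\tMohamed\tمحمد\n"
    "LOC\tLondon\tلندن\n"
    "ORG\tMicrosoft\tمايكروسوفت\n"
)


def parse_instances(text: str) -> List[Instance]:
    """Parse tab-separated rows into Instance records, skipping malformed rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        Instance(english=row[1], arabic=row[2], label=row[0])
        for row in reader
        if len(row) == 3
    ]


instances = parse_instances(SAMPLE)
# Count how many instances fall into each class.
label_counts = Counter(inst.label for inst in instances)
```

With real data, the same function could be applied to the contents of each split (training, development, test) once the actual column order is confirmed.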
Provider:
Department of Computer Science, University of Algiers
Created:
2019-07-06