projecte-aina/ES-AN_Parallel_Corpus
Source: Hugging Face | Updated: 2025-07-02 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/projecte-aina/ES-AN_Parallel_Corpus
Description:
The ES-AN Parallel Corpus is a Spanish-Aragonese dataset created to support under-resourced languages of Spain, such as Aragonese, in NLP tasks, particularly machine translation. It was developed by the Language Technologies Unit at the Barcelona Supercomputing Center and consists mainly of synthetic data generated with the Apertium rule-based translator. The dataset comprises two separate text files, one in Spanish and one in Aragonese, plus a Parquet file containing the parallel text in both languages. It provides only a training split and is suitable for training bilingual Spanish-Aragonese machine translation models as well as multilingual machine translation models.
Provider:
projecte-aina
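
Usage sketch:
The description mentions two separate text files, one per language. Assuming those files are line-aligned (line N of the Spanish file corresponds to line N of the Aragonese file, which is the usual convention but is not stated above), they could be paired into translation examples roughly as follows. The function name `read_parallel` and the `es`/`an` keys are illustrative choices, not part of the dataset.

```python
from itertools import zip_longest
from typing import Dict, Iterator

# Sentinel used to detect files with different line counts.
_MISSING = object()


def read_parallel(es_path: str, an_path: str) -> Iterator[Dict[str, str]]:
    """Yield {"es": ..., "an": ...} pairs from two line-aligned text files.

    Assumption: both files are UTF-8 and have one sentence per line.
    Pairs in which either side is empty after stripping are skipped.
    """
    with open(es_path, encoding="utf-8") as es_file, \
         open(an_path, encoding="utf-8") as an_file:
        lines = zip_longest(es_file, an_file, fillvalue=_MISSING)
        for line_no, (es_line, an_line) in enumerate(lines, start=1):
            if es_line is _MISSING or an_line is _MISSING:
                # One file ran out before the other: alignment is broken.
                raise ValueError(
                    f"files are not line-aligned (mismatch at line {line_no})"
                )
            es_text, an_text = es_line.strip(), an_line.strip()
            if es_text and an_text:
                yield {"es": es_text, "an": an_text}


# Hypothetical usage; the actual file names in the repository may differ:
# pairs = list(read_parallel("train.es.txt", "train.an.txt"))
```

Alternatively, the Parquet file should be loadable with the Hugging Face `datasets` library via `load_dataset("projecte-aina/ES-AN_Parallel_Corpus", split="train")`. The column names are not given in this listing, so inspect them before building a training pipeline.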



