gplsi/CA-VA_alignment_test

Name: gplsi/CA-VA_alignment_test
Creator: gplsi
Published: 2025-12-19 13:06:12
License: CC-BY-4.0

Source: Hugging Face. Updated: 2025-12-19. Indexed: 2024-06-12.

Download link:

https://hf-mirror.com/datasets/gplsi/CA-VA_alignment_test


Description:

This dataset was built from 200,000 Spanish sentences extracted from Common Voice, an open resource that collects text contributions in many languages. The sentences were rigorously filtered to keep only those with the greatest linguistic richness, so that the data remains useful for applications that require linguistic diversity and complexity. An expert in Catalan philology then translated the selected sentences from Spanish into Catalan and Valencian, ensuring the linguistic quality and cultural accuracy of both versions. The dataset aims to promote machine translation between Valencian and Catalan, support research in multilingual NLP, and facilitate the development of translation systems for this language pair. The test set included in the Phrases task has 1,960 examples, each with a unique id, a Catalan translation, and a Valencian adaptation.
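The structure described above (a unique id plus a Catalan and a Valencian text per example) lends itself to a quick sanity check before the data is used for evaluation. The following is a minimal sketch, not part of the dataset's own tooling: the column names `id`, `ca`, and `va` and both helper functions are assumptions made for illustration, since the description does not give the actual column names.

```python
from collections import Counter

# Assumed column names; the description only says each example has an id,
# a Catalan translation, and a Valencian adaptation.
ID_COL, CA_COL, VA_COL = "id", "ca", "va"


def check_rows(rows):
    """Return a list of human-readable problems found in the rows."""
    problems = []

    # Every id is expected to be present and to occur exactly once.
    id_counts = Counter(row.get(ID_COL) for row in rows)
    for row_id, count in id_counts.items():
        if row_id is None:
            problems.append(f"{count} row(s) missing an id")
        elif count > 1:
            problems.append(f"id {row_id!r} appears {count} times")

    # Both text columns are expected to hold non-empty strings.
    for row in rows:
        for col in (CA_COL, VA_COL):
            text = row.get(col)
            if not isinstance(text, str) or not text.strip():
                problems.append(f"id {row.get(ID_COL)!r}: empty {col!r} field")
    return problems


def identical_pairs(rows):
    """Count rows whose Catalan and Valencian texts are the same string."""
    return sum(1 for row in rows if row.get(CA_COL) == row.get(VA_COL))
```

Both helpers take any list of dictionaries, so they can be applied to rows obtained by whatever loading method is used. With the Hugging Face `datasets` library this would presumably be `load_dataset("gplsi/CA-VA_alignment_test", split="test")`, though the split, configuration, and column names should be checked against the dataset card first. On the full test set one would expect `check_rows` to return an empty list and the row count to be 1,960.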

Provider:

gplsi

Summary of original information

Dataset license information

License type: CC-BY-4.0
