HiTZ/EusParallel
Source: Hugging Face. Updated: 2024-10-30. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/HiTZ/EusParallel
Description:
EusParallel is a multi-parallel, document-level corpus in English, Spanish, and Basque. The Basque documents were written by humans; the English and Spanish texts were machine-translated from Basque using `meta-llama/Meta-Llama-3-70B-Instruct`. The corpus is intended for training high-quality machine translation models that translate documents from English and Spanish into Basque. The Basque documents were randomly sampled from HiTZ/latxa-corpus-v1.1, and each contains between 10 and 4096 tokens. The hyperparameters and prompt template used for translation are documented with the dataset. Translation ran on 8xA100 80GB GPUs with the vLLM inference engine.
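The document selection described above (a random sample of Basque documents within the 10 to 4096 token range) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `select_documents` and `count_whitespace_tokens` are hypothetical, the whitespace token counter is a stand-in because the tokenizer is not named here, and treating both bounds as inclusive is an assumption.

```python
import random
from typing import Callable, Iterable, List


def count_whitespace_tokens(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated items."""
    return len(text.split())


def select_documents(
    documents: Iterable[str],
    sample_size: int,
    count_tokens: Callable[[str], int] = count_whitespace_tokens,
    min_tokens: int = 10,      # lower bound from the description, assumed inclusive
    max_tokens: int = 4096,    # upper bound from the description, assumed inclusive
    seed: int = 0,
) -> List[str]:
    """Randomly sample documents whose token count lies in [min_tokens, max_tokens]."""
    # Keep only documents inside the allowed length range.
    eligible = [d for d in documents if min_tokens <= count_tokens(d) <= max_tokens]
    # Sample without replacement; never ask for more than is available.
    rng = random.Random(seed)
    return rng.sample(eligible, min(sample_size, len(eligible)))
```

In practice the `count_tokens` argument would be replaced by the real tokenizer's length function, which is why it is passed in rather than hard-coded.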
Provided by:
HiTZ



