xri/DhofariArabicNMT
Hugging Face. Updated 2025-02-27. Indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/xri/DhofariArabicNMT
Description:
DhofariNMT is a parallel dataset of 8,000 sentences in Modern Standard Arabic and Dhofari Arabic, a dialect of Arabic spoken in the Dhofar region of Oman. It is intended for fine-tuning Neural Machine Translation models and Large Language Models for Dhofari Arabic. The dataset was created and curated using a proprietary method developed by XRI Global to ensure coverage of a conceptual space during data collection. It is best suited to literary and narrative texts and is less effective in other domains, such as technical, scientific, or colloquial text.
Provided by:
xri



