Yorùbá Bible (Bíbélì Mímó ní Èdè Yorùbá Òde-Òní)
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/UBC-NLP/africaNLP2021
Description:
This dataset is a parallel corpus that pairs the Yoruba Bible with the English New International Version (NIV) Bible, aligned chapter by chapter. The text has been preprocessed with tokenization and Byte-Pair Encoding (BPE) to mitigate data sparsity, and it is split into training, validation, and test sets. It is intended for machine translation.
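For readers unfamiliar with Byte-Pair Encoding, the following is a minimal sketch of how BPE merges can be learned and applied. It is not the preprocessing pipeline used for this dataset, which is not specified here. The function names (`learn_bpe`, `merge_pair`, `apply_bpe`), the `</w>` end-of-word marker, and the toy word list are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a single word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (assumed for illustration, not taken from the dataset).
words = ["low", "low", "lower", "lowest"]
merges = learn_bpe(words, 2)
segmented = apply_bpe("lowly", merges)
```

Because frequent character sequences are merged into single symbols, a rare or unseen word such as "lowly" is split into known subword units instead of being mapped to an unknown token, which is how BPE helps with data sparsity. In practice an established tool would normally be used rather than a hand-written routine like this one.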
Provider:
Biblica



