DS4H-ICTU/bbj-en-translation
Hugging Face · Updated 2024-11-14 · Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/DS4H-ICTU/bbj-en-translation
Resource description:
---
language:
- bbj
- en
license: apache-2.0
tags:
- translation
datasets:
- ghomala
dataset_info:
  features:
  - name: translation
    struct:
    - name: bbj
      dtype: string
    - name: en
      dtype: string
  splits:
  - name: train
    num_bytes: 1666787.0848231267
    num_examples: 6309
  - name: validation
    num_bytes: 208447.45758843666
    num_examples: 789
  - name: test
    num_bytes: 208447.45758843666
    num_examples: 789
  download_size: 1238492
  dataset_size: 2083682.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
# Ghomala-en Translation Dataset
## Dataset Description
This is a parallel corpus for machine translation between Ghomala (bbj) and English (en).
It contains sentence pairs aligned from the Ghomala Bible text corpus.
- **Languages**: Ghomala (bbj) → English (en)
- **Dataset Type**: Parallel Corpus
- **Size**: 7,887 sentence pairs
- **Source**: Ghomala Bible text corpus
- **License**: Apache 2.0
## Dataset Structure
- Format: Parallel text pairs
- Fields:
  - `translation`: a struct holding one sentence pair
    - `bbj`: Ghomala text
    - `en`: English translation
- Splits:
  - Train: 80% (6,309 pairs)
  - Validation: 10% (789 pairs)
  - Test: 10% (789 pairs)
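As a quick consistency check, the split sizes recorded in the YAML metadata (`num_examples`) add up to the stated total and reproduce the 80/10/10 proportions. A minimal Python sketch; the dictionary simply copies those three counts from the metadata header:

```python
# Split sizes as recorded in the dataset card's YAML metadata (num_examples).
split_sizes = {"train": 6309, "validation": 789, "test": 789}

# Total number of sentence pairs across all splits: 6309 + 789 + 789.
total = sum(split_sizes.values())

# Share of the corpus in each split, as a percentage rounded to one decimal.
percentages = {name: round(100 * n / total, 1) for name, n in split_sizes.items()}

print(total)        # 7887
print(percentages)  # {'train': 80.0, 'validation': 10.0, 'test': 10.0}
```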
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("DS4H-ICTU/bbj-en-translation")
```
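Each row stores its sentence pair in a single nested `translation` field with the keys `bbj` and `en` (per the metadata header). Many training scripts expect flat source/target columns instead, so here is a minimal sketch of that conversion. The function name `flatten_translation` and the output column names `source_text` and `target_text` are hypothetical choices for illustration, not part of the dataset, and the sample record is a placeholder rather than corpus text.

```python
from typing import Any, Dict


def flatten_translation(example: Dict[str, Any]) -> Dict[str, str]:
    """Turn {"translation": {"bbj": ..., "en": ...}} into flat source/target fields.

    The output column names are hypothetical; rename them to whatever your
    training script expects.
    """
    pair = example["translation"]
    return {"source_text": pair["bbj"], "target_text": pair["en"]}


# Placeholder record with the same shape as a dataset row (not real corpus text).
sample = {"translation": {"bbj": "<Ghomala sentence>", "en": "<English sentence>"}}
print(flatten_translation(sample))
```

Assuming `dataset` is the `DatasetDict` returned by `load_dataset` above, `dataset.map(flatten_translation, remove_columns=["translation"])` should apply the function to every split and drop the nested column. For the reverse direction (English to Ghomala), swap the two keys.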
## Citation
```bibtex
@misc{ghomala_en_translation,
  title     = {Ghomala-en Translation Dataset},
  author    = {NDE HURICH DILAN},
  year      = {2024},
  publisher = {Hugging Face}
}
```
## Contact
For more information, contact ndedilan504@gmail.com.
Provided by:
DS4H-ICTU



