DS4H-ICTU/bbj-en-translation

Hugging Face | Updated 2024-11-14 | Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/DS4H-ICTU/bbj-en-translation
Resource description:
---
language:
- bbj
- en
license: apache-2.0
tags:
- translation
datasets:
- ghomala
dataset_info:
  features:
  - name: translation
    struct:
    - name: bbj
      dtype: string
    - name: en
      dtype: string
  splits:
  - name: train
    num_bytes: 1666787.0848231267
    num_examples: 6309
  - name: validation
    num_bytes: 208447.45758843666
    num_examples: 789
  - name: test
    num_bytes: 208447.45758843666
    num_examples: 789
  download_size: 1238492
  dataset_size: 2083682.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---

# Ghomala-en Translation Dataset

## Dataset Description

This is a parallel corpus for machine translation between Ghomala and English. The dataset contains aligned sentences from the Ghomala Bible text corpus.

- **Languages**: Ghomala (bbj) → English (en)
- **Dataset Type**: Parallel Corpus
- **Size**: 7887 parallel sentences
- **Source**: Ghomala Bible text corpus
- **License**: Apache 2.0

## Dataset Structure

- Format: Parallel text pairs
- Fields (nested under the `translation` struct declared in the metadata above):
  - `bbj`: Ghomala text (source)
  - `en`: English translation (target)
- Splits:
  - Train: 80% (6309 examples)
  - Validation: 10% (789 examples)
  - Test: 10% (789 examples)

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("DS4H-ICTU/bbj-en-translation")
```

## Citation

```
@misc{ghomala_en_translation,
  title = {ghomala-en Translation Dataset},
  author = {NDE HURICH DILAN},
  year = 2024,
  publisher = {Hugging Face}
}
```

## Contact

For more information: ndedilan504@gmail.com
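
As a companion to the Usage snippet, the following is a minimal sketch of how the records might be unpacked into sentence pairs and how the split sizes relate to the stated 80/10/10 proportions. It assumes each record follows the nested `translation` struct (`bbj`, `en`) declared in the card's `dataset_info` metadata. The helper names `to_pair`, `split_fractions`, and `load_pairs` are hypothetical and not part of the dataset or the `datasets` library, and the sample record uses placeholder strings rather than real corpus text.

```python
from typing import Dict, List, Tuple

# Split sizes copied from the card's `dataset_info` metadata.
SPLIT_SIZES: Dict[str, int] = {"train": 6309, "validation": 789, "test": 789}


def to_pair(example: dict) -> Tuple[str, str]:
    """Return (Ghomala, English) from one record.

    Assumes the nested layout from the card metadata:
    {"translation": {"bbj": ..., "en": ...}}.
    """
    translation = example["translation"]
    return translation["bbj"], translation["en"]


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_pairs(split: str = "train") -> List[Tuple[str, str]]:
    """Download one split from the Hub and return it as sentence pairs."""
    # Imported lazily so the helpers above work without the `datasets` package.
    from datasets import load_dataset

    dataset = load_dataset("DS4H-ICTU/bbj-en-translation", split=split)
    return [to_pair(example) for example in dataset]


if __name__ == "__main__":
    # Placeholder record (hypothetical text, not taken from the corpus).
    placeholder = {
        "translation": {"bbj": "<Ghomala sentence>", "en": "<English sentence>"}
    }
    print(to_pair(placeholder))
    print(split_fractions(SPLIT_SIZES))
```

Calling `load_pairs("validation")` would fetch the validation split over the network. The other two helpers run offline, which makes them convenient for checking the record layout before downloading anything.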
Provider:
DS4H-ICTU