
soynade-research/Wolof-Agri-Captions

Hugging Face | Updated 2026-03-22 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/soynade-research/Wolof-Agri-Captions
Description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: caption
    dtype: string
  - name: source
    dtype: string
  - name: wolof
    dtype: string
  splits:
  - name: train
    num_bytes: 350141084.002
    num_examples: 20678
  download_size: 346584304
  dataset_size: 350141084.002
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc-by-sa-4.0
language:
- wo
- en
tags:
- wolof
- synthetic
- agriculture
- image-captioning
- satellite
pretty_name: AgriWolof
---

# AgriWolof (Synthetic Corpus)

This dataset combines agricultural and satellite images with English captions and their synthetic Wolof translations. Captions were translated using **[Oolel](https://huggingface.co/soynade-research/Oolel-v0.1)**.

## Motivation

Wolof lacks multimodal training data. This dataset attempts to address that gap by pairing real-world agricultural imagery with natural Wolof descriptions, enabling vision-language model training in the language.

## Methodology

- **Source Datasets**: `Karuna-207/satellite_captioning_dataset` and `ButterChicken98/plantvillage-image-text-pairs`
- **Translation Tool**: [Oolel-translator](https://github.com/soynade-research/oolel-translator)
- **Model**: **[Oolel-v0.1](https://huggingface.co/soynade-research/Oolel-v0.1)**, an LLM specialized for Wolof

Translation Prompt:

> *You are a professional translator specializing in Wolof. Your task is to translate the following image caption into natural, fluent Wolof.*
>
> *Rules:*
>
> - *Output ONLY the Wolof translation, nothing else*
> - *Do not add explanations, notes, or comments*
> - *Keep the meaning faithful to the original*
> - *Use natural everyday Wolof, not a word-for-word literal translation*
> - *If a technical term has no Wolof equivalent, keep it in its original form*

## Dataset Structure

- `image`: The agricultural or satellite image.
- `caption`: Original English caption.
- `source`: Source dataset name.
- `wolof`: Synthetic Wolof translation of the caption.

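The four fields listed above can be read with the Hugging Face `datasets` library. The following is a minimal sketch, not the authors' own tooling: the helper names `load_agriwolof`, `caption_pairs`, and `count_by_source` are hypothetical, and the repository ID is taken from the page title. The helpers accept any iterable of dict-like rows, so they can be tried on in-memory records before downloading anything.

```python
from collections import Counter
from typing import Any, Iterable, Iterator, Mapping, Tuple

# Repository ID as shown in the page title; field names as listed in the card.
REPO_ID = "soynade-research/Wolof-Agri-Captions"
FIELDS = ("image", "caption", "source", "wolof")


def load_agriwolof(split: str = "train"):
    """Download the dataset from the Hub (requires `pip install datasets`)."""
    from datasets import load_dataset  # imported lazily so the helpers work offline

    return load_dataset(REPO_ID, split=split)


def caption_pairs(records: Iterable[Mapping[str, Any]]) -> Iterator[Tuple[str, str]]:
    """Yield (English caption, Wolof translation) pairs, checking the schema."""
    for row in records:
        missing = [name for name in FIELDS if name not in row]
        if missing:
            raise KeyError(f"record is missing fields: {missing}")
        yield row["caption"], row["wolof"]


def count_by_source(records: Iterable[Mapping[str, Any]]) -> Counter:
    """Count how many records come from each source dataset."""
    return Counter(row["source"] for row in records)
```

With network access, `ds = load_agriwolof()` followed by `next(caption_pairs(ds))` would return the first caption pair. Iterating over the full dataset decodes every image, so for text-only work it may be cheaper to drop the `image` column first.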
## Usage Note

As a **synthetic dataset**, this corpus is intended for research and model training. Although the Oolel LLM is specialized for Wolof, the translations may still contain machine translation errors. We recommend human verification for projects that require high linguistic accuracy.
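One way to approach the recommended human verification is to draw a reproducible random sample of caption pairs and hand it to a Wolof speaker as a spreadsheet. The sketch below is only an illustration under that assumption: the function names, the `verdict` column, and the default seed are hypothetical and are not part of the dataset or its tooling.

```python
import csv
import io
import random
from typing import Any, Dict, Iterable, List, Mapping

REVIEW_COLUMNS = ["source", "caption", "wolof", "verdict"]


def sample_for_review(
    records: Iterable[Mapping[str, Any]], k: int, seed: int = 0
) -> List[Dict[str, str]]:
    """Return up to k randomly chosen text rows; the same seed gives the same sample."""
    # Keep only the text fields so reviewers do not need the images.
    rows = [
        {"source": r["source"], "caption": r["caption"], "wolof": r["wolof"]}
        for r in records
    ]
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def to_review_csv(sample: Iterable[Mapping[str, str]]) -> str:
    """Render the sample as CSV text with an empty 'verdict' column for the reviewer."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REVIEW_COLUMNS)
    writer.writeheader()
    for row in sample:
        writer.writerow({**row, "verdict": ""})
    return buffer.getvalue()
```

The returned string can be written to a `.csv` file and opened in any spreadsheet tool. The share of rows a reviewer marks as incorrect then gives a rough estimate of the translation error rate for the sampled sources.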
Provider:
soynade-research