soynade-research/Wolof-Agri-Captions
收藏Hugging Face2026-03-22 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/soynade-research/Wolof-Agri-Captions
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: image
dtype: image
- name: caption
dtype: string
- name: source
dtype: string
- name: wolof
dtype: string
splits:
- name: train
num_bytes: 350141084.002
num_examples: 20678
download_size: 346584304
dataset_size: 350141084.002
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
license: cc-by-sa-4.0
language:
- wo
- en
tags:
- wolof
- synthetic
- agriculture
- image-captioning
- satellite
pretty_name: AgriWolof
---
# AgriWolof (Synthetic Corpus)
This dataset combines agricultural and satellite images with English captions and their synthetic Wolof translations.
Captions were translated using **[Oolel](https://huggingface.co/soynade-research/Oolel-v0.1)**.
## Motivation?
Wolof lacks multimodal training data. This dataset attempts to address it by pairing real-world agricultural imagery with natural Wolof descriptions, enabling vision-language model training in the language.
## Methodology
- **Source Dataset**: `Karuna-207/satellite_captioning_dataset` and `ButterChicken98/plantvillage-image-text-pairs`
- **Translation Tool**: [Oolel-translator](https://github.com/soynade-research/oolel-translator).
- **Model**: **[Oolel-v0.1.](https://huggingface.co/soynade-research/Oolel-v0.1)**, an LLM specialized for Wolof
Translation Prompt:
> *You are a professional translator specializing in Wolof. Your task is to translate the following image caption into natural, fluent Wolof. Rules: - Output ONLY the Wolof translation, nothing else - Do not add explanations, notes, or comments - Keep the meaning faithful to the original - Use natural everyday Wolof, not a word-for-word literal translation - If a technical term has no Wolof equivalent, keep it in its original form*
## Dataset Structure
- `image`: The original high-quality English content from FineWeb.
- `caption`: Original English caption.
- `source`: Source dataset name.
- `wolof`: Synthetic Wolof translation of the caption.
## Usage Note
As a **synthetic dataset**, this is intended for research and model training. While the Oolel LLM is highly skilled in Wolof, there may still be machine translation errors. We recommend human verification for projects that require high linguistic accuracy.
提供机构:
soynade-research



