sohamjune/PlantVillageVQA
收藏Hugging Face2026-03-08 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/sohamjune/PlantVillageVQA
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-4.0
task_categories:
- visual-question-answering
- image-text-to-text
language: en
size_categories: 100K<n<1M
tags:
- plant-science
- agriculture
- vqa
- plant-pathology
---
# PlantVillageVQA
Associated paper: [arXiv:2508.17117](https://huggingface.co/papers/2508.17117)
GitHub repository (Currently Private): [SyedNazmusSakib/PlantVillageVQA](https://github.com/SyedNazmusSakib/PlantVillageVQA)
**PlantVillageVQA** is a multimodal dataset for **visual question answering (VQA)** in plant pathology. It contains **193,609** question–answer (QA) items paired with **55,448** leaf images spanning **14 crops** and **38 diseases**. Questions are organized into **nine categories** and three cognitive levels: perception/identification, symptom grounding/verification, and higher‑order reasoning (diagnosis, causality, counterfactuals).
Images originate from the open **PlantVillage** corpus and are redistributed here with QA supervision in a **flat image layout**. We release the dataset under **CC BY 4.0** to maximize reuse.
The dataset card follows Hugging Face guidance for dataset documentation and responsible use.
## Summary
- **Images:** 55,448 JPEGs, flat naming: `images/image_000001.jpg` … `image_055448.jpg`.
- **Annotations:** 193,609 QA pairs in **CSV** (`PlantVillageVQA.csv`) and **JSON** (`PlantVillageVQA.json`) with identical schema.
- **Fields:** `image_id`, `question_type`, `question`, `answer`, `image_path`, `split`.
- **License:** **CC BY 4.0** (Creative Commons Attribution 4.0 International).
- **Primary deposit (authoritative):** **Mendeley Data** **DOI:** `10.17632/XXXXXX.1`.
Mirrors: **Hugging Face Hub** (this page) and **Kaggle Datasets**. Each mirror cites the DOI and provides SHA‑256 checksums for verification. Persistent DOIs and formal data citation align with Nature/*Scientific Data* recommendations.
## Supported Tasks and Benchmarks
- **Visual Question Answering (VQA):** binary, short, and descriptive answers across nine categories.
- **Symptom grounding:** text–vision alignment for canonical symptom phrases.
- **Open diagnosis:** free‑text disease naming from visual evidence.
- **Causal and counterfactual reasoning:** pathogen attribution and healthy‑state contrast.
Baseline experiments (CLIP, LXMERT, FLAVA) show models exceed chance on binary tasks and capture key terms in descriptive answers, yet retain headroom on reasoning‑heavy categories (details in the paper).
## Languages
- **English** prompts and answers.
## Dataset Structure
### Files
```
images/ # 55,448 JPEGs, flat naming
PlantVillageVQA.csv # 193,609 rows
PlantVillageVQA.json # JSON mirror
```
### Data Fields
| Field | Type | Description |
|----------------|--------|-------------|
| `image_id` | string | Stable identifier of the image. |
| `question_type`| string | One of nine categories (see below). |
| `question` | string | Natural‑language question. |
| `answer` | string | Ground‑truth answer (binary or descriptive). Counterfactual answers are template‑constrained and symptom‑grounded. |
| `image_path` | string | Relative path to image, e.g., `images/image_000123.jpg`. |
| `split` | string | Recommended partition: `train`, `val`, or `test`. |
### Splits
We provide split tags in the annotations. Users may re‑split for specific studies, but should report the split policy for comparability.
## Question Taxonomy
### Level 1 — Perception / Identification
1. **Existence & Sanity Check** — on‑task image/leaf present.
2. **Plant Species Identification** — host recognition.
3. **General Health Assessment** — healthy vs. diseased triage.
### Level 2 — Symptom Grounding / Verification
4. **Visual Attribute Grounding** — canonical symptom phrases ↔ visual evidence.
5. **Detailed Verification** — compositional check: crop + disease.
### Level 3 — Reasoning / Inference
6. **Specific Disease Identification** — open diagnosis.
7. **Comprehensive Description** — holistic expert‑style summary.
8. **Causal Reasoning** — cause/pathogen attribution.
9. **Counterfactual Reasoning** — healthy contrast; remove disease‑specific symptoms.
## Creation Process and Curation
We used a multi‑stage pipeline:
1. **Programmatic QA synthesis** from structured labels.
2. **Linguistic diversification** via expert‑phrased templates; no free‑form LLM generation.
3. **Expert pathology review (Phase 1):** logical fit, medical relevance, clarity.
4. **Deterministic correction of counterfactuals:** disease→symptom phrase map; regenerate only when label provenance is verifiable.
5. **Strict fix‑or‑delete policy:** **9,981** counterfactual pairs corrected and kept; **17,261** unverifiable pairs deleted.
6. **Automated screening (Phase 2):** flagged low‑information and low Q–A‑similarity pairs; sample checks showed high acceptance.
7. **Final release:** **193,609** QA pairs across **55,448** images.
Avoiding unconstrained LLM generation minimizes hallucination risk and keeps changes auditable.
## Citation
Please cite both the dataset and the PlantVillage source corpus:
```bibtex
@article{sakib2025plantvillagevqa,
title={PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science},
author={Sakib, Syed Nazmus and Haque, Nafiul and Hossain, Mohammad Zabed and Arman, Shifat E},
journal={arXiv preprint arXiv:2508.17117},
year={2025}
}
```
## License
**CC BY 4.0 Universal (Creative Commons Attribution 4.0 International).**
提供机构:
sohamjune



