
vinod-anbalagan/indian-agri-advice-multilingual

Source: Hugging Face | Updated: 2026-04-18 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/vinod-anbalagan/indian-agri-advice-multilingual
Resource description:
---
annotations_creators: []
language:
- en
- hi
- mr
- bn
- ta
- te
- gu
- ml
- pa
- ur
- kn
- or
language_creators: []
license:
- cc-by-4.0
multilinguality:
- multilingual
pretty_name: "Indian Agricultural Advisory Dataset (Multilingual)"
size_categories:
- 10K<n<100K
tags:
- adaption
- instruction-tuning
- agriculture
- multilingual
- low-resource
- global-south
- icar
- agro-climatic-zones
- kisan-call-centre
- india
task_categories:
- question-answering
task_ids:
- open-domain-qa
---

![banner](https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/0e04ece4-de84-4d22-8986-47f894083c1b.png)

This dataset is a remastered version prepared using [Adaption's](https://adaptionlabs.ai/app/auth) Adaptive Data platform.

---

# Indian Agricultural Advisory Dataset (Multilingual)

**18,707 rows | 12 languages | 15 agro-climatic zones | 13 categories | 17 crops**

A multilingual agricultural advisory dataset covering all 15 of India's Planning Commission agro-climatic zones, localized to 12 languages. It is built from ICAR extension knowledge, India's agro-climatic zone framework, and the Handbook of Agriculture in India, and was adapted using [Adaption's Adaptive Data platform](https://adaptionlabs.ai).

- **Companion dataset →** [Tamil Agricultural Advisory (Grade A, 9.4/10)](https://huggingface.co/datasets/vinod-anbalagan/tamil-agri-advisory-qa)
- **GitHub →** [VinodAnbalagan/tamil-agri-dataset-](https://github.com/VinodAnbalagan/tamil-agri-dataset-)
- **Substack →** [The Meta Gradient](https://vinodanbalagan.substack.com)
- **Built for** the Adaption Labs Uncharted Data Challenge 2026

---

## Why This Dataset Exists

India has 150 million farming households across 15 distinct agro-climatic zones. A wheat farmer in Punjab faces completely different conditions from a rice farmer in Bengal or a bajra farmer in the Thar desert. Yet most agricultural AI systems give the same generic advice regardless of zone, soil, or season.

This dataset grew out of a single insight gained while building the [Tamil Agricultural Advisory Dataset](https://huggingface.co/datasets/vinod-anbalagan/tamil-agri-advisory-qa): **metadata specificity, not row count, drives dataset quality.** When every row carries real agro-ecological context (the specific zone, soil type, irrigation method, and season), AI systems can give advice that actually fits the farmer's conditions.

We applied the same framework to all of India: 15 zones, 17 crops, and 13 categories, mapped from the Planning Commission's agro-climatic zone data. The dataset was then localized to 12 languages to reach over a billion speakers.

---

### Domain

- Agriculture (100%)

---

## Languages

| Language | Rows | Share |
|----------|------|-------|
| Marathi | 2,414 | 12.9% |
| Bengali | 2,177 | 11.6% |
| Gujarati | 2,066 | 11.0% |
| Tamil | 2,034 | 10.9% |
| Telugu | 2,007 | 10.7% |
| Punjabi | 1,974 | 10.6% |
| Malayalam | 1,957 | 10.5% |
| English | 1,800 | 9.6% |
| Hindi | 1,683 | 9.0% |
| Urdu | 277 | 1.5% |
| Kannada | 170 | 0.9% |
| Odia | 148 | 0.8% |

---

### Tone

- Practical (75%)
- Informative (25%)

---

## Agro-Climatic Zones (15)

| Zone | States | Key Crops | Rows |
|------|--------|-----------|------|
| Western Himalayan | J&K, Himachal, Uttarakhand | Rice, maize, wheat, potato, apple | 1,321 |
| Eastern Himalayan | Sikkim, NE states, Tripura | Rice, tea, maize, potato, orange | 1,299 |
| Lower Gangetic Plains | West Bengal, Eastern Bihar | Rice, jute, potato, mango, banana | 1,242 |
| Middle Gangetic Plains | Eastern UP, Bihar | Rice, wheat, sugarcane, potato | 1,352 |
| Upper Gangetic Plains | Central & Western UP | Wheat, sugarcane, rice, potato, mango | 1,335 |
| Trans-Gangetic Plains | Punjab, Haryana, Delhi | Wheat, rice, cotton, sugarcane | 1,317 |
| Eastern Plateau & Hills | Jharkhand, Chhattisgarh, W. Odisha | Rice, groundnut, ragi, soybean | 1,414 |
| Central Plateau & Hills | MP, Rajasthan, UP (Bundelkhand) | Soybean, wheat, gram, cotton | 1,257 |
| Western Plateau & Hills | Maharashtra (Deccan), S. MP | Jowar, cotton, sugarcane, groundnut | 1,286 |
| Southern Plateau & Hills | Karnataka, TN (interior), AP | Rice, ragi, groundnut, cotton, coconut | 1,373 |
| East Coast Plains & Hills | Coastal AP, Odisha, TN | Rice, groundnut, sugarcane, banana | 1,350 |
| West Coast Plains & Ghats | Kerala, coastal Karnataka, Goa | Rice, coconut, arecanut, rubber, pepper | 1,362 |
| Gujarat Plains & Hills | Gujarat | Groundnut, cotton, rice, wheat, bajra | 1,396 |
| Western Dry Region | Rajasthan (Thar) | Bajra, jowar, moth, guar, wheat | 1,299 |
| All (mental health) | All India | Crisis routing | 104 |

---

## Categories (13)

| Category | Rows | Description |
|----------|------|-------------|
| `government_schemes` | 1,606 | PM-KISAN, PMFBY, KCC, subsidies |
| `market_price` | 1,591 | MSP, e-NAM, APMC, direct selling |
| `fertilizer` | 1,585 | NPK dosages, organic inputs, micronutrients |
| `weather_advisory` | 1,576 | Drought, flood, cyclone, frost response |
| `irrigation` | 1,574 | Water management, drip, sprinkler, canal |
| `soil_health` | 1,568 | pH, salinity, organic matter, soil testing |
| `crop_management` | 1,567 | Intercropping, rotation, spacing, weed control |
| `pest_control` | 1,564 | Pest identification and ICAR-grounded management |
| `harvest_timing` | 1,517 | When to harvest, post-harvest storage, drying |
| `crop_disease` | 1,515 | Disease diagnosis and treatment |
| `financial_support` | 1,513 | Crop insurance, loan relief, drought compensation |
| `variety_selection` | 1,427 | ICAR-recommended varieties by zone and season |
| `mental_health_safety` | 104 | Crisis routing: Kisan Call Centre 1551, iCall 9152987821 |

---

## Answer Structure (5-Part ICAR Format)

1. **Situation Assessment**: acknowledge the farmer's specific zone, crop, soil, and season
2. **Immediate Action**: exact dosage, timing, and cost in rupees
3. **Rationale**: why this fits this specific agro-climatic zone
4. **Long-term Prevention**: sustainable practice for future seasons
5. **KVK Referral**: contact the nearest Krishi Vigyan Kendra

---

## Schema (17 Columns)

| Column | Description |
|--------|-------------|
| `id` | Unique row ID |
| `question` | Context-tagged farmer question |
| `answer` | 5-part ICAR advisory answer |
| `enhanced_prompt` | Adaption-enriched prompt |
| `enhanced_completion` | Adaption-enriched advisory (avg 2,731 chars) |
| `reasoning_trace` | Chain-of-thought reasoning |
| `category` | Topic (13 categories) |
| `crop_primary` | Primary crop (17 crops) |
| `soil_type` | Soil classification |
| `irrigation_type` | Irrigation method |
| `farming_practice` | Conventional / organic / integrated |
| `region` | Agro-climatic zone (15 zones) |
| `season` | Kharif / Rabi |
| `growth_stage` | Crop growth stage |
| `severity` | Low / medium / high / urgent |
| `source_type` | Provenance |
| `reasoning_type` | Cognitive pattern |

---

## Data Sources

| Source | What It Grounded |
|--------|------------------|
| Agro-Climatic Zones of India (Planning Commission) | Zone-crop-soil-season mappings for all 15 zones |
| Handbook of Agriculture in India (Oxford, 2007) | National crop agronomy, varieties, dosages |
| Handbook on General Agriculture (ANGRAU) | Crop science, soil science, pest management |
| ICAR-CRIDA District Contingency Plans | Drought and disaster management for 32 districts |

---

## Key Design Principles

- **Distribution by design**: 14 rows per category and 12 rows per zone in the base dataset, balanced before adaptation
- **Metadata is not decorative**: 99.4% fill rate on all context columns; every row is grounded in real agro-ecological data
- **Zone-specific, not generic**: the same pest control question gets different answers in different zones because soil, rainfall, and irrigation differ
- **Mental health safety**: crisis helpline routing in every language (Kisan Call Centre 1551, iCall 9152987821)
- **Lesson from Tamil**: this dataset was built after discovering that metadata specificity drives quality scores more than row count or answer length

---

## How This Dataset Was Built

1. **Zone-crop mapping**: extracted valid crop-zone-soil-season combinations from India's 15 agro-climatic zones
2. **Question generation**: wrote 169 base questions with rotating category assignments, so that each zone gets different question types
3. **Answer expansion**: Cohere command-r-plus generated 5-part ICAR advisory answers grounded in zone-specific context
4. **Adaptation**: Adaption's Adaptive Data platform enriched prompts and completions and added reasoning traces
5. **Localization**: the platform translated and localized the data into 12 languages, expanding 169 → 18,707 rows

---

### Evaluation Results

**Quality Gains:**

<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/a7a7be60-152c-4f30-b299-eb46fb47cb86.png" alt="QualityGains" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />

**Grade Improvement:**

<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/76b235b4-5720-443a-a62e-6925e8288898.png" alt="Grade" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />

**Percentile Chart:**

<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/2d74c174-d013-4c02-8d72-23ddeb3b821d.png" alt="Percentile Chart" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />

---

## Companion Dataset

This dataset was built using the same framework that produced the **Tamil Agricultural Advisory Dataset**, which scored **Grade A (9.4/10)** on Adaption's platform after 10 iterative submissions.
The key insight, that metadata specificity matters more than row count, was discovered during the Tamil work and applied from day one to this India-wide dataset.

- [Tamil Agricultural Advisory Dataset](https://huggingface.co/datasets/vinod-anbalagan/tamil-agri-advisory-qa): 187 rows, Grade A, 9.4/10, Tamil language

---

## Intended Uses

- Training multilingual agricultural advisory chatbots for Indian farmers
- Building voice-based advisory systems (WhatsApp, IVR) in regional languages
- Evaluating multilingual NLP performance on domain-specific, low-resource Indian language tasks
- Research into context-aware AI for the Global South
- Fine-tuning models for zone-specific agricultural advice across India

---

## Citation

```bibtex
@dataset{anbalagan2026india_agri,
  title={Indian Agricultural Advisory Dataset (Multilingual)},
  author={Anbalagan, Vinod},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/vinod-anbalagan/indian-agri-advice-multilingual},
  license={CC BY 4.0}
}
```

---

Built by **Vinod Anbalagan**, AI/ML researcher, Toronto. Created as part of the **Adaption Labs Uncharted Data Challenge 2026**. Dataset adapted using **Adaption's Adaptive Data Platform**. Research documented on [The Meta Gradient](https://vinodanbalagan.substack.com).

*India has 15 agro-climatic zones, 22 scheduled languages, and 150 million farming households. They all deserve AI that speaks their language and knows their soil.*
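
As a minimal sketch of how the tables and schema in this card might be put to work, the Python below (standard library only) recomputes the Share column of the Languages table from its row counts and filters rows on metadata columns listed under Schema (`region`, `category`, `season`). The helper names (`language_shares`, `select_rows`, `count_by`) and the sample rows are hypothetical, and the exact spelling of values such as `"Western Dry Region"` in the real `region` column is an assumption. The Schema table lists no language column, so the sketch does not filter by language. To try it on the real data, one option, assuming the Hugging Face `datasets` library, network access, and a split named `train`, is to pass the rows returned by `load_dataset("vinod-anbalagan/indian-agri-advice-multilingual", split="train")` to `select_rows`.

```python
from collections import Counter
from typing import Iterable, Optional

# Row counts copied from the Languages table of this dataset card.
LANGUAGE_ROWS = {
    "Marathi": 2414, "Bengali": 2177, "Gujarati": 2066, "Tamil": 2034,
    "Telugu": 2007, "Punjabi": 1974, "Malayalam": 1957, "English": 1800,
    "Hindi": 1683, "Urdu": 277, "Kannada": 170, "Odia": 148,
}


def language_shares(counts: dict) -> dict:
    """Return each language's share of all rows as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {lang: round(100 * n / total, 1) for lang, n in counts.items()}


def select_rows(
    rows: Iterable[dict],
    region: Optional[str] = None,
    category: Optional[str] = None,
    season: Optional[str] = None,
) -> list:
    """Keep rows whose metadata matches every filter that is not None.

    Hypothetical helper: it filters only on columns named in the card's Schema table.
    """
    wanted = {"region": region, "category": category, "season": season}
    return [
        row
        for row in rows
        if all(value is None or row.get(col) == value for col, value in wanted.items())
    ]


def count_by(rows: Iterable[dict], column: str) -> Counter:
    """Count rows per value of one metadata column (e.g. "category" or "region")."""
    return Counter(row.get(column) for row in rows)


if __name__ == "__main__":
    print(sum(LANGUAGE_ROWS.values()))                # 18707
    print(language_shares(LANGUAGE_ROWS)["Marathi"])  # 12.9

    # Hypothetical sample rows; real column values may be spelled differently.
    # With real data (assumes `pip install datasets`, network access, a "train" split):
    #   from datasets import load_dataset
    #   rows = load_dataset("vinod-anbalagan/indian-agri-advice-multilingual", split="train")
    sample = [
        {"id": "r1", "region": "Western Dry Region", "category": "irrigation", "season": "Kharif"},
        {"id": "r2", "region": "Western Dry Region", "category": "pest_control", "season": "Rabi"},
        {"id": "r3", "region": "Trans-Gangetic Plains", "category": "irrigation", "season": "Rabi"},
    ]
    print([r["id"] for r in select_rows(sample, category="irrigation")])  # ['r1', 'r3']
    print(count_by(sample, "region")["Western Dry Region"])               # 2
```

Filtering on explicit metadata columns, rather than parsing the question text, is one way to take advantage of the 99.4% fill rate the card reports for its context columns; rounding each count over the 18,707-row total to one decimal place reproduces every value in the Share column.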
Provider:
vinod-anbalagan