
baobabtech/water-conflict-source-data

Source: Hugging Face. Updated 2025-11-27. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/baobabtech/water-conflict-source-data
Description:
---
license: mit
task_categories:
- text-classification
language:
- en
tags:
- water-conflict
- setfit
- multi-label
size_categories:
- 1K<n<10K
---

# Water Conflict Training Dataset

This dataset contains labeled examples for training a multi-label water conflict classifier.

## Dataset Structure

### Files

- `positives.csv`: Water conflict headlines with labels (Trigger, Casualty, Weapon)
- `negatives.csv`: Non-conflict news headlines (pre-balanced with hard negatives)

### Data Format

Both files share the same schema:

| Column | Description |
|--------|-------------|
| Headline | News headline text |
| Basis | For positives: comma-separated labels (Trigger, Casualty, Weapon). For negatives: empty string |
| priority_sample | Boolean. For negatives: `True` for hard negatives (water-related peaceful news), `False` for ACLED negatives. For positives: always `False` (not applicable) |

Note: `priority_sample` exists in both files for schema consistency but only has meaning for negatives.

### Example Rows

**Positive example:**

```
Headline,Basis
"Water reservoir attacked in region X",Casualty
```

**Negative example:**

```
Headline,Basis
"Political protest unrelated to water",""
```

## Labels

- **Trigger**: Water resource as conflict trigger
- **Casualty**: Water infrastructure as casualty/target
- **Weapon**: Water as weapon/tool of conflict

## Hard Negatives & Dataset Balance

The negatives dataset is pre-balanced and training-ready. It includes:

1. **Hard Negatives (~120 examples, ~15-20% of negatives)**: Water-related peaceful news that teaches the model "water ≠ conflict". These prevent false positives where any mention of water triggers a conflict classification.
   - Water infrastructure projects (peaceful development)
   - Water research and technology breakthroughs
   - Water conservation initiatives and conferences
   - Environmental water management topics
2. **ACLED Negatives (~600 examples)**: General conflict news without water mentions, sampled from the full ACLED dataset for efficient training.

The `priority_sample` column distinguishes hard negatives (`True`) from regular negatives (`False`). Because the composition is already balanced, no additional sampling logic is needed during training: the dataset is ready to use as-is.

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("baobabtech/water-conflict-training-data")
```

## Citation

If you use this dataset, please cite the original ACLED data source.
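The schema described under Data Format can be turned into training-ready records with a few lines of standard-library Python. The following is a minimal sketch, not the authors' pipeline: the column names (`Headline`, `Basis`, `priority_sample`) come from the card, while the helper names (`encode_basis`, `parse_bool`, `read_rows`), the assumption that booleans are serialized as the text `True`/`False`, and every sample row other than the two shown in the card are illustrative assumptions.

```python
import csv
import io

# Label order used for the multi-hot vector (order is an arbitrary choice here).
LABELS = ("Trigger", "Casualty", "Weapon")


def encode_basis(basis: str) -> list[int]:
    """Turn a comma-separated Basis string into a multi-hot vector over LABELS."""
    present = {part.strip() for part in basis.split(",") if part.strip()}
    unknown = present - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected label(s): {sorted(unknown)}")
    return [int(label in present) for label in LABELS]


def parse_bool(value: str) -> bool:
    """Assumes booleans are stored as the text 'True' or 'False'."""
    return value.strip().lower() == "true"


def read_rows(csv_text: str) -> list[dict]:
    """Parse CSV text with the card's three-column schema into records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "text": row["Headline"],
            "labels": encode_basis(row["Basis"]),
            "hard_negative": parse_bool(row["priority_sample"]),
        }
        for row in reader
    ]


# Stand-ins for positives.csv and negatives.csv; rows are hypothetical.
SAMPLE_POSITIVES = '''Headline,Basis,priority_sample
"Water reservoir attacked in region X",Casualty,False
"Dam seized and flow cut off during dispute","Trigger, Weapon",False
'''
SAMPLE_NEGATIVES = '''Headline,Basis,priority_sample
"Political protest unrelated to water","",False
"City opens new water treatment plant","",True
'''

rows = read_rows(SAMPLE_POSITIVES) + read_rows(SAMPLE_NEGATIVES)
hard_negatives = [r for r in rows if r["hard_negative"]]
```

Negatives end up with an all-zero label vector, which is how a multi-label classifier typically represents "no class applies". To read the real files, the same `read_rows` logic could be applied to the text of `positives.csv` and `negatives.csv` once downloaded.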
Provider:
baobabtech