
naidukr/opentriage-dataset

Source: Hugging Face. Updated 2026-04-18. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/naidukr/opentriage-dataset
Resource description:

---
language:
- en
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_ids:
- text-classification
- text-generation
tags:
- disaster-response
- ticket-triage
- emergency-management
- knowledge-distillation
license: cc-by-4.0
---

# OpenTriage Dataset

A comprehensive dataset for disaster ticket triage and emergency response classification. This dataset contains labeled tickets from various disaster scenarios including earthquakes, hurricanes, floods, and other emergency situations.

## Dataset Description

The OpenTriage Dataset is designed to train and evaluate models for automated ticket classification in emergency response systems. It includes real-world inspired tickets with multiple classification labels and emergency-related information.

### Dataset Features

- **Disaster Ticket Classification**: Multi-label classification of emergency tickets
- **Student Training Data**: Educational examples for model distillation
- **Real-World Scenarios**: Data inspired by actual emergency response situations
- **Comprehensive Annotations**: Rich metadata and classification labels

## Dataset Structure

### Main Datasets

1. **large_triage_dataset.json** (18.7 MB)
   - Primary dataset with comprehensive ticket triage records
   - Largest dataset with extensive examples
   - Full feature set and detailed annotations

2. **large_triage_training_data.json** (4.6 MB)
   - Training-focused subset optimized for model training
   - Balanced class distribution
   - Cleaned and processed entries

3. **large_student_training_data.json** (4.2 MB)
   - Student training data for knowledge distillation
   - Simplified annotations for learning
   - Ideal for distilled model training

### Sample & Reference Data

4. **sample_triage_tickets.json** (37 KB)
   - Representative sample of the dataset
   - Quick exploration and testing
   - Format reference

5. **triage_training_data.json** (46 KB)
   - Small training dataset
   - Proof-of-concept examples
   - Development and testing

6. **student_training_data.json** (44 KB)
   - Compact student dataset
   - Educational examples
   - Model distillation reference

### Configuration Files

- **disaster_ticket_triage_config.json**: Configuration schema and parameters for ticket triage

## Data Format

Each record typically contains:

- `ticket_id`: Unique identifier
- `description`: Ticket description or incident report
- `category`: Primary classification category
- `tags`: Multiple classification tags
- `priority`: Urgency level
- `disaster_type`: Type of disaster (earthquake, hurricane, flood, etc.)
- Metadata and additional structured fields

## Dataset Statistics

- **Total Records**: ~100,000+ tickets across all files
- **Disaster Types**: Multiple categories (earthquakes, hurricanes, floods, wildfires, etc.)
- **Languages**: English
- **License**: CC-BY-4.0

## Usage

### Load with HuggingFace Datasets

```python
from datasets import load_dataset

# Load the full dataset
dataset = load_dataset("naidukr/opentriage-dataset", split="train")

# Load specific files
large_dataset = load_dataset("naidukr/opentriage-dataset", data_files="large_triage_dataset.json")
training_data = load_dataset("naidukr/opentriage-dataset", data_files="large_triage_training_data.json")
```

### Load with Pandas

```python
import json
import pandas as pd

with open('large_triage_dataset.json', 'r') as f:
    data = json.load(f)

df = pd.DataFrame(data)
```

## Applications

- Emergency response systems
- Disaster ticket classification
- Emergency management training
- Knowledge distillation for light-weight models
- Multi-label classification research
- Real-time incident categorization

## Dataset Splits

The dataset includes multiple versions optimized for different use cases:

- **Full Dataset**: Complete training corpus (large_triage_dataset.json)
- **Training Subset**: Balanced training set (large_triage_training_data.json)
- **Distillation Data**: For creating smaller models (large_student_training_data.json)
- **Samples**: Quick reference and format validation

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{opentriage_dataset_2026,
  title={OpenTriage: Disaster Ticket Triage Dataset},
  author={OpenTriage Team},
  year={2026},
  url={https://huggingface.co/datasets/naidukr/opentriage-dataset}
}
```

## License

This dataset is licensed under **CC-BY-4.0** (Creative Commons Attribution 4.0 International License).

You are free to:

- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material for any purpose, even commercially

Under the following terms:

- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made

## Contact & Support

For questions, issues, or contributions related to this dataset:

- Visit: https://huggingface.co/datasets/naidukr/opentriage-dataset
- Related Model: https://huggingface.co/naidukr/Opentriage

## Changelog

### v1.0 (2026-04-18)

- Initial dataset release
- Multiple dataset sizes for different use cases
- Comprehensive documentation and usage examples
- Git LFS support for large files
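
## Example: Checking Records Against the Documented Fields

The README says each record "typically contains" a set of fields, so it may be worth checking a downloaded file before training on it. The following is a minimal sketch, not part of the dataset's own tooling. It assumes records are plain dictionaries (for example, the result of `json.load` on one of the files) and that the field names match the Data Format list above. The function name `summarize_records` and the sample ticket values are hypothetical.

```python
from collections import Counter

# Field names as listed in the README's "Data Format" section.
# Assumption: the real files use exactly these keys; adjust if they differ.
DOCUMENTED_FIELDS = (
    "ticket_id",
    "description",
    "category",
    "tags",
    "priority",
    "disaster_type",
)


def summarize_records(records):
    """Count missing documented fields and tally `disaster_type` values."""
    missing = Counter()
    disaster_types = Counter()
    for record in records:
        for field in DOCUMENTED_FIELDS:
            if field not in record:
                missing[field] += 1
        disaster_types[record.get("disaster_type", "<missing>")] += 1
    return {
        "total": len(records),
        "missing": dict(missing),
        "disaster_types": dict(disaster_types),
    }


# Hypothetical records for illustration only; not taken from the dataset.
sample = [
    {
        "ticket_id": "T-1",
        "description": "Water rising in basement",
        "category": "rescue",
        "tags": ["flood"],
        "priority": "high",
        "disaster_type": "flood",
    },
    {
        "ticket_id": "T-2",
        "description": "Need shelter information",
        "category": "shelter",
        "tags": [],
        "disaster_type": "earthquake",  # `priority` deliberately omitted
    },
]

summary = summarize_records(sample)
print(summary)
```

To run this against a real file, pass the list obtained from `json.load` instead of `sample`, provided the file is a JSON array of objects.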
Provider:
naidukr