
King-8/help-request-messages-v2

Source: Hugging Face. Updated 2026-04-16. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/King-8/help-request-messages-v2
Resource description:
---
task_categories:
- text-classification
---

# 📊 Help Classifier Dataset (v2)

## 🧠 Overview

The **Help Classifier Dataset (v2)** is a curated NLP dataset designed to classify student help requests into meaningful categories within a collaborative learning environment.

This dataset was developed as part of a larger AI system for the **Coding in Color (CIC)** ecosystem, where students work across domains such as AI development, game development, 2D/3D art, and robotics.

The goal of this dataset is to enable models to:

* Understand real student communication
* Classify intent behind help requests
* Support downstream systems (e.g., generators, agents, MCP tools)

---

## 🚀 Version Update (v1 → v2)

### 🔹 v1

* ~100 examples
* Basic structure
* Limited variation
* Primarily clean, structured inputs

### 🔹 v2 (Current)

* **1,000 examples**
* Balanced across all categories
* High variation in tone and structure
* Includes:
  * informal/slang language
  * mixed capitalization
  * short + long messages
  * indirect and ambiguous requests
  * real CIC-inspired check-in data

👉 v2 significantly improves **generalization and realism**

---

## 🧩 Task Definition

**Task Type:** Text Classification

**Input:** Student message (free-form text)

**Output:** One of 5 help categories

---

## 🏷️ Labels

| Label              | Description                                         |
| ------------------ | --------------------------------------------------- |
| `learning_help`    | User is trying to understand a concept or skill     |
| `project_help`     | User needs direction or next steps in a project     |
| `technical_issue`  | Something is broken or not working as expected      |
| `attendance_issue` | User missed a meeting or needs to catch up          |
| `general_guidance` | User expresses uncertainty, stress, or needs advice |

---

## 📦 Dataset Structure

Each example contains:

```json
{
  "text": "I missed the meeting and now idk what we’re doing",
  "label": "attendance_issue"
}
```

---

## 📊 Dataset Statistics

* **Total Examples:** 1,000
* **Classes:** 5
* **Distribution:** Balanced (~200 per class)

---

## 🎯 Design Philosophy

This dataset was intentionally designed to reflect **real-world student communication**, including:

* Natural language (not overly cleaned)
* Mixed tone (formal + casual)
* Realistic ambiguity
* Multi-intent phrasing (but single-label classification)

The dataset progresses in complexity across batches:

1. Clean structured examples
2. CIC-specific scenarios
3. Messy/realistic inputs
4. Edge cases and ambiguity
5. Advanced multi-layered messages

---

## 🧪 Use Cases

This dataset can be used for:

* Help request classification systems
* Educational AI assistants
* Slack/Discord message classification
* MCP (Model Context Protocol) pipelines
* Routing systems for AI agents

---

## 🔗 System Context (CIC Ecosystem)

This dataset is part of a broader system that includes (or will include):

* Help Classifier (this dataset)
* Help Generator (response generation)
* Help Summarizer (context summarization)
* MCP Server integration

---

## ⚠️ Limitations

* Single-label classification (some inputs may contain multiple intents)
* Domain-specific (focused on student tech environments)
* Informal language may introduce edge ambiguity

---

## 🔮 Future Improvements

* Multi-label classification support
* Larger dataset (2,000+ examples)
* Additional categories (e.g., collaboration, leadership)
* More real-world Slack data integration

---

## 👤 Author

Created by Kingston Lewis as part of the Coding in Color program for the AI Dev team.
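
As a rough illustration of the record format and label set described in the card, the sketch below validates `text`/`label` records and counts examples per class. It is not part of the dataset or the author's tooling: the helper names (`is_valid_example`, `label_distribution`, `is_balanced`), the 10% balance tolerance, and every sample record except the first (which is the example quoted in the card) are assumptions made for illustration.

```python
from collections import Counter

# The five labels listed in the dataset card.
LABELS = (
    "learning_help",
    "project_help",
    "technical_issue",
    "attendance_issue",
    "general_guidance",
)


def is_valid_example(example):
    """Return True if the record has a non-empty string `text` and a known `label`."""
    text = example.get("text")
    label = example.get("label")
    return isinstance(text, str) and bool(text.strip()) and label in LABELS


def label_distribution(examples):
    """Count valid examples per label, reporting zero for labels that never occur."""
    counts = Counter(ex["label"] for ex in examples if is_valid_example(ex))
    return {label: counts.get(label, 0) for label in LABELS}


def is_balanced(distribution, tolerance=0.1):
    """Check that every class count lies within `tolerance` (a fraction) of the mean."""
    total = sum(distribution.values())
    if total == 0:
        return False
    mean = total / len(distribution)
    return all(abs(n - mean) <= tolerance * mean for n in distribution.values())


# First record: the example from the card. The rest are hypothetical.
sample = [
    {"text": "I missed the meeting and now idk what we’re doing", "label": "attendance_issue"},
    {"text": "my code keeps crashing when I hit run", "label": "technical_issue"},
    {"text": "", "label": "learning_help"},                  # invalid: empty text
    {"text": "what even is a tensor", "label": "homework"},  # invalid: unknown label
]

print([is_valid_example(ex) for ex in sample])
print(label_distribution(sample))
```

Applied to the full dataset, a check like `is_balanced(label_distribution(records))` would be one way to confirm the stated distribution of roughly 200 examples per class, assuming the records are loaded as a list of dictionaries with these two fields.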
Provider:
King-8