King-8/help-request-messages-v2
Source: Hugging Face · Updated 2026-04-16 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/King-8/help-request-messages-v2
Dataset description:
---
task_categories:
- text-classification
---
# 📊 Help Classifier Dataset (v2)
## 🧠 Overview
The **Help Classifier Dataset (v2)** is a curated NLP dataset designed to classify student help requests into meaningful categories within a collaborative learning environment.
This dataset was developed as part of a larger AI system for the **Coding in Color (CIC)** ecosystem, where students work across domains such as AI development, game development, 2D/3D art, and robotics.
The goal of this dataset is to enable models to:
* Understand real student communication
* Classify intent behind help requests
* Support downstream systems (e.g., generators, agents, MCP tools)
---
## 🚀 Version Update (v1 → v2)
### 🔹 v1
* ~100 examples
* Basic structure
* Limited variation
* Primarily clean, structured inputs
### 🔹 v2 (Current)
* **1,000 examples**
* Balanced across all categories
* High variation in tone and structure
* Includes:
  * informal/slang language
  * mixed capitalization
  * short + long messages
  * indirect and ambiguous requests
  * real CIC-inspired check-in data
👉 v2 is designed to improve **generalization and realism** over v1 by adding roughly ten times as many examples and far more varied, less sanitized inputs
---
## 🧩 Task Definition
* **Task Type:** Text Classification
* **Input:** Student message (free-form text)
* **Output:** One of 5 help categories
---
## 🏷️ Labels
| Label | Description |
| ------------------ | --------------------------------------------------- |
| `learning_help` | User is trying to understand a concept or skill |
| `project_help` | User needs direction or next steps in a project |
| `technical_issue` | Something is broken or not working as expected |
| `attendance_issue` | User missed a meeting or needs to catch up |
| `general_guidance` | User expresses uncertainty, stress, or needs advice |
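
Most training pipelines need integer class ids rather than label strings. The following is a minimal sketch of one possible mapping. The five label names come from the table above, but the id assignment (alphabetical order) and the helper name `encode_label` are assumptions made for illustration, not something the dataset defines.

```python
# Label names are taken from the table above. The integer ids are an
# assumption for illustration: they follow alphabetical order.
LABELS = sorted([
    "learning_help",
    "project_help",
    "technical_issue",
    "attendance_issue",
    "general_guidance",
])

# Forward and reverse lookups between label names and integer ids.
label2id = {name: idx for idx, name in enumerate(LABELS)}
id2label = {idx: name for name, idx in label2id.items()}


def encode_label(name: str) -> int:
    """Return the integer id for a label name; raise on unknown labels."""
    if name not in label2id:
        raise ValueError(f"unknown label: {name!r}")
    return label2id[name]
```

Fixing the order explicitly (here by sorting) keeps the ids stable across runs, which matters when a saved model is reloaded later.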
---
## 📦 Dataset Structure
Each example contains:
```json
{
  "text": "I missed the meeting and now idk what we’re doing",
  "label": "attendance_issue"
}
```
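
Before training, it can help to confirm that every record matches this shape. Below is a small sketch using only the Python standard library. The function name `validate_record` and the exact checks are assumptions for illustration; the field names and label set are the ones documented in this card.

```python
import json

# The five labels documented in this card.
ALLOWED_LABELS = {
    "learning_help",
    "project_help",
    "technical_issue",
    "attendance_issue",
    "general_guidance",
}


def validate_record(record: dict) -> dict:
    """Check for a non-empty string `text` and a known `label`."""
    text = record.get("text")
    label = record.get("label")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("`text` must be a non-empty string")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return {"text": text, "label": label}


# The example record shown above, parsed from its JSON form.
raw = '{"text": "I missed the meeting and now idk what we’re doing", "label": "attendance_issue"}'
record = validate_record(json.loads(raw))
```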
---
## 📊 Dataset Statistics
* **Total Examples:** 1,000
* **Classes:** 5
* **Distribution:** Balanced (~200 per class)
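
The balance claim can be checked on any loaded copy of the data. This is a rough sketch that assumes the records are available as a list of dictionaries with a `label` key. The helper names and the 10% tolerance are illustrative choices, not part of the dataset.

```python
from collections import Counter


def class_distribution(records):
    """Count how many records carry each label."""
    return Counter(r["label"] for r in records)


def is_balanced(counts, tolerance=0.1):
    """True if every class count is within `tolerance` (a fraction) of the mean.

    The default tolerance of 10% is an assumption for illustration.
    """
    if not counts:
        return False
    mean = sum(counts.values()) / len(counts)
    return all(abs(c - mean) <= tolerance * mean for c in counts.values())
```

With the stated figures (1,000 examples across 5 classes), the mean is 200 per class, so a 10% tolerance would accept per-class counts between 180 and 220.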
---
## 🎯 Design Philosophy
This dataset was intentionally designed to reflect **real-world student communication**, including:
* Natural language (not overly cleaned)
* Mixed tone (formal + casual)
* Realistic ambiguity
* Multi-intent phrasing (each message still receives exactly one label)
The dataset progresses in complexity across batches:
1. Clean structured examples
2. CIC-specific scenarios
3. Messy/realistic inputs
4. Edge cases and ambiguity
5. Advanced multi-layered messages
---
## 🧪 Use Cases
This dataset can be used for:
* Help request classification systems
* Educational AI assistants
* Slack/Discord message classification
* MCP (Model Context Protocol) pipelines
* Routing systems for AI agents
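
As one illustration of the routing use case, a predicted label can select which downstream handler receives the message. The sketch below is hypothetical throughout: the handler functions, the `ROUTES` table, and the fallback choice are invented for the example and do not describe the actual CIC system.

```python
from typing import Callable, Dict


# Hypothetical handlers: each one stands in for a downstream agent or tool.
def to_tutor(text: str) -> str:
    return f"[tutor] {text}"


def to_project_lead(text: str) -> str:
    return f"[project-lead] {text}"


def to_support(text: str) -> str:
    return f"[support] {text}"


def to_catch_up(text: str) -> str:
    return f"[catch-up] {text}"


def to_mentor(text: str) -> str:
    return f"[mentor] {text}"


# Map each of the five dataset labels to a handler (assumed pairing).
ROUTES: Dict[str, Callable[[str], str]] = {
    "learning_help": to_tutor,
    "project_help": to_project_lead,
    "technical_issue": to_support,
    "attendance_issue": to_catch_up,
    "general_guidance": to_mentor,
}


def route(label: str, text: str) -> str:
    """Send a message to the handler for its label; fall back to the mentor."""
    handler = ROUTES.get(label, to_mentor)
    return handler(text)
```

A dictionary dispatch keeps the label-to-handler pairing in one place, so adding a category later only requires a new entry.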
---
## 🔗 System Context (CIC Ecosystem)
This dataset is part of a broader system that includes (or will include):
* Help Classifier (this dataset)
* Help Generator (response generation)
* Help Summarizer (context summarization)
* MCP Server integration
---
## ⚠️ Limitations
* Single-label classification (some inputs may contain multiple intents)
* Domain-specific (focused on student tech environments)
* Informal language may make some edge cases ambiguous
---
## 🔮 Future Improvements
* Multi-label classification support
* Larger dataset (2,000+ examples)
* Additional categories (e.g., collaboration, leadership)
* More real-world Slack data integration
---
## 👤 Author
Created by Kingston Lewis as part of the Coding in Color program for the AI Dev team.
Provided by:
King-8



