RuDubnium/email-triage-openenv

Name: RuDubnium/email-triage-openenv
Creator: RuDubnium
Published: 2026-04-08 16:50:47
License: No description provided

Source: Hugging Face. Updated 2026-04-08. Indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/RuDubnium/email-triage-openenv

Description:

# Email Triage & Response: OpenEnv Environment

A real-world OpenEnv environment where AI agents learn to triage corporate email inboxes: categorize, prioritize, reply, forward, and flag emails.

## 🌟 Why Email Triage?

Email triage is a task performed by **billions of knowledge workers daily**. It requires:

- **Reading comprehension**: understanding intent and urgency
- **Decision-making**: choosing correct actions from a discrete set
- **Context reasoning**: considering sender, deadlines, dependencies
- **Multi-step planning**: processing an inbox of emails in priority order

This environment fills a gap in OpenEnv: no existing environment models **text-based decision-making workflows**.

---

## 📋 Environment Overview

| Property | Value |
|----------|-------|
| **Domain** | Corporate Email Inbox Management |
| **Spec** | OpenEnv v1 (step/reset/state) |
| **Tasks** | 4 (easy → medium → hard → pro) |
| **Action Types** | categorize, reply, forward, archive, flag, skip |
| **Reward** | Per-step partial rewards (not just end-of-episode) |

---

## 🎯 Tasks

### Task 1: `email_categorization` (Easy)

- **Emails**: 5
- **Objective**: Categorize each email into the correct category
- **Categories**: `urgent_business`, `meeting_request`, `newsletter`, `spam`, `customer_complaint`, `internal_update`
- **Grading**: % of emails correctly categorized (0.0–1.0)
- **Max Steps**: 15

### Task 2: `priority_triage` (Medium)

- **Emails**: 10
- **Objective**: Categorize + assign priority (low/medium/high/urgent) + reply to emails needing response
- **Grading**: Weighted: 40% category + 30% priority + 30% reply quality
- **Max Steps**: 35

### Task 3: `full_inbox_management` (Hard)

- **Emails**: 20
- **Objective**: Full triage: categorize, prioritize, reply, forward complaints to support, flag deadlines, archive spam
- **Grading**: 25% category + 25% priority + 20% reply + 15% forward + 15% flag
- **Max Steps**: 80

### Task 4: `strategic_inbox_cleanup` (Pro)

- **Emails**: 50
- **Objective**: Long-running strategic management: handle a massive inbox with consistent decision-making
- **Grading**: Balanced, 20% for each of the 5 action categories
- **Max Steps**: 200

---

## 🛤 Complex Trajectories & Strategic Routing

This environment is designed to test an agent's ability to handle **long-running tasks** with **multiple trajectories**.

- **Multiple Routes**: Agents are not forced into a single path. They can choose to:
  - **Priority-First**: Triage urgent emails immediately, then handle the rest.
  - **Batch-Processing**: Categorize all emails first, then reply to all, then archive all.
  - **Linear**: Process the inbox item-by-item from top to bottom.
- **Trajectory Length**: With up to 200 steps and 50 emails, agents must maintain state and consistency over long sequences of actions.
- **Interdependent Actions**: Most emails require multiple actions (e.g., `categorize` -> `reply` -> `flag` -> `archive`), creating deep decision trees.

---

## 📡 Action Space

```json
{
  "email_id": "string (required) - ID of the email to act on",
  "action_type": "enum: categorize | reply | forward | archive | flag | skip",
  "category": "enum: urgent_business | meeting_request | newsletter | spam | customer_complaint | internal_update",
  "priority": "enum: low | medium | high | urgent",
  "reply_text": "string - reply content (for 'reply' action)",
  "forward_to": "string - email address (for 'forward' action)"
}
```

## 👁 Observation Space

```json
{
  "emails": [{"id", "from_addr", "to_addr", "subject", "body", "timestamp"}],
  "inbox_size": "int - unprocessed emails remaining",
  "processed_count": "int - emails processed so far",
  "current_step": "int",
  "max_steps": "int",
  "done": "bool",
  "reward": "float - reward for last action",
  "cumulative_reward": "float - total reward",
  "feedback": "string - human-readable feedback",
  "task_id": "string",
  "task_description": "string"
}
```

---

## 🏗 Setup & Usage

### Local Development

```bash
# Install dependencies
cd email_triage_env
pip install -r server/requirements.txt

# Start the server
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Test health
curl http://localhost:8000/health

# Reset with a task
curl -X POST http://localhost:8000/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "email_categorization", "seed": 42}'

# Take an action
curl -X POST http://localhost:8000/step \
  -H "Content-Type: application/json" \
  -d '{"email_id": "email_1", "action_type": "categorize", "category": "spam", "priority": "low"}'

# Get state
curl http://localhost:8000/state

# List tasks
curl http://localhost:8000/tasks

# Get grader score (after episode completes)
curl -X POST http://localhost:8000/grader
```

### Docker

```bash
cd email_triage_env
docker build -f server/Dockerfile -t email-triage-env .
docker run -p 8000:8000 email-triage-env
```

### Inference

```bash
# Set required environment variables
export API_BASE_URL="http://localhost:8000"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your_huggingface_token"
export OPENAI_API_KEY="sk-..."

# Run inference
python inference.py
```

The inference script uses structured logging (`START`, `STEP`, `END`) and supports custom LLM backends via environment variables.

---

## 📊 Baseline Scores

| Task | Difficulty | Rule-Based Score | Description |
|------|-----------|-----------------|-------------|
| `email_categorization` | Easy | ~0.80–1.00 | Keyword heuristics work well |
| `priority_triage` | Medium | ~0.60–0.80 | Priority and reply quality harder |
| `full_inbox_management` | Hard | ~0.40–0.60 | Forwarding and flagging add complexity |

*Scores are deterministic with seed=42. LLM-based agents score higher.*

---

## 🏆 Reward Design

**Per-step rewards** provide signal throughout the trajectory:

- ✅ Correct category: **+0.20**
- ✅ Correct priority: **+0.15**
- ✅ Good reply: **+0.15**
- ✅ Correct forward: **+0.15**
- ✅ Correct flag: **+0.10**
- ✅ Archive spam/newsletters: **+0.10**
- ⚡ Close priority (off by 1): **+0.05**
- ❌ Wrong category: **-0.10**
- ❌ Missed urgent email: **-0.20**
- ❌ Archived urgent/complaint: **-0.15 to -0.20**

---

## 📁 Project Structure

```
email_triage_env/
├── __init__.py                      # Package exports
├── models.py                        # Pydantic Action/Observation/State
├── email_data.py                    # Deterministic email generator
├── tasks.py                         # Task definitions & graders
├── inference.py                     # Main inference script (replaces baseline.py)
├── openenv.yaml                     # OpenEnv manifest
├── pyproject.toml                   # Package config
├── README.md                        # This file
└── server/
    ├── __init__.py
    ├── app.py                       # FastAPI application
    ├── email_triage_environment.py  # Core Environment class
    ├── Dockerfile                   # Container definition
    └── requirements.txt             # Python dependencies
```

---

## 🔌 API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/reset` | POST | Start new episode |
| `/step` | POST | Execute action |
| `/state` | GET | Get episode state |
| `/tasks` | GET | List tasks with schema |
| `/grader` | POST | Get grader score |
| `/baseline` | POST | Run baseline agent |

---

## License

MIT
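
---

## Example: Weighted Grading Sketch

The grading weights listed for Tasks 1 to 3 can be read as a weighted sum of per-component accuracies. The sketch below illustrates that arithmetic only; it is not the project's grader. The function name `weighted_score`, the `TASK_WEIGHTS` table, and the component labels (`category`, `priority`, `reply`, `forward`, `flag`) are hypothetical names chosen for illustration, and the real graders in `tasks.py` may compute each component differently.

```python
# Illustrative only: weights are copied from the task descriptions in this
# README; the dictionary layout and component labels are assumptions.
TASK_WEIGHTS = {
    "email_categorization": {"category": 1.0},
    "priority_triage": {"category": 0.40, "priority": 0.30, "reply": 0.30},
    "full_inbox_management": {
        "category": 0.25,
        "priority": 0.25,
        "reply": 0.20,
        "forward": 0.15,
        "flag": 0.15,
    },
}


def weighted_score(task_id: str, components: dict) -> float:
    """Combine per-component accuracies (each in [0, 1]) into one score."""
    weights = TASK_WEIGHTS[task_id]
    total = 0.0
    for name, weight in weights.items():
        # A component that was never measured counts as 0.0.
        value = components.get(name, 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    # Round to hide floating-point noise in the weighted sum.
    return round(total, 4)


if __name__ == "__main__":
    example = {"category": 0.9, "priority": 0.7, "reply": 0.5}
    print(weighted_score("priority_triage", example))
```

Under these assumptions, an agent on `priority_triage` with 90% category accuracy, 70% priority accuracy, and 50% reply quality would score 0.4 × 0.9 + 0.3 × 0.7 + 0.3 × 0.5 = 0.72.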

Provided by:

RuDubnium
