RuDubnium/email-triage-openenv
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/RuDubnium/email-triage-openenv
Resource description:
# Email Triage & Response — OpenEnv Environment
A real-world OpenEnv environment where AI agents learn to triage corporate email inboxes: categorize, prioritize, reply, forward, and flag emails.
## 🌟 Why Email Triage?
Email triage is a task that **knowledge workers perform every day**. It requires:
- **Reading comprehension** — understanding intent and urgency
- **Decision-making** — choosing correct actions from a discrete set
- **Context reasoning** — considering sender, deadlines, dependencies
- **Multi-step planning** — processing an inbox of emails in priority order
This environment fills a gap in OpenEnv: no existing environment models **text-based decision-making workflows**.
---
## 📋 Environment Overview
| Property | Value |
|----------|-------|
| **Domain** | Corporate Email Inbox Management |
| **Spec** | OpenEnv v1 (step/reset/state) |
| **Tasks** | 4 (easy → medium → hard → pro) |
| **Action Types** | categorize, reply, forward, archive, flag, skip |
| **Reward** | Per-step partial rewards (not just end-of-episode) |
---
## 🎯 Tasks
### Task 1: `email_categorization` (Easy)
- **Emails**: 5
- **Objective**: Categorize each email into the correct category
- **Categories**: `urgent_business`, `meeting_request`, `newsletter`, `spam`, `customer_complaint`, `internal_update`
- **Grading**: % of emails correctly categorized (0.0–1.0)
- **Max Steps**: 15
### Task 2: `priority_triage` (Medium)
- **Emails**: 10
- **Objective**: Categorize + assign priority (low/medium/high/urgent) + reply to emails needing response
- **Grading**: Weighted — 40% category + 30% priority + 30% reply quality
- **Max Steps**: 35
### Task 3: `full_inbox_management` (Hard)
- **Emails**: 20
- **Objective**: Full triage — categorize, prioritize, reply, forward complaints to support, flag deadlines, archive spam
- **Grading**: 25% category + 25% priority + 20% reply + 15% forward + 15% flag
- **Max Steps**: 80
### Task 4: `strategic_inbox_cleanup` (Pro)
- **Emails**: 50
- **Objective**: Long-running strategic management — handle a massive inbox with consistent decision-making
- **Grading**: Balanced 20% across all 5 action categories
- **Max Steps**: 200
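
The grading weights listed for each task can be combined into a single score as in the sketch below. This is not the grader from `tasks.py`; `TASK_WEIGHTS` and `weighted_score` are hypothetical names, and the Task 4 split assumes that "balanced 20%" means 20% for each of the five components graded in Task 3.

```python
# Weights copied from the task descriptions above. The Task 4 split is an
# assumption: "balanced 20%" is read as 20% for each of the five components.
TASK_WEIGHTS = {
    "email_categorization": {"category": 1.0},
    "priority_triage": {"category": 0.40, "priority": 0.30, "reply": 0.30},
    "full_inbox_management": {
        "category": 0.25, "priority": 0.25, "reply": 0.20,
        "forward": 0.15, "flag": 0.15,
    },
    "strategic_inbox_cleanup": {
        "category": 0.20, "priority": 0.20, "reply": 0.20,
        "forward": 0.20, "flag": 0.20,
    },
}


def weighted_score(task_id, component_scores):
    """Combine per-component scores (each in 0.0-1.0) into one task score."""
    weights = TASK_WEIGHTS[task_id]
    # A component that was never scored counts as 0.0.
    total = sum(w * component_scores.get(name, 0.0) for name, w in weights.items())
    return max(0.0, min(1.0, total))
```

For example, `weighted_score("priority_triage", {"category": 1.0, "priority": 0.5, "reply": 0.0})` weights a perfect category score, a half-correct priority score, and no reply credit.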
---
## 🛤 Complex Trajectories & Strategic Routing
This environment is designed to test an agent's ability to handle **long-running tasks** with **multiple trajectories**.
- **Multiple Routes**: Agents are not forced into a single path. They can choose to:
  - **Priority-First**: Triage urgent emails immediately, then handle the rest.
  - **Batch-Processing**: Categorize all emails first, then reply to all, then archive all.
  - **Linear**: Process the inbox item-by-item from top to bottom.
- **Trajectory Length**: With up to 200 steps and 50 emails, agents must maintain state and consistency over long sequences of actions.
- **Interdependent Actions**: Most emails require multiple actions (e.g., `categorize` -> `reply` -> `flag` -> `archive`), creating deep decision trees.
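
The three routes above differ only in the order in which the same actions are issued. The sketch below makes that concrete. It is illustrative only: `plan_actions`, the `"priority"` key (the agent's own estimate), and the `"actions"` key are hypothetical and are not part of the environment's observation.

```python
PRIORITY_RANK = {"urgent": 0, "high": 1, "medium": 2, "low": 3}
BATCH_ORDER = ["categorize", "reply", "forward", "flag", "archive"]


def plan_actions(emails, strategy="linear"):
    """Return an ordered list of (email_id, action_type) pairs.

    `emails` is a list of dicts with hypothetical keys "id", "priority"
    (the agent's estimate) and "actions" (what it intends to do).
    """
    if strategy == "batch":
        # One pass per action type: categorize everything, then reply, etc.
        return [(e["id"], a) for a in BATCH_ORDER for e in emails if a in e["actions"]]
    if strategy == "priority_first":
        # sorted() is stable, so inbox order is kept within a priority level.
        ordered = sorted(emails, key=lambda e: PRIORITY_RANK[e["priority"]])
    elif strategy == "linear":
        ordered = emails
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [(e["id"], a) for e in ordered for a in e["actions"]]
```

All three strategies produce the same number of steps; what changes is how early urgent emails are handled, which matters when the step budget runs out before the inbox is empty.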
---
## 📡 Action Space
```json
{
  "email_id": "string (required) — ID of the email to act on",
  "action_type": "enum: categorize | reply | forward | archive | flag | skip",
  "category": "enum: urgent_business | meeting_request | newsletter | spam | customer_complaint | internal_update",
  "priority": "enum: low | medium | high | urgent",
  "reply_text": "string — reply content (for 'reply' action)",
  "forward_to": "string — email address (for 'forward' action)"
}
```
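
An agent can validate an action on the client side before sending it. The sketch below uses standard-library dataclasses; it is not the Pydantic model from `models.py`. `TriageAction` and `to_payload` are hypothetical names, and the rule that `reply` and `forward` must carry their payload field is an assumption drawn from the field descriptions above.

```python
from dataclasses import asdict, dataclass
from typing import Optional

ACTION_TYPES = {"categorize", "reply", "forward", "archive", "flag", "skip"}
CATEGORIES = {
    "urgent_business", "meeting_request", "newsletter",
    "spam", "customer_complaint", "internal_update",
}
PRIORITIES = {"low", "medium", "high", "urgent"}


@dataclass
class TriageAction:
    email_id: str
    action_type: str
    category: Optional[str] = None
    priority: Optional[str] = None
    reply_text: Optional[str] = None
    forward_to: Optional[str] = None

    def __post_init__(self):
        if not self.email_id:
            raise ValueError("email_id is required")
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action_type: {self.action_type}")
        if self.category is not None and self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.priority is not None and self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority}")
        # Assumed rule: reply and forward need their payload field.
        if self.action_type == "reply" and not self.reply_text:
            raise ValueError("reply_text is required for 'reply'")
        if self.action_type == "forward" and not self.forward_to:
            raise ValueError("forward_to is required for 'forward'")

    def to_payload(self):
        """Drop unset fields so the JSON body carries only what was chosen."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```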
## 👁 Observation Space
```json
{
  "emails": [{"id", "from_addr", "to_addr", "subject", "body", "timestamp"}],
  "inbox_size": "int — unprocessed emails remaining",
  "processed_count": "int — emails processed so far",
  "current_step": "int",
  "max_steps": "int",
  "done": "bool",
  "reward": "float — reward for last action",
  "cumulative_reward": "float — total reward",
  "feedback": "string — human-readable feedback",
  "task_id": "string",
  "task_description": "string"
}
```
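
Because every observation carries `current_step`, `max_steps`, and `inbox_size`, an agent can track how many steps it can still afford per email. A minimal sketch follows, assuming the observation arrives as a plain dict with the fields listed above; `remaining_budget` is a hypothetical helper, not part of the environment.

```python
def remaining_budget(obs):
    """Summarize how much step budget is left for the unprocessed emails."""
    steps_left = max(0, obs["max_steps"] - obs["current_step"])
    inbox = obs["inbox_size"]
    return {
        "steps_left": steps_left,
        # None when the inbox is already empty (nothing left to budget for).
        "steps_per_email": steps_left / inbox if inbox else None,
        "should_stop": bool(obs["done"]) or steps_left == 0,
    }
```

With the 15-step limit of `email_categorization`, an agent that has used 5 steps and still has 4 emails left has 2.5 steps per email, which is enough for a categorize action plus one follow-up on each.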
---
## 🏗 Setup & Usage
### Local Development
```bash
# Install dependencies
cd email_triage_env
pip install -r server/requirements.txt
# Start the server
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Test health
curl http://localhost:8000/health
# Reset with a task
curl -X POST http://localhost:8000/reset \
-H "Content-Type: application/json" \
-d '{"task_id": "email_categorization", "seed": 42}'
# Take an action
curl -X POST http://localhost:8000/step \
-H "Content-Type: application/json" \
-d '{"email_id": "email_1", "action_type": "categorize", "category": "spam", "priority": "low"}'
# Get state
curl http://localhost:8000/state
# List tasks
curl http://localhost:8000/tasks
# Get grader score (after episode completes)
curl -X POST http://localhost:8000/grader
```
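
The same calls can be made from Python using only the standard library. The sketch below mirrors the `curl` commands above; `build_request` and `call` are hypothetical helpers, and the assumption that every endpoint returns a JSON body is not confirmed by this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Build a request; a payload, if given, is sent as a JSON body."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None):
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(method, path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `call("POST", "/reset", {"task_id": "email_categorization", "seed": 42})` starts an episode, and `call("POST", "/step", {"email_id": "email_1", "action_type": "categorize", "category": "spam", "priority": "low"})` takes the same action as the `curl` example.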
### Docker
```bash
cd email_triage_env
docker build -f server/Dockerfile -t email-triage-env .
docker run -p 8000:8000 email-triage-env
```
### Inference
```bash
# Set required environment variables
export API_BASE_URL="http://localhost:8000"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your_huggingface_token"
export OPENAI_API_KEY="sk-..."
# Run inference
python inference.py
```
The inference script uses structured logging (`START`, `STEP`, `END`) and supports custom LLM backends via environment variables.
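
The exact line format that `inference.py` emits is not shown in this README. As an illustration only, the sketch below assumes a bracketed tag followed by `key=value` pairs; `format_event` is a hypothetical helper.

```python
ALLOWED_TAGS = {"START", "STEP", "END"}


def format_event(tag, **fields):
    """Render one log line, e.g. a tag plus space-separated key=value pairs.

    The "[TAG] key=value" layout is an assumption, not the script's format.
    """
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    parts = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"[{tag}] {parts}".rstrip()


def log_event(tag, **fields):
    # flush=True keeps lines ordered when stdout is piped to a log collector.
    print(format_event(tag, **fields), flush=True)
```

One line per event keeps the log easy to parse: a run would emit one `START` line, one `STEP` line per action, and one `END` line.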
---
## 📊 Baseline Scores
| Task | Difficulty | Rule-Based Score | Description |
|------|-----------|-----------------|-------------|
| `email_categorization` | Easy | ~0.80–1.00 | Keyword heuristics work well |
| `priority_triage` | Medium | ~0.60–0.80 | Priority and reply quality harder |
| `full_inbox_management` | Hard | ~0.40–0.60 | Forwarding and flagging add complexity |
*Scores are deterministic for a fixed seed (seed=42). LLM-based agents are expected to score higher.*
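
The keyword heuristics mentioned in the table can be as simple as the sketch below. It is not the repository's baseline agent: the keyword lists, their order, and the `internal_update` fallback are all hypothetical choices made for illustration.

```python
# Hypothetical keyword lists, checked in order; the first match wins.
KEYWORDS = [
    ("spam", ("lottery", "winner", "free money")),
    ("customer_complaint", ("complaint", "refund", "unacceptable")),
    ("meeting_request", ("meeting", "calendar", "schedule")),
    ("newsletter", ("newsletter", "unsubscribe", "digest")),
    ("urgent_business", ("urgent", "asap", "deadline")),
]


def categorize(subject, body):
    """Return the first category whose keywords appear in the email text."""
    text = f"{subject} {body}".lower()
    for category, words in KEYWORDS:
        if any(word in text for word in words):
            return category
    # Assumed fallback when nothing matches.
    return "internal_update"
```

A heuristic like this only covers the category component, which is consistent with rule-based scores dropping on the tasks that also grade priority, replies, forwarding, and flagging.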
---
## 🏆 Reward Design
**Per-step rewards** provide signal throughout the trajectory:
- ✅ Correct category: **+0.20**
- ✅ Correct priority: **+0.15**
- ✅ Good reply: **+0.15**
- ✅ Correct forward: **+0.15**
- ✅ Correct flag: **+0.10**
- ✅ Archive spam/newsletters: **+0.10**
- ⚡ Close priority (off by 1): **+0.05**
- ❌ Wrong category: **-0.10**
- ❌ Missed urgent email: **-0.20**
- ❌ Archived urgent/complaint: **-0.15 to -0.20**
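
The category and priority entries above can be read as a small scoring function. The sketch below is not the environment's reward code; `categorize_reward` is a hypothetical name, and it assumes the category and priority rewards for one `categorize` step are simply added together.

```python
PRIORITY_LEVELS = ["low", "medium", "high", "urgent"]


def categorize_reward(pred_category, pred_priority, true_category, true_priority):
    """Per-step reward for a categorize action, using the values listed above."""
    reward = 0.20 if pred_category == true_category else -0.10
    if pred_priority is not None:
        gap = abs(
            PRIORITY_LEVELS.index(pred_priority) - PRIORITY_LEVELS.index(true_priority)
        )
        if gap == 0:
            reward += 0.15  # exact priority match
        elif gap == 1:
            reward += 0.05  # close priority (off by one level)
    # Round to avoid floating-point noise such as 0.35000000000000003.
    return round(reward, 2)
```

Under these assumptions, a correct category with an exact priority earns 0.35, a correct category with a priority one level off earns 0.25, and a wrong category with a priority far off earns -0.10.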
---
## 📁 Project Structure
```
email_triage_env/
├── __init__.py # Package exports
├── models.py # Pydantic Action/Observation/State
├── email_data.py # Deterministic email generator
├── tasks.py # Task definitions & graders
├── inference.py # Main inference script (replaces baseline.py)
├── openenv.yaml # OpenEnv manifest
├── pyproject.toml # Package config
├── README.md # This file
└── server/
    ├── __init__.py
    ├── app.py # FastAPI application
    ├── email_triage_environment.py # Core Environment class
    ├── Dockerfile # Container definition
    └── requirements.txt # Python dependencies
```
---
## 🔌 API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/reset` | POST | Start new episode |
| `/step` | POST | Execute action |
| `/state` | GET | Get episode state |
| `/tasks` | GET | List tasks with schema |
| `/grader` | POST | Get grader score |
| `/baseline` | POST | Run baseline agent |
---
## License
MIT
Provider:
RuDubnium



