Melikshah/dc-ops-sft-data

Name: Melikshah/dc-ops-sft-data
Creator: Melikshah
Published: 2026-04-21 15:58:09
License: no description provided

Source: Hugging Face. Updated 2026-04-21; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/Melikshah/dc-ops-sft-data

Description:

---
license: bsd
language:
- en
tags:
- conversational
- reinforcement-learning
- datacenter
- llm-agent
- openenv
task_categories:
- conversational
size_categories:
- 1K<n<10K
---

# DC-Ops SFT Dataset

Supervised fine-tuning conversations for the **DC-Ops** OpenEnv environment, a physics-based datacenter operations RL environment built on Meta's [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework.

Generated using **DeepSeek-R1-Distill-Qwen-32B** as a teacher model, rolled out against the live `DcOpsEnvironment` so every dashboard the student sees is produced by the actual thermal+power simulation.

## Format

Each agent turn contains **three blocks**:

```
<think>
[R1's natural messy chain of thought: exploration, self-correction, all of it]
</think>

<reasoning>
1. Situation: [what the dashboard shows that matters].
2. Constraint: [the relevant ASHRAE limit, procedure rule, or system state].
3. Step: [which phase of assess→diagnose→compensate→verify→resolve].
4. Action: [the chosen command and why].
</reasoning>

<command>
diagnose CRAC-3
</command>
```

Why three blocks:

- **`<think>`** lets the model think freely in its native format. No length cap.
- **`<reasoning>`** is the canonical training signal: concise, structured, ≤200 words, no self-correction. This is what shows up in the operations log.
- **`<command>`** is the single action sent to the env.

JSONL schema: each line is `{"conversations": [...]}` with `{from, value}` turns:

- `from: "system"`: the agent system prompt
- `from: "human"`: environment observation, `**Action Result:** ... **Steps Remaining:** N <dashboard>`
- `from: "gpt"`: agent reply (three blocks above)

## Headline numbers

- Episodes: **1083**
- Agent turns (SFT targets): **8387**
- Median `<reasoning>` length: **52 words (347 chars)**
- Median `<think>` length: **2435 chars** (present in 100.0% of turns)
- Median agent turns/episode: **8**
- Resolved episodes: **179 (16.53%)**

## Scenario coverage

| Scenario | Count | % |
|---|---|---|
| A1 | 145 | 13.39% |
| A2 | 225 | 20.78% |
| A4 | 181 | 16.71% |
| B1 | 120 | 11.08% |
| B3 | 160 | 14.77% |
| B4 | 103 | 9.51% |
| VAR_CRAC_MAINT | 30 | 2.77% |
| VAR_CRAC_STANDBY | 41 | 3.79% |
| VAR_GEN_LOWFUEL | 25 | 2.31% |
| VAR_UPS_MODE | 53 | 4.89% |

## Command coverage

| Command | Count | % of agent turns |
|---|---|---|
| `set_rack_load` | 4332 | 51.65% |
| `diagnose` | 1286 | 15.33% |
| `check_status` | 1046 | 12.47% |
| `adjust_setpoint` | 692 | 8.25% |
| `start_generator` | 250 | 2.98% |
| `wait` | 202 | 2.41% |
| `set_fan_speed` | 184 | 2.19% |
| `acknowledge_alarm` | 167 | 1.99% |
| `start_crac` | 93 | 1.11% |
| `set_ups_mode` | 64 | 0.76% |
| `stop_crac` | 40 | 0.48% |
| `refuel_generator` | 23 | 0.27% |
| `stop_generator` | 8 | 0.1% |

## Generation pipeline

1. **Environment**: in-process `DcOpsEnvironment` instances (one per worker)
2. **Teacher**: `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B` via vLLM, called with the official `openai` async SDK pointed at vLLM's OpenAI-compatible endpoint
3. **Concurrent rollout**: asyncio worker pool, semaphore-throttled to vLLM's batch capacity (default 24 in-flight)
4. **Filtering**: drop episodes with parse failures, invalid commands, escalations, missing `<reasoning>` blocks, reasoning <30 chars, or avg reward < −0.20
5. **Balanced cap**: round-robin across scenario keys to preserve rare-command coverage

## Citation

- DC-Ops environment: https://github.com/TheDeadcoder/dc_ops_environment
- OpenEnv framework: https://github.com/meta-pytorch/OpenEnv
- Teacher model: DeepSeek-R1-Distill-Qwen-32B (DeepSeek)
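
## Usage sketch

The three-block format and the JSONL schema described in this card can be read with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the sample record is invented for illustration, the helper names (`parse_agent_turn`, `agent_turns`, `reasoning_ok`) are hypothetical, and `reasoning_ok` assumes the 30-character floor and 200-word cap are measured on the whitespace-trimmed `<reasoning>` text, which the card does not specify.

```python
import json
import re

# Hypothetical one-line JSONL record following the card's schema. The text in
# each turn is invented for illustration and is not taken from the dataset.
SAMPLE_LINE = json.dumps({
    "conversations": [
        {"from": "system", "value": "You are a datacenter operator."},
        {"from": "human",
         "value": "**Action Result:** none\n**Steps Remaining:** 10\n<dashboard>"},
        {"from": "gpt", "value": (
            "<think>\nCRAC-3 looks off. Check it first.\n</think>\n"
            "<reasoning>\n"
            "1. Situation: CRAC-3 supply temperature is rising.\n"
            "2. Constraint: inlet temperature must stay within the ASHRAE limit.\n"
            "3. Step: diagnose.\n"
            "4. Action: diagnose CRAC-3 to find the fault.\n"
            "</reasoning>\n"
            "<command>\ndiagnose CRAC-3\n</command>"
        )},
    ]
})

# One compiled pattern per block; DOTALL lets a block span several lines.
BLOCK_RE = {
    tag: re.compile(rf"<{tag}>\s*(.*?)\s*</{tag}>", re.DOTALL)
    for tag in ("think", "reasoning", "command")
}


def parse_agent_turn(text):
    """Return the three blocks of one agent reply, or None if any is missing."""
    blocks = {}
    for tag, pattern in BLOCK_RE.items():
        match = pattern.search(text)
        if match is None:
            return None
        blocks[tag] = match.group(1)
    return blocks


def agent_turns(jsonl_line):
    """Yield parsed blocks for every `gpt` turn in one JSONL line."""
    record = json.loads(jsonl_line)
    for turn in record["conversations"]:
        if turn["from"] == "gpt":
            parsed = parse_agent_turn(turn["value"])
            if parsed is not None:
                yield parsed


def reasoning_ok(reasoning, min_chars=30, max_words=200):
    """Assumed reading of the card's thresholds: >=30 chars and <=200 words."""
    return len(reasoning) >= min_chars and len(reasoning.split()) <= max_words


turns = list(agent_turns(SAMPLE_LINE))
command_name = turns[0]["command"].split()[0]
print(command_name, reasoning_ok(turns[0]["reasoning"]))
```

Grouping on `command_name` over every line of the file is one way to recompute a per-command count like the one in the command coverage table.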

Provider:

Melikshah
