lordx64-claude-opus-4.7-max-cleaned

Name: lordx64-claude-opus-4.7-max-cleaned
Creator: maas
Published: 2026-05-01 01:56:16
License: No description available

ModelScope community. Updated 2026-05-01. Indexed 2026-05-03.

Download link:

https://modelscope.cn/datasets/TeichAI/lordx64-claude-opus-4.7-max-cleaned

Resource description:

# reasoning-distill-claude-opus-4-7-max-cleaned

Cleaned version of [`lordx64/reasoning-distill-claude-opus-4-7-max`](https://huggingface.co/datasets/lordx64/reasoning-distill-claude-opus-4-7-max). See the original dataset for full provenance, collection methodology, and terms of use.

## Cleaning steps

| Step | Filter | Reason | Rows removed |
|------|--------|--------|--------------|
| 1 | Simulated thinking (`...`) | Rows with `...` in thinking/response indicate the model learned to simulate reasoning (e.g., "Now I'm laying out the puzzle grids...") rather than actually performing it. This causes failures in agentic tasks and hallucinations during thinking. | 1,230 (15.1%) |
| 2 | Duplicate prompts | Deduplicated by exact prompt match, keeping the first occurrence. | 989 (12.2%) |
| 3 | Missing fields | Rows without valid `thinking`, `prompt`, and `response` content (empty or null values). | 1,098 (13.5%) |
| **Total** | | | **3,317 (40.8%)** |

| Metric | Value |
|--------|-------|
| Original rows | 8,124 |
| Final rows | 4,807 |
| Retention rate | 59.2% |

## Format

Each row is a JSON object with the `messages` column first, following the standard chat format:

```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant", "thinking": null},
    {"role": "user", "content": "...", "thinking": null},
    {"role": "assistant", "content": "...", "thinking": "..."}
  ],
  "system": "You are a helpful assistant",
  "prompt": "...",
  "thinking": "...",
  "response": "...",
  "model": "claude-opus-4-7"
}
```

## License

Apache-2.0 (dataset packaging). Content subject to upstream [Anthropic usage policies](https://www.anthropic.com/legal/usage-policy).
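The three cleaning steps in the table above could be reproduced roughly as follows. This is a minimal sketch, not the dataset author's actual script: the function names are hypothetical, rows are assumed to be plain dictionaries with string or null fields, and the marker is taken literally as the three-dot string `...` because that is how the dataset card shows it (the real marker may differ).

```python
from typing import Any

# Assumption: the card shows the simulated-thinking marker only as `...`,
# so a literal three-dot match is used here. The real marker may differ.
SIMULATED_MARKER = "..."


def _text(value: Any) -> str:
    """Return the value if it is a string, otherwise an empty string."""
    return value if isinstance(value, str) else ""


def clean(rows: list[dict]) -> tuple[list[dict], dict[str, int]]:
    """Apply the three filters in table order and count removals per step."""
    removed = {"simulated": 0, "duplicate": 0, "missing": 0}

    # Step 1: drop rows whose thinking or response contains the marker.
    step1 = []
    for row in rows:
        if (SIMULATED_MARKER in _text(row.get("thinking"))
                or SIMULATED_MARKER in _text(row.get("response"))):
            removed["simulated"] += 1
        else:
            step1.append(row)

    # Step 2: exact-match deduplication on the prompt, keeping the first row.
    seen = set()
    step2 = []
    for row in step1:
        prompt = row.get("prompt")
        if prompt in seen:
            removed["duplicate"] += 1
        else:
            seen.add(prompt)
            step2.append(row)

    # Step 3: drop rows where any required field is empty or null.
    step3 = []
    for row in step2:
        if all(_text(row.get(k)).strip() for k in ("thinking", "prompt", "response")):
            step3.append(row)
        else:
            removed["missing"] += 1

    return step3, removed
```

Running the filters as separate passes keeps the per-step counts comparable to the table, since each step only sees rows that survived the previous one.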

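To illustrate the row format described in the dataset card, the sketch below assembles one row from its flat fields. It assumes, as the example row suggests but does not state outright, that the user message carries `prompt` and the assistant message carries `response` and `thinking`. The helper name `to_row` is hypothetical.

```python
import json


def to_row(prompt: str, thinking: str, response: str,
           system: str = "You are a helpful assistant",
           model: str = "claude-opus-4-7") -> dict:
    """Build one row with `messages` as the first key (dicts keep insertion order)."""
    return {
        "messages": [
            {"role": "system", "content": system, "thinking": None},
            {"role": "user", "content": prompt, "thinking": None},
            # Assumption: only the assistant turn carries a non-null `thinking`.
            {"role": "assistant", "content": response, "thinking": thinking},
        ],
        "system": system,
        "prompt": prompt,
        "thinking": thinking,
        "response": response,
        "model": model,
    }


# Serialize a single row as one JSON line.
line = json.dumps(to_row("What is 2 + 2?", "Add the two numbers.", "4"))
```

Because Python dictionaries preserve insertion order, `json.dumps` emits `messages` first, matching the column order the card describes.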
Provider:

maas

Created:

2026-04-27

Collection summary

Dataset introduction