raca-workspace-v1/grpo-tool-sat-dataset-v1
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/raca-workspace-v1/grpo-tool-sat-dataset-v1
Description:
---
license: mit
tags:
- grpo-tool-saturation
- synthetic
- toy
- tool-use
---
# grpo-tool-sat-dataset-v1
Synthetic lookup-table dataset for the GRPO Tool Saturation experiment. It covers 10,000 keys k in [0, 9999], each with the feature r = k mod 3. The two tools, map and table, have opaque, partially overlapping correct domains: map is correct for r in {0, 2}, and table is correct for r in {1, 2}. The correct answer is f(k) = SHA256(str(k))[:6]; outside its correct domain a tool returns a wrong hash, g_map(k) or g_table(k). SFT demos use overlap_skew_map=0.6 on r=2, with a token-level shuffle across r-classes under a fixed seed.
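The labelling rules above can be sketched in a few lines. This is an illustrative reconstruction, not the dataset's `src/data_gen.py`: the helper names `correct_answer` and `correct_tools` are hypothetical, and the sketch assumes that `SHA256(str(k))[:6]` means the first six characters of the hex digest of the UTF-8 encoded decimal string.

```python
import hashlib

def correct_answer(k: int, hash_slice: int = 6) -> str:
    """f(k): first `hash_slice` hex characters of SHA-256 over str(k) (assumed UTF-8, hex digest)."""
    return hashlib.sha256(str(k).encode("utf-8")).hexdigest()[:hash_slice]

def correct_tools(k: int) -> list[str]:
    """Tools whose correct domain contains k, based on r = k mod 3."""
    r = k % 3
    tools = []
    if r in (0, 2):   # map is correct for r in {0, 2}
        tools.append("map")
    if r in (1, 2):   # table is correct for r in {1, 2}
        tools.append("table")
    return tools

# One key from each r-class.
for k in (0, 1, 2):
    print(k, k % 3, correct_answer(k), correct_tools(k))
```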
## Dataset Info
- **Rows**: 20000
- **Columns**: 11
## Columns
| Column | Type | Description |
|--------|------|-------------|
| tool | Value('string') | 'map' or 'table' — the tool the demo uses (sft split only) |
| user_prompt | Value('string') | User-side prompt: 'Key: <k>' (sft split only) |
| assistant_completion | Value('string') | SFT target: prose + <tool_call> + <observation> + <answer> (sft split only) |
| k | Value('int64') | Key integer in [0, 9999] |
| r | Value('int64') | k mod 3 (0 = M_only, 1 = T_only, 2 = Overlap) |
| f_k | Value('string') | 6-hex correct answer = SHA256(str(k))[:6] |
| g_map_k | Value('string') | 6-hex wrong answer returned by map(k) when r=1 (meta split only) |
| g_table_k | Value('string') | 6-hex wrong answer returned by table(k) when r=0 (meta split only) |
| correct_tools | List(Value('string')) | List of tools yielding reward 1 for this key (meta split only) |
| region | Value('string') | Human-readable region label |
| split_name | Value('string') | Logical split: 'meta' (every key), 'sft' (demo rows), or 'eval' (held-out eval keys). Filter by this to get the split you need. |
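
To make the column semantics concrete, the sketch below simulates what each tool would return for a single meta row, and the resulting reward. The functions `tool_return` and `reward` are hypothetical helpers, and the row is a hand-made placeholder whose hex strings are not real dataset values. It assumes that a tool returns `f_k` inside its correct domain and its `g_*_k` value otherwise, and that the reward is 1 exactly when the returned string equals `f_k`.

```python
def tool_return(row: dict, tool: str) -> str:
    """What `tool` hands back for this key: f_k if the tool is correct here, else its wrong hash."""
    if tool in row["correct_tools"]:
        return row["f_k"]
    return row["g_map_k"] if tool == "map" else row["g_table_k"]

def reward(row: dict, tool: str) -> int:
    """1 if the tool's return matches the correct answer, else 0 (assumed reward rule)."""
    return int(tool_return(row, tool) == row["f_k"])

# Placeholder row for a key with r = 1 (table-only region); hex values are made up.
row = {
    "k": 4,
    "r": 1,
    "f_k": "aaaaaa",
    "g_map_k": "bbbbbb",
    "g_table_k": None,  # hypothetical: not used when r != 0
    "correct_tools": ["table"],
}
print(reward(row, "map"), reward(row, "table"))  # prints: 0 1
```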
## Generation Parameters
```json
{
  "script_name": "src/data_gen.py",
  "model": "n/a",
  "description": "Synthetic lookup-table dataset for the GRPO Tool Saturation experiment. 10k keys k in [0, 9999] with r = k mod 3 feature. Tools map/table have opaque, partially overlapping correct-domains: map correct for r in {0, 2}, table correct for r in {1, 2}. f(k) = SHA256(str(k))[:6]; wrong-hash returns are g_map(k), g_table(k). SFT demos use overlap_skew_map=0.6 on r=2; token-level shuffle across r-classes with fixed seed.",
  "hyperparameters": {
    "seed": 1,
    "eval_frac": 0.2,
    "overlap_skew_map": 0.6,
    "hash_slice": 6
  },
  "input_datasets": []
}
```
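
One plausible reading of `overlap_skew_map` is the probability that an SFT demo for an overlap key (r = 2) uses `map` rather than `table`. The sketch below illustrates that reading only; `pick_demo_tool` is a hypothetical name, and the actual sampling in `src/data_gen.py` may differ.

```python
import random

def pick_demo_tool(k: int, rng: random.Random, overlap_skew_map: float = 0.6) -> str:
    """Choose the tool shown in an SFT demo for key k (assumed interpretation)."""
    r = k % 3
    if r == 0:
        return "map"    # only map is correct
    if r == 1:
        return "table"  # only table is correct
    # r == 2: both tools are correct; favour map with probability overlap_skew_map
    return "map" if rng.random() < overlap_skew_map else "table"

rng = random.Random(1)  # fixed seed, mirroring "seed": 1 in the parameters
overlap_keys = [k for k in range(10_000) if k % 3 == 2]
share_map = sum(pick_demo_tool(k, rng) == "map" for k in overlap_keys) / len(overlap_keys)
print(f"map share on overlap keys: {share_map:.2f}")
```

Under this reading, roughly 60% of overlap demos would call `map` and the rest `table`, while keys with r = 0 or r = 1 always demonstrate their single correct tool.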
## Usage
```python
from datasets import load_dataset

# Load the "train" split; filter on the split_name column to get 'meta', 'sft', or 'eval' rows.
dataset = load_dataset("raca-workspace-v1/grpo-tool-sat-dataset-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```
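
Because the logical splits live in the `split_name` column, a small filter is usually the next step after loading. The helper below is a sketch that works on any iterable of row dictionaries; `rows_for_split` is a hypothetical name, and the sample rows are placeholders rather than real dataset rows.

```python
from typing import Iterable

def rows_for_split(rows: Iterable[dict], split_name: str) -> list[dict]:
    """Keep only rows whose split_name column matches the requested logical split."""
    allowed = {"meta", "sft", "eval"}
    if split_name not in allowed:
        raise ValueError(f"split_name must be one of {sorted(allowed)}")
    return [row for row in rows if row["split_name"] == split_name]

# Placeholder rows standing in for the loaded dataset.
sample_rows = [
    {"k": 0, "split_name": "meta"},
    {"k": 0, "split_name": "sft"},
    {"k": 1, "split_name": "meta"},
    {"k": 1, "split_name": "eval"},
]
print(len(rows_for_split(sample_rows, "meta")))  # prints: 2
```

With the object returned by `load_dataset`, the same selection can be written as `dataset.filter(lambda row: row["split_name"] == "sft")`.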
---
*Uploaded via [RACA](https://github.com/Zayne-sprague/Dr-Claude-Code) hf_utility.*
Provider:
raca-workspace-v1



