
BRlkl/grpo-5-sft-bootstrap-qwen3-4b-thinking-2507

Hugging Face · Updated 2026-04-01 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/BRlkl/grpo-5-sft-bootstrap-qwen3-4b-thinking-2507
Description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: plan
    dtype: string
  - name: wins
    dtype: int64
  - name: losses
    dtype: int64
  - name: ties
    dtype: int64
  - name: blacklisted
    dtype: bool
  - name: selected_source
    dtype: string
  - name: selected_rank
    dtype: int64
  - name: selected_diff
    dtype: float64
  - name: domain
    dtype: string
  splits:
  - name: train
    num_examples: 27930
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# BRlkl/grpo-5-sft-bootstrap-thinking

Derived from `BRlkl/grpo-5-sft-bootstrap-qwen3-4b-thinking-2507`.

This version repairs the `plan` column only for rows where `blacklisted = true`.

Transformation:

- Parse the `plan` JSON.
- Read `walk[0]`.
- If `walk[0]` is the wrapped prompt form `CONVERSATION_HISTORY: [Empty] ... Generate {"walk":[...]} for NEW_USER_MESSAGE.`, replace it with just the embedded `NEW_USER_MESSAGE` text.
- Leave all non-blacklisted rows unchanged.

Audit summary:

- Total rows: 27930
- Blacklisted rows: 6398
- Repaired rows: 6398
- Already-clean blacklisted rows: 0
- Non-blacklisted rows: 21532
- Wrapped blacklisted rows after repair: 0

Generated on 2026-04-01 03:35:20 UTC.
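
The repair described in the card can be sketched roughly as follows. This is an illustration under stated assumptions, not the script that produced the dataset. The card elides the middle of the wrapped prompt with `...`, so the sketch assumes the user text follows a `NEW_USER_MESSAGE:` label and precedes the closing `Generate ...` instruction. The names `WRAPPED_RE` and `repair_plan` are hypothetical.

```python
import json
import re

# ASSUMPTION: the wrapped prompt carries the user text after a
# "NEW_USER_MESSAGE:" label. The card only shows the start and end of the
# wrapper, so this pattern is a guess at the elided middle.
WRAPPED_RE = re.compile(
    r'^CONVERSATION_HISTORY:\s*\[Empty\].*?'
    r'NEW_USER_MESSAGE:\s*(?P<msg>.*?)\s*'
    r'Generate \{"walk":\[\.\.\.\]\} for NEW_USER_MESSAGE\.\s*$',
    re.DOTALL,
)


def repair_plan(plan_json: str, blacklisted: bool) -> str:
    """Return the plan with walk[0] unwrapped, or the input unchanged."""
    # Non-blacklisted rows are left untouched.
    if not blacklisted:
        return plan_json

    plan = json.loads(plan_json)
    walk = plan.get("walk") if isinstance(plan, dict) else None
    if not isinstance(walk, list) or not walk or not isinstance(walk[0], str):
        return plan_json

    match = WRAPPED_RE.match(walk[0])
    if match is None:
        # Already clean: nothing to repair.
        return plan_json

    # Keep only the embedded user message as the first walk step.
    walk[0] = match.group("msg")
    return json.dumps(plan, ensure_ascii=False)
```

Returning the original string whenever the row is not blacklisted or `walk[0]` does not match keeps the function safe to run over all 27930 rows, consistent with the card's statement that non-blacklisted rows are unchanged.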
Provider:
BRlkl