laion/CoderForge-Preview-v3-3160
Source: Hugging Face. Updated 2026-04-22; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/laion/CoderForge-Preview-v3-3160
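Overview:
laion/CoderForge-Preview-v3-3160 is a row subset of the `trajectories-tokenized_qwencoder` subset of togethercomputer/CoderForge-Preview. It contains 3,160 rows drawn from 155,144 source rows spread across 4 slugs. The data is pre-tokenized for Qwen3, and each row carries fields such as input_ids, attention_mask, and labels. The dataset targets text-generation tasks and is particularly suited to the axolotl framework: because the rows are already tokenized, axolotl can skip its chat_template renderer.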
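Since each row carries parallel token-level fields, a quick consistency check before training can catch malformed rows. The following is a minimal sketch, not part of the dataset's tooling: the helper name `check_row` and the synthetic `example_row` are hypothetical, and the use of -100 as the ignored-label value is an assumption based on the common PyTorch/Transformers convention rather than something the dataset card states.

```python
from typing import Any, Mapping

# Token-level columns named in the description above.
REQUIRED_COLUMNS = ("input_ids", "attention_mask", "labels")


def check_row(row: Mapping[str, Any]) -> int:
    """Return the sequence length if the row is consistent, else raise ValueError."""
    missing = [c for c in REQUIRED_COLUMNS if c not in row]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    lengths = {c: len(row[c]) for c in REQUIRED_COLUMNS}
    # All three sequences must align token for token.
    if len(set(lengths.values())) != 1:
        raise ValueError(f"length mismatch: {lengths}")
    return lengths["input_ids"]


# Synthetic row for illustration only; the values are made up.
# Assumption: -100 marks positions excluded from the loss.
example_row = {
    "input_ids": [101, 102, 103, 104],
    "attention_mask": [1, 1, 1, 1],
    "labels": [-100, -100, 103, 104],
}

print(check_row(example_row))
```

The same function could be applied to each row of the loaded dataset; it deliberately checks only the three fields named above and ignores any other columns.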
Row-subset of the pre-tokenized trajectories in togethercomputer/CoderForge-Preview (`trajectories-tokenized_qwencoder` subset). Size: 3,160 rows (source: 155,144 across 4 slugs). Format: native pre-tokenized data for Qwen3 (tokenizer shared with Qwen2.5-Coder / Qwen3-Coder / Qwen3-8B). Per row columns: input_ids, attention_mask, labels, chat_template_applied, trajectory_id, reward, source. Sampled deterministically (seed=42) from a concatenation of all 4 source slugs (R2E_Gym, SWE_Rebench, SWE_Smith, filtered_reward1). Row subsets are nested.
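One way deterministic, nested row subsets can be produced is to shuffle the row indices once with a fixed seed and take a prefix of the shuffled order, so that every smaller subset is contained in every larger one. The sketch below illustrates that idea only; it is an assumption about the general technique, not the sampling script actually used for this dataset. The function name `sample_indices` and the smaller size of 316 are hypothetical.

```python
import random


def sample_indices(total_rows: int, subset_size: int, seed: int = 42) -> list[int]:
    """Return the first `subset_size` indices of a seeded shuffle of range(total_rows)."""
    if not 0 <= subset_size <= total_rows:
        raise ValueError("subset_size must be between 0 and total_rows")
    order = list(range(total_rows))
    # A dedicated Random instance keeps the result independent of global state.
    random.Random(seed).shuffle(order)
    return order[:subset_size]


SOURCE_ROWS = 155_144  # row count of the concatenated source, per the description

small = sample_indices(SOURCE_ROWS, 316)    # hypothetical smaller subset
large = sample_indices(SOURCE_ROWS, 3_160)  # size of this dataset

# Prefixes of the same shuffled order are nested by construction.
print(set(small) <= set(large))
```

Because both calls shuffle the same index range with the same seed, the smaller subset is always a prefix of the larger one, which is what makes the subsets nested.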
Provider:
laion



