laion/Sera-4.5A-Full-T1-v3-316
收藏Hugging Face2026-04-22 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/laion/Sera-4.5A-Full-T1-v3-316
下载链接
链接失效反馈官方服务:
资源简介:
该数据集是allenai/Sera-4.5A-Full-T1的一个子集,包含316行数据(完整数据集有72,118行)。数据格式为原始JSONL,采用OpenAI原生消息布局,保留了原始字段如messages、instance_id等,并添加了指向父数据集的source字段。每个助手消息包含一个原生的tool_calls数组和一个train: bool标志,用于每条消息的损失掩码。数据集适用于直接通过axolotl使用,配置为type: chat_template和chat_template: chatml。采样方法为确定性随机,种子为42。
Subset of [allenai/Sera-4.5A-Full-T1](https://huggingface.co/datasets/allenai/Sera-4.5A-Full-T1). Size: 316 rows (full dataset: 72,118 rows). Format: Raw JSONL, OpenAI-native messages layout. Preserves the original `messages` field (as JSON string), `instance_id`, `rollout_patch`, `func_name`, `func_path`, `problem_statement`, `target_patch`, `docker_image`. Adds a `source` field pointing back to the parent dataset. Each assistant message carries a native `tool_calls` array (OpenAI tool-calling format) and a `train: bool` flag for per-message loss masking — these are **not** flattened into shareGPT. Intended for direct consumption by [axolotl](https://github.com/axolotl-ai-cloud/axolotl) with `type: chat_template`, `chat_template: chatml`, `message_field_training: train`. Sampling: deterministic random, seed=42, row-indexed into the full dataset.
提供机构:
laion



