laion/Sera-4.6-Lite-T2-v4-1000

Name: laion/Sera-4.6-Lite-T2-v4-1000
Creator: laion
Published: 2026-04-22 20:04:10
License: 暂无描述

Hugging Face2026-04-22 更新2026-04-26 收录

下载链接：

https://hf-mirror.com/datasets/laion/Sera-4.6-Lite-T2-v4-1000

下载链接

链接失效反馈

官方服务：

资源简介：

laion/Sera-4.6-Lite-T2-v4-1000是allenai/Sera-4.6-Lite-T2数据集的一个子集，包含1,000行数据（源数据共36,083行）。该数据集特别处理了OpenAI的`tool_calls`，将其预渲染为Hermes/Qwen3风格的`<tool_call>...</tool_call>`标记，并将工具响应包装为`<tool_response>...</tool_response>`。数据格式为原始JSONL，每行包含`messages: list[{role, content, train}]`，角色包括`system | user | assistant`。工具观察结果表示为`role: user`，并带有`<tool_response>...</tool_response>`包装。`train: bool`字段用于axolotl的`message_field_training: train`。数据集采用确定性随机采样，种子为42，行索引嵌套在完整的36,083行源数据中。

laion/Sera-4.6-Lite-T2-v4-1000 is a row-subset of the allenai/Sera-4.6-Lite-T2 dataset, containing 1,000 rows (source: 36,083 rows). The dataset pre-renders OpenAI `tool_calls` into Hermes/Qwen3-style `<tool_call>...</tool_call>` wire tokens and wraps tool responses as `<tool_response>...</tool_response>`. The format is raw JSONL, with each row containing `messages: list[{role, content, train}]`. Roles are `system | user | assistant`. Tool observations are represented as `role: user` with `<tool_response>...</tool_response>` wrapping. `train: bool` on each message is the per-message loss mask consumed by axolotls `message_field_training: train`. Sampling is deterministic random, with seed=42, row-indexed into the full 36,083-row source. Row subsets are nested.

提供机构：

laion

5,000+

优质数据集

54 个

任务类型

进入经典数据集