five

laion/Sera-4.6-Lite-T2-v4-1000

收藏
Hugging Face2026-04-22 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/laion/Sera-4.6-Lite-T2-v4-1000
下载链接
链接失效反馈
官方服务:
资源简介:
laion/Sera-4.6-Lite-T2-v4-1000是allenai/Sera-4.6-Lite-T2数据集的一个子集,包含1,000行数据(源数据共36,083行)。该数据集特别处理了OpenAI的`tool_calls`,将其预渲染为Hermes/Qwen3风格的`<tool_call>...</tool_call>`标记,并将工具响应包装为`<tool_response>...</tool_response>`。数据格式为原始JSONL,每行包含`messages: list[{role, content, train}]`,角色包括`system | user | assistant`。工具观察结果表示为`role: user`,并带有`<tool_response>...</tool_response>`包装。`train: bool`字段用于axolotl的`message_field_training: train`。数据集采用确定性随机采样,种子为42,行索引嵌套在完整的36,083行源数据中。

laion/Sera-4.6-Lite-T2-v4-1000 is a row-subset of the allenai/Sera-4.6-Lite-T2 dataset, containing 1,000 rows (source: 36,083 rows). The dataset pre-renders OpenAI `tool_calls` into Hermes/Qwen3-style `<tool_call>...</tool_call>` wire tokens and wraps tool responses as `<tool_response>...</tool_response>`. The format is raw JSONL, with each row containing `messages: list[{role, content, train}]`. Roles are `system | user | assistant`. Tool observations are represented as `role: user` with `<tool_response>...</tool_response>` wrapping. `train: bool` on each message is the per-message loss mask consumed by axolotls `message_field_training: train`. Sampling is deterministic random, with seed=42, row-indexed into the full 36,083-row source. Row subsets are nested.
提供机构:
laion
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作