five

Kiria-Nozan/TRIM-gpt-oss-120b-16-tasks

收藏
Hugging Face2026-04-18 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/Kiria-Nozan/TRIM-gpt-oss-120b-16-tasks
下载链接
链接失效反馈
官方服务:
资源简介:
--- configs: - config_name: default data_files: - split: train path: "train/*.jsonl" --- # TRIM Agent Reasoning Messages (HF Public Export) This directory is a Hugging Face-friendly public export of the TRIM agent reasoning SFT data. ## What Is Included - Provider: `vllm` - Model: `gpt-oss-120b` - Splits present: `train` - Records in this export manifest: `12040` - Tasks in this split: `AMES, BBB_Martins, Bioavailability_Ma, CYP2C9_Substrate_CarbonMangels, CYP2D6_Substrate_CarbonMangels, CYP3A4_Substrate_CarbonMangels, Carcinogens_Lagunin, ClinTox, DILI, HIA_Hou, PAMPA_NCATS, Pgp_Broccatelli, SARSCoV2_3CLPro_Diamond, SARSCoV2_Vitro_Touret, Skin_Reaction, hERG` ## Record Schema Each JSONL line is one training example with these top-level fields: - `schema_version` - `task` - `split` - `sample_index` - `sample_id` - `smiles` - `gt_label` - `final_answer_option` - `messages` The `messages` field stores a tool-augmented chat transcript, including nested `tool_calls` and the assistant `thinking` text used in the original SFT export. ## Public Sanitization - Local absolute `source_paths` have been removed from the sample records by default. - Task-level export metadata is stored under `metadata/manifest.json`. ## Loading Example ```python from datasets import load_dataset ds = load_dataset( "json", data_files={"train": "train/*.jsonl"}, ) ```

--- 配置项: - 配置名称:default 数据文件: - 数据拆分:train 文件路径:"train/*.jsonl" --- # TRIM 智能体推理消息(Hugging Face 公开导出版本) 本目录为适配Hugging Face生态的TRIM AI智能体(AI Agent)推理监督微调(Supervised Fine-Tuning,SFT)数据集公开导出文件。 ## 包含内容 - 提供方:`vllm` - 所用模型:`gpt-oss-120b` - 现有数据拆分:`train`(训练集) - 本次导出清单内的记录总数:`12040` - 当前拆分包含的任务:`AMES, BBB_Martins, Bioavailability_Ma, CYP2C9_Substrate_CarbonMangels, CYP2D6_Substrate_CarbonMangels, CYP3A4_Substrate_CarbonMangels, Carcinogens_Lagunin, ClinTox, DILI, HIA_Hou, PAMPA_NCATS, Pgp_Broccatelli, SARSCoV2_3CLPro_Diamond, SARSCoV2_Vitro_Touret, Skin_Reaction, hERG` ## 记录架构 每条JSONL行对应一个训练样本,包含以下顶级字段: - `schema_version`:架构版本 - `task`:任务名称 - `split`:数据拆分 - `sample_index`:样本索引 - `sample_id`:样本标识符 - `smiles`:简化分子线性输入规范(SMILES) - `gt_label`:基准标签 - `final_answer_option`:最终答案选项 - `messages`:对话消息 其中`messages`字段用于存储工具增强型聊天会话记录,包含嵌套的`tool_calls`(工具调用)与原始SFT导出中使用的助手`thinking`(思考过程)文本。 ## 公开导出脱敏处理 - 默认已移除样本记录中的本地绝对`source_paths`(源路径)信息。 - 任务级导出元数据存储于`metadata/manifest.json`文件中。 ## 加载示例 python from datasets import load_dataset ds = load_dataset( "json", data_files={"train": "train/*.jsonl"}, )
提供机构:
Kiria-Nozan
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作