five

Kiria-Nozan/TRIM-gpt-oss-120b-separate-neighbors-only

收藏
Hugging Face2026-04-25 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/Kiria-Nozan/TRIM-gpt-oss-120b-separate-neighbors-only
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集是TRIM代理推理消息的Hugging Face友好公开导出,专门用于监督微调(SFT)数据。它包括来自提供者vllm、模型gpt-oss-120b的数据,采用local_neighbor_only的SFT模式,仅包含训练分割,共8820条记录。任务涵盖多个化学和生物医学领域,如AMES、BBB_Martins、Bioavailability_Ma、CYP2C9_Substrate_CarbonMangels、CYP2D6_Substrate_CarbonMangels、CYP3A4_Substrate_CarbonMangels、Carcinogens_Lagunin、ClinTox、DILI、HIA_Hou、PAMPA_NCATS、Pgp_Broccatelli、SARSCoV2_3CLPro_Diamond、SARSCoV2_Vitro_Touret、Skin_Reaction和hERG。每条记录以JSONL格式存储,包含schema_version、sft_mode、task、split、sample_index、sample_id、smiles、gt_label、final_answer_option和messages等字段,其中messages字段存储工具增强的聊天记录,包括嵌套的tool_calls和原始SFT导出中的助理思考文本。数据集经过公共清洗,移除了本地绝对路径,任务级导出元数据存储在metadata/manifest.json中。

This directory is a Hugging Face-friendly public export of the TRIM agent reasoning SFT data. It includes data from provider vllm, model gpt-oss-120b, with SFT mode local_neighbor_only, present splits train, and 8820 records in this export manifest. Tasks in this split are AMES, BBB_Martins, Bioavailability_Ma, CYP2C9_Substrate_CarbonMangels, CYP2D6_Substrate_CarbonMangels, CYP3A4_Substrate_CarbonMangels, Carcinogens_Lagunin, ClinTox, DILI, HIA_Hou, PAMPA_NCATS, Pgp_Broccatelli, SARSCoV2_3CLPro_Diamond, SARSCoV2_Vitro_Touret, Skin_Reaction, and hERG. Each JSONL line is one training example with top-level fields such as schema_version, sft_mode, task, split, sample_index, sample_id, smiles, gt_label, final_answer_option, and messages, where the messages field stores a tool-augmented chat transcript, including nested tool_calls and the assistant thinking text used in the original SFT export. Public sanitization has removed local absolute source_paths, and task-level export metadata is stored under metadata/manifest.json.
提供机构:
Kiria-Nozan
二维码
社区交流群
二维码
科研交流群
商业服务