
ansulev/carnice-glm5-hermes-traces

Source: Hugging Face. Updated 2026-04-04. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ansulev/carnice-glm5-hermes-traces
Description:
---
configs:
- config_name: raw_rows
  data_files:
  - split: train
    path: data/raw_rows.jsonl
- config_name: kept
  data_files:
  - split: train
    path: data/kept.jsonl
- config_name: high_quality_kept
  data_files:
  - split: train
    path: data/high_quality_kept.jsonl
- config_name: rejected
  data_files:
  - split: train
    path: data/rejected.jsonl
- config_name: sft_messages_all
  data_files:
  - split: train
    path: data/sft_messages_all.jsonl
- config_name: sft_messages_kept
  data_files:
  - split: train
    path: data/sft_messages_kept.jsonl
- config_name: sft_messages_high_quality
  data_files:
  - split: train
    path: data/sft_messages_high_quality.jsonl
- config_name: sft_sharegpt_all
  data_files:
  - split: train
    path: data/sft_sharegpt_all.jsonl
- config_name: sft_sharegpt_kept
  data_files:
  - split: train
    path: data/sft_sharegpt_kept.jsonl
- config_name: sft_sharegpt_high_quality
  data_files:
  - split: train
    path: data/sft_sharegpt_high_quality.jsonl
license: other
task_categories:
- text-generation
- other
tags:
- agents
- browser
- code
- synthetic
- tool-use
pretty_name: Carnice GLM-5 Hermes Traces
size_categories:
- 1K<n<10K
---

# Carnice GLM-5 Hermes Traces

This dataset is a merged release bundle of GLM-5 traces collected through the Hermes Agent harness.

It was generated by running the `carnice_trace_prompt_bank_v4` prompt bank through Hermes Agent with:

- `z-ai/glm-5` via OpenRouter
- local/file/terminal/code-execution tools for local tasks
- Hermes browser tools plus Tavily-backed `web_search` / `web_extract` for web tasks
- isolated disposable workspaces per prompt

This release is prepared for Hugging Face upload and includes sanitized merged files. Remote absolute paths from the collection host were rewritten to neutral placeholders like `WORKSPACE_ROOT/`.
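
The card does not document how the path rewriting was done. As a rough illustration only, a sanitization step of this kind could be sketched as a plain prefix replacement. The function name `sanitize_paths` and the example host path below are hypothetical and are not taken from the collector.

```python
# Illustrative sketch only: the real sanitization rules used for this
# release are not documented. `sanitize_paths` and the example host path
# are hypothetical names chosen for this example.
PLACEHOLDER = "WORKSPACE_ROOT/"


def sanitize_paths(text: str, workspace_root: str) -> str:
    """Replace an absolute workspace prefix with a neutral placeholder."""
    # Normalize the prefix so it ends with exactly one trailing slash.
    prefix = workspace_root.rstrip("/") + "/"
    return text.replace(prefix, PLACEHOLDER)


if __name__ == "__main__":
    line = "wrote /home/runner/work/job-17/out/report.md"
    print(sanitize_paths(line, "/home/runner/work/job-17"))
```

A plain string replacement keeps the relative part of each path intact, so tool calls in a trace still refer to consistent file names after rewriting.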
## Included Splits

- `raw_rows`: full exported trace rows after collection and sanitization
- `kept`: rows that passed the base post-filter
- `high_quality_kept`: stricter rows intended as the primary SFT set
- `rejected`: failed or filtered rows for analysis / DPO / error mining
- `sft_messages_*`: train-ready conversation rows
- `sft_sharegpt_*`: same content, retained under the existing ShareGPT naming used in the collector

## Collection Summary

- prompt bank total: `4033`
- local rows collected: `1983`
- web rows collected: `1567`
- long-horizon partial rows collected: `82`
- kept rows: `1780`
- high-quality rows: `1627`

## Notes

- `long_horizon` was intentionally stopped early because quality-per-dollar was weak. The partial rows are still included.
- The `sft_messages_high_quality` config is the best default starting point for supervised fine-tuning.
- The prompt bank itself is not included here; this is the collected trace dataset.

See `manifest.json` for counts and token totals, and `examples.json` for representative rows.
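
The configs declared in the card metadata can be loaded by name. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helper names `data_file_for` and `load_config` are hypothetical, and the row schema is not documented in the card, so the sketch makes no assumption about field names.

```python
# Minimal sketch, assuming the `datasets` library is available.
# `data_file_for` and `load_config` are hypothetical helper names.
REPO_ID = "ansulev/carnice-glm5-hermes-traces"

# Config names as declared in the card's YAML metadata.
CONFIGS = [
    "raw_rows",
    "kept",
    "high_quality_kept",
    "rejected",
    "sft_messages_all",
    "sft_messages_kept",
    "sft_messages_high_quality",
    "sft_sharegpt_all",
    "sft_sharegpt_kept",
    "sft_sharegpt_high_quality",
]


def data_file_for(config_name: str) -> str:
    """Return the JSONL path that the card metadata maps a config to."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config: {config_name!r}")
    return f"data/{config_name}.jsonl"


def load_config(config_name: str = "sft_messages_high_quality"):
    """Load one config's single `train` split from the Hub."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    data_file_for(config_name)  # validates the name before any network call
    return load_dataset(REPO_ID, config_name, split="train")


if __name__ == "__main__":
    ds = load_config()  # the card's recommended default for SFT
    print(ds)
```

Every config exposes only a `train` split, so the config name, not the split name, selects between the raw, kept, high-quality, and rejected subsets.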
Provider:
ansulev