ansulev/carnice-glm5-hermes-traces
Source: Hugging Face. Updated 2026-04-04; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ansulev/carnice-glm5-hermes-traces
Resource description:
---
configs:
- config_name: raw_rows
  data_files:
  - split: train
    path: data/raw_rows.jsonl
- config_name: kept
  data_files:
  - split: train
    path: data/kept.jsonl
- config_name: high_quality_kept
  data_files:
  - split: train
    path: data/high_quality_kept.jsonl
- config_name: rejected
  data_files:
  - split: train
    path: data/rejected.jsonl
- config_name: sft_messages_all
  data_files:
  - split: train
    path: data/sft_messages_all.jsonl
- config_name: sft_messages_kept
  data_files:
  - split: train
    path: data/sft_messages_kept.jsonl
- config_name: sft_messages_high_quality
  data_files:
  - split: train
    path: data/sft_messages_high_quality.jsonl
- config_name: sft_sharegpt_all
  data_files:
  - split: train
    path: data/sft_sharegpt_all.jsonl
- config_name: sft_sharegpt_kept
  data_files:
  - split: train
    path: data/sft_sharegpt_kept.jsonl
- config_name: sft_sharegpt_high_quality
  data_files:
  - split: train
    path: data/sft_sharegpt_high_quality.jsonl
license: other
task_categories:
- text-generation
- other
tags:
- agents
- browser
- code
- synthetic
- tool-use
pretty_name: Carnice GLM-5 Hermes Traces
size_categories:
- 1K<n<10K
---
# Carnice GLM-5 Hermes Traces
This dataset is a merged release bundle of GLM-5 traces collected through the Hermes Agent harness.
It was generated by running the `carnice_trace_prompt_bank_v4` prompt bank through Hermes Agent with:
- `z-ai/glm-5` via OpenRouter
- local/file/terminal/code-execution tools for local tasks
- Hermes browser tools plus Tavily-backed `web_search` / `web_extract` for web tasks
- isolated disposable workspaces per prompt
This release is prepared for Hugging Face upload and includes sanitized merged files. Remote absolute paths from the collection host were rewritten to neutral placeholders like `WORKSPACE_ROOT/`.
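The card does not describe how the path rewriting was implemented. As a rough illustration only, a sanitizer of this kind could walk each row and replace the host workspace prefix with the placeholder. The function name `sanitize`, the `host_root` parameter, and the example path `/srv/runs/abc` are all hypothetical and not taken from the collector.

```python
from typing import Any

# Placeholder named in this card; everything else below is an assumption.
PLACEHOLDER = "WORKSPACE_ROOT/"


def sanitize(value: Any, host_root: str) -> Any:
    """Recursively replace the host workspace prefix with the placeholder.

    Strings are rewritten; lists and dicts are walked; other values are
    returned unchanged.
    """
    prefix = host_root.rstrip("/") + "/"
    if isinstance(value, str):
        return value.replace(prefix, PLACEHOLDER)
    if isinstance(value, list):
        return [sanitize(item, host_root) for item in value]
    if isinstance(value, dict):
        return {key: sanitize(item, host_root) for key, item in value.items()}
    return value


if __name__ == "__main__":
    # Hypothetical row shape, used only to exercise the function.
    row = {"cmd": "ls /srv/runs/abc/src", "files": ["/srv/runs/abc/a.py"]}
    print(sanitize(row, "/srv/runs/abc"))
```

A plain prefix replacement like this only catches paths under one known root; the actual release may have used a different or stricter procedure.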
## Included Configs
- `raw_rows`: full exported trace rows after collection and sanitization
- `kept`: rows that passed the base post-filter
- `high_quality_kept`: stricter rows intended as the primary SFT set
- `rejected`: failed or filtered rows for analysis / DPO / error mining
- `sft_messages_*`: train-ready conversation rows
- `sft_sharegpt_*`: the same content as `sft_messages_*`, kept under the ShareGPT naming already used in the collector
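Each name above is a config with a single `train` split, stored at `data/<config_name>.jsonl` according to the YAML header. The sketch below shows one way to read a config, either from a local copy of the repository using only the standard library, or from the Hub through the `datasets` library. The helper names `config_path`, `iter_rows`, and `load_from_hub` are illustrative, and no row fields are assumed because this card does not document the schema.

```python
import json
from pathlib import Path
from typing import Iterator

# Config names as declared in the card's YAML header.
CONFIGS = [
    "raw_rows", "kept", "high_quality_kept", "rejected",
    "sft_messages_all", "sft_messages_kept", "sft_messages_high_quality",
    "sft_sharegpt_all", "sft_sharegpt_kept", "sft_sharegpt_high_quality",
]


def config_path(config_name: str, root: str = ".") -> Path:
    """Return the JSONL path for a config inside a local copy of the repo."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config: {config_name!r}")
    return Path(root) / "data" / f"{config_name}.jsonl"


def iter_rows(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_from_hub(config_name: str = "sft_messages_high_quality"):
    """Load one config from the Hub (needs `datasets` and network access)."""
    # Imported lazily so the local helpers above work without `datasets`.
    from datasets import load_dataset

    return load_dataset(
        "ansulev/carnice-glm5-hermes-traces", config_name, split="train"
    )
```

For example, `next(iter_rows(config_path("kept")))` returns the first row of a local `data/kept.jsonl`, which is a quick way to inspect the available fields before training.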
## Collection Summary
- prompt bank total: `4033`
- local rows collected: `1983`
- web rows collected: `1567`
- long-horizon partial rows collected: `82`
- kept rows: `1780`
- high-quality rows: `1627`
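The counts above can be combined into a few rough ratios. This sketch assumes the local, web, and long-horizon rows are disjoint and that the kept and high-quality counts are measured against all collected rows; the card does not state either point, and `manifest.json` remains the authoritative source for totals.

```python
# Counts copied from the Collection Summary.
PROMPT_BANK_TOTAL = 4033
LOCAL_ROWS = 1983
WEB_ROWS = 1567
LONG_HORIZON_ROWS = 82
KEPT_ROWS = 1780
HIGH_QUALITY_ROWS = 1627

# Assumption: the three collection categories do not overlap.
collected = LOCAL_ROWS + WEB_ROWS + LONG_HORIZON_ROWS

# Assumption: both filters are applied to the full collected set.
kept_rate = KEPT_ROWS / collected
high_quality_rate = HIGH_QUALITY_ROWS / collected
high_quality_share_of_kept = HIGH_QUALITY_ROWS / KEPT_ROWS

if __name__ == "__main__":
    print(f"collected rows: {collected} of {PROMPT_BANK_TOTAL} prompts")
    print(f"kept / collected: {kept_rate:.1%}")
    print(f"high quality / collected: {high_quality_rate:.1%}")
    print(f"high quality / kept: {high_quality_share_of_kept:.1%}")
```

Under these assumptions, roughly half of the collected rows pass the base filter, and most kept rows also pass the stricter high-quality filter.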
## Notes
- Collection for `long_horizon` was intentionally stopped early because quality per dollar was weak. The partial rows are still included.
- The `sft_messages_high_quality` config is the best default starting point for supervised fine-tuning.
- The prompt bank itself is not included here; this is the collected trace dataset.
See `manifest.json` for counts and token totals, and `examples.json` for representative rows.
Provider:
ansulev



