austindixson/agent-dataset-hybrid
Source: Hugging Face. Updated 2026-04-08. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/austindixson/agent-dataset-hybrid
Description:
# Agent Training Dataset (Unsloth Format)
A hybrid dataset that combines real Claude Code conversations with lightweight augmentations (reversed, shortened, and paraphrased variants).
## Overview
- **Total conversations:** 4,747
- **Training examples:** 3,797 (train.jsonl)
- **Validation examples:** 475 (valid.jsonl)
- **Test examples:** 475 (test.jsonl)
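
The split sizes above add up to the 4,747 total and correspond to roughly an 80/10/10 partition. A minimal sketch for checking the counts locally, assuming the three JSONL files hold one JSON object per line and sit in the same directory; the names `EXPECTED_COUNTS`, `count_jsonl_records`, and `check_splits` are illustrative and not part of the dataset:

```python
import json
from pathlib import Path

# Expected sizes, copied from the Overview list above.
EXPECTED_COUNTS = {"train.jsonl": 3797, "valid.jsonl": 475, "test.jsonl": 475}


def count_jsonl_records(path):
    """Count non-empty lines in a JSONL file, checking that each parses as JSON."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)  # raises ValueError on a malformed record
                count += 1
    return count


def check_splits(directory="."):
    """Return {filename: (found, expected)} for every split file that exists."""
    report = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = Path(directory) / name
        if path.exists():
            report[name] = (count_jsonl_records(path), expected)
    return report


if __name__ == "__main__":
    for name, (found, expected) in check_splits().items():
        status = "ok" if found == expected else "mismatch"
        print(f"{name}: {found} records (expected {expected}) {status}")
```

A mismatch may simply mean the files were updated after this card was written, so treat the check as a hint rather than a guarantee.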
## Format
ShareGPT format, one JSON object per line, for use in Unsloth fine-tuning workflows:
```json
{
  "conversations": [
    {"from": "human", "value": "..."},
    {"from": "gpt", "value": "..."}
  ]
}
```
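
Before training, it can help to confirm that every record really has this shape. A minimal validation sketch, assuming records have already been parsed into Python dictionaries; `VALID_SPEAKERS` and `validate_record` are hypothetical names, and accepting a `system` speaker is an assumption, since the card only shows `human` and `gpt`:

```python
# "system" is an assumption; the example above shows only "human" and "gpt".
VALID_SPEAKERS = {"human", "gpt", "system"}


def validate_record(record):
    """Return a list of problems found in one ShareGPT-style record (empty if none)."""
    turns = record.get("conversations") if isinstance(record, dict) else None
    if not isinstance(turns, list) or not turns:
        return ["'conversations' must be a non-empty list"]

    problems = []
    for index, turn in enumerate(turns):
        if not isinstance(turn, dict):
            problems.append(f"turn {index}: not an object")
            continue
        if turn.get("from") not in VALID_SPEAKERS:
            problems.append(f"turn {index}: unexpected 'from' value {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {index}: 'value' must be a string")
    return problems


sample = {
    "conversations": [
        {"from": "human", "value": "..."},
        {"from": "gpt", "value": "..."},
    ]
}
print(validate_record(sample))  # []
```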
## Usage
### With Unsloth
```python
from unsloth import FastLanguageModel
from datasets import load_dataset

# Load dataset
dataset = load_dataset(
    "json",
    data_files={
        "train": "train.jsonl",
        "valid": "valid.jsonl",
        "test": "test.jsonl",
    },
)

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    load_in_4bit=True,
)

# Train...
```
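
Hugging Face chat templates generally expect `role`/`content` messages rather than ShareGPT's `from`/`value` keys, so a conversion step is usually needed between loading and training. A minimal sketch, assuming the tokenizer already has a chat template configured (Unsloth's `get_chat_template` helper is one way to attach one) and that the data uses the speaker tags listed in `ROLE_MAP`; `ROLE_MAP`, `to_messages`, and `format_example` are illustrative names, not part of Unsloth:

```python
# Mapping from ShareGPT speaker tags to chat-template roles.
# The "system" entry is an assumption; the card shows only "human" and "gpt".
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}


def to_messages(conversations):
    """Convert ShareGPT turns into role/content messages."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversations
    ]


def format_example(example, tokenizer):
    """Render one record to a single training string via the tokenizer's chat template."""
    messages = to_messages(example["conversations"])
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False
    )
    return {"text": text}
```

With the objects from the snippet above, this could be applied as `dataset = dataset.map(lambda ex: format_example(ex, tokenizer))`, which adds a `text` column to each split.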
### On a PC with an RTX 3060
```bash
git lfs install
git lfs pull
python train.py \
  --data train.jsonl \
  --valid_data valid.jsonl
```
## Data Composition
- **Real data:** 3,165 Claude Code conversations
- **Augmented:** 1,582 variations (reverse, shorten, paraphrase)
- **Total turns:** 150,619
- **Avg turns/conversation:** 31.7
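
The average follows from the other figures: 150,619 turns divided by 4,747 conversations is about 31.7. A minimal sketch for recomputing these statistics from loaded records, assuming a "turn" means one entry in a record's `conversations` list (the card does not define the term); `turn_statistics` is a hypothetical helper name:

```python
def turn_statistics(records):
    """Return (conversation count, total turns, average turns per conversation)."""
    total_conversations = 0
    total_turns = 0
    for record in records:
        total_conversations += 1
        total_turns += len(record["conversations"])
    average = total_turns / total_conversations if total_conversations else 0.0
    return total_conversations, total_turns, average


# Sanity check on the figures reported above.
print(round(150_619 / 4_747, 1))  # 31.7
```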
## License
Same as the source Claude Code conversations.
Provider:
austindixson



