austindixson/agent-dataset-hybrid

Name: austindixson/agent-dataset-hybrid
Creator: austindixson
Published: 2026-04-08 18:49:23
License: No description provided

Source: Hugging Face | Updated: 2026-04-08 | Indexed: 2026-04-12

Download link:

https://hf-mirror.com/datasets/austindixson/agent-dataset-hybrid

Description:

# Agent Training Dataset (Unsloth Format)

Hybrid dataset combining real Claude conversations with lightweight augmentations.

## Overview

- **Total conversations:** 4,747
- **Training examples:** 3,797 (train.jsonl)
- **Validation examples:** 475 (valid.jsonl)
- **Test examples:** 475 (test.jsonl)

## Format

ShareGPT format (compatible with Unsloth FastLanguageModel):

```json
{
  "conversations": [
    {"from": "human", "value": "..."},
    {"from": "gpt", "value": "..."}
  ]
}
```

## Usage

### With Unsloth

```python
from unsloth import FastLanguageModel
from datasets import load_dataset

# Load dataset
dataset = load_dataset(
    "json",
    data_files={
        "train": "train.jsonl",
        "valid": "valid.jsonl",
        "test": "test.jsonl"
    }
)

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    load_in_4bit=True
)

# Train...
```

### On PC with RTX 3060

```bash
git lfs install
git lfs pull
python train.py \
  --data train.jsonl \
  --valid_data valid.jsonl
```

## Data Composition

- **Real data:** 3,165 Claude Code conversations
- **Augmented:** 1,582 variations (reverse, shorten, paraphrase)
- **Total turns:** 150,619
- **Avg turns/conversation:** 31.7

## License

Same as source Claude Code conversations.
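
## Example: checking a record

The ShareGPT record layout shown in the Format section can be sanity-checked before training. The following is a minimal sketch using only the Python standard library. The helper names (`validate_record`, `to_messages`, `iter_jsonl`) are hypothetical and not part of this dataset or of Unsloth, and the set of allowed roles is an assumption based solely on the two roles (`human`, `gpt`) that appear in the format example; the actual files may contain others.

```python
import json

# Assumption: only the two roles shown in the format example are expected.
VALID_ROLES = {"human", "gpt"}


def validate_record(record):
    """Return a list of problems found in one ShareGPT-style record (empty if none)."""
    conversations = record.get("conversations") if isinstance(record, dict) else None
    if not isinstance(conversations, list) or not conversations:
        return ["missing or empty 'conversations' list"]

    problems = []
    for i, turn in enumerate(conversations):
        if not isinstance(turn, dict):
            problems.append(f"turn {i}: not an object")
            continue
        if turn.get("from") not in VALID_ROLES:
            problems.append(f"turn {i}: unexpected 'from' value {turn.get('from')!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {i}: 'value' is not a string")
    return problems


def to_messages(record):
    """Map ShareGPT turns to role/content dicts (human -> user, gpt -> assistant)."""
    role_map = {"human": "user", "gpt": "assistant"}
    return [
        {"role": role_map[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]


def iter_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# A record in the same shape as the format example above.
sample = json.loads(
    '{"conversations": [{"from": "human", "value": "hi"},'
    ' {"from": "gpt", "value": "hello"}]}'
)
```

To scan a whole split, something like `[p for r in iter_jsonl("train.jsonl") for p in validate_record(r)]` would collect every problem found, assuming `train.jsonl` has been downloaded to the working directory.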

Provider:

austindixson
