
austindixson/agent-dataset-hybrid

Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/austindixson/agent-dataset-hybrid
Description:
# Agent Training Dataset (Unsloth Format)

Hybrid dataset combining real Claude conversations with lightweight augmentations.

## Overview

- **Total conversations:** 4,747
- **Training examples:** 3,797 (train.jsonl)
- **Validation examples:** 475 (valid.jsonl)
- **Test examples:** 475 (test.jsonl)

## Format

ShareGPT format (compatible with Unsloth FastLanguageModel):

```json
{
  "conversations": [
    {"from": "human", "value": "..."},
    {"from": "gpt", "value": "..."}
  ]
}
```

## Usage

### With Unsloth

```python
from unsloth import FastLanguageModel
from datasets import load_dataset

# Load dataset
dataset = load_dataset(
    "json",
    data_files={
        "train": "train.jsonl",
        "valid": "valid.jsonl",
        "test": "test.jsonl"
    }
)

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    load_in_4bit=True
)

# Train...
```

### On PC with RTX 3060

```bash
git lfs install
git lfs pull
python train.py \
  --data train.jsonl \
  --valid_data valid.jsonl
```

## Data Composition

- **Real data:** 3,165 Claude Code conversations
- **Augmented:** 1,582 variations (reverse, shorten, paraphrase)
- **Total turns:** 150,619
- **Avg turns/conversation:** 31.7

## License

Same as source Claude Code conversations.
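
## Example: reading ShareGPT records

The ShareGPT layout described under Format can be read with the standard library alone. The sketch below is illustrative and not part of the dataset's own tooling: the helper names `read_jsonl` and `sharegpt_to_messages` are hypothetical, and the mapping of `"human"`/`"gpt"` to `"user"`/`"assistant"` roles (plus an optional `"system"` tag) is an assumption about how a chat template would consume the data.

```python
import json

# Speaker tags "human" and "gpt" come from the format shown in the README.
# The "system" entry is an assumption in case such turns are present.
ROLE_MAP = {"human": "user", "gpt": "assistant", "system": "system"}


def sharegpt_to_messages(record):
    """Convert one ShareGPT record into a list of {"role", "content"} dicts."""
    messages = []
    for turn in record["conversations"]:
        speaker = turn["from"]
        if speaker not in ROLE_MAP:
            raise ValueError(f"unexpected speaker tag: {speaker!r}")
        messages.append({"role": ROLE_MAP[speaker], "content": turn["value"]})
    return messages


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # A hand-written sample record, not taken from the dataset.
    sample = {
        "conversations": [
            {"from": "human", "value": "List the files here."},
            {"from": "gpt", "value": "Running ls now."},
        ]
    }
    for message in sharegpt_to_messages(sample):
        print(message["role"], ":", message["content"])
```

To walk a real split, iterate `read_jsonl("train.jsonl")` and pass each record to `sharegpt_to_messages`. Raising on unknown speaker tags is a deliberate choice here, so that unexpected turn types surface early instead of being silently dropped.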
Provider:
austindixson