
txchmechanicus/qwen3.5-toolcalling-v2

Hugging Face · Updated 2026-03-11 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/txchmechanicus/qwen3.5-toolcalling-v2
Description:
---
language:
- en
license: apache-2.0
pretty_name: Qwen3.5 Tool Calling Dataset v2
size_categories:
- 10K<n<100K
task_categories:
- text-generation
tags:
- tool-use
- tool-calling
- function-calling
- reasoning
- agentic
- jupyter
- code-execution
- sft
- chat
- qwen3
- qwen3.5
- chain-of-thought
- multi-turn
- structured-output
- json
- fine-tuning
- open-source
- expanded-dataset
annotations_creators:
- machine-generated
language_creators:
- found
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

# Qwen3.5 Tool Calling Dataset v2

An expanded tool-calling SFT dataset combining **smirki/Tool-Calling-Dataset-UIGEN-X** and **AmanPriyanshu/tool-reasoning-sft-jupyter-agent**, unified into the Qwen3 messages format. Adds Jupyter notebook agent data with code-execution reasoning chains.

## Dataset Summary

| Property | Value |
|----------|-------|
| **Total Samples** | ~60K+ |
| **Train Split** | ~55K |
| **Test Split** | ~6K |
| **Sources** | UIGEN-X + Jupyter Agent |
| **Format** | Qwen3 messages |
| **Language** | English |
| **License** | Apache 2.0 |

## v1 vs v2 Comparison

| Version | Samples | Agent Type | New Sources |
|---------|---------|------------|-------------|
| **v1** | 51,004 | General tool calling | smirki/Tool-Calling-Dataset-UIGEN-X |
| **v2** (this) | ~60K+ | + Code/Jupyter agent | + AmanPriyanshu/tool-reasoning-sft-jupyter-agent |

## What's New in v2?
- **Jupyter Agent**: Code execution with the `add_and_execute_jupyter_code_cell` tool
- **Richer Reasoning**: Structured `reasoning → tool_call → tool_output → answer` chains
- **Data Science Tasks**: CSV analysis, visualization, statistical computation
- **Multi-step Execution**: Multiple code cells executed in sequence

## Dataset Structure

### Data Fields

| Field | Type | Description |
|-------|------|-------------|
| `messages` | `list[dict]` | Conversation turns, each with `role` and `content` |

### Role Types

| Role | Source | Description |
|------|--------|-------------|
| `system` | Both | Tool schema + assistant instructions |
| `user` | Both | User request or tool output |
| `assistant` | Both | `<think>` reasoning + tool call, or final answer |

> Note: The original `reasoning`, `tool_call`, `tool_output`, and `answer` roles from the Jupyter Agent source are normalized to `assistant` or `user`.

## Sources

| Dataset | Format | Samples | Key Capability |
|---------|--------|---------|----------------|
| [smirki/Tool-Calling-Dataset-UIGEN-X](https://huggingface.co/datasets/smirki/Tool-Calling-Dataset-UIGEN-X) | `conversations[from/value]` | 51,004 | General API/tool calls with `<think>` reasoning |
| [AmanPriyanshu/tool-reasoning-sft-jupyter-agent](https://huggingface.co/datasets/AmanPriyanshu/tool-reasoning-sft-jupyter-agent-dataset-sft-tool-use-agent-data-cleaned-rectified) | `messages[role/content]` | ~10K | Jupyter code execution agent |

## Format

### UIGEN-X Style: API Tool Calling

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a deep thinking AI...\n<tools>\n{\"name\": \"search_web\", \"description\": \"Search the web\", ...}\n</tools>"
    },
    {
      "role": "user",
      "content": "Find the latest news about AI developments"
    },
    {
      "role": "assistant",
      "content": "<think>\nUser wants latest AI news. I'll use search_web.\n</think>\n<tool_call>\n{\"name\": \"search_web\", \"arguments\": {\"query\": \"latest AI developments 2025\"}}\n</tool_call>"
    },
    {
      "role": "user",
      "content": "<tool_response>\n[{\"title\": \"New LLM achieves SOTA\", \"url\": \"...\"}]\n</tool_response>"
    },
    {
      "role": "assistant",
      "content": "Here are the latest AI developments: New LLM achieves state-of-the-art performance..."
    }
  ]
}
```

### Jupyter Agent Style: Code Execution

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are an intelligent data science assistant with access to a stateful Jupyter notebook. You can use the add_and_execute_jupyter_code_cell tool.\n\nYou have access to: winequality-red.csv"
    },
    {
      "role": "user",
      "content": "What is the average alcohol content in the dataset?"
    },
    {
      "role": "assistant",
      "content": "<think>\nI need to load the CSV and compute the mean of the alcohol column.\n</think>\n{\"name\": \"add_and_execute_jupyter_code_cell\", \"arguments\": {\"code\": \"import pandas as pd\\ndf = pd.read_csv('/home/user/input/winequality-red.csv')\\nprint(df['alcohol'].mean())\"}}"
    },
    {
      "role": "assistant",
      "content": "10.422983114446529"
    },
    {
      "role": "assistant",
      "content": "<answer>\nThe average alcohol content is approximately 10.42%.\n</answer>"
    }
  ]
}
```

## Source Conversions

```python
# --- UIGEN-X (ShareGPT) ---
role_map = {"human": "user", "gpt": "assistant", "system": "system"}

# --- Jupyter Agent (native messages) ---
# reasoning   -> assistant (merged with the following tool_call)
# tool_call   -> assistant
# tool_output -> user
# answer      -> assistant
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("Mustafaege/qwen3.5-toolcalling-v2")

# Scan the train split and stop at the first Jupyter agent example
for sample in dataset['train']:
    msgs = sample['messages']
    has_jupyter = any('jupyter_code_cell' in str(m['content']) for m in msgs)
    if has_jupyter:
        print("Jupyter agent example found!")
        break
```

## Training with Unsloth

```python
from unsloth import FastLanguageModel  # import unsloth before trl so its patches apply
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

dataset = load_dataset("Mustafaege/qwen3.5-toolcalling-v2")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-1.7B",
    max_seq_length = 8192,  # Longer for multi-step reasoning
    load_in_4bit = True,
)

# A 4-bit quantized model cannot be fine-tuned directly; attach LoRA adapters.
# The rank, alpha, and target modules below are example values.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset['train'],
    args = SFTConfig(
        output_dir = "outputs",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        max_seq_length = 8192,
    ),
)
trainer.train()
```

## Related Datasets

| Version | Samples | Link |
|---------|---------|------|
| **v1** | 51,004 | [Mustafaege/qwen3.5-toolcalling-v1](https://huggingface.co/datasets/Mustafaege/qwen3.5-toolcalling-v1) |
| **v2** (this) | ~60K+ | [Mustafaege/qwen3.5-toolcalling-v2](https://huggingface.co/datasets/Mustafaege/qwen3.5-toolcalling-v2) |

## License

Apache 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.

---

Built for Qwen3.5 fine-tuning. Part of the [Mustafaege](https://huggingface.co/Mustafaege) model series.
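The Jupyter Agent rules listed under Source Conversions can be illustrated with a short normalization sketch. This is not the authors' conversion script. `normalize_jupyter_messages` and `ROLE_MAP` are hypothetical names, and two details are assumptions: merged reasoning is wrapped in `<think>` tags (as in the format examples), and reasoning that is not followed by a `tool_call` becomes a standalone assistant turn. The sketch applies the listed rule that `tool_output` becomes a `user` turn.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical one-to-one renames for Jupyter Agent roles.
# Roles not listed here (system, user, assistant) pass through unchanged.
ROLE_MAP = {"tool_output": "user", "answer": "assistant"}


def normalize_jupyter_messages(messages: List[Message]) -> List[Message]:
    """Sketch of the documented Jupyter Agent role normalization."""
    out: List[Message] = []
    pending: List[str] = []  # reasoning chunks waiting for the next tool_call

    def flush_reasoning() -> None:
        # Assumption: reasoning with no following tool_call is kept as its own assistant turn.
        if pending:
            out.append({"role": "assistant",
                        "content": "<think>\n" + "\n".join(pending) + "\n</think>"})
            pending.clear()

    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "reasoning":
            pending.append(content)
        elif role == "tool_call":
            # reasoning -> assistant, merged into the following tool_call as a <think> block
            think = "<think>\n" + "\n".join(pending) + "\n</think>\n" if pending else ""
            pending.clear()
            out.append({"role": "assistant", "content": think + content})
        else:
            flush_reasoning()
            out.append({"role": ROLE_MAP.get(role, role), "content": content})

    flush_reasoning()
    return out


if __name__ == "__main__":
    raw = [
        {"role": "system", "content": "You have access to: winequality-red.csv"},
        {"role": "user", "content": "What is the average alcohol content?"},
        {"role": "reasoning", "content": "Load the CSV and average the alcohol column."},
        {"role": "tool_call", "content": '{"name": "add_and_execute_jupyter_code_cell", "arguments": {"code": "print(1)"}}'},
        {"role": "tool_output", "content": "10.422983114446529"},
        {"role": "answer", "content": "<answer>\nApproximately 10.42%.\n</answer>"},
    ]
    print([m["role"] for m in normalize_jupyter_messages(raw)])
    # ['system', 'user', 'assistant', 'user', 'assistant']
```

The sketch holds reasoning in a buffer until the next `tool_call`. As a result, each assistant turn carries both its `<think>` block and its call, which matches the assistant-turn layout in the Role Types table.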
Provider:
txchmechanicus