txchmechanicus/qwen3.5-toolcalling-v2
Source: Hugging Face. Updated 2026-03-11, indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/txchmechanicus/qwen3.5-toolcalling-v2
---
language:
- en
license: apache-2.0
pretty_name: Qwen3.5 Tool Calling Dataset v2
size_categories:
- 10K<n<100K
task_categories:
- text-generation
tags:
- tool-use
- tool-calling
- function-calling
- reasoning
- agentic
- jupyter
- code-execution
- sft
- chat
- qwen3
- qwen3.5
- chain-of-thought
- multi-turn
- structured-output
- json
- fine-tuning
- open-source
- expanded-dataset
annotations_creators:
- machine-generated
language_creators:
- found
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
# Qwen3.5 Tool Calling Dataset v2
An expanded tool-calling SFT dataset that merges **smirki/Tool-Calling-Dataset-UIGEN-X** and **AmanPriyanshu/tool-reasoning-sft-jupyter-agent** into a single Qwen3 `messages` format. Compared with v1, it adds Jupyter notebook agent data with code-execution reasoning chains.
## Dataset Summary
| Property | Value |
|----------|-------|
| **Total Samples** | ~60K+ |
| **Train Split** | ~55K |
| **Test Split** | ~6K |
| **Sources** | UIGEN-X + Jupyter Agent |
| **Format** | Qwen3 messages |
| **Language** | English |
| **License** | Apache 2.0 |
## v1 vs v2 Comparison
| Version | Samples | Agent Type | New Sources |
|---------|---------|------------|-------------|
| **v1** | 51,004 | General tool calling | smirki/Tool-Calling-Dataset-UIGEN-X |
| **v2** (this) | ~60K+ | + Code/Jupyter agent | + AmanPriyanshu/tool-reasoning-sft-jupyter-agent |
## What's New in v2?
- **Jupyter Agent**: Code execution via the `add_and_execute_jupyter_code_cell` tool
- **Richer Reasoning**: Structured `reasoning → tool_call → tool_output → answer` chains
- **Data Science Tasks**: CSV analysis, visualization, statistical computation
- **Multi-step Execution**: Multiple code cells executed in sequence
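To gauge how multi-step a given Jupyter sample is, one option is to count the assistant turns that invoke the code-cell tool. The sketch below is a hypothetical helper (`count_code_cells` is not part of the dataset). It assumes tool calls appear as plain text in `content`, as in the Jupyter Agent example in this card, and the toy conversation is invented for illustration.

```python
import json

# Hypothetical helper: counts Jupyter tool calls in one conversation.
TOOL_NAME = "add_and_execute_jupyter_code_cell"

def count_code_cells(messages):
    """Number of assistant turns whose text mentions the Jupyter tool."""
    return sum(
        1 for m in messages
        if m["role"] == "assistant" and TOOL_NAME in m["content"]
    )

def call(code):
    """Build a tool-call string shaped like the card's Jupyter example."""
    return json.dumps({"name": TOOL_NAME, "arguments": {"code": code}})

# Toy conversation (invented): two code cells, then a final answer
sample = {"messages": [
    {"role": "system", "content": f"You can use the {TOOL_NAME} tool."},
    {"role": "user", "content": "Load the data and summarize it."},
    {"role": "assistant", "content": call("import pandas as pd")},
    {"role": "user", "content": ""},
    {"role": "assistant", "content": call("print(df.describe())")},
    {"role": "user", "content": "(summary table)"},
    {"role": "assistant", "content": "<answer>\nDone.\n</answer>"},
]}

print(count_code_cells(sample["messages"]))  # 2
```

The system turn also names the tool, which is why the helper only counts `assistant` turns.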
## Dataset Structure
### Data Fields
| Field | Type | Description |
|-------|------|-------------|
| `messages` | `list[dict]` | Conversation turns with `role` and `content` |
### Role Types
| Role | Source | Description |
|------|--------|-------------|
| `system` | Both | Tool schema + assistant instructions |
| `user` | Both | User request or tool output |
| `assistant` | Both | `<think>` reasoning + tool call or final answer |
> Note: The Jupyter Agent source's original `reasoning`, `tool_call`, `tool_output`, and `answer` roles are normalized to `assistant` or `user` (see Source Conversions).
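After normalization, every turn should carry one of the three roles above. A quick sanity check before training can be sketched as follows. It is a hypothetical validator, not shipped with the dataset, and it assumes `content` is always a string, as in the examples in this card.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_sample(sample):
    """Return a list of problems found in one sample (empty list = OK)."""
    messages = sample.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["`messages` must be a non-empty list"]
    problems = []
    for i, m in enumerate(messages):
        if not isinstance(m, dict):
            problems.append(f"turn {i}: not a dict")
            continue
        if m.get("role") not in ALLOWED_ROLES:
            problems.append(f"turn {i}: unexpected role {m.get('role')!r}")
        if not isinstance(m.get("content"), str):
            problems.append(f"turn {i}: content is not a string")
    return problems

good = {"messages": [{"role": "user", "content": "hi"},
                     {"role": "assistant", "content": "hello"}]}
bad = {"messages": [{"role": "tool_output", "content": "42"}]}  # un-normalized role

print(validate_sample(good))  # []
print(validate_sample(bad))   # ["turn 0: unexpected role 'tool_output'"]
```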
## Sources
| Dataset | Format | Samples | Key Capability |
|---------|--------|---------|----------------|
| [smirki/Tool-Calling-Dataset-UIGEN-X](https://huggingface.co/datasets/smirki/Tool-Calling-Dataset-UIGEN-X) | `conversations[from/value]` | 51,004 | General API/tool calls with `<think>` reasoning |
| [AmanPriyanshu/tool-reasoning-sft-jupyter-agent](https://huggingface.co/datasets/AmanPriyanshu/tool-reasoning-sft-jupyter-agent-dataset-sft-tool-use-agent-data-cleaned-rectified) | `messages[role/content]` | ~10K | Jupyter code execution agent |
## Format
### UIGEN-X Style: API Tool Calling
```json
{
"messages": [
{
"role": "system",
"content": "You are a deep thinking AI...\n<tools>\n{\"name\": \"search_web\", \"description\": \"Search the web\", ...}\n</tools>"
},
{
"role": "user",
"content": "Find the latest news about AI developments"
},
{
"role": "assistant",
"content": "<think>\nUser wants latest AI news. I'll use search_web.\n</think>\n<tool_call>\n{\"name\": \"search_web\", \"arguments\": {\"query\": \"latest AI developments 2025\"}}\n</tool_call>"
},
{
"role": "user",
"content": "<tool_response>\n[{\"title\": \"New LLM achieves SOTA\", \"url\": \"...\"}]\n</tool_response>"
},
{
"role": "assistant",
"content": "Here are the latest AI developments: New LLM achieves state-of-the-art performance..."
}
]
}
```
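When consuming UIGEN-X-style turns, the reasoning and the tool call can be pulled apart with two regular expressions. This is a minimal sketch. It assumes each `<tool_call>` block holds exactly one JSON object, as in the example above, and `parse_assistant_turn` is a hypothetical name.

```python
import json
import re

THINK_RE = re.compile(r"<think>\s*(.*?)\s*</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def parse_assistant_turn(content):
    """Split an assistant turn into (reasoning, tool_calls, remaining_text)."""
    think = THINK_RE.search(content)
    reasoning = think.group(1) if think else None
    tool_calls = [json.loads(block) for block in TOOL_CALL_RE.findall(content)]
    # Whatever is left after removing both tagged blocks is the visible reply
    text = TOOL_CALL_RE.sub("", THINK_RE.sub("", content)).strip()
    return reasoning, tool_calls, text

turn = (
    "<think>\nUser wants latest AI news. I'll use search_web.\n</think>\n"
    '<tool_call>\n{"name": "search_web", "arguments": {"query": "latest AI developments 2025"}}\n</tool_call>'
)
reasoning, calls, text = parse_assistant_turn(turn)
print(reasoning)              # User wants latest AI news. I'll use search_web.
print(calls[0]["name"])       # search_web
print(calls[0]["arguments"])  # {'query': 'latest AI developments 2025'}
```

A final-answer turn with no tags comes back as `(None, [], text)`.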
### Jupyter Agent Style: Code Execution
```json
{
"messages": [
{
"role": "system",
"content": "You are an intelligent data science assistant with access to a stateful Jupyter notebook. You can use the add_and_execute_jupyter_code_cell tool.\n\nYou have access to: winequality-red.csv"
},
{
"role": "user",
"content": "What is the average alcohol content in the dataset?"
},
{
"role": "assistant",
"content": "<think>\nI need to load the CSV and compute the mean of the alcohol column.\n</think>\n{\"name\": \"add_and_execute_jupyter_code_cell\", \"arguments\": {\"code\": \"import pandas as pd\\ndf = pd.read_csv('/home/user/input/winequality-red.csv')\\nprint(df['alcohol'].mean())\"}}"
},
{
"role": "user",
"content": "10.422983114446529"
},
{
"role": "assistant",
"content": "<answer>\nThe average alcohol content is approximately 10.42%.\n</answer>"
}
]
}
```
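In the Jupyter style shown above, the tool call follows `</think>` as bare JSON, and the final reply is wrapped in `<answer>` tags. The sketch below extracts the code cell and the answer from such turns. The helper names are hypothetical, and the sketch assumes the layout of this single example holds across the Jupyter samples.

```python
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)

def extract_jupyter_code(content):
    """Return the code string if this turn is a Jupyter tool call, else None."""
    body = THINK_RE.sub("", content).strip()
    try:
        call = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON, e.g. an <answer> turn
    if not isinstance(call, dict) or call.get("name") != "add_and_execute_jupyter_code_cell":
        return None  # e.g. a bare number from a tool output
    return call["arguments"]["code"]

def extract_answer(content):
    """Return the text inside <answer>...</answer>, or None."""
    m = ANSWER_RE.search(content)
    return m.group(1) if m else None

call_turn = "<think>\nI need to load the CSV and compute the mean of the alcohol column.\n</think>\n" + json.dumps({
    "name": "add_and_execute_jupyter_code_cell",
    "arguments": {"code": "import pandas as pd\ndf = pd.read_csv('/home/user/input/winequality-red.csv')\nprint(df['alcohol'].mean())"},
})
answer_turn = "<answer>\nThe average alcohol content is approximately 10.42%.\n</answer>"

print(extract_jupyter_code(call_turn))
print(extract_answer(answer_turn))  # The average alcohol content is approximately 10.42%.
```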
## Source Conversions
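```python
# --- UIGEN-X (ShareGPT): conversations[from/value] -> messages[role/content] ---
role_map = {"human": "user", "gpt": "assistant", "system": "system"}

def sharegpt_to_messages(conversations):  # illustrative helper
    return [{"role": role_map[turn["from"]], "content": turn["value"]}
            for turn in conversations]

# --- Jupyter Agent (native messages; only roles are remapped) ---
# reasoning   -> assistant (merged with the following tool_call)
# tool_call   -> assistant
# tool_output -> user
# answer      -> assistant
```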
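The Jupyter Agent role mapping can be sketched as a single pass that holds each `reasoning` turn until the next assistant-bound turn. This is not the author's conversion script. It assumes the source uses exactly the role names listed above and wraps reasoning in `<think>` tags, matching the Jupyter example in this card. It also assumes that reasoning preceding an `answer` is merged the same way.

```python
# Sketch of the Jupyter Agent role normalization (assumed, not the official script)
ROLE_MAP = {"tool_call": "assistant", "tool_output": "user", "answer": "assistant"}

def normalize_jupyter(messages):
    out, pending = [], None
    for m in messages:
        role, content = m["role"], m["content"]
        if role == "reasoning":
            pending = content  # hold until the next assistant-bound turn
            continue
        new_role = ROLE_MAP.get(role, role)  # system/user pass through unchanged
        if new_role == "assistant" and pending is not None:
            content = f"<think>\n{pending}\n</think>\n{content}"
            pending = None
        out.append({"role": new_role, "content": content})
    if pending is not None:  # dangling reasoning with nothing after it
        out.append({"role": "assistant", "content": f"<think>\n{pending}\n</think>"})
    return out

src = [
    {"role": "system", "content": "You are a data science assistant."},
    {"role": "user", "content": "Mean alcohol?"},
    {"role": "reasoning", "content": "Load the CSV and average the column."},
    {"role": "tool_call", "content": '{"name": "add_and_execute_jupyter_code_cell", "arguments": {"code": "..."}}'},
    {"role": "tool_output", "content": "10.42"},
    {"role": "answer", "content": "<answer>\nAbout 10.42%.\n</answer>"},
]
print([m["role"] for m in normalize_jupyter(src)])
# ['system', 'user', 'assistant', 'user', 'assistant']
```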
## Usage
```python
from datasets import load_dataset

dataset = load_dataset("Mustafaege/qwen3.5-toolcalling-v2")

# Scan the train split until the first Jupyter agent example turns up
for sample in dataset["train"]:
    msgs = sample["messages"]
    has_jupyter = any("jupyter_code_cell" in str(m["content"]) for m in msgs)
    if has_jupyter:
        print("Jupyter agent example found!")
        break
```
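To split the data by source instead of scanning it, the same check can be written as a predicate for `Dataset.filter`. This is a heuristic sketch and `is_jupyter_sample` is a hypothetical name. It assumes UIGEN-X samples never mention the Jupyter tool name, and Jupyter samples always do (their system prompt names the tool in the example above).

```python
def is_jupyter_sample(sample):
    """Heuristic: True if any turn mentions the Jupyter code-cell tool."""
    return any("add_and_execute_jupyter_code_cell" in str(m["content"])
               for m in sample["messages"])

# With the dataset loaded as above:
#   jupyter_only = dataset["train"].filter(is_jupyter_sample)
#   uigen_only = dataset["train"].filter(lambda s: not is_jupyter_sample(s))

toy = [
    {"messages": [{"role": "system", "content": "You can use the add_and_execute_jupyter_code_cell tool."}]},
    {"messages": [{"role": "system", "content": "<tools>search_web</tools>"}]},
]
print([is_jupyter_sample(s) for s in toy])  # [True, False]
```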
## Training with Unsloth
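```python
from unsloth import FastLanguageModel  # import Unsloth before TRL/transformers
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

dataset = load_dataset("Mustafaege/qwen3.5-toolcalling-v2")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-1.7B",
    max_seq_length = 8192,  # Longer context for multi-step reasoning
    load_in_4bit = True,
)

# A 4-bit quantized base cannot be trained directly; attach LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset["train"],  # conversational `messages` column
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        max_length = 8192,  # named `max_seq_length` in older TRL releases
        output_dir = "outputs",
    ),
)
trainer.train()
```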
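With these settings, the number of optimizer steps per epoch can be estimated up front. The sketch assumes a single GPU and the approximate ~55K train size from the summary table. The Trainer's own rounding (and any packing or length filtering) may shift the result slightly.

```python
import math

train_samples = 55_000  # approximate train split size (~55K)
per_device_batch = 2    # per_device_train_batch_size
grad_accum = 8          # gradient_accumulation_steps
num_gpus = 1            # assumption: single GPU

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(train_samples / effective_batch)
print(effective_batch, steps_per_epoch)  # 16 3438
```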
## Related Datasets
| Version | Samples | Link |
|---------|---------|------|
| **v1** | 51,004 | [Mustafaege/qwen3.5-toolcalling-v1](https://huggingface.co/datasets/Mustafaege/qwen3.5-toolcalling-v1) |
| **v2** (this) | ~60K+ | [Mustafaege/qwen3.5-toolcalling-v2](https://huggingface.co/datasets/Mustafaege/qwen3.5-toolcalling-v2) |
## License
Apache 2.0 — see [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.
---
Built for Qwen3.5 fine-tuning. Part of the [Mustafaege](https://huggingface.co/Mustafaege) model series.
Provided by: txchmechanicus



