
Mustafaege/qwen3.5-functioncalling-v1

Source: Hugging Face. Updated 2026-03-07; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Mustafaege/qwen3.5-functioncalling-v1
Resource description:

---
language:
- en
license: apache-2.0
pretty_name: Qwen3.5 Function Calling Dataset v1
size_categories:
- 100K<n<1M
task_categories:
- text-generation
tags:
- function-calling
- tool-use
- sft
- chat
- qwen3
- qwen3.5
- instruction-following
- structured-output
- json
- fine-tuning
- open-source
annotations_creators:
- machine-generated
language_creators:
- found
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

# Qwen3.5 Function Calling Dataset v1

A curated function-calling SFT dataset built from **glaiveai/glaive-function-calling-v2**, converted and standardized into the Qwen3 messages format for fine-tuning Qwen3.5-series models.

## Dataset Summary

| Property | Value |
|----------|-------|
| **Total Samples** | 112,960 |
| **Train Split** | 101,664 (90%) |
| **Test Split** | 11,296 (10%) |
| **Source** | glaiveai/glaive-function-calling-v2 |
| **Format** | Qwen3 messages |
| **Language** | English |
| **License** | Apache 2.0 |

## What is Function Calling?

Function calling lets a language model emit a structured JSON call that an application uses to invoke external APIs, tools, or services. A model learns to:

- Parse a user request and identify the appropriate function
- Extract the correct arguments from the conversation
- Return a valid JSON function call

## Dataset Structure

### Data Fields

| Field | Type | Description |
|-------|------|-------------|
| `messages` | `list[dict]` | Conversation turns, each with a `role` and `content` |

### Role Types

| Role | Description |
|------|-------------|
| `system` | Function definitions and assistant instructions |
| `user` | User requests that trigger function calls, plus function results prefixed with `FUNCTION RESPONSE:` |
| `assistant` | Model responses: either a `<functioncall>` JSON call or a natural-language reply |

## Format

All samples are standardized to the Qwen3 messages format: a list of `role`/`content` turns that a Qwen3 chat template can render. Within that structure, function calls keep the glaive-style `<functioncall>` prefix, and function results appear as `user` turns that begin with `FUNCTION RESPONSE:`.

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant with access to the following functions. Use them if required -\n{\n \"name\": \"get_exchange_rate\",\n \"description\": \"Get the exchange rate between two currencies\",\n \"parameters\": {\n \"type\": \"object\",\n \"properties\": {\n \"base_currency\": {\"type\": \"string\"},\n \"target_currency\": {\"type\": \"string\"}\n },\n \"required\": [\"base_currency\", \"target_currency\"]\n }\n}"
    },
    {
      "role": "user",
      "content": "Can you tell me the current exchange rate from US dollars to Euros?"
    },
    {
      "role": "assistant",
      "content": "<functioncall> {\"name\": \"get_exchange_rate\", \"arguments\": {\"base_currency\": \"USD\", \"target_currency\": \"EUR\"}}"
    },
    {
      "role": "user",
      "content": "FUNCTION RESPONSE: {\"exchange_rate\": 0.85}"
    },
    {
      "role": "assistant",
      "content": "The current exchange rate from US dollars to Euros is 0.85. This means 1 USD equals 0.85 EUR."
    }
  ]
}
```

## Source Conversion

| Source Dataset | Original Format | Conversion |
|----------------|-----------------|------------|
| [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | `system` + `chat` fields (`USER:` / `ASSISTANT:` pattern) | Split on role markers into a messages list |

### Conversion Logic

```python
# system field -> {"role": "system", "content": ...}
# chat field:
#   "USER: ..."              -> {"role": "user", "content": ...}
#   "ASSISTANT: ..."         -> {"role": "assistant", "content": ...}
#   "FUNCTION RESPONSE: ..." -> {"role": "user", "content": "FUNCTION RESPONSE: ..."}
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("Mustafaege/qwen3.5-functioncalling-v1")
print(dataset)
# DatasetDict({
#     train: Dataset({features: ['messages'], num_rows: 101664}),
#     test: Dataset({features: ['messages'], num_rows: 11296})
# })

# Inspect the first training sample
messages = dataset["train"][0]["messages"]
system_prompt = messages[0]["content"]  # function definitions
user_msg = messages[1]["content"]       # first user request

# Rough heuristic: count JSON "name" keys in the system prompt
n_functions = system_prompt.count('"name"')
print(f"Functions defined: {n_functions}")
print(f"First user turn: {user_msg}")
```

## Training with Unsloth

```python
# Import unsloth first so its patches apply to transformers/trl
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("Mustafaege/qwen3.5-functioncalling-v1")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-0.6B",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# 4-bit weights cannot be fine-tuned directly, so attach LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Apply the Qwen3 chat template
tokenizer = get_chat_template(tokenizer, chat_template = "qwen-3")

# dataset_text_field expects a string column, so render each messages list to text
def to_text(batch):
    return {"text": [tokenizer.apply_chat_template(m, tokenize = False)
                     for m in batch["messages"]]}

train_dataset = dataset["train"].map(to_text, batched = True)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    args = SFTConfig(dataset_text_field = "text", output_dir = "outputs"),
)
trainer.train()
```

## Related Datasets

| Version | Samples | New Sources | Link |
|---------|---------|-------------|------|
| **v1** (this) | 112,960 | glaive-function-calling-v2 | [Mustafaege/qwen3.5-functioncalling-v1](https://huggingface.co/datasets/Mustafaege/qwen3.5-functioncalling-v1) |
| **v2** | ~225K | + alpaca_function_calling | [Mustafaege/qwen3.5-functioncalling-v2](https://huggingface.co/datasets/Mustafaege/qwen3.5-functioncalling-v2) |

## Citation

If you use this dataset, please cite the original source:

```
@dataset{glaive-function-calling-v2,
  author = {GlaiveAI},
  title  = {Glaive Function Calling v2},
  year   = {2024},
  url    = {https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2}
}
```

## License

Apache 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.

---

Built for Qwen3.5 fine-tuning. Part of the [Mustafaege](https://huggingface.co/Mustafaege) model series.
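The Conversion Logic section describes the glaive-to-messages mapping only as comments. The following is a minimal runnable sketch of that mapping, not the script actually used to build this dataset. It assumes that the raw `system` field may start with a `SYSTEM:` marker, that turns in the `chat` field are introduced by `USER:`, `ASSISTANT:`, or `FUNCTION RESPONSE:`, and that assistant turns may end with an `<|endoftext|>` token. The function name `glaive_to_messages` is hypothetical.

```python
import re

# Assumed turn markers in the glaive `chat` field (the capturing group keeps them in the split output)
_TURN_RE = re.compile(r"(USER:|ASSISTANT:|FUNCTION RESPONSE:)")


def glaive_to_messages(system: str, chat: str) -> list[dict]:
    """Hypothetical converter: one glaive-style record -> role/content messages list."""
    messages = []

    # Drop a leading "SYSTEM:" marker if present (assumption about the raw field)
    sys_text = system.strip()
    if sys_text.startswith("SYSTEM:"):
        sys_text = sys_text[len("SYSTEM:"):].strip()
    if sys_text:
        messages.append({"role": "system", "content": sys_text})

    # re.split yields [prefix, marker, text, marker, text, ...]; pair markers with their text
    parts = _TURN_RE.split(chat)
    for marker, text in zip(parts[1::2], parts[2::2]):
        # Remove an end-of-text token if the source contains one (assumption), then trim
        text = text.replace("<|endoftext|>", "").strip()
        if marker == "USER:":
            messages.append({"role": "user", "content": text})
        elif marker == "ASSISTANT:":
            messages.append({"role": "assistant", "content": text})
        else:
            # Function results become user turns with the prefix kept, as in the card's example
            messages.append({"role": "user", "content": f"FUNCTION RESPONSE: {text}"})
    return messages
```

Splitting with a capturing group keeps each marker next to its text, so assigning roles is a simple lookup. One limitation is that a marker string appearing inside message text would be split incorrectly, so a production converter would need a stricter pattern, for example one anchored to line starts.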
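The function count in the Usage example is a substring heuristic. A more robust approach is to parse the JSON itself. The following sketch does this in two parts. It scans the system prompt for top-level JSON objects that have a `name` key, using the standard library's `json.JSONDecoder.raw_decode`, which parses one value and reports where it stopped. It also extracts the JSON payload that follows `<functioncall>` in an assistant turn. The helper names are hypothetical. Payloads that are not valid JSON are returned as `None` rather than repaired.

```python
import json
from typing import Optional

_DECODER = json.JSONDecoder()


def extract_function_defs(system_prompt: str) -> list[dict]:
    """Hypothetical helper: collect top-level JSON objects with a "name" key."""
    defs: list[dict] = []
    idx = 0
    while True:
        start = system_prompt.find("{", idx)
        if start == -1:
            return defs
        try:
            # raw_decode parses one JSON value at the start of the string and
            # returns (value, end_offset); trailing text after it is allowed
            obj, length = _DECODER.raw_decode(system_prompt[start:])
        except json.JSONDecodeError:
            idx = start + 1  # not a parseable object here; keep scanning
            continue
        if isinstance(obj, dict) and "name" in obj:
            defs.append(obj)
        idx = start + length  # skip past the whole object, including nested braces


def parse_function_call(content: str) -> Optional[dict]:
    """Hypothetical helper: JSON payload after <functioncall>, or None if absent/invalid."""
    tag = "<functioncall>"
    pos = content.find(tag)
    if pos == -1:
        return None
    try:
        obj, _ = _DECODER.raw_decode(content[pos + len(tag):].lstrip())
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None
```

Applied to the example in the Format section, `extract_function_defs` returns a single definition named `get_exchange_rate`. For the first assistant turn, `parse_function_call` returns a dict whose `arguments` hold `USD` and `EUR`.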
Provider: Mustafaege