Name: prem-research/Funcdex-MT-Function-Calling
Creator: prem-research
Published: 2025-11-15 10:56:26
License: No description provided

Download link:

https://hf-mirror.com/datasets/prem-research/Funcdex-MT-Function-Calling


Resource overview:

---
library_name: datasets
license: mit
task_categories:
- text-generation
language:
- en
tags:
- agent
- Agentic Learning
- tool use
- function calling
- multi-turn
---

[![Funcdex-Collection](https://img.shields.io/badge/Hugging%20Face-Model-yellow?logo=huggingface)](https://huggingface.co/collections/prem-research/funcdex)
[![Dataset](https://img.shields.io/badge/Hugging%20Face-Dataset-yellow?logo=huggingface)](https://huggingface.co/datasets/prem-research/Funcdex-MT-Function-Calling)
[![GitHub](https://img.shields.io/badge/GitHub-Code-181717?logo=github)](https://github.com/prem-research/Funcdex-Synthesizer)
[![PremAI](https://img.shields.io/badge/Project-PremAI-green)](https://www.premai.io/)

# Funcdex-MT-Function-Calling Dataset

<div align="center">
  <img src="assets/funcdex_hero.png" alt="Funcdex Hero" width="70%">
</div>

Funcdex-MT-Function-Calling is a multi-turn function calling dataset designed for training language models to interact with real-world tools and APIs. The dataset contains 1,787 conversations covering 10 individual toolkits and 5 multi-toolkit bundles, with comprehensive system prompts and realistic multi-turn interactions.

The code used to generate the dataset can be found [here](https://github.com/prem-research/Funcdex-Synthesizer).

Models trained on this dataset deliver strong performance relative to their evaluation cost (Performance/$), as the plot below shows.

<div align="center">
  <img src="assets/line_plot.png" alt="Line Plot" width="80%">
</div>

Notes:
- *The Funcdex-0.6B score is the average performance of the individual Funcdex-0.6B models.*
- For cost, we track the number of prompt and completion tokens needed to evaluate 300 conversations.
  - For example, if tokens cost $1 (input) and $10 (output) per million, and evaluation needed `0.5M` input and `0.1M` output tokens, then the cost is `0.5 * 1 + 0.1 * 10 = $1.5`.
- *Qwen3-0.6B and Qwen3-1.7B evaluation costs are estimated by extrapolating from Llama3.2-3B serverless costs. Other models' costs are sourced from OpenRouter.*

# Dataset Overview

## Statistics

The dataset is split into training and test sets:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: center;"> <th>Split</th> <th>Conversations</th> <th>Avg Turns</th> <th>Avg Tool Calls</th> <th>Avg Tokens</th> <th>Total Tokens</th> </tr>
  </thead>
  <tbody>
    <tr style="text-align: center;"> <td><strong>Train</strong></td> <td>1,487</td> <td>32.45</td> <td>8.44</td> <td>5,357</td> <td>7,965,919</td> </tr>
    <tr style="text-align: center;"> <td><strong>Test</strong></td> <td>300</td> <td>34.56</td> <td>9.20</td> <td>5,449</td> <td>1,634,658</td> </tr>
    <tr style="text-align: center;"> <td><strong>Total</strong></td> <td>1,787</td> <td>32.81</td> <td>8.56</td> <td>5,372</td> <td>9,600,577</td> </tr>
  </tbody>
</table>

### Per-Conversation Statistics

- **Number of turns**: Average 32.81 (min: 4, max: 265, median: 28.00)
- **Number of tool calls**: Average 8.56 (min: 0, max: 87, median: 7.00)
- **Number of tokens**: Average 5,372 (min: 2,691, max: 19,851, median: 5,156)

### Per Function Call Statistics

- **Number of arguments**: Average 3.14 (min: 0, max: 14, median: 3.00)
- **Train set**: Average 3.15 arguments per call (12,543 total function calls)
- **Test set**: Average 3.09 arguments per call (2,761 total function calls)

#### Argument Count Distribution

The distribution of arguments per function call shows that most tool calls (about 69%) use 2-4 arguments. Rows are sorted by count:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: center;"> <th>Arguments</th> <th>Count</th> <th>Percentage</th> </tr>
  </thead>
  <tbody>
    <tr style="text-align: center;"> <td>2</td> <td>4,994</td> <td>32.63%</td> </tr>
    <tr style="text-align: center;"> <td>3</td> <td>2,822</td> <td>18.44%</td> </tr>
    <tr style="text-align: center;"> <td>4</td> <td>2,685</td> <td>17.54%</td> </tr>
    <tr style="text-align: center;"> <td>1</td> <td>1,869</td> <td>12.21%</td> </tr>
    <tr style="text-align: center;"> <td>5</td> <td>1,256</td> <td>8.21%</td> </tr>
    <tr style="text-align: center;"> <td>6</td> <td>819</td> <td>5.35%</td> </tr>
    <tr style="text-align: center;"> <td>7+</td> <td>859</td> <td>5.61%</td> </tr>
  </tbody>
</table>

### Total Across All Conversations

- **Total turns**: 58,624
- **Total tool calls**: 15,304
- **Total tokens**: 9,600,577

## Dataset Structure

Each example in the dataset follows this structure:

```python
{
    "toolkit": "googledrive",   # Toolkit identifier
    "system_prompt": "...",     # Detailed system prompt for the conversation
    "tools": [...],             # List of available tools (OpenAPI format)
    "messages": [               # Conversation messages
        {
            "role": "user",
            "content": "..."
        },
        {
            "role": "assistant",
            "content": "...",
            "tool_calls": [
                {
                    "id": "call_0",
                    "type": "function",
                    "function": {
                        "name": "CREATE_FILE_OR_FOLDER",
                        "arguments": {...}
                    }
                }
            ]
        },
        {
            "role": "tool",
            "tool_call_id": "call_0",
            "content": "{...}"  # JSON string with tool response
        }
    ]
}
```

## Supported Toolkits

The dataset covers the following toolkits and bundles:

### Individual Toolkits

- <img src="assets/icons/asana.png" width="20" height="20" style="vertical-align: middle;"/> **Asana**
- <img src="assets/icons/calendly.png" width="20" height="20" style="vertical-align: middle;"/> **Calendly**
- <img src="assets/icons/gmail.png" width="20" height="20" style="vertical-align: middle;"/> **Gmail**
- <img src="assets/icons/google-calendar.png" width="20" height="20" style="vertical-align: middle;"/> **Google Calendar**
- <img src="assets/icons/docs.png" width="20" height="20" style="vertical-align: middle;"/> **Google Docs**
- <img src="assets/icons/google-drive.png" width="20" height="20" style="vertical-align: middle;"/> **Google Drive**
- <img src="assets/icons/jira.png" width="20" height="20" style="vertical-align: middle;"/> **Jira**
- <img src="assets/icons/stripe.png" width="20" height="20" style="vertical-align: middle;"/> **Stripe**
- <img src="assets/icons/to-do-list.png" width="20" height="20" style="vertical-align: middle;"/> **Todoist**
- <img src="assets/icons/whatsapp.png" width="20" height="20" style="vertical-align: middle;"/> **WhatsApp**

### Multi-Toolkit Bundles

- **Gmail + Google Calendar**
- **Google Drive + Google Docs**
- **Jira + Gmail**
- **Google Drive + Calendly + Google Calendar**
- **WhatsApp + Todoist**

## Dataset Characteristics

### Multi-Turn Conversations

The dataset emphasizes realistic multi-turn interactions where:
- Conversations average 32+ turns, enabling complex workflows
- Models learn to maintain context across multiple exchanges
- Tool calls are interleaved with natural language responses
- Users provide feedback and clarifications throughout

### Rich System Prompts

Each conversation includes a detailed system prompt that:
- Defines the assistant's role and behavior
- Specifies workflow patterns and best practices
- Guides tool selection and usage
- Encourages confirmation and error handling

### Real-World Tool Usage

- Tool calls follow OpenAPI specifications
- Arguments are properly formatted JSON objects
- Tool responses are realistic API responses
- Conversations demonstrate proper error handling and retry logic

# Models Trained on This Dataset

The following models have been trained using this dataset:

## Funcdex-1.7B

[Funcdex-1.7B](https://huggingface.co/prem-research/Funcdex-1.7B) - General-purpose model trained on all toolkits, achieving 0.43 Exact Match and 0.81 String Ratio on the test set.
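This card does not define Exact Match or String Ratio. As a rough illustration only, the sketch below shows one plausible reading: Exact Match compares a canonical JSON serialization of the predicted and reference tool calls, and String Ratio is a character-level similarity computed with Python's `difflib.SequenceMatcher`. The helper names (`canonical_call`, `exact_match`, `string_ratio`) and the argument keys (`name`, `mime_type`) are hypothetical, and the actual evaluation code may differ.

```python
import json
from difflib import SequenceMatcher


def canonical_call(name: str, arguments: dict) -> str:
    """Serialize a tool call with sorted keys so key order does not matter."""
    return json.dumps({"name": name, "arguments": arguments}, sort_keys=True)


def exact_match(pred: str, ref: str) -> float:
    """Return 1.0 only if the two serialized calls are identical (assumed metric)."""
    return 1.0 if pred == ref else 0.0


def string_ratio(pred: str, ref: str) -> float:
    """Character-level similarity in [0, 1] (assumed metric, via difflib)."""
    return SequenceMatcher(None, pred, ref).ratio()


# Hypothetical reference and prediction for illustration only.
ref = canonical_call("CREATE_FILE_OR_FOLDER", {"name": "Reports", "mime_type": "folder"})
pred = canonical_call("CREATE_FILE_OR_FOLDER", {"name": "Report", "mime_type": "folder"})

print(f"Exact Match:  {exact_match(pred, ref)}")
print(f"String Ratio: {string_ratio(pred, ref):.2f}")
```

Under this reading, a near-miss such as a slightly wrong argument value scores zero on Exact Match but still earns partial credit on String Ratio, which would be consistent with the gap between the two reported numbers.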
## Funcdex-0.6B Series (Specialized Models)

Specialized LoRA models fine-tuned on individual toolkit subsets:
- [Funcdex-0.6B-asana](https://huggingface.co/prem-research/Funcdex-0.6B-asana)
- [Funcdex-0.6B-calendly](https://huggingface.co/prem-research/Funcdex-0.6B-calendly)
- [Funcdex-0.6B-gmail](https://huggingface.co/prem-research/Funcdex-0.6B-gmail)
- [Funcdex-0.6B-googlecalendar](https://huggingface.co/prem-research/Funcdex-0.6B-googlecalendar)
- [Funcdex-0.6B-googledocs](https://huggingface.co/prem-research/Funcdex-0.6B-googledocs)
- [Funcdex-0.6B-googledrive](https://huggingface.co/prem-research/Funcdex-0.6B-googledrive)
- [Funcdex-0.6B-jira](https://huggingface.co/prem-research/Funcdex-0.6B-jira)
- [Funcdex-0.6B-stripe](https://huggingface.co/prem-research/Funcdex-0.6B-stripe)
- [Funcdex-0.6B-todoist](https://huggingface.co/prem-research/Funcdex-0.6B-todoist)
- [Funcdex-0.6B-whatsapp](https://huggingface.co/prem-research/Funcdex-0.6B-whatsapp)

## Funcdex-0.6B Bundle Models

Multi-toolkit bundle models:
- [Funcdex-0.6B-gmail_googlecalendar](https://huggingface.co/prem-research/Funcdex-0.6B-gmail_googlecalendar)
- [Funcdex-0.6B-googledrive_googledocs](https://huggingface.co/prem-research/Funcdex-0.6B-googledrive_googledocs)
- [Funcdex-0.6B-jira_gmail](https://huggingface.co/prem-research/Funcdex-0.6B-jira_gmail)
- [Funcdex-0.6B-googledrive_calendly_googlecalendar](https://huggingface.co/prem-research/Funcdex-0.6B-googledrive_calendly_googlecalendar)
- [Funcdex-0.6B-whatsapp_todoist](https://huggingface.co/prem-research/Funcdex-0.6B-whatsapp_todoist)

# Loading and Using the Dataset

## From HuggingFace Datasets

```python
from datasets import load_dataset
import json

# Load the dataset
dataset = load_dataset("prem-research/Funcdex-MT-Function-Calling", split="train")

# Access an example
example = dataset[0]
print(f"Toolkit: {example['toolkit']}")
print(f"Number of messages: {len(example['messages'])}")

# Parse tools (stored as JSON string)
tools = json.loads(example['tools'])
print(f"Number of available tools: {len(tools)}")
```

## Filtering by Toolkit

```python
# Filter for a specific toolkit
gmail_dataset = dataset.filter(lambda x: x["toolkit"] == "gmail")
print(f"Gmail conversations: {len(gmail_dataset)}")

# Filter for bundle
bundle_dataset = dataset.filter(lambda x: x["toolkit"] == "gmail_googlecalendar")
```

# Preparing Data for Training

## Formatting Conversations for Tokenization

To prepare conversations for training with Qwen-based models, format them using the chat template:

```python
from transformers import AutoTokenizer
import json

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

def format_conversation_for_tokenization(conversation: dict) -> str:
    """
    Format a conversation with tools into the chat template format for tokenization.
    """
    messages = conversation['messages']
    system_prompt = conversation['system_prompt']
    tools = conversation['tools']

    # Parse tools if they're a JSON string
    if isinstance(tools, str):
        tools = json.loads(tools)

    # Build messages list with system prompt first
    formatted_messages = [{"role": "system", "content": system_prompt}]

    # Add the conversation messages
    for msg in messages:
        # Convert tool call arguments to JSON strings for the chat template.
        # msg.get(...) also covers rows where the key is present but None or empty.
        if msg.get('role') == 'assistant' and msg.get('tool_calls'):
            formatted_msg = {"role": "assistant"}

            # Always add content field (even if empty)
            formatted_msg['content'] = msg.get('content') or ''

            # Format tool calls with JSON string arguments
            tool_calls = []
            for tc in msg['tool_calls']:
                arguments = tc['function'].get('arguments', '{}')
                tool_call = {
                    "id": tc['id'],
                    "type": tc.get('type', 'function'),
                    "function": {
                        "name": tc['function']['name'],
                        "arguments": json.dumps(arguments) if isinstance(arguments, dict) else arguments
                    }
                }
                tool_calls.append(tool_call)
            formatted_msg['tool_calls'] = tool_calls
            formatted_messages.append(formatted_msg)
        else:
            formatted_messages.append(msg)

    # Apply chat template with tools
    text = tokenizer.apply_chat_template(
        formatted_messages,
        tools=tools,
        tokenize=False,
        add_generation_prompt=False
    )
    return text

# Format a conversation (`example` comes from the loading snippet above)
formatted_text = format_conversation_for_tokenization(example)
num_tokens = len(tokenizer.encode(formatted_text))
print(f"Formatted conversation has {num_tokens} tokens")
```

## Creating Training Datasets

For training with HuggingFace Transformers:

```python
import json

from datasets import Dataset, load_dataset

# `tokenizer` is the Qwen tokenizer loaded in the previous snippet
train_data = load_dataset("prem-research/Funcdex-MT-Function-Calling", split="train")

def format_conversation_with_tools(conversation: dict, tokenizer) -> str:
    """Format conversation for training."""
    messages = conversation['messages']
    system_prompt = conversation['system_prompt']
    tools = json.loads(conversation['tools']) if isinstance(conversation['tools'], str) else conversation['tools']

    # Build formatted messages
    formatted_messages = [{"role": "system", "content": system_prompt}]

    for msg in messages:
        # msg.get(...) also covers rows where the key is present but None or empty
        if msg.get('role') == 'assistant' and msg.get('tool_calls'):
            formatted_msg = {
                "role": "assistant",
                "content": msg.get('content') or '',
                "tool_calls": [
                    {
                        "id": tc['id'],
                        "type": tc.get('type', 'function'),
                        "function": {
                            "name": tc['function']['name'],
                            "arguments": json.dumps(tc['function']['arguments'])
                            if isinstance(tc['function'].get('arguments'), dict)
                            else tc['function'].get('arguments', '{}')
                        }
                    }
                    for tc in msg['tool_calls']
                ]
            }
            formatted_messages.append(formatted_msg)
        else:
            formatted_messages.append(msg)

    # Apply chat template
    text = tokenizer.apply_chat_template(
        formatted_messages,
        tools=tools,
        tokenize=False,
        add_generation_prompt=False
    )
    return text

# Create training dataset
train_texts = [format_conversation_with_tools(conv, tokenizer) for conv in train_data]
train_dataset = Dataset.from_dict({"text": train_texts})
```

- To train toolkit-specific models, filter on the `toolkit` key before formatting.

# Limitations and Gaps

- No Sequential Tool Use
- No Parallel Tool Use
- The assistant is trained to ask clarifying questions and request explicit confirmation before generating tool calls.
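If you want to check the parallel-tool-use limitation on your own slice of the data, a minimal sketch might look like the following. It assumes the message layout shown under Dataset Structure; the helper names `count_parallel_tool_turns` and `unanswered_tool_calls` and the sample conversation are hypothetical, not part of the dataset tooling.

```python
def count_parallel_tool_turns(messages: list) -> int:
    """Count assistant messages that carry more than one tool call."""
    return sum(
        1
        for msg in messages
        if msg.get("role") == "assistant" and len(msg.get("tool_calls") or []) > 1
    )


def unanswered_tool_calls(messages: list) -> list:
    """Return ids of tool calls that have no matching `tool` message."""
    issued = [
        tc["id"]
        for msg in messages
        if msg.get("role") == "assistant"
        for tc in (msg.get("tool_calls") or [])
    ]
    answered = {
        msg.get("tool_call_id") for msg in messages if msg.get("role") == "tool"
    }
    return [call_id for call_id in issued if call_id not in answered]


# Hypothetical conversation in the documented message layout.
sample_messages = [
    {"role": "user", "content": "Create a folder called Reports."},
    {
        "role": "assistant",
        "content": "Creating the folder now.",
        "tool_calls": [
            {
                "id": "call_0",
                "type": "function",
                "function": {"name": "CREATE_FILE_OR_FOLDER", "arguments": {"name": "Reports"}},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_0", "content": "{\"status\": \"ok\"}"},
    {"role": "assistant", "content": "The folder has been created."},
]

print(count_parallel_tool_turns(sample_messages))
print(unanswered_tool_calls(sample_messages))
```

Running `count_parallel_tool_turns` over each conversation's `messages` gives a quick signal of whether a given split contains any assistant turn with multiple simultaneous tool calls.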
# Dataset Generation

The dataset was generated using the [Funcdex-Synthesizer](https://github.com/prem-research/Funcdex-Synthesizer) codebase. See the [GitHub repository](https://github.com/prem-research/Funcdex-Synthesizer) for details on the generation process.

# License

The dataset is licensed under the MIT License.

Application scenarios: