function_call_competition
ModelScope Community · Updated 2026-05-15 · Indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/swift/function_call_competition
Description:
## Dataset
Dataset URL: https://modelscope.cn/datasets/swift/function_call_competition
Downloading the dataset:
```python
from modelscope import MsDataset
dataset = MsDataset.load('swift/function_call_competition', split='train')
test_dataset = MsDataset.load('swift/function_call_competition', split='test')
print(dataset)
print(test_dataset)
"""
Dataset({
features: ['tools', 'messages'],
num_rows: 50000
})
Dataset({
features: ['tools', 'messages'],
num_rows: 5000
})
"""
# Print the first sample of each split
print(dataset[0])
"""
{'tools': '[{"name": "get_gifs_by_id", "description": "Fetches multiple GIF details from Giphy by their IDs.", "parameters": {"ids": {"description": "A comma-separated string of GIF IDs.", "type": "str", "default": "feqkVgjJpYtjy,7rzbxdu0ZEXLy"}}}, {"name": "shows_id_episodes", "description": "Retrieve a list of episodes for a specific show from the given platform.", "parameters": {"platform": {"description": "The platform from which to retrieve the episodes (e.g., \'ios\', \'android\', \'androidtv\', \'web\').", "type": "str", "default": "ios"}, "is_id": {"description": "The ID of the show.", "type": "int", "default": "82856"}, "offset": {"description": "The number of records to skip in the results. Default is 0.", "type": "int, optional", "default": "0"}, "region": {"description": "The region to filter the episodes by (e.g., \'US\', \'FR\'). Default is \'US\'.", "type": "str, optional", "default": "US"}, "limit": {"description": "The number of records to return per request, with a maximum of 100. Default is 25.", "type": "int, optional", "default": "25"}, "sort": {"description": "The order to sort the records by. Use \'regular\' for default order or \'reverse\' for most recent episodes first. Default is \'regular\'.", "type": "str, optional", "default": "regular"}, "season": {"description": "The specific season for which to retrieve episodes. If not specified, episodes from all seasons will be retrieved. Default is 1.", "type": "int, optional", "default": "1"}}}, {"name": "get_all_details", "description": "Fetches all details of a YouTube video, stream, or shorts including captions, audio-only files, video-only files, and stream URL if available.", "parameters": {"is_id": {"description": "The unique ID of the YouTube video/stream/shorts.", "type": "str", "default": "ppCIVJE3my4"}}}, {"name": "sticker_roulette", "description": "Fetches a randomly selected sticker from Giphy\'s sticker collection. Optionally, it allows the selection to be limited to a specific tag. 
The function interacts with the Giphy API to retrieve the sticker.", "parameters": {"tag": {"description": "The tag to limit the scope of the sticker search. Should be URL encoded and can include phrases separated by hyphens.", "type": "str", "default": "oops"}}}, {"name": "get_channel_leaderboards", "description": "Fetches the leaderboards for a specified Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel to get leaderboards for. Defaults to \'shadoune666\'.", "type": "str", "default": "shadoune666"}}}, {"name": "get_post_data_download_video_photo", "description": "Fetches detailed data for a given post, including a download link for any videos associated with the post.", "parameters": {"post_id": {"description": "The unique identifier of the post. Defaults to \'adPXX3Q\'.", "type": "str", "default": "adPXX3Q"}}}, {"name": "get_user_id", "description": "Fetches the user ID for a given Twitch channel using the Twitch API.", "parameters": {"channel": {"description": "The Twitch channel name for which to fetch the user ID.", "type": "str", "default": "xqc"}}}, {"name": "get_channel_points_context", "description": "Fetches the channel points context for a given Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel for which to fetch the channel points context.", "type": "str", "default": "xqc"}}}]', 'messages': [{'role': 'user', 'content': "I need to fetch the leaderboards for the Twitch channel 'ninja'. Could you provide the Python code using the 'get_channel_leaderboards' function?"}, {'role': 'tool_call', 'content': '{"name": "get_channel_leaderboards", "arguments": {"channel": "ninja"}}'}]}
"""
print(test_dataset[0])
"""
{'tools': '[{"name": "geogrid_seach_with_ranking", "description": "Perform a full grid search and retrieve the ranking of a business at every coordinate point in the grid. The grid cells in the results are ordered left-to-right, then top-to-bottom. Additional ranking data for the business is provided based on either place ID or business name.", "parameters": {"match_value": {"description": "The search query or keyword.", "type": "str", "default": "ChIJoejvAr3Mj4ARtHrbKxtAHXI"}, "query": {"description": "The Google Place ID of the business or the business name to match in results. Use the `match_type` parameter to specify the choice.", "type": "str", "default": "web design"}, "lng": {"description": "Longitude value of the grid center coordinate point.", "type": "int", "default": "-121.938314"}, "lat": {"description": "Latitude value of the grid center coordinate point.", "type": "int", "default": "37.341759"}, "zoom": {"description": "Google Maps zoom level to use for searching each grid point. Default is 13.", "type": "int, optional", "default": "13"}, "match_type": {"description": "The type of match to perform for ranking. Either \'place_id\' or \'name\'. Default is \'place_id\'.", "type": "str, optional", "default": "place_id"}, "distance_unit": {"description": "The unit of measurement for distance. Default is \'km\'.", "type": "str, optional", "default": "km"}, "width": {"description": "The width of the grid in location points for non-square grid searches. Default is 5.", "type": "int, optional", "default": "5"}, "height": {"description": "The height of the grid in location points for non-square grid searches. Default is 5.", "type": "int, optional", "default": "5"}, "grid_size": {"description": "The size of the grid (e.g., 3x3, 5x5). Default is 5.", "type": "int, optional", "default": "5"}, "distance": {"description": "The distance between coordinate points on the same row/column in the grid. 
Default is 1.", "type": "int, optional", "default": "1"}}}]', 'messages': [{'role': 'user', 'content': "Find the ranking of a coffee shop named 'The Daily Grind' in New York City, centered at longitude -74 and latitude 40, using a 3x3 grid with a zoom level of 15."}]}
"""
```
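The dataset contains 50,000 training samples and 5,000 test samples. Every training sample includes ground-truth labels of the form `{"role": "tool_call", "content": "xxx"}`. A sample may contain several `"tool_call"` messages, which represent parallel tool calls (call order is ignored during scoring). The test set contains no `"tool_call"` labels: from the `"tools"` and `"messages"` fields of each test sample, the model must infer which tools to call and which arguments to pass to them.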
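In each sample, `tools` is stored as a JSON string and every `"tool_call"` message holds one JSON-encoded call. The sketch below shows one way to separate the prompt from the ground-truth calls. It uses an abridged copy of `dataset[0]` (only the `get_channel_leaderboards` tool is kept), and `split_sample` is a hypothetical helper, not part of ms-swift or the dataset tooling.

```python
import json
from typing import Any, Dict, List, Tuple

# Abridged copy of dataset[0] from the training split: only one of the eight tools is kept.
sample: Dict[str, Any] = {
    "tools": '[{"name": "get_channel_leaderboards", "description": "Fetches the leaderboards for a specified Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel to get leaderboards for. Defaults to \'shadoune666\'.", "type": "str", "default": "shadoune666"}}}]',
    "messages": [
        {"role": "user", "content": "I need to fetch the leaderboards for the Twitch channel 'ninja'. Could you provide the Python code using the 'get_channel_leaderboards' function?"},
        {"role": "tool_call", "content": '{"name": "get_channel_leaderboards", "arguments": {"channel": "ninja"}}'},
    ],
}


def split_sample(sample: Dict[str, Any]) -> Tuple[List[dict], List[dict], List[dict]]:
    """Hypothetical helper: return (tools, prompt_messages, ground_truth_calls)."""
    tools = json.loads(sample["tools"])  # "tools" is stored as a JSON string
    prompt = [m for m in sample["messages"] if m["role"] != "tool_call"]
    # Each "tool_call" message holds one JSON-encoded call; several mean parallel calls.
    calls = [json.loads(m["content"]) for m in sample["messages"] if m["role"] == "tool_call"]
    return tools, prompt, calls


tools, prompt, calls = split_sample(sample)
print([t["name"] for t in tools])  # ['get_channel_leaderboards']
print(calls)  # [{'name': 'get_channel_leaderboards', 'arguments': {'channel': 'ninja'}}]
```

For a test sample, `calls` comes back empty, since the test split has no `"tool_call"` messages.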
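## Baseline
The following baseline uses the **ms-swift** LLM training framework to fine-tune **Qwen3-8B** with **LoRA** on this dataset:
- ms-swift: https://github.com/modelscope/ms-swift
- Qwen3-8B: https://modelscope.cn/models/Qwen/Qwen3-8B
- GPU memory: the baseline needs **22GiB** of GPU memory, so it runs on the free A10 compute provided by the ModelScope platform: https://modelscope.cn/my/mynotebook
- Performance: the baseline achieves an evaluation score of **0.9259**.
Before fine-tuning, make sure your environment is set up: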
```bash
pip install liger_kernel transformers ms-swift -U
```
First, let's look at how ms-swift converts a data sample into the training format, and which parts of it contribute to the loss:
```python
from modelscope import MsDataset
from swift.llm import get_model_tokenizer, get_template
dataset = MsDataset.load('swift/function_call_competition', split='train')
_, tokenizer = get_model_tokenizer('Qwen/Qwen3-8B', load_model=False)
template = get_template(tokenizer.model_meta.template, tokenizer, agent_template='hermes')
data = dataset[0]
template.set_mode('train')
encoded = template.encode(data)
print(f'[INPUT_IDS] {template.safe_decode(encoded["input_ids"])}\n')
print(f'[LABELS] {template.safe_decode(encoded["labels"])}')
"""
[INPUT_IDS] <|im_start|>system
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_gifs_by_id", "description": "Fetches multiple GIF details from Giphy by their IDs.", "parameters": {"ids": {"description": "A comma-separated string of GIF IDs.", "type": "str", "default": "feqkVgjJpYtjy,7rzbxdu0ZEXLy"}}}}
{"type": "function", "function": {"name": "shows_id_episodes", "description": "Retrieve a list of episodes for a specific show from the given platform.", "parameters": {"platform": {"description": "The platform from which to retrieve the episodes (e.g., 'ios', 'android', 'androidtv', 'web').", "type": "str", "default": "ios"}, "is_id": {"description": "The ID of the show.", "type": "int", "default": "82856"}, "offset": {"description": "The number of records to skip in the results. Default is 0.", "type": "int, optional", "default": "0"}, "region": {"description": "The region to filter the episodes by (e.g., 'US', 'FR'). Default is 'US'.", "type": "str, optional", "default": "US"}, "limit": {"description": "The number of records to return per request, with a maximum of 100. Default is 25.", "type": "int, optional", "default": "25"}, "sort": {"description": "The order to sort the records by. Use 'regular' for default order or 'reverse' for most recent episodes first. Default is 'regular'.", "type": "str, optional", "default": "regular"}, "season": {"description": "The specific season for which to retrieve episodes. If not specified, episodes from all seasons will be retrieved. Default is 1.", "type": "int, optional", "default": "1"}}}}
{"type": "function", "function": {"name": "get_all_details", "description": "Fetches all details of a YouTube video, stream, or shorts including captions, audio-only files, video-only files, and stream URL if available.", "parameters": {"is_id": {"description": "The unique ID of the YouTube video/stream/shorts.", "type": "str", "default": "ppCIVJE3my4"}}}}
{"type": "function", "function": {"name": "sticker_roulette", "description": "Fetches a randomly selected sticker from Giphy's sticker collection. Optionally, it allows the selection to be limited to a specific tag. The function interacts with the Giphy API to retrieve the sticker.", "parameters": {"tag": {"description": "The tag to limit the scope of the sticker search. Should be URL encoded and can include phrases separated by hyphens.", "type": "str", "default": "oops"}}}}
{"type": "function", "function": {"name": "get_channel_leaderboards", "description": "Fetches the leaderboards for a specified Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel to get leaderboards for. Defaults to 'shadoune666'.", "type": "str", "default": "shadoune666"}}}}
{"type": "function", "function": {"name": "get_post_data_download_video_photo", "description": "Fetches detailed data for a given post, including a download link for any videos associated with the post.", "parameters": {"post_id": {"description": "The unique identifier of the post. Defaults to 'adPXX3Q'.", "type": "str", "default": "adPXX3Q"}}}}
{"type": "function", "function": {"name": "get_user_id", "description": "Fetches the user ID for a given Twitch channel using the Twitch API.", "parameters": {"channel": {"description": "The Twitch channel name for which to fetch the user ID.", "type": "str", "default": "xqc"}}}}
{"type": "function", "function": {"name": "get_channel_points_context", "description": "Fetches the channel points context for a given Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel for which to fetch the channel points context.", "type": "str", "default": "xqc"}}}}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
I need to fetch the leaderboards for the Twitch channel 'ninja'. Could you provide the Python code using the 'get_channel_leaderboards' function?<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "get_channel_leaderboards", "arguments": {"channel": "ninja"}}
</tool_call><|im_end|>
[LABELS] [-100 * 1082]<tool_call>
{"name": "get_channel_leaderboards", "arguments": {"channel": "ninja"}}
</tool_call><|im_end|>
"""
```
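The `[-100 * 1082]` prefix in `[LABELS]` means the first 1,082 positions (the system prompt with the tool list, plus the user turn) are masked with `-100`, so only the assistant's `<tool_call>` block and `<|im_end|>` contribute to the loss. Below is a minimal sketch of this masking with toy token ids, assuming a single prompt/response boundary; ms-swift's template logic handles multi-turn data and is more involved, and `build_labels` is a hypothetical name.

```python
from typing import List

IGNORE_INDEX = -100  # label value ignored by PyTorch's CrossEntropyLoss by default


def build_labels(input_ids: List[int], response_start: int) -> List[int]:
    """Mask every position before `response_start` so only the response is learned."""
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Toy token ids: 5 prompt tokens followed by 3 response tokens.
input_ids = [11, 12, 13, 14, 15, 21, 22, 23]
labels = build_labels(input_ids, response_start=5)
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 23]
num_trained = sum(1 for t in labels if t != IGNORE_INDEX)
print(num_trained)  # 3
```

Labels stay aligned with `input_ids`, so masking is just a matter of which positions keep their token id.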
### Training
The script below trains on a single GPU. For multi-GPU training, launch training from the command line instead.
- Multi-GPU training: see the [examples](https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu)
- Registering a custom dataset: see the [documentation](https://swift.readthedocs.io/zh-cn/latest/Customization/%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86.html) and the [examples](https://github.com/modelscope/ms-swift/tree/main/examples/custom)
```python
# Requires about 22GiB of GPU memory
import os

# Select the GPU before importing swift so that torch only sees GPU 0.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import TrainArguments, sft_main, register_dataset, DatasetMeta, SubsetDataset

# Register the dataset with 'train' and 'test' subsets so it can be referenced
# as 'swift/function_call_competition:train' or ':test'.
register_dataset(
    DatasetMeta(
        ms_dataset_id='swift/function_call_competition',
        subsets=[SubsetDataset('train', split=['train']), SubsetDataset('test', split=['test'])]
    ))

if __name__ == '__main__':
    sft_main(TrainArguments(
        model='Qwen/Qwen3-8B',
        dataset=['swift/function_call_competition:train'],
        agent_template='hermes',  # same tool-call format as in the template demo above
        loss_scale='hermes',
        train_type='lora',
        torch_dtype='bfloat16',
        num_train_epochs=2,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        learning_rate=1e-4,
        lora_rank=8,
        lora_alpha=32,
        target_modules=['all-linear'],  # apply LoRA to every linear layer
        gradient_accumulation_steps=16,
        eval_steps=50,
        save_steps=50,
        save_total_limit=2,
        logging_steps=5,
        max_length=2048,
        output_dir='output',
        warmup_ratio=0.05,
        dataset_num_proc=4,
        dataloader_num_workers=4,
        use_liger_kernel=True,
        attn_impl='flash_attn',  # needs the flash-attn package installed
        packing=True,  # pack several samples into each max_length sequence
        save_only_model=True,
        acc_strategy='seq',
    ))
```
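A back-of-the-envelope check of this configuration: with `per_device_train_batch_size=1` and `gradient_accumulation_steps=16` on one GPU, each optimizer step consumes 16 sequences, and because `packing=True` each of those is a packed sequence of up to `max_length=2048` tokens rather than a single sample. Assuming standard LoRA scaling (alpha / r, as in PEFT), `lora_alpha=32` with `lora_rank=8` scales the low-rank update by 4. The helper names below are hypothetical, and the 4096 x 4096 layer is illustrative only, not a claim about Qwen3-8B's actual layer shapes.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Number of (packed) sequences consumed per optimizer step."""
    return per_device * grad_accum * num_gpus


def lora_scaling(lora_alpha: float, lora_rank: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA: alpha / r."""
    return lora_alpha / lora_rank


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is (r x in), B is (out x r)."""
    return rank * (in_features + out_features)


# Values from the training script above; the 4096 x 4096 layer is hypothetical.
print(effective_batch_size(per_device=1, grad_accum=16, num_gpus=1))  # 16
print(lora_scaling(lora_alpha=32, lora_rank=8))  # 4.0
print(lora_params(4096, 4096, rank=8))  # 65536
```

When moving to multi-GPU training, the effective batch size grows with the number of GPUs unless `gradient_accumulation_steps` is reduced accordingly.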
### Inference
Install `vllm>=0.8.5` to accelerate inference. Run the trained model on the test-set samples to produce a `result.jsonl` file:
```python
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import InferArguments, infer_main, register_dataset, DatasetMeta, SubsetDataset

# Register the dataset the same way as in the training script.
register_dataset(
    DatasetMeta(
        ms_dataset_id='swift/function_call_competition',
        subsets=[SubsetDataset('train', split=['train']), SubsetDataset('test', split=['test'])]
    ))

ckpt_dir = 'output/vx-xxx/checkpoint-xxx'  # replace with the path of the last checkpoint
result = infer_main(InferArguments(
    adapters=[ckpt_dir],
    temperature=0,  # greedy decoding
    val_dataset="swift/function_call_competition:test",
    infer_backend='vllm',
    vllm_max_lora_rank=8,  # must be at least the lora_rank used in training
    result_path='result.jsonl',
    max_model_len=4096,
    gpu_memory_utilization=0.8,
    max_new_tokens=512))
```
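The submitted file must follow a fixed format and be named `result.json`. The following script converts the contents of `result.jsonl` into the format expected by the scoring system: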
```python
import os
import json
import re

from datasets import Dataset, load_dataset
from tqdm import tqdm

dataset = load_dataset('json', data_files='result.jsonl', split='train')
res = []
for data in tqdm(dataset):
    # The model's reply is the last message; extract every <tool_call>...</tool_call> block.
    content = data['messages'][-1]['content']
    functions = re.findall(r'<tool_call>(.+?)</tool_call>', content, re.DOTALL)
    toolcall = []
    for function in functions:
        try:
            function = json.loads(function)
        except Exception:
            # Skip blocks that are not valid JSON.
            continue
        toolcall.append(function)
    # The "toolcall" field must itself be a JSON string.
    toolcall = json.dumps(toolcall, ensure_ascii=False)
    res.append({'toolcall': toolcall})
new_dataset = Dataset.from_list(res)
# to_json writes one JSON object per line (JSONL) by default.
new_dataset.to_json('toolcall.jsonl')
# Unlike os.rename, os.replace also overwrites an existing result.json on Windows.
os.replace('toolcall.jsonl', 'result.json')
```
The first three lines of `result.json` look like this (for reference only):
```jsonl
{"toolcall":"[{\"name\": \"geogrid_seach_with_ranking\", \"arguments\": {\"match_value\": \"The Daily Grind\", \"query\": \"The Daily Grind\", \"lng\": -74, \"lat\": 40, \"zoom\": 15, \"distance_unit\": \"km\", \"grid_size\": 3, \"distance\": 1}}]"}
{"toolcall":"[{\"name\": \"fish_api_fish_name\", \"arguments\": {\"name\": \"Bluefin Tuna\"}}]"}
{"toolcall":"[{\"name\": \"kunyomi_reading\", \"arguments\": {\"kun\": \"\u307f\u305a\"}}, {\"name\": \"downloadscreenshot\", \"arguments\": {\"is_id\": 67890}}]"}
```
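- Note: the file is actually in JSONL format; it uses the `.json` suffix only because the competition page accepts only files ending in `.json`.
- `result.json` must contain 5,000 lines in total, in the same order as the samples in `test.jsonl`.
- The `"toolcall"` field must be a JSON string encoding a list of dictionaries. Each dictionary is one tool call and must contain the tool name `"name"` and the arguments passed to the tool, `"arguments"`.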
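Before submitting, it can help to check these format requirements locally. The sketch below is a hypothetical pre-submission check (`validate_result_lines` is not part of any official tooling): it verifies the line count, that each line and its `"toolcall"` string both decode as JSON, and that every call is a dict with `name` and `arguments`. It cannot check that the order matches the test set, nor whether the calls are correct.

```python
import json
from typing import List


def validate_result_lines(lines: List[str], expected_rows: int = 5000) -> List[str]:
    """Hypothetical pre-submission check; returns a list of problems (empty if none)."""
    problems = []
    if len(lines) != expected_rows:
        problems.append(f"expected {expected_rows} lines, got {len(lines)}")
    for i, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
            # The "toolcall" value is itself a JSON string, so it is decoded a second time.
            calls = json.loads(record["toolcall"])
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            problems.append(f"line {i}: cannot parse ({exc})")
            continue
        if not isinstance(calls, list) or not calls:
            problems.append(f"line {i}: toolcall must be a non-empty list")
            continue
        if not all(isinstance(c, dict) and "name" in c and "arguments" in c for c in calls):
            problems.append(f"line {i}: every call needs 'name' and 'arguments'")
    return problems


# Two in-memory lines in the result.json format: the first is valid, the second is an empty list.
sample_lines = [
    r'{"toolcall":"[{\"name\": \"fish_api_fish_name\", \"arguments\": {\"name\": \"Bluefin Tuna\"}}]"}',
    '{"toolcall":"[]"}',
]
print(validate_result_lines(sample_lines, expected_rows=2))  # ['line 2: toolcall must be a non-empty list']
```

To check a real file, pass in `open('result.json', encoding='utf-8').read().splitlines()`.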
Then submit `result.json` to the system to receive a score. For how points are awarded, see the "Evaluation" section.
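## Evaluation
The competition evaluates only the accuracy of the function calls; it does not test the model's general capabilities. A response counts as correct as long as the model picks the right tools from those listed in `"tools"` and passes the right arguments.
The scoring system scores each sample, with a maximum of 1 point per sample and 5,000 points in total. The total is then normalized by dividing by 5,000, giving a final score between 0 and 1.
1. If the toolcall can be parsed, i.e. it is a JSON string in which every tool call contains `name` and `arguments` and there is at least one tool call, the sample scores 0.1 points.
2. If condition 1 holds, the number of tool calls is correct, and all selected tool names are correct, the sample scores 0.4 points.
3. If condition 2 holds and all argument values passed to the tools are correct, the sample scores 1 point.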
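To make the normalization concrete, the snippet below computes a final score from per-tier counts. The counts are hypothetical and chosen only for illustration; they are not real results.

```python
# Points per sample for each outcome described above.
TIER_POINTS = {"unparsable": 0.0, "parsable": 0.1, "names_correct": 0.4, "fully_correct": 1.0}

# Hypothetical outcome counts over the 5,000 test samples (illustration only).
counts = {"unparsable": 50, "parsable": 150, "names_correct": 300, "fully_correct": 4500}

total_samples = sum(counts.values())
raw_points = sum(TIER_POINTS[tier] * n for tier, n in counts.items())
final_score = raw_points / total_samples  # divide the raw total by 5,000
print(total_samples, round(raw_points, 6), round(final_score, 4))  # 5000 4635.0 0.927
```

Note how partial credit matters: the 300 samples with correct names but wrong arguments still contribute 120 points.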
The evaluation logic, in pseudocode:
```python
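import json
from collections import Counter

from datasets import load_dataset
from tqdm import tqdm

# The submitted result.json is first renamed back to result.jsonl.
dataset = load_dataset('json', data_files='result.jsonl', split='train')
# read_labels() and is_arguments_equal() belong to the scoring system and are not shown here.
# read_labels() returns the ground-truth tool calls and the "tools" field of each test sample.
labels, tools_list = read_labels()
res = []
for data, label_data, tools in tqdm(zip(dataset, labels, tools_list)):
    # Not parsable as a list of {"name", "arguments"} dicts -> 0 points.
    try:
        parsed_data = [{'name': toolcall['name'], 'arguments': toolcall['arguments']}
                       for toolcall in json.loads(data['toolcall'])]
    except Exception:
        res.append(0)
        continue
    # Compare tool names as multisets: call count and names must match, order is ignored.
    names = [toolcall['name'] for toolcall in parsed_data]
    label_names = [toolcall['name'] for toolcall in label_data]
    counter = Counter(names)
    label_counter = Counter(label_names)
    if label_counter != counter:
        res.append(0.1)
        continue
    # Names are correct; now check the argument values.
    try:
        arguments_equal = is_arguments_equal(parsed_data, label_data, tools)
    except Exception:
        arguments_equal = False
    if not arguments_equal:
        res.append(0.4)
        continue
    res.append(1)
print(f'score: {sum(res) / len(res)}')  # score: 0.9259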
```
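`is_arguments_equal` is not shown, and its exact rules (for example, whether it uses `tools` to tolerate type differences such as `"-74"` vs `-74`) are not stated. The sketch below is a self-contained, runnable version of the pseudocode in which the argument comparison is a hypothetical stand-in: after the name check, every call's `arguments` must exactly match a label call, with call order ignored. It may therefore be stricter than the official scorer; `is_arguments_equal_sketch` and `score_sample` are illustrative names only.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def _canonical(call: Dict[str, Any]) -> str:
    # Sorted keys make the comparison independent of key order inside a call.
    return json.dumps({"name": call["name"], "arguments": call["arguments"]}, sort_keys=True)


def is_arguments_equal_sketch(pred: List[Dict[str, Any]], label: List[Dict[str, Any]]) -> bool:
    """Hypothetical stand-in for is_arguments_equal: exact match, call order ignored."""
    return Counter(map(_canonical, pred)) == Counter(map(_canonical, label))


def score_sample(toolcall_json: str, label: List[Dict[str, Any]]) -> float:
    """Score one submission line following the 0 / 0.1 / 0.4 / 1 tiers of the pseudocode."""
    try:
        pred = [{"name": c["name"], "arguments": c["arguments"]} for c in json.loads(toolcall_json)]
    except Exception:
        return 0.0
    if Counter(c["name"] for c in pred) != Counter(c["name"] for c in label):
        return 0.1
    return 1.0 if is_arguments_equal_sketch(pred, label) else 0.4


# The calls from the third example line of result.json, used here as a stand-in label.
label = [
    {"name": "kunyomi_reading", "arguments": {"kun": "みず"}},
    {"name": "downloadscreenshot", "arguments": {"is_id": 67890}},
]
print(score_sample(json.dumps(label[::-1], ensure_ascii=False), label))  # 1.0 (order ignored)
wrong_arg = [{"name": "kunyomi_reading", "arguments": {"kun": "やま"}}, label[1]]
print(score_sample(json.dumps(wrong_arg), label))  # 0.4
print(score_sample('[{"name": "kunyomi_reading", "arguments": {}}]', label))  # 0.1
print(score_sample("not json", label))  # 0.0
```

Comparing canonical JSON strings through a `Counter` treats parallel calls as a multiset, which matches the rule that call order is not considered.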
## Discussion Group
<img src="https://modelscope.cn/datasets/swift/function_call_competition/resolve/master/wechat_swift.png" width="200" height="200">
Provided by:
maas
Created:
2025-05-19



