
function_call_competition

ModelScope community. Updated 2026-05-15. Indexed 2025-05-24.
Download link:
https://modelscope.cn/datasets/swift/function_call_competition
Overview:
## Dataset

Dataset URL: https://modelscope.cn/datasets/swift/function_call_competition

Download the dataset:

```python
from modelscope import MsDataset

dataset = MsDataset.load('swift/function_call_competition', split='train')
test_dataset = MsDataset.load('swift/function_call_competition', split='test')

print(dataset)
print(test_dataset)
"""
Dataset({
    features: ['tools', 'messages'],
    num_rows: 50000
})
Dataset({
    features: ['tools', 'messages'],
    num_rows: 5000
})
"""

# Print one sample from each split
print(dataset[0])
"""
{'tools': '[{"name": "get_gifs_by_id", "description": "Fetches multiple GIF details from Giphy by their IDs.", "parameters": {"ids": {"description": "A comma-separated string of GIF IDs.", "type": "str", "default": "feqkVgjJpYtjy,7rzbxdu0ZEXLy"}}}, {"name": "shows_id_episodes", "description": "Retrieve a list of episodes for a specific show from the given platform.", "parameters": {"platform": {"description": "The platform from which to retrieve the episodes (e.g., \'ios\', \'android\', \'androidtv\', \'web\').", "type": "str", "default": "ios"}, "is_id": {"description": "The ID of the show.", "type": "int", "default": "82856"}, "offset": {"description": "The number of records to skip in the results. Default is 0.", "type": "int, optional", "default": "0"}, "region": {"description": "The region to filter the episodes by (e.g., \'US\', \'FR\'). Default is \'US\'.", "type": "str, optional", "default": "US"}, "limit": {"description": "The number of records to return per request, with a maximum of 100. Default is 25.", "type": "int, optional", "default": "25"}, "sort": {"description": "The order to sort the records by. Use \'regular\' for default order or \'reverse\' for most recent episodes first. Default is \'regular\'.", "type": "str, optional", "default": "regular"}, "season": {"description": "The specific season for which to retrieve episodes. If not specified, episodes from all seasons will be retrieved. Default is 1.", "type": "int, optional", "default": "1"}}}, {"name": "get_all_details", "description": "Fetches all details of a YouTube video, stream, or shorts including captions, audio-only files, video-only files, and stream URL if available.", "parameters": {"is_id": {"description": "The unique ID of the YouTube video/stream/shorts.", "type": "str", "default": "ppCIVJE3my4"}}}, {"name": "sticker_roulette", "description": "Fetches a randomly selected sticker from Giphy\'s sticker collection. Optionally, it allows the selection to be limited to a specific tag. The function interacts with the Giphy API to retrieve the sticker.", "parameters": {"tag": {"description": "The tag to limit the scope of the sticker search. Should be URL encoded and can include phrases separated by hyphens.", "type": "str", "default": "oops"}}}, {"name": "get_channel_leaderboards", "description": "Fetches the leaderboards for a specified Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel to get leaderboards for. Defaults to \'shadoune666\'.", "type": "str", "default": "shadoune666"}}}, {"name": "get_post_data_download_video_photo", "description": "Fetches detailed data for a given post, including a download link for any videos associated with the post.", "parameters": {"post_id": {"description": "The unique identifier of the post. Defaults to \'adPXX3Q\'.", "type": "str", "default": "adPXX3Q"}}}, {"name": "get_user_id", "description": "Fetches the user ID for a given Twitch channel using the Twitch API.", "parameters": {"channel": {"description": "The Twitch channel name for which to fetch the user ID.", "type": "str", "default": "xqc"}}}, {"name": "get_channel_points_context", "description": "Fetches the channel points context for a given Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel for which to fetch the channel points context.", "type": "str", "default": "xqc"}}}]', 'messages': [{'role': 'user', 'content': "I need to fetch the leaderboards for the Twitch channel 'ninja'. Could you provide the Python code using the 'get_channel_leaderboards' function?"}, {'role': 'tool_call', 'content': '{"name": "get_channel_leaderboards", "arguments": {"channel": "ninja"}}'}]}
"""

print(test_dataset[0])
"""
{'tools': '[{"name": "geogrid_seach_with_ranking", "description": "Perform a full grid search and retrieve the ranking of a business at every coordinate point in the grid. The grid cells in the results are ordered left-to-right, then top-to-bottom. Additional ranking data for the business is provided based on either place ID or business name.", "parameters": {"match_value": {"description": "The search query or keyword.", "type": "str", "default": "ChIJoejvAr3Mj4ARtHrbKxtAHXI"}, "query": {"description": "The Google Place ID of the business or the business name to match in results. Use the `match_type` parameter to specify the choice.", "type": "str", "default": "web design"}, "lng": {"description": "Longitude value of the grid center coordinate point.", "type": "int", "default": "-121.938314"}, "lat": {"description": "Latitude value of the grid center coordinate point.", "type": "int", "default": "37.341759"}, "zoom": {"description": "Google Maps zoom level to use for searching each grid point. Default is 13.", "type": "int, optional", "default": "13"}, "match_type": {"description": "The type of match to perform for ranking. Either \'place_id\' or \'name\'. Default is \'place_id\'.", "type": "str, optional", "default": "place_id"}, "distance_unit": {"description": "The unit of measurement for distance. Default is \'km\'.", "type": "str, optional", "default": "km"}, "width": {"description": "The width of the grid in location points for non-square grid searches. Default is 5.", "type": "int, optional", "default": "5"}, "height": {"description": "The height of the grid in location points for non-square grid searches. Default is 5.", "type": "int, optional", "default": "5"}, "grid_size": {"description": "The size of the grid (e.g., 3x3, 5x5). Default is 5.", "type": "int, optional", "default": "5"}, "distance": {"description": "The distance between coordinate points on the same row/column in the grid. Default is 1.", "type": "int, optional", "default": "1"}}}]', 'messages': [{'role': 'user', 'content': "Find the ranking of a coffee shop named 'The Daily Grind' in New York City, centered at longitude -74 and latitude 40, using a 3x3 grid with a zoom level of 15."}]}
"""
```

The dataset contains 50,000 training samples and 5,000 test samples. Every training sample carries a ground-truth label of the form `{"role": "tool_call", "content": "xxx"}`. A sample may contain several `"tool_call"` messages, which represent parallel tool calls (call order is ignored during scoring). Test samples contain no `"tool_call"` label: the model must infer, from the `"tools"` and `"messages"` fields of each test sample, which tools to call and which arguments to pass to them.

## Baseline

This section describes a baseline implementation that uses the **ms-swift** LLM training framework to **LoRA fine-tune** **Qwen3-8B** on this dataset:

- ms-swift: https://github.com/modelscope/ms-swift
- Qwen3-8B: https://modelscope.cn/models/Qwen/Qwen3-8B
- GPU memory: the baseline needs **22GiB** of GPU memory and can run on the free A10 compute offered by ModelScope: https://modelscope.cn/my/mynotebook
- Score: the baseline reaches an evaluation score of **0.9259**.

Before fine-tuning, make sure your environment is set up correctly:

```bash
pip install liger_kernel transformers ms-swift -U
```

First, let's look at how ms-swift converts a data sample during training and which parts of it contribute to the loss:

```python
from modelscope import MsDataset
from swift.llm import get_model_tokenizer, get_template

dataset = MsDataset.load('swift/function_call_competition', split='train')
_, tokenizer = get_model_tokenizer('Qwen/Qwen3-8B', load_model=False)
template = get_template(tokenizer.model_meta.template, tokenizer, agent_template='hermes')
data = dataset[0]

template.set_mode('train')
encoded = template.encode(data)
print(f'[INPUT_IDS] {template.safe_decode(encoded["input_ids"])}\n')
print(f'[LABELS] {template.safe_decode(encoded["labels"])}')
"""
[INPUT_IDS] <|im_start|>system
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "get_gifs_by_id", "description": "Fetches multiple GIF details from Giphy by their IDs.", "parameters": {"ids": {"description": "A comma-separated string of GIF IDs.", "type": "str", "default": "feqkVgjJpYtjy,7rzbxdu0ZEXLy"}}}}
{"type": "function", "function": {"name": "shows_id_episodes", "description": "Retrieve a list of episodes for a specific show from the given platform.", "parameters": {"platform": {"description": "The platform from which to retrieve the episodes (e.g., 'ios', 'android', 'androidtv', 'web').", "type": "str", "default": "ios"}, "is_id": {"description": "The ID of the show.", "type": "int", "default": "82856"}, "offset": {"description": "The number of records to skip in the results. Default is 0.", "type": "int, optional", "default": "0"}, "region": {"description": "The region to filter the episodes by (e.g., 'US', 'FR'). Default is 'US'.", "type": "str, optional", "default": "US"}, "limit": {"description": "The number of records to return per request, with a maximum of 100. Default is 25.", "type": "int, optional", "default": "25"}, "sort": {"description": "The order to sort the records by. Use 'regular' for default order or 'reverse' for most recent episodes first. Default is 'regular'.", "type": "str, optional", "default": "regular"}, "season": {"description": "The specific season for which to retrieve episodes. If not specified, episodes from all seasons will be retrieved. Default is 1.", "type": "int, optional", "default": "1"}}}}
{"type": "function", "function": {"name": "get_all_details", "description": "Fetches all details of a YouTube video, stream, or shorts including captions, audio-only files, video-only files, and stream URL if available.", "parameters": {"is_id": {"description": "The unique ID of the YouTube video/stream/shorts.", "type": "str", "default": "ppCIVJE3my4"}}}}
{"type": "function", "function": {"name": "sticker_roulette", "description": "Fetches a randomly selected sticker from Giphy's sticker collection. Optionally, it allows the selection to be limited to a specific tag. The function interacts with the Giphy API to retrieve the sticker.", "parameters": {"tag": {"description": "The tag to limit the scope of the sticker search. Should be URL encoded and can include phrases separated by hyphens.", "type": "str", "default": "oops"}}}}
{"type": "function", "function": {"name": "get_channel_leaderboards", "description": "Fetches the leaderboards for a specified Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel to get leaderboards for. Defaults to 'shadoune666'.", "type": "str", "default": "shadoune666"}}}}
{"type": "function", "function": {"name": "get_post_data_download_video_photo", "description": "Fetches detailed data for a given post, including a download link for any videos associated with the post.", "parameters": {"post_id": {"description": "The unique identifier of the post. Defaults to 'adPXX3Q'.", "type": "str", "default": "adPXX3Q"}}}}
{"type": "function", "function": {"name": "get_user_id", "description": "Fetches the user ID for a given Twitch channel using the Twitch API.", "parameters": {"channel": {"description": "The Twitch channel name for which to fetch the user ID.", "type": "str", "default": "xqc"}}}}
{"type": "function", "function": {"name": "get_channel_points_context", "description": "Fetches the channel points context for a given Twitch channel using the provided RapidAPI key.", "parameters": {"channel": {"description": "The name of the Twitch channel for which to fetch the channel points context.", "type": "str", "default": "xqc"}}}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
I need to fetch the leaderboards for the Twitch channel 'ninja'. Could you provide the Python code using the 'get_channel_leaderboards' function?<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "get_channel_leaderboards", "arguments": {"channel": "ninja"}}
</tool_call><|im_end|>

[LABELS] [-100 * 1082]<tool_call>
{"name": "get_channel_leaderboards", "arguments": {"channel": "ninja"}}
</tool_call><|im_end|>
"""
```

### Training

The code below trains on a single GPU. For multi-GPU training, launch training from the command line instead.

- Multi-GPU training: [examples](https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu)
- Registering a dataset: [documentation](https://swift.readthedocs.io/zh-cn/latest/Customization/%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86.html), [examples](https://github.com/modelscope/ms-swift/tree/main/examples/custom)

```python
# 22GiB
import os
from typing import Dict, Any

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    TrainArguments, sft_main, register_dataset, DatasetMeta, ResponsePreprocessor, SubsetDataset
)

# Register the train and test splits as named subsets
register_dataset(
    DatasetMeta(
        ms_dataset_id='swift/function_call_competition',
        subsets=[SubsetDataset('train', split=['train']), SubsetDataset('test', split=['test'])]
    ))

if __name__ == '__main__':
    sft_main(TrainArguments(
        model='Qwen/Qwen3-8B',
        dataset=['swift/function_call_competition:train'],
        agent_template='hermes',
        loss_scale='hermes',
        train_type='lora',
        torch_dtype='bfloat16',
        num_train_epochs=2,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        learning_rate=1e-4,
        lora_rank=8,
        lora_alpha=32,
        target_modules=['all-linear'],
        gradient_accumulation_steps=16,
        eval_steps=50,
        save_steps=50,
        save_total_limit=2,
        logging_steps=5,
        max_length=2048,
        output_dir='output',
        warmup_ratio=0.05,
        dataset_num_proc=4,
        dataloader_num_workers=4,
        use_liger_kernel=True,
        attn_impl='flash_attn',
        packing=True,
        save_only_model=True,
        acc_strategy='seq',
    ))
```

### Inference

Install `vllm>=0.8.5` to accelerate inference. Run the fine-tuned model on the test samples to produce a `result.jsonl` file:

```python
import os
from typing import Dict, Any

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import InferArguments, infer_main, register_dataset, DatasetMeta, SubsetDataset

register_dataset(
    DatasetMeta(
        ms_dataset_id='swift/function_call_competition',
        subsets=[SubsetDataset('train', split=['train']), SubsetDataset('test', split=['test'])]
    ))

ckpt_dir = 'output/vx-xxx/checkpoint-xxx'  # last_checkpoint
result = infer_main(InferArguments(
    adapters=[ckpt_dir],
    temperature=0,
    val_dataset="swift/function_call_competition:test",
    infer_backend='vllm',
    vllm_max_lora_rank=8,
    result_path='result.jsonl',
    max_model_len=4096,
    gpu_memory_utilization=0.8,
    max_new_tokens=512))
```

The submitted file must follow a fixed format and be named `result.json`. The script below converts the contents of `result.jsonl` into the format the scoring system expects:

```python
import os
import json
from datasets import Dataset, load_dataset
from tqdm import tqdm
import re

dataset = load_dataset('json', data_files='result.jsonl', split='train')

res = []
for data in tqdm(dataset):
    # The last message holds the model response
    content = data['messages'][-1]['content']
    # Extract the body of every <tool_call>...</tool_call> block
    functions = re.findall(r'<tool_call>(.+?)</tool_call>', content, re.DOTALL)
    toolcall = []
    for function in functions:
        try:
            function = json.loads(function)
        except Exception:
            # Skip tool calls that are not valid JSON
            continue
        toolcall.append(function)
    # Store the list of tool calls as a JSON string
    toolcall = json.dumps(toolcall, ensure_ascii=False)
    res.append({'toolcall': toolcall})

new_dataset = Dataset.from_list(res)
new_dataset.to_json('toolcall.jsonl')
os.rename('toolcall.jsonl', 'result.json')
```

The first three lines of `result.json` look like this (for reference only):

```jsonl
{"toolcall":"[{\"name\": \"geogrid_seach_with_ranking\", \"arguments\": {\"match_value\": \"The Daily Grind\", \"query\": \"The Daily Grind\", \"lng\": -74, \"lat\": 40, \"zoom\": 15, \"distance_unit\": \"km\", \"grid_size\": 3, \"distance\": 1}}]"}
{"toolcall":"[{\"name\": \"fish_api_fish_name\", \"arguments\": {\"name\": \"Bluefin Tuna\"}}]"}
{"toolcall":"[{\"name\": \"kunyomi_reading\", \"arguments\": {\"kun\": \"\u307f\u305a\"}}, {\"name\": \"downloadscreenshot\", \"arguments\": {\"is_id\": 67890}}]"}
```

- Note: the file is in JSONL format. The extension is set to `.json` only because the competition page accepts nothing but files with a `.json` suffix.
- `result.json` has 5,000 lines in total. Make sure their order matches the order of `test.jsonl`.
- The `"toolcall"` field must be a JSON string that contains a list of dictionaries. Each dictionary is one tool call and must include the tool name `"name"` and the arguments passed to the tool, `"arguments"`.

Then submit `result.json` to the system to receive a score. See the "Evaluation" section for how points are awarded.

## Evaluation

The competition evaluates only the accuracy of function-call tool invocations, not the model's general capabilities. An answer counts as correct when the model selects the right tools from those listed in `"tools"` and passes the right arguments.

The scoring system scores every sample, with a maximum of 1 point per sample and 5,000 points in total. The total is then normalized by dividing by 5,000, which gives a final score between 0 and 1.

1. The toolcall can be parsed correctly, that is, it is a JSON string in which every tool call contains `name` and `arguments`, and it contains at least one tool call: the sample scores 0.1.
2. Condition 1 holds, the number of tool calls is correct, and every selected tool name is correct: the sample scores 0.4.
3. Condition 2 holds and every argument value passed to the tools is correct: the sample scores 1.

The evaluation pseudocode is as follows:

```python
import json
from collections import Counter

from datasets import load_dataset
from tqdm import tqdm

# The submitted result.json is renamed to result.jsonl before loading.
# read_labels() and is_arguments_equal() belong to the scoring system and are not shown here.
dataset = load_dataset('json', data_files='result.jsonl', split='train')
labels, tools_list = read_labels()

res = []
for data, label_data, tools in tqdm(zip(dataset, labels, tools_list)):
    try:
        parsed_data = [{'name': toolcall['name'], 'arguments': toolcall['arguments']}
                       for toolcall in json.loads(data['toolcall'])]
    except Exception:
        # Unparseable submission: 0 points
        res.append(0)
        continue
    names = [toolcall['name'] for toolcall in parsed_data]
    label_names = [toolcall['name'] for toolcall in label_data]
    counter = Counter(names)
    label_counter = Counter(label_names)
    if label_counter != counter:
        # Wrong number of calls or wrong tool names: 0.1 points
        res.append(0.1)
        continue
    try:
        arguments_equal = is_arguments_equal(parsed_data, label_data, tools)
    except Exception:
        arguments_equal = False
    if not arguments_equal:
        # Right tools, wrong arguments: 0.4 points
        res.append(0.4)
        continue
    res.append(1)

print(f'score: {sum(res) / len(res)}')
# score: 0.9259
```

## Community group

<img src="https://modelscope.cn/datasets/swift/function_call_competition/resolve/master/wechat_swift.png" width="200" height="200">
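
## Sketch: reading ground-truth tool calls from a sample

As described in the dataset section, the `tools` field is a JSON string and each label is a message with role `tool_call` whose `content` is itself a JSON string. The following minimal sketch shows one way to decode both. The helper names `extract_tool_calls` and `tool_names` are hypothetical, and the sample is a shortened stand-in for the real training sample, not actual dataset content.

```python
import json


def extract_tool_calls(sample):
    """Return the ground-truth tool calls of one sample as a list of dicts."""
    calls = []
    for message in sample["messages"]:
        if message["role"] == "tool_call":
            # The content of a tool_call message is a JSON string
            calls.append(json.loads(message["content"]))
    return calls


def tool_names(sample):
    """Return the names of the tools offered to the model in this sample."""
    return [tool["name"] for tool in json.loads(sample["tools"])]


# Shortened, hand-written stand-in for dataset[0] (descriptions elided)
sample = {
    "tools": '[{"name": "get_channel_leaderboards", "description": "", "parameters": {}}]',
    "messages": [
        {"role": "user", "content": "Fetch the leaderboards for the Twitch channel 'ninja'."},
        {"role": "tool_call",
         "content": '{"name": "get_channel_leaderboards", "arguments": {"channel": "ninja"}}'},
    ],
}

print(tool_names(sample))
print(extract_tool_calls(sample))
```

A test sample passed to `extract_tool_calls` yields an empty list, since test samples carry no `tool_call` messages.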
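
## Sketch: checking the submission file before upload

The submission rules above (one JSON object per line, a `"toolcall"` field holding a JSON string, a list of dictionaries with `"name"` and `"arguments"`, 5,000 lines) can be checked locally before uploading. This is a minimal sketch using only the standard library. The function names `check_submission_line` and `check_submission` are hypothetical and are not part of the competition tooling, and the official system may apply checks that differ from these.

```python
import json


def check_submission_line(line):
    """Return None if the line is well-formed, otherwise a short error message."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return "line is not valid JSON"
    if not isinstance(record, dict) or not isinstance(record.get("toolcall"), str):
        return "missing 'toolcall' string field"
    try:
        calls = json.loads(record["toolcall"])
    except json.JSONDecodeError:
        return "'toolcall' does not contain valid JSON"
    if not isinstance(calls, list) or not calls:
        return "'toolcall' must decode to a non-empty list"
    for call in calls:
        if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
            return "each tool call needs 'name' and 'arguments'"
    return None


def check_submission(lines, expected_rows=5000):
    """Return a list of (line_number, message) problems found in the file."""
    problems = []
    if len(lines) != expected_rows:
        problems.append((0, f"expected {expected_rows} lines, found {len(lines)}"))
    for number, line in enumerate(lines, start=1):
        error = check_submission_line(line)
        if error is not None:
            problems.append((number, error))
    return problems


# Tiny in-memory example instead of the real 5,000-line file
good = json.dumps({"toolcall": json.dumps([{"name": "f", "arguments": {"x": 1}}])})
bad = json.dumps({"toolcall": "[]"})
print(check_submission([good, bad], expected_rows=2))
```

For the real file, the lines could be read with `open('result.json', encoding='utf-8').read().splitlines()` and passed to `check_submission` with the default `expected_rows`.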
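
## Sketch: estimating the score locally

The three scoring tiers from the evaluation rules (0.1, 0.4, 1) can be approximated on a held-out slice of the training set, where labels are available. The sketch below is an assumption-laden approximation, not the official scorer: `arguments_match` is a hypothetical stand-in for the undisclosed `is_arguments_equal`, using order-insensitive exact equality and ignoring the `tools` schema, so the real evaluator may be more lenient or stricter. An empty tool-call list scores 0 here, following the written rule that at least one call is required.

```python
import json
from collections import Counter


def arguments_match(predicted, expected):
    """Hypothetical stand-in: every predicted call equals one expected call, in any order."""
    remaining = list(expected)
    for call in predicted:
        if call not in remaining:
            return False
        remaining.remove(call)
    return not remaining


def score_sample(toolcall, label_calls):
    """Score one submitted 'toolcall' JSON string against the labelled tool calls."""
    try:
        calls = [{"name": c["name"], "arguments": c["arguments"]}
                 for c in json.loads(toolcall)]
    except Exception:
        return 0.0  # cannot be parsed
    if not calls:
        return 0.0  # rule 1 requires at least one tool call
    if Counter(c["name"] for c in calls) != Counter(c["name"] for c in label_calls):
        return 0.1  # parsed, but wrong number of calls or wrong tool names
    if not arguments_match(calls, label_calls):
        return 0.4  # right tools, wrong argument values
    return 1.0


label = [{"name": "a", "arguments": {"x": 1}}, {"name": "b", "arguments": {}}]
submissions = [
    "not json",
    json.dumps([{"name": "a", "arguments": {"x": 1}}]),
    json.dumps([{"name": "b", "arguments": {}}, {"name": "a", "arguments": {"x": 2}}]),
    json.dumps([{"name": "b", "arguments": {}}, {"name": "a", "arguments": {"x": 1}}]),
]
scores = [score_sample(s, label) for s in submissions]
print(scores)
print(f"score: {sum(scores) / len(scores)}")
```

Because parallel calls are compared as a multiset, swapping the order of two calls does not change the score, which matches the statement that call order is ignored.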
Provider:
maas
Created:
2025-05-19