five

tool-plannings-v0.2

收藏
魔搭社区2025-12-05 更新2025-10-11 收录
下载链接:
https://modelscope.cn/datasets/Vikhrmodels/tool-plannings-v0.2
下载链接
链接失效反馈
官方服务:
资源简介:
# Vikhrmodels/tool-plannings-v0.2 An English-Russian synthetic dataset for the task of **function calling**, obtained using the OpenAI API and executable functions in the environment (the behavior is close to reality). ## Case coverage ``` ┌─ full_dataset/ (non-permuted dataset including all cases below) ├─ single_request/ (single turn case with only successful tool callings) ├─ multiple_request/ (multiple turn case with successful and not tool callings) ├─ rejected_by_unavailable/ (unsuccessful examples when user wants unavailable tool calling) └─ rejected_by_required/ (unsuccessful examples when user didnt specify all required parameters) ``` ## Sample structed - **id**: (*int*) non-unique identificator - **tools**: (*str*) ready for training list of tools - **conversation**: (*str*) ready for training list of messages Examples looks like: ``` { 'id': 422, 'tools': '[ {"type": "function", "function": {"name": "parse_filename", "description": "Parses a filename and extracts the version number, extension, and other file information.", "parameters": {"type": "object", "properties": {"filename": {"type": "string", "description": "The filename to parse, expected to be in the format `name-v[version].[extension]`."}}, "required": ["filename"]}}}, {"type": "function", "function": {"name": "remove_dup_char_and_space", "description": "Removes duplicate characters and spaces from the input string.", "parameters": {"type": "object", "properties": {"s": {"type": "string", "description": "The input string from which duplicate characters and spaces will be removed."}}, "required": ["s"]}}} ]', 'conversation': '[ {"role": "user", "content": "Hey, can you help me clean up this string? It’s got a lot of repeating characters and spaces: \\"aa bbb ccccc\\"."}, {"role": "assistant", "tool_plan": "To clean up the string \\"aa bbb ccccc\\" by removing duplicate characters and spaces, I will use the available tool `remove_dup_char_and_space`. This tool is designed specifically for this purpose. \\n\\nHere’s the step-by-step plan:\\n1. I will prepare a tool call to `remove_dup_char_and_space` with the input string \\"aa bbb ccccc\\".\\n2. I will execute this tool call to get the cleaned-up version of the string.\\n\\nOnce I have the result from the tool, I will present the cleaned string to the user.", "tool_calls": [{"id": 0, "type": "function", "function": {"name": "remove_dup_char_and_space", "arguments": {"s": "aa bbb ccccc"}}}]}, {"role": "tool", "tool_call_id": 0, "content": {"data": "a bc"}}, {"role": "assistant", "content": "The cleaned-up version of the string is: \\"a bc\\". This removes the duplicate characters and spaces effectively."} ]' } ``` ``` { 'id': 422, 'tools': '[ {"type": "function", "function": {"name": "create_collapsible_panel", "description": "Creates an HTML snippet to display a collapsible panel for a given text message, with optional `title` and `style` attributes.", "parameters": {"type": "object", "properties": {"message": {"type": "string", "description": "The text message to be displayed in the panel."}, "title": {"type": "string", "description": "The title of the panel. Defaults to \'Details\'."}, "style": {"type": "string", "description": "The style of the panel. Can be \'warning\', \'info\', or \'danger\'. Defaults to \'info\'."}}, "required": ["message"]}}}, {"type": "function", "function": {"name": "is_valid_guid", "description": "Checks if a string is a valid GUID (Globally Unique Identifier).", "parameters": {"type": "object", "properties": {"guid": {"type": "string", "description": "The string to check. A valid GUID must follow these rules: 1. Consist of exactly 32 characters. 2. Use lowercase hexadecimal digits (0 to 9, a to f). 3. Each character in the string must exactly match the character at the same position in one of the two following patterns: \'0123456789abcdef0123456789abcdef\' or \'abcdef0123456789abcdef0123456789\'"}}, "required": ["guid"]}}}, {"type": "function", "function": {"name": "find_search_term", "description": "Finds the first occurrence of a search term in a longer string.", "parameters": {"type": "object", "properties": {"content": {"type": "string", "description": "The longer string to search within."}, "term": {"type": "string", "description": "The search term to find."}}, "required": ["content", "term"]}}} ]', 'conversation': '[ {"role": "user", "content": "Эй, мне нужно найти первое вхождение слова в длинном тексте. Вот текст: \\"Сегодня хорошая погода, и я планирую погулять в парке.\\" А слово, которое я ищу, это \\"погода\\". Можешь помочь?"}, {"role": "assistant", "tool_plan": "Для решения задачи я буду использовать инструмент, который позволяет находить первое вхождение слова в длинном тексте. В данном случае, у меня есть текст: \\"Сегодня хорошая погода, и я планирую погулять в парке.\\" и слово \\"погода\\", которое нужно найти. \\n\\nЯ подготовлю вызов инструмента `find_search_term`, передав ему текст и искомое слово. После этого я смогу получить результат и сообщить пользователю о первом вхождении слова. \\n\\nШаги:\\n1. Подготовить вызов инструмента `find_search_term` с параметрами: `content` (длинный текст) и `term` (искомое слово).\\n2. Выполнить вызов инструмента.\\n3. Обработать результат и сообщить пользователю о первом вхождении слова.\\n\\nТеперь я выполню первый шаг и подготовлю вызов инструмента.", "tool_calls": [{"id": 0, "type": "function", "function": {"name": "find_search_term", "arguments": {"content": "Сегодня хорошая погода, и я планирую погулять в парке.", "term": "погода"}}}]}, {"role": "tool", "tool_call_id": 0, "content": {"data": "погода, и я планирую погулять в парке."}}, {"role": "assistant", "content": "Первое вхождение слова \\"погода\\" в тексте \\"Сегодня хорошая погода, и я планирую погулять в парке.\\" начинается с этого слова и продолжается до конца предложения: \\"погода, и я планирую погулять в парке.\\""} ]' } ``` ### Roles insides * **user** * *content* ─ user's request * **assistant** * *tool_plan* ─ assistant's plan before tool callings * *tool_calls* ─ assistant's tool callings (NB: *id == tool_call_id*) * *content* ─ assistant's regular response * **tool** * *tool_call_id* ─ tool's local identificator (NB: *id == tool_call_id*) * *content* ─ tool's response in format *"{"data": ...}"* * *is_tool_error* ─ tool's optional parameter wether response is error ## Using dataset ```python from datasets import load_dataset dataset = load_dataset( "Vikhrmodels/tool-plannings-v0.2", "full_dataset", # either other case ) ``` ## Authors - Dmitry Tishencko, [Vikhr Team](https://t.me/vikhrlabs) - Sergei Bratchikov, [NLP Wanderer](https://t.me/nlpwanderer), [Vikhr Team](https://t.me/vikhrlabs) - Aleksandr Nikolich, [Vikhr Team](https://t.me/vikhrlabs)

# Vikhrmodels/tool-plannings-v0.2 这是一个面向**函数调用(function calling)**任务的英俄双语合成数据集,通过OpenAI API与环境中的可执行函数构建而成,其行为逻辑贴近真实应用场景。 ## 案例覆盖范围 ┌─ full_dataset/(未经过打乱的完整数据集,包含下述所有案例类型) ├─ single_request/(仅包含成功函数调用的单轮对话案例) ├─ multiple_request/(包含成功与失败函数调用的多轮对话案例) ├─ rejected_by_unavailable/(用户请求使用不可用工具时的失败示例) └─ rejected_by_required/(用户未指定全部必填参数时的失败示例) ## 样本结构 - **id**:(整数型)非唯一标识符 - **tools**:(字符串型)可直接用于训练的工具列表 - **conversation**:(字符串型)可直接用于训练的对话消息列表 示例格式如下: { 'id': 422, 'tools': '[ {"type": "function", "function": {"name": "parse_filename", "description": "解析文件名并提取版本号、扩展名与其他文件元信息。", "parameters": {"type": "object", "properties": {"filename": {"type": "string", "description": "待解析的文件名,预期格式为`name-v[version].[extension]`。"}}, "required": ["filename"]}}}, {"type": "function", "function": {"name": "remove_dup_char_and_space", "description": "从输入字符串中移除重复字符与多余空格。", "parameters": {"type": "object", "properties": {"s": {"type": "string", "description": "待处理的输入字符串,将从中移除重复字符与多余空格。"}}, "required": ["s"]}}} ]', 'conversation': '[ {"role": "user", "content": "嘿,能帮我清理一下这个字符串吗?它有很多重复字符和空格:"aa bbb ccccc"。"}, {"role": "assistant", "tool_plan": "要清理字符串"aa bbb ccccc"以移除重复字符与多余空格,我将使用可用工具`remove_dup_char_and_space`,该工具专为该场景设计。 具体步骤如下: 1. 准备针对`remove_dup_char_and_space`的工具调用,传入待清理的字符串"aa bbb ccccc"。 2. 执行该工具调用以获取清理后的字符串。 获取工具返回结果后,我将向用户呈现清理后的字符串。", "tool_calls": [{"id": 0, "type": "function", "function": {"name": "remove_dup_char_and_space", "arguments": {"s": "aa bbb ccccc"}}}]}, {"role": "tool", "tool_call_id": 0, "content": {"data": "a bc"}}, {"role": "assistant", "content": "该字符串的清理结果为:"a bc"。该工具已有效移除了重复字符与多余空格。"} ]' } { 'id': 422, 'tools': '[ {"type": "function", "function": {"name": "create_collapsible_panel", "description": "生成可折叠面板的HTML代码片段,用于展示指定文本消息,支持可选的`title`与`style`属性。", "parameters": {"type": "object", "properties": {"message": {"type": "string", "description": "待在面板中展示的文本消息。"}, "title": {"type": "string", "description": "面板标题,默认为"Details"。"}, "style": {"type": "string", "description": "面板样式,可选值为'warning'、'info'或'danger',默认为'info'。"}}, "required": ["message"]}}}, {"type": "function", "function": {"name": "is_valid_guid", "description": "检查字符串是否为有效的全局唯一标识符(Globally Unique Identifier, GUID)。", "parameters": {"type": "object", "properties": {"guid": {"type": "string", "description": "待检查的字符串。有效的GUID需符合以下规则:1. 恰好包含32个字符;2. 使用小写十六进制数字(0-9、a-f);3. 字符串中的每个字符需与下述两种模式之一的对应位置字符完全匹配:'0123456789abcdef0123456789abcdef'或'abcdef0123456789abcdef0123456789'"}}, "required": ["guid"]}}}, {"type": "function", "function": {"name": "find_search_term", "description": "在长字符串中查找搜索词的首次出现位置。", "parameters": {"type": "object", "properties": {"content": {"type": "string", "description": "待搜索的长字符串。"}, "term": {"type": "string", "description": "待查找的搜索词。"}}, "required": ["content", "term"]}}} ]', 'conversation': '[ {"role": "user", "content": "嘿,我需要在一段长文本中查找某个词的首次出现。文本如下:"Сегодня хорошая погода, и я планирую погулять в парке.",要查找的词是"погода"。能帮我吗?"}, {"role": "assistant", "tool_plan": "为解决该问题,我将使用可在长文本中查找词首次出现的工具。当前给定的文本为"Сегодня хорошая погода, и я планирую погулять в парке.",待查找的词为"погода"。 我将准备调用工具`find_search_term`,传入文本与待查找的词。执行该工具调用后即可获取结果,并向用户反馈词的首次出现位置。 具体步骤: 1. 准备针对`find_search_term`的工具调用,传入参数`content`(待搜索的长文本)与`term`(待查找的词)。 2. 执行该工具调用。 3. 处理工具返回结果并向用户反馈词的首次出现位置。 现在我将执行第一步,准备工具调用。", "tool_calls": [{"id": 0, "type": "function", "function": {"name": "find_search_term", "arguments": {"content": "Сегодня хорошая погода, и я планирую погулять в парке.", "term": "погода"}}}]}, {"role": "tool", "tool_call_id": 0, "content": {"data": "погода, и я планирую погулять в парке."}}, {"role": "assistant", "content": "词"погода"在文本"Сегодня хорошая погода, и я планирую погулять в парке."中的首次出现位置始于该词本身,直至句子结尾:"погода, и я планирую погулять в парке.""} ]' } ### 对话角色说明 * **user(用户)** * *content*:用户的请求内容 * **assistant(助手)** * *tool_plan*:助手发起工具调用前的规划文本 * *tool_calls*:助手发起的工具调用列表(注意:此处`id`与`tool_call_id`一一对应) * *content*:助手的常规回复内容 * **tool(工具)** * *tool_call_id*:工具的本地标识符(注意:此处`id`与`tool_call_id`一一对应) * *content*:工具的响应内容,格式为`{"data": ...}` * *is_tool_error*:工具的可选参数,用于标识该响应是否为错误信息 ## 数据集使用方法 python from datasets import load_dataset dataset = load_dataset( "Vikhrmodels/tool-plannings-v0.2", "full_dataset", # 可替换为其他数据集子集 ) ## 作者信息 - 德米特里·季申科(Dmitry Tishencko),[Vikhr团队](https://t.me/vikhrlabs) - 谢尔盖·布拉奇科夫(Sergei Bratchikov),[NLP漫游者](https://t.me/nlpwanderer),[Vikhr团队](https://t.me/vikhrlabs) - 亚历山大·尼利奇(Aleksandr Nikolich),[Vikhr团队](https://t.me/vikhrlabs)
提供机构:
maas
创建时间:
2025-09-19
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作