glaive-function-calling-openai

Hugging Face | Updated: 2024-12-10 | Indexed: 2024-12-12

Download link:

https://huggingface.co/datasets/madroid/glaive-function-calling-openai

Summary:


This dataset comprises conversational examples for training and evaluating language models' function calling capabilities. It includes both a full set of function call examples and a curated subset focused on the most commonly used functions. Structurally, the dataset consists of a complete dataset and multiple test subsets. Each record is a JSON object that encompasses conversational messages, definitions of available functions, and the actual function calls. Additionally, the dataset provides distribution information of the most frequently used functions, alongside sample code for loading and evaluating the dataset.

Created:

2024-12-08

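Summary of the Original Dataset Card

OpenAI Function Calling Dataset

Dataset Overview

This dataset contains examples of conversations in the OpenAI function-calling format and is intended for training and evaluating the function-calling ability of language models. It includes a comprehensive collection of function-calling examples and a curated subset focused on the most commonly used functions.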

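Dataset Structure

Full Dataset

Contains all function-calling examples
File: openai_function_calling_all.jsonl
Size: 112,754 records
Covers a wide range of function-calling scenarios

Test Subsets

Choose the test subset that fits your needs:

Quick evaluation: use the Top 1000 subset
Comprehensive testing: use a larger subset
Model training: use the full dataset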

Data Format

Each record is a JSON object with the following structure:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant with access to the following functions. Use them if required"
    },
    {
      "role": "user",
      "content": "Hi, I had a pizza for lunch today which was about 800 calories. Can you track this for me?"
    },
    {
      "role": "assistant",
      "content": "Sure, I can help you with that. Let me track this for you."
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "track_calories",
        "description": "Track daily calorie intake",
        "parameters": {
          "type": "object",
          "properties": {
            "meal": {"type": "string", "description": "The meal for which calories are being tracked"},
            "calories": {"type": "number", "description": "The number of calories consumed"},
            "date": {"type": "string", "format": "date", "description": "The date for which calories are being tracked"}
          },
          "required": ["meal", "calories", "date"]
        }
      }
    }
  ],
  "tool_calls": [
    {
      "id": "mhnMNaInh",
      "type": "function",
      "function": {
        "name": "track_calories",
        "arguments": "{\"meal\": \"pizza\", \"calories\": 800, \"date\": \"2022-03-01\"}"
      }
    }
  ]
}
```

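Field Descriptions

messages: list of conversation messages leading up to the function call
- role: role of the message sender ("system", "user", or "assistant")
- content: message text
tools: list of available function definitions
- type: tool type (currently only "function" is supported)
- function: function definition, including its name, description, and parameters
tool_calls: function calls actually made by the assistant
- id: unique identifier of the function call
- type: tool call type
- function: call details, including the function name and its arguments (a JSON-encoded string)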
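Because `arguments` is stored as a JSON-encoded string rather than a nested object, a quick consistency check before training can catch calls that do not parse or do not match the declared schema. The following is a minimal sketch, assuming records shaped like the JSON example above; `check_record` is a hypothetical helper, not part of any dataset tooling.

```python
import json


def check_record(record):
    """Return a list of problems found in one record (empty if it looks valid).

    For each tool call, checks that the function is defined in "tools",
    that "arguments" parses as JSON, and that every required parameter is present.
    """
    problems = []
    # Index the available function definitions by name
    definitions = {
        tool["function"]["name"]: tool["function"]
        for tool in record.get("tools", [])
        if tool.get("type") == "function"
    }
    for call in record.get("tool_calls", []):
        name = call["function"]["name"]
        if name not in definitions:
            problems.append(f"{name}: not defined in tools")
            continue
        try:
            args = json.loads(call["function"]["arguments"])
        except json.JSONDecodeError:
            problems.append(f"{name}: arguments are not valid JSON")
            continue
        # Compare the supplied arguments with the schema's "required" list
        required = definitions[name].get("parameters", {}).get("required", [])
        missing = [param for param in required if param not in args]
        if missing:
            problems.append(f"{name}: missing required {missing}")
    return problems
```

An empty list means the record passed all three checks; an unquoted string such as `{meal: pizza}` is reported as invalid JSON.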

Most Common Functions

The 10 most frequently called functions in the dataset:

calculate_distance: 5,063 calls
convert_currency: 4,681 calls
get_stock_price: 3,809 calls
calculate_discount: 3,277 calls
calculate_bmi: 3,241 calls
calculate_tip: 3,106 calls
calculate_age: 3,046 calls
generate_random_number: 3,003 calls
calculate_area: 2,866 calls
get_movie_details: 2,509 calls
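Counts like the ones above can be recomputed from the records with `collections.Counter`. The card does not say how its curated subsets were selected, so the filter below is only one hypothetical way to build a subset focused on the most common functions. Both helper names are illustrative, and records are assumed to be dicts shaped like the JSON example above.

```python
from collections import Counter


def function_distribution(records):
    """Count how often each function name appears across all tool calls."""
    return Counter(
        call["function"]["name"]
        for record in records
        for call in record.get("tool_calls", [])
    )


def filter_by_functions(records, names):
    """Keep records that have at least one tool call and whose calls all target `names`."""
    names = set(names)
    return [
        record
        for record in records
        if record.get("tool_calls")
        and all(call["function"]["name"] in names for call in record["tool_calls"])
    ]
```

For example, `top_names = [name for name, _ in function_distribution(records).most_common(10)]` followed by `filter_by_functions(records, top_names)` keeps only conversations that use the ten most frequent functions.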

Usage

Loading the Dataset

The dataset can be loaded with the Hugging Face datasets library:

```python
from datasets import load_dataset

# Load the full dataset
dataset = load_dataset("madroid/glaive-function-calling-openai", "train")

# Load a specific test subset
eval_dataset = load_dataset("madroid/glaive-function-calling-openai", "eval")
```
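If you have downloaded `openai_function_calling_all.jsonl` itself, the standard library is enough to read it and keeps each record exactly as written, without any schema inference. This sketch is not part of the original card; `read_jsonl` is a hypothetical helper, and it assumes one JSON object per line, as the `.jsonl` extension implies.

```python
import json


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Usage: `records = read_jsonl("openai_function_calling_all.jsonl")`; the card reports 112,754 records for this file.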

Model Evaluation Example

The following example shows how to evaluate a model on a test subset:

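```python
import json

import openai
from datasets import load_dataset
from tqdm import tqdm


def evaluate_function_calling(dataset, model="gpt-3.5-turbo"):
    results = {
        "function_name_accuracy": 0.0,
        "arguments_accuracy": 0.0,
        "total_accuracy": 0.0,
    }

    total = 0
    correct_function_names = 0
    correct_arguments = 0
    correct_total = 0

    for example in tqdm(dataset):
        # Parse the JSON string if the record is stored in a "json" column;
        # otherwise use the record's fields directly
        data = json.loads(example["json"]) if "json" in example else example

        # Count every expected call, so calls the model omits lower the score
        expected_calls = data["tool_calls"]
        total += len(expected_calls)

        try:
            # Send the conversation and the available tools to the model
            response = openai.chat.completions.create(
                model=model,
                messages=data["messages"],
                tools=data["tools"],
                tool_choice="auto",
            )
        except Exception as e:
            print(f"Error processing example: {e}")
            continue

        # tool_calls is None when the model replies without calling a function
        actual_calls = response.choices[0].message.tool_calls or []

        # Compare expected and actual calls position by position
        for expected, actual in zip(expected_calls, actual_calls):
            # Check the function name
            if expected["function"]["name"] != actual.function.name:
                continue
            correct_function_names += 1

            # Check the arguments as parsed JSON (key order does not matter)
            expected_args = json.loads(expected["function"]["arguments"])
            try:
                actual_args = json.loads(actual.function.arguments)
            except json.JSONDecodeError:
                continue  # malformed arguments count as a mismatch

            if expected_args == actual_args:
                correct_arguments += 1
                correct_total += 1

    # Compute accuracies (skipped for an empty dataset to avoid dividing by zero)
    if total:
        results["function_name_accuracy"] = correct_function_names / total
        results["arguments_accuracy"] = correct_arguments / total
        results["total_accuracy"] = correct_total / total

    return results


# Example usage (split="train" assumes the subset's records are in its default split)
top_100_dataset = load_dataset(
    "madroid/glaive-function-calling-openai", "top_100", split="train"
)
results = evaluate_function_calling(top_100_dataset)
print("Evaluation Results:")
print(f"Function Name Accuracy: {results['function_name_accuracy']:.2%}")
print(f"Arguments Accuracy: {results['arguments_accuracy']:.2%}")
print(f"Total Accuracy: {results['total_accuracy']:.2%}")
```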
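The evaluation script mixes API requests with scoring. To test the metric logic offline, the comparison can be pulled out into a pure function. This is a minimal sketch, assuming predictions are stored as plain dicts in the same shape as the `tool_calls` field; `score_tool_calls` is a hypothetical helper.

```python
import json


def score_tool_calls(expected_calls, predicted_calls):
    """Compare predicted tool calls with the expected ones, position by position.

    Returns (name_matches, argument_matches, total). `total` counts every
    expected call, so a missing prediction counts as an error.
    """
    name_matches = 0
    argument_matches = 0
    for expected, predicted in zip(expected_calls, predicted_calls):
        if expected["function"]["name"] != predicted["function"]["name"]:
            continue
        name_matches += 1
        try:
            predicted_args = json.loads(predicted["function"]["arguments"])
        except json.JSONDecodeError:
            continue  # unparseable arguments never match
        if json.loads(expected["function"]["arguments"]) == predicted_args:
            argument_matches += 1
    return name_matches, argument_matches, len(expected_calls)
```

Because arguments are compared as parsed JSON, key order and whitespace do not matter, but a value type change such as `800` versus `"800"` does.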

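Choose a test subset to match your needs:

Quick evaluation: use top_100
Comprehensive testing: use a larger subset
Training or thorough evaluation: use the full dataset

Acknowledgments

This dataset was compiled from the Locutusque/function-calling-chatml dataset.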

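Compiled Overview

Dataset Introduction

Construction

The dataset was compiled from the Locutusque/function-calling-chatml dataset and stored in the OpenAI function-calling format, to support training and evaluating language models on function calling. It contains complete examples covering many function-calling scenarios and is further divided into several test subsets for different needs. Every record is a JSON object containing the conversation messages, the available function definitions, and the actual function calls, so all records share one consistent structure.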

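Usage

The dataset can be loaded with the Hugging Face datasets library, either in full or as a specific test subset. It can be used for model training as well as for quick or comprehensive evaluation. The provided evaluation script reports function-name accuracy and argument accuracy, and users can pick the test subset that matches the depth of evaluation they need.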

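Background and Challenges

Background

As natural language processing advances, function calling has become an increasingly important capability of language models. The glaive-function-calling-openai dataset, published on Hugging Face by madroid and compiled from Locutusque/function-calling-chatml, provides training and evaluation material for this capability in the OpenAI function-calling format. It contains a large number of function-calling conversations covering many practical scenarios, with particular attention to the most commonly used functions. With it, researchers can train and evaluate models on function-calling tasks more effectively and move language models closer to practical use.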

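Current Challenges

Building the dataset poses several challenges. First, the data must be diverse and representative enough to cover as many function-calling scenarios as possible. Second, its size (112,754 records in the full set) and complexity make data processing and model training more demanding. Third, scoring function-calling output is technically difficult, particularly judging how closely function names and arguments match the expected call. Finally, evaluation has to remain meaningful across test subsets of different sizes so that it serves different research needs.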
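Exact equality of argument objects, as used in the evaluation script, gives no credit to a call that gets most parameters right. One possible finer-grained measure, offered here as an illustration and not as a metric defined by the dataset, is the share of expected arguments the prediction reproduces exactly. `argument_match_ratio` is a hypothetical helper, and both inputs are assumed to be already-parsed argument dicts.

```python
def argument_match_ratio(expected_args, predicted_args):
    """Fraction of expected arguments whose values the prediction reproduces exactly.

    Extra keys in the prediction are ignored; an empty expectation scores 1.0.
    """
    if not expected_args:
        return 1.0
    hits = sum(
        1
        for key, value in expected_args.items()
        if key in predicted_args and predicted_args[key] == value
    )
    return hits / len(expected_args)
```

Ignoring extra keys is a design choice: it avoids penalizing optional parameters, but it also hides hallucinated arguments, so a stricter variant could count them against the score.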

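Common Scenarios

Classic Use Cases

The typical use of the glaive-function-calling-openai dataset is training and evaluating the function-calling ability of language models. Its many examples teach a model to recognize, within a conversation, when a function is needed and which one to call. For example, a model can learn to call 'track_calories' to record a user's daily calorie intake, or 'calculate_distance' to compute the distance between two places. The scenarios cover everyday needs as well as finance, health, entertainment, and other domains, giving models a broad basis for multi-purpose use.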
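For supervised training, the expected `tool_calls` usually have to be folded back into the conversation as the assistant's turn. The sketch below assumes a target layout of one `messages` list plus `tools`, with the final assistant message carrying `tool_calls`; this roughly follows the layout OpenAI describes for chat fine-tuning with tools, but check the current documentation before relying on it. `to_training_example` is a hypothetical helper.

```python
import copy


def to_training_example(record):
    """Fold the expected tool calls into the conversation as the assistant's turn.

    Assumption: the target format is {"messages": [...], "tools": [...]},
    where the last assistant message carries "tool_calls".
    The input record is not modified.
    """
    messages = copy.deepcopy(record["messages"])
    tool_calls = copy.deepcopy(record["tool_calls"])
    if messages and messages[-1]["role"] == "assistant":
        # Attach the calls to the trailing assistant reply, as in the card's example
        messages[-1]["tool_calls"] = tool_calls
    else:
        # Otherwise add a new assistant turn that only makes the calls
        messages.append({"role": "assistant", "content": None, "tool_calls": tool_calls})
    return {"messages": messages, "tools": copy.deepcopy(record["tools"])}
```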

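Research Problems Addressed

The glaive-function-calling-openai dataset addresses a research problem in the function-calling ability of language models. Conventional language models often struggle to identify and call the right function in complex conversations; by supplying a large number of function-calling examples, the dataset helps models learn to choose a suitable function in different contexts. This improves the practical usefulness of models and gives researchers a common benchmark, with fixed test subsets and the accuracy metrics of the evaluation script, which supports research on function calling in natural language processing.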

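Practical Applications

In practice, the glaive-function-calling-openai dataset supports a wide range of applications. In health management, a model can call functions such as 'track_calories' or 'calculate_bmi' to help users record and analyze health data; in finance, it can call functions such as 'convert_currency' or 'get_stock_price' to answer queries about exchange rates and stock prices. The dataset can also be used to build intelligent assistants, customer-service systems, and similar applications that improve user experience and service efficiency.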
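The dataset only defines function schemas; in an application, each tool call the model emits must be executed by real code. A minimal dispatch sketch is shown below. The local implementations of `calculate_bmi` and `calculate_tip`, their parameter names, and the `dispatch` helper are all hypothetical and not taken from the dataset.

```python
import json


# Hypothetical local implementations; parameter names are assumptions.
def calculate_bmi(weight, height):
    """BMI = weight (kg) / height (m) squared, rounded to 2 decimals."""
    return round(weight / height ** 2, 2)


def calculate_tip(bill_amount, tip_percentage):
    """Tip amount for a bill, rounded to 2 decimals."""
    return round(bill_amount * tip_percentage / 100, 2)


REGISTRY = {"calculate_bmi": calculate_bmi, "calculate_tip": calculate_tip}


def dispatch(tool_call, registry=REGISTRY):
    """Run one OpenAI-style tool call against local Python functions."""
    name = tool_call["function"]["name"]
    if name not in registry:
        raise KeyError(f"no local implementation for {name!r}")
    # "arguments" is a JSON string; decode it into keyword arguments
    kwargs = json.loads(tool_call["function"]["arguments"])
    return registry[name](**kwargs)
```

In a full chat loop, the return value would be sent back to the model as a message with role "tool" and the matching `tool_call_id`, so the model can phrase the final answer.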

Recent Research on the Dataset