five

智能体Agent Reinforcement Learning强化学习公开数据集

收藏
魔搭社区2025-12-23 更新2025-11-03 收录
下载链接:
https://modelscope.cn/datasets/rockingdingo/Agent-RL-Open-Dataset
下载链接
链接失效反馈
官方服务:
资源简介:
# Open Agent RL Dataset: High Quality AI Agent | Tool Use & Function Calls | Reinforcement Learning Datasets [Github](https://github.com/AI-Agent-Hub/AI-Agent-Marketplace)|[Huggingface](https://huggingface.co/datasets/DeepNLP/AI-Agent-Marketplace-Index)|[Pypi](https://pypi.org/project/ai-agent-marketplace/) | [Open Source AI Agent Marketplace DeepNLP](https://www.deepnlp.org/store/ai-agent)|[Agent RL Dataset](https://www.deepnlp.org/store/dataset) DeepNLP website provides **high quality, genuine, online users' request** of Agent & RL datasets to help LLM foundation/SFT/Post Train to get more capable models at function call, tool use and planning. The datasets are collected and sampled from users' requests on our various clients (Web/App/Mini App) and [Open OneKey Agent Router](https://www.deepnlp.org/agent/onekey-mcp-router) and [Open OneKey MCP Router](https://www.deepnlp.org/agent/onekey-mcp-router). Some datasets requires [credit](https://www.deepnlp.org/workspace/billing) to deduct and you can easily gain more credit by activities such as commenting and discussion and uploading your own datasets to the communities. Visit Our AI Store Dataset Tab to Select [Dataset](https://www.deepnlp.org/store/dataset). **Disclaimer**: Safe privacy preserving or personalized information are marked and filtered out. <img src="https://raw.githubusercontent.com/AI-Agent-Hub/mcp-marketplace/refs/heads/main/app/mcp_tool_use/docs/demo_plot_chart.jpg" style="height:600px;" alt="AI Agent Marketplace Category"> | Dataset Name | Description | User Feedback | Example Dataset Download | Full DataSet Download | |----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ---- |-------|-----------------------------------------------------------------------------------------------------------------| | Reinforcement Learning | Sessions of user and assistant' multi-dialogues, rewards from users' feedback in this session, such click of confirmation (Accept/Reject), Upvote, Downvote on the responses, etc. | YES | 50 instances, [Download](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset-example) | 1k, [Download](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset) | ## 1. Dataset Features **Genuine Users' Queries**: Most of the high quality datasets are collected from query logs of our live AI Agents, such as [MCP Tool Use Agent](https://agent.deepnlp.org/agent/mcp_tool_use), [Open OneKey Agent Router](https://www.deepnlp.org/agent/onekey-mcp-router) and [Open OneKey MCP Router](https://www.deepnlp.org/agent/onekey-mcp-router). **Function Call and MCP Servers Support**: The datasets covers wide range of MCP servers from the Open MCP Marketplace() and Playgrounds. **Users Action and Humans' Feedback**: Users' actual feedbacks are crucial in improving the AI Agents training process. We collects users' genuine actions, such as **ACCEPT/REJECT** in confirming the function call results, **Upvote/Downvote** action of the final responses, and many other users' feedback on clickable elements. **Various Domains and Tasks**: We covers 40+ categories of AI agents' tool use scenarios, ranging from information seeking (AI search, map search, etc) to autonomous AI agents browser use, computer use, Data Analysis, Excel Spreadsheet and Powerpoint creation and generation, etc. **Example AI Agent Dataset Dialogues** | Domain | Related MCP Server| Demo | | ---- | ---- | ---- | | Office File Agent | Excel Spreadsheet, Powerpoint, PDF, etc | [Example](https://agent.deepnlp.org/agent/mcp_tool_use/share/ee640008-6bc1-4c3a-832b-2557f985b540) [MCP]() | | AI Search/Deep Research | Bing/Google Custom/Perplexity/Tavily/Firecrawl | [Demo](https://agent.deepnlp.org/agent/mcp_tool_use?server=tavily-ai/tavily-mcp) [MCP]() | | Map Trip Planning | GoogleMap, Amap(Gaode), BaiduMap, etc. | [Example](https://agent.deepnlp.org/agent/mcp_tool_use/share/8ab0b25c-b72d-4cae-9c86-a852df8c6541) [MCP](https://agent.deepnlp.org/agent/mcp_tool_use?server=amap-mcp/amap-mcp-%E9%AB%98%E5%BE%B7%E5%9C%B0%E5%9B%BE-mcp) [Use MCP]() | | Browser Usage | Playwright, Puppeteer, etc. | [Demo](https://agent.deepnlp.org/agent/mcp_tool_use?server=puppeteer/puppeteer) [MCP]() | | Chart,Graph,Image | everart,mcp-server-charts(AntV),canva-mcp | [Demo](https://agent.deepnlp.org/agent/mcp_tool_use/share/93d94694-e673-49d3-b805-820c4ef842bd) [MCP]() | ## 2. Dataset Introduction We provide main below types of AI agents datasets in List of Messages Json Formats and scalar data such as rewards, etc. | Dataset Name | Description | User Feedback | Example Dataset Download | Full DataSet Download | |----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ---- |-------|-----------------------------------------------------------------------------------------------------------------| | Tool Use Multi-Turn Dialogue | The tool use multi-turn dialogue dataset is in the list of messages formats, Useful for AI Search/Deep Research/Map/Financial Data/etc | YES | 50 instances, [Download](https://www.deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-tool-use-dialogue-open-dataset-example) | 1k, [Download](https://www.deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-tool-use-dialogue-open-dataset) | | Function Calling Tool Use | The dataset contains **messages** and **available tools** as input and output the choosen **tool_call** result indicating which tool to use and the arguments. The datasets are collected from calling SOTA LLM such as GPT, OpenAI o-series, Claude, Qwen, Kimi, etc. | No | 50 instances, [Download](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-function-calling-open-dataset-example) | 1k, [Download](https://deepnlp.org/store/ai-agent/ai-agent/pub-deepnlp/agent-function-calling-open-dataset) | | Reinforcement Learning | Sessions of user and assistant' multi-dialogues, rewards from users' feedback in this session, such click of confirmation (Accept/Reject), Upvote, Downvote on the responses, etc. | YES | 50 instances, [Download](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset-example) | 1k, [Download](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset) | ### Dataset 3 Reinforcement Learning We collect users' positive and negative feedbacks on the AI Agent workflow. **Positive feedback** include the click on the **ACCEPT** button of the function call results and **Upvote** button. We set rewards as 1.0, and use "reward_description" field to include detailed introduction of their actions. **Negative feedback** include the click on the **REJECT** button of the function call results and **Downvote** button. We set rewards as -1.0. **Dataset Description** | KEY | Type | Description | |----------------|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------| | trace_id | String | Identify each unique new user request or API calling | | session_id | String | The identifier of each dialogue, which consists of multiple turns of dialogues and every user input produces a new trace_id | | messages | List of Json Object | Dialogue Messages | | message.reward | scalar | Users' feedback on each individual message or function call, rool result level. ACCEPT/REJECT, 1.0 for positive feedback, -1.0 for negative feedback | | message.reward_description | string | Detailed description of actions of users. | | reward | scalar | Overall feedback of whole session, Buttom Bar Action, 1.0 for positive feedback, -1.0 for negative feedback | | reward_description | string | Details description of the action, such as ACCEPT/REJECT,UPVOTE,DOWNVOTE, etc | <img src="https://raw.githubusercontent.com/AI-Agent-Hub/mcp-marketplace/refs/heads/main/app/mcp_tool_use/docs/route_planning_mcp_v2.jpg" style="height:600px;" alt="AI Agent Marketplace Category"> ** Note that: The rewards are on two folds: each function call or message level, whole dialogue session level. Each message or function call level: Json $.message.reward ,e.g. Users may not be satisfied with LLM choose the wrong tools, they might click rejects or after autorun they may click feedback. Dialogue session level: Json: $reward, Bottom Bar Buttons, Upvote, Downvote, etc. Auto-Run mode, function calls doesn't need users' confirmation and default to Accept. **Positive Reward Example** In this financial data checking scenarios, users' clicks on the ACCEPT to confirm the function calls of tool 'get_stock_price_global_market' and parameters "{\"symbol_list\": [\"MSFT\", \"NVDA\"], \"market\": \"US\"}" form server finance-agent-mcp-server. ``` { "messages": [{ "role": "user", "content": "Find the Microsoft and Nvidia Stock Price and Market Cap. Who has higher market cap today?" }, { "role": "assistant", "content": "<p>Finding Relevant Tools Using Queries: </p><div><div class=\"div_planning_highlight\">find</div> <div class=\"div_planning_highlight\">market cap</div> <div class=\"div_planning_highlight\">compare</div> <div class=\"div_planning_highlight\">stock</div> <div class=\"div_planning_highlight\">and</div> <div class=\"div_planning_highlight\">finance data</div> <div class=\"div_planning_highlight\">market</div></div>\r<p>Relevant MCP Servers: </p><div></div>\r<p>Relevant Tools: </p> <div><div class=\"div_planning_highlight\">GetAwsSolutionsConstructPattern</div><div class=\"div_planning_highlight\">SearchGenAICDKConstructs</div><div class=\"div_planning_highlight\">LambdaLayerDocumentationProvider</div><div class=\"div_planning_highlight\">pay_order</div><div class=\"div_planning_highlight\">create_refund</div><div class=\"div_planning_highlight\">get_refund</div><div class=\"div_planning_highlight\">sequentialthinking</div><div class=\"div_planning_highlight\">get_stock_price_global_market</div><div class=\"div_planning_highlight\">read_documentation</div><div class=\"div_planning_highlight\">search_documentation</div><div class=\"div_planning_highlight\">recommend</div><div class=\"div_planning_highlight\">brave_local_search</div><div class=\"div_planning_highlight\">firecrawl_scrape</div><div class=\"div_planning_highlight\">firecrawl_search</div><div class=\"div_planning_highlight\">firecrawl_extract</div><div class=\"div_planning_highlight\">firecrawl_deep_research</div><div class=\"div_planning_highlight\">list_directory</div><div class=\"div_planning_highlight\">list_directory_with_sizes</div><div class=\"div_planning_highlight\">move_file</div><div class=\"div_planning_highlight\">tavily_map</div></div>| \n <div class=\"tool-call-container\">\n <div class=\"tool-call-header\">\n <span class=\"arrow\">\u25b6</span>\n <span class=\"header-text\">Call Tool get_stock_price_global_market of Server finance-agent-mcp-server</span>\n </div>\n <div class=\"collapsible-content\">\n <div class=\"div_tool_call_row\"><h3>Parameters</h3><div class=\"div_tool_call_json\"> {\n \"market\": \"US\",\n \"symbol_list\": [\n \"MSFT\",\n \"NVDA\"\n ]\n} </div></div>\n <div class=\"div_tool_call_row div_tool_call_row_options\">\n <input type=\"button\" class=\"agent-button-base agent-button-highlight\" value=\"ACCEPT\" set_on_click=\"true\">\n <input type=\"button\" class=\"agent-button-base\" value=\"REJECT\" set_on_click=\"true\">\n </div>\n <div class=\"div_tool_call_row results-section\"><h3>Results</h3><div class=\"div_tool_call_json\"></div></div>\n </div>\n </div>\n\n|Searching MCP Marketplace Tools and Servers", "context": "{\"tool_id\": \"call_45853133e9a84899adcbb3\", \"tool_name\": \"get_stock_price_global_market\", \"tool_arguments\": {\"symbol_list\": [\"MSFT\", \"NVDA\"], \"market\": \"US\"}, \"toolname_to_server_dict\": {\"GetAwsSolutionsConstructPattern\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"LambdaLayerDocumentationProvider\": [\"awslabs.cdk-mcp-server\"], \"pay_order\": [\"paypal\"], \"create_refund\": [\"paypal\"], \"get_refund\": [\"paypal\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"read_documentation\": [\"awslabs.aws-documentation-mcp-server\"], \"search_documentation\": [\"awslabs.aws-documentation-mcp-server\"], \"recommend\": [\"awslabs.aws-documentation-mcp-server\"], \"brave_local_search\": [\"brave-search\"], \"firecrawl_scrape\": [\"firecrawl-mcp\"], \"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"list_directory\": [\"filesystem\"], \"list_directory_with_sizes\": [\"filesystem\"], \"move_file\": [\"filesystem\"], \"tavily_map\": [\"tavily-remote-mcp\"]}}" }, { "role": "user", "content": "<action>ACCEPT</action>" }, { "role": "assistant", "content": "", "tool_calls": [{ "id": "call_45853133e9a84899adcbb3", "type": "tool_use", "function": { "name": "get_stock_price_global_market", "arguments": "{\"symbol_list\": [\"MSFT\", \"NVDA\"], \"market\": \"US\"}" } }] }, { "role": "tool", "tool_call_id": "call_45853133e9a84899adcbb3", "name": "get_stock_price_global_market", "content": "[{\"avg_price\": \"517.10 USD\", \"high\": \"522.82 USD\", \"low\": \"514.02 USD\", \"symbol\": \"MSFT\", \"update_time\": \"\", \"previous_close\": \"520.17 USD\", \"change\": \"-3.07\", \"market_capitalization\": \"3.84T USD\", \"pe_ratio\": \"37.98\", \"data_source\": \"morningstar.com\", \"source_url\": \"https://www.morningstar.com/stocks/xnas/msft/quote\"}, {\"avg_price\": \"182.01 USD\", \"high\": \"182.94 USD\", \"low\": \"180.59 USD\", \"symbol\": \"NVDA\", \"update_time\": \"\", \"previous_close\": \"180.45 USD\", \"change\": \"+1.56\", \"market_capitalization\": \"4.44T USD\", \"pe_ratio\": \"57.04\", \"data_source\": \"morningstar.com\", \"source_url\": \"https://www.morningstar.com/stocks/xnas/nvda/quote\"}]" }, { "role": "system", "content": "\n \n **Background**\n You are an AI Agent expert good at generating answers given users' original query and related context information, such as web pages search result, tool function call results, etc.\n\n **Tasks**:\n - Analyse User's query and consider current date, current time and users' location (country, city, district) if provided\n - Identify the language of user's original query and generate the answers using the same language, such as English, Chinese, French, Russion, etc.\n - Identify the expected output content-type of the answers, choose from \"text/markdown\", \"text/html\", \"application/code\", with default to \"text/html\". If users want you to generate a code with explanation, use mixed content type for different sections, \"application/code\" and \"text/markdown\".\n - Output detailed response to answer users' original question directly, without the thinking process. Please include the necessary information.\n - Coding or Math: If users' query is to write codes or math formula, please generate the code in the first place. Secondly, generate a few points describing the meanings of the main functions or changes.\n - MCP Tool Use or Function Call: Answer Users' questions using the mcp or function call results. Please include all the result information such as the weather forecast for all 3 days, etc.\n - Citation and Data Sources: If the results contains URL of sources and citations, remember to include the main text in plan htmlp a tag, such as <a href=\"{url}\" class=\"citation_url\">{citation_name}</a>\n\n **Output Format**:\n - The output consists of multiple sections. Each section begins with a content type field describing the content-type of the paragraph and then the main content.\n\n **Example**:\n **User Query**:\n Please generate a python code of adding two numbers\n **Output**:\n <content-type>application/code</content-type>\n ```\n def add(a, b):\n return a + b\n ```\n <content-type>text/markdown</content-type>\n ### Description\n This function takes two numbers and output the sum.\n" }, { "role": "user", "content": " \n **Context Variables**\n Datetime: 2025-08-19 10:08:17 CST (Asia/Shanghai)\n User Location: \n **User Query**\n <action>ACCEPT</action>\n" }], "reward": 1.0, "reward_description": "ACCEPT" } ``` **Negative Reward Example** User is not satisfied with LLM choosing "firecrawl_search" and its results and clicks on the downvote button. ``` { "messages": [{ "role": "user", "content": "Latest AI Company Research News" }, { "role": "assistant", "content": "<p>Finding Relevant Tools Using Queries: </p><div><div class=\"div_planning_highlight\">company</div> <div class=\"div_planning_highlight\">latest</div> <div class=\"div_planning_highlight\">AI news</div> <div class=\"div_planning_highlight\">ai</div> <div class=\"div_planning_highlight\">artificial intelligence</div> <div class=\"div_planning_highlight\">technology trends</div> <div class=\"div_planning_highlight\">research updates</div></div>\r<p>Relevant MCP Servers: </p><div></div>\r<p>Relevant Tools: </p> <div><div class=\"div_planning_highlight\">firecrawl_search</div><div class=\"div_planning_highlight\">firecrawl_extract</div><div class=\"div_planning_highlight\">firecrawl_deep_research</div><div class=\"div_planning_highlight\">get_pricing</div><div class=\"div_planning_highlight\">generate_cost_report</div><div class=\"div_planning_highlight\">get_pricing_service_codes</div><div class=\"div_planning_highlight\">get_pricing_attribute_values</div><div class=\"div_planning_highlight\">get_price_list_urls</div><div class=\"div_planning_highlight\">sequentialthinking</div><div class=\"div_planning_highlight\">get_notification_details</div><div class=\"div_planning_highlight\">get_stock_price_global_market</div><div class=\"div_planning_highlight\">tavily_search</div><div class=\"div_planning_highlight\">ExplainCDKNagRule</div><div class=\"div_planning_highlight\">CheckCDKNagSuppressions</div><div class=\"div_planning_highlight\">SearchGenAICDKConstructs</div><div class=\"div_planning_highlight\">show_subscription_plan_details</div><div class=\"div_planning_highlight\">show_subscription_details</div><div class=\"div_planning_highlight\">get_order</div><div class=\"div_planning_highlight\">accept_dispute_claim</div><div class=\"div_planning_highlight\">get_refund</div></div>| \n <div class=\"tool-call-container\">\n <div class=\"tool-call-header\">\n <span class=\"arrow\">\u25b6</span>\n <span class=\"header-text\">Call Tool firecrawl_search of Server firecrawl-mcp</span>\n </div>\n <div class=\"collapsible-content\">\n <div class=\"div_tool_call_row\"><h3>Parameters</h3><div class=\"div_tool_call_json\"> {\n \"country\": \"us\",\n \"lang\": \"en\",\n \"limit\": 5,\n \"query\": \"latest AI company research news\",\n \"scrapeOptions\": {\n \"formats\": [\n \"markdown\"\n ],\n \"onlyMainContent\": true\n }\n} </div></div>\n <div class=\"div_tool_call_row div_tool_call_row_options\">\n <input type=\"button\" class=\"agent-button-base agent-button-highlight\" value=\"ACCEPT\" set_on_click=\"true\">\n <input type=\"button\" class=\"agent-button-base\" value=\"REJECT\" set_on_click=\"true\">\n </div>\n <div class=\"div_tool_call_row results-section\"><h3>Results</h3><div class=\"div_tool_call_json\"></div></div>\n </div>\n </div>\n\n|Searching MCP Marketplace Tools and Servers", "id": "33ea10a7-c66a-48cd-80d3-ac9ace6ae747", "context": "{\"tool_id\": \"call_30fa90e8ecb944bd8613b3\", \"tool_name\": \"firecrawl_search\", \"tool_arguments\": {\"query\": \"latest AI company research news\", \"limit\": 5, \"lang\": \"en\", \"country\": \"us\", \"scrapeOptions\": {\"formats\": [\"markdown\"], \"onlyMainContent\": true}}, \"toolname_to_server_dict\": {\"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"get_pricing\": [\"awslabs.aws-pricing-mcp-server\"], \"generate_cost_report\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_service_codes\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_attribute_values\": [\"awslabs.aws-pricing-mcp-server\"], \"get_price_list_urls\": [\"awslabs.aws-pricing-mcp-server\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_notification_details\": [\"github\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"tavily_search\": [\"tavily-remote-mcp\"], \"ExplainCDKNagRule\": [\"awslabs.cdk-mcp-server\"], \"CheckCDKNagSuppressions\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"show_subscription_plan_details\": [\"paypal\"], \"show_subscription_details\": [\"paypal\"], \"get_order\": [\"paypal\"], \"accept_dispute_claim\": [\"paypal\"], \"get_refund\": [\"paypal\"]}}" }, { "role": "user", "content": "<action>ACCEPT</action>" "reward": 1.0, }, { "role": "assistant", "content": "", "tool_calls": [{ "id": "call_31924021412031204fsdf", "type": "tool_use", "function": { "name": "firecrawl_search", "arguments": "{\"query\": \"AI Research News\"}" } }] }, { "role": "tool", "tool_call_id": "call_31924021412031204fsdf", "name": "firecrawl_search", "content": "\nThe latest AI company research news highlights significant advancements and strategic moves by leading tech firms. Here are some key updates:\n\n- **Google DeepMind** introduced **AlphaFold 3**, a revolutionary model capable of predicting the structure and interactions of all life\u2019s molecules, including proteins, DNA, and RNA, with unprecedented accuracy. This breakthrough could transform drug discovery and molecular biology research. [Read more](https://www.deepmind.com/)\n\n- **OpenAI** is advancing its **o3** and **o4** reasoning models, focusing on improved planning, tool use, and problem-solving capabilities. The company is also reportedly developing a new AI model named **Strawberry**, aimed at enhancing deep reasoning and autonomous task execution. [Source](https://openai.com/research)\n\n- **Anthropic** released **Claude 3.5 Sonnet**, which outperforms its predecessors and competing models in coding, math, and reasoning benchmarks. The model is now being integrated into enterprise workflows for enhanced decision support and automation. [Learn more](https://www.anthropic.com/news/claude-3-5-sonnet)\n\n- **Microsoft** and **NVIDIA** have deepened their AI partnership, co-developing next-gen AI chips and cloud infrastructure to support large-scale AI training. Microsoft is also integrating more AI capabilities into its Azure platform, including custom LLM deployments. [Microsoft AI News](https://news.microsoft.com/ai)\n\n- **xAI**, Elon Musk\u2019s AI company, continues training **Grok-3** on real-time data from X (formerly Twitter), aiming to create a more dynamic and context-aware AI assistant. Grok is being tested for advanced reasoning and multi-modal capabilities. [Updates via X](https://x.com/xai)\n\n- **Meta AI** unveiled **Llama 3.1**, a fully open-source large language model with enhanced multilingual support and reasoning abilities. It is now one of the most widely adopted models in the open-source AI community. [Meta AI Blog](https://ai.meta.com/llama/)\n\nThese developments reflect a rapid pace of innovation in the AI sector, with companies focusing on more powerful, efficient, and accessible models for both research and commercial applications." }, { "role": "assistant", "content": "Start Running Tool firecrawl_search ", "id": "3cbb4504-f080-4bb7-a9cc-9cfaa3a9756d", "context": "" }, { "role": "assistant", "content": "<p>Finding Relevant Tools Using Queries: </p><div><div class=\"div_planning_highlight\">company</div> <div class=\"div_planning_highlight\">latest</div> <div class=\"div_planning_highlight\">AI news</div> <div class=\"div_planning_highlight\">ai</div> <div class=\"div_planning_highlight\">artificial intelligence</div> <div class=\"div_planning_highlight\">technology trends</div> <div class=\"div_planning_highlight\">research updates</div></div>\r<p>Relevant MCP Servers: </p><div></div>\r<p>Relevant Tools: </p> <div><div class=\"div_planning_highlight\">firecrawl_search</div><div class=\"div_planning_highlight\">firecrawl_extract</div><div class=\"div_planning_highlight\">firecrawl_deep_research</div><div class=\"div_planning_highlight\">get_pricing</div><div class=\"div_planning_highlight\">generate_cost_report</div><div class=\"div_planning_highlight\">get_pricing_service_codes</div><div class=\"div_planning_highlight\">get_pricing_attribute_values</div><div class=\"div_planning_highlight\">get_price_list_urls</div><div class=\"div_planning_highlight\">sequentialthinking</div><div class=\"div_planning_highlight\">get_notification_details</div><div class=\"div_planning_highlight\">get_stock_price_global_market</div><div class=\"div_planning_highlight\">tavily_search</div><div class=\"div_planning_highlight\">ExplainCDKNagRule</div><div class=\"div_planning_highlight\">CheckCDKNagSuppressions</div><div class=\"div_planning_highlight\">SearchGenAICDKConstructs</div><div class=\"div_planning_highlight\">show_subscription_plan_details</div><div class=\"div_planning_highlight\">show_subscription_details</div><div class=\"div_planning_highlight\">get_order</div><div class=\"div_planning_highlight\">accept_dispute_claim</div><div class=\"div_planning_highlight\">get_refund</div></div>| \n <div class=\"tool-call-container\">\n <div class=\"tool-call-header\">\n <span class=\"arrow\">\u25b6</span>\n <span class=\"header-text\">Call Tool firecrawl_search of Server firecrawl-mcp</span>\n </div>\n <div class=\"collapsible-content\">\n <div class=\"div_tool_call_row\"><h3>Parameters</h3><div class=\"div_tool_call_json\"> {\n \"country\": \"us\",\n \"lang\": \"en\",\n \"limit\": 5,\n \"query\": \"latest AI company research news\",\n \"scrapeOptions\": {\n \"formats\": [\n \"markdown\"\n ],\n \"onlyMainContent\": true\n }\n} </div></div>\n <div class=\"div_tool_call_row div_tool_call_row_options\">\n <input type=\"button\" class=\"agent-button-base agent-button-highlight\" value=\"ACCEPT\" set_on_click=\"true\">\n <input type=\"button\" class=\"agent-button-base\" value=\"REJECT\" set_on_click=\"true\">\n </div>\n <div class=\"div_tool_call_row results-section\"><h3>Results</h3><div class=\"div_tool_call_json\"></div></div>\n </div>\n </div>\n\n|Searching MCP Marketplace Tools and Servers", "id": "33ea10a7-c66a-48cd-80d3-ac9ace6ae747", "context": "{\"tool_id\": \"call_30fa90e8ecb944bd8613b3\", \"tool_name\": \"firecrawl_search\", \"tool_arguments\": {\"query\": \"latest AI company research news\", \"limit\": 5, \"lang\": \"en\", \"country\": \"us\", \"scrapeOptions\": {\"formats\": [\"markdown\"], \"onlyMainContent\": true}}, \"toolname_to_server_dict\": {\"firecrawl_search\": [\"firecrawl-mcp\"], \"firecrawl_extract\": [\"firecrawl-mcp\"], \"firecrawl_deep_research\": [\"firecrawl-mcp\"], \"get_pricing\": [\"awslabs.aws-pricing-mcp-server\"], \"generate_cost_report\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_service_codes\": [\"awslabs.aws-pricing-mcp-server\"], \"get_pricing_attribute_values\": [\"awslabs.aws-pricing-mcp-server\"], \"get_price_list_urls\": [\"awslabs.aws-pricing-mcp-server\"], \"sequentialthinking\": [\"sequential-thinking\"], \"get_notification_details\": [\"github\"], \"get_stock_price_global_market\": [\"finance-agent-mcp-server\"], \"tavily_search\": [\"tavily-remote-mcp\"], \"ExplainCDKNagRule\": [\"awslabs.cdk-mcp-server\"], \"CheckCDKNagSuppressions\": [\"awslabs.cdk-mcp-server\"], \"SearchGenAICDKConstructs\": [\"awslabs.cdk-mcp-server\"], \"show_subscription_plan_details\": [\"paypal\"], \"show_subscription_details\": [\"paypal\"], \"get_order\": [\"paypal\"], \"accept_dispute_claim\": [\"paypal\"], \"get_refund\": [\"paypal\"]}}" }], "reward": -1.0, "reward_description": "DOWNVOTE" } ```

# Open Agent RL 数据集:高质量AI智能体(AI Agent) | 工具调用与函数调用 | 强化学习数据集 [Github](https://github.com/AI-Agent-Hub/AI-Agent-Marketplace)|[Huggingface](https://huggingface.co/datasets/DeepNLP/AI-Agent-Marketplace-Index)|[Pypi](https://pypi.org/project/ai-agent-marketplace/) | [开源AI智能体集市DeepNLP](https://www.deepnlp.org/store/ai-agent)|[Agent RL 数据集](https://www.deepnlp.org/store/dataset) DeepNLP平台提供**高质量、真实的在线用户请求数据集**,涵盖AI智能体与强化学习(RL)相关数据,旨在帮助大语言模型(LLM)进行基座训练、监督微调(SFT)与后训练,以提升模型在函数调用、工具使用与任务规划方面的能力。本数据集从我们多端客户端(网页端/应用端/小程序)以及[Open OneKey Agent Router](https://www.deepnlp.org/agent/onekey-mcp-router)和[Open OneKey MCP Router](https://www.deepnlp.org/agent/onekey-mcp-router)的用户请求中采集与采样得到。部分数据集需要使用[积分](https://www.deepnlp.org/workspace/billing)进行兑换,您可通过评论互动、参与讨论以及向社区上传自有数据集等方式轻松获取更多积分。 访问我们的AI商店数据集专区以挑选[数据集](https://www.deepnlp.org/store/dataset)。 **免责声明**:已对安全隐私保护信息或个性化信息进行标记与过滤。 <img src="https://raw.githubusercontent.com/AI-Agent-Hub/mcp-marketplace/refs/heads/main/app/mcp_tool_use/docs/demo_plot_chart.jpg" style="height:600px;" alt="AI智能体集市分类"> | 数据集名称 | 数据集描述 | 用户反馈支持 | 示例数据集下载 | 完整数据集下载 | |----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ---- |-------|-----------------------------------------------------------------------------------------------------------------| | 强化学习(RL) | 包含用户与助手的多轮对话会话、会话中用户反馈对应的奖励(如确认函数调用结果时的接受/点击、对回复的点赞/点踩等)。 | 是 | 50条数据,[下载](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset-example) | 1000条数据,[下载](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset) | ## 1. 数据集特性 **真实用户查询**:大部分高质量数据集采集自我们上线的AI智能体(AI Agent)的查询日志,例如[MCP工具调用智能体](https://agent.deepnlp.org/agent/mcp_tool_use)、[Open OneKey Agent Router](https://www.deepnlp.org/agent/onekey-mcp-router)以及[Open OneKey MCP Router](https://www.deepnlp.org/agent/onekey-mcp-router)。 **函数调用与MCP服务器支持**:数据集覆盖了来自开放MCP集市与Playgrounds平台的各类MCP服务器。 **用户行为与人工反馈**:用户的真实反馈对优化AI智能体的训练流程至关重要。我们采集了用户的真实操作行为,例如确认函数调用结果时的**接受/拒绝**操作、对最终回复的**点赞/点踩**操作,以及用户对可交互元素的各类其他反馈。 **多领域与多任务覆盖**:我们覆盖了40+类AI智能体工具使用场景,涵盖信息检索(AI搜索、地图搜索等)、自主AI智能体浏览器使用、计算机操作、数据分析、Excel表格与PowerPoint文档生成等。 **示例AI智能体数据集对话** | 应用领域 | 关联MCP服务器 | 演示链接 | | ---- | ---- | ---- | | 办公文档智能体 | Excel表格、PowerPoint、PDF等 | [示例](https://agent.deepnlp.org/agent/mcp_tool_use/share/ee640008-6bc1-4c3a-832b-2557f985b540) [MCP]() | | AI搜索/深度研究 | Bing、Google自定义搜索、Perplexity、Tavily、Firecrawl等 | [演示](https://agent.deepnlp.org/agent/mcp_tool_use?server=tavily-ai/tavily-mcp) [MCP]() | | 地图与行程规划 | Google地图、高德地图(Amap)、百度地图等 | [示例](https://agent.deepnlp.org/agent/mcp_tool_use/share/8ab0b25c-b72d-4cae-9c86-a852df8c6541) [MCP](https://agent.deepnlp.org/agent/mcp_tool_use?server=amap-mcp/amap-mcp-%E9%AB%98%E5%BE%B7%E5%9C%B0%E5%9B%BE-mcp) [使用MCP]() | | 浏览器操作 | Playwright、Puppeteer等 | [演示](https://agent.deepnlp.org/agent/mcp_tool_use?server=puppeteer/puppeteer) [MCP]() | | 图表、图形与图像生成 | everart、mcp-server-charts(AntV)、canva-mcp | [演示](https://agent.deepnlp.org/agent/mcp_tool_use/share/93d94694-e673-4c3a-832b-2557f985b540bd) [MCP]() | ## 2. 数据集介绍 我们提供以下几类主流AI智能体数据集,数据格式为消息列表JSON格式,同时包含奖励值等标量数据。 | 数据集名称 | 数据集描述 | 用户反馈支持 | 示例数据集下载 | 完整数据集下载 | |----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ---- |-------|-----------------------------------------------------------------------------------------------------------------| | 工具调用多轮对话 | 工具调用多轮对话数据集采用消息列表格式,适用于AI搜索、深度研究、地图查询、金融数据处理等场景。 | 是 | 50条数据,[下载](https://www.deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-tool-use-dialogue-open-dataset-example) | 1000条数据,[下载](https://www.deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-tool-use-dialogue-open-dataset) | | 函数调用工具使用 | 该数据集以**消息**与**可用工具**作为输入,输出选定的**工具调用(tool_call)**结果,用于指示需调用的工具与参数。数据集采集自GPT、OpenAI o系列、Claude、通义千问(Qwen)、Kimi等前沿大语言模型的调用记录。 | 否 | 50条数据,[下载](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-function-calling-open-dataset-example) | 1000条数据,[下载](https://www.deepnlp.org/store/ai-agent/ai-agent/pub-deepnlp/agent-function-calling-open-dataset) | | 强化学习(RL) | 包含用户与助手的多轮对话会话、会话中用户反馈对应的奖励(如确认函数调用结果时的接受/点击、对回复的点赞/点踩等)。 | 是 | 50条数据,[下载](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset-example) | 1000条数据,[下载](https://deepnlp.org/store/dataset/dataset/pub-deepnlp/agent-reinforcement-learning-open-dataset) | ### 强化学习数据集(第三类) 我们采集用户对AI智能体工作流程的正负反馈。 **正向反馈**包括点击函数调用结果的**接受(ACCEPT)**按钮与**点赞(Upvote)**按钮,我们将此类反馈的奖励值设为1.0,并通过`reward_description`字段记录用户操作的详细说明。 **负向反馈**包括点击函数调用结果的**拒绝(REJECT)**按钮与**点踩(Downvote)**按钮,我们将此类反馈的奖励值设为-1.0。 **数据集字段说明** | 字段名 | 数据类型 | 字段描述 | |----------------|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------| | trace_id | 字符串(String) | 用于唯一标识每一条新的用户请求或API调用 | | session_id | 字符串(String) | 对话会话标识符,一个会话包含多轮对话,每一次用户输入都会生成一个新的trace_id | | messages | JSON对象列表(List of Json Object) | 对话消息列表 | | message.reward | 标量值(scalar) | 用户对单条消息或函数调用结果的反馈,取值为ACCEPT/REJECT,正向反馈设为1.0,负向反馈设为-1.0 | | message.reward_description | 字符串(string) | 用户操作的详细描述。 | | reward | 标量值(scalar) | 整个对话会话的整体反馈,对应底部栏操作,正向反馈设为1.0,负向反馈设为-1.0 | | reward_description | 字符串(string) | 用户操作的详细描述,例如ACCEPT/REJECT、UPVOTE、DOWNVOTE等 | <img src="https://raw.githubusercontent.com/AI-Agent-Hub/mcp-marketplace/refs/heads/main/app/mcp_tool_use/docs/route_planning_mcp_v2.jpg" style="height:600px;" alt="AI智能体集市分类"> **注意:奖励值分为两个维度:单条函数调用或消息维度,以及完整对话会话维度。** - **单条消息或函数调用维度**:对应JSON路径`$.message.reward`,例如当用户对大语言模型(LLM)选择的错误工具不满意时,可能会点击拒绝按钮;在自动运行模式下,用户也可随时点击反馈按钮。 - **完整对话会话维度**:对应JSON字段`$reward`,对应底部栏的点赞、点踩等按钮操作。 **自动运行模式**下,函数调用无需用户确认,默认视为接受。 **正向奖励示例** json { "messages": [{...}], "reward": 1.0, "reward_description": "ACCEPT" } **负向奖励示例** json { "messages": [{...}], "reward": -1.0, "reward_description": "DOWNVOTE" }
提供机构:
maas
创建时间:
2025-10-13
搜集汇总
数据集介绍
main_image_url
背景与挑战
背景概述
该数据集是一个高质量的AI智能体强化学习公开数据集,包含真实用户查询、多轮对话以及用户反馈(如接受/拒绝、点赞/点踩等),并标注了相应的奖励值(正反馈为1.0,负反馈为-1.0)。它覆盖多种工具使用场景和领域,旨在支持AI模型在函数调用和规划等任务上的训练。
以上内容由遇见数据集搜集并总结生成
二维码
社区交流群
二维码
科研交流群
商业服务