EDR-200

Name: EDR-200
Creator: maas
Published: 2025-12-05 16:54:42
License: No description provided

ModelScope Community; updated 2025-12-05; indexed 2025-11-03

Download link:

https://modelscope.cn/datasets/Salesforce/EDR-200

Resource description:

# Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics

Paper: [Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics](https://arxiv.org/abs/2510.17797)

Code: [https://github.com/SalesforceAIResearch/enterprise-deep-research](https://github.com/SalesforceAIResearch/enterprise-deep-research)

### Dataset Overview

**EDR-200** contains 201 complete agentic research trajectories generated by Enterprise Deep Research: 99 queries from DeepResearch Bench and 102 queries from DeepConsult. Unlike prior benchmarks that capture only final outputs, these trajectories expose the full reasoning process across search, reflection, and synthesis steps, enabling fine-grained analysis of agentic planning and decision-making dynamics.

<div style="text-align: center;">
<img src="https://github.com/SalesforceAIResearch/enterprise-deep-research/blob/main/assets/edr_ppl.png?raw=true" alt="EDR System Overview" width="620" style="margin: auto;">
</div>

**NOTE:** This dataset was generated using Gemini and should not be used to develop models that compete with Google.
### Getting Started

Load the dataset with Hugging Face `datasets`:

```python
import json

from datasets import load_dataset

# Load the full dataset
dataset = load_dataset("Salesforce/EDR-200")

# Access a trajectory
example = dataset['train'][0]
print(f"Query: {example['query']}")
print(f"Benchmark: {example['benchmark']}")
print(f"Iterations: {example['num_loops']}")
print(f"Report length: {len(example['report'].split())} words")

# Parse trajectory (stored as JSON string)
trajectory = json.loads(example['trajectory'])
print(f"First iteration tool calls: {trajectory[0]['num_tool_calls']}")
```

### Structure

Each trajectory in EDR-200 contains:

- **`query`**: The research question (e.g., "What are the key trends in enterprise AI adoption?")
- **`num_loops`**: Number of research iterations performed
- **`trajectory`**: Complete sequence of tool calls and intermediate outputs (JSON format)
- **`report`**: Final markdown research report
- **`benchmark`**: Source benchmark ("DeepResearch Bench" or "Deep Consult")

#### Trajectory Format

Each trajectory contains multiple iterations. Here's the structure:

```json
[
  {
    "iteration": 0,
    "num_tool_calls": 12,
    "tool_calls": [
      {
        "id": "call_1",
        "type": "function",
        "function": {
          "name": "decompose_query",
          "arguments": {"query": "...", "knowledge_gap": "..."}
        },
        "result": {"queries": [...]}
      },
      {
        "id": "call_2",
        "type": "function",
        "function": {
          "name": "general_search",
          "arguments": {"query": "..."}
        },
        "result": {"num_sources": 5, "sources": [...]}
      },
      {
        "id": "call_3",
        "type": "function",
        "function": {
          "name": "generate_report",
          "arguments": {...}
        },
        "result": {"updated_summary_length": 1250, "num_sources_cited": 5}
      },
      {
        "id": "call_4",
        "type": "function",
        "function": {
          "name": "reflect_on_report",
          "arguments": {}
        },
        "result": {
          "research_complete": false,
          "knowledge_gap": "...",
          "follow_up_query": "..."
        }
      }
    ],
    "running_report": "## Section 1...",
    "num_sources": 5
  }
]
```

**Tool Types:**

- `decompose_query`: Breaks down the research question into sub-queries for searches
- `general_search`, `academic_search`, etc.: Execute searches and gather sources
- `generate_report`: Synthesizes information into structured report sections
- `reflect_on_report`: Identifies knowledge gaps and determines next steps

### Dataset Statistics

| Metric | Value |
|--------|-------|
| Total Trajectories | 201 |
| Avg. Iterations per Trajectory | 7.19 |
| Avg. Tool Calls per Trajectory | 49.88 |
| Avg. Tool Calls per Iteration | 6.93 |
| Avg. Searches per Trajectory | 28.30 |
| Avg. Report Length | 6,523 words |
| Avg. Report Growth per Iteration | 600 words |

### Benchmark Results

<div style="text-align: center;">
<img src="https://github.com/SalesforceAIResearch/enterprise-deep-research/blob/main/assets/leaderboard.png?raw=true" alt="Model Leaderboard" width="620" style="margin: auto;">
</div>

### Ethical Considerations

This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.
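The per-trajectory figures in the Dataset Statistics table can be approximated from the `trajectory` and `report` fields described under Structure and Trajectory Format. The sketch below is a minimal illustration and is not the authors' evaluation code. It rests on three assumptions. First, every search tool's name ends in `_search`; the card lists only `general_search`, `academic_search`, etc. Second, the iteration count is the number of entries in the parsed trajectory. Third, report length is counted with `str.split()`, as in Getting Started. The helper names `tool_name_counts`, `summarize_example`, and `average` are hypothetical.

```python
import json
from collections import Counter
from typing import Any


def tool_name_counts(trajectory: list[dict[str, Any]]) -> Counter:
    """Count how often each tool is called across all iterations of one trajectory."""
    counts: Counter = Counter()
    for iteration in trajectory:
        for call in iteration.get("tool_calls", []):
            counts[call["function"]["name"]] += 1
    return counts


def summarize_example(example: dict[str, Any]) -> dict[str, float]:
    """Compute per-trajectory figures comparable to the Dataset Statistics table.

    Assumptions (not confirmed by the dataset card): search tools are the
    ones whose name ends in "_search", and the iteration count equals the
    number of entries in the parsed trajectory list.
    """
    trajectory = json.loads(example["trajectory"])  # stored as a JSON string
    counts = tool_name_counts(trajectory)
    total_calls = sum(counts.values())
    searches = sum(n for name, n in counts.items() if name.endswith("_search"))
    iterations = len(trajectory)
    return {
        "iterations": iterations,
        "tool_calls": total_calls,
        "tool_calls_per_iteration": total_calls / iterations if iterations else 0.0,
        "searches": searches,
        "report_words": len(example["report"].split()),
    }


def average(rows: list[dict[str, float]], key: str) -> float:
    """Mean of one summary field over many trajectories (0.0 for no rows)."""
    return sum(row[key] for row in rows) / len(rows) if rows else 0.0
```

To apply it to the released data, load `dataset` as in Getting Started. Then call `average([summarize_example(ex) for ex in dataset['train']], "searches")`, or pass another key. The results may not match the table exactly, because the card does not say how the authors aggregated. For example, it does not state whether tool calls per iteration are averaged within each trajectory or pooled across all iterations.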
### Citation

If you use our code or dataset in your work, please cite our paper:

```bibtex
@article{prabhakar2025enterprisedeepresearch,
  title={Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics},
  author={Prabhakar, Akshara and Ram, Roshan and Chen, Zixiang and Savarese, Silvio and Wang, Frank and Xiong, Caiming and Wang, Huan and Yao, Weiran},
  journal={arXiv preprint arXiv:2510.17797},
  year={2025}
}
```

Provider:

maas

Created:

2025-10-14
