HERB

Source: ModelScope community (魔搭社区). Updated 2025-08-15; indexed 2025-08-16.
Download link:
https://modelscope.cn/datasets/Salesforce/HERB
Description:
# Dataset Card for HERB

## Dataset Description

HERB is a benchmark for evaluating LLM agents’ ability to perform **Deep Search** and **Long Context Reasoning**. It is generated using a synthetic data pipeline that simulates business workflows across product planning, development, and support stages, generating interconnected content with realistic noise and multi-hop questions with guaranteed ground-truth answers.

### Directory Structure

```
data/
├── metadata/
│   ├── customers_data.json
│   ├── salesforce_team.json
│   └── employee.json
└── products/
    ├── TrendForce.json
    ├── ContextForce.json
    ├── CollaborationForce.json
    └── ... (other product files)
```

### Contents

#### 1. `metadata/`

This folder contains supporting data about the employees and customers involved in products.

- **customers_data.json**
  Contains a list of customer profiles, each with fields such as `name`, `role`, `company`, and a unique `id` (e.g., `CUST-0001`).
- **salesforce_team.json**
  Describes the organizational structure of the Salesforce team, including VPs, engineering leads, engineers, and QA specialists. The structure is hierarchical, with each leader listing their direct reports and their roles.
- **employee.json**
  A mapping of employee IDs to detailed employee profiles, including `employee_id`, `name`, `role`, `location`, and `org`. This file is used to resolve references in other files (such as team or product assignments).

#### 2. `products/`

This folder contains data for each product in SynthEKG/HERB. Each product has its own JSON file, named after the product (e.g., `TrendForce.json`).

> **RAG Evaluation Note:** For RAG evaluations, do not use the `team` and `customers` fields directly to answer questions. These fields are provided for oracle/long-context evaluation settings only. For RAG evaluations, this information should be inferred either from other artifacts (e.g., Slack messages) or from `metadata/*`.

Each product file typically contains:

- **team**: List of employee IDs (`eid_...`) who are part of the product team.
- **customers**: List of customer IDs (`CUST-...`) associated with the product.
- **artifacts**: Arrays of Slack messages, meeting transcripts, meeting chats, documents, URLs, pull requests, answerable questions, and unanswerable questions related to the product.

Example structure from `TrendForce.json`:

```json
{
  "team": ["eid_792d7501", "eid_82e9fcef", ...],
  "customers": ["CUST-0010", "CUST-0075", ...],
  "slack": [
    {
      "sender": "eid_36319f22",
      "message": "Hi team, I just wanted to kick off our discussion...",
      "timestamp": "2026-03-12T08:24:00",
      "id": "20260312-0-df79b"
    },
    ...
  ],
  .....
}
```

## Paper Information

- Paper: https://arxiv.org/abs/2506.23139
- Code: https://github.com/SalesforceAIResearch/HERB

## Citation

```bibtex
@article{choubey2025benchmarkingdeepsearchheterogeneous,
  title={Benchmarking Deep Search over Heterogeneous Enterprise Data},
  author={Prafulla Kumar Choubey and Xiangyu Peng and Shilpa Bhagavath and Kung-Hsiang Huang and Caiming Xiong and Chien-Sheng Wu},
  year={2025},
  eprint={2506.23139},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.23139},
}
```

## Ethical Considerations

This dataset was generated using GPT-4o and should not be used to develop models that compete with OpenAI. This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.
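
## Usage Sketch

The product-file layout and the RAG evaluation note in the card above can be illustrated with a short Python sketch. It is not part of the official HERB code. It assumes that `employee.json` is keyed by the same `eid_...` strings that appear in the `sender` field, and the helper names (`load_json`, `strip_oracle_fields`, `resolve_senders`) are hypothetical. The employee profile in the demo uses placeholder values, not data from the dataset.

```python
import json
from pathlib import Path

# Fields the card says to withhold in RAG evaluations (oracle-only).
ORACLE_FIELDS = ("team", "customers")


def load_json(path):
    """Read one JSON file, e.g. a product file or metadata/employee.json."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def strip_oracle_fields(product):
    """Return a shallow copy of a product record without oracle-only fields."""
    return {k: v for k, v in product.items() if k not in ORACLE_FIELDS}


def resolve_senders(product, employees):
    """Attach a sender name to each Slack message when the ID is known.

    Assumption: `employees` maps employee IDs to profiles with a `name` field.
    Unknown IDs yield `sender_name` = None instead of raising.
    """
    resolved = []
    for msg in product.get("slack", []):
        profile = employees.get(msg["sender"], {})
        resolved.append({**msg, "sender_name": profile.get("name")})
    return resolved


# In-memory stand-in shaped like the TrendForce.json excerpt in the card.
product = {
    "team": ["eid_792d7501", "eid_82e9fcef"],
    "customers": ["CUST-0010", "CUST-0075"],
    "slack": [
        {
            "sender": "eid_36319f22",
            "message": "Hi team, I just wanted to kick off our discussion...",
            "timestamp": "2026-03-12T08:24:00",
            "id": "20260312-0-df79b",
        }
    ],
}

# Placeholder profile (invented values) with the fields the card lists.
employees = {
    "eid_36319f22": {
        "employee_id": "eid_36319f22",
        "name": "Example Employee",
        "role": "Engineer",
        "location": "Remote",
        "org": "example-org",
    }
}

rag_view = strip_oracle_fields(product)
messages = resolve_senders(rag_view, employees)
```

With the real files downloaded, the stand-ins would be replaced by something like `load_json("data/products/TrendForce.json")` and `load_json("data/metadata/employee.json")`, following the directory structure in the card. Copying the record rather than deleting keys in place keeps the full version available for oracle/long-context runs.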

Provider:
maas
Created:
2025-08-15