perplexity-ai/draco
Source: Hugging Face | Updated: 2026-02-20 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/perplexity-ai/draco
Description:
The DRACO Benchmark is a cross-domain benchmark for evaluating deep research systems on accuracy, completeness, and objectivity. It consists of complex, open-ended research tasks, each paired with an expert-curated rubric. Tasks span 10 domains and require drawing on information sources from 40 countries. Each rubric is task-specific and contains about 40 evaluation criteria on average, spread across four axes: factual accuracy, breadth and depth of analysis, presentation quality, and citation quality. Tasks originate from actual user queries on Perplexity Deep Research, which were systematically reformulated, augmented, and filtered to ensure rigor. Rubrics were created and validated by 26 domain experts through a multi-stage iterative review process and task-level saturation testing. The dataset is a single JSONL file (`test.jsonl`) with 100 entries, each containing a task ID, domain, problem, and answer (i.e., the rubric).
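Since the dataset ships as one JSONL file, a minimal sketch of reading it with the Python standard library may be useful. It assumes `test.jsonl` has already been downloaded locally, and it assumes the domain is stored under a field named `domain`; the description lists the fields but not their exact key names, so check one record before relying on that key. The helper names `load_jsonl` and `count_by_domain` are hypothetical, chosen for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def count_by_domain(records, domain_key="domain"):
    """Count tasks per domain.

    `domain_key` is an assumed field name; records that lack it are
    grouped under "<missing>" instead of raising an error.
    """
    return Counter(record.get(domain_key, "<missing>") for record in records)
```

Calling `load_jsonl("test.jsonl")` should return 100 records if the file matches the description, and `count_by_domain` on that list gives a quick check of how the tasks are distributed across the 10 domains.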
Provider:
perplexity-ai



