search-dataset
AI Search Providers Benchmark: Dataset Overview
Dataset Source
- Source: Talc AI SearchBench repository
- File location: dataset/data.jsonl
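Dataset Structure
Each record contains the following fields:
- Question: the query sent to the AI search provider.
- Result: the summary text returned by the provider.
- Search Results: the web page links, titles, and descriptions returned by the provider.
- Response Time: the time taken to receive the provider's response, in milliseconds.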
Example Response
```json
{
  "id": "c0683ac6-baee-4e2a-9290-8b734b777301",
  "question": "What did safety reviews conclude about the danger of experiments at the Large Hadron Collider?",
  "result": "Safety reviews have consistently concluded that the experiments at the Large Hadron Collider pose no significant risk to the public or the environment.",
  "search_results": [
    {
      "title": "CERNs Safety Assessment",
      "url": "https://home.cern/science/experiments/safety",
      "description": "An overview of the safety measures and assessments conducted by CERN regarding the LHC experiments."
    },
    {
      "title": "LHC Safety: Public Concerns Addressed",
      "url": "https://www.scientificamerican.com/article/lhc-safety-public-concerns/",
      "description": "This article addresses public concerns about the safety of the LHC and explains why these fears are unfounded."
    }
  ],
  "response_time": 10
}
```
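A record of this shape can be read with the standard library alone. The following is a minimal sketch, assuming that every line of dataset/data.jsonl is one JSON object with the same keys as the example response (id, question, result, search_results, response_time). The class and function names (SearchResult, Record, parse_record, load_records) are illustrative and do not come from the SearchBench repository.

```python
import json
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class SearchResult:
    title: str
    url: str
    description: str


@dataclass
class Record:
    id: str
    question: str
    result: str
    search_results: List[SearchResult]
    response_time: float  # milliseconds, per the field description


def parse_record(line: str) -> Record:
    """Parse one JSON line into a Record, using the keys seen in the example."""
    raw = json.loads(line)
    results = [
        SearchResult(
            title=item.get("title", ""),
            url=item.get("url", ""),
            description=item.get("description", ""),
        )
        for item in raw.get("search_results", [])
    ]
    return Record(
        id=raw["id"],
        question=raw["question"],
        result=raw["result"],
        search_results=results,
        response_time=raw["response_time"],
    )


def load_records(path: str = "dataset/data.jsonl") -> Iterator[Record]:
    """Yield one Record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line)
```

Iterating over load_records() streams the file line by line, so the whole dataset never has to be held in memory at once.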
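Dataset Categories
The questions fall into four categories:
- Simple: basic questions that require minimal analysis.
- Complex: questions that require synthesizing information across multiple sources.
- Hallucination Inducing: questions built on a false premise, used to test the AI's factual accuracy.
- News: questions whose answers change with recent developments.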
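The four categories can be modelled as an enumeration when tallying questions. This is only a sketch: the overview does not say where the category label is stored, so the code assumes the labels arrive as plain strings spelled exactly like the category names above. Category and count_by_category are hypothetical names.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Category(Enum):
    SIMPLE = "Simple"
    COMPLEX = "Complex"
    HALLUCINATION_INDUCING = "Hallucination Inducing"
    NEWS = "News"


def count_by_category(labels: Iterable[str]) -> Dict[Category, int]:
    """Count questions per category; an unknown label raises ValueError."""
    counts = Counter(Category(label) for label in labels)
    # Report every category, including those with zero questions.
    return {category: counts.get(category, 0) for category in Category}
```

Rejecting unknown labels rather than silently dropping them makes a misspelled category visible as soon as the data is tallied.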
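Data Collection Process
- Data collection: a simple scraping script extracts data from each AI search provider.
- Storage location: the scraped data is stored in the results directory.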
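The collection step could look roughly like the sketch below. It is not the repository's actual scraping script: the Provider callable, the UUID-based id, and the choice of one JSON Lines file inside the results directory are all assumptions made for illustration. It shows how a response time in milliseconds might be measured around a provider call and how a record might be appended to disk.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical provider interface: takes a question and returns
# (summary text, list of search results with title/url/description).
Provider = Callable[[str], Tuple[str, List[Dict[str, str]]]]


def collect(question: str, provider: Provider) -> dict:
    """Query one provider and time the call in milliseconds."""
    start = time.perf_counter()
    result, search_results = provider(question)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "id": str(uuid.uuid4()),  # assumption: ids are random UUIDs
        "question": question,
        "result": result,
        "search_results": search_results,
        "response_time": round(elapsed_ms),
    }


def append_record(record: dict, out_file: Path) -> None:
    """Append one record as a JSON line, creating the directory if needed."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    with out_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

time.perf_counter is used instead of time.time because it is monotonic, so the measured duration is not affected by system clock adjustments.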
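Future Directions
- Update plan: the benchmark will be updated regularly to reflect the latest advances in AI search technology.
- User feedback: feedback from users is welcome, to make the benchmark more useful and user-oriented.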




