
II-Search-Benchmark-Details

ModelScope community · Updated 2025-12-05 · Indexed 2025-08-09
Download link:
https://modelscope.cn/datasets/Intelligent-Internet/II-Search-Benchmark-Details
Resource overview:
# Inspect-Search-Models-Benchmarking-Result

## Overall result (pass rate, %)

| Benchmark | Qwen 4B | Jan 4B | WebSailor-3B | II-Search-4B | II-Search-CIR-4B |
| --- | --- | --- | --- | --- | --- |
| OpenAI/SimpleQA | 76.8 | 80.1 | 81.8 | 91.8 | 91.8 |
| Google/Frames | 30.7 | 24.8 | 34.0 | 67.5 | 72.2 |
| Seal_0 | 6.31 | 2.7 | 1.8 | 22.5 | 26.4 |

## Simple QA (SerpDev)

| Metric | Qwen 4B | Jan 4B | WebSailor-3B | II-Search-4B | II-Search-CIR-4B |
| --- | --- | --- | --- | --- | --- |
| Pass rate % | 76.8 | 80.1 | 81.8 | 91.8 | 91.8 |
| # Search | 1.0 | 0.9 | 2.1 | 2.2 | 2.5 |
| # Visit | 0.1 | 1.9 | 6.4 | 3.5 | 5.3 |
| # Tool used | 1.1 | 2.8 | 8.5 | 5.7 | 7.8 |

## Frames (SerpDev)

| Metric | Qwen 4B | Jan 4B | WebSailor-3B | II-Search-4B | II-Search-CIR-4B |
| --- | --- | --- | --- | --- | --- |
| Pass rate % | 30.7 | 24.8 | 34.0 | 67.5 | 72.2 |
| # Search | 1.1 | 1.0 | 7.4 | 4.2 | 6.1 |
| # Visit | 0.1 | 3.7 | 7.2 | 3.2 | 5.0 |
| # Tool used | 1.2 | 4.7 | 14.6 | 7.4 | 11.1 |

## Seal_0 (SerpDev)

| Metric | Qwen 4B | Jan 4B | WebSailor-3B | II-Search-4B | II-Search-CIR-4B |
| --- | --- | --- | --- | --- | --- |
| Pass rate % | 6.31 | 2.7 | 1.8 | 22.5 | 26.4 |
| # Search | 0.9 | 0.9 | 6.6 | 4.3 | 5.9 |
| # Visit | 0.1 | 5.2 | 10.0 | 5.7 | 7.7 |
| # Tool used | 1.0 | 6.1 | 16.6 | 10.0 | 13.5 |
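The tables above do not say how "# Tool used" relates to the other two counts. A minimal sketch, assuming it is meant to be the sum of "# Search" and "# Visit", can check that reading against the reported numbers. The dictionary layout and the function name `max_tool_gap` are illustrative only and are not part of the dataset.

```python
# Per-model averages copied from the SerpDev tables above, in column order:
# Qwen 4B, Jan 4B, WebSailor-3B, II-Search-4B, II-Search-CIR-4B.
RESULTS = {
    "SimpleQA": {
        "search": [1.0, 0.9, 2.1, 2.2, 2.5],
        "visit": [0.1, 1.9, 6.4, 3.5, 5.3],
        "tool_used": [1.1, 2.8, 8.5, 5.7, 7.8],
    },
    "Frames": {
        "search": [1.1, 1.0, 7.4, 4.2, 6.1],
        "visit": [0.1, 3.7, 7.2, 3.2, 5.0],
        "tool_used": [1.2, 4.7, 14.6, 7.4, 11.1],
    },
    "Seal_0": {
        "search": [0.9, 0.9, 6.6, 4.3, 5.9],
        "visit": [0.1, 5.2, 10.0, 5.7, 7.7],
        "tool_used": [1.0, 6.1, 16.6, 10.0, 13.5],
    },
}


def max_tool_gap(results):
    """Return the largest |search + visit - tool_used| over all table cells.

    Assumption (not stated in the source): "# Tool used" is the sum of
    "# Search" and "# Visit". Gaps are rounded to one decimal place, the
    precision of the published figures.
    """
    gaps = []
    for rows in results.values():
        for s, v, t in zip(rows["search"], rows["visit"], rows["tool_used"]):
            gaps.append(abs(round(s + v - t, 1)))
    return max(gaps)


if __name__ == "__main__":
    print(max_tool_gap(RESULTS))  # prints 0.1
```

Under that assumption every cell agrees to within 0.1. The only nonzero gap is II-Search-CIR-4B on Seal_0 (5.9 + 7.7 versus 13.5), which is consistent with each average having been rounded independently.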

Provider:
maas
Created:
2025-08-06