
autohub-benchmark

ModelScope community · Updated 2025-12-05 · Indexed 2025-07-19
Download link:
https://modelscope.cn/datasets/opencsg/autohub-benchmark
Description:
# autohub-benchmark

This project defines common usage scenarios for web-based code, model, and dataset hosting platforms, and provides the corresponding prompts and ground truth. These resources can be used to evaluate the localization performance of visual language models (VLMs) in specialized scenarios.

## Model Hosting Platform GUI Inference

| Model     | Platform     | Accuracy (%) | Error (%) | Invalid (%) | Completion Rate (%) |
|-----------|--------------|--------------|-----------|-------------|---------------------|
| AriaUI    | Huggingface  | 70.8         | 12.5      | 6.7         | 100.0               |
|           | ModelScope   | 57.6         | 14.2      | 28.2        | 100.0               |
|           | OpenCSG      | 81.0         | 9.5       | 9.5         | 100.0               |
| CogAgent  | Huggingface  | 73.3         | 26.7      | 0.0         | 100.0               |
|           | ModelScope   | 57.9         | 29.1      | 13.0        | 96.3                |
|           | OpenCSG      | 57.1         | 19.0      | 23.8        | 100.0               |
| Qwen3B    | Huggingface  | 8.3          | 15.8      | 19.2        | 41.7                |
|           | ModelScope   | 0.0          | 28.6      | 20.6        | 49.2                |
|           | OpenCSG      | 4.8          | 4.8       | 9.5         | 19.0                |
| Qwen7B    | Huggingface  | 73.3         | 11.7      | 10.8        | 95.8                |
|           | ModelScope   | 55.5         | 30.2      | 8.5         | 95.2                |
|           | OpenCSG      | 71.4         | 14.3      | 14.3        | 100.0               |
| SeeClick  | Huggingface  | 39.2         | 36.7      | 24.2        | 100.0               |
|           | ModelScope   | 52.4         | 29.0      | 18.6        | 100.0               |
|           | OpenCSG      | 52.4         | 14.3      | 33.3        | 100.0               |
| ShowUI    | Huggingface  | 30.0         | 45.0      | 11.7        | 86.7                |
|           | ModelScope   | 43.3         | 26.7      | 14.3        | 88.9                |
|           | OpenCSG      | 23.8         | 52.4      | 9.5         | 85.7                |

Summary:

| Model     | Accuracy (%) | Error (%) | Invalid (%) | Completion Rate (%) |
|-----------|--------------|-----------|-------------|---------------------|
| AriaUI    | 67.7         | 19.3      | 11.4        | 100.0               |
| CogAgent  | 63.3         | 34.8      | 3.0         | 98.7                |
| Qwen3B    | 4.5          | 10.8      | 12.9        | 62.6                |
| Qwen7B    | 66.9         | 18.8      | 10.1        | 100.0               |
| SeeClick  | 45.6         | 26.2      | 26.8        | 97.9                |
| ShowUI    | 32.1         | 42.8      | 11.6        | 85.0                |

## Code Hosting Platform GUI Inference

| Model     | Platform     | Accuracy (%) | Error (%) | Invalid (%) | Completion Rate (%) |
|-----------|--------------|--------------|-----------|-------------|---------------------|
| AriaUI    | GitCode      | 57.1         | 28.5      | 14.3        | 100.0               |
|           | Gitea        | 71.4         | 28.5      | 0.0         | 100.0               |
|           | Gitee        | 57.1         | 28.5      | 14.3        | 100.0               |
|           | GitHub       | 71.4         | 14.3      | 14.3        | 100.0               |
|           | GitLab       | 71.4         | 14.3      | 14.3        | 100.0               |
| CogAgent  | GitCode      | 71.4         | 28.5      | 0.0         | 100.0               |
|           | Gitea        | 71.4         | 28.5      | 0.0         | 100.0               |
|           | Gitee        | 100.0        | 0.0       | 0.0         | 100.0               |
|           | GitHub       | 57.1         | 42.8      | 0.0         | 100.0               |
|           | GitLab       | 85.7         | 14.3      | 0.0         | 100.0               |
| Qwen3B    | GitCode      | 14.2         | 28.5      | 42.8        | 85.7                |
|           | Gitea        | 14.2         | 57.1      | 14.2        | 85.7                |
|           | Gitee        | 14.2         | 42.8      | 28.5        | 100.0               |
|           | GitHub       | 0.0          | 28.5      | 57.1        | 85.7                |
|           | GitLab       | 14.2         | 28.5      | 28.5        | 71.4                |
| Qwen7B    | GitCode      | 71.4         | 0.0       | 28.5        | 100.0               |
|           | Gitea        | 57.1         | 28.5      | 14.2        | 100.0               |
|           | Gitee        | 28.5         | 57.1      | 14.2        | 100.0               |
|           | GitHub       | 0.0          | 14.2      | 85.7        | 100.0               |
|           | GitLab       | 85.7         | 14.2      | 0.0         | 100.0               |
| SeeClick  | GitCode      | 28.5         | 48.5      | 28.5        | 100.0               |
|           | Gitea        | 28.5         | 28.5      | 48.5        | 100.0               |
|           | Gitee        | 28.5         | 57.1      | 14.2        | 100.0               |
|           | GitHub       | 14.2         | 57.1      | 28.5        | 100.0               |
|           | GitLab       | 0.0          | 71.4      | 28.5        | 100.0               |
| ShowUI    | GitCode      | 28.5         | 48.5      | 14.2        | 85.7                |
|           | Gitea        | 57.1         | 48.5      | 0.0         | 100.0               |
|           | Gitee        | 57.1         | 28.5      | 0.0         | 85.7                |
|           | GitHub       | 48.5         | 14.2      | 28.5        | 85.7                |
|           | GitLab       | 48.5         | 14.2      | 14.2        | 71.4                |

Summary:

| Model     | Accuracy (%) | Error (%) | Invalid (%) | Completion Rate (%) |
|-----------|--------------|-----------|-------------|---------------------|
| AriaUI    | 65.7         | 22.8      | 11.4        | 100.0               |
| CogAgent  | 62.9         | 22.8      | 0.0         | 100.0               |
| Qwen3B    | 11.4         | 37.1      | 37.1        | 85.7                |
| Qwen7B    | 48.5         | 22.9      | 28.6        | 100.0               |
| SeeClick  | 20.0         | 51.4      | 28.6        | 100.0               |
| ShowUI    | 45.7         | 28.6      | 11.4        | 85.7                |
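
The README does not say how the four reported columns are computed. As one plausible reading, the sketch below scores a GUI localization task by checking whether a model's predicted click point falls inside a ground-truth bounding box. Everything in it is an assumption made for illustration: the `Box` type, the `(x, y)` answer format, the four outcome labels, and the use of the total task count as the denominator are hypothetical and are not taken from the benchmark's own evaluation code.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical ground-truth region of the target GUI element, in pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


# Assumed answer format: two numbers separated by a comma, e.g. "(120, 48)".
_POINT = re.compile(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")


def parse_point(raw: str) -> Optional[tuple[float, float]]:
    """Extract the first "x, y" pair from the model output, if any."""
    match = _POINT.search(raw)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def classify(raw: Optional[str], target: Box) -> str:
    """Label one task as correct, error, invalid, or incomplete (assumed labels)."""
    if raw is None:              # the model produced no output for this task
        return "incomplete"
    point = parse_point(raw)
    if point is None:            # output exists but holds no usable coordinates
        return "invalid"
    return "correct" if target.contains(*point) else "error"


def summarize(labels: list[str]) -> dict[str, float]:
    """Turn per-task labels into percentages over all tasks."""
    total = len(labels)
    if total == 0:
        raise ValueError("no tasks to summarize")

    def pct(count: int) -> float:
        return round(100.0 * count / total, 1)

    return {
        "accuracy": pct(labels.count("correct")),
        "error": pct(labels.count("error")),
        "invalid": pct(labels.count("invalid")),
        "completion_rate": pct(total - labels.count("incomplete")),
    }


if __name__ == "__main__":
    target = Box(left=100, top=40, right=180, bottom=60)
    outputs = ["(120, 48)", "(10, 10)", "I cannot find it", None]
    print(summarize([classify(raw, target) for raw in outputs]))
```

Under these assumptions the first three columns always add up to the completion rate. Several rows in the tables above do not, so the benchmark's actual definitions likely differ from this sketch.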

Provider: maas
Created: 2025-07-15