autohub-benchmark
Source: ModelScope community. Updated 2025-12-05. Indexed 2025-07-19.
Download link:
https://modelscope.cn/datasets/opencsg/autohub-benchmark
Description:
# autohub-benchmark
This project designs common use scenarios for web-based code, model, and dataset hosting platforms, and provides corresponding prompts and ground truth. These resources can be used to evaluate the localization performance of visual language models (VLMs) in specialized scenarios.
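The README does not define how a prediction is scored, so the following is only a minimal sketch of one plausible scheme. It assumes each ground-truth entry is a bounding box, each model reply is free text that may contain an `(x, y)` click point, a point inside the box counts as correct, a point outside counts as an error, and a reply with no parseable point counts as invalid. The names `Box`, `parse_point`, `classify`, and `rates` are hypothetical and not part of the benchmark. Completion rate is left out because the source gives no basis for guessing its definition.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical ground-truth region in pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


# Matches the first "x, y" pair of numbers in a reply, with optional parentheses.
_POINT = re.compile(r"\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?")


def parse_point(reply: str) -> Optional[Tuple[float, float]]:
    """Return the first (x, y) pair found in the reply, or None if there is none."""
    match = _POINT.search(reply)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def classify(reply: str, target: Box) -> str:
    """Label one reply as 'correct', 'error', or 'invalid' (assumed definitions)."""
    point = parse_point(reply)
    if point is None:
        return "invalid"
    return "correct" if target.contains(*point) else "error"


def rates(labels: Iterable[str]) -> dict:
    """Turn a sequence of labels into percentages rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        name: round(100.0 * counts[name] / total, 1)
        for name in ("correct", "error", "invalid")
    }


if __name__ == "__main__":
    target = Box(0, 0, 20, 20)
    replies = ["click (10, 10)", "(30, 10)", "I cannot find the button", "5, 5"]
    print(rates(classify(r, target) for r in replies))
```

Keeping parsing separate from scoring means a different output format (for example normalized coordinates) would only require changing `parse_point`.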
## Model Hosting Platform GUI Inference
| Model | Platform | Accuracy (%) | Error (%) | Invalid (%) | Completion Rate (%) |
|-----------|--------------|--------------|-----------|-------------|---------------------|
| AriaUI | Huggingface | 70.8 | 12.5 | 6.7 | 100.0 |
| | ModelScope | 57.6 | 14.2 | 28.2 | 100.0 |
| | OpenCSG | 81.0 | 9.5 | 9.5 | 100.0 |
| CogAgent | Huggingface | 73.3 | 26.7 | 0.0 | 100.0 |
| | ModelScope | 57.9 | 29.1 | 13.0 | 96.3 |
| | OpenCSG | 57.1 | 19.0 | 23.8 | 100.0 |
| Qwen3B | Huggingface | 8.3 | 15.8 | 19.2 | 41.7 |
| | ModelScope | 0.0 | 28.6 | 20.6 | 49.2 |
| | OpenCSG | 4.8 | 4.8 | 9.5 | 19.0 |
| Qwen7B | Huggingface | 73.3 | 11.7 | 10.8 | 95.8 |
| | ModelScope | 55.5 | 30.2 | 8.5 | 95.2 |
| | OpenCSG | 71.4 | 14.3 | 14.3 | 100.0 |
| SeeClick | Huggingface | 39.2 | 36.7 | 24.2 | 100.0 |
| | ModelScope | 52.4 | 29.0 | 18.6 | 100.0 |
| | OpenCSG | 52.4 | 14.3 | 33.3 | 100.0 |
| ShowUI | Huggingface | 30.0 | 45.0 | 11.7 | 86.7 |
| | ModelScope | 43.3 | 26.7 | 14.3 | 88.9 |
| | OpenCSG | 23.8 | 52.4 | 9.5 | 85.7 |
Summary:
| Model | Accuracy (%) | Error (%) | Invalid (%) | Completion Rate (%) |
|-----------|--------------|-----------|-------------|---------------------|
| AriaUI | 67.7 | 19.3 | 11.4 | 100.0 |
| CogAgent | 63.3 | 34.8 | 3.0 | 98.7 |
| Qwen3B | 4.5 | 10.8 | 12.9 | 62.6 |
| Qwen7B | 66.9 | 18.8 | 10.1 | 100.0 |
| SeeClick | 45.6 | 26.2 | 26.8 | 97.9 |
| ShowUI | 32.1 | 42.8 | 11.6 | 85.0 |
## Code Hosting Platform GUI Inference
| Model | Platform | Accuracy (%) | Error (%) | Invalid (%) | Completion Rate (%) |
|-----------|--------------|--------------|-----------|-------------|---------------------|
| AriaUI | GitCode | 57.1 | 28.5 | 14.3 | 100.0 |
| | Gitea | 71.4 | 28.5 | 0.0 | 100.0 |
| | Gitee | 57.1 | 28.5 | 14.3 | 100.0 |
| | Github | 71.4 | 14.3 | 14.3 | 100.0 |
| | GitLab | 71.4 | 14.3 | 14.3 | 100.0 |
| CogAgent | GitCode | 71.4 | 28.5 | 0.0 | 100.0 |
| | Gitea | 71.4 | 28.5 | 0.0 | 100.0 |
| | Gitee | 100.0 | 0.0 | 0.0 | 100.0 |
| | Github | 57.1 | 42.8 | 0.0 | 100.0 |
| | GitLab | 85.7 | 14.3 | 0.0 | 100.0 |
| Qwen3B | GitCode | 14.2 | 28.5 | 42.8 | 85.7 |
| | Gitea | 14.2 | 57.1 | 14.2 | 85.7 |
| | Gitee | 14.2 | 42.8 | 28.5 | 100.0 |
| | Github | 0.0 | 28.5 | 57.1 | 85.7 |
| | GitLab | 14.2 | 28.5 | 28.5 | 71.4 |
| Qwen7B | GitCode | 71.4 | 0.0 | 28.5 | 100.0 |
| | Gitea | 57.1 | 28.5 | 14.2 | 100.0 |
| | Gitee | 28.5 | 57.1 | 14.2 | 100.0 |
| | Github | 0.0 | 14.2 | 85.7 | 100.0 |
| | GitLab | 85.7 | 14.2 | 0.0 | 100.0 |
| SeeClick | GitCode | 28.5 | 48.5 | 28.5 | 100.0 |
| | Gitea | 28.5 | 28.5 | 48.5 | 100.0 |
| | Gitee | 28.5 | 57.1 | 14.2 | 100.0 |
| | Github | 14.2 | 57.1 | 28.5 | 100.0 |
| | GitLab | 0.0 | 71.4 | 28.5 | 100.0 |
| ShowUI | GitCode | 28.5 | 48.5 | 14.2 | 85.7 |
| | Gitea | 57.1 | 48.5 | 0.0 | 100.0 |
| | Gitee | 57.1 | 28.5 | 0.0 | 85.7 |
| | Github | 48.5 | 14.2 | 28.5 | 85.7 |
| | GitLab | 48.5 | 14.2 | 14.2 | 71.4 |
Summary:
| Model | Accuracy (%) | Error (%) | Invalid (%) | Completion Rate (%) |
|-----------|--------------|-----------|-------------|---------------------|
| AriaUI | 65.7 | 22.8 | 11.4 | 100.0 |
| CogAgent | 62.9 | 22.8 | 0.0 | 100.0 |
| Qwen3B | 11.4 | 37.1 | 37.1 | 85.7 |
| Qwen7B | 48.5 | 22.9 | 28.6 | 100.0 |
| SeeClick | 20.0 | 51.4 | 28.6 | 100.0 |
| ShowUI | 45.7 | 28.6 | 11.4 | 85.7 |
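For the code hosting results, the AriaUI summary row matches the unweighted mean of its five per-platform rows. The sketch below reproduces that check. The helper name `unweighted_summary` is illustrative, and it is an assumption that the summaries were produced this way; the AriaUI model hosting summary, for instance, does not equal a simple mean of its three platforms, so those figures may be weighted by task count.

```python
from statistics import mean

# AriaUI rows copied from the code hosting table above:
# platform -> (accuracy, error, invalid, completion rate), all in percent.
ARIAUI_ROWS = {
    "GitCode": (57.1, 28.5, 14.3, 100.0),
    "Gitea": (71.4, 28.5, 0.0, 100.0),
    "Gitee": (57.1, 28.5, 14.3, 100.0),
    "Github": (71.4, 14.3, 14.3, 100.0),
    "GitLab": (71.4, 14.3, 14.3, 100.0),
}


def unweighted_summary(rows):
    """Average each metric column across platforms, rounded to one decimal."""
    columns = zip(*rows.values())
    return tuple(round(mean(column), 1) for column in columns)


if __name__ == "__main__":
    # Compare with the AriaUI summary row: 65.7, 22.8, 11.4, 100.0
    print(unweighted_summary(ARIAUI_ROWS))
```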
Provider:
maas
Created:
2025-07-15



