ubuntu_osworld_verified_trajs
Source: ModelScope community (updated 2026-01-06, indexed 2025-08-23)
Download link:
https://modelscope.cn/datasets/xlangai/ubuntu_osworld_verified_trajs
Resource description:
# OSWorld-Verified Model Trajectories
This repository contains trajectory results from various AI models evaluated on the OSWorld benchmark - a comprehensive evaluation environment for multimodal agents in real computer environments.
## Dataset Overview
This dataset includes evaluation trajectories and results from multiple state-of-the-art models tested on OSWorld tasks.
## File Structure
Each zip file contains complete evaluation trajectories including:
- Screenshots and action sequences
- Model reasoning traces
- Task completion results
- Performance metrics
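As a minimal sketch of how one might inspect an archive before extracting it, the snippet below tallies the members of a zip file by extension using Python's standard `zipfile` module. The internal layout of the zips is not documented here, so the function name `summarize_archive` and the idea of grouping by extension are illustrative assumptions, meant only as a first look at what a given file contains.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_source):
    """Count archive members by file extension, skipping directory entries.

    `zip_source` may be a filesystem path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Zip member names always use forward slashes.
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return dict(counts)
```

Calling `summarize_archive("some_model_run.zip")` (a hypothetical file name) returns a dictionary mapping each extension to the number of files with that extension.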
## Evaluation Settings
Models were evaluated across different step limits:
- **15 steps** - Quick evaluation
- **50 steps** - Standard evaluation
- **100 steps** - Extended evaluation
Results from multiple runs are included as well.
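When comparing step limits across repeated runs, one reasonable approach is to average within each run first and then across runs. The sketch below assumes the scores have already been loaded into flat `(model, max_steps, run_id, score)` tuples. That record format, the function name, and the assumption that scores lie in [0, 1] are all hypothetical and not something this dataset prescribes.

```python
from collections import defaultdict
from statistics import mean


def success_by_step_limit(records):
    """Average per-run success rates for each (model, step limit) pair.

    `records` is an iterable of (model, max_steps, run_id, score) tuples,
    a hypothetical flat format; scores are assumed to lie in [0, 1].
    """
    per_run = defaultdict(list)
    for model, max_steps, run_id, score in records:
        per_run[(model, max_steps, run_id)].append(score)

    # Average within each run first, then across runs, so that runs with
    # different numbers of scored tasks carry equal weight.
    per_setting = defaultdict(list)
    for (model, max_steps, _run_id), scores in per_run.items():
        per_setting[(model, max_steps)].append(mean(scores))

    return {key: mean(run_means) for key, run_means in sorted(per_setting.items())}
```

Averaging per run before averaging across runs keeps a partially completed run from being under- or over-weighted. Pooling all episodes instead would be an equally defensible choice if every run covers the same task set.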
## Task Domains
The evaluation covers diverse computer tasks including:
- **Office Applications** (LibreOffice Calc/Writer/Impress)
- **Daily Applications** (Chrome, VLC, Thunderbird)
- **Professional Tools** (GIMP, VS Code)
- **Multi-app Workflows**
- **Operating System Tasks**
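A per-domain breakdown is often more informative than a single overall score. The sketch below assumes an extracted archive laid out as `<root>/<domain>/<task_id>/result.txt`, with each `result.txt` holding one numeric score. This layout is an assumption made for illustration, so the glob pattern may need adjusting to match the actual contents of a given zip.

```python
from collections import defaultdict
from pathlib import Path


def success_rate_by_domain(root):
    """Average the scores found under `<root>/<domain>/<task_id>/result.txt`.

    The directory layout is assumed, not documented; result files whose
    contents cannot be parsed as a number are skipped.
    """
    scores = defaultdict(list)
    for result_file in Path(root).glob("*/*/result.txt"):
        domain = result_file.parent.parent.name
        try:
            value = float(result_file.read_text().strip())
        except ValueError:
            continue
        scores[domain].append(value)
    return {domain: sum(vals) / len(vals) for domain, vals in sorted(scores.items())}
```

Tasks without a readable result file are left out of the average rather than counted as failures. If a missing result should count as a failure, append `0.0` in the `except` branch instead of skipping.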
## Usage
These trajectories can be used for:
- Model performance analysis
- Trajectory visualization and debugging
- Training data for computer use agents (not recommended)
- Benchmark comparison studies
- Research on multimodal agent behaviors
## Maintenance
This dataset is actively maintained and will be continuously updated.
## Citation
If you use this dataset in your research, please cite the OSWorld paper:
```bibtex
@article{osworld_verified,
title = {Introducing OSWorld-Verified},
author = {Tianbao Xie and Mengqi Yuan and Danyang Zhang and Xinzhuang Xiong and Zhennan Shen and Zilong Zhou and Xinyuan Wang and Yanxu Chen and Jiaqi Deng and Junda Chen and Bowen Wang and Haoyuan Wu and Jixuan Chen and Junli Wang and Dunjie Lu and Hao Hu and Tao Yu},
journal = {xlang.ai},
year = {2025},
month = {July},
url = "https://xlang.ai/blog/osworld-verified"
}
```
## Contact
For questions or contributions, please open an issue or contact the OSWorld team.
---
**Last Updated**: August 2025
**Total Models**: 15+ model variants
**Total Trajectories**: 1000+ evaluation episodes
Provider:
maas
Created:
2025-08-18



