
ubuntu_osworld_verified_trajs

ModelScope community · Updated 2026-01-06 · Indexed 2025-08-23
Download link:
https://modelscope.cn/datasets/xlangai/ubuntu_osworld_verified_trajs
Overview:
# OSWorld-Verified Model Trajectories

This repository contains trajectory results from various AI models evaluated on the OSWorld benchmark, a comprehensive evaluation environment for multimodal agents in real computer environments.

## Dataset Overview

This dataset includes evaluation trajectories and results from multiple state-of-the-art models tested on OSWorld tasks.

## File Structure

Each zip file contains complete evaluation trajectories, including:

- Screenshots and action sequences
- Model reasoning traces
- Task completion results
- Performance metrics

## Evaluation Settings

Models were evaluated under different step limits:

- **15 steps** - Quick evaluation
- **50 steps** - Standard evaluation
- **100 steps** - Extended evaluation

The dataset also includes results from multiple runs.

## Task Domains

The evaluation covers diverse computer tasks, including:

- **Office Applications** (LibreOffice Calc/Writer/Impress)
- **Daily Applications** (Chrome, VLC, Thunderbird)
- **Professional Tools** (GIMP, VS Code)
- **Multi-app Workflows**
- **Operating System Tasks**

## Usage

These trajectories can be used for:

- Model performance analysis
- Trajectory visualization and debugging
- Training data for computer use agents (not recommended)
- Benchmark comparison studies
- Research on multimodal agent behaviors

## Maintenance

This dataset is actively maintained and will be continuously updated.

## Citation

If you use this dataset in your research, please cite the OSWorld paper:

```bibtex
@article{osworld_verified,
  title = {Introducing OSWorld-Verified},
  author = {Tianbao Xie and Mengqi Yuan and Danyang Zhang and Xinzhuang Xiong and Zhennan Shen and Zilong Zhou and Xinyuan Wang and Yanxu Chen and Jiaqi Deng and Junda Chen and Bowen Wang and Haoyuan Wu and Jixuan Chen and Junli Wang and Dunjie Lu and Hao Hu and Tao Yu},
  journal = {xlang.ai},
  year = {2025},
  month = {July},
  url = "https://xlang.ai/blog/osworld-verified"
}
```

## Contact

For questions or contributions, please open an issue or contact the OSWorld team.

---

**Last Updated**: August 2025
**Total Models**: 15+ model variants
**Total Trajectories**: 1000+ evaluation episodes
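The README says each zip file bundles screenshots, action sequences, reasoning traces, and results, but it does not document the internal layout or file names. A reasonable first step after downloading an archive is therefore to list what it contains. The sketch below uses only the Python standard library and counts archive members by file extension without extracting anything. The function name `summarize_archive` and the command-line usage are illustrative assumptions, not part of the dataset's tooling.

```python
import sys
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_source):
    """Count archive members by file extension without extracting them.

    zip_source may be a filesystem path or a binary file-like object.
    This is a hypothetical helper; the dataset does not ship it.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            # Zip member names always use forward slashes.
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of a downloaded archive as the first argument.
    for suffix, n in summarize_archive(sys.argv[1]).most_common():
        print(f"{suffix}\t{n}")
```

Run on a downloaded archive, the per-extension counts give a quick sense of how many screenshots and log files an archive holds before committing disk space to a full extraction.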
Provider:
maas
Created:
2025-08-18