Fangjun/RoomSpace

Hugging Face | Updated 2024-06-20 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/Fangjun/RoomSpace
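
A minimal loading sketch follows, assuming the `datasets` library is installed and the repository is reachable. The helper names `dataset_url` and `load_roomspace` are hypothetical. The default configuration name is taken from the configuration list further down this page, and the `revision` argument is left as `None` because the page does not name the branches.

```python
import os

REPO_ID = "Fangjun/RoomSpace"
MIRROR_ENDPOINT = "https://hf-mirror.com"


def dataset_url(repo_id=REPO_ID, endpoint=MIRROR_ENDPOINT):
    """Build the dataset page URL for a given Hub endpoint."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_roomspace(config_name="n3_m2_d144", revision=None, use_mirror=False):
    """Load the test split of one RoomSpace configuration.

    Assumption: `revision` can select one of the size branches mentioned in the
    description; the branch names are not given on this page, so none is guessed.
    """
    if use_mirror:
        # The Hugging Face libraries read HF_ENDPOINT when they are first
        # imported, so this only helps if they have not been imported yet.
        os.environ.setdefault("HF_ENDPOINT", MIRROR_ENDPOINT)
    from datasets import load_dataset  # lazy import, see comment above

    return load_dataset(REPO_ID, config_name, split="test", revision=revision)
```

Calling `load_roomspace(use_mirror=True)` would download the 100-example test split of `n3_m2_d144` through the mirror, provided the network allows it.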
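Resource overview:
The RoomSpace Benchmark is a benchmark dataset for evaluating spatial reasoning. It contains room samples in several versions. Each sample includes images from two viewpoints (a top-down view and a north-facing view) together with associated questions and answers. The questions and answers fall into several types, such as Visual QA, the Layout setting, and the TPP setting, covering different spatial reasoning tasks. The dataset's branch structure reflects different sample counts, such as 100, 1,000, and 10,000 room samples. The dataset aims to provide a realistic simulated benchmark for language models and spatial reasoning tasks.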
Provided by:
Fangjun
Summary of original information

Dataset overview

Dataset configurations

  • config_name: n3_m2_d144

    • features:
      • id: int64
      • image_top_down: image
      • image_north_facing: image
      • question_image: string
      • options_td: sequence: string
      • options_sd: sequence: string
      • answer_image_td: string
      • answer_image_sd: string
      • question_td_layout_yn: string
      • answer_td_layout_yn: string
      • question_td_layout_fr: string
      • answer_td_layout_fr: sequence: string
      • question_td_tpp_yn: string
      • answer_td_tpp_yn: string
      • question_td_tpp_fr: string
      • answer_td_tpp_fr: sequence: string
      • question_td_o2_yn: string
      • answer_td_o2_yn: string
      • question_td_o2_fr: string
      • answer_td_o2_fr: sequence: string
      • question_td_o2_d2_yn: string
      • answer_td_o2_d2_yn: string
      • question_td_o2_d2_fr: string
      • answer_td_o2_d2_fr: sequence: string
      • question_td_o2_d3_yn: string
      • answer_td_o2_d3_yn: string
      • question_td_o2_d3_fr: string
      • answer_td_o2_d3_fr: sequence: string
      • question_td_o2_layout_yn: string
      • answer_td_o2_layout_yn: string
      • question_td_o2_layout_fr: string
      • answer_td_o2_layout_fr: sequence: string
      • question_td_o2_d2_layout_yn: string
      • answer_td_o2_d2_layout_yn: string
      • question_td_o2_d2_layout_fr: string
      • answer_td_o2_d2_layout_fr: sequence: string
      • question_td_o2_d3_layout_yn: string
      • answer_td_o2_d3_layout_yn: string
      • question_td_o2_d3_layout_fr: string
      • answer_td_o2_d3_layout_fr: sequence: string
      • question_sd_layout_yn: string
      • answer_sd_layout_yn: string
      • question_sd_tpp_yn: string
      • answer_sd_tpp_yn: string
      • question_sd_o2_yn: string
      • answer_sd_o2_yn: string
      • question_sd_o2_fr: string
      • answer_sd_o2_fr: sequence: string
      • question_sd_o2_d2_yn: string
      • answer_sd_o2_d2_yn: string
      • question_sd_o2_d2_fr: string
      • answer_sd_o2_d2_fr: sequence: string
      • question_sd_o2_d3_yn: string
      • answer_sd_o2_d3_yn: string
      • question_sd_o2_d3_fr: string
      • answer_sd_o2_d3_fr: sequence: string
      • question_sd_o2_layout_yn: string
      • answer_sd_o2_layout_yn: string
      • question_sd_o2_layout_fr: string
      • answer_sd_o2_layout_fr: sequence: string
      • question_sd_o2_d2_layout_yn: string
      • answer_sd_o2_d2_layout_yn: string
      • question_sd_o2_d2_layout_fr: string
      • answer_sd_o2_d2_layout_fr: sequence: string
      • question_sd_o2_d3_layout_yn: string
      • answer_sd_o2_d3_layout_yn: string
      • question_sd_o2_d3_layout_fr: string
      • answer_sd_o2_d3_layout_fr: sequence: string
    • splits:
      • test:
        • num_bytes: 66958010.0
        • num_examples: 100
    • download_size: 65902762
    • dataset_size: 66958010.0
  • config_name: n5_m4_d144

    • features:
      • id: int64
      • image_top_down: image
      • image_north_facing: image
      • question_image: string
      • options_td: sequence: string
      • options_sd: sequence: string
      • answer_image_td: string
      • answer_image_sd: string
      • question_td_layout_yn: string
      • answer_td_layout_yn: string
      • question_td_layout_fr: string
      • answer_td_layout_fr: sequence: string
      • question_td_tpp_yn: string
      • answer_td_tpp_yn: string
      • question_td_tpp_fr: string
      • answer_td_tpp_fr: sequence: string
      • question_td_o2_yn: string
      • answer_td_o2_yn: string
      • question_td_o2_fr: string
      • answer_td_o2_fr: sequence: string
      • question_td_o2_d2_yn: string
      • answer_td_o2_d2_yn: string
      • question_td_o2_d2_fr: string
      • answer_td_o2_d2_fr: sequence: string
      • question_td_o2_d3_yn: string
      • answer_td_o2_d3_yn: string
      • question_td_o2_d3_fr: string
      • answer_td_o2_d3_fr: sequence: string
      • question_td_o2_layout_yn: string
      • answer_td_o2_layout_yn: string
      • question_td_o2_layout_fr: string
      • answer_td_o2_layout_fr: sequence: string
      • question_td_o2_d2_layout_yn: string
      • answer_td_o2_d2_layout_yn: string
      • question_td_o2_d2_layout_fr: string
      • answer_td_o2_d2_layout_fr: sequence: string
      • question_td_o2_d3_layout_yn: string
      • answer_td_o2_d3_layout_yn: string
      • question_td_o2_d3_layout_fr: string
      • answer_td_o2_d3_layout_fr: sequence: string
      • question_sd_layout_yn: string
      • answer_sd_layout_yn: string
      • question_sd_tpp_yn: string
      • answer_sd_tpp_yn: string
      • question_sd_o2_yn: string
      • answer_sd_o2_yn: string
      • question_sd_o2_fr: string
      • answer_sd_o2_fr: sequence: string
      • question_sd_o2_d2_yn: string
      • answer_sd_o2_d2_yn: string
      • question_sd_o2_d2_fr: string
      • answer_sd_o2_d2_fr: sequence: string
      • question_sd_o2_d3_yn: string
      • answer_sd_o2_d3_yn: string
      • question_sd_o2_d3_fr: string
      • answer_sd_o2_d3_fr: sequence: string
      • question_sd_o2_layout_yn: string
      • answer_sd_o2_layout_yn: string
      • question_sd_o2_layout_fr: string
      • answer_sd_o2_layout_fr: sequence: string
      • question_sd_o2_d2_layout_yn: string
      • answer_sd_o2_d2_layout_yn: string
      • question_sd_o2_d2_layout_fr: string
      • answer_sd_o2_d2_layout_fr: sequence: string
      • question_sd_o2_d3_layout_yn: string
      • answer_sd_o2_d3_layout_yn: string
      • question_sd_o2_d3_layout_fr: string
      • answer_sd_o2_d3_layout_fr: sequence: string
    • splits:
      • test:
        • num_bytes: 67434227.0
        • num_examples: 100
    • download_size: 65991028
    • dataset_size: 67434227.0
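
The feature lists above follow a regular `question_<view>_<setting>_<type>` / `answer_<view>_<setting>_<type>` pattern, with two gaps: the `sd` view has no `fr` columns for the `layout` and `tpp` settings. The sketch below regenerates the column names from that pattern so the pairs can be iterated programmatically. The names `qa_column_pairs` and `expected_columns` are hypothetical helpers, not part of the dataset.

```python
from itertools import product

VIEWS = ("td", "sd")
SETTINGS = ("layout", "tpp", "o2", "o2_d2", "o2_d3",
            "o2_layout", "o2_d2_layout", "o2_d3_layout")
QUESTION_TYPES = ("yn", "fr")

# Combinations that do not appear in the feature lists above.
MISSING = {("sd", "layout", "fr"), ("sd", "tpp", "fr")}

# Columns that are not part of a text question/answer pair.
BASE_COLUMNS = ["id", "image_top_down", "image_north_facing", "question_image",
                "options_td", "options_sd", "answer_image_td", "answer_image_sd"]


def qa_column_pairs():
    """Return (question_column, answer_column) pairs in listing order."""
    pairs = []
    for view, setting, qtype in product(VIEWS, SETTINGS, QUESTION_TYPES):
        if (view, setting, qtype) in MISSING:
            continue
        suffix = f"{view}_{setting}_{qtype}"
        pairs.append((f"question_{suffix}", f"answer_{suffix}"))
    return pairs


def expected_columns():
    """Return every column name listed for one configuration."""
    columns = list(BASE_COLUMNS)
    for question, answer in qa_column_pairs():
        columns.extend([question, answer])
    return columns
```

Comparing `expected_columns()` against the column names of a loaded configuration is a cheap way to check that the schema still matches this listing.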

Dataset structure

  • Images:

    • image_top_down: Top-down images of the rooms.
    • image_north_facing: North-facing rectangular images, assuming a robot stands at the south door, facing north.
  • Questions & Answers:

    • Visual QA: (image_top_down + question_image) -- answer_image_td, (image_north_facing + question_image) -- answer_image_sd
    • Layout Setting: (question_td_layout_yn -- answer_td_layout_yn), (question_td_layout_fr -- answer_td_layout_fr)
    • TPP Setting: (question_td_tpp_yn -- answer_td_tpp_yn), (question_td_tpp_fr -- answer_td_tpp_fr)
    • O2 Setting: (question_td_o2_yn -- answer_td_o2_yn), (question_td_o2_fr -- answer_td_o2_fr), (question_sd_o2_yn -- answer_sd_o2_yn), (question_sd_o2_fr -- answer_sd_o2_fr)
    • O2+D2 Setting: (question_td_o2_d2_yn -- answer_td_o2_d2_yn), (question_td_o2_d2_fr -- answer_td_o2_d2_fr), (question_sd_o2_d2_yn -- answer_sd_o2_d2_yn), (question_sd_o2_d2_fr -- answer_sd_o2_d2_fr)
    • O2+D3 Setting: (question_td_o2_d3_yn -- answer_td_o2_d3_yn), (question_td_o2_d3_fr -- answer_td_o2_d3_fr), (question_sd_o2_d3_yn -- answer_sd_o2_d3_yn), (question_sd_o2_d3_fr -- answer_sd_o2_d3_fr)
    • Layout + O2 + D2 Setting: (question_td_o2_d2_layout_yn -- answer_td_o2_d2_layout_yn), (question_td_o2_d2_layout_fr -- answer_td_o2_d2_layout_fr), (question_sd_o2_d2_layout_yn -- answer_sd_o2_d2_layout_yn), (question_sd_o2_d2_layout_fr -- answer_sd_o2_d2_layout_fr)
    • Layout + O2 + D3 Setting: (question_td_o2_d3_layout_yn -- answer_td_o2_d3_layout_yn), (question_td_o2_d3_layout_fr -- answer_td_o2_d3_layout_fr), (question_sd_o2_d3_layout_yn -- answer_sd_o2_d3_layout_yn), (question_sd_o2_d3_layout_fr -- answer_sd_o2_d3_layout_fr)
  • Question Types:

    • fr: Find Relation
    • yn: Yes/No
  • Points of View:

    • td: Top-Down View
    • sd: North Facing View (from the south door)
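
The naming scheme above can be sketched as a small parser plus a scorer. The parser splits a column name into role, point of view, setting, and question type. The scoring rule is an assumption, not something this page specifies: a `yn` prediction is compared with the single reference string, and an `fr` prediction counts as correct when it matches any entry of the reference sequence. `QAColumn`, `parse_qa_column`, and `is_correct` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class QAColumn(NamedTuple):
    role: str     # "question" or "answer"
    view: str     # "td" (top-down) or "sd" (north-facing, from the south door)
    setting: str  # e.g. "layout", "tpp", "o2_d2_layout"
    qtype: str    # "yn" (Yes/No) or "fr" (Find Relation)


_PATTERN = re.compile(r"^(question|answer)_(td|sd)_([a-z0-9_]+)_(yn|fr)$")


def parse_qa_column(name: str) -> Optional[QAColumn]:
    """Parse a text QA column name; return None for any other column."""
    match = _PATTERN.match(name)
    return QAColumn(*match.groups()) if match else None


def is_correct(answer_column: str, prediction: str, reference) -> bool:
    """Score one prediction under the assumed matching rule described above."""
    parsed = parse_qa_column(answer_column)
    if parsed is None or parsed.role != "answer":
        raise ValueError(f"not a text answer column: {answer_column!r}")
    predicted = prediction.strip().lower()
    if parsed.qtype == "yn":
        # yn references are single strings.
        return predicted == str(reference).strip().lower()
    # fr references are sequences of strings (assumed: any match is accepted).
    return predicted in {str(item).strip().lower() for item in reference}
```

Visual QA columns such as `answer_image_td` do not follow this pattern, so `parse_qa_column` returns `None` for them and they need separate handling.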
Collected summary
Dataset introduction
Background and challenges
Background overview
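RoomSpace is a benchmark dataset for evaluating the spatial reasoning abilities of language models. It combines images (top-down and north-facing views) with text question-answer pairs to simulate real-world room scenes. The dataset offers multiple viewpoints and multiple question types (such as Visual QA and the Layout setting) to test model performance on qualitative reasoning tasks, and it comes in versions of different sizes (such as 100, 1K, and 10K samples) for flexible evaluation.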
The content above was collected and summarized by 遇见数据集.