jeasinema/SQA3D
Source: Hugging Face · Updated 2023-01-31 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/jeasinema/SQA3D
Description:
---
license: cc-by-4.0
task_categories:
- question-answering
tags:
- 3D vision
- embodied AI
size_categories:
- 10K<n<100K
---
SQA3D: Situated Question Answering in 3D Scenes (ICLR 2023, https://arxiv.org/abs/2210.07474)
===
1. Download the [SQA3D dataset](https://zenodo.org/record/7544818/files/sqa_task.zip?download=1) and extract it under `assets/data/`. The following files should be used:
```
./assets/data/sqa_task/balanced/*
./assets/data/sqa_task/answer_dict.json
```
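A minimal sketch of this step in Python, assuming the archive is saved as `sqa_task.zip` and unpacks to the top-level `sqa_task/` folder implied by the paths above (neither detail is stated explicitly). `extract_sqa` and `missing_files` are hypothetical helpers, not part of the SQA3D code:

```python
import zipfile
from pathlib import Path

# Paths the README says should be used, relative to assets/data/.
# Assumption: the archive unpacks to a top-level sqa_task/ folder.
REQUIRED = ["sqa_task/balanced", "sqa_task/answer_dict.json"]


def extract_sqa(zip_path, data_dir="assets/data"):
    """Hypothetical helper: unpack the downloaded archive into data_dir."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(data_dir)
    return data_dir


def missing_files(data_dir="assets/data"):
    """Hypothetical helper: list required paths not found under data_dir."""
    data_dir = Path(data_dir)
    return [p for p in REQUIRED if not (data_dir / p).exists()]
```

After `extract_sqa("sqa_task.zip")`, an empty list from `missing_files()` indicates that the layout above is in place.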
2. The dataset has been split into `train`, `val` and `test`. For each split, we provide a question file, e.g. `v1_balanced_questions_train_scannetv2.json`, and an annotation file, e.g. `v1_balanced_sqa_annotations_train_scannetv2.json`.
- The format of the question file:
Run the following code:
```python
import json
q = json.load(open('v1_balanced_questions_train_scannetv2.json', 'r'))
# Print the total number of questions
print('#questions: ', len(q['questions']))
print(q['questions'][0])
```
The output is:
```json
{
"alternative_situation":
[
"I stand looking out of the window in thought and a radiator is right in front of me.",
"I am looking outside through the window behind the desk."
],
"question": "What color is the desk to my right?",
"question_id": 220602000000,
"scene_id": "scene0380_00",
"situation": "I am facing a window and there is a desk on my right and a chair behind me."
}
```
The following fields are **useful**: `question`, `question_id`, `scene_id`, `situation`.
- The format of the annotation file:
Run the following code:
```python
import json
a = json.load(open('v1_balanced_sqa_annotations_train_scannetv2.json', 'r'))
# Print the total number of annotations, should be the same as questions
print('#annotations: ', len(a['annotations']))
print(a['annotations'][0])
```
The output is:
```json
{
"answer_type": "other",
"answers":
[
{
"answer": "brown",
"answer_confidence": "yes",
"answer_id": 1
}
],
"position":
{
"x": -0.9651003385573296,
"y": -1.2417634435553606,
"z": 0
},
"question_id": 220602000000,
"question_type": "N/A",
"rotation":
{
"_w": 0.9950041652780182,
"_x": 0,
"_y": 0,
"_z": 0.09983341664682724
},
"scene_id": "scene0380_00"
}
```
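The README does not document `position` and `rotation`. Assuming `rotation` is a unit quaternion with scalar part `_w` (as the field names suggest), the sketch below converts it to a heading angle about the z axis. `rotation_to_yaw` and `situation_pose` are hypothetical helpers, and treating `position` as scene coordinates is also an assumption:

```python
import math


def rotation_to_yaw(rotation):
    """Heading about the z axis, in radians, from an SQA3D-style `rotation` dict.

    Assumption: the dict is a unit quaternion with scalar part `_w`
    (the README does not document the convention).
    """
    w, x, y, z = (rotation[k] for k in ("_w", "_x", "_y", "_z"))
    # Yaw (z-axis) angle of a quaternion, as in the usual ZYX Euler conversion.
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


def situation_pose(annotation):
    """Return (x, y, yaw); assumes `position` holds scene coordinates."""
    p = annotation["position"]
    return p["x"], p["y"], rotation_to_yaw(annotation["rotation"])
```

In the example annotation, `_x` and `_y` are 0, so under this assumption the quaternion is a pure rotation about z. Because `_w` and `_z` equal cos 0.1 and sin 0.1, `rotation_to_yaw` returns 0.2 rad (about 11.5°). Which direction a yaw of 0 faces is not stated in the README.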
The following fields are **useful**: `answers[0]['answer']`, `question_id`, `scene_id`.
**Note**: To find the answer to a question from the question file, look up its `question_id` in the annotation file.
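Following this note, here is a minimal sketch that joins a split's question and annotation files on `question_id` and keeps the fields marked as useful. It assumes both files sit in one directory (e.g. `assets/data/sqa_task/balanced/`) and that `val` and `test` follow the same naming pattern as the `train` examples. `load_split` is a hypothetical helper, not the loader from the SQA3D repository:

```python
import json
from pathlib import Path


def load_split(data_dir, split):
    """Hypothetical helper: join questions and annotations of one split by question_id."""
    data_dir = Path(data_dir)
    # Naming pattern taken from the train examples; assumed for val/test too.
    q_path = data_dir / f"v1_balanced_questions_{split}_scannetv2.json"
    a_path = data_dir / f"v1_balanced_sqa_annotations_{split}_scannetv2.json"
    with open(q_path, "r") as f:
        questions = json.load(f)["questions"]
    with open(a_path, "r") as f:
        annotations = json.load(f)["annotations"]

    # Index annotations by question_id so each lookup is a dict access.
    ann_by_id = {a["question_id"]: a for a in annotations}

    samples = []
    for q in questions:
        a = ann_by_id[q["question_id"]]  # raises KeyError if an annotation is missing
        samples.append({
            "question_id": q["question_id"],
            "scene_id": q["scene_id"],
            "situation": q["situation"],
            "question": q["question"],
            "answer": a["answers"][0]["answer"],  # the field the README marks as useful
        })
    return samples
```

For example, `load_split("assets/data/sqa_task/balanced", "train")` would return one dict per training question, in question-file order.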
3. We provide the mapping between answers and class labels in `answer_dict.json`:
```python
import json
j = json.load(open('answer_dict.json', 'r'))
print('Total classes: ', len(j[0]))
print('The class label of answer \'table\' is: ', j[0]['table'])
print('The corresponding answer of class 123 is: ', j[1]['123'])
```
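Building on the snippet above, a minimal sketch of encoding answers to class labels and decoding predictions back. It assumes the labels in element 0 are integers. The `unknown=-1` fallback for out-of-vocabulary answers is a hypothetical policy, not something SQA3D specifies, and all three helpers are illustrative:

```python
import json


def load_answer_dict(path="answer_dict.json"):
    """Return (answer_to_label, label_to_answer) from answer_dict.json.

    Element 0 maps answer -> class label; element 1 maps class label
    (a string key such as '123') -> answer.
    """
    with open(path, "r") as f:
        j = json.load(f)
    return j[0], j[1]


def encode_answer(answer, answer_to_label, unknown=-1):
    """Class label for an answer, or `unknown` if it is outside the vocabulary.

    The `unknown` fallback is a hypothetical policy, not taken from SQA3D.
    """
    return answer_to_label.get(answer, unknown)


def decode_label(label, label_to_answer):
    """Answer for a class label; JSON object keys are strings, so convert first."""
    return label_to_answer[str(label)]
```

At evaluation time, `decode_label(predicted_index, label_to_answer)` turns an integer class prediction back into an answer string that can be compared with `answers[0]['answer']`.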
4. The loader, model and training code are available at https://github.com/SilongYong/SQA3D
Provided by:
jeasinema
Summary of the original information
Dataset overview
Dataset name
SQA3D: Situated Question Answering in 3D Scenes
Publication
ICLR 2023
Dataset size
10K<n<100K
License
CC-BY-4.0
Task categories
- Question-Answering
Tags
- 3D Vision
- Embodied AI
Dataset structure
- Split into train, val and test.
- Each split contains a question file and an annotation file.
Question file format
- Useful fields: question, question_id, scene_id, situation
- Example output (abridged):
```json
{
"question": "What color is the desk to my right?",
"question_id": 220602000000,
"scene_id": "scene0380_00",
"situation": "I am facing a window and there is a desk on my right and a chair behind me."
}
```
Annotation file format
- Useful fields: answers[0]['answer'], question_id, scene_id
- Example output (abridged):
```json
{
"answers": [ { "answer": "brown", "answer_confidence": "yes", "answer_id": 1 } ],
"question_id": 220602000000,
"scene_id": "scene0380_00"
}
```
Answer-to-label mapping
- answer_dict.json maps answers to class labels and class labels back to answers.
Related code
- The loader, model and training code are available on GitHub.
Collected summary
Dataset introduction

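Construction
The jeasinema/SQA3D dataset was designed for question answering situated in specific 3D scenes, in the context of 3D vision and embodied AI. Curated, annotated scenes and questions are organized into training, validation and test splits, each with a question file and a matching annotation file, which keeps the data consistent and easy to use.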
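Features
The dataset combines visual information from 3D scenes with situated, embodied understanding. It is moderately sized, with 10K to 100K instances covering a variety of scenes and question types. Each question comes with a textual description of the agent's situation and an annotated answer, and a mapping between answers and class labels is provided for model training and evaluation.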
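Usage
To use jeasinema/SQA3D, download the data files and load them according to the file layout. The dataset card shows the format of the question and annotation files and explains how to find a question's answer in the annotation file by its question ID. It also provides an answer-to-label mapping file for converting answers to classes during training and evaluation.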
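Background and challenges
Background
SQA3D (Situated Question Answering in 3D Scenes) is a dataset for situated question answering in 3D scenes, presented at ICLR 2023 by researchers including Silong Yong. It addresses the problem of an agent that, given a textual description of its situation (position and orientation) in a 3D environment, must answer questions about its surroundings, which makes it relevant to embodied AI. The dataset contains tens of thousands of questions and answers grounded in ScanNet v2 3D scenes, each tied to specific locations and objects in the scene, making it a resource for 3D visual understanding and agent interaction.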
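Current challenges
Building SQA3D involved several challenges. First, the questions and answers had to be accurate and precisely grounded in their 3D scenes. Second, the question set had to be large and diverse enough to cover a wide range of scenes and objects while remaining balanced. Finally, the annotations had to be consistent and the dataset accessible and easy to use, so that the research community can build on it effectively.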
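Common scenarios
Classic use case
In 3D vision, SQA3D poses question answering situated within a 3D scene. Its typical use is to have an embodied agent, placed in a 3D space as a person would be, answer specific questions about the scene. Models must understand the 3D spatial structure and also reason about the given situation.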
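Academic problems addressed
SQA3D targets a gap left by conventional question answering, which does not require a model to reason about 3D space from a specific situated viewpoint, and so opens new research challenges at the intersection of 3D vision and natural language processing. It supports research on 3D scene understanding, object localization and situation-aware answer generation, and helps advance the spatial reasoning abilities of AI systems.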
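Derived work
Building on SQA3D, researchers have pursued related work including, but not limited to, 3D scene understanding, the design and optimization of embodied AI models, and scene description generation. This work extends the dataset's applications, advances the related techniques and opens new research directions.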
The above content was collected and summarized by 遇见数据集.



