thomas-yanxin/robovqa-mirror
Source: Hugging Face | Updated: 2026-04-07 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/thomas-yanxin/robovqa-mirror
Resource description:
---
dataset_info:
  features:
  - name: uid
    dtype: string
  - name: video
    dtype: string
  - name: question_type
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 189067260
    num_examples: 801388
  - name: val
    num_bytes: 411776
    num_examples: 1921
  download_size: 23049505
  dataset_size: 189479036
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: val
    path: data/val-*
---
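The split metadata above is self-consistent: `dataset_size` equals the sum of the two splits' `num_bytes`. The sketch below checks that and derives per-example sizes from the numbers in the YAML header; `split_summary` is a hypothetical helper written for illustration, not part of any Hugging Face tooling.

```python
# Values copied from the dataset card's YAML header.
SPLITS = {
    "train": {"num_bytes": 189_067_260, "num_examples": 801_388},
    "val": {"num_bytes": 411_776, "num_examples": 1_921},
}
DOWNLOAD_SIZE = 23_049_505  # size of the data files fetched from the Hub
DATASET_SIZE = 189_479_036  # sum of the per-split num_bytes


def split_summary(splits: dict) -> dict:
    """Hypothetical helper: totals plus mean bytes per example for each split."""
    return {
        "total_bytes": sum(s["num_bytes"] for s in splits.values()),
        "total_examples": sum(s["num_examples"] for s in splits.values()),
        "mean_bytes": {
            name: s["num_bytes"] / s["num_examples"] for name, s in splits.items()
        },
    }


summary = split_summary(SPLITS)
assert summary["total_bytes"] == DATASET_SIZE  # 189,067,260 + 411,776
print(summary["total_examples"])               # 803309
print(round(summary["mean_bytes"]["train"]))   # 236
print(round(DATASET_SIZE / DOWNLOAD_SIZE, 1))  # 8.2
```

At roughly 236 bytes per training row, the records are short text entries, which is consistent with the card describing `video` as a string path or URL rather than embedded video data.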
## Dataset Structure
### Data Fields
- `uid` (string): Unique identifier for the episode.
- `video` (string): Path or URL to the video file.
- `question_type` (string): Type of task (e.g., planning, success evaluation).
- `question` (string): Natural language question.
- `answer` (string): Ground truth answer (free-form or binary).
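A typed view of one row can help catch schema drift when working with the splits. This is a minimal sketch under stated assumptions: `RoboVQAExample`, `FIELDS`, and `load_split` are hypothetical names; the repo id and the split names `train`/`val` come from the card above; and `datasets.load_dataset` is the standard Hugging Face loader, imported lazily so the dataclass can be used without network access.

```python
from dataclasses import dataclass

# Column names as declared in the card's `features` list.
FIELDS = ("uid", "video", "question_type", "question", "answer")


@dataclass(frozen=True)
class RoboVQAExample:
    """One dataset row (hypothetical wrapper); all five columns are plain strings."""
    uid: str
    video: str
    question_type: str
    question: str
    answer: str

    @classmethod
    def from_row(cls, row: dict) -> "RoboVQAExample":
        # Keep only the declared columns; a missing column raises KeyError.
        return cls(**{name: row[name] for name in FIELDS})


def load_split(split: str = "val", limit: int = 5) -> list:
    """Fetch the first `limit` rows of a split (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the class works offline

    ds = load_dataset("thomas-yanxin/robovqa-mirror", split=split)
    # Integer indexing a Dataset returns one row as a dict.
    return [RoboVQAExample.from_row(ds[i]) for i in range(min(limit, len(ds)))]
```

For example, `load_split("val", limit=3)` would return the first three validation rows as `RoboVQAExample` objects. If the Hub itself is unreachable, `huggingface_hub` reads the `HF_ENDPOINT` environment variable, which must be set before `datasets` is imported; pointing it at the hf-mirror.com host listed above is a common workaround, though this card does not document it.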
### Preprocessing
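The parser below makes two regex passes: it splits the raw text into `<task:...>` blocks, then pulls `Q:` / `<PRED>A:` pairs out of each block. The string here is a made-up sample built only to match those patterns (real RoboVQA annotations may use different wording and task tags); the trace prints the intermediate values for one block.

```python
import re

# Hypothetical raw annotation: two task blocks, each with one Q/A pair.
RAW = (
    "<task:planning>goal: put the cup in the sink Q: what is the next step? "
    "<PRED>A: <PRED:ANSWER>pick up the cup</PRED:ANSWER></PRED>"
    "<task:success>Q: is the cup in the sink? "
    "<PRED>A: <PRED:BINARY>no</PRED:BINARY></PRED>"
)

# Pass 1: each match runs from a <task:...> tag up to the next tag or end of string.
blocks = re.findall(r"(<task:[^>]+>.*?)(?=<task:|$)", RAW, re.DOTALL)
task_types = [re.search(r"<task:([^>]+)>", b).group(1) for b in blocks]
print(task_types)  # ['planning', 'success']

# Pass 2 on the first block: context before "Q: ", the question, the raw answer.
clean = re.sub(r"<task:[^>]+>", "", blocks[0], count=1).strip()
prefix, q_suffix, raw_answer = re.findall(
    r"(.*?)Q: (.*?) <PRED>A: (.*?)</PRED>", clean, re.DOTALL
)[0]
question = (prefix + "Q: " + q_suffix).strip()
# Compact equivalent of the parser's six-tag alternation, plus newlines.
answer = re.sub(r"</?PRED:(?:ANSWER|DISCRETE|BINARY)>|\n", "", raw_answer).strip()
print(question)  # goal: put the cup in the sink Q: what is the next step?
print(answer)    # pick up the cup
```

Note that any context preceding `Q:` (here the goal statement) is kept as part of the question, and that `</PRED>` does not match inside `</PRED:ANSWER>`, so the non-greedy answer group stops at the outer closing tag.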
```python
import re


def parse_task_data(text_data: str) -> list:
    """
    Parses a string containing task data to extract task type, question, and answer.
    Handles multiple Q: A: pairs within a single text block.

    Args:
        text_data: The input string containing the task information.

    Returns:
        A list of dictionaries, where each dictionary represents a parsed task
        and contains 'question_type', 'question', and 'answer' keys.
    """
    parsed_results = []
    # Split the input into multiple <task:...> blocks
    task_blocks = re.findall(r'(<task:[^>]+>.*?)(?=<task:|$)', text_data, re.DOTALL)
    for block in task_blocks:
        # Extract task type
        task_type_match = re.search(r'<task:([^>]+)>', block)
        task_type = task_type_match.group(1) if task_type_match else "unknown"
        # Remove the first task tag for easier processing
        clean_block = re.sub(r'<task:[^>]+>', '', block, count=1).strip()
        # Each match: any context before "Q: " (prefix), the question text up to
        # " <PRED>A: ", and the raw answer up to the closing </PRED>
        qa_pairs = re.findall(r'(.*?)Q: (.*?) <PRED>A: (.*?)</PRED>', clean_block, re.DOTALL)
        for prefix, q_suffix, raw_answer in qa_pairs:
            # Combine the context prefix and the question into one string
            question = (prefix + "Q: " + q_suffix).strip()
            # Clean answer by removing nested PRED tags and newlines
            answer = re.sub(
                r'<PRED:ANSWER>|<PRED:DISCRETE>|<PRED:BINARY>|</PRED:BINARY>|</PRED:DISCRETE>|</PRED:ANSWER>|\n',
                '',
                raw_answer
            ).strip()
            parsed_results.append({
                "question_type": task_type,
                "question": question,
                "answer": answer
            })
    return parsed_results
```
Provided by:
thomas-yanxin



