aprilavrilivan/zoo-bus-vqa
Source: Hugging Face. Updated 2026-03-22; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/aprilavrilivan/zoo-bus-vqa
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: annotations
    list:
    - name: area
      dtype: float64
    - name: bbox
      list: float64
    - name: category
      dtype: string
    - name: category_id
      dtype: int64
    - name: iscrowd
      dtype: int64
    - name: score
      dtype: float64
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: question_type
    dtype: string
  - name: source_id
    dtype: string
  - name: id
    dtype: int64
  splits:
  - name: train
    num_bytes: 70252918606.75
    num_examples: 84018
  download_size: 2202343567
  dataset_size: 70252918606.75
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Zoo-Bus VQA
## Dataset Summary
Zoo-Bus VQA is a synthetic visual question answering dataset built for spatial reasoning and object-centric grounding.
Each image contains a generated scene with:
- benches
- stop signs
- people
- animals (`zebra`, `elephant`, `giraffe`)
- a clock object used as the bus/agent
- a red heading dot indicating the clock's current facing direction
The dataset is designed to support fine-tuning and evaluation on structured visual reasoning tasks such as:
- counting
- nearest-object reasoning
- grouping and assignment
- ordering by distance
- geometric direction
- heading-relative direction
- obstacle avoidance
- arrival/proximity reasoning
## Dataset Size
- `84,018` QA pairs
- `1,792` source images
- average of about `46.9` QA pairs per image
- split: `train`
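
The average follows directly from the two counts above. As a minimal sketch, the helper below counts rows per `source_id` to recover both figures. The name `qa_pairs_per_image` is hypothetical (it is not part of the dataset tooling), the code assumes each source image maps to exactly one `source_id` value, and the toy filenames are made up for illustration.

```python
from collections import Counter

def qa_pairs_per_image(source_ids):
    """Return (number of distinct images, mean QA pairs per image)."""
    counts = Counter(source_ids)
    if not counts:
        return 0, 0.0
    return len(counts), sum(counts.values()) / len(counts)

# Sanity check of the figure quoted above: 84,018 rows over 1,792 images.
reported_average = 84018 / 1792
print(round(reported_average, 1))  # 46.9

# Toy example with made-up filenames (not real dataset rows).
toy_ids = ["scene_a.png", "scene_a.png", "scene_b.png"]
print(qa_pairs_per_image(toy_ids))  # (2, 1.5)
```

On the real data, the same helper could be fed the `source_id` column of the `train` split.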
## Data Generation
The dataset was generated in two stages:
1. **Scene synthesis**
   - Images were generated with a custom scene generator.
   - Base scenes contain benches and stop signs.
   - Variant scenes add people, animals, and a clock object.
   - A red dot is rendered in front of the clock to encode heading direction.
2. **QA generation**
   - Object detections were produced with an Ultralytics YOLO model.
   - Questions and answers were generated with custom GRAID question classes.
   - The pipeline uses detector-aware filtering and geometry-based stability checks to avoid ambiguous samples.
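
The card does not say how the geometry-based stability checks are implemented. One plausible form, sketched here purely as an assumption, is to drop a nearest-object question when the two smallest distances are too close to call. The function name `is_stable_nearest`, the relative-gap criterion, and the `min_margin` default are all hypothetical and are not taken from the GRAID pipeline.

```python
import math

def is_stable_nearest(agent_xy, candidate_xys, min_margin=0.1):
    """Return True if the nearest candidate is clearly nearer than the runner-up.

    min_margin is a relative gap (assumed criterion):
    (second - first) / second must exceed it.
    """
    distances = sorted(math.dist(agent_xy, xy) for xy in candidate_xys)
    if len(distances) < 2:
        # A single candidate is trivially unambiguous; none means no question.
        return len(distances) == 1
    first, second = distances[0], distances[1]
    if second == 0:
        return False  # both candidates coincide with the agent
    return (second - first) / second > min_margin

clock = (100.0, 100.0)
# Distances 10 and 100: clearly separated.
print(is_stable_nearest(clock, [(110.0, 100.0), (200.0, 100.0)]))  # True
# Distances 10 and 11: too close, so the sample would be skipped.
print(is_stable_nearest(clock, [(110.0, 100.0), (100.0, 111.0)]))  # False
```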
## Supported Reasoning Types
The dataset includes the following question types:
- `CountPeople`: Count the total number of people visible in the scene.
- `CountAnimals`: Count the total number of animals visible in the scene.
- `CountPeopleAtBench`: Count how many people are associated with a specific numbered bench.
- `CountAnimalsAtStopSign`: Count how many animals are associated with a specific numbered stop sign.
- `ListBenchesWithAtLeastKPeople`: List all numbered benches that have at least a given number of people.
- `ListStopSignsWithAtLeastKAnimals`: List all numbered stop signs that have at least a given number of animals.
- `ArrivedAtBench`: Decide whether the clock is close enough to a specific bench to be considered arrived there.
- `ArrivedAtAnimalsAroundStopSigns`: Decide whether the clock is close enough to at least one animal in the group around a specific stop sign.
- `ClosestBench`: Identify which numbered bench is nearest to the clock.
- `ClosestStopSign`: Identify which numbered stop sign is nearest to the clock.
- `PairwiseCloserBench`: Compare two numbered benches and decide which one is closer to the clock.
- `PairwiseCloserStopSign`: Compare two numbered stop signs and decide which one is closer to the clock.
- `ClosestToFurthestBenches`: Order all numbered benches from nearest to farthest relative to the clock.
- `ClosestToFurthestStopSigns`: Order all numbered stop signs from nearest to farthest relative to the clock.
- `GeometricDirectionToBench`: Determine the compass direction of a specific bench relative to the clock.
- `GeometricDirectionToStopSign`: Determine the compass direction of a specific stop sign relative to the clock.
- `AvoidObstacleToReachBench`: Determine whether the clock should go straight, turn left, or turn right to reach a specific bench while avoiding blocking objects.
- `AvoidObstacleToReachStopSign`: Determine whether the clock should go straight, turn left, or turn right to reach a specific stop sign while avoiding blocking objects.
- `BusHeadingDirection`: Infer the current heading direction of the clock from the red dot placed in front of it.
- `TurnDirectionToBench`: Decide how the clock should turn in order to face a specific bench.
- `TurnDirectionToStopSign`: Decide how the clock should turn in order to face a specific stop sign.
- `BenchRelativeToHeading`: Determine where a specific bench lies relative to the clock’s current heading, such as front, left, or back-right.
- `StopSignRelativeToHeading`: Determine where a specific stop sign lies relative to the clock’s current heading, such as front, right, or back-left.
- `CountPersonAtClosestBench`: Count how many people are at the bench that is closest to the clock.
- `ClosestBenchWithPerson`: Identify the nearest bench to the clock that has at least one person.
- `AvoidObstacleToReachClosestBench`: Determine whether the clock should go straight, turn left, or turn right to reach the nearest bench while avoiding blocking objects.
- `AvoidObstacleToReachClosestStopSign`: Determine whether the clock should go straight, turn left, or turn right to reach the nearest stop sign while avoiding blocking objects.
- `DirectionToClosestBench`: Determine the compass direction of the nearest bench relative to the clock.
- `DirectionToClosestStopSign`: Determine the compass direction of the nearest stop sign relative to the clock.
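
Most of the question types above reduce to a few geometric primitives on object centers: distance, absolute direction, and direction relative to the heading. The sketch below illustrates those primitives under stated assumptions and is not the dataset's actual answer logic. It assumes objects are reduced to center points in image coordinates (y grows downward), that "north" means up on screen, and that directions fall into eight 45-degree sectors. All function names and the label strings are hypothetical, since the card does not document the answer vocabulary.

```python
import math

# Assumed label sets, indexed by 45-degree sector (counterclockwise).
COMPASS = ["east", "north-east", "north", "north-west",
           "west", "south-west", "south", "south-east"]
RELATIVE = ["front", "front-left", "left", "back-left",
            "back", "back-right", "right", "front-right"]

def _angle_deg(origin, point):
    """Angle of origin->point in degrees, counterclockwise from screen-right.

    Image y grows downward, so dy is negated to make 'up on screen' positive.
    """
    dx = point[0] - origin[0]
    dy = origin[1] - point[1]
    return math.degrees(math.atan2(dy, dx))

def _sector(angle_deg, names):
    """Map an angle to one of eight 45-degree sectors."""
    return names[round(angle_deg / 45.0) % 8]

def order_by_distance(agent, targets):
    """Return target labels sorted from nearest to farthest from the agent."""
    return sorted(targets, key=lambda label: math.dist(agent, targets[label]))

def compass_direction(agent, target):
    """Absolute direction of the target as seen from the agent."""
    return _sector(_angle_deg(agent, target), COMPASS)

def relative_to_heading(agent, heading_dot, target):
    """Direction of the target relative to the agent->heading_dot vector."""
    relative = _angle_deg(agent, target) - _angle_deg(agent, heading_dot)
    return _sector(relative, RELATIVE)

# Made-up coordinates: the clock faces up (north) because the dot is above it.
clock, red_dot = (100, 100), (100, 80)
benches = {"bench 1": (160, 100), "bench 2": (100, 300), "bench 3": (70, 70)}

print(order_by_distance(clock, benches))   # ['bench 3', 'bench 1', 'bench 2']
print(compass_direction(clock, benches["bench 3"]))             # north-west
print(relative_to_heading(clock, red_dot, benches["bench 1"]))  # right
print(relative_to_heading(clock, red_dot, benches["bench 2"]))  # back
```

Negating `dy` keeps the usual convention that a positive relative angle is a counterclockwise (left) turn, which is why a bench due east of a north-facing clock comes out as `right`.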
## Data Fields
Each row contains:
- `image`: the RGB scene image
- `annotations`: list of detected objects in COCO-style format, each with `area`, `bbox`, `category`, `category_id`, `iscrowd`, and `score`
- `question`: the question text
- `answer`: the ground-truth answer
- `question_type`: question class name
- `source_id`: source image filename
- `id`: unique row id
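
To show how the `annotations` field might be consumed, here is a minimal sketch that filters detections by category and confidence and reduces each box to its center. It assumes `bbox` follows the usual COCO layout `[x_min, y_min, width, height]`, which the card implies but does not state. The helper names, the category strings, and the toy annotations are hypothetical and only mirror the documented schema.

```python
def bbox_center(bbox):
    """Center of a box, ASSUMING COCO layout [x_min, y_min, width, height]."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)

def centers_by_category(annotations, category, min_score=0.0):
    """Centers of all annotations of one category with score >= min_score."""
    return [
        bbox_center(ann["bbox"])
        for ann in annotations
        if ann["category"] == category and ann["score"] >= min_score
    ]

# Made-up annotations mirroring the documented fields (not real dataset rows).
toy_annotations = [
    {"area": 800.0, "bbox": [10.0, 20.0, 40.0, 20.0], "category": "bench",
     "category_id": 0, "iscrowd": 0, "score": 0.92},
    {"area": 600.0, "bbox": [100.0, 50.0, 20.0, 30.0], "category": "zebra",
     "category_id": 1, "iscrowd": 0, "score": 0.88},
    {"area": 500.0, "bbox": [200.0, 20.0, 50.0, 10.0], "category": "bench",
     "category_id": 0, "iscrowd": 0, "score": 0.30},
]

print(centers_by_category(toy_annotations, "bench", min_score=0.5))
# [(30.0, 30.0)]
```

Because the annotations come from a detector rather than from the scene generator, thresholding on `score` is one way to ignore low-confidence boxes before doing any geometry.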
## Example Usage
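```python
from datasets import load_dataset

# Load the default config; the only split is "train".
ds = load_dataset("aprilavrilivan/zoo-bus-vqa")

# Fetch the first row once, then read individual fields from it.
sample = ds["train"][0]
print(sample)
image = sample["image"]
question = sample["question"]
answer = sample["answer"]
```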
Provider:
aprilavrilivan



