aprilavrilivan/zoo-bus-vqa

Name: aprilavrilivan/zoo-bus-vqa
Creator: aprilavrilivan
Published: 2026-03-22 08:30:54
License: No description available

Source: Hugging Face. Updated 2026-03-22. Indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/aprilavrilivan/zoo-bus-vqa


Resource description:

---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: annotations
    list:
    - name: area
      dtype: float64
    - name: bbox
      list: float64
    - name: category
      dtype: string
    - name: category_id
      dtype: int64
    - name: iscrowd
      dtype: int64
    - name: score
      dtype: float64
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: question_type
    dtype: string
  - name: source_id
    dtype: string
  - name: id
    dtype: int64
  splits:
  - name: train
    num_bytes: 70252918606.75
    num_examples: 84018
  download_size: 2202343567
  dataset_size: 70252918606.75
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Zoo-Bus VQA

## Dataset Summary

Zoo-Bus VQA is a synthetic visual question answering dataset built for spatial reasoning and object-centric grounding. Each image contains a generated scene with:

- benches
- stop signs
- people
- animals (`zebra`, `elephant`, `giraffe`)
- a clock object used as the bus/agent
- a red heading dot indicating the clock's current facing direction

The dataset is designed to support fine-tuning and evaluation on structured visual reasoning tasks such as:

- counting
- nearest-object reasoning
- grouping and assignment
- ordering by distance
- geometric direction
- heading-relative direction
- obstacle avoidance
- arrival/proximity reasoning

## Dataset Size

- `84,018` QA pairs
- `1,792` source images
- average of about `46.9` QA pairs per image
- split: `train`

## Data Generation

The dataset was generated in two stages:

1. **Scene synthesis**
   - Images were generated with a custom scene generator.
   - Base scenes contain benches and stop signs.
   - Variant scenes add people, animals, and a clock object.
   - A red dot is rendered in front of the clock to encode heading direction.
2. **QA generation**
   - Object detections were produced with an Ultralytics YOLO model.
   - Questions and answers were generated with custom GRAID question classes.
   - The pipeline uses detector-aware filtering and geometry-based stability checks to avoid ambiguous samples.

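The card does not spell out what the geometry-based stability checks are. As one hypothetical illustration, a nearest-object question could be kept only when the nearest candidate is clearly closer than the runner-up. The function name `is_stable_nearest` and the `min_margin` threshold below are assumptions made for this sketch, not part of the actual GRAID pipeline.

```python
import math


def is_stable_nearest(agent_xy, candidates, min_margin=0.15):
    """Return True when the nearest candidate is unambiguous.

    agent_xy:   (x, y) center of the clock/agent, in pixels.
    candidates: list of (x, y) centers of benches or stop signs.
    min_margin: hypothetical relative gap required between the nearest
                and second-nearest distances (an assumed value).
    """
    dists = sorted(math.dist(agent_xy, c) for c in candidates)
    if len(dists) < 2:
        # A single candidate is trivially nearest; no candidates is unanswerable.
        return len(dists) == 1
    nearest, runner_up = dists[0], dists[1]
    if runner_up == 0:
        # Both candidates sit on the agent: no meaningful ordering.
        return False
    return (runner_up - nearest) / runner_up >= min_margin


# Example: the second pair is too close to call under the assumed margin.
print(is_stable_nearest((0, 0), [(1, 0), (10, 0)]))
print(is_stable_nearest((0, 0), [(1, 0), (1.05, 0)]))
```

A relative margin is used here rather than an absolute pixel gap so that the same threshold behaves similarly for near and far objects; that choice is also an assumption.
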
## Supported Reasoning Types

The dataset includes the following question types:

- `CountPeople`: Count the total number of people visible in the scene.
- `CountAnimals`: Count the total number of animals visible in the scene.
- `CountPeopleAtBench`: Count how many people are associated with a specific numbered bench.
- `CountAnimalsAtStopSign`: Count how many animals are associated with a specific numbered stop sign.
- `ListBenchesWithAtLeastKPeople`: List all numbered benches that have at least a given number of people.
- `ListStopSignsWithAtLeastKAnimals`: List all numbered stop signs that have at least a given number of animals.
- `ArrivedAtBench`: Decide whether the clock is close enough to a specific bench to be considered arrived there.
- `ArrivedAtAnimalsAroundStopSigns`: Decide whether the clock is close enough to at least one animal in the group around a specific stop sign.
- `ClosestBench`: Identify which numbered bench is nearest to the clock.
- `ClosestStopSign`: Identify which numbered stop sign is nearest to the clock.
- `PairwiseCloserBench`: Compare two numbered benches and decide which one is closer to the clock.
- `PairwiseCloserStopSign`: Compare two numbered stop signs and decide which one is closer to the clock.
- `ClosestToFurthestBenches`: Order all numbered benches from nearest to farthest relative to the clock.
- `ClosestToFurthestStopSigns`: Order all numbered stop signs from nearest to farthest relative to the clock.
- `GeometricDirectionToBench`: Determine the compass direction of a specific bench relative to the clock.
- `GeometricDirectionToStopSign`: Determine the compass direction of a specific stop sign relative to the clock.
- `AvoidObstacleToReachBench`: Determine whether the clock should go straight, turn left, or turn right to reach a specific bench while avoiding blocking objects.
- `AvoidObstacleToReachStopSign`: Determine whether the clock should go straight, turn left, or turn right to reach a specific stop sign while avoiding blocking objects.
- `BusHeadingDirection`: Infer the current heading direction of the clock from the red dot placed in front of it.
- `TurnDirectionToBench`: Decide how the clock should turn in order to face a specific bench.
- `TurnDirectionToStopSign`: Decide how the clock should turn in order to face a specific stop sign.
- `BenchRelativeToHeading`: Determine where a specific bench lies relative to the clock’s current heading, such as front, left, or back-right.
- `StopSignRelativeToHeading`: Determine where a specific stop sign lies relative to the clock’s current heading, such as front, right, or back-left.
- `CountPersonAtClosestBench`: Count how many people are at the bench that is closest to the clock.
- `ClosestBenchWithPerson`: Identify the nearest bench to the clock that has at least one person.
- `AvoidObstacleToReachClosestBench`: Determine whether the clock should go straight, turn left, or turn right to reach the nearest bench while avoiding blocking objects.
- `AvoidObstacleToReachClosestStopSign`: Determine whether the clock should go straight, turn left, or turn right to reach the nearest stop sign while avoiding blocking objects.
- `DirectionToClosestBench`: Determine the compass direction of the nearest bench relative to the clock.
- `DirectionToClosestStopSign`: Determine the compass direction of the nearest stop sign relative to the clock.

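For the heading-relative question types (`BenchRelativeToHeading`, `StopSignRelativeToHeading`), the answer depends on the angle between the clock's heading and the direction to the target. The following is a minimal sketch of that geometry, not the dataset's own generator. It assumes COCO-style `[x, y, width, height]` boxes in image coordinates (y pointing down), a heading vector taken from the clock center to the red dot (whose position is assumed to be known), and eight 45-degree sectors. The sector labels, function names, and binning are all assumptions.

```python
import math

# Assumed sector labels, clockwise starting from straight ahead.
SECTORS = ["front", "front-right", "right", "back-right",
           "back", "back-left", "left", "front-left"]


def bbox_center(bbox):
    """Center of a box assumed to be COCO-style [x, y, width, height]."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)


def relative_to_heading(clock_xy, dot_xy, target_xy):
    """Classify where target_xy lies relative to the clock's heading.

    The heading is the vector from the clock center to the red dot.
    All points are in image coordinates, with y increasing downward.
    """
    hx, hy = dot_xy[0] - clock_xy[0], dot_xy[1] - clock_xy[1]
    tx, ty = target_xy[0] - clock_xy[0], target_xy[1] - clock_xy[1]
    # Signed angle from heading to target. With y pointing down, a positive
    # cross product means the target is clockwise on screen, i.e. to the right.
    cross = hx * ty - hy * tx
    dot = hx * tx + hy * ty
    angle = math.degrees(math.atan2(cross, dot))
    # Shift by half a sector so that "front" is centered on 0 degrees.
    index = int(((angle + 22.5) % 360) // 45)
    return SECTORS[index]


# Example: the clock faces up on screen and the target is to its screen-right.
clock = bbox_center([90, 90, 20, 20])   # center at (100.0, 100.0)
print(relative_to_heading(clock, (100, 80), (150, 100)))
```

The compass-direction types (`GeometricDirectionToBench` and similar) could be handled with the same binning by fixing the heading to screen-up instead of reading it from the red dot, again under the assumption that north corresponds to the top of the image.
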
## Data Fields

Each row contains:

- `image`: the RGB scene image
- `annotations`: detected object annotations in COCO-style format (`area`, `bbox`, `category`, `category_id`, `iscrowd`, `score`)
- `question`: the question text
- `answer`: the ground-truth answer
- `question_type`: question class name
- `source_id`: source image filename
- `id`: unique row id

## Example Usage

```python
from datasets import load_dataset

# Load the dataset; it ships a single "train" split.
ds = load_dataset("aprilavrilivan/zoo-bus-vqa")

# Fetch the first row once and reuse it.
sample = ds["train"][0]
print(sample)

image = sample["image"]        # decoded as a PIL image
question = sample["question"]
answer = sample["answer"]
```

Provider:

aprilavrilivan
