Synthetic Shape Dataset for Numerosity: Exploring Six Configurations of Complexity
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11194221
Description:
The Synthetic Shape Dataset for Numerosity is a collection of synthetic images for training and studying machine learning models on numerosity perception, a task modeled on human neurocognitive abilities. It comprises six configurations of increasing shape complexity, ranging from fundamental to intricate.
Each configuration presents a unique array of synthetic images, where shapes are dynamically generated and randomly distributed against contrasting backgrounds. Configuration 1 serves as the foundation, featuring only white circles against a black background, with each circle sharing a uniform size. As complexity escalates through the configurations, additional elements are introduced, including variations in shape type, size, orientation, and pixel intensity.
No shapes overlap or touch each other in any image, so every shape is clearly separated from the others. The dataset comprises 73,686 images in total, and each configuration poses a distinct challenge for numerosity perception tasks.
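The description does not say how the non-overlap constraint is enforced. One common way to get it is rejection sampling: propose a random position and keep it only if it stays clear of every shape placed so far. The sketch below illustrates that idea for uniform circles, as in Configuration 1. The function name, image size, radius, and margin are all illustrative assumptions, not values taken from the dataset.

```python
import math
import random


def place_circles(n, width=128, height=128, radius=6, margin=1,
                  max_tries=10_000, seed=None):
    """Return n (x, y, r) circles that neither overlap nor touch.

    Every parameter name and default here is a hypothetical choice made
    for illustration; none of them comes from the dataset itself.
    """
    rng = random.Random(seed)
    circles = []
    tries = 0
    while len(circles) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all circles; use a smaller n or radius")
        # Sample a centre that keeps the whole circle inside the image.
        x = rng.uniform(radius, width - radius)
        y = rng.uniform(radius, height - radius)
        # Accept only if the gap to every placed circle is at least `margin`.
        if all(math.hypot(x - cx, y - cy) >= radius + cr + margin
               for cx, cy, cr in circles):
            circles.append((x, y, radius))
    return circles


circles = place_circles(8, seed=0)
```

A positive margin is what separates "not overlapping" from "not touching": with a margin of 0, two circles could still share a boundary point.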
Shapes are generated in one of two size types:
Bounded: In this category, there is no correlation between the pixel count for shapes contained in an image and the target numerosity count. Each of the six configurations contains 9,212 images, totaling 55,272 images.
Unbounded: Here, there is a correlation between the pixel count for shapes in an image and the target numerosity count. Each configuration consists of 3,069 images, totaling 18,414 images.
The configurations are as follows:
Configuration 1: Features only white circles against a black background, with uniform circle size.
Configuration 2: Similar to Configuration 1, but circles vary in size.
Configuration 3: Presents a black background with solid white circles, triangles, squares, and pentagons. Shapes have a uniform orientation but do not share a uniform size.
Configuration 4: Similar to Configuration 3, but shapes do not have a uniform orientation.
Configuration 5: Similar to Configuration 4, but shapes also vary in pixel intensity, exhibiting different shades of grey.
Configuration 6: Features a white background with solid black circles, triangles, squares, and pentagons. Shapes do not have a uniform orientation.
The dataset is split into training and test sets:
Training Set: Contains 61,440 images, with each image containing 1-8 shapes, evenly distributed across target numerosity counts.
Test Set: Comprises 12,246 images, including a variety of numerosity counts ranging from 0 to 12 shapes per image.
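The image counts quoted in this description can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above. The per-count training figure is an inference that assumes "evenly distributed" means exactly equal counts for numerosities 1 to 8.

```python
N_CONFIGS = 6
BOUNDED_PER_CONFIG = 9_212
UNBOUNDED_PER_CONFIG = 3_069
TRAIN_IMAGES = 61_440
TEST_IMAGES = 12_246

bounded_total = N_CONFIGS * BOUNDED_PER_CONFIG      # 55,272
unbounded_total = N_CONFIGS * UNBOUNDED_PER_CONFIG  # 18,414
total = bounded_total + unbounded_total             # 73,686

# The size-type totals and the train/test split describe the same images.
assert total == TRAIN_IMAGES + TEST_IMAGES

# Assumption: an exactly even spread over the 8 training numerosities (1-8).
train_per_count = TRAIN_IMAGES // 8                 # 7,680
```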
Additionally, each image is accompanied by structured label information provided in a CSV file:
id: A unique number assigned to each image.
config: Indicates the configuration of the image.
target: Denotes the label for the image, representing the number of shapes contained within it.
shape: Specifies whether the image contains bounded or unbounded shapes. Options include 'bounded' or 'unbounded'.
numerosity_id: Identifies the image in the creation process (not crucial for end-users).
shape_id: Identifies the image in the creation process (not crucial for end-users).
split: Labels each image as belonging to either the training or test dataset. Options are 'train' or 'test'.
path: Provides the path to the image file.
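As a minimal sketch of how these columns might be used, the example below reads label rows with Python's standard csv module and filters them by column value. The helper names load_labels and select, the sample rows, and the file name mentioned afterwards are hypothetical. The exact encoding of the config column is not stated in the description, so values are compared as strings.

```python
import csv
import io


def load_labels(fileobj):
    """Read label rows from an open CSV file, converting `target` to int."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["target"] = int(row["target"])
        rows.append(row)
    return rows


def select(rows, **criteria):
    """Keep rows whose columns equal the given values (compared as strings)."""
    return [r for r in rows
            if all(str(r[k]) == str(v) for k, v in criteria.items())]


# Tiny in-memory stand-in for the real file; these rows are made up.
sample = io.StringIO(
    "id,config,target,shape,numerosity_id,shape_id,split,path\n"
    "0,1,3,bounded,0,0,train,images/0.png\n"
    "1,1,12,unbounded,1,0,test,images/1.png\n"
)
rows = load_labels(sample)
train_bounded = select(rows, split="train", shape="bounded")
```

With the real label file, the same functions would be called on a file opened with open(..., newline=""), for example a file named labels.csv if that is what the archive contains.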
This dataset serves as a valuable resource for researchers seeking to explore and enhance machine learning models' ability to comprehend numerosity in various contexts.
Created:
2024-05-15



