eezy/basic_shapes_1000
Hugging Face. Updated 2023-06-22; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/eezy/basic_shapes_1000
Description:
---
dataset_info:
- config_name: mixed
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 825420657
    num_examples: 3200
  - name: validation
    num_bytes: 103491703
    num_examples: 400
  - name: test
    num_bytes: 14362883
    num_examples: 400
  download_size: 79499715
  dataset_size: 943275243
- config_name: circles
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 202172900
    num_examples: 800
  - name: validation
    num_bytes: 25380696
    num_examples: 100
  - name: test
    num_bytes: 3587893
    num_examples: 100
  download_size: 28664837
  dataset_size: 231141489
- config_name: squares
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 209226435
    num_examples: 800
  - name: validation
    num_bytes: 26362720
    num_examples: 100
  - name: test
    num_bytes: 3590905
    num_examples: 100
  download_size: 10376213
  dataset_size: 239180060
- config_name: squares_and_circles
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 207141741
    num_examples: 800
  - name: validation
    num_bytes: 25735545
    num_examples: 100
  - name: test
    num_bytes: 3590235
    num_examples: 100
  download_size: 20138547
  dataset_size: 236467521
- config_name: scer
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 206879581
    num_examples: 800
  - name: validation
    num_bytes: 26012748
    num_examples: 100
  - name: test
    num_bytes: 3593856
    num_examples: 100
  download_size: 20320118
  dataset_size: 236486185
---
# Dataset Card for BasicShapes1000
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://eezy.com
### Dataset Summary
This is a synthetic dataset of randomly generated SVGs containing basic shapes (circles, squares, ellipses, and rectangles).
### Supported Tasks and Leaderboards
NA
### Languages
NA
## Dataset Structure
The dataset is composed of four base domains, plus a `mixed` domain that aggregates the other four:
* `circles` - only circles
* `squares` - only squares
* `squares_and_circles` - circles and squares present in the same SVG
* `scer` - squares, circles, ellipses, and rectangles present in the same SVG
* `mixed` - an aggregation of all of the above
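As a quick sanity check on the structure described here, the sketch below copies the split sizes from the card's `dataset_info` metadata and confirms that `mixed` matches the sum of the four base domains. The helper names `expected_mixed_sizes` and `load_config` are hypothetical. `load_config` assumes the Hugging Face `datasets` package is installed and that the repository id `eezy/basic_shapes_1000` is reachable.

```python
# Split sizes copied from the card's `dataset_info` metadata.
SPLIT_SIZES = {
    "circles": {"train": 800, "validation": 100, "test": 100},
    "squares": {"train": 800, "validation": 100, "test": 100},
    "squares_and_circles": {"train": 800, "validation": 100, "test": 100},
    "scer": {"train": 800, "validation": 100, "test": 100},
    "mixed": {"train": 3200, "validation": 400, "test": 400},
}
BASE_CONFIGS = ("circles", "squares", "squares_and_circles", "scer")


def expected_mixed_sizes():
    """Sum the base-domain split sizes; `mixed` should match if it aggregates them."""
    totals = {}
    for config in BASE_CONFIGS:
        for split, count in SPLIT_SIZES[config].items():
            totals[split] = totals.get(split, 0) + count
    return totals


def load_config(config_name, split="train"):
    """Load one configuration (hypothetical helper; needs `datasets` and network access)."""
    if config_name not in SPLIT_SIZES:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so the helpers above work without the package installed.
    from datasets import load_dataset
    return load_dataset("eezy/basic_shapes_1000", config_name, split=split)
```

For example, `load_config("circles", split="validation")` would request the 100-example validation split of the `circles` domain.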
### Data Instances
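Each instance is a single generated SVG together with its PNG rendering and mask annotations; see [Data Fields](#data-fields).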
### Data Fields
Each example has 5 fields:
* `svg` - the raw SVG as a string
* `png` - a raster rendering of the SVG on a white background
* `object_mask` - a black/white mask that defines the outlines of the SVG objects
* `layer_mask` - a greyscale mask that defines layers of SVG objects; overlap regions are brighter. Created by making all the objects white and semi-transparent
* `segments` - a NumPy array of shape `(N, 512, 512)` with `dtype=bool`, where N is the number of SVG objects. It holds one mask per object, with `True` inside the object's area
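To illustrate how the `segments` array can be used, here is a minimal sketch that collapses the per-object masks into a single coverage mask and a per-pixel overlap count. The function names are hypothetical, and the `demo` array is a synthetic stand-in rather than real dataset content. The results are only analogous to `object_mask` and `layer_mask`; the card does not say those fields were derived this way.

```python
import numpy as np


def union_mask(segments):
    """Return an (H, W) bool mask that is True wherever any object is present."""
    segments = np.asarray(segments, dtype=bool)
    return segments.any(axis=0)


def overlap_count(segments):
    """Return an (H, W) integer map counting how many objects cover each pixel."""
    segments = np.asarray(segments, dtype=bool)
    return segments.sum(axis=0)


# Synthetic stand-in for one example's `segments` field: two overlapping squares.
demo = np.zeros((2, 512, 512), dtype=bool)
demo[0, 100:300, 100:300] = True
demo[1, 200:400, 200:400] = True

covered = union_mask(demo)    # True on pixels covered by at least one object
depth = overlap_count(demo)   # 2 inside the overlap, 1 elsewhere on the squares, 0 on background
```

With a real example, `example["segments"]` would take the place of `demo`, assuming the field decodes to something `np.asarray` accepts.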
### Data Splits
The train and validation splits include the layer and object masks; the test split does not.
## Dataset Creation
Generated by randomly inserting objects into an SVG.
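The generation step can be pictured with a small sketch like the one below. It is not the curator's generator: it uses only the Python standard library, and the palette, size ranges, and function names are illustrative assumptions. Only the 512x512 canvas comes from the card (via the mask shape).

```python
import random
import xml.etree.ElementTree as ET

CANVAS = 512  # matches the 512x512 masks described in the card
COLORS = ["red", "green", "blue", "orange", "purple"]  # illustrative palette, not the dataset's


def random_shape(rng):
    """Return one SVG element string: a circle or a square fully inside the canvas."""
    fill = rng.choice(COLORS)
    if rng.random() < 0.5:
        r = rng.randint(20, 100)
        cx = rng.randint(r, CANVAS - r)
        cy = rng.randint(r, CANVAS - r)
        return f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{fill}"/>'
    side = rng.randint(40, 200)
    x = rng.randint(0, CANVAS - side)
    y = rng.randint(0, CANVAS - side)
    return f'<rect x="{x}" y="{y}" width="{side}" height="{side}" fill="{fill}"/>'


def random_svg(num_shapes, seed=None):
    """Build an SVG document string containing `num_shapes` random shapes."""
    rng = random.Random(seed)
    shapes = "".join(random_shape(rng) for _ in range(num_shapes))
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{CANVAS}" height="{CANVAS}">{shapes}</svg>'
    )


svg_text = random_svg(num_shapes=4, seed=0)
root = ET.fromstring(svg_text)  # parses only if the markup is well-formed
```

Passing a fixed `seed` makes the output reproducible, which is convenient when regenerating a specific example.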
### Curation Rationale
Objects should have at least 50% of their bounding box visible, so that, for example, a big circle never completely obscures a little circle.
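One possible reading of this visibility rule is sketched below: for each object, measure the share of its bounding box that is not covered by objects drawn after it. This rests on two assumptions the card does not confirm: that each entry in `segments` is the full, unoccluded mask of its object, and that entries are stored in paint order (later entries on top). The function name `bbox_visible_fraction` is hypothetical.

```python
import numpy as np


def bbox_visible_fraction(segments, index):
    """Fraction of object `index`'s bounding box not covered by later objects.

    Assumes segments[i] is the full mask of object i and that higher
    indices are painted on top (both are assumptions, not card facts).
    """
    segments = np.asarray(segments, dtype=bool)
    rows = np.flatnonzero(segments[index].any(axis=1))
    cols = np.flatnonzero(segments[index].any(axis=0))
    if rows.size == 0:
        return 0.0  # empty mask: nothing is visible
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    # Pixels of the bounding box hidden by any object drawn later.
    covered = segments[index + 1:, top:bottom, left:right].any(axis=0)
    return 1.0 - float(covered.mean())


# Synthetic example: the second square hides one quarter of the first square's box.
demo = np.zeros((2, 512, 512), dtype=bool)
demo[0, 100:300, 100:300] = True
demo[1, 200:400, 200:400] = True
fraction = bbox_visible_fraction(demo, 0)
passes_rule = fraction >= 0.5
```

A generator could use a check like this as a rejection test, discarding a candidate object whenever it pushes an existing object below the 50% threshold.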
### Source Data
`/dev/urandom`
#### Initial Data Collection and Normalization
NA
#### Who are the source language producers?
NA
### Annotations
see [Data Fields](#data-fields)
#### Annotation process
see [Data Fields](#data-fields)
#### Who are the annotators?
ImageMagick/pysvg
### Personal and Sensitive Information
Unlikely
## Considerations for Using the Data
Please do not use for world domination.
### Social Impact of Dataset
NA
### Discussion of Biases
The dataset is highly biased against triangles and concave shapes.
### Other Known Limitations
Color selection is pretty limited.
## Additional Information
### Dataset Curators
[Aleks Clark](https://github.com/aleksclark)
### Licensing Information
CC-BY
### Citation Information
Link it I guess?
### Contributions
Thanks to [@aleksclark](https://github.com/aleksclark) for adding this dataset.
Provider:
eezy
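Summary
Background overview
This synthetic dataset contains 1,000 randomly generated SVG images per base domain, covering circles, squares, ellipses, and rectangles but no triangles or concave shapes. Each example provides the raw SVG string, a PNG rendering, and several masks, making it suitable for shape recognition and image processing tasks. The dataset is released under CC-BY and was created by Aleks Clark.
The above summary was collected and generated by 遇见数据集.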



