
eezy/basic_shapes_1000

Source: Hugging Face. Updated 2023-06-22. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/eezy/basic_shapes_1000
Resource description:
---
dataset_info:
- config_name: mixed
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 825420657
    num_examples: 3200
  - name: validation
    num_bytes: 103491703
    num_examples: 400
  - name: test
    num_bytes: 14362883
    num_examples: 400
  download_size: 79499715
  dataset_size: 943275243
- config_name: circles
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 202172900
    num_examples: 800
  - name: validation
    num_bytes: 25380696
    num_examples: 100
  - name: test
    num_bytes: 3587893
    num_examples: 100
  download_size: 28664837
  dataset_size: 231141489
- config_name: squares
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 209226435
    num_examples: 800
  - name: validation
    num_bytes: 26362720
    num_examples: 100
  - name: test
    num_bytes: 3590905
    num_examples: 100
  download_size: 10376213
  dataset_size: 239180060
- config_name: squares_and_circles
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 207141741
    num_examples: 800
  - name: validation
    num_bytes: 25735545
    num_examples: 100
  - name: test
    num_bytes: 3590235
    num_examples: 100
  download_size: 20138547
  dataset_size: 236467521
- config_name: scer
  features:
  - name: svg
    dtype: string
  - name: png
    dtype: image
  - name: layer_mask
    dtype: image
  - name: object_mask
    dtype: image
  - name: segments
    dtype:
      array3_d:
        shape:
        - -1
        - 512
        - 512
        dtype: bool
  splits:
  - name: train
    num_bytes: 206879581
    num_examples: 800
  - name: validation
    num_bytes: 26012748
    num_examples: 100
  - name: test
    num_bytes: 3593856
    num_examples: 100
  download_size: 20320118
  dataset_size: 236486185
---

# Dataset Card for BasicShapes1000

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://eezy.com

### Dataset Summary

This is a synthetic dataset containing randomly generated SVGs with various shapes.

### Supported Tasks and Leaderboards

NA

### Languages

NA

## Dataset Structure

The dataset is composed of 4 base domains, plus a 'mixed' domain that is a superset of the other 4:

* `circles` - only circles
* `squares` - only squares
* `squares_and_circles` - circles and squares present in the same svg
* `scer` - squares, circles, ellipses, and rectangles present in the same svg
* `mixed` - an aggregation of all of the above

### Data Instances

There's stuff there

### Data Fields

Each example has 5 fields:

* `svg` - the raw svg as a string
* `png` - a raster rendering of the svg with a white background
* `object_mask` - a black/white mask that defines the outlines of the svg objects
* `layer_mask` - a greyscale mask that defines layers of svg objects - overlap regions are brighter. Created by making all the objects white and semi-transparent
* `segments` - a numpy array of shape `(N,512,512)` with `dtype='bool'`, where N is the number of svg objects. The array holds one mask per object, with `True` in the area of the object

### Data Splits

Train and validation include the layer and object masks; test does not.

## Dataset Creation

Generated by randomly inserting objects into an SVG.

### Curation Rationale

Objects should have at least 50% of their bounding box visible - i.e. no big circle completely obscuring a little circle

### Source Data

`/dev/urandom`

#### Initial Data Collection and Normalization

NA

#### Who are the source language producers?

NA

### Annotations

See [Data Fields](#data-fields)

#### Annotation process

See [Data Fields](#data-fields)

#### Who are the annotators?

Imagemagick/pysvg

### Personal and Sensitive Information

Unlikely

## Considerations for Using the Data

Please do not use for world domination.

### Social Impact of Dataset

NA

### Discussion of Biases

Dataset is highly biased against triangles and concave shapes

### Other Known Limitations

Color selection is pretty limited.

## Additional Information

### Dataset Curators

[Aleks Clark](https://github.com/aleksclark)

### Licensing Information

CC-BY

### Citation Information

Link it I guess?

### Contributions

Thanks to [@aleksclark](https://github.com/aleksclark) for adding this dataset.
Provider:
eezy
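Summary of original information

Dataset Card for BasicShapes1000

Dataset description

Dataset summary

This is a synthetic dataset of randomly generated SVG files containing various shapes.

Supported tasks and leaderboards

NA

Languages

NA

Dataset structure

The dataset consists of 4 base domains plus a "mixed" domain that is a superset of the other 4:

  • circles - only circles
  • squares - only squares
  • squares_and_circles - circles and squares present in the same SVG
  • scer - squares, circles, ellipses, and rectangles present in the same SVG
  • mixed - an aggregation of all of the above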
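A minimal sketch of how the five configurations relate, using the split sizes listed in the `dataset_info` metadata on this page. The helper names `base_total` and `load_config` are illustrative and not part of the dataset. `load_config` assumes the Hugging Face `datasets` package and network access, and that the repository id matches the page title `eezy/basic_shapes_1000`.

```python
# Split sizes copied from the dataset_info metadata shown on this page.
SPLIT_SIZES = {
    "circles": {"train": 800, "validation": 100, "test": 100},
    "squares": {"train": 800, "validation": 100, "test": 100},
    "squares_and_circles": {"train": 800, "validation": 100, "test": 100},
    "scer": {"train": 800, "validation": 100, "test": 100},
    "mixed": {"train": 3200, "validation": 400, "test": 400},
}

BASE_CONFIGS = ("circles", "squares", "squares_and_circles", "scer")


def base_total(split: str) -> int:
    """Sum the example counts of the four base configs for one split."""
    return sum(SPLIT_SIZES[name][split] for name in BASE_CONFIGS)


def load_config(config_name: str, split: str = "train"):
    """Load one config from the Hub (hypothetical helper; needs network access)."""
    if config_name not in SPLIT_SIZES:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so the size check above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("eezy/basic_shapes_1000", config_name, split=split)


if __name__ == "__main__":
    for split in ("train", "validation", "test"):
        print(split, base_total(split), SPLIT_SIZES["mixed"][split])
```

The matching counts are consistent with `mixed` being the union of the four base configs, but they do not by themselves prove that the examples are identical.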

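Data instances

NA

Data fields

Each example contains the following fields:

  • svg - the raw SVG as a string
  • png - a raster rendering of the SVG on a white background
  • object_mask - a black/white mask that defines the outlines of the SVG objects
  • layer_mask - a greyscale mask that defines the layers of SVG objects - overlap regions are brighter. Created by making all the objects white and semi-transparent
  • segments - a numpy array of shape (N,512,512) with dtype=bool, where N is the number of SVG objects. The array holds one mask per object, with True in the area of the object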
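The relationship between the fields can be sketched with NumPy. This is an approximation under stated assumptions, not the generator's actual rendering code: it assumes `segments` holds the full silhouette of each object, treats the object mask as a filled silhouette, and models the layer mask as stacked white layers with a made-up opacity `alpha` on a black background. The function names are illustrative.

```python
import numpy as np


def object_mask_from_segments(segments: np.ndarray) -> np.ndarray:
    """True wherever at least one object covers the pixel."""
    return segments.any(axis=0)


def overlap_count(segments: np.ndarray) -> np.ndarray:
    """Number of objects covering each pixel."""
    return segments.sum(axis=0, dtype=np.int64)


def approx_layer_mask(segments: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Brightness in [0, 1] after stacking k white layers of opacity `alpha`.

    `alpha` is an assumed value; the card does not state the opacity used.
    """
    k = overlap_count(segments)
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # Two overlapping squares on a small 8x8 canvas (the dataset uses 512x512).
    seg = np.zeros((2, 8, 8), dtype=bool)
    seg[0, 0:4, 0:4] = True
    seg[1, 2:6, 2:6] = True
    print(object_mask_from_segments(seg).sum())
    print(approx_layer_mask(seg)[3, 3])
```

Under this model a pixel covered by more objects is brighter, which matches the description of `layer_mask` above.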

Data splits

The training and validation sets include the layer and object masks; the test set does not.
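Code that iterates over all splits may want to check for the masks before using them. The sketch below assumes each example behaves like a dictionary and that an absent mask shows up either as a missing key or as `None`; the card does not say which. The helper name `available_masks` is illustrative.

```python
MASK_FIELDS = ("layer_mask", "object_mask")


def available_masks(example: dict) -> list:
    """Return the names of the mask fields that are present and not None."""
    return [name for name in MASK_FIELDS if example.get(name) is not None]


if __name__ == "__main__":
    # Stand-in examples; real ones would carry image objects instead of strings.
    train_like = {"svg": "<svg/>", "layer_mask": "img", "object_mask": "img"}
    test_like = {"svg": "<svg/>", "layer_mask": None, "object_mask": None}
    print(available_masks(train_like))
    print(available_masks(test_like))
```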

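Dataset creation

Generated by randomly inserting objects into an SVG.

Curation rationale

Objects should have at least 50% of their bounding box visible - i.e. no large circle completely obscuring a small circle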
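One possible reading of this rule can be checked against the `segments` array. The sketch below is not the generator's code. It assumes that `segments` is ordered bottom to top (later objects are drawn over earlier ones), that each slice holds the object's full unoccluded shape, and that "visible" means the share of the bounding box not covered by any later object. All function names are illustrative.

```python
import numpy as np


def bbox(mask: np.ndarray):
    """Return (row_min, row_max, col_min, col_max), inclusive, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])


def visible_bbox_fraction(segments: np.ndarray, index: int) -> float:
    """Fraction of object `index`'s bounding box not covered by later objects."""
    box = bbox(segments[index])
    if box is None:
        return 0.0
    r0, r1, c0, c1 = box
    # Union of every object assumed to be drawn above this one, cropped to the box.
    above = segments[index + 1:, r0:r1 + 1, c0:c1 + 1].any(axis=0)
    return 1.0 - float(above.mean())


def passes_visibility_rule(segments: np.ndarray, threshold: float = 0.5) -> bool:
    """True if every object keeps at least `threshold` of its bounding box visible."""
    return all(
        visible_bbox_fraction(segments, i) >= threshold
        for i in range(len(segments))
    )


if __name__ == "__main__":
    # A small square fully hidden under a large one breaks the rule.
    seg = np.zeros((2, 8, 8), dtype=bool)
    seg[0, 2:4, 2:4] = True
    seg[1, 0:8, 0:8] = True
    print(passes_visibility_rule(seg))
```

A generator following this reading would reject a candidate placement whenever `passes_visibility_rule` returns `False` and draw a new random object instead.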

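Source data

/dev/urandom

Initial data collection and normalization

NA

Source language producers

NA

Annotations

See Data fields

Annotation process

See Data fields

Annotators

Imagemagick/pysvg

Personal and sensitive information

Unlikely

Considerations for using the data

Please do not use for world domination.

Social impact of dataset

NA

Discussion of biases

The dataset is highly biased against triangles and concave shapes.

Other known limitations

Color selection is pretty limited.

Additional information

Dataset curators

Aleks Clark

Licensing information

CC-BY

Citation information

Link it, I guess.

Contributions

Thanks to @aleksclark for adding this dataset.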

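Collected summary
Dataset introduction
Background and challenges
Background overview
This is a synthetic dataset of randomly generated SVG graphics, with 1,000 examples in each base configuration and 4,000 in the mixed configuration. It covers circles, squares, ellipses, and rectangles, and does not include triangles or concave shapes. Each example provides the raw SVG string, a PNG rendering, and several kinds of mask, which makes it a candidate for shape recognition and processing tasks. The dataset is released under a CC-BY license and was created by Aleks Clark.
The above content was collected and summarized by 遇见数据集.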