creative-graphic-design/PosterErase

Name: creative-graphic-design/PosterErase
Creator: creative-graphic-design
Published: 2023-11-19 14:43:14
License: No description provided

Hugging Face · Updated 2023-11-19 · Indexed 2024-06-22

Download link:

https://hf-mirror.com/datasets/creative-graphic-design/PosterErase


Resource overview:

---
annotations_creators:
- machine-generated
language:
- zh
language_creators:
- found
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
pretty_name: PosterErase
size_categories: []
source_datasets:
- original
tags:
- graphic design
task_categories:
- other
task_ids: []
---

# Dataset Card for PosterErase

[![CI](https://github.com/shunk031/huggingface-datasets_PosterErase/actions/workflows/ci.yaml/badge.svg)](https://github.com/shunk031/huggingface-datasets_PosterErase/actions/workflows/ci.yaml)

## Table of Contents

- [Dataset Card for PosterErase](#dataset-card-for-postererase)
  - [Table of Contents](#table-of-contents)
  - [Dataset Description](#dataset-description)
    - [Dataset Summary](#dataset-summary)
    - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
    - [Languages](#languages)
  - [Dataset Structure](#dataset-structure)
    - [Data Instances](#data-instances)
    - [Data Fields](#data-fields)
    - [Data Splits](#data-splits)
  - [Dataset Creation](#dataset-creation)
    - [Curation Rationale](#curation-rationale)
    - [Source Data](#source-data)
      - [Initial Data Collection and Normalization](#initial-data-collection-and-normalization)
      - [Who are the source language producers?](#who-are-the-source-language-producers)
    - [Annotations](#annotations)
      - [Annotation process](#annotation-process)
      - [Who are the annotators?](#who-are-the-annotators)
    - [Personal and Sensitive Information](#personal-and-sensitive-information)
  - [Considerations for Using the Data](#considerations-for-using-the-data)
    - [Social Impact of Dataset](#social-impact-of-dataset)
    - [Discussion of Biases](#discussion-of-biases)
    - [Other Known Limitations](#other-known-limitations)
  - [Additional Information](#additional-information)
    - [Dataset Curators](#dataset-curators)
    - [Licensing Information](#licensing-information)
    - [Citation Information](#citation-information)
    - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://github.com/alimama-creative/Self-supervised-Text-Erasing
- **Repository:** https://github.com/shunk031/huggingface-datasets_PosterErase
- **Paper (Preprint):** https://arxiv.org/abs/2204.12743
- **Paper (ACMMM2022):** https://dl.acm.org/doi/abs/10.1145/3503161.3547905

### Dataset Summary

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

The language data in PosterErase is in Chinese (BCP-47 zh).

## Dataset Structure

### Data Instances

To use the PosterErase dataset, first download it from [Alibaba Cloud](https://tianchi.aliyun.com/dataset/134810). Then place the downloaded files in the following structure and pass that directory as `data_dir`.

```
/path/to/datasets
├── erase_1.zip
├── erase_2.zip
├── erase_3.zip
├── erase_4.zip
├── erase_5.zip
└── erase_6.zip
```

```python
import datasets as ds

dataset = ds.load_dataset(
    path="shunk031/PosterErase",
    data_dir="/path/to/datasets/",
)
```

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

[More Information Needed]

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

You can find the following statement in [the license section](https://tianchi.aliyun.com/dataset/134810#license) of [the dataset distribution location](https://tianchi.aliyun.com/dataset/134810).

> The dataset is distributed under the CC BY-SA 4.0 license.

However, the license setting on that page appears to be set to [CC-BY-SA-NC 4.0](http://creativecommons.org/licenses/by-sa/4.0/?spm=a2c22.12282016.0.0.7abc5a92qnyxdR).

### Citation Information

```bibtex
@inproceedings{jiang2022self,
  title={Self-supervised text erasing with controllable image synthesis},
  author={Jiang, Gangwei and Wang, Shiyao and Ge, Tiezheng and Jiang, Yuning and Wei, Ying and Lian, Defu},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={1973--1983},
  year={2022}
}
```

### Contributions

Thanks to [alimama-creative](https://github.com/alimama-creative) for creating this dataset.


Provider:

creative-graphic-design

Summary of original information

Dataset card: PosterErase

Dataset description

Language: Chinese (BCP-47 zh)
License: CC BY-SA 4.0
Tags: graphic design
Task category: other

Dataset structure

Data instances

The dataset must be downloaded from Alibaba Cloud and placed in the following structure:

    /path/to/datasets
    ├── erase_1.zip
    ├── erase_2.zip
    ├── erase_3.zip
    ├── erase_4.zip
    ├── erase_5.zip
    └── erase_6.zip

Example code for loading the dataset:

    import datasets as ds

    dataset = ds.load_dataset(
        path="shunk031/PosterErase",
        data_dir="/path/to/datasets/",
    )

License information

The dataset is released under the CC BY-SA 4.0 license.

Citation information

    @inproceedings{jiang2022self,
      title={Self-supervised text erasing with controllable image synthesis},
      author={Jiang, Gangwei and Wang, Shiyao and Ge, Tiezheng and Jiang, Yuning and Wei, Ying and Lian, Defu},
      booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
      pages={1973--1983},
      year={2022}
    }

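Compiled summary

Dataset introduction

Construction

PosterErase was built to support research on text erasing in graphic design. Text regions in Chinese poster images were collected and annotated by machine generation, with the aim of providing benchmark data for self-supervised text erasing. The construction process is described as following academic standards to keep the data reliable and consistent, giving later algorithm development and evaluation a common basis.

Features

PosterErase is a monolingual Chinese dataset focused on erasing text from poster images, covering a variety of design styles and text layouts. It is released under the CC BY-SA 4.0 license, which supports academic sharing and collaboration. Its structure is clear and supports flexible data loading and processing, which makes it convenient for experiments.

Usage

To use PosterErase, download the compressed archives from the designated cloud platform and arrange them in the required directory structure. The dataset can then be loaded with the Hugging Face datasets library by pointing it at the local data path. It is suited to training and evaluating self-supervised text erasing models and serves research at the intersection of graphic design and computer vision.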
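
Since loading depends on the archives being in place, a quick layout check before calling `load_dataset` can save a confusing failure later. The following is a minimal sketch, assuming the six archive names from the directory listing in the dataset card; `missing_archives` and `check_layout` are hypothetical helper names introduced here for illustration and are not part of the dataset's loading script or of the datasets library.

```python
from pathlib import Path

# Archive names taken from the directory layout shown in the dataset card.
EXPECTED_ARCHIVES = [f"erase_{i}.zip" for i in range(1, 7)]


def missing_archives(data_dir):
    """Return the expected archive names that are not files in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_ARCHIVES if not (root / name).is_file()]


def check_layout(data_dir):
    """Hypothetical helper: raise FileNotFoundError if any archive is missing."""
    missing = missing_archives(data_dir)
    if missing:
        raise FileNotFoundError(f"{data_dir} is missing: {', '.join(missing)}")
```

Calling `check_layout("/path/to/datasets")` and then passing the same path as `data_dir` to `load_dataset` keeps the two steps in agreement about where the archives live.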

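Background and challenges

Background overview

PosterErase was built in 2022 by a research team at Alibaba Group, with a focus on text erasing in graphic design. It is intended to support controllable image synthesis within a self-supervised learning framework and to provide benchmark data for removing text elements from posters. By combining machine-generated annotations with real-world poster images, it has contributed to multimedia content editing research. The accompanying work was published at ACM Multimedia and has influenced work at the intersection of computer vision and graphic design.

Current challenges

The challenges fall into two groups. At the task level, text erasing must cope with diverse fonts, color blending, and texture interference on complex backgrounds while keeping the image visually coherent after the text is removed. At the construction level, data collection relies on real poster images and so must address copyright compliance and annotation consistency, and the self-supervised approach places high demands on the realism and controllability of the synthesized data.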

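Common scenarios

Classic usage scenarios

In visual content generation and editing, PosterErase supports self-supervised text erasing. With machine-generated annotations, it targets removing text elements from poster images while preserving the background content. Its typical use is training deep learning models, in particular generative adversarial networks (GANs) and diffusion models, for accurate text detection and erasing, providing data for research on image inpainting and content editing.

Derived related work

Work derived from PosterErase includes the self-supervised text erasing framework, which uses controllable image synthesis to improve the accuracy and naturalness of text removal and was published at ACM Multimedia 2022. Later studies have extended the dataset's use, for example to text-image alignment with multimodal learning and to more efficient real-time erasing algorithms.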

Recent research on the dataset