StorySalon
OpenDataLab | Updated 2026-05-24 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/HaoningWu/StorySalon
Resource description:
To address the data scarcity issue in open-ended visual story generation, we collected a large number of paired image-text sequences from multiple sources (online videos and 6 open-source electronic libraries), established a comprehensive data processing pipeline, and constructed a large-scale dataset named StorySalon with diverse characters, story plots and artistic styles.
Diverse Data Sources: We collected visual stories rich in characters, plotlines and artistic styles from videos (with download URLs provided) and open-source e-books (released under the CC-BY 4.0 license).
Data Processing Pipeline: We built a data processing pipeline with the following steps: visual frame extraction, duplicate frame filtering, abnormal frame detection, vision-language alignment, visual caption generation, text detection, and post-processing. It converts raw metadata into a format suitable for model training. As more metadata is collected, the pipeline can be readily applied to it, which allows the StorySalon dataset to be scaled up further.
Dataset Advantages: Compared with previous datasets, which contain fewer than 10 characters and have limited vocabulary and story length, our StorySalon dataset has a much larger vocabulary and includes thousands of characters across hundreds of categories, making it more suitable for open-ended tasks.
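The description above does not say how duplicate frame filtering is implemented. As a rough illustration of that one pipeline step only, the following sketch drops frames that are near-identical to the last kept frame by comparing simple average hashes. Everything in it is an assumption made for illustration: the `Frame` type, the function names, and the `max_distance` threshold are hypothetical and are not taken from the StorySalon code. It also assumes all frames share the same small resolution; a real pipeline would likely downscale frames first.

```python
from typing import List, Optional, Sequence

# Hypothetical representation: a grayscale frame as rows of 0-255 values.
Frame = Sequence[Sequence[int]]


def average_hash(frame: Frame) -> List[int]:
    """Return one bit per pixel: 1 if the pixel is brighter than the frame mean."""
    pixels = [value for row in frame for value in row]
    mean = sum(pixels) / len(pixels)
    return [1 if value > mean else 0 for value in pixels]


def hamming(a: List[int], b: List[int]) -> int:
    """Count positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def drop_duplicate_frames(frames: List[Frame], max_distance: int = 1) -> List[int]:
    """Return indices of frames kept after skipping near-duplicates.

    A frame is skipped when its hash differs from the hash of the last
    kept frame by at most `max_distance` bits (an assumed threshold).
    """
    kept: List[int] = []
    last_hash: Optional[List[int]] = None
    for index, frame in enumerate(frames):
        current = average_hash(frame)
        if last_hash is None or hamming(current, last_hash) > max_distance:
            kept.append(index)
            last_hash = current
    return kept
```

Comparing against the last kept frame, rather than the immediately preceding one, keeps slow gradual changes from slipping through as a chain of individually small differences.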
Provider:
HaoningWu
Created:
2024-03-11
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
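StorySalon is a public dataset for image generation conditioned on images and text, created by HaoningWu. It contains 19.4k files totaling 9.0 GB and is released under the MIT license.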
The above content was collected and summarized by 遇见数据集.



