mrzjy/AniGamePersonaCaps
Source: Hugging Face. Updated: 2024-12-16. Indexed: 2024-12-21.
Download link:
https://hf-mirror.com/datasets/mrzjy/AniGamePersonaCaps
Description:
AniGamePersonaCap is a multimodal dataset of 633,565 anime, manga, and game characters collected from 3,860 Fandom wiki sites. It covers two modalities: image and text. The image modality consists of pictures of the characters. The text modality consists of character metadata extracted from the HTML content, descriptions of visual appearance and inferred personality generated by Vision-Language Models, some human-written descriptions, and versions of the descriptions anonymized by GPT-4o-mini. Each entry is structured as metadata (character name, site name, URL, and so on) plus descriptions (appearance and personality). The data was collected by scraping character pages from Fandom sites, with the Qwen-VL model used to classify and filter the images. Processing steps include deduplication, classification, and HTML parsing. Potential applications include model performance comparison, hallucination analysis, model distillation, and fine-tuning of text-to-image models.
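The deduplication step mentioned above can be illustrated with a minimal sketch. It is not the author's pipeline: the record fields (`character_name`, `site_name`, `url`, `appearance`, `personality`) are hypothetical names modeled on the metadata listed in the description, the URL normalization rule is an assumption, and the sample records are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterRecord:
    # Hypothetical field names; the real dataset schema may differ.
    character_name: str
    site_name: str
    url: str
    appearance: str = ""
    personality: str = ""


def normalize_url(url: str) -> str:
    """Strip whitespace, any '#fragment', and trailing slashes (assumed rule)."""
    return url.strip().split("#", 1)[0].rstrip("/")


def deduplicate(records: list[CharacterRecord]) -> list[CharacterRecord]:
    """Keep the first record seen for each normalized page URL."""
    seen: set[str] = set()
    unique: list[CharacterRecord] = []
    for record in records:
        key = normalize_url(record.url)
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


# Made-up sample records for illustration only.
sample = [
    CharacterRecord("Alice", "example", "https://example.fandom.com/wiki/Alice"),
    CharacterRecord("Alice", "example", "https://example.fandom.com/wiki/Alice/"),
    CharacterRecord("Bob", "example", "https://example.fandom.com/wiki/Bob#Appearance"),
]
unique = deduplicate(sample)
print([r.character_name for r in unique])
```

Keying on the page URL rather than the character name is a design choice in this sketch: the same name can appear on many different wikis, while a normalized page URL identifies one character page.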
Provided by:
mrzjy



