Syn-Vis-v0

Name: Syn-Vis-v0
Creator: maas
Published: 2025-10-24 14:44:20
License: No description available

ModelScope community. Updated 2025-10-24; indexed 2025-11-03.

Download link:

https://modelscope.cn/datasets/retowyss/Syn-Vis-v0


Description:

# Syn-Vis-v0: A Dataset of Synthetic Faces

Syn-Vis-v0 (Synthetic Visage Version 0) is a dataset of 480 synthetic faces generated with Qwen-Image and Qwen-Image-Edit-2509.

![Face Samples](misc/banner-md.png)

- **Diversity**:
  - The dataset is balanced across ethnicities: approximately 60 images per broad category (Asian, Black, Hispanic, White, Indian, Middle Eastern) and 120 ethnically ambiguous images.
  - Wide range of skin-tones, facial features, hairstyles, hair colors, nose shapes, eye shapes, and eye colors.
- **Quality**:
  - Rendered at 2048x2048 resolution using Qwen-Image-Edit-2509 (BF16) and 50 steps.
  - Checked for artifacts, defects, and watermarks.
- **Style**: semi-realistic, 3d-rendered CGI, with hints of photography and painterly accents.
- **Captions**: Natural language descriptions consolidated from multiple caption sources using GPT-OSS-120B.
- **Metadata**: Each image is accompanied by ethnicity/race analysis scores (0-100) across six categories (Asian, Indian, Black, White, Middle Eastern, Latino Hispanic) generated using DeepFace.
- **Analysis Cards**: Each image has a corresponding analysis card showing similarity to other faces in the dataset.
- **Size**: 1.6GB for the 480 images, 0.7GB of misc files (analysis cards, banners, ...).

![Face Similarities](misc/embedding-distances.png)

![Analysis Card (001-0051)](misc/cards/001-0051.png)

## Dataset Structure

```txt
Syn-Vis-v0/
├── images/
│   └── base/              # Main dataset images
├── metadata.csv           # Root-level metadata file for Hugging Face preview
├── dataset_info.json      # Schema definition for image and metadata fields
├── misc/                  # Analysis cards, banners, etc.
└── README.md
```

## Metadata Format

The `metadata.csv` contains the following columns:

- `file_name`: Image filename (e.g., "001-0042.png")
- `caption`: Consolidated natural language description
- `race_asian`: Asian demographic score (0-100)
- `race_indian`: Indian demographic score (0-100)
- `race_black`: Black demographic score (0-100)
- `race_white`: White demographic score (0-100)
- `race_middle_eastern`: Middle Eastern demographic score (0-100)
- `race_latino_hispanic`: Latino Hispanic demographic score (0-100)
- `dominant_race`: Primary predicted demographic category

## Caption Quality

Captions are consolidated from multiple JoyCaption-Beta-One outputs using GPT-OSS-120B and feature:

- **Natural language**: Start with "The woman..." or "A woman..."
- **Specific descriptions**: Physical features described precisely rather than using broad demographic categories
- **Structured order**: Face → hair/physical features → clothing → background → lighting
- **Style-neutral**: Remove technical photography terms and medium references
- **Flowing narrative**: Natural sentences without section headers

Example caption:

> "The woman has a smooth medium‑brown complexion that catches a gentle, even glow. Her eyes are large, dark brown and framed by thick, dark lashes, giving them a calm, slightly serious look as she gazes directly forward..."

## Use Cases

- Raw training data for small models
- Base images for image-to-image generation tasks
- Base images for style transfer
- Whatever you want!
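The columns listed under Metadata Format can be read with Python's standard library alone. The following is a minimal sketch, not the author's tooling: the helper name `read_metadata` and the sample row (including its `dominant_race` value) are made up for illustration, and only the column names come from the README.

```python
import csv
import io

# The six score columns named in the Metadata Format section.
SCORE_COLUMNS = [
    "race_asian",
    "race_indian",
    "race_black",
    "race_white",
    "race_middle_eastern",
    "race_latino_hispanic",
]


def read_metadata(handle):
    """Parse metadata rows, converting the six score columns to floats."""
    rows = []
    for row in csv.DictReader(handle):
        for col in SCORE_COLUMNS:
            row[col] = float(row[col])
        rows.append(row)
    return rows


# Made-up sample row for illustration only; it is NOT taken from the dataset,
# and the spelling of the dominant_race value is an assumption.
SAMPLE = io.StringIO(
    "file_name,caption,race_asian,race_indian,race_black,race_white,"
    "race_middle_eastern,race_latino_hispanic,dominant_race\n"
    '001-0042.png,"The woman has ...",5,10,2,50,13,20,white\n'
)

rows = read_metadata(SAMPLE)
print(rows[0]["file_name"], rows[0]["race_white"])
```

To read the real file, pass `open("metadata.csv", newline="", encoding="utf-8")` instead of `SAMPLE`. The quoted caption field shows why a CSV parser is preferable to splitting on commas: captions are full sentences and contain commas themselves.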
## Statistics

- **Ethnicities** (by dominant race counts; some faces show similar scores across multiple categories):
  - White: 94 images
  - Latino Hispanic: 93 images
  - Asian: 90 images
  - Indian: 70 images
  - Black: 68 images
  - Middle Eastern: 65 images
- **Skin tones**: Full spectrum from very light to very dark
- **Facial features**: Wide variety of eye shapes, nose shapes, lip shapes
- **Hair styles**: Various textures, colors, and arrangements
- **Backgrounds**: Dark and light, plain and scenic
- **Ages**: Almost exclusively 30 ± 5 years according to analysis with DeepFace.

> Anecdotally, Asian, White, and Black were predicted with a single high score (85+) much more frequently than Latino Hispanic, Indian, and Middle Eastern.

## Ethical Considerations and Other Notes

- The ethnicity/race labels are generated by automated analysis and should not be considered ground truth for real-world applications involving human subjects. Their primary purpose is to ensure coverage of a wide range of facial features.
- Only female-presenting individuals are included. I decided against including male-presenting individuals because of **beards**: I didn't know how well the classifiers would handle them (obscured features), so I decided to avoid that complexity.
- All faces were explicitly declared female-presenting (in the prompt and the caption); however, DeepFace occasionally suggested some images may be male-presenting.
- The dataset has a strong beauty bias and the faces are unusually symmetrical.

## Creation Process

1. **Initial Image Generation**: Generated an initial set of 5,500 images at 768x768 using Qwen-Image (FP8). Facial features were randomly selected from lists and then written into natural prompts by Qwen3:30b-a3b. The style prompt was "Photo taken with telephoto lens (130mm), low ISO, high shutter speed".
2. **Initial Analysis & Captioning**: Each of the 5,500 images was captioned three times using JoyCaption-Beta-One. These initial captions were then consolidated using Qwen3:30b-a3b. Concurrently, demographic analysis was run using DeepFace.
3. **Selection**: A balanced subset of 480 images was selected based on the aggregated demographic scores and visual inspection.
4. **Enhancement**: Minor errors like faint watermarks and artifacts were manually corrected using GIMP.
5. **Upscaling & Refinement**: The selected images were upscaled to 2048x2048 using Qwen-Image-Edit-2509 (BF16) with 50 steps at a CFG of 4. The prompt guided the model to transform the style to a high-quality 3d-rendered CGI portrait while maintaining the original likeness and composition.
6. **Final Captioning**: To ensure captions accurately reflected the final, upscaled images and accounted for any minor perspective shifts, the 480 images were fully re-captioned. Each image was captioned three times with JoyCaption-Beta-One, and these were consolidated into a final, high-quality description using GPT-OSS-120B.
7. **Final Analysis**: Each final image was analyzed using DeepFace to generate the demographic scores and similarity analysis cards present in the dataset.

### Models and Tools Used

- **Qwen-Image**: Image Generation
- **Qwen-Image-Edit-2509**: Image Refinement/Upscaling
- **JoyCaption-Beta-One**: Captioning
- **Qwen3:30b-a3b**: Prompt Writing & Initial Caption Consolidation
- **GPT-OSS-120B**: Final Caption Consolidation
- **Tools**: vLLM, DeepFace, Python, R, GIMP, ComfyUI

## Projects That Use Syn-Vis-v0

- Coming soon!
- Your project here?

## Known Issues

- 001-0309: Appears to be wearing a mask, likely introduced during the image-to-image upscaling step.

## License

- **Images**: CC0 (Public Domain). Individual synthetic images are released to the public domain.
- **Dataset compilation, metadata, and documentation**: CC-BY-SA-4.0. This covers the curation work, analysis, and documentation.

You may use these images and this dataset for any purpose, including commercial use. If you use this dataset, I would appreciate attribution.

## Citation

```bibtex
@dataset{syn-vis-v0-2025,
  title={Syn-Vis-v0: A Synthetic Face Dataset},
  author={Wyss, Reto},
  year={2025},
  url={https://huggingface.co/datasets/retowyss/Syn-Vis-v0},
  note={Images: CC0 (Public Domain); Dataset compilation and documentation: CC-BY-SA-4.0}
}
```
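The dominant-category counts in the Statistics section, and the remark about single high scores (85+), can be reproduced from the six score columns along the following lines. This is a sketch under stated assumptions, not the author's analysis code: the helpers `top_category` and `summarize` and the two example rows are hypothetical, and the 85 threshold is taken from the anecdotal note rather than from any documented selection rule.

```python
from collections import Counter

SCORE_COLUMNS = [
    "race_asian",
    "race_indian",
    "race_black",
    "race_white",
    "race_middle_eastern",
    "race_latino_hispanic",
]

# Counts as published in the Statistics section of the README.
PUBLISHED_COUNTS = {
    "White": 94,
    "Latino Hispanic": 93,
    "Asian": 90,
    "Indian": 70,
    "Black": 68,
    "Middle Eastern": 65,
}


def top_category(row):
    """Return (column, score) for the highest of the six scores."""
    col = max(SCORE_COLUMNS, key=lambda c: row[c])
    return col, row[col]


def summarize(rows, threshold=85.0):
    """Tally top categories, and separately those at or above the threshold."""
    counts, confident = Counter(), Counter()
    for row in rows:
        col, score = top_category(row)
        counts[col] += 1
        if score >= threshold:
            confident[col] += 1
    return counts, confident


# Two made-up rows for illustration; they are NOT taken from the dataset.
example_rows = [
    {"race_asian": 92.0, "race_indian": 1.0, "race_black": 0.5,
     "race_white": 2.5, "race_middle_eastern": 1.0, "race_latino_hispanic": 3.0},
    {"race_asian": 3.0, "race_indian": 25.0, "race_black": 5.0,
     "race_white": 10.0, "race_middle_eastern": 22.0, "race_latino_hispanic": 35.0},
]

counts, confident = summarize(example_rows)
print(dict(counts))
print(dict(confident))
print(sum(PUBLISHED_COUNTS.values()))  # the six published counts cover all 480 images
```

The second example row illustrates the kind of face the README describes as showing similar scores across several categories: it still receives a top category, but it does not pass the high-score threshold.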


