
Syn-Vis-v0

ModelScope community, updated 2025-10-24, indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/retowyss/Syn-Vis-v0
Resource description:
# Syn-Vis-v0: A Dataset of Synthetic Faces

Syn-Vis-v0 (Synthetic Visage Version 0) is a dataset of 480 synthetic faces generated with Qwen-Image and Qwen-Image-Edit-2509.

![Face Samples](misc/banner-md.png)

- **Diversity**:
  - The dataset is balanced across ethnicities: approximately 60 images per broad category (Asian, Black, Hispanic, White, Indian, Middle Eastern), plus 120 ethnically ambiguous images.
  - Wide range of skin tones, facial features, hairstyles, hair colors, nose shapes, eye shapes, and eye colors.
- **Quality**:
  - Rendered at 2048x2048 resolution using Qwen-Image-Edit-2509 (BF16) with 50 steps.
  - Checked for artifacts, defects, and watermarks.
- **Style**: Semi-realistic, 3D-rendered CGI, with hints of photography and painterly accents.
- **Captions**: Natural language descriptions consolidated from multiple caption sources using GPT-OSS-120B.
- **Metadata**: Each image is accompanied by ethnicity/race analysis scores (0-100) across six categories (Asian, Indian, Black, White, Middle Eastern, Latino Hispanic), generated using DeepFace.
- **Analysis Cards**: Each image has a corresponding analysis card showing its similarity to other faces in the dataset.
- **Size**: 1.6GB for the 480 images, plus 0.7GB of misc files (analysis cards, banners, etc.).

![Face Similarities](misc/embedding-distances.png)

![Analysis Card (001-0051)](misc/cards/001-0051.png)

## Dataset Structure

```txt
Syn-Vis-v0/
├── images/
│   └── base/            # Main dataset images
├── metadata.csv         # Root-level metadata file for Hugging Face preview
├── dataset_info.json    # Schema definition for image and metadata fields
├── misc/                # Analysis cards, banners, etc.
└── README.md
```

## Metadata Format

The `metadata.csv` file contains the following columns:

- `file_name`: Image filename (e.g., "001-0042.png")
- `caption`: Consolidated natural language description
- `race_asian`: Asian demographic score (0-100)
- `race_indian`: Indian demographic score (0-100)
- `race_black`: Black demographic score (0-100)
- `race_white`: White demographic score (0-100)
- `race_middle_eastern`: Middle Eastern demographic score (0-100)
- `race_latino_hispanic`: Latino Hispanic demographic score (0-100)
- `dominant_race`: Primary predicted demographic category

## Caption Quality

Captions are consolidated from multiple JoyCaption-Beta-One outputs using GPT-OSS-120B and have the following properties:

- **Natural language**: They start with "The woman..." or "A woman..."
- **Specific descriptions**: Physical features are described precisely rather than with broad demographic categories
- **Structured order**: Face → hair/physical features → clothing → background → lighting
- **Style-neutral**: Technical photography terms and medium references are removed
- **Flowing narrative**: Natural sentences without section headers

Example caption:

> "The woman has a smooth medium‑brown complexion that catches a gentle, even glow. Her eyes are large, dark brown and framed by thick, dark lashes, giving them a calm, slightly serious look as she gazes directly forward..."

## Use Cases

- Raw training data for small models
- Base images for image-to-image generation tasks
- Base images for style transfer
- Whatever you want!
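The score columns listed under Metadata Format can be processed with Python's standard library alone. The following is a minimal sketch, assuming `metadata.csv` is UTF-8 encoded and has exactly the columns listed above. It derives each image's top-scoring category from the six `race_*` columns and counts how many of those top scores reach the 85-point mark from the anecdote in the Statistics section. The function names (`load_metadata`, `top_category`, `summarize`) are hypothetical. Because the exact string format of the `dominant_race` values is not documented, the sketch recomputes the top category from the scores instead of parsing that column.

```python
import csv
from collections import Counter

# The six score columns listed under "Metadata Format".
RACE_COLUMNS = [
    "race_asian",
    "race_indian",
    "race_black",
    "race_white",
    "race_middle_eastern",
    "race_latino_hispanic",
]


def load_metadata(path="metadata.csv"):
    """Read metadata.csv into a list of dicts; every value comes back as a string."""
    # Assumption: the file is UTF-8, since captions may contain non-ASCII characters.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def top_category(row):
    """Return (category, score) for the highest-scoring race column of one row."""
    scores = {column: float(row[column]) for column in RACE_COLUMNS}
    best = max(scores, key=scores.get)  # ties resolve to the earlier column
    return best[len("race_"):], scores[best]


def summarize(rows, threshold=85.0):
    """Count rows per top category, and how many of them reach `threshold`."""
    totals, confident = Counter(), Counter()
    for row in rows:
        category, score = top_category(row)
        totals[category] += 1
        if score >= threshold:
            confident[category] += 1
    return totals, confident
```

For instance, `totals, confident = summarize(load_metadata())` yields per-category counts. If `dominant_race` is simply the highest of the six scores, `totals` should reproduce the counts under Statistics, and `confident` offers a quick check of the 85+ observation.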
## Statistics

- **Ethnicities** (by dominant race counts; some faces show similar scores across multiple categories):
  - White: 94 images
  - Latino Hispanic: 93 images
  - Asian: 90 images
  - Indian: 70 images
  - Black: 68 images
  - Middle Eastern: 65 images
- **Skin tones**: Full spectrum from very light to very dark
- **Facial features**: Wide variety of eye shapes, nose shapes, and lip shapes
- **Hair styles**: Various textures, colors, and arrangements
- **Backgrounds**: Dark and light, plain and scenic
- **Ages**: Almost exclusively 30 ± 5 years, according to analysis with DeepFace

> Anecdotally, Asian, White, and Black were predicted with a single high score (85+) much more frequently than Latino Hispanic, Indian, and Middle Eastern.

## Ethical Considerations and Other Notes

- The ethnicity/race labels are generated by automated analysis and should not be considered ground truth for real-world applications involving human subjects. Their primary purpose is to ensure coverage of a wide range of facial features.
- Only female-presenting individuals are included. I decided against including male-presenting individuals because of **beards**: I didn't know how well the classifiers would handle the facial features they obscure, so I chose to avoid that complexity.
- All faces were explicitly declared female-presenting (in both the prompt and the caption); however, DeepFace occasionally suggested that some images may be male-presenting.
- The dataset has a strong beauty bias, and the faces are unusually symmetrical.

## Creation Process

1. **Initial Image Generation**: Generated an initial set of 5,500 images at 768x768 using Qwen-Image (FP8). Facial features were randomly selected from lists and then written into natural prompts by Qwen3:30b-a3b. The style prompt was "Photo taken with telephoto lens (130mm), low ISO, high shutter speed".
2. **Initial Analysis & Captioning**: Each of the 5,500 images was captioned three times using JoyCaption-Beta-One. These initial captions were then consolidated using Qwen3:30b-a3b. Concurrently, demographic analysis was run using DeepFace.
3. **Selection**: A balanced subset of 480 images was selected based on the aggregated demographic scores and visual inspection.
4. **Enhancement**: Minor errors such as faint watermarks and artifacts were manually corrected using GIMP.
5. **Upscaling & Refinement**: The selected images were upscaled to 2048x2048 using Qwen-Image-Edit-2509 (BF16) with 50 steps at a CFG of 4. The prompt guided the model to transform the style into a high-quality 3D-rendered CGI portrait while maintaining the original likeness and composition.
6. **Final Captioning**: To ensure the captions accurately reflected the final, upscaled images and accounted for any minor perspective shifts, all 480 images were fully re-captioned. Each image was captioned three times with JoyCaption-Beta-One, and these captions were consolidated into a final, high-quality description using GPT-OSS-120B.
7. **Final Analysis**: Each final image was analyzed using DeepFace to generate the demographic scores and the similarity analysis cards included in the dataset.

### Models and Tools Used

- **Qwen-Image**: Image Generation
- **Qwen-Image-Edit-2509**: Image Refinement/Upscaling
- **JoyCaption-Beta-One**: Captioning
- **Qwen3:30b-a3b**: Prompt Writing & Initial Caption Consolidation
- **GPT-OSS-120B**: Final Caption Consolidation
- **Tools**: vLLM, DeepFace, Python, R, GIMP, ComfyUI

## Projects That Use Syn-Vis-v0

- Coming soon!
- Your project here?

## Known Issues

- 001-0309: Appears to be wearing a mask, likely introduced during the image-to-image upscaling step.

## License

- **Images**: CC0 (Public Domain)
  - Individual synthetic images are released to the public domain
- **Dataset compilation, metadata, and documentation**: CC-BY-SA-4.0
  - The curation work, analysis, and documentation

You may use these images and this dataset for any purpose, including commercial use.
If you use this dataset, I would appreciate attribution.

## Citation

```bibtex
@dataset{syn-vis-v0-2025,
  title={Syn-Vis-v0: A Synthetic Face Dataset},
  author={Wyss, Reto},
  year={2025},
  url={https://huggingface.co/datasets/retowyss/Syn-Vis-v0},
  note={Images: CC0 (Public Domain); Dataset compilation and documentation: CC-BY-SA-4.0}
}
```
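The analysis cards and the embedding-distance figure compare each face with the rest of the dataset, but the card does not say which embedding model or metric produced them. As a minimal sketch, assume face embeddings have already been computed (for example with DeepFace's `represent` function) and stored in a dict keyed by file name. Under that assumption, a face's nearest neighbors by cosine similarity can be found as shown below. The names `cosine_similarity` and `most_similar`, the embeddings dict, and the choice of cosine similarity are all assumptions for illustration, not the author's documented method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(embeddings, query, k=3):
    """Return the k (file_name, similarity) pairs closest to `query`, best first.

    `embeddings` is a hypothetical dict mapping file names to embedding vectors.
    """
    target = embeddings[query]
    scored = [
        (name, cosine_similarity(target, vector))
        for name, vector in embeddings.items()
        if name != query  # skip comparing a face with itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

If the figure plots cosine distance rather than similarity, that distance is `1 - cosine_similarity(a, b)`. A pure-Python loop is enough for 480 faces. For larger sets, a vectorized matrix product would be the usual choice.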

Provided by: maas
Created: 2025-10-24