Syn-Vis-v0
ModelScope Community, updated 2025-10-24, indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/retowyss/Syn-Vis-v0
Resource description:
# Syn-Vis-v0: A Dataset of Synthetic Faces
Syn-Vis-v0 (Synthetic Visage Version 0) is a dataset of 480 synthetic faces generated with Qwen-Image and Qwen-Image-Edit-2509.

- **Diversity**:
  - The dataset is balanced across ethnicities: approximately 60 images per broad category (Asian, Black, Hispanic, White, Indian, Middle Eastern) and 120 ethnically ambiguous images.
  - Wide range of skin tones, facial features, hairstyles, hair colors, nose shapes, eye shapes, and eye colors.
- **Quality**:
  - Rendered at 2048x2048 resolution using Qwen-Image-Edit-2509 (BF16) with 50 sampling steps.
  - Checked for artifacts, defects, and watermarks.
- **Style**: Semi-realistic, 3D-rendered CGI with hints of photography and painterly accents.
- **Captions**: Natural-language descriptions consolidated from multiple JoyCaption-Beta-One captions using GPT-OSS-120B.
- **Metadata**: Each image is accompanied by ethnicity/race analysis scores (0-100) across six categories (Asian, Indian, Black, White, Middle Eastern, Latino Hispanic) generated using DeepFace.
- **Analysis Cards**: Each image has a corresponding analysis card showing similarity to other faces in the dataset.
- **Size**: 1.6 GB for the 480 images, plus 0.7 GB of miscellaneous files (analysis cards, banners, etc.).


## Dataset Structure
```txt
Syn-Vis-v0/
├── images/
│ └── base/ # Main dataset images
├── metadata.csv # Root-level metadata file for Hugging Face preview
├── dataset_info.json # Schema definition for image and metadata fields
├── misc/ # Analysis cards, banners, etc.
└── README.md
```
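A minimal sketch for reading `metadata.csv` and locating each image on disk, assuming the layout shown above. The card's example `file_name` is a bare name (`001-0042.png`) while the images live under `images/base/`, so this hypothetical helper `load_metadata` (not something shipped with the dataset) tries both the dataset root and `images/base/` and records `None` if neither exists.

```python
import csv
from pathlib import Path


def load_metadata(root: Path) -> list[dict]:
    """Read metadata.csv from the dataset root and attach a resolved image path.

    Assumption: `file_name` may be either a bare name such as "001-0042.png"
    or a path relative to the root (e.g. "images/base/001-0042.png"); both
    locations are tried.
    """
    rows = []
    with open(root / "metadata.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = row["file_name"]
            candidates = [root / name, root / "images" / "base" / name]
            # Keep the first candidate that exists on disk, otherwise None.
            row["image_path"] = next((p for p in candidates if p.is_file()), None)
            rows.append(row)
    return rows
```

Usage would look like `rows = load_metadata(Path("Syn-Vis-v0"))`; each row keeps every CSV column as a string, so score columns need `float()` before any arithmetic.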
## Metadata Format
The `metadata.csv` contains the following columns:
- `file_name`: Image filename (e.g., "001-0042.png")
- `caption`: Consolidated natural language description
- `race_asian`: Asian demographic score (0-100)
- `race_indian`: Indian demographic score (0-100)
- `race_black`: Black demographic score (0-100)
- `race_white`: White demographic score (0-100)
- `race_middle_eastern`: Middle Eastern demographic score (0-100)
- `race_latino_hispanic`: Latino Hispanic demographic score (0-100)
- `dominant_race`: Primary predicted demographic category
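The six `race_*` columns line up with the six race labels that DeepFace reports. The sketch below shows one plausible way such a row could be assembled from DeepFace-style output; the function name `to_metadata_row` is hypothetical, and storing `dominant_race` with DeepFace's lowercase label spelling is an assumption, since the card does not say how that column is spelled.

```python
# The six DeepFace race labels (assumed to match the keys of the "race" dict
# returned by DeepFace.analyze(..., actions=["race"])).
DEEPFACE_RACES = ["asian", "indian", "black", "white", "middle eastern", "latino hispanic"]


def to_metadata_row(file_name: str, caption: str, race_scores: dict[str, float]) -> dict:
    """Flatten a DeepFace-style race score dict into the metadata.csv columns."""
    row = {"file_name": file_name, "caption": caption}
    for label in DEEPFACE_RACES:
        # "middle eastern" -> "race_middle_eastern", etc.
        row["race_" + label.replace(" ", "_")] = round(float(race_scores[label]), 2)
    # dominant_race is the label with the highest score (first one wins on ties).
    row["dominant_race"] = max(DEEPFACE_RACES, key=lambda k: race_scores[k])
    return row
```

With DeepFace installed, `race_scores` would typically come from `DeepFace.analyze(img_path=..., actions=["race"])[0]["race"]`; recent DeepFace versions return a list with one dict per detected face.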
## Caption Quality
Captions are consolidated from multiple JoyCaption-Beta-One outputs using GPT-OSS-120B and feature:
- **Natural language**: Start with "The woman..." or "A woman..."
- **Specific descriptions**: Physical features described precisely rather than using broad demographic categories
- **Structured order**: Face → hair/physical features → clothing → background → lighting
- **Style-neutral**: Technical photography terms and medium references are removed
- **Flowing narrative**: Natural sentences without section headers
Example caption:
> "The woman has a smooth medium‑brown complexion that catches a gentle, even glow. Her eyes are large, dark brown and framed by thick, dark lashes, giving them a calm, slightly serious look as she gazes directly forward..."
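The caption rules above are easy to spot-check automatically. This is a rough sketch, not the author's tooling: `check_caption` is a hypothetical helper, and `BANNED_TERMS` is an invented example list, because the card says photography and medium terms were removed but does not publish the exact terms.

```python
import re

# Hypothetical list of photography/medium terms; the exact list used for
# the dataset is not published.
BANNED_TERMS = ["photo", "photograph", "lens", "iso", "shutter", "render", "cgi", "3d"]


def check_caption(caption: str) -> list[str]:
    """Return a list of style problems found in one caption (empty if none)."""
    problems = []
    if not caption.startswith(("The woman", "A woman")):
        problems.append("does not start with 'The woman' or 'A woman'")
    for term in BANNED_TERMS:
        # Whole-word, case-insensitive match so "iso" does not hit "isolated".
        if re.search(rf"\b{re.escape(term)}\b", caption, flags=re.IGNORECASE):
            problems.append(f"contains medium/photography term: {term!r}")
    # A line like "Hair: ..." suggests a section header, which breaks the
    # flowing-narrative rule.
    if re.search(r"^\s*[A-Z][a-z]+:\s", caption, flags=re.MULTILINE):
        problems.append("looks like it contains a section header")
    return problems
```

Whole-word matching keeps false positives low (for example, "telephoto" does not trigger `photo`), at the cost of missing inflected forms such as "photographed".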
## Use Cases
- Raw training data for small models
- Base images for image-to-image generation tasks
- Base images for style transfer
- Whatever you want!
## Statistics
- **Ethnicities** (counts by `dominant_race`; some faces score similarly across multiple categories):
  - White: 94 images
  - Latino Hispanic: 93 images
  - Asian: 90 images
  - Indian: 70 images
  - Black: 68 images
  - Middle Eastern: 65 images
- **Skin tones**: Full spectrum from very light to very dark
- **Facial features**: Wide variety of eye shapes, nose shapes, lip shapes
- **Hair styles**: Various textures, colors, and arrangements
- **Backgrounds**: Dark and light, plain and scenic
- **Ages**: Almost exclusively 30 ± 5 years according to analysis with DeepFace.
> Anecdotally, Asian, White, and Black were predicted with a single high score (85+) much more frequently than Latino Hispanic, Indian, and Middle Eastern.
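The per-category counts and the "single high score (85+)" observation can both be recomputed from `metadata.csv`. The sketch below assumes rows shaped like `csv.DictReader` output (string values); `dominant_race_stats` is a hypothetical helper, and "single high score" is interpreted here as "the top race score is at least 85". On the full dataset the counts should sum to 480.

```python
from collections import Counter

RACE_COLUMNS = ["race_asian", "race_indian", "race_black", "race_white",
                "race_middle_eastern", "race_latino_hispanic"]


def dominant_race_stats(rows: list[dict], high: float = 85.0) -> dict[str, tuple[int, float]]:
    """Per dominant_race: (image count, fraction whose top score is >= `high`)."""
    counts = Counter()
    high_counts = Counter()
    for row in rows:
        label = row["dominant_race"]
        # Values may be strings when read from CSV, so convert before comparing.
        top = max(float(row[c]) for c in RACE_COLUMNS)
        counts[label] += 1
        if top >= high:
            high_counts[label] += 1
    # Ordered from most to least frequent category.
    return {label: (n, high_counts[label] / n) for label, n in counts.most_common()}
```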
## Ethical Considerations and Other Notes
- The ethnicity/race labels are generated by automated analysis and should not be considered ground truth for real-world applications involving human subjects. Their primary purpose is to ensure coverage of a wide range of facial features.
- Only female-presenting individuals are included. I decided against including male-presenting individuals because of **beards**: I didn't know how well the classifiers would handle the features they obscure, so I avoided that complexity.
- All faces were explicitly specified as female-presenting (in both the prompt and the caption); however, DeepFace occasionally suggested that some images may be male-presenting.
- The dataset has a strong beauty bias and the faces are unusually symmetrical.
## Creation Process
1. **Initial Image Generation**: Generated an initial set of 5,500 images at 768x768 using Qwen-Image (FP8). Facial features were randomly selected from lists and then written into natural prompts by Qwen3:30b-a3b. The style prompt was "Photo taken with telephoto lens (130mm), low ISO, high shutter speed".
2. **Initial Analysis & Captioning**: Each of the 5,500 images was captioned three times using JoyCaption-Beta-One. These initial captions were then consolidated using Qwen3:30b-a3b. Concurrently, demographic analysis was run using DeepFace.
3. **Selection**: A balanced subset of 480 images was selected based on the aggregated demographic scores and visual inspection.
4. **Enhancement**: Minor errors like faint watermarks and artifacts were manually corrected using GIMP.
5. **Upscaling & Refinement**: The selected images were upscaled to 2048x2048 using Qwen-Image-Edit-2509 (BF16) with 50 steps at a CFG scale of 4. The prompt guided the model to transform the style into a high-quality 3D-rendered CGI portrait while preserving the original likeness and composition.
6. **Final Captioning**: To ensure captions accurately reflected the final, upscaled images and accounted for any minor perspective shifts, the 480 images were fully re-captioned. Each image was captioned three times with JoyCaption-Beta-One, and these were consolidated into a final, high-quality description using GPT-OSS-120B.
7. **Final Analysis**: Each final image was analyzed using DeepFace to generate the demographic scores and similarity analysis cards present in the dataset.
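Step 3 states the target balance but not the exact selection rule. The sketch below is one hypothetical way to draw such a subset from scored candidates: each candidate goes to the category with its highest score, or to an "ambiguous" pool when that score falls below `ambiguous_below`, and each pool is then sampled down to its quota. The default quotas (60 per category and 120 ambiguous, so 6 × 60 + 120 = 480) come from this card; the threshold of 60 is an invented placeholder, and the visual inspection the author also relied on is not modeled.

```python
import random

# Suffixes of the race_* columns in metadata.csv.
CATEGORIES = ["asian", "indian", "black", "white", "middle_eastern", "latino_hispanic"]


def select_balanced(rows, per_category=60, n_ambiguous=120, ambiguous_below=60.0, seed=0):
    """Pick a demographically balanced subset from scored candidate rows.

    Hypothetical rule: a row is 'ambiguous' if its highest race score is below
    `ambiguous_below`; otherwise it belongs to the category with that score.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    pools = {c: [] for c in CATEGORIES}
    pools["ambiguous"] = []
    for row in rows:
        scores = {c: float(row["race_" + c]) for c in CATEGORIES}
        best = max(CATEGORIES, key=scores.__getitem__)
        key = best if scores[best] >= ambiguous_below else "ambiguous"
        pools[key].append(row)
    selected = []
    for key, pool in pools.items():
        quota = n_ambiguous if key == "ambiguous" else per_category
        rng.shuffle(pool)  # random choice within each pool
        selected.extend(pool[:quota])
    return selected
```

If a pool has fewer candidates than its quota, the sketch simply takes all of them, which is why an initial pool much larger than 480 (5,500 images in this case) makes balancing practical.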
### Models and Tools Used
- **Qwen-Image**: Image Generation
- **Qwen-Image-Edit-2509**: Image Refinement/Upscaling
- **JoyCaption-Beta-One**: Captioning
- **Qwen3:30b-a3b**: Prompt Writing & Initial Caption Consolidation
- **GPT-OSS-120B**: Final Caption Consolidation
- **Tools**: vLLM, DeepFace, Python, R, GIMP, ComfyUI
## Projects That Use Syn-Vis-v0
- Coming soon!
- Your project here?
## Known Issues
- 001-0309: The subject appears to be wearing a mask, likely introduced during the image-to-image upscaling step.
## License
- **Images**: CC0 (Public Domain). Individual synthetic images are released into the public domain.
- **Dataset compilation, metadata, and documentation**: CC-BY-SA-4.0. Covers the curation work, analysis, and documentation.
You may use these images and this dataset for any purpose, including commercial use. If you use this dataset, I would appreciate attribution.
## Citation
```bibtex
@dataset{syn-vis-v0-2025,
title={Syn-Vis-v0: A Synthetic Face Dataset},
author={Wyss, Reto},
year={2025},
url={https://huggingface.co/datasets/retowyss/Syn-Vis-v0},
note={Images: CC0 (Public Domain); Dataset compilation and documentation: CC-BY-SA-4.0}
}
```
Provider:
maas
Created:
2025-10-24



