vThejas/FineBLEND

Name: vThejas/FineBLEND
Creator: vThejas
Published: 2026-04-09 21:31:32
License: CC BY 4.0 (per the dataset card)

Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/vThejas/FineBLEND


Description:

---
license: cc-by-4.0
task_categories:
- image-to-text
- text-to-image
language:
- en
size_categories:
- 1K<n<10K
tags:
- synthetic
- blender
- path-tracing
- image-caption
- diffusion
configs:
- config_name: default
  data_files:
  - split: train
    path: train/**
  - split: validation
    path: val/**
  - split: test
    path: test/**
---

# FineBLEND

A curated dataset of **7,500 path-traced image-caption pairs** rendered from 8 diverse Blender 3D scenes using the BlendFusion pipeline.

## Overview

FineBLEND is constructed by rendering object-centric views from open-source 3D scenes using Blender's Cycles path tracer at 256x256 resolution. The pipeline applies multi-stage filtering (heuristic + VLM-based) and diversity-aware sampling to produce high-quality, visually diverse image-caption pairs suitable for training or evaluating diffusion models.

**Scenes**: Barbershop, Bistro Interior/Exterior, Classroom, Emerald Square, Pavilion, Sun Temple, City Scene

## Dataset Structure

| Split | Images |
|-------|--------|
| train | 4,500 |
| val | 1,500 |
| test | 1,500 |

### Columns

| Column | Description |
|--------|-------------|
| `file_name` | PNG image filename |
| `caption` | VLM-generated description (Qwen3-VL-8B-Instruct) |
| `clip_score` | CLIP image-text alignment score |
| `aesthetic_score` | LAION aesthetic predictor score |
| `mean_brightness` | Mean pixel intensity (0-255) |
| `pixel_variance` | Grayscale pixel variance |
| `dark_fraction` | Fraction of dark pixels |

## Quality Metrics

| Metric | Value |
|--------|-------|
| Mean CLIPScore | 25.91 ± 3.37 |
| Mean Aesthetic Score | 4.52 ± 0.86 |

## Pipeline

1. **Object-centric camera placement**: cameras orbit each mesh object at 8 azimuths, with fixed elevation and adaptive distance for consistent framing
2. **Heuristic filtering**: removes zero-fill, low-brightness, low-variance, and high-dark-fraction renders
3. **VLM-based filtering**: Qwen3-VL-8B-Instruct rejects uncaptionable images (extreme close-ups, truncations, ambiguous content)
4. **Caption generation**: factual, grounded descriptions from the same VLM
5. **Quality filtering**: CLIPScore and aesthetic score thresholds
6. **Diversity-aware sampling**: embedding-space deduplication to maximize visual diversity

## Citation

If you use this dataset, please cite the BlendFusion paper.
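The heuristic filtering step and the `mean_brightness`, `pixel_variance`, and `dark_fraction` columns can be illustrated with a minimal sketch. The card does not publish the thresholds or the definition of a "dark" pixel, so every constant below is a hypothetical placeholder, and the function names are illustrative rather than part of the BlendFusion pipeline. The input is assumed to be a flat list of grayscale values in the range 0-255.

```python
from statistics import fmean, pvariance

# Hypothetical thresholds: the dataset card does not state the real values.
DARK_PIXEL_LEVEL = 30        # a pixel at or below this intensity counts as "dark"
MIN_MEAN_BRIGHTNESS = 20.0   # reject renders darker than this on average
MIN_PIXEL_VARIANCE = 50.0    # reject nearly uniform renders
MAX_DARK_FRACTION = 0.8      # reject renders that are mostly dark pixels


def render_stats(gray_pixels):
    """Compute the three per-image statistics from a flat list of 0-255 values."""
    dark_count = sum(p <= DARK_PIXEL_LEVEL for p in gray_pixels)
    return {
        "mean_brightness": fmean(gray_pixels),
        "pixel_variance": pvariance(gray_pixels),
        "dark_fraction": dark_count / len(gray_pixels),
    }


def passes_heuristics(gray_pixels):
    """Return True if a render survives the (assumed) heuristic filter."""
    if not any(gray_pixels):  # zero-fill render: every pixel is 0
        return False
    stats = render_stats(gray_pixels)
    return (
        stats["mean_brightness"] >= MIN_MEAN_BRIGHTNESS
        and stats["pixel_variance"] >= MIN_PIXEL_VARIANCE
        and stats["dark_fraction"] <= MAX_DARK_FRACTION
    )
```

In practice the pixel list would come from decoding each 256x256 PNG to grayscale; that decoding step is left out here to keep the sketch dependency-free.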

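The diversity-aware sampling step is described only as "embedding-space deduplication", so the exact algorithm is not known. One common way to realize the idea is a greedy pass that keeps an image only if its embedding is not too similar to any image already kept. The sketch below assumes cosine similarity and a hypothetical `max_similarity` threshold of 0.95; neither is confirmed by the card.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(embeddings, max_similarity=0.95):
    """Greedy near-duplicate removal (an assumed stand-in for the real sampler).

    Walks the embeddings in order and keeps index i only if its similarity to
    every previously kept embedding is at most max_similarity.
    Returns the list of kept indices.
    """
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) <= max_similarity for j in kept):
            kept.append(i)
    return kept


# Usage: the second vector nearly duplicates the first, so only indices 0 and 2 survive.
example = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
kept_indices = deduplicate(example)
```

This greedy pass is quadratic in the number of kept items, which is workable at the scale of a few thousand images; a larger pool would typically call for an approximate nearest-neighbor index instead.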

Provider:

vThejas
