himanshu1257/industrial-defect-dataset
Source: Hugging Face. Updated: 2026-04-01. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/himanshu1257/industrial-defect-dataset
Resource description:
---
license: mit
task_categories:
- image-classification
tags:
- synthetic
- computer-vision
- manufacturing
- defect-detection
- stable-diffusion
- sdxl
size_categories:
- 1K<n<10K
---
# Synthetic Industrial Material Defect Dataset (5k)
## Dataset Summary
This dataset contains **5,000 highly detailed, synthetically generated images** of industrial material surfaces, covering five material types and five surface conditions (four defect types plus a defect-free `normal` class). It is designed for training computer vision models for quality control, manufacturing defect detection, and surface anomaly recognition.
All images were generated using **Stable Diffusion XL (SDXL)** to simulate macro photography under industrial lighting conditions, providing a diverse, noise-free, and highly controlled environment for ML training.
## Dataset Structure
The dataset follows a standard `ImageFolder` structure with an accompanying `metadata.csv` file, making it immediately ready for libraries like `datasets`, PyTorch, and TensorFlow.
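As a rough illustration of the `ImageFolder` convention, the sketch below walks a root directory and takes each image's label from its parent folder name. It uses only the Python standard library; `scan_image_folder` is a hypothetical helper, and the demo file names are placeholders created in a temporary directory, not files confirmed to exist in the dataset.

```python
import tempfile
from pathlib import Path


def scan_image_folder(root):
    """Return sorted (relative_path, label) pairs for an ImageFolder layout.

    The label is the name of the folder that directly contains the image.
    """
    root = Path(root)
    pairs = []
    for path in sorted(root.glob("*/*.png")):
        pairs.append((path.relative_to(root).as_posix(), path.parent.name))
    return pairs


# Demo on a throwaway directory with empty placeholder files (not real images).
with tempfile.TemporaryDirectory() as tmp:
    for rel in (
        "crack/brushed_metal_20260401_0.png",
        "normal/concrete_20260401_0.png",  # hypothetical file name
    ):
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    demo_pairs = scan_image_folder(tmp)

print(demo_pairs)
```

Files in the root itself, such as `metadata.csv`, are skipped because the glob pattern only matches one folder level down.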
### Classes (Defect Types)
The dataset is perfectly balanced across 5 primary categories (1,000 images per category):
1. `normal`: Perfect, flawless, pristine condition.
2. `scratch`: Deep scratches, surface gouges.
3. `crack`: Hairline cracks, fractured surfaces.
4. `stain`: Oil spills, dark stains, discoloration.
5. `dent`: Impact dents, warped surfaces.
### Material Types
To allow for joint defect-and-material (multi-task) classification or material-specific fine-tuning, each defect category contains images spanning 5 distinct material types (200 images per material/defect combination):
* Brushed Metal
* Ceramic Tile
* Industrial Fabric
* Concrete
* Polished Wood
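The class and material counts stated in this card can be checked with a few lines of arithmetic. The sketch below simply enumerates the 5 x 5 grid of defect and material combinations; the variable names are illustrative.

```python
from itertools import product

DEFECTS = ["normal", "scratch", "crack", "stain", "dent"]
MATERIALS = [
    "Brushed Metal",
    "Ceramic Tile",
    "Industrial Fabric",
    "Concrete",
    "Polished Wood",
]
IMAGES_PER_COMBINATION = 200  # stated in the card

# Every (defect, material) pair is one cell of the balanced grid.
combinations = list(product(DEFECTS, MATERIALS))
images_per_defect = IMAGES_PER_COMBINATION * len(MATERIALS)
total_images = IMAGES_PER_COMBINATION * len(combinations)

print(len(combinations), images_per_defect, total_images)
```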
## Data Fields
The included `metadata.csv` contains the following fields to assist with advanced filtering:
* **`file_name`**: The relative path to the image (e.g., `crack/brushed_metal_20260401_0.png`).
* **`label`**: The type of defect (Target variable for classification).
* **`material`**: The specific material featured in the image.
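A minimal sketch of filtering on these fields with the standard library is shown below. The sample rows are made up for illustration, and the snake_case spelling of the `material` values (for example `brushed_metal`) is an assumption inferred from the example file name; check the real `metadata.csv` before relying on it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file has one row per image.
SAMPLE_METADATA = """file_name,label,material
crack/brushed_metal_20260401_0.png,crack,brushed_metal
dent/brushed_metal_20260401_1.png,dent,brushed_metal
stain/concrete_20260401_2.png,stain,concrete
"""


def load_rows(handle):
    """Read metadata rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def filter_by_material(rows, material):
    """Keep only the rows whose `material` field matches exactly."""
    return [row for row in rows if row["material"] == material]


rows = load_rows(io.StringIO(SAMPLE_METADATA))
metal_rows = filter_by_material(rows, "brushed_metal")
label_counts = Counter(row["label"] for row in metal_rows)

print(label_counts)
```

To run this against the real file, pass `open("metadata.csv", newline="")` to `load_rows` instead of the in-memory sample.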
## Generation Details
The dataset was programmatically generated using `diffusers` and the SDXL pipeline.
* **Base Prompting:** "Macro photography of [Material] surface, [Defect Description], highly detailed, 8k resolution, close up industrial lighting"
* **Negative Prompting:** "blurry, out of focus, distorted, cartoon, 3d render, watermark"
* **Inference Steps:** 20
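The prompt template above can be filled in programmatically. The following sketch only builds the prompt strings and does not run SDXL; the `DEFECT_DESCRIPTIONS` mapping is an assumption that reuses the wording of the class list, and the author's actual defect phrases and generation script are not given in this card.

```python
BASE_TEMPLATE = (
    "Macro photography of {material} surface, {defect}, "
    "highly detailed, 8k resolution, close up industrial lighting"
)
NEGATIVE_PROMPT = "blurry, out of focus, distorted, cartoon, 3d render, watermark"
NUM_INFERENCE_STEPS = 20

# Assumption: one description per class, borrowed from the class list wording.
DEFECT_DESCRIPTIONS = {
    "normal": "perfect, flawless, pristine condition",
    "scratch": "deep scratches, surface gouges",
    "crack": "hairline cracks, fractured surfaces",
    "stain": "oil spills, dark stains, discoloration",
    "dent": "impact dents, warped surfaces",
}


def build_prompt(material, label):
    """Fill the base template for one material and one defect class."""
    return BASE_TEMPLATE.format(
        material=material, defect=DEFECT_DESCRIPTIONS[label]
    )


example_prompt = build_prompt("Concrete", "crack")
print(example_prompt)
```

In a `diffusers` SDXL pipeline call, these values would typically be passed as the `prompt`, `negative_prompt`, and `num_inference_steps` arguments.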
## Intended Uses
* **Image Classification:** Training CNNs (ResNet, EfficientNet) or Vision Transformers (ViT) to categorize material states.
* **Anomaly Detection:** Using the `normal` class to train autoencoders or one-class SVMs to detect out-of-distribution anomalous regions.
* **Synthetic Data Research:** Evaluating the efficacy of SDXL-generated data transferred to real-world industrial computer vision tasks.
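For the anomaly detection use case, one possible data split trains on `normal` images only and evaluates on held-out normals plus all defect images. The sketch below shows that split on made-up rows; `one_class_split`, the `holdout_every` parameter, and the file names are hypothetical, and the model itself (autoencoder or one-class SVM) is left out.

```python
def one_class_split(rows, holdout_every=5):
    """Split rows for one-class training.

    Returns (train_files, eval_pairs): train_files holds normal images only,
    eval_pairs holds (file_name, is_anomaly) with 0 for normal and 1 for defect.
    Every `holdout_every`-th normal image is held out for evaluation.
    """
    train_files, eval_pairs = [], []
    normal_seen = 0
    for row in rows:
        if row["label"] == "normal":
            normal_seen += 1
            if normal_seen % holdout_every == 0:
                eval_pairs.append((row["file_name"], 0))
            else:
                train_files.append(row["file_name"])
        else:
            eval_pairs.append((row["file_name"], 1))
    return train_files, eval_pairs


# Hypothetical rows in the same shape as metadata.csv entries.
demo_rows = [
    {"file_name": f"normal/img_{i}.png", "label": "normal"} for i in range(10)
]
demo_rows += [
    {"file_name": f"crack/img_{i}.png", "label": "crack"} for i in range(3)
]
train_files, eval_pairs = one_class_split(demo_rows)

print(len(train_files), len(eval_pairs))
```

Holding out some normal images keeps the evaluation set from containing only anomalies, which would make false-positive rates impossible to measure.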
Provider:
himanshu1257



