
himanshu1257/industrial-defect-dataset

Source: Hugging Face. Updated 2026-04-01. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/himanshu1257/industrial-defect-dataset
Resource summary:
---
license: mit
task_categories:
- image-classification
tags:
- synthetic
- computer-vision
- manufacturing
- defect-detection
- stable-diffusion
- sdxl
size_categories:
- 1K<n<10K
---

# Synthetic Industrial Material Defect Dataset (10k)

## Dataset Summary

This dataset contains **5,000 highly detailed, synthetically generated images** of various industrial materials exhibiting different types of surface defects. It is designed to be used for training machine learning models in computer vision, specifically for quality control, manufacturing defect detection, and surface anomaly recognition.

All images were generated using **Stable Diffusion XL (SDXL)** to simulate macro photography under industrial lighting conditions, providing a diverse, noise-free, and highly controlled environment for ML training.

## Dataset Structure

The dataset follows a standard `ImageFolder` structure with an accompanying `metadata.csv` file, making it immediately ready for libraries like `datasets`, PyTorch, and TensorFlow.

### Classes (Defect Types)

The dataset is perfectly balanced across 5 primary categories (1,000 images per category):

1. `normal`: Perfect, flawless, pristine condition.
2. `scratch`: Deep scratches, surface gouges.
3. `crack`: Hairline cracks, fractured surfaces.
4. `stain`: Oil spills, dark stains, discoloration.
5. `dent`: Impact dents, warped surfaces.

### Material Types

To allow for multi-label classification or material-specific fine-tuning, each defect category contains images spanning 5 distinct material types (200 images per material/defect combination):

* Brushed Metal
* Ceramic Tile
* Industrial Fabric
* Concrete
* Polished Wood

## Data Fields

The included `metadata.csv` contains the following fields to assist with advanced filtering:

* **`file_name`**: The relative path to the image (e.g., `crack/brushed_metal_20260401_0.png`).
* **`label`**: The type of defect (Target variable for classification).
* **`material`**: The specific material featured in the image.

## Generation Details

The dataset was programmatically generated using `diffusers` and the SDXL pipeline.

* **Base Prompting:** "Macro photography of [Material] surface, [Defect Description], highly detailed, 8k resolution, close up industrial lighting"
* **Negative Prompting:** "blurry, out of focus, distorted, cartoon, 3d render, watermark"
* **Inference Steps:** 20

## Intended Uses

* **Image Classification:** Training CNNs (ResNet, EfficientNet) or Vision Transformers (ViT) to categorize material states.
* **Anomaly Detection:** Using the `normal` class to train autoencoders or one-class SVMs to detect out-of-distribution anomalous regions.
* **Synthetic Data Research:** Evaluating the efficacy of SDXL-generated data transferred to real-world industrial computer vision tasks.
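
## Example: Filtering with `metadata.csv`

The `file_name`, `label`, and `material` fields described under Data Fields can be used to check the stated balance (200 images per material/defect combination) and to select subsets. The following is a minimal sketch using only the Python standard library. It assumes the dataset has been downloaded to a local folder with `metadata.csv` at its root. The helper names (`load_metadata`, `count_combinations`, `unbalanced`, `filter_rows`) are hypothetical and not part of the dataset, and the exact spelling of values in the `material` column (for example `brushed_metal`) is an assumption inferred from the sample file name.

```python
import csv
from collections import Counter
from pathlib import Path


def load_metadata(root):
    """Read metadata.csv from the dataset root into a list of dicts.

    Assumes the columns file_name, label, and material are present.
    """
    with open(Path(root) / "metadata.csv", newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def count_combinations(rows):
    """Count images per (label, material) pair."""
    return Counter((row["label"], row["material"]) for row in rows)


def unbalanced(counts, expected=200):
    """Return the pairs whose count differs from the expected value.

    Only pairs that occur at least once are checked; a combination that
    is missing entirely from the CSV is not reported here.
    """
    return {pair: n for pair, n in counts.items() if n != expected}


def filter_rows(rows, label=None, material=None):
    """Keep rows matching the given label and/or material (None = any)."""
    return [
        row
        for row in rows
        if (label is None or row["label"] == label)
        and (material is None or row["material"] == material)
    ]
```

For instance, `filter_rows(load_metadata("path/to/dataset"), label="normal")` would select the defect-free images for the anomaly-detection use case, and an empty result from `unbalanced(count_combinations(rows))` would be consistent with the balance described above. The path here is a placeholder.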
Provider:
himanshu1257