Aluminum alloy industrial materials defect

Name: Aluminum alloy industrial materials defect
Creator: figshare
Published: 2025-06-01 05:18:12
License: No description available

DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-01-06

Download link:

https://figshare.com/articles/dataset/Aluminum_alloy_industrial_materials_defect/27922929/3

Description:

The dataset used in this study comes from the preliminary-round dataset of the 2018 Guangdong Industrial Intelligent Manufacturing Big Data Intelligent Algorithm Competition, organized by Tianchi Feiyue Cloud (https://tianchi.aliyun.com/competition/entrance/231682/introduction). We filtered the dataset, removing images that did not meet the requirements of our experiment, and divided it into training and testing sets. All images are 2560×1920 pixels. Before training, all defects are labeled with labelimg and saved as json files, which are then converted to txt files. The organized defect dataset is then used for detection and classification.

Description of the data and file structure

This is a project based on an enhanced YOLOv8 algorithm for aluminum defect classification and detection. All code has been tested on Windows computers with Anaconda and CUDA-enabled GPUs. The following instructions let users run the code in this repository on a Windows system with a CUDA GPU.

Files and variables

File: defeat_dataset.zip
Description:

Setup

Please follow the steps below to set up the project.

Download Project Repository

Download the project repository defeat_dataset.zip from the following location. Unzip it and navigate to the project folder; it should contain a subfolder: quexian_dataset

Download data

1. Download the data file defeat_dataset.zip.
2. Unzip the downloaded data and move the 'defeat_dataset' folder into the project's main folder.
3. Make sure that your defeat_dataset folder now contains a subfolder: quexian_dataset.
4. Within the folder you should find various subfolders such as addquexian-13, quexian_dataset, new_dataset-13, etc.

Software

Set up the Python environment

1. Download and install Anaconda.
2. Once Anaconda is installed, open the Anaconda Prompt. On Windows, click Start, search for Anaconda Prompt, and open it.
3. Create a new conda environment with Python 3.8. You can name it whatever you like; the following command names it yolov8:
    conda create -n yolov8 python=3.8
4. Activate the created environment. For the name used above, enter:
    conda activate yolov8

Download and install Visual Studio Code.

Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU:
    conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge

Install the necessary libraries:
    conda install -c anaconda scikit-learn=0.24.1
    conda install astropy=4.2.1
    conda install -c anaconda pandas=1.2.4
    conda install -c conda-forge matplotlib=3.5.3
    conda install scipy=1.10.1

Repeatability

PyTorch does not guarantee fully reproducible results across PyTorch versions, individual commits, or different platforms. In addition, results may not be reproducible between CPU and GPU executions, even when the same seed is used.

All results in the Analysis Notebook that involve only model evaluation are fully reproducible. Results that involve updating the model on a GPU (that is, training) vary between machines.

Access information

Other publicly accessible locations of the data: https://tianchi.aliyun.com/dataset/public/
Data was derived from the following sources: https://tianchi.aliyun.com/dataset/140666

Data availability statement

The ten datasets used in this study come from the rematch round of the Guangdong Industrial Wisdom Big Data Innovation Competition - Intelligent Algorithm Competition; the download link is https://tianchi.aliyun.com/competition/entrance/231682/information?lang=en-us. The official website provides 4,356 images, including single-defect images, multi-defect images, and defect-free images.
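
The repeatability caveat above can be narrowed, though not removed, by seeding every random number generator before training. The following is a minimal sketch rather than the authors' code: the helper name seed_everything is hypothetical, NumPy and PyTorch are treated as optional, and the seed value 0 is the one this description reports using.

```python
import os
import random


def seed_everything(seed: int = 0) -> None:
    """Seed the random number generators that are available in this environment."""
    # Affects only subprocesses started afterwards; hash randomization of the
    # current interpreter is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch  # optional dependency
    except ImportError:
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when no GPU is present
    # Trade some speed for deterministic cuDNN convolution algorithms.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
print(first == second)  # True: the same seed gives the same sequence
```

Even with these settings, GPU training on different hardware or library versions may still produce slightly different numbers, which is consistent with the caveat stated above.
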
We selected only the single-defect and multi-defect images, 3,233 images in total. The ten defects are non-conductive, effacement, miss bottom corner, orange peel, varicolored, jet, lacquer bubble, jump into a pit, divulge the bottom, and blotch. Each image contains one or more defects, and all defect images have a resolution of 2560×1920.

From the literature, we found that most experiments use these 10 types of defects, so we added three more types that differ clearly from these ten and have enough samples to be suitable for the experiments. The three newly added datasets come from the preliminary-round dataset of the Guangdong Industrial Wisdom Big Data Intelligent Algorithm Competition, which can be downloaded from https://tianchi.aliyun.com/dataset/140666. It contains 3,000 images in total, of which 109, 73, and 43 images show the defects bruise, camouflage, and coating cracking, respectively. Finally, the 10 defect types from the rematch and the 3 defect types selected from the preliminary round are merged into a new dataset, which is the dataset examined here.

When processing the dataset, we tried different split ratios, such as 8:2, 7:3, and 7:2:1. Testing showed that the experimental results did not differ much between split ratios. We therefore split the dataset 7:2:1: 70% training, 20% validation, and 10% testing. The random seed is set to 0 to keep the results consistent each time the model is trained.

Finally, the mean Average Precision (mAP) was measured on the dataset three times. The results differed very little between runs; for the reported figure we took the average of the highest and lowest results. The highest was 71.5% and the lowest was 71.1%, giving an average detection accuracy of 71.3% for the final experiment.

All data and images used in this research are from publicly available sources, and the original creators have given their consent for these materials to be published in open-access formats.

The settings for the other parameters are as follows:
epochs: 200, patience: 50, batch: 16, imgsz: 640, pretrained: true, optimizer: SGD, close_mosaic: 10, iou: 0.7, momentum: 0.937, weight_decay: 0.0005, box: 7.5, cls: 0.5, dfl: 1.5, pose: 12.0, kobj: 1.0, save_dir: runs/train

The defeat_dataset.zip file is mentioned in the Supporting information section of our manuscript. The underlying data are held at Figshare. DOI: 10.6084/m9.figshare.27922929.
The results_images.zip file contains the experimental result graphs.
The images_1.zip and images_2.zip files contain all the images needed to build the manuscript.tex manuscript.
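
The 7:2:1 split with random seed 0 described above could be reproduced along the following lines. This is a sketch under stated assumptions, not the authors' script: the function name split_dataset and the file names are hypothetical, and the original code may shuffle or round differently.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items reproducibly and cut them into train/val/test lists."""
    total = sum(ratios)
    shuffled = sorted(items)               # fixed starting order, whatever the input order
    random.Random(seed).shuffle(shuffled)  # private RNG; global random state is untouched
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder, so no image is dropped
    return train, val, test


# Hypothetical file names standing in for the real images.
names = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))  # 700 200 100
```

Sorting before shuffling makes the result depend only on the set of file names and the seed, not on the order in which the operating system lists the directory.
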

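
The description says annotations are saved as json files and then converted to txt files, but it does not give the json layout. The sketch below assumes a labelme-style layout (the fields imageWidth, imageHeight, shapes, label, and points are assumptions) and writes the standard YOLO label line: class index followed by the box centre, width, and height, each normalized by the image size. The function name json_to_yolo_lines, the sample annotation, and the class order are hypothetical.

```python
import json


def json_to_yolo_lines(annotation: dict, class_names: list) -> list:
    """Convert one labelme-style annotation dict into YOLO label lines."""
    img_w = annotation["imageWidth"]
    img_h = annotation["imageHeight"]
    lines = []
    for shape in annotation["shapes"]:
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x_min, x_max = min(xs), max(xs)
        y_min, y_max = min(ys), max(ys)
        # YOLO expects the box centre and size, each normalized to [0, 1].
        x_c = (x_min + x_max) / 2 / img_w
        y_c = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        class_id = class_names.index(shape["label"])
        lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


# Made-up annotation for illustration; the class list order is also hypothetical.
sample = json.loads("""
{
  "imageWidth": 2560,
  "imageHeight": 1920,
  "shapes": [
    {"label": "jet", "shape_type": "rectangle",
     "points": [[640, 480], [1920, 1440]]}
  ]
}
""")
yolo_lines = json_to_yolo_lines(sample, ["non-conductive", "jet"])
print("\n".join(yolo_lines))  # 1 0.500000 0.500000 0.500000 0.500000
```

In practice one txt file is written per image, with one such line per labeled defect.
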

Provider: figshare
Created: 2024-12-03
