sagecontinuum/smokedataset

Name: sagecontinuum/smokedataset
Creator: sagecontinuum
Published: 2023-09-11 20:57:58
License: no description provided

Hugging Face, updated 2023-09-11, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/sagecontinuum/smokedataset


Resource description:

---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: label
    dtype:
      class_label:
        names:
          '0': cloud
          '1': other
          '2': smoke
  splits:
  - name: train
    num_bytes: 85556006
    num_examples: 14318
  - name: validation
    num_bytes: 22137739
    num_examples: 3671
  - name: test
    num_bytes: 11026374
    num_examples: 1843
  download_size: 132474880
  dataset_size: 118720119
tags:
- climate
task_categories:
- image-classification
task_ids:
- multi-label-image-classification
license: mit
---

# COMPARING SIMPLE DEEP LEARNING MODELS TO A COMPLEX MODEL FOR SMOKE DETECTION

- **Homepage:** [Sage Continuum](https://sagecontinuum.org/)
- **Author:** Jakub Szumny, Math and Computer Science Division, University of Illinois at Urbana-Champaign
- **Mentors:** Bhupendra Raut, Seongha Park
- **Repository:** [GitHub Repository](https://github.com/waggle-sensor/summer2023/tree/main/szumny)

# Motivation

- Forest fires are a major problem and have detrimental effects on the environment. Current solutions for detecting forest fires are not efficient enough, and other machine learning models are too slow to compute and have poor accuracy. This study is a continuation of the work done by UCSD and their SmokeyNet deep learning architecture for smoke detection.
- My goal is to compare many different deep learning models in order to find the best model for this problem, and to find out whether a simple model can compare to a complex model. The models I compared are: VGG16, UCSD SmokeyNet, Resnet18, Resnet34, and Resnet50.

# Major Accomplishments

- Created a large dataset of 41,000 images, comprising many different wildfire events from HPWREN. I split the images into 5 different classes: sky, ground, horizon, cloud, and smoke.
- Tested in many different ways, and found that the best results come when the classes sky, ground, and horizon are grouped together as other, and smoke and cloud are left separate. The major issue with this is that smoke and clouds often look very similar.
- On my dataset, created with HPWREN images, each model performed rather well, with about the same accuracy at around 90%.
- Found that the VGG16 model with 3 features (smoke, cloud, other) was the best-performing model on the testing dataset from ARM, and all the other models performed quite poorly.
- Must keep in mind that the burning event was not very obvious in the ARM testing data, but it won't always be clear-cut, so it is a great test to see which model performs best with the least.
- With an FPR of about 13%, a TPR of about 96%, an FNR of about 4%, and a TNR of about 88%, the VGG16 model had the best results on the ARM data.
- Created a plugin application to test and use my model and algorithm on wild Sage nodes, taking images and detecting smoke in real time.

# Impact

- The impact my research has made is having created a large dataset for future research and for better model creation.
- Found that a simple model is very accurate and can compare to a complex model.
- An algorithm which can compute and classify an entire image in a very short period of time.
- This research can greatly help the fight against forest fires, in order to at one point solve the problem of forest fires by being able to attend to them before they get out of control.

# Future Direction

- More work is needed on creating a more efficient model. There may be a different model which can perform even better at detecting smoke.
- It is helpful that a dataset is already created, and through my GitHub repository anyone can replicate my work and try to improve on it.
- Need to explore more ways to augment the images, by scaling the contrast levels, etc., as I believe this would be a good way to separate smoke from cloud from other.

# Citation

Dewangan A, Pande Y, Braun H-W, Vernon F, Perez I, Altintas I, Cottrell GW, Nguyen MH. FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection. Remote Sensing. 2022; 14(4):1007. https://doi.org/10.3390/rs14041007
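The four rates quoted for VGG16 on the ARM data are tied together by definition: TPR + FNR = 1 and FPR + TNR = 1. The sketch below shows how they fall out of binary confusion-matrix counts (smoke vs. not smoke). The counts are hypothetical, chosen only to land near the reported figures; they are not taken from the study. Since the reported FPR (about 13%) and TNR (about 88%) sum to roughly 101%, those two values presumably agree only up to rounding.

```python
def rates(tp, fp, fn, tn):
    """Return TPR, FPR, FNR and TNR from binary confusion-matrix counts."""
    positives = tp + fn  # images that truly contain smoke
    negatives = fp + tn  # images that truly do not
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "TPR": tp / positives,  # smoke correctly flagged
        "FNR": fn / positives,  # smoke missed
        "FPR": fp / negatives,  # false alarms
        "TNR": tn / negatives,  # non-smoke correctly passed
    }


# Hypothetical counts for illustration only (not from the ARM evaluation).
example = rates(tp=96, fp=13, fn=4, tn=87)
for name, value in example.items():
    print(f"{name}: {value:.2%}")
```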


Provider:

sagecontinuum

Original information summary

Dataset overview

Dataset information

Features:
- image: image data
- label: label data, with the following classes:
  - 0: cloud
  - 1: other
  - 2: smoke

Dataset splits

Training set:
- Bytes: 85556006
- Examples: 14318
Validation set:
- Bytes: 22137739
- Examples: 3671
Test set:
- Bytes: 11026374
- Examples: 1843

Dataset size

Download size: 132474880
Dataset size: 118720119
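As a quick consistency check on the figures above, the per-split byte counts can be summed and compared with the stated dataset size, and the example counts turned into split proportions. This is only arithmetic on the numbers listed on this page; the variable names are illustrative.

```python
# Split metadata copied from the listing above.
SPLITS = {
    "train": {"num_bytes": 85556006, "num_examples": 14318},
    "validation": {"num_bytes": 22137739, "num_examples": 3671},
    "test": {"num_bytes": 11026374, "num_examples": 1843},
}
STATED_DATASET_SIZE = 118720119

total_bytes = sum(info["num_bytes"] for info in SPLITS.values())
total_examples = sum(info["num_examples"] for info in SPLITS.values())

# Fraction of all examples that falls in each split.
fractions = {
    name: info["num_examples"] / total_examples for name, info in SPLITS.items()
}

print("bytes match stated size:", total_bytes == STATED_DATASET_SIZE)
print("total examples:", total_examples)
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The split byte counts add up exactly to the stated dataset size, and the splits come out at roughly 72% / 19% / 9% of 19,832 examples. The page also mentions 41,000 images; the source does not explain how that figure relates to these splits.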

Task categories

image-classification

Task IDs

multi-label-image-classification

License

mit

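Collected summary

Dataset introduction

Construction

The smokedataset dataset from SageContinuum was built by Jakub Szumny to support research on forest fire detection. It contains 41,000 images collected from HPWREN, covering different scenes from wildfire events. The images were split into five classes: sky, ground, horizon, cloud, and smoke. During construction, sky, ground, and horizon were merged into a single 'other' class, while smoke and cloud were kept as separate classes to address the strong visual similarity between the two.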
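The class grouping described above can be written as a small lookup. This is a minimal sketch, not the author's code: the integer ids (0 cloud, 1 other, 2 smoke) come from the dataset card, while the names `FINE_TO_COARSE` and `to_label_id` are hypothetical.

```python
# Label ids as listed on the dataset card: 0 = cloud, 1 = other, 2 = smoke.
LABEL_NAMES = ["cloud", "other", "smoke"]

# Five original classes mapped onto the three published ones.
FINE_TO_COARSE = {
    "sky": "other",
    "ground": "other",
    "horizon": "other",
    "cloud": "cloud",
    "smoke": "smoke",
}


def to_label_id(fine_class):
    """Map one of the five original class names to its three-class label id."""
    coarse = FINE_TO_COARSE[fine_class]  # raises KeyError on an unknown class
    return LABEL_NAMES.index(coarse)


for name in FINE_TO_COARSE:
    print(name, "->", to_label_id(name))
```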

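Usage

Researchers can load and use the dataset through the interface provided by Hugging Face. The dataset is structured to suit a range of deep learning models, such as VGG16, UCSD SmokeyNet, and Resnet18. Users can consult the dataset's README and the associated GitHub repository to reproduce the author's workflow and to explore and improve smoke detection models further. In addition, the dataset's open license allows free use and distribution, which encourages sharing and collaboration in academic research.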
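A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the id shown on this page. The helper names `load_split`, `label_name`, and `preview` are hypothetical; the column names `image` and `label` and the class order come from the dataset card.

```python
# Class order from the dataset card: 0 = cloud, 1 = other, 2 = smoke.
LABEL_NAMES = ("cloud", "other", "smoke")


def label_name(label_id):
    """Translate an integer label into its class name."""
    return LABEL_NAMES[label_id]


def load_split(split="train"):
    """Download (or reuse the cached copy of) one split of the dataset."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset("sagecontinuum/smokedataset", split=split)


def preview(split="validation", count=3):
    """Print the image size and class name of the first few examples."""
    ds = load_split(split)
    for i in range(min(count, len(ds))):
        sample = ds[i]
        # The image column is expected to decode to a PIL image.
        print(sample["image"].size, label_name(sample["label"]))
```

Calling `preview("validation")` triggers the download on first use, so it needs network access; `label_name` works offline.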

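Background and challenges

Background overview

To address the major environmental problem of forest fires, researcher Jakub Szumny of the Math and Computer Science Division, University of Illinois at Urbana-Champaign, continued UCSD's research and created the sagecontinuum/smokedataset dataset. The dataset contains 41,000 images and aims to improve the efficiency and accuracy of forest fire smoke detection with deep learning models. It gives researchers a rich experimental resource and contributes important benchmark data to the smoke detection field, which matters for advancing related techniques.

Current challenges

The main challenges in building and applying sagecontinuum/smokedataset are twofold. First, smoke and clouds look very similar, which makes it hard for models to tell them apart. Second, model performance may drop on images where the burning event is not obvious. In addition, although the VGG16 model performed best on the test data, maintaining high accuracy and a low false-alarm rate across different environments and conditions remains an open problem. Going forward, exploring data augmentation methods and developing more efficient models will be key to overcoming these challenges.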
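Contrast scaling is one augmentation the README names as future work. The sketch below shows one simple way to do it on a grayscale patch stored as nested lists: stretch or compress each pixel's distance from the patch mean, then clip to the 0 to 255 range. It is an illustration under those assumptions, not the author's method; in practice an image library such as Pillow would normally be used instead of plain lists.

```python
def scale_contrast(pixels, factor):
    """Scale contrast of a grayscale patch (rows of ints in 0..255).

    factor > 1 increases contrast, factor < 1 reduces it, factor == 1 is a no-op.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [
        # Move each pixel away from (or toward) the mean, then clip to 0..255.
        [min(255, max(0, round(mean + factor * (p - mean)))) for p in row]
        for row in pixels
    ]


patch = [[100, 120], [140, 160]]
stronger = scale_contrast(patch, 2.0)
weaker = scale_contrast(patch, 0.5)
print(stronger)
print(weaker)
```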

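Common scenarios

Classic use case

sagecontinuum/smokedataset was created against the backdrop of forest fires as a global environmental problem. Its classic use case is training and evaluating deep learning models for smoke detection using the images and labels it provides, supporting early detection of forest fires.

Academic problems addressed

The dataset addresses the problem of deep learning models for smoke detection being insufficiently accurate and too slow to compute. By comparing models of different complexity, such as VGG16 and UCSD SmokeyNet, the researcher found that even a simple model can reach satisfactory accuracy, which is significant for reducing model complexity and improving computational efficiency.

Practical applications

In practice, sagecontinuum/smokedataset has been used to develop an algorithm that can compute and classify an entire image in a short time. The algorithm can detect smoke in real time, which has clear practical value for forest fire prevention and control.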
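The source does not describe how the algorithm processes a full image. One common approach, assumed here purely for illustration, is to cut the image into fixed-size tiles, classify each tile, and flag the image if any tile is labelled smoke. In the sketch below, `classify_tile` is a hypothetical callable standing in for a trained model, and the smoke id of 2 follows the dataset card.

```python
def tile_boxes(width, height, tile):
    """Yield (left, top, right, bottom) boxes for every full tile in the image."""
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, top, left + tile, top + tile)


def image_has_smoke(width, height, tile, classify_tile, smoke_id=2):
    """Return True if any tile is classified as smoke.

    classify_tile is a hypothetical function taking a box and returning a label id.
    """
    return any(
        classify_tile(box) == smoke_id for box in tile_boxes(width, height, tile)
    )


# Stand-in classifier for demonstration: only the bottom-right tile is "smoke".
def fake_classifier(box):
    return 2 if box == (2, 2, 4, 4) else 1


boxes = list(tile_boxes(4, 4, 2))
print(boxes)
print(image_has_smoke(4, 4, 2, fake_classifier))
```

Partial tiles at the right and bottom edges are skipped in this sketch; a real pipeline would need to decide whether to pad or overlap them.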

Recent research on the dataset