
spectrum

ModelScope community · Updated 2025-12-04 · Indexed 2025-06-28
Download link:
https://modelscope.cn/datasets/davidjam/spectrum
Resource description:

# Spectrum Archive - Hyperspectral Image Classification based on Vision Transformer

A deep learning project that utilizes Vision Transformer (ViT) feature extraction and a custom Spectral Transformer model for hyperspectral image classification.

## 📋 Project Overview

This project implements a machine learning pipeline for classifying hyperspectral images into two categories: Type 1 and Type 2. The system uses ViT to extract features from hyperspectral data and a custom Spectral Transformer model for classification.

### Key Features

- **Hyperspectral Data Processing**: Supports processing `.float` and `.hdr` files with 128 spectral bands
- **Vision Transformer Feature Extraction**: Uses pre-trained ViT-B/16 to extract features from each spectral band
- **Custom Spectral Transformer**: A Transformer-based architecture specifically designed for spectral data classification
- **Complete Training Pipeline**: Includes training, validation, testing, and visualization tools
- **Data Management**: Automated data organization and metadata generation

## 🏗️ Project Structure

```
spectrum_archive/
├── train.py                    # Main training script
├── models.py                   # SpectralTransformer model definition
├── utils.py                    # Training utilities and helper functions
├── data_get.ipynb              # Data preprocessing and organization
├── feature_extract_vit.ipynb   # ViT feature extraction pipeline
├── tech_pipeline.png           # Technical workflow visualization
├── .gitignore                  # Git ignore configuration
└── README.md                   # This file
```

## 🔧 Technical Architecture

### Data Pipeline

1. **Data Organization** ([`data_get.ipynb`](data_get.ipynb)):
   - Organize hyperspectral data files into `type1` and `type2` directories
   - Generate metadata CSV containing file paths and category labels
   - Validate data consistency across all samples

2. **Feature Extraction** ([`feature_extract_vit.ipynb`](feature_extract_vit.ipynb)):
   - Process hyperspectral `.float` files (520×696×128 dimensions)
   - Use pre-trained ViT-B/16 to extract features from each spectral band
   - Generate 768-dimensional feature vectors for each band
   - Save features as pickle files for efficient loading

### Model Architecture

**Multimodal Spectral Transformer** ([`models.py`](models.py:5)):

- **Flexible Input**: Supports selecting hyperspectral features, PNG features, or both via `model_mode`
- **Inputs**:
  - **`multimodal` mode**:
    - Hyperspectral features: 128 bands × 768 features (extracted by ViT)
    - PNG features: 768 features (extracted from RGB images via ViT)
  - **`vit_only` mode**:
    - Hyperspectral features: 128 bands × 768 features
  - **`png_only` mode**:
    - PNG features: 768 features
- **Architecture**:
  - **Common Branch (for `multimodal` and `vit_only`)**:
    - Hyperspectral branch: Input projection → Positional encoding → Transformer encoder
  - **PNG Feature Processing (for `multimodal`)**:
    - PNG branch: Multi-layer perceptron + normalization
  - **Standalone PNG Processing (for `png_only`)**:
    - Simple MLP directly processes PNG features for classification (without Transformer)
  - **Fusion Methods (for `multimodal`)**:
    - **Early Fusion**: Broadcast PNG features and add to each spectral band
    - **Late Fusion**: Concatenate features after separate processing
    - **Attention Fusion**: Use PNG features as queries, hyperspectral features as keys/values for multi-head attention
- **Output**: Binary classification (Type 1 vs Type 2)

### Training Pipeline

**Training Script** ([`train.py`](train.py)):

- **Data Loading**: Custom dataset class supporting loading of hyperspectral ViT features and PNG features
- **Multimodal Training**: Train the model with both feature types for fusion learning
- **Configurable Fusion**: Supports different fusion strategies (early/late/attention)
- **Data Split**: 70% training, 20% validation, 10% testing
- **Optimizer**: AdamW, learning rate 5e-4
- **Training Features**:
  - Early stopping mechanism
  - Optional learning rate scheduling
  - Comprehensive metric tracking
  - Optimal model saving

**Utility Functions** ([`utils.py`](utils.py)):

- [`train_model()`](utils.py:10): Full training loop with validation
- [`test_model()`](utils.py:169): Model evaluation and metric output
- [`plot_training_history()`](utils.py:232): Training process visualization

## 🚀 Quick Start

### Dependency Installation

```bash
pip install torch torchvision
pip install pandas numpy matplotlib
pip install scikit-learn tqdm
pip install opencv-python pillow
```

If you encounter missing dependency errors during runtime, install the corresponding package via pip.

### Usage

1. **Data Preparation**:

   ```bash
   # Run the data organization notebook
   jupyter notebook data_get.ipynb
   ```

2. **Feature Extraction**:

   ```bash
   # Extract ViT features from hyperspectral data and PNG images
   jupyter notebook feature_extract_vit.ipynb
   ```

3. **Model Training**:

   Control training mode and fusion method via command-line arguments.

   **Multimodal Training (Default)**:

   ```bash
   python train.py --mode multimodal --fusion_method late
   # Other fusion methods: --fusion_method early, --fusion_method attention
   ```

   **Single-modal Training**:

   **ViT-only Features**:

   ```bash
   python train.py --mode vit_only
   ```

   **PNG-only Features**:

   ```bash
   python train.py --mode png_only
   ```

   **Other Parameters**:

   You can also specify `batch_size`, `num_epochs`, `lr`, `weight_decay`, `patience`, etc.:

   ```bash
   python train.py --mode multimodal --batch_size 64 --num_epochs 200 --lr 1e-3
   ```

   **Model Comparison Script**:

   Use the shell script to quickly compare different modes and fusion strategies:

   ```bash
   bash compare_models.sh
   ```

   This script will sequentially run training for multimodal (late, early, attention fusion), vit_only, and png_only modes. All results are logged to TensorBoard for easy comparison.
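The command-line flags used in the training commands above could be wired up roughly as follows. This is a minimal sketch, not the project's actual `train.py`: the function names `build_parser` and `parse_args` are hypothetical, the defaults for `--num_epochs` and `--patience` are placeholder assumptions, and treating `late` as the default fusion method is inferred from the example command rather than confirmed. The batch size, learning rate, and weight decay defaults are the values listed in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the README commands."""
    parser = argparse.ArgumentParser(description="Train the spectral classifier")
    parser.add_argument("--mode", default="multimodal",
                        choices=["multimodal", "vit_only", "png_only"])
    parser.add_argument("--fusion_method", default="late",  # assumed default
                        choices=["early", "late", "attention"])
    parser.add_argument("--batch_size", type=int, default=32)       # README value
    parser.add_argument("--num_epochs", type=int, default=100)      # assumed placeholder
    parser.add_argument("--lr", type=float, default=5e-4)           # README value
    parser.add_argument("--weight_decay", type=float, default=1e-4) # README value
    parser.add_argument("--patience", type=int, default=10)         # assumed placeholder
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse flags; the fusion method only applies in multimodal mode."""
    args = build_parser().parse_args(argv)
    if args.mode != "multimodal":
        # Single-modal runs have nothing to fuse, so clear the setting.
        args.fusion_method = None
    return args


# Equivalent to:
#   python train.py --mode multimodal --batch_size 64 --num_epochs 200 --lr 1e-3
example = parse_args(["--mode", "multimodal", "--batch_size", "64",
                      "--num_epochs", "200", "--lr", "1e-3"])
print(example.mode, example.fusion_method, example.batch_size, example.lr)
```

Clearing `fusion_method` in single-modal modes is a design choice of this sketch; it keeps logged run configurations from suggesting that a fusion strategy was used when it was not.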

   **K-fold Cross Validation**:

   Use the K-fold cross validation script to evaluate model performance and overfitting:

   ```bash
   # Multimodal K-fold validation
   python kfold_validation.py --mode multimodal --fusion_method late --k_folds 5 --num_epochs 20

   # Single-modal K-fold validation
   python kfold_validation.py --mode vit_only --k_folds 5 --num_epochs 20
   python kfold_validation.py --mode png_only --k_folds 5 --num_epochs 20
   ```

   This script supports:

   - Training-validation accuracy difference analysis
   - Variance evaluation across folds
   - Automatic overfitting detection and grading
   - Detailed results saved as CSV and summary files

   **Fully Automated K-fold Evaluation**:

   Use the shell script to automatically run K-fold validation for all modes and fusion strategies:

   ```bash
   bash run_all_kfold_experiments.sh
   ```

   This script will sequentially execute:

   - Multimodal (late, early, attention fusion)
   - ViT-only
   - PNG-only

   All results are saved as separate CSV/TXT files and TensorBoard logs for comprehensive comparison.

### TensorBoard Logging

Training metrics (loss, accuracy) are automatically logged to TensorBoard, and the optimal model is saved in the corresponding log directory.

To view TensorBoard logs:

```bash
tensorboard --logdir runs
```

Then access `http://localhost:6006` in your browser.

### Data Format

The project expects the following hyperspectral data formats:

- **`.float` files**: Raw hyperspectral data (520×696×128)
- **`.hdr` files**: Header files containing wavelength information
- **`.png` files**: RGB visualization of hyperspectral data

Example data structure:

```
new_data/
├── type1/
│   ├── a1.float
│   ├── a1.hdr
│   ├── a1.png
│   └── ...
├── type2/
│   ├── h1.float
│   ├── h1.hdr
│   ├── h1.png
│   └── ...
├── vit_features/
│   ├── a1_features.pkl
│   └── ...
├── png_features/
│   ├── a1_png_features.pkl
│   └── ...
└── metadata.csv
```

### Metadata Format

The `metadata.csv` file contains the following columns:

- `type`: Sample category (type1/type2)
- `float_data_path`: Path to hyperspectral .float file
- `hdr_data_path`: Path to header file
- `png_data_path`: Path to RGB visualization .png file
- `feature_path`: Path to hyperspectral ViT features (128×768)
- `png_feature_path`: Path to PNG ViT features (1×768)

## 📊 Model Performance

The training pipeline provides comprehensive evaluation metrics:

- **Training/Validation loss and accuracy curves**
- **Confusion matrix** (classification performance)
- **Classification report** (precision, recall, F1 score)
- **Early stopping mechanism** (prevent overfitting)

## 🔬 Technical Details

### Hyperspectral Data Processing

- **Dimensions**: 520×696 spatial × 128 spectral bands
- **Data Type**: 32-bit float
- **Header Offset**: Skip 32,768 bytes when loading
- **Wavelength Range**: Extracted from .hdr files

### Feature Extraction Strategy

- **ViT Model**: Pre-trained ViT-B/16 from torchvision
- **Hyperspectral Features**: Treat each band as an RGB image for processing → 128×768 features
- **PNG Features**: Directly process RGB visualizations → 1×768 features
- **Feature Dimensions**:
  - Hyperspectral: 768 per band, total 128×768
  - PNG: 768 per image
- **Normalization**: Both types of features are normalized using ImageNet statistics

### Multimodal Data Processing

The project supports multiple input configurations:

- **Multimodal**: Fuse hyperspectral and PNG features
- **`vit_only`**: Only use hyperspectral features extracted by ViT
- **`png_only`**: Only use PNG features extracted by ViT

See the "Multimodal Spectral Transformer" section for detailed fusion strategies.
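The raw-file layout given under "Hyperspectral Data Processing" (32-bit floats, a 32,768-byte header to skip, 520×696×128 values) can be read with the standard library alone. The sketch below is an illustration under stated assumptions, not the code from `feature_extract_vit.ipynb`: the function names are hypothetical, and both the little-endian byte order and the on-disk axis order (band varying fastest) are assumptions that should be checked against the `.hdr` file before relying on them.

```python
import array
import struct
import sys

HEADER_BYTES = 32_768          # header offset stated in the README
CUBE_SHAPE = (520, 696, 128)   # rows, cols, bands; on-disk axis order is ASSUMED


def load_float_cube(path, shape=CUBE_SHAPE, header_bytes=HEADER_BYTES):
    """Read a raw float32 cube as a flat array, skipping the fixed-size header.

    Assumes little-endian values with the band index varying fastest.
    """
    rows, cols, bands = shape
    values = array.array("f")
    assert values.itemsize == 4, "expected 32-bit floats on this platform"
    with open(path, "rb") as fh:
        fh.seek(header_bytes)
        # Raises EOFError if the file holds fewer values than the shape implies.
        values.fromfile(fh, rows * cols * bands)
    if sys.byteorder == "big":
        values.byteswap()  # file is assumed little-endian
    return values


def cube_index(row, col, band, shape=CUBE_SHAPE):
    """Flat index of (row, col, band) under the assumed band-fastest layout."""
    _, cols, bands = shape
    return (row * cols + col) * bands + band


def write_synthetic_cube(path, shape, header_bytes=HEADER_BYTES):
    """Write a tiny fake cube (each value equals its flat index) for testing."""
    rows, cols, bands = shape
    count = rows * cols * bands
    with open(path, "wb") as fh:
        fh.write(b"\x00" * header_bytes)
        fh.write(struct.pack(f"<{count}f", *map(float, range(count))))
```

For real files, NumPy's `np.fromfile(path, dtype=np.float32, offset=32768)` followed by a reshape does the same job far faster; the standard-library version is shown only to make the byte layout explicit.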

### Model Hyperparameters

- **Input Dimension**: 768 (ViT features)
- **Model Dimension**: 256
- **Number of Attention Heads**: 4
- **Encoder Layers**: 2
- **Feed-forward Dimension**: 1024
- **Batch Size**: 32
- **Learning Rate**: 5e-4
- **Weight Decay**: 1e-4

## 📁 File Descriptions

| File | Description |
|------|-------------|
| [`train.py`](train.py) | Main training script, including data loading and model training |
| [`models.py`](models.py) | SpectralTransformer model architecture |
| [`utils.py`](utils.py) | Training utilities, evaluation and visualization functions |
| [`data_get.ipynb`](data_get.ipynb) | Data preprocessing and metadata generation |
| [`feature_extract_vit.ipynb`](feature_extract_vit.ipynb) | ViT feature extraction pipeline |
| [`tech_pipeline.png`](tech_pipeline.png) | Technical workflow visualization diagram |

## 🎯 Key Innovations

1. **Spectrum-aware Architecture**: Transformer customized for hyperspectral data
2. **Multi-scale Feature Extraction**: ViT features capture spatial patterns of each band
3. **Efficient Processing**: Pre-extracted features accelerate training
4. **Comprehensive Evaluation**: Detailed metrics and visualization for model analysis

## 📈 Future Outlook

- Support for multi-class classification
- Integration of spatial-spectral attention mechanisms
- Real-time inference pipeline
- Advanced data augmentation
- Model ensemble methods

## 🤝 Contribution Notes

This project demonstrates a complete pipeline for hyperspectral image classification, with modular design that facilitates extension and modification of individual components.

## 📄 License

This project belongs to a research program for hyperspectral image analysis and classification.
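As a companion to the "Model Hyperparameters" list, the values can be gathered into a single configuration object so that derived quantities stay consistent. This is a sketch only: `SpectralTransformerConfig` and its field names are hypothetical and do not come from `models.py`; the numbers are the ones listed in this README. The divisibility check reflects the usual requirement of multi-head attention that the model dimension split evenly across heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpectralTransformerConfig:
    """Hypothetical container for the hyperparameters listed in the README."""
    input_dim: int = 768        # ViT feature size per band
    model_dim: int = 256
    num_heads: int = 4
    num_layers: int = 2
    ff_dim: int = 1024
    num_bands: int = 128        # spectral bands per sample
    batch_size: int = 32
    lr: float = 5e-4
    weight_decay: float = 1e-4

    def __post_init__(self):
        # Standard multi-head attention splits model_dim evenly across heads.
        if self.model_dim % self.num_heads != 0:
            raise ValueError("model_dim must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        """Per-head dimension implied by model_dim and num_heads."""
        return self.model_dim // self.num_heads

    @property
    def batch_input_shape(self):
        """Shape of one hyperspectral feature batch: (batch, bands, features)."""
        return (self.batch_size, self.num_bands, self.input_dim)


config = SpectralTransformerConfig()
print(config.head_dim, config.batch_input_shape)
```

With the listed values, each of the 4 heads works on 256 / 4 = 64 dimensions, and one hyperspectral batch has shape 32 × 128 × 768.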
Provider:
maas
Created:
2025-06-23
Collected summary
Dataset introduction
Background and challenges
Background overview
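This dataset is a hyperspectral image classification project that uses ViT for feature extraction and a custom Transformer model for classification. It supports multimodal data processing and several fusion strategies, and is intended for research and deep learning applications.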
The content above was collected and summarized by 遇见数据集.