spectrum
ModelScope community | Updated 2025-12-04 | Indexed 2025-06-28
Download link:
https://modelscope.cn/datasets/davidjam/spectrum
Resource description:
# Spectrum Archive - Hyperspectral Image Classification based on Vision Transformer
A deep learning project that utilizes Vision Transformer (ViT) feature extraction and a custom Spectral Transformer model for hyperspectral image classification.
## 📋 Project Overview
This project implements a machine learning pipeline for classifying hyperspectral images into two categories: Type 1 and Type 2. The system uses ViT to extract features from hyperspectral data and a custom Spectral Transformer model for classification.
### Key Features
- **Hyperspectral Data Processing**: Supports processing `.float` and `.hdr` files with 128 spectral bands
- **Vision Transformer Feature Extraction**: Uses pre-trained ViT-B/16 to extract features from each spectral band
- **Custom Spectral Transformer**: A Transformer-based architecture specifically designed for spectral data classification
- **Complete Training Pipeline**: Includes training, validation, testing, and visualization tools
- **Data Management**: Automated data organization and metadata generation
## 🏗️ Project Structure
```
spectrum_archive/
├── train.py                    # Main training script
├── models.py                   # SpectralTransformer model definition
├── utils.py                    # Training utilities and helper functions
├── data_get.ipynb              # Data preprocessing and organization
├── feature_extract_vit.ipynb   # ViT feature extraction pipeline
├── tech_pipeline.png           # Technical workflow visualization
├── .gitignore                  # Git ignore configuration
└── README.md                   # This file
```
## 🔧 Technical Architecture
### Data Pipeline
1. **Data Organization** ([`data_get.ipynb`](data_get.ipynb)):
   - Organize hyperspectral data files into `type1` and `type2` directories
   - Generate a metadata CSV containing file paths and category labels
   - Validate data consistency across all samples
2. **Feature Extraction** ([`feature_extract_vit.ipynb`](feature_extract_vit.ipynb)):
   - Process hyperspectral `.float` files (520×696×128 dimensions)
   - Use pre-trained ViT-B/16 to extract features from each spectral band
   - Generate a 768-dimensional feature vector for each band
   - Save features as pickle files for efficient loading
### Model Architecture
**Multimodal Spectral Transformer** ([`models.py`](models.py:5)):
- **Flexible Input**: Supports selecting hyperspectral features, PNG features, or both via `model_mode`
- **Inputs**:
  - **`multimodal` mode**:
    - Hyperspectral features: 128 bands × 768 features (extracted by ViT)
    - PNG features: 768 features (extracted from RGB images via ViT)
  - **`vit_only` mode**:
    - Hyperspectral features: 128 bands × 768 features
  - **`png_only` mode**:
    - PNG features: 768 features
- **Architecture**:
  - **Common Branch (for `multimodal` and `vit_only`)**:
    - Hyperspectral branch: Input projection → Positional encoding → Transformer encoder
  - **PNG Feature Processing (for `multimodal`)**:
    - PNG branch: Multi-layer perceptron + normalization
  - **Standalone PNG Processing (for `png_only`)**:
    - A simple MLP classifies the PNG features directly (no Transformer)
  - **Fusion Methods (for `multimodal`)**:
    - **Early Fusion**: Broadcast the PNG features and add them to each spectral band
    - **Late Fusion**: Process each modality separately, then concatenate the features
    - **Attention Fusion**: Multi-head attention with PNG features as queries and hyperspectral features as keys/values
- **Output**: Binary classification (Type 1 vs Type 2)
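
The three fusion strategies can be illustrated at the shape level with a small NumPy sketch. This is not the code in `models.py`: the function names, the mean pooling over bands in late fusion, and the single-head scaled dot-product attention are assumptions made for illustration, and all learned projections are left out.

```python
import numpy as np

def early_fusion(spectral, png):
    """Broadcast the PNG vector and add it to every band: (B, D) + (D,) -> (B, D)."""
    return spectral + png[None, :]

def late_fusion(spectral, png):
    """Pool the bands (mean pooling is an assumption), then concatenate -> (2 * D,)."""
    return np.concatenate([spectral.mean(axis=0), png])

def attention_fusion(spectral, png):
    """Single-head attention: PNG vector as query, band features as keys and values."""
    d = spectral.shape[1]
    scores = spectral @ png / np.sqrt(d)          # (B,) one score per band
    scores = scores - scores.max()                # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ spectral                     # (D,) weighted sum of band features

rng = np.random.default_rng(0)
spectral = rng.normal(size=(128, 256))  # 128 bands, already projected to model dim 256
png = rng.normal(size=256)              # PNG branch output, same dimension assumed

fused_early = early_fusion(spectral, png)
fused_late = late_fusion(spectral, png)
fused_attn = attention_fusion(spectral, png)
print(fused_early.shape, fused_late.shape, fused_attn.shape)
```

Early fusion keeps the band axis, so the Transformer encoder still sees 128 tokens; the other two variants collapse the bands into a single vector before the classifier.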
### Training Pipeline
**Training Script** ([`train.py`](train.py)):
- **Data Loading**: Custom dataset class supporting loading of hyperspectral ViT features and PNG features
- **Multimodal Training**: Train the model with both feature types for fusion learning
- **Configurable Fusion**: Supports different fusion strategies (early/late/attention)
- **Data Split**: 70% training, 20% validation, 10% testing
- **Optimizer**: AdamW, learning rate 5e-4
- **Training Features**:
  - Early stopping
  - Optional learning rate scheduling
  - Comprehensive metric tracking
  - Best-model checkpointing
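
The 70/20/10 split can be sketched with the standard library alone. The helper name `split_indices`, the fixed seed, and the shuffle-then-slice approach are assumptions; `train.py` may split the data differently (for example with stratification).

```python
import random

def split_indices(n_samples, train_frac=0.7, val_frac=0.2, seed=42):
    """Shuffle sample indices and cut them into train/val/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # seeded so the split is reproducible
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # remainder, roughly 10%
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```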
**Utility Functions** ([`utils.py`](utils.py)):
- [`train_model()`](utils.py:10): Full training loop with validation
- [`test_model()`](utils.py:169): Model evaluation and metric output
- [`plot_training_history()`](utils.py:232): Training process visualization
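
For reference, the metrics that an evaluation step typically reports for a binary task can be computed as below. This is a plain-Python sketch, not the body of `test_model()`; the label encoding (0 for Type 1, 1 for Type 2) is an assumption.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (Type 1) and 1 (Type 2)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def precision_recall_f1(matrix):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(cm)
print(cm, precision, recall, f1)
```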
## 🚀 Quick Start
### Dependency Installation
```bash
pip install torch torchvision
pip install pandas numpy matplotlib
pip install scikit-learn tqdm
pip install opencv-python pillow
```
If you encounter missing dependency errors during runtime, install the corresponding package via pip.
### Usage
1. **Data Preparation**:
```bash
# Run the data organization notebook
jupyter notebook data_get.ipynb
```
2. **Feature Extraction**:
```bash
# Extract ViT features from hyperspectral data and PNG images
jupyter notebook feature_extract_vit.ipynb
```
3. **Model Training**:
Control the training mode and fusion method via command-line arguments.
**Multimodal Training (Default)**:
```bash
python train.py --mode multimodal --fusion_method late
# Other fusion methods: --fusion_method early, --fusion_method attention
```
**Single-modal Training**:
**ViT-only Features**:
```bash
python train.py --mode vit_only
```
**PNG-only Features**:
```bash
python train.py --mode png_only
```
**Other Parameters**:
You can also specify `batch_size`, `num_epochs`, `lr`, `weight_decay`, `patience`, etc.:
```bash
python train.py --mode multimodal --batch_size 64 --num_epochs 200 --lr 1e-3
```
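
A command-line interface matching the flags shown above could be declared with `argparse` roughly as follows. This is a sketch, not the parser in `train.py`: the flag names come from the commands above and the batch size, learning rate, and weight decay defaults from the hyperparameter list, while the defaults for `--fusion_method`, `--num_epochs`, and `--patience` are assumptions.

```python
import argparse

def build_parser():
    """Build an illustrative parser for the training flags used in this README."""
    parser = argparse.ArgumentParser(description="Train SpectralTransformer (illustrative CLI)")
    parser.add_argument("--mode", choices=["multimodal", "vit_only", "png_only"],
                        default="multimodal")
    parser.add_argument("--fusion_method", choices=["early", "late", "attention"],
                        default="late")                          # assumed default
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--num_epochs", type=int, default=100)   # assumed default
    parser.add_argument("--lr", type=float, default=5e-4)
    parser.add_argument("--weight_decay", type=float, default=1e-4)
    parser.add_argument("--patience", type=int, default=10)      # assumed default
    return parser

# Parse the same arguments as the "Other Parameters" command.
args = build_parser().parse_args(
    ["--mode", "multimodal", "--batch_size", "64", "--num_epochs", "200", "--lr", "1e-3"]
)
print(args)
```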
**Model Comparison Script**:
Use the shell script to quickly compare different modes and fusion strategies:
```bash
bash compare_models.sh
```
This script will sequentially run training for multimodal (late, early, attention fusion), vit_only, and png_only modes. All results are logged to TensorBoard for easy comparison.
**K-fold Cross Validation**:
Use the K-fold cross validation script to evaluate model performance and overfitting:
```bash
# Multimodal K-fold validation
python kfold_validation.py --mode multimodal --fusion_method late --k_folds 5 --num_epochs 20
# Single-modal K-fold validation
python kfold_validation.py --mode vit_only --k_folds 5 --num_epochs 20
python kfold_validation.py --mode png_only --k_folds 5 --num_epochs 20
```
This script supports:
- Training-validation accuracy difference analysis
- Variance evaluation across folds
- Automatic overfitting detection and grading
- Detailed results saved as CSV and summary files
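
The kind of analysis listed above can be sketched in a few lines. This is not `kfold_validation.py`: the contiguous (unshuffled) folds, the function names, and the 0.05 / 0.15 gap thresholds used for grading are all hypothetical.

```python
import statistics

def kfold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def grade_overfitting(train_accs, val_accs, mild=0.05, severe=0.15):
    """Summarise the train-validation gap and the spread of validation accuracy."""
    gaps = [t - v for t, v in zip(train_accs, val_accs)]
    mean_gap = statistics.mean(gaps)
    val_std = statistics.stdev(val_accs) if len(val_accs) > 1 else 0.0
    if mean_gap >= severe:
        grade = "severe"
    elif mean_gap >= mild:
        grade = "mild"
    else:
        grade = "none"
    return {"mean_gap": mean_gap, "val_std": val_std, "grade": grade}

folds = list(kfold_indices(10, 5))
report = grade_overfitting([0.99, 0.98, 0.97, 0.99, 0.98],
                           [0.90, 0.88, 0.91, 0.89, 0.92])
print(report)
```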
**Fully Automated K-fold Evaluation**:
Use the shell script to automatically run K-fold validation for all modes and fusion strategies:
```bash
bash run_all_kfold_experiments.sh
```
This script will sequentially execute:
- Multimodal (late, early, attention fusion)
- ViT-only
- PNG-only
All results are saved as separate CSV/TXT files and TensorBoard logs for comprehensive comparison.
### TensorBoard Logging
Training metrics (loss, accuracy) are automatically logged to TensorBoard, and the optimal model is saved in the corresponding log directory.
To view TensorBoard logs:
```bash
tensorboard --logdir runs
```
Then access `http://localhost:6006` in your browser.
### Data Format
The project expects the following hyperspectral data formats:
- **`.float` files**: Raw hyperspectral data (520×696×128)
- **`.hdr` files**: Header files containing wavelength information
- **`.png` files**: RGB visualization of hyperspectral data
Example data structure:
```
new_data/
├── type1/
│   ├── a1.float
│   ├── a1.hdr
│   ├── a1.png
│   └── ...
├── type2/
│   ├── h1.float
│   ├── h1.hdr
│   ├── h1.png
│   └── ...
├── vit_features/
│   ├── a1_features.pkl
│   └── ...
├── png_features/
│   ├── a1_png_features.pkl
│   └── ...
└── metadata.csv
```
### Metadata Format
The `metadata.csv` file contains the following columns:
- `type`: Sample category (type1/type2)
- `float_data_path`: Path to hyperspectral .float file
- `hdr_data_path`: Path to header file
- `png_data_path`: Path to RGB visualization .png file
- `feature_path`: Path to hyperspectral ViT features (128×768)
- `png_feature_path`: Path to PNG ViT features (1×768)
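
A loader that walks `metadata.csv` and reads the pickled features might look like the sketch below. The column names are taken from the list above; the helper name `load_samples`, the label encoding (`type1` → 0, `type2` → 1), and the assumption that each pickle holds a single array-like object are illustrative. Only unpickle files from a source you trust.

```python
import csv
import os
import pickle
import tempfile

def load_samples(metadata_path):
    """Read metadata.csv and load the pickled features referenced by each row."""
    samples = []
    with open(metadata_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            with open(row["feature_path"], "rb") as pf:
                spectral = pickle.load(pf)        # expected shape: 128 x 768
            with open(row["png_feature_path"], "rb") as pf:
                png = pickle.load(pf)             # expected shape: 1 x 768
            label = 0 if row["type"] == "type1" else 1
            samples.append((spectral, png, label))
    return samples

# Demo on throwaway files; small nested lists stand in for the real feature arrays.
with tempfile.TemporaryDirectory() as tmp:
    feat = os.path.join(tmp, "a1_features.pkl")
    png_feat = os.path.join(tmp, "a1_png_features.pkl")
    with open(feat, "wb") as f:
        pickle.dump([[0.0] * 4, [1.0] * 4], f)
    with open(png_feat, "wb") as f:
        pickle.dump([0.5] * 4, f)
    meta = os.path.join(tmp, "metadata.csv")
    with open(meta, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["type", "feature_path", "png_feature_path"])
        writer.writerow(["type1", feat, png_feat])
    samples = load_samples(meta)

print(len(samples), samples[0][2])
```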
## 📊 Model Performance
The training pipeline provides comprehensive evaluation metrics:
- **Training/Validation loss and accuracy curves**
- **Confusion matrix** (classification performance)
- **Classification report** (precision, recall, F1 score)
- **Early stopping mechanism** (prevent overfitting)
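
Early stopping is usually implemented as a small counter around the validation metric. The class below is a minimal sketch rather than the logic in `utils.py`; monitoring validation loss (instead of accuracy) and the `min_delta` parameter are assumptions.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss     # improvement: a checkpoint would be saved here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate([0.9, 0.7, 0.72, 0.71, 0.5]):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)
```

With `patience=2`, the loop stops after two consecutive epochs without improvement and never sees the final, lower loss, which is the usual trade-off of a small patience value.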
## 🔬 Technical Details
### Hyperspectral Data Processing
- **Dimensions**: 520×696 spatial × 128 spectral bands
- **Data Type**: 32-bit float
- **Header Offset**: Skip 32,768 bytes when loading
- **Wavelength Range**: Extracted from .hdr files
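
Reading a raw cube with these properties could be sketched with NumPy as follows. The function name `load_float_cube` is hypothetical, and the band-sequential layout `(bands, height, width)` and native byte order are assumptions; the interleave recorded in the `.hdr` file should be checked before relying on the reshape.

```python
import os
import tempfile

import numpy as np

def load_float_cube(path, shape=(128, 520, 696), header_bytes=32768):
    """Read a raw float32 cube, skipping the header, and return it with `shape`."""
    count = int(np.prod(shape))
    data = np.fromfile(path, dtype=np.float32, count=count, offset=header_bytes)
    if data.size != count:
        raise ValueError(f"expected {count} values, found {data.size}")
    return data.reshape(shape)    # assumes band-sequential storage

# Demo on a tiny synthetic file with the same 32,768-byte header offset.
demo = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.float")
    with open(path, "wb") as f:
        f.write(b"\x00" * 32768)      # placeholder header
        f.write(demo.tobytes())
    cube = load_float_cube(path, shape=(2, 3, 4))

print(cube.shape, cube.dtype)
```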
### Feature Extraction Strategy
- **ViT Model**: Pre-trained ViT-B/16 from torchvision
- **Hyperspectral Features**: Treat each band as an RGB image for processing → 128×768 features
- **PNG Features**: Directly process RGB visualizations → 1×768 features
- **Feature Dimensions**:
- Hyperspectral: 768 per band, total 128×768
- PNG: 768 per image
- **Normalization**: Both types of features are normalized using ImageNet statistics
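
The step of treating a single band as an RGB image can be sketched in NumPy: scale the band, resize it to the ViT input size, replicate it across three channels, and normalize with the ImageNet mean and standard deviation. The min-max scaling and the nearest-neighbour resize are assumptions, as is the name `band_to_vit_input`; the notebook may use a different scaling and interpolation.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def band_to_vit_input(band, size=224):
    """Turn one (H, W) band into a (3, size, size) ImageNet-normalised array."""
    band = band.astype(np.float32)
    lo, hi = band.min(), band.max()
    scaled = (band - lo) / (hi - lo) if hi > lo else np.zeros_like(band)
    # Nearest-neighbour resize by index sampling (an assumption, kept dependency-free).
    rows = np.arange(size) * band.shape[0] // size
    cols = np.arange(size) * band.shape[1] // size
    resized = scaled[rows][:, cols]
    rgb = np.repeat(resized[None, :, :], 3, axis=0)   # same band in all three channels
    return (rgb - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

band = np.random.default_rng(0).random((520, 696))
x = band_to_vit_input(band)
print(x.shape, x.dtype)
```

Stacking the 128 per-band outputs gives a batch of shape `(128, 3, 224, 224)`, which a ViT-B/16 backbone would map to the 128×768 feature matrix described above.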
### Multimodal Data Processing
The project supports multiple input configurations:
- **Multimodal**: Fuse hyperspectral and PNG features
- **`vit_only`**: Only use hyperspectral features extracted by ViT
- **`png_only`**: Only use PNG features extracted by ViT
See the "Multimodal Spectral Transformer" section for detailed fusion strategies.
### Model Hyperparameters
- **Input Dimension**: 768 (ViT features)
- **Model Dimension**: 256
- **Number of Attention Heads**: 4
- **Encoder Layers**: 2
- **Feed-forward Dimension**: 1024
- **Batch Size**: 32
- **Learning Rate**: 5e-4
- **Weight Decay**: 1e-4
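
These values can be gathered into a single configuration object so that scripts share one source of truth. The dataclass below is a sketch; the class and field names are hypothetical, and only the values come from the list above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpectralTransformerConfig:
    """Hyperparameters listed in this README, with hypothetical field names."""
    input_dim: int = 768        # ViT feature size
    model_dim: int = 256
    num_heads: int = 4
    num_layers: int = 2
    ff_dim: int = 1024
    batch_size: int = 32
    lr: float = 5e-4
    weight_decay: float = 1e-4

    def __post_init__(self):
        # Multi-head attention splits model_dim evenly across the heads.
        if self.model_dim % self.num_heads != 0:
            raise ValueError("model_dim must be divisible by num_heads")

    @property
    def head_dim(self):
        return self.model_dim // self.num_heads

config = SpectralTransformerConfig()
print(config.head_dim)
```

With the listed values each attention head works on 256 / 4 = 64 dimensions.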
## 📁 File Descriptions
| File | Description |
|------|-------------|
| [`train.py`](train.py) | Main training script, including data loading and model training |
| [`models.py`](models.py) | SpectralTransformer model architecture |
| [`utils.py`](utils.py) | Training utilities, evaluation and visualization functions |
| [`data_get.ipynb`](data_get.ipynb) | Data preprocessing and metadata generation |
| [`feature_extract_vit.ipynb`](feature_extract_vit.ipynb) | ViT feature extraction pipeline |
| [`tech_pipeline.png`](tech_pipeline.png) | Technical workflow visualization diagram |
## 🎯 Key Innovations
1. **Spectrum-aware Architecture**: Transformer customized for hyperspectral data
2. **Multi-scale Feature Extraction**: ViT features capture spatial patterns of each band
3. **Efficient Processing**: Pre-extracted features accelerate training
4. **Comprehensive Evaluation**: Detailed metrics and visualization for model analysis
## 📈 Future Outlook
- Support for multi-class classification
- Integration of spatial-spectral attention mechanisms
- Real-time inference pipeline
- Advanced data augmentation
- Model ensemble methods
## 🤝 Contribution Notes
This project demonstrates a complete pipeline for hyperspectral image classification, with modular design that facilitates extension and modification of individual components.
## 📄 License
This project belongs to a research program for hyperspectral image analysis and classification.
Provider: maas
Created: 2025-06-23