
Dataset for "Classification of Seismic Events in Mainland China Based on Spectrograms and Model Interpretability"

Science Data Bank | Updated: 2025-12-21 | Indexed: 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=35afda3bf71d4925a97a5f9057b26cc9
Resource description:
Dataset Description

This dataset contains seismic waveform data collected from the China Digital Seismograph Network for earthquake event classification and machine learning research. The dataset supports the ResWaveQuake model described in the associated publication on time-frequency-based earthquake event classification in mainland China.

Data Overview

The dataset spans January 2013 to May 2024 and covers mainland China and adjacent regions. It includes 2,870 seismic events with approximately 100,000 associated waveform traces, categorized into:
- Natural earthquakes (eq): 1,158 events (926 training, 232 testing)
- Explosions (ep): 972 events (778 training, 194 testing)
- Collapses/subsidences (cl): 740 events (592 training, 148 testing)

The dataset is split by event-level stratified sampling, with approximately 80% allocated for training (2,296 events) and 20% for testing (574 events), so that all waveforms of an event stay in the same split.

Data Characteristics and Processing

Data Format and Storage

All waveform data are stored in HDF5 format for efficient access and AI model training. The original MiniSEED data have been converted and organized by event category, with each HDF5 file containing all events of a specific category and split (e.g., train_eq.h5, test_ep.h5).

Waveform Specifications
- Sampling rate: 100 Hz
- Data length: 200 seconds per record (20,000 samples)
- Time window: 50 seconds before the P-wave arrival plus 150 seconds after
- Components: Three-component data (BHE, BHN, BHZ) representing the east-west, north-south, and vertical directions
- Component order: Standardized to ENZ order (East, North, Vertical) in the HDF5 files
- Frequency range: Band-pass filtered to 0.1-25 Hz
- Signal-to-noise ratio: ≥ 2 for all records
- Normalization: Maximum amplitude normalization applied (data range: [-1, 1])

Quality Assurance

Quality assurance measures include:
- Retaining only events with ≥ 1 valid waveform record
- Systematic quality control using STA/LTA detection for noise samples
- Consistent preprocessing standards across all regional data
- Manual expert annotations for all P-wave arrivals
- Support for incomplete component data: stations with missing components (BHE, BHN, or BHZ) are included with the missing components filled with NaN, allowing maximum data retention while maintaining a consistent data structure

Incomplete Component Handling

The dataset includes stations with incomplete three-component data:
- Missing components are filled with NaN values
- Each station dataset includes an `available_components` attribute indicating which components are present
- The `has_missing_components` attribute flags stations with incomplete data
- Missing component indices are stored in the `missing_component_indices` attribute
- This approach maximizes data retention while maintaining a consistent data structure for machine learning applications

Data Collection

Data were collected from 632 unique seismic stations across mainland China, covering:
- Epicentral distances: 0-800 km
- Magnitude range: 0-5
- Geographic coverage: Mainland China

HDF5 File Structure

The HDF5 files are organized hierarchically:

    category_name/            # Category group (eq, ep, or cl)
      event_0/                # Event group (anonymized event ID)
        station_0/            # Station dataset (shape: [3, 20000])
          - Attributes:
            * station_id: Station ID (e.g., NX.001)
            * network: Network code (e.g., NX)
            * shape: Data shape [3, 20000]
            * components: Component list ['BHE', 'BHN', 'BHZ']
            * component_order: Component order 'ENZ'
            * available_components: List of available components (e.g., ['BHE', 'BHN', 'BHZ'])
            * has_missing_components: Boolean flag indicating if any components are missing
            * missing_component_indices: List of indices for missing components (0=BHE, 1=BHN, 2=BHZ)
          - Data: numpy array of shape (3, 20000), missing components filled with NaN
      - Group attributes:
        * num_events: Number of events in this category
        * total_waveforms: Total number of waveforms
        * category: Category name (eq, ep, or cl)
        * split: Dataset split (train or test)
        * component_order: Global component order 'ENZ'
        * num_unique_stations: Number of unique stations
        * num_complete_component_stations: Number of stations with all three components
        * num_incomplete_component_stations: Number of stations with missing components

Data Source and Acknowledgment

The seismic data are provided by the International Earthquake Science Data Center at the Institute of Geophysics, China Earthquake Administration (doi:10.11998/IESDC). The data were produced by the China Earthquake Networks Center and the AH, BJ, CQ, FJ, GD, GS, GX, GZ, HA, HB, HE, HI, HL, HN, JL, JS, JX, LN, NM, NX, QH, SC, SD, SH, SN, SX, TJ, XJ, XZ, YN, ZJ Seismic Networks, China Earthquake Administration.
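
The category/event/station hierarchy and the NaN-filled missing components described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader: the helper names (`present_component_indices`, `fill_missing`, `iter_stations`) are hypothetical, zero-filling is only one possible way to treat NaN rows, and the assumption that the category group sits at the top level of each file is taken from the structure listing rather than verified against the files.

```python
import numpy as np

# Component order stated in the description (ENZ): row 0=BHE, 1=BHN, 2=BHZ.
COMPONENTS = ("BHE", "BHN", "BHZ")


def present_component_indices(data):
    """Return the row indices that hold real samples (not entirely NaN)."""
    return [i for i in range(data.shape[0]) if not np.isnan(data[i]).all()]


def fill_missing(data, fill_value=0.0):
    """Return a copy with NaN samples replaced by fill_value.

    Zero-filling is an assumption of this sketch, not the authors' method.
    """
    return np.nan_to_num(data, nan=fill_value)


def iter_stations(path, category):
    """Yield (event_name, station_name, array, attrs) for one category file."""
    import h5py  # imported lazily so the NumPy helpers above work without it

    with h5py.File(path, "r") as f:
        for event_name, event in f[category].items():
            for station_name, station in event.items():
                # station[()] reads the full (3, 20000) array into memory
                yield event_name, station_name, station[()], dict(station.attrs)


# Synthetic stand-in for one station record with the north component missing.
record = np.zeros((3, 20000))
record[1] = np.nan
print([COMPONENTS[i] for i in present_component_indices(record)])
print(np.isnan(fill_missing(record)).any())
```

With the real files, a call such as `iter_stations("train_eq.h5", "eq")` would walk every station record of that category and split. The stored `missing_component_indices` attribute should then list the rows that `present_component_indices` leaves out, although that correspondence is inferred from the description only.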
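
The waveform specifications also fix the geometry of each record: 50 s before and 150 s after the P arrival at 100 Hz gives 20,000 samples. The sketch below works through that arithmetic and cuts a shorter window around the P arrival. It assumes the P arrival sits exactly at sample 5,000 of every record, which the description implies but does not state, and `window_around_p` is a hypothetical helper.

```python
import numpy as np

SAMPLING_RATE_HZ = 100   # from the description
PRE_P_SECONDS = 50       # record start relative to the P arrival
POST_P_SECONDS = 150     # record end relative to the P arrival

N_SAMPLES = (PRE_P_SECONDS + POST_P_SECONDS) * SAMPLING_RATE_HZ
P_INDEX = PRE_P_SECONDS * SAMPLING_RATE_HZ  # assumed sample of the P arrival


def window_around_p(data, before_s, after_s):
    """Slice [P - before_s, P + after_s) along the last (time) axis."""
    if not (0 <= before_s <= PRE_P_SECONDS and 0 <= after_s <= POST_P_SECONDS):
        raise ValueError("requested window exceeds the stored record")
    start = P_INDEX - int(round(before_s * SAMPLING_RATE_HZ))
    stop = P_INDEX + int(round(after_s * SAMPLING_RATE_HZ))
    return data[..., start:stop]


record = np.zeros((3, N_SAMPLES))  # synthetic stand-in for one station record
short = window_around_p(record, before_s=5, after_s=20)
print(N_SAMPLES, P_INDEX, short.shape)  # 20000 5000 (3, 2500)
```

Slicing on the last axis keeps the three-component layout intact, so the same helper applies to a single record or to a batch stacked along a leading axis.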
Provider:
Yuanyuan Fu; Yongjie Chen; Guangxi Minzu University
Created:
2025-12-21