LeBabyOx/EEGParquet

Source: Hugging Face. Updated: 2026-04-05. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/LeBabyOx/EEGParquet
Resource description:
---
task_categories:
- tabular-classification
tags:
- eeg
- neuroscience
- biomedical-signal-processing
- time-series
- biology
pretty_name: EEGParquet-Benchmark
size_categories:
- 1M<n<10M
---

# 🧠 EEGParquet-Benchmark

## 📌 Overview

This dataset contains electroencephalography (EEG) recordings processed and stored in a structured format for machine learning and signal analysis tasks. It is designed to support research in brain-computer interfaces (BCI), neurological disorder detection, and time-series modeling.

The dataset is stored in Parquet format to enable efficient large-scale processing and seamless integration with modern ML pipelines.

## 🎯 Intended Use

This dataset can be used for token classification on EEG sequences, brain signal decoding, sleep stage classification, seizure detection, time-series forecasting, and representation learning in biomedical signal processing.

## 🧾 Dataset Structure

The dataset is organized into multiple Parquet files, typically per subject or recording session:

```
/data
├── chb01_01_features.parquet
├── chb01_02_features.parquet
├── ...
```

Each file contains time-series EEG data with the following fields:

- `timestamp`: Time index of the signal
- `channel_*`: EEG channel values (e.g., Fp1, Fp2, etc.)
- `label` (optional): Annotation per timestep for supervised tasks

## 📊 Features

- timestamp: Time index of the signal
- channel_*: EEG electrode readings across multiple channels
- label: Token/class label (if available)

## 🧪 Processing Details

The dataset was constructed using a sliding window segmentation approach:

- Window size: 2 seconds
- Step size: 1 second (50% overlap)
- Sampling frequency: variable per recording (standardized during processing)

Each window is labeled as seizure (1) if it overlaps with annotated seizure intervals.
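
The windowing and labeling rule above can be illustrated with a minimal sketch. It is not the author's pipeline: the function name `window_labels`, the `(start_sec, end_sec)` annotation format, the 256 Hz sampling rate, and the treatment of any non-zero overlap as a seizure window are all assumptions made for illustration.

```python
import numpy as np

def window_labels(n_samples, fs, seizure_intervals, win_sec=2.0, step_sec=1.0):
    """Return window start indices and 0/1 labels for one recording.

    seizure_intervals is a list of (start_sec, end_sec) pairs; this
    annotation format is assumed, not taken from the dataset.
    """
    win = int(round(win_sec * fs))    # window length in samples
    step = int(round(step_sec * fs))  # hop between window starts in samples
    starts = np.arange(0, n_samples - win + 1, step)
    labels = np.zeros(len(starts), dtype=np.int8)
    for i, s in enumerate(starts):
        t0, t1 = s / fs, (s + win) / fs
        # Label 1 if the window [t0, t1) overlaps any annotated interval [a, b).
        if any(t0 < b and a < t1 for a, b in seizure_intervals):
            labels[i] = 1
    return starts, labels

# Hypothetical 10-second recording at 256 Hz with one seizure from 3.5 s to 5.0 s.
starts, labels = window_labels(n_samples=10 * 256, fs=256,
                               seizure_intervals=[(3.5, 5.0)])
print(labels.tolist())
```

With a 2-second window and 1-second step, each annotated interval marks every window it touches, so short seizures still produce several positive windows.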
Extracted features per channel include:

- Statistical: Mean, Standard Deviation, Variance
- Information-theoretic: Shannon Entropy
- Spectral: Band power across Delta (0.5–4 Hz), Theta (4–8 Hz), Alpha (8–13 Hz), and Beta (13–30 Hz)

Power spectral density is computed using Welch’s method, with optional GPU acceleration for large-scale processing.

## 📈 Data Size

The dataset contains approximately 1M–10M timesteps and is optimized for fast I/O and scalable training workflows.

## 🧪 Example Usage

```python
import pandas as pd

# File name follows the layout shown under "Dataset Structure".
df = pd.read_parquet("data/chb01_01_features.parquet")
print(df.head())
```

## ⚠️ Limitations

EEG signals are inherently noisy and subject-dependent. Label quality may vary depending on the annotation source. This dataset is intended for research purposes and should not be used directly for clinical diagnosis without proper validation.

## 🔐 Ethics & Privacy

This dataset is intended for research and educational use only. Users are responsible for ensuring compliance with applicable regulations and ethical guidelines when using this data.

## 📚 Citation

If you use this dataset, please cite both the original dataset and this processed version:

### Original Dataset (CHB-MIT EEG)

```
@article{goldberger2000physiobank,
  title={PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals},
  author={Goldberger, Ary L. and Amaral, Luis A. N. and Glass, Leon and Hausdorff, Jeffrey M. and Ivanov, Plamen Ch. and Mark, Roger G. and Mietus, Joseph E. and Moody, George B. and Peng, Chung-Kang and Stanley, H. Eugene},
  journal={Circulation},
  volume={101},
  number={23},
  pages={e215--e220},
  year={2000}
}
```

### This Dataset (EEGParquet-Benchmark)

```
@dataset{eegparquet_benchmark_2026,
  title={EEGParquet-Benchmark: Windowed and Feature-Enriched EEG Dataset for Seizure Detection},
  author={Daffa Tarigan},
  year={2026},
  note={Derived from the CHB-MIT Scalp EEG Database with 2-second sliding windows (1-second overlap), bandpass filtering (0.5--40 Hz), and statistical + spectral feature extraction},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/LeBabyOx/EEGParquet}
}
```
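
For readers who want to reproduce features like those listed under Processing Details, the following is a rough sketch of per-channel feature extraction for a single window. It is an illustration under stated assumptions, not the code used to build this dataset: the function name `window_features`, the histogram-based entropy with 32 bins, the 1-second Welch segment length, and the half-open band edges are all choices made here. The 0.5–40 Hz bandpass filter mentioned in the citation note is not included.

```python
import numpy as np
from scipy.signal import welch

# Band edges in Hz, as listed in the dataset card.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def window_features(x, fs, n_bins=32):
    """Features for one channel of one window (1-D array x sampled at fs Hz)."""
    feats = {"mean": float(np.mean(x)),
             "std": float(np.std(x)),
             "var": float(np.var(x))}

    # Shannon entropy of the amplitude histogram (bin count is an assumption).
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    feats["entropy"] = float(-np.sum(p * np.log2(p)))

    # Welch PSD; 1-second segments give roughly 1 Hz frequency resolution.
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), int(fs)))
    df = freqs[1] - freqs[0]
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        # Rectangle-rule integral of the PSD over the band.
        feats[f"{name}_power"] = float(psd[mask].sum() * df)
    return feats

# Synthetic 2-second, 10 Hz sine at an assumed 256 Hz sampling rate.
fs = 256
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 10.0 * t)
feats = window_features(x, fs)
print(max(BANDS, key=lambda b: feats[f"{b}_power"]))
```

On the synthetic 10 Hz signal, nearly all of the power falls in the alpha band, which is a quick sanity check that the band masks are wired correctly.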
Provider:
LeBabyOx