dust-systems/psg-audio

Name: dust-systems/psg-audio
Creator: dust-systems
Published: 2026-03-02 01:09:36
License: CC BY 4.0

Hugging Face. Updated 2026-03-02; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/dust-systems/psg-audio

Description:

---
license: cc-by-4.0
task_categories:
- audio-classification
language:
- zh
tags:
- sleep-staging
- polysomnography
- audio
- sleep-apnea
- edf
pretty_name: PSG-Audio
size_categories:
- 100K<n<1M
---

# PSG-Audio: Polysomnography with Simultaneous Audio Recordings

A **partial mirror** of the PSG-Audio dataset (Korompili et al., 2021): full-night polysomnography recordings from subjects with obstructive sleep apnea, including a bedside microphone channel suitable for sound-based sleep staging research.

> **Note:** This copy contains approximately 59% of the original dataset by file count (~60% by size). See [Completeness](#completeness) below for details on what is included and what is missing.

## Dataset Description

Each subject has:

- **EDF files** (European Data Format): Multi-channel PSG recordings, typically split into ~5 one-hour parts. 20 channels including EEG, EOG, EMG, ECG, airflow, SpO2, snore sensor, and a 48kHz bedside microphone.
- **RML files**: XML-based expert sleep stage annotations. Each 30-second epoch is labeled as Wake, N1, N2, N3, or REM.

### Key Statistics

| Attribute | Value |
|-----------|-------|
| Total size | ~586 GB |
| File count | 1,682 (of 2,851 in original) |
| Audio channel | `Mic` (channel 18), 48kHz sample rate |
| Annotation format | 30-second epochs, 5-class (Wake/N1/N2/N3/REM) |
| Population | Obstructive sleep apnea patients |

## Completeness

The original PSG-Audio dataset (~986 GB, 2,851 files across V1/V2/V3) is hosted on [scidb.cn](https://www.scidb.cn/en/detail?dataSetId=778740145531650048) behind time-limited FTP credentials. This mirror was created from a partial transfer before those credentials expired.
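Because the transfer was partial, some subjects in a local copy may have only some of their EDF parts. The following is a minimal sketch for spotting such gaps. It assumes EDF files are named `<subject>_<part>.edf`, as the example filename `00001001_1.edf` in the usage section suggests, and that parts are numbered consecutively from 1. Both are assumptions, and `group_parts` and `missing_parts` are hypothetical helper names rather than part of any library. The filenames in the demo other than `00001001_1.edf` are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<subject>_<part>.edf", e.g. "00001001_1.edf".
EDF_NAME = re.compile(r"^(?P<subject>\d+)_(?P<part>\d+)\.edf$", re.IGNORECASE)


def group_parts(filenames):
    """Map each subject ID to the sorted list of part numbers present."""
    parts = defaultdict(set)
    for name in filenames:
        match = EDF_NAME.match(name)
        if match:  # skip anything that does not follow the assumed pattern
            parts[match["subject"]].add(int(match["part"]))
    return {subject: sorted(nums) for subject, nums in parts.items()}


def missing_parts(present):
    """Return part numbers absent between 1 and the highest part present.

    A missing final part cannot be detected this way.
    """
    return [n for n in range(1, max(present) + 1) if n not in present]


if __name__ == "__main__":
    # Hypothetical listing of a local directory.
    names = ["00001001_1.edf", "00001001_2.edf", "00001001_4.edf", "00001002_1.edf"]
    for subject, present in sorted(group_parts(names).items()):
        print(subject, present, "missing:", missing_parts(present))
```

To run it over a local download, pass the names from a directory listing, for example `[p.name for p in pathlib.Path("V3/APNEA_EDF").glob("*.edf")]`, where the path is assumed to match the directory names used in this card.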
### What's included

| Directory | Files here | Notes |
|-----------|-----------|-------|
| V2/APNEA_RML | 175 | Near-complete |
| V2/APNEA_RML_clean | 182 | Near-complete |
| V3/APNEA_RML | 287 | Complete |
| V3/APNEA_EDF | ~750 files | Partial; 556 EDF files missing |
| V1/APNEA_RML | 14 | Partial |
| V1/APNEA_RML_clean | 26 | Partial |
| V3/APNEA_RML_clean | 10 | Partial |

### What's missing (1,169 files)

| Directory | Missing | Notes |
|-----------|---------|-------|
| V3/APNEA_EDF | 556 | Raw multi-channel PSG recordings |
| V3/APNEA_RML_clean | 184 | Cleaned annotations |
| V1/APNEA_RML_clean | 184 | V1 cleaned annotations |
| V1/APNEA_RML | 178 | V1 raw annotations |
| V2/APNEA_RML | 37 | V2 raw annotations |
| V2/APNEA_RML_clean | 30 | V2 cleaned annotations |

If you have access to the full dataset and would like to contribute the missing files, please open a discussion.

## Usage

### Reading EDF files (Python)

```python
import pyedflib

edf = pyedflib.EdfReader("00001001_1.edf")
try:
    channel_labels = edf.getSignalLabels()
    # Locate the microphone channel by label (case-insensitive match on "mic")
    mic_idx = next(i for i, label in enumerate(channel_labels) if "mic" in label.lower())
    audio = edf.readSignal(mic_idx)
    sample_rate = edf.getSampleFrequency(mic_idx)  # 48000
finally:
    edf.close()  # release the file handle even if reading fails
```

### Reading annotations (Python)

```python
from lxml import etree

tree = etree.parse("00001001.rml")
# Select the scored sleep-stage events
events = tree.xpath("//ScoredEvent[Type='Stages|Stages']")
for event in events:
    start = float(event.find("Start").text)
    duration = float(event.find("Duration").text)
    stage = event.find("Type").text  # e.g., "NREM2", "Wake", "REM"
```

### Audio channel warning

The EDF files contain a `Snore` channel (channel 10, 500Hz) that is **not** raw audio; it is a derived sensor signal. The correct audio channel is `Mic` (channel 18, 48kHz). Any channel below 8kHz sample rate is not a microphone.

## Source

Originally hosted at [scidb.cn](https://www.scidb.cn/en/detail?dataSetId=778740145531650048) with time-limited FTP access. This is a partial permanent mirror for reproducibility.
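Since the mirror totals roughly 586 GB, it can be useful to fetch only selected directories, for example the annotations. The sketch below assumes the repository layout matches the directory names listed in the tables above (such as `V3/APNEA_RML`), which this card does not confirm. It uses `snapshot_download` from the `huggingface_hub` package; `directory_patterns` and `download_directories` are hypothetical helper names introduced here for illustration.

```python
def directory_patterns(directories):
    """Turn directory names into glob patterns matching every file beneath them."""
    return [f"{d.strip('/')}/*" for d in directories]


def download_directories(directories, local_dir="psg-audio"):
    """Download only the given directories of the dataset repository.

    Requires `pip install huggingface_hub`. The import is done lazily so the
    pattern helper above can be used without the package installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="dust-systems/psg-audio",
        repo_type="dataset",
        allow_patterns=directory_patterns(directories),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Assumed layout: directory names as listed in the completeness tables.
    print(directory_patterns(["V3/APNEA_RML", "V2/APNEA_RML_clean"]))
```

Calling `download_directories(["V3/APNEA_RML"])` would then fetch only the V3 annotation files, provided the assumed layout holds. The trailing `/` in each pattern keeps `V3/APNEA_RML/*` from also matching `V3/APNEA_RML_clean`.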

Paper: [PSG-Audio, a scored polysomnography dataset with simultaneous audio recordings for sleep apnea studies](https://www.nature.com/articles/s41597-021-00977-w) (Scientific Data, 2021)

## Citation

```bibtex
@article{korompili2021psgaudio,
  title={PSG-Audio, a scored polysomnography dataset with simultaneous audio recordings for sleep apnea studies},
  author={Korompili, Georgia and others},
  journal={Scientific Data},
  volume={8},
  year={2021},
  doi={10.11922/sciencedb.00345}
}
```

## License

CC BY 4.0

Provider:

dust-systems
