dust-systems/psg-audio
Source: Hugging Face · Updated 2026-03-02 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/dust-systems/psg-audio
Resource description:
---
license: cc-by-4.0
task_categories:
- audio-classification
language:
- zh
tags:
- sleep-staging
- polysomnography
- audio
- sleep-apnea
- edf
pretty_name: PSG-Audio
size_categories:
- 100K<n<1M
---
# PSG-Audio: Polysomnography with Simultaneous Audio Recordings
A **partial mirror** of the PSG-Audio dataset (Korompili et al., 2021) — full-night polysomnography recordings from subjects with obstructive sleep apnea, including a bedside microphone channel suitable for sound-based sleep staging research.
> **Note:** This copy contains approximately 59% of the original dataset by file count (~60% by size). See [Completeness](#completeness) below for details on what is included and what is missing.
## Dataset Description
Each subject has:
- **EDF files** (European Data Format): Multi-channel PSG recordings, typically split into about five one-hour parts per subject. Each file has 20 channels, including EEG, EOG, EMG, ECG, airflow, SpO2, a snore sensor, and a 48 kHz bedside microphone.
- **RML files**: XML-based expert sleep stage annotations. Each 30-second epoch is labeled as Wake, N1, N2, N3, or REM.
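Since annotations are per 30-second epoch and the microphone is sampled at 48 kHz, each epoch spans 30 × 48,000 = 1,440,000 audio samples. The following is a minimal sketch of computing epoch boundaries for an audio array. It assumes epochs are aligned to the first sample of the signal, which should be checked against the actual recordings; `epoch_bounds` is a hypothetical helper, not part of the dataset or of any library.

```python
def epoch_bounds(n_samples, sample_rate, epoch_sec=30):
    """Return (start, stop) sample indices for each complete epoch.

    Assumes epochs start at sample 0; a trailing partial epoch is dropped.
    """
    samples_per_epoch = int(round(sample_rate * epoch_sec))
    n_epochs = n_samples // samples_per_epoch
    return [(i * samples_per_epoch, (i + 1) * samples_per_epoch)
            for i in range(n_epochs)]


# One hour of 48 kHz audio yields 120 epochs of 1,440,000 samples each.
bounds = epoch_bounds(n_samples=48_000 * 3600, sample_rate=48_000)
```

Each `(start, stop)` pair can then be used to slice the microphone signal, for example `audio[start:stop]`.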
### Key Statistics
| Attribute | Value |
|-----------|-------|
| Total size | ~586 GB |
| File count | 1,682 (of 2,851 in original) |
| Audio channel | `Mic` (channel 18), 48kHz sample rate |
| Annotation format | 30-second epochs, 5-class (Wake/N1/N2/N3/REM) |
| Population | Obstructive sleep apnea patients |
## Completeness
The original PSG-Audio dataset (~986 GB, 2,851 files across V1/V2/V3) is hosted on [scidb.cn](https://www.scidb.cn/en/detail?dataSetId=778740145531650048) behind time-limited FTP credentials. This mirror was created from a partial transfer before those credentials expired.
### What's included
| Directory | Files here | Notes |
|-----------|-----------|-------|
| V2/APNEA_RML | 175 | Near-complete |
| V2/APNEA_RML_clean | 182 | Near-complete |
| V3/APNEA_RML | 287 | Complete |
| V3/APNEA_EDF | ~750 | Partial; 556 EDF files missing |
| V1/APNEA_RML | 14 | Partial |
| V1/APNEA_RML_clean | 26 | Partial |
| V3/APNEA_RML_clean | 10 | Partial |
### What's missing (1,169 files)
| Directory | Missing | Notes |
|-----------|---------|-------|
| V3/APNEA_EDF | 556 | Raw multi-channel PSG recordings |
| V3/APNEA_RML_clean | 184 | Cleaned annotations |
| V1/APNEA_RML_clean | 184 | V1 cleaned annotations |
| V1/APNEA_RML | 178 | V1 raw annotations |
| V2/APNEA_RML | 37 | V2 raw annotations |
| V2/APNEA_RML_clean | 30 | V2 cleaned annotations |
If you have access to the full dataset and would like to contribute the missing files, please open a discussion.
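Because the mirror is partial, a subject may have annotations without recordings, or the reverse. The sketch below lists the subjects in a local copy that have both at least one EDF part and an RML file. It assumes the naming pattern seen in the usage examples (`<subject>_<part>.edf` and `<subject>.rml`); `usable_subjects` and the second subject ID are hypothetical and used only for illustration.

```python
import re
from collections import defaultdict

# Assumed naming: "<subject>_<part>.edf" and "<subject>.rml".
EDF_RE = re.compile(r"^(\d+)_(\d+)\.edf$", re.IGNORECASE)
RML_RE = re.compile(r"^(\d+)\.rml$", re.IGNORECASE)


def usable_subjects(filenames):
    """Map subject ID -> sorted EDF part numbers, keeping only subjects with an RML."""
    parts = defaultdict(list)
    annotated = set()
    for name in filenames:
        base = name.rsplit("/", 1)[-1]  # drop any directory prefix
        if (m := EDF_RE.match(base)):
            parts[m.group(1)].append(int(m.group(2)))
        elif (m := RML_RE.match(base)):
            annotated.add(m.group(1))
    return {sid: sorted(p) for sid, p in parts.items() if sid in annotated}


files = [
    "V3/APNEA_EDF/00001001_1.edf",
    "V3/APNEA_EDF/00001001_2.edf",
    "V3/APNEA_RML/00001001.rml",
    "V3/APNEA_EDF/00001002_1.edf",  # hypothetical subject with no RML in this list
]
print(usable_subjects(files))
```

In practice the file list could come from walking the downloaded directory tree; gaps in the returned part numbers indicate missing EDF parts for that subject.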
## Usage
### Reading EDF files (Python)
```python
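import pyedflib

edf = pyedflib.EdfReader("00001001_1.edf")
try:
    channel_labels = edf.getSignalLabels()
    # Locate the microphone by label rather than by a hard-coded index.
    mic_idx = next(i for i, l in enumerate(channel_labels) if "mic" in l.lower())
    audio = edf.readSignal(mic_idx)  # NumPy array of physical values
    sample_rate = edf.getSampleFrequency(mic_idx)  # 48000
finally:
    edf.close()  # release the file handle even if a read fails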
```
### Reading annotations (Python)
```python
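from lxml import etree

tree = etree.parse("00001001.rml")
# Select the sleep-stage events.
events = tree.xpath("//ScoredEvent[Type='Stages|Stages']")
for event in events:
    start = float(event.find("Start").text)
    duration = float(event.find("Duration").text)
    # NOTE: `Type` is also the element matched by the XPath filter above;
    # inspect a file to confirm which element holds the stage label.
    stage = event.find("Type").text  # e.g., "NREM2", "Wake", "REM"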
```
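For training a classifier it is often convenient to turn stage events into one label per 30-second epoch. The following is a rough sketch of that step, not the dataset authors' method. It assumes `Start` and `Duration` are in seconds and that stage names follow the `"Wake"`, `"NREM2"`, `"REM"` pattern shown above; the spellings `"NREM1"` and `"NREM3"`, the integer class codes, and the helper `events_to_epochs` are assumptions for illustration.

```python
# Assumed label spellings; only "Wake", "NREM2" and "REM" appear in this card.
STAGE_TO_CLASS = {"Wake": 0, "NREM1": 1, "NREM2": 2, "NREM3": 3, "REM": 4}


def events_to_epochs(events, epoch_sec=30):
    """Expand (start_sec, duration_sec, stage) events into one class per epoch.

    Epochs not covered by any event are left as None.
    """
    if not events:
        return []
    end = max(start + dur for start, dur, _ in events)
    labels = [None] * int(end // epoch_sec)
    for start, dur, stage in events:
        first = int(start // epoch_sec)
        last = int((start + dur) // epoch_sec)
        for i in range(first, min(last, len(labels))):
            labels[i] = STAGE_TO_CLASS[stage]
    return labels


events = [(0.0, 60.0, "Wake"), (60.0, 30.0, "NREM1"), (90.0, 90.0, "NREM2")]
labels = events_to_epochs(events)
```

An unrecognized stage name raises `KeyError`, which makes unexpected label spellings visible early rather than silently dropping epochs.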
### Audio channel warning
The EDF files contain a `Snore` channel (channel 10, 500Hz) that is **not** raw audio — it's a derived sensor signal. The correct audio channel is `Mic` (channel 18, 48kHz). Any channel below 8kHz sample rate is not a microphone.
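The 8 kHz rule of thumb can be applied programmatically when channel labels vary between files. This is a minimal sketch under that assumption; `find_audio_channels` is a hypothetical helper, and the EEG label and its 200 Hz rate in the example are made up for illustration.

```python
def find_audio_channels(labels, sample_rates, min_rate_hz=8000):
    """Return (index, label, rate) for channels sampled at or above min_rate_hz."""
    return [(i, label, rate)
            for i, (label, rate) in enumerate(zip(labels, sample_rates))
            if rate >= min_rate_hz]


# Illustrative values: only the Snore (500 Hz) and Mic (48 kHz) rates come from this card.
example_labels = ["EEG C3-A2", "Snore", "Mic"]
example_rates = [200, 500, 48000]
audio_channels = find_audio_channels(example_labels, example_rates)
```

With pyedflib, the two inputs could be taken from `getSignalLabels()` and `getSampleFrequencies()` on an open `EdfReader`.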
## Source
Originally hosted at [scidb.cn](https://www.scidb.cn/en/detail?dataSetId=778740145531650048) with time-limited FTP access. This is a partial permanent mirror for reproducibility.
Paper: [PSG-Audio, a scored polysomnography dataset with simultaneous audio recordings for sleep apnea studies](https://www.nature.com/articles/s41597-021-00977-w) (Scientific Data, 2021)
## Citation
```bibtex
@article{korompili2021psgaudio,
  title={PSG-Audio, a scored polysomnography dataset with simultaneous audio recordings for sleep apnea studies},
  author={Korompili, Georgia and others},
  journal={Scientific Data},
  volume={8},
  year={2021},
  doi={10.1038/s41597-021-00977-w},
  note={Dataset DOI: 10.11922/sciencedb.00345}
}
```
## License
CC BY 4.0
Provider:
dust-systems



