Synthetic Pathology Dataset

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14974649

Description:

A synthetic dataset was generated to mimic realistic distributions of voice parameters (e.g., pitch, jitter, shimmer, harmonic-to-noise ratio, age, and a continuous disease severity score). The pathological labels were derived based on domain-inspired thresholds, ensuring a challenging classification task. We assess the thresholds applied to generate synthetic pathology labels, evaluating their alignment with clinical contexts.

• Jitter (> 0.05): Jitter measures frequency variation in voice signals. Healthy voices typically exhibit jitter below 1–2%, while the 0.05 (5%) threshold exceeds clinical norms but may detect pronounced pathology, assuming proper scaling.
• Shimmer (> 0.08): Shimmer reflects amplitude variation, normally below 3–5% in healthy voices. The 0.08 (8%) threshold is above typical ranges, suitable for severe cases but potentially missing subtle issues.
• HNR (< 15): Harmonic-to-Noise Ratio (HNR) indicates harmonic versus noise balance. Healthy voices often exceed 20 dB, while <15 dB aligns with pathological noisiness, making this threshold clinically plausible.
• Age (> 70): Age is a risk factor for voice decline, but >70 as a pathology marker is overly simplistic. It may act as a proxy in synthetic data, though not diagnostic in practice.
• Disease Severity (> 0.7): This synthetic parameter, likely on a 0–1 scale, uses a 0.7 cutoff to denote severity. While arbitrary, it is reasonable for synthetic data but lacks direct clinical grounding.
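
The thresholds listed above can be checked against a single record with a short Python sketch. The description does not state how the individual criteria are combined into one label or what the dataset's columns are called, so the field names (`jitter`, `shimmer`, `hnr`, `age`, `severity`) and the `min_flags` combination rule below are assumptions made for illustration only, not the authors' labeling procedure.

```python
from typing import Dict, List, Mapping, Tuple

# Cutoffs as listed in the description. The keys are hypothetical field
# names (assumption); the dataset's real column names may differ.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "jitter": (">", 0.05),    # fraction, i.e. 5%
    "shimmer": (">", 0.08),   # fraction, i.e. 8%
    "hnr": ("<", 15.0),       # dB
    "age": (">", 70.0),       # years
    "severity": (">", 0.7),   # assumed 0-1 scale
}


def flagged_criteria(record: Mapping[str, float]) -> List[str]:
    """Return the names of the criteria whose cutoff the record crosses."""
    flags = []
    for name, (op, cutoff) in THRESHOLDS.items():
        value = record[name]
        crossed = value > cutoff if op == ">" else value < cutoff
        if crossed:
            flags.append(name)
    return flags


def is_pathological(record: Mapping[str, float], min_flags: int = 1) -> bool:
    """Label a record pathological if at least `min_flags` criteria fire.

    ASSUMPTION: the source does not say how criteria are combined;
    `min_flags=1` (any single criterion) is only one possible rule.
    """
    return len(flagged_criteria(record)) >= min_flags


if __name__ == "__main__":
    # Made-up example values, not rows from the dataset.
    sample = {"jitter": 0.06, "shimmer": 0.04, "hnr": 18.0,
              "age": 62, "severity": 0.3}
    print(flagged_criteria(sample), is_pathological(sample))
```

Raising `min_flags` makes the label stricter. Note that jitter and shimmer are compared as fractions here, matching the "0.05 (5%)" reading in the description; values stored as percentages would need to be divided by 100 first.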

Created:

2025-03-05
