Synthetic Pathology Dataset
Source: NIAID Data Ecosystem; indexed 2026-05-02
Download link:
https://zenodo.org/record/14974649
Description:
A synthetic dataset was generated to mimic realistic distributions of voice-related parameters (e.g., pitch, jitter, shimmer, harmonic-to-noise ratio, age, and a continuous disease severity score). The pathology labels were derived from domain-inspired thresholds, ensuring a challenging classification task.
We assess the thresholds applied to generate the synthetic pathology labels, evaluating their alignment with clinical contexts.
• Jitter (> 0.05): Jitter measures frequency variation in voice signals. Healthy voices typically exhibit jitter below 1–2%, while the 0.05 (5%) threshold exceeds clinical norms but may detect pronounced pathology, assuming proper scaling.
• Shimmer (> 0.08): Shimmer reflects amplitude variation, normally below 3–5% in healthy voices. The 0.08 (8%) threshold is above typical ranges, suitable for severe cases but potentially missing subtle issues.
• HNR (< 15): Harmonic-to-Noise Ratio (HNR) indicates the balance of harmonic versus noise energy. Healthy voices often exceed 20 dB, while < 15 dB aligns with pathological noisiness, making this threshold clinically plausible.
• Age (> 70): Age is a risk factor for voice decline, but > 70 as a pathology marker is overly simplistic. It may act as a proxy in synthetic data, though it is not diagnostic in practice.
• Disease Severity (> 0.7): This synthetic parameter, likely on a 0–1 scale, uses a 0.7 cutoff to denote severity. While arbitrary, it is reasonable for synthetic data but lacks direct clinical grounding.
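The thresholds listed above can be sketched as a small labeling function. This is only an illustration, not the dataset authors' code: the description does not say how the individual criteria are combined into a single label, so the `min_flags` parameter (how many criteria must fire) is an assumption, as are the names `VoiceSample`, `threshold_flags`, and `pathology_label`. Jitter and shimmer are assumed to be stored as fractions (0.05 = 5%), HNR in dB, and severity on a 0–1 scale.

```python
from dataclasses import dataclass

# Cutoffs exactly as stated in the dataset description.
JITTER_CUTOFF = 0.05    # flag if jitter > 0.05
SHIMMER_CUTOFF = 0.08   # flag if shimmer > 0.08
HNR_CUTOFF = 15.0       # flag if HNR < 15 (dB)
AGE_CUTOFF = 70         # flag if age > 70
SEVERITY_CUTOFF = 0.7   # flag if severity > 0.7


@dataclass
class VoiceSample:
    """One synthetic record (hypothetical field names)."""
    jitter: float     # assumed fraction, 0.05 == 5%
    shimmer: float    # assumed fraction, 0.08 == 8%
    hnr: float        # dB
    age: float        # years
    severity: float   # assumed 0-1 scale


def threshold_flags(s: VoiceSample) -> dict:
    """Evaluate each criterion separately; strict inequalities as listed."""
    return {
        "jitter": s.jitter > JITTER_CUTOFF,
        "shimmer": s.shimmer > SHIMMER_CUTOFF,
        "hnr": s.hnr < HNR_CUTOFF,
        "age": s.age > AGE_CUTOFF,
        "severity": s.severity > SEVERITY_CUTOFF,
    }


def pathology_label(s: VoiceSample, min_flags: int = 1) -> int:
    """Return 1 if at least `min_flags` criteria fire, else 0.

    ASSUMPTION: the combination rule is not given in the description;
    min_flags=1 corresponds to a logical OR over the five criteria.
    """
    return int(sum(threshold_flags(s).values()) >= min_flags)


# Usage with two made-up records (values chosen only for illustration).
typical = VoiceSample(jitter=0.01, shimmer=0.03, hnr=22.0, age=45, severity=0.2)
noisy = VoiceSample(jitter=0.06, shimmer=0.09, hnr=12.0, age=45, severity=0.2)
labels = [pathology_label(typical), pathology_label(noisy)]
```

Keeping the per-criterion flags separate from the final label makes it easy to check how often each threshold fires, which is useful given that the jitter and shimmer cutoffs sit above typical clinical ranges.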
Created:
2025-03-05



