Supplementary Material for: Digital vocal biomarker of smoking status using ecological audio recordings: results from the Colive Voice study
Source: DataCite Commons. Updated 2025-05-01; indexed 2024-08-19.
Download link:
https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Digital_vocal_biomarker_of_smoking_status_using_ecological_audio_recordings_results_from_the_Colive_Voice_study/26404006/1
Resource description:
Introduction
The complex health, social, and economic consequences of tobacco smoking underscore the importance of incorporating reliable and scalable data collection on smoking status and habits into research across various disciplines. Given that smoking affects voice production, we aimed to develop a gender- and language-specific vocal biomarker of smoking status.
Methods
Leveraging data from the Colive Voice study, we used statistical analyses to quantify the effects of smoking on voice characteristics. We then combined several voice feature extraction methods with machine learning algorithms to produce a gender- and language-specific (English and French) digital vocal biomarker that differentiates smokers from never-smokers.
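The analysis sample described in the Results was built with propensity score matching. This technique pairs each smoker with a never-smoker who has a similar estimated probability of being a smoker given their covariates. The abstract does not say how the scores were estimated or which matching rule was applied, so the following is only a hypothetical sketch. It assumes the propensity scores are already computed (for example by a logistic regression on covariates). It then applies greedy 1:1 nearest-neighbour matching without replacement. The function name `greedy_ps_match` and the caliper of 0.05 are illustrative assumptions, not the study's settings.

```python
from typing import Dict, List, Tuple


def greedy_ps_match(
    treated: Dict[str, float],
    controls: Dict[str, float],
    caliper: float = 0.05,  # assumed maximum allowed score difference
) -> List[Tuple[str, str]]:
    """Hypothetical greedy 1:1 nearest-neighbour matching without replacement.

    treated:  smoker ID -> precomputed propensity score
    controls: never-smoker ID -> precomputed propensity score
    Returns (smoker_id, never_smoker_id) pairs.
    """
    available = dict(controls)  # never-smokers not yet matched
    pairs: List[Tuple[str, str]] = []
    # Process smokers from highest to lowest propensity score.
    for tid, ps in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        # Closest remaining never-smoker on the propensity score scale.
        cid = min(available, key=lambda c: abs(available[c] - ps))
        # Accept the pair only if it falls within the caliper.
        if abs(available[cid] - ps) <= caliper:
            pairs.append((tid, cid))
            del available[cid]  # without replacement: each control used once
    return pairs


if __name__ == "__main__":
    smokers = {"s1": 0.62, "s2": 0.40, "s3": 0.90}
    never_smokers = {"n1": 0.41, "n2": 0.60, "n3": 0.20}
    print(greedy_ps_match(smokers, never_smokers))  # [('s1', 'n2'), ('s2', 'n1')]
```

In the example, smoker `s3` stays unmatched because no never-smoker lies within the caliper. Each accepted pair contributes one smoker and one never-smoker, so any 1:1 scheme yields an equal split of the two groups. This is consistent with the 50%/50% composition reported below, although the abstract does not name the exact matching rule.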
Results
A total of 1332 participants were included after propensity score matching (mean age 43.6 years, SD 13.65; 64.41% female; 56.68% English speakers; 50% smokers and 50% never-smokers). The distribution of voice features differed between the two groups in women: the fundamental frequency (F0), the frequencies of formants F1, F2 and F3, and the harmonics-to-noise ratio (HNR) were lower in smokers than in never-smokers (P<0.05). In men, no significant differences were observed between the two groups. For smoking status prediction, accuracy and area under the ROC curve (AUC) reached 0.71 and 0.76, respectively, in female participants, and 0.65 and 0.68, respectively, in male participants.
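Accuracy and AUC measure different things, which is why the two values reported above are not equal. Accuracy counts correct predictions at a single decision threshold. AUC is threshold-free: it is the probability that a randomly chosen smoker receives a higher predicted score than a randomly chosen never-smoker.

The following is a minimal sketch of both metrics, not the study's evaluation code. It assumes binary labels (1 = smoker, 0 = never-smoker), a model that outputs a predicted probability of smoking, and a threshold of 0.5. The function name `accuracy_and_auc` is hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_auc(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """labels: 1 = smoker, 0 = never-smoker; scores: predicted P(smoker)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Accuracy: fraction of participants whose thresholded prediction matches the label.
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    accuracy = correct / len(labels)

    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one smoker and one never-smoker")

    # AUC (Mann-Whitney form): share of smoker/never-smoker pairs ranked correctly,
    # counting tied scores as half a correct ranking.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (len(pos) * len(neg))
    return accuracy, auc


if __name__ == "__main__":
    print(accuracy_and_auc([1, 1, 0, 0], [0.8, 0.4, 0.6, 0.2]))  # (0.5, 0.75)
```

In the toy example, only half of the participants are classified correctly at the 0.5 threshold. Even so, three of the four smoker/never-smoker pairs are ranked in the right order, so the AUC exceeds the accuracy.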
Conclusion
Our results show that smoking affects voice features. We developed a novel digital vocal biomarker that can be used in clinical and epidemiological research to assess smoking status in a rapid, scalable and accurate manner using ecological audio recordings.
Provided by:
Karger Publishers
Created:
2024-07-30



