CommonPhone-SE

Name: CommonPhone-SE
Creator: maas
Published: 2025-12-05 16:55:33
License: no description provided

ModelScope Community. Updated 2025-12-05; indexed 2025-12-06.

Download link:

https://modelscope.cn/datasets/BSC-LT/CommonPhone-SE


Resource description:

# CommonPhone-SE

Multilingual, age- and gender-balanced subset for speech enhancement benchmarking.

## Dataset Details

### Dataset Description

CommonPhone-SE is a benchmark dataset derived from CommonPhone. It contains audio samples in 7 languages from speakers aged 18 to 80. It aims to provide a speaker-diverse dataset for benchmarking speech enhancement algorithms in real-world conditions.

- **Curated by:** LangTech Lab members from the speech team.
- **Language(s) (NLP):** CA, DE, EN, IT, FR, RU, ES
- **License:** cc0-1.0

### Dataset Sources

CommonPhone-SE is a subset of CommonPhone.

- **Repository:** https://zenodo.org/records/5846137
- **Paper:** https://arxiv.org/abs/2201.05912

### Languages

Catalan (CA), German (DE), English (EN), Italian (IT), French (FR), Russian (RU), Spanish (ES)

## Uses

The goal of this dataset is to evaluate how well speech enhancement models generalize on a real-world, multilingual, and diverse dataset.

## Dataset Structure

The dataset consists of a single split that provides audio, transcriptions, and demographic information:

```
Dataset({
    features: ['filename', 'gender', 'age', 'language', 'text', 'audio'],
    num_rows: 5242
})
```

Each data point is structured as:

```
{'filename': 'common_voice_ca_31498257',
 'gender': 'female',
 'age': 'fifties',
 'language': 'ca',
 'text': 'lieutenant monroe va resultar ferit durant la batalla i va servir posteriorment al congrés',
 'audio': {'path': 'Commonphone-SE/common_voice_ca_31498257.wav',
           'array': array([0., 0., 0., ..., 0., 0., 0.]),
           'sampling_rate': 16000}}
```

## Dataset Creation

### Curation Rationale

The sampling rationale was to select audio that remains difficult for state-of-the-art enhancement models, in terms of both speech quality metrics and content preservation. Hence, we selected the worst 40 examples with respect to UTMOS, SCOREQ, and WIL for each language, age band, and gender. Finally, duplicates were dropped to arrive at a final evaluation dataset of 8.24 hours.

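The card does not include the selection code, so the following is only a minimal sketch of the "worst 40 per metric, per language, age band, and gender, then drop duplicates" procedure described above. The function name `select_hardest`, the per-utterance score keys `utmos`, `scoreq`, and `wil`, and the metric directions (higher UTMOS and SCOREQ treated as better, higher WIL as worse) are assumptions made for illustration, not details confirmed by the dataset authors.

```python
from collections import defaultdict

# Demographic keys taken from the dataset features listed in the card.
GROUP_KEYS = ("language", "age", "gender")

# Assumed metric directions (not stated in the card): True = higher is better.
HIGHER_IS_BETTER = {"utmos": True, "scoreq": True, "wil": False}


def select_hardest(rows, k=40):
    """Keep the k worst utterances per metric in each demographic group.

    `rows` is an iterable of dicts holding 'filename', the GROUP_KEYS and one
    score per metric (hypothetical layout). An utterance that is among the
    worst on several metrics is kept only once.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in GROUP_KEYS)].append(row)

    selected = {}  # filename -> row; dict keys give the de-duplication
    for members in groups.values():
        for metric, higher_is_better in HIGHER_IS_BETTER.items():
            # Worst first: lowest value if higher is better, highest otherwise.
            ranked = sorted(
                members,
                key=lambda r: r[metric],
                reverse=not higher_is_better,
            )
            for row in ranked[:k]:
                selected.setdefault(row["filename"], row)
    return list(selected.values())
```

Because the three per-metric selections overlap, the number of utterances kept per group falls somewhere between `k` and `3 * k`, which is why the de-duplication step matters for the final size.
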
### Source Data

Crowdsourced audio recorded by volunteers for CommonVoice and selected for the CommonPhone dataset.

#### Data Collection and Processing

#### Who are the source data producers?

Common Phone is maintained and distributed by speech researchers at the Pattern Recognition Lab of Friedrich-Alexander-University Erlangen-Nuremberg (FAU).

#### Personal and Sensitive Information

As with Common Voice, you must not make any attempt to identify speakers who contributed to CommonPhone-SE.

## Bias, Risks, and Limitations

The dataset was built to mitigate bias in the gender and age variables; however, it can still be biased towards the degradations found in the CommonVoice corpus. Although this dataset is highly diverse, it contains only read speech.

## Citation

If you use the dataset, please cite:

**BibTeX:**

```
@inproceedings{giraldo25_interspeech,
  title     = {{Evaluating Speech Enhancement Performance Across Demographics and Language}},
  author    = {{Jose Giraldo and Alex Peiró-Lilja and Carme Armentano-Oller and Rodolfo Zevallos and Cristina España-Bonet}},
  year      = {{2025}},
  booktitle = {{Interspeech 2025}},
  pages     = {{1353--1357}},
  doi       = {{10.21437/Interspeech.2025-1760}},
  issn      = {{2958-1796}},
}
```

## Dataset Card Authors

## Funding

This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/) and also by the Ministerio para la Transformación Digital y de la Función Pública and Plan de Recuperación, Transformación y Resiliencia - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.

## Dataset Card Contact

langtech@bsc.es


Provider: maas

Created: 2025-10-28
